From: Zachary Vance
Date: Sat, 23 Apr 2016 18:50:53 +0000 (-0700)
Subject: Add http://ghtorrent.org/
X-Git-Url: https://git.za3k.com/?a=commitdiff_plain;h=cae9b1d87285f0c66ee95f4ecaccbf8214ef9cde;p=za3k.git

Add http://ghtorrent.org/
---

diff --git a/github.html b/github.html
index 3841c69..74950e9 100644
--- a/github.html
+++ b/github.html
@@ -26,6 +26,7 @@
http://za3k.com/github/repos-<X>0000-<X>9999.json.gz

Github Timeline

The Events Timeline is ephemeral, and is being successfully recorded by githubarchive.org. A second person running the same program in case of downtime would be a plus.

+

(New!) http://ghtorrent.org/ is downloading the same timeline, and also fetching fuller historical data.

Estimates on archiving repositories

I selected 1000 random repositories from the above list and removed the 427 that were forks. I then checked out all of the remaining repositories. The total size was 4.3 GB, with or without compression; a shallow checkout came to around 3 GB. If we assume forks take no space, this means an average github repository takes up 4.3 MB (4.3 GB spread over all 1000 sampled repositories). Omitting the largest repositories may improve this estimate, but I didn't run further tests.
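
The arithmetic behind that estimate can be reproduced with a short Python sketch. The sample counts and sizes are the ones quoted above; the decimal conversion (1 GB = 1000 MB) and all names in the code are assumptions made for illustration. It also prints the per-repository average over non-forks only, for comparison.

```python
# Figures quoted in the paragraph above.
SAMPLED = 1000      # random repositories selected
FORKS = 427         # forks removed before checkout
FULL_GB = 4.3       # total size of full checkouts
SHALLOW_GB = 3.0    # approximate total size of shallow checkouts

# Assumption: decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000


def average_mb(total_gb, repo_count):
    """Average size per repository in MB."""
    return total_gb * MB_PER_GB / repo_count


non_forks = SAMPLED - FORKS

# Forks counted as zero-size members of the sample (the assumption in the text).
avg_all = average_mb(FULL_GB, SAMPLED)
# Average over only the repositories that were actually checked out.
avg_non_fork = average_mb(FULL_GB, non_forks)
# Same as avg_all, but for shallow checkouts.
avg_shallow = average_mb(SHALLOW_GB, SAMPLED)

print(f"non-fork repositories: {non_forks}")
print(f"average per repository, forks free: {avg_all:.1f} MB")
print(f"average per non-fork repository: {avg_non_fork:.1f} MB")
print(f"average per repository, shallow: {avg_shallow:.1f} MB")
```

To project a total, multiply the per-repository average by the number of repositories in the full list; that count is not given here, so it is left out of the sketch.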