From: Zachary Vance
Date: Fri, 20 Nov 2015 06:52:14 +0000 (-0800)
Subject: Add estimates
X-Git-Url: https://git.za3k.com/?a=commitdiff_plain;h=8e943207f4368938ed3005c6d41b98af27f54878;p=za3k.git

Add estimates
---
diff --git a/github.html b/github.html
index d54edd5..c9d39b6 100644
--- a/github.html
+++ b/github.html
@@ -1,9 +1,10 @@
-Github Archive
+Github Backup
-

Currently no one has archived github.com. This webpage is about progress toward that.

+

Currently no one has backed up github.com (aside from Github). This webpage is about progress toward that. If you have 150-200TB of disk space and really good internet, please contact me about getting a copy of github.

-I host the metadata for the repositories: +

List of Repositories

+I host some metadata about github's repositories. This includes basic data about each repository, but NOT its issues, wiki, downloads, or the git repository itself: +

List of Gists

Metadata for gists is currently unavailable from github, but I'm working with them to make it public.

-Additional information: +

Github Timeline

+

The Events Timeline is ephemeral, and is being successfully recorded by githubarchive.org. A second person running the same program, as a backup in case of downtime, would be a plus.

+ +

Estimates for archiving repositories

+

I selected 1000 random repositories from the above list and removed the 427 that were forks. I then cloned the remaining 573 repositories. The total size was 4.3 GB, with or without compression, and around 3 GB for a shallow clone. If we assume forks take no space, this means an average github repository takes up about 4.3 MB (4.3 GB spread over all 1000 sampled repositories). Omitting the largest repositories may improve this estimate, but I didn't run further tests. I haven't checked, but the space taken up by metadata such as issues should be very small in comparison.
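The sampling method above can be sketched in Python. This is a minimal sketch, not the script actually used for these numbers: it assumes the sample is a list of hypothetical `(clone_url, is_fork)` pairs and that `git` is on the PATH. Forks are skipped and counted as zero bytes, so the total is divided by the full sample size, which is how 4.3 GB over 1000 sampled repositories becomes about 4.3 MB each. `dir_size` measures apparent file size, which can differ slightly from what `du` reports.

```python
import os
import subprocess
import tempfile


def dir_size(path):
    """Total size in bytes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def clone_size(url, shallow=False):
    """Clone url into a temporary directory and return its size in bytes.

    The clone is deleted again when the temporary directory is cleaned up.
    """
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "repo")
        cmd = ["git", "clone", "--quiet"]
        if shallow:
            cmd += ["--depth", "1"]  # shallow clone: only the latest commit
        cmd += [url, dest]
        subprocess.run(cmd, check=True)
        return dir_size(dest)


def mean_size_forks_as_zero(sizes_of_non_forks, sample_count):
    """Average bytes per repository, assuming forks take no space.

    sizes_of_non_forks: sizes of the repositories that were actually cloned.
    sample_count: size of the whole random sample, forks included.
    """
    return sum(sizes_of_non_forks) / sample_count


def estimate(sample, shallow=False):
    """Hypothetical driver: sample is a list of (clone_url, is_fork) pairs."""
    sizes = [clone_size(url, shallow) for url, is_fork in sample if not is_fork]
    return mean_size_forks_as_zero(sizes, len(sample))
```

Dividing by the whole sample, rather than by the 573 non-forks, is what builds the "forks take no space" assumption into the average.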

+

If there are 35,000,000 repositories on github at an average size of 4.3 MB each, that multiplies out to around 150 TB of data in total.
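The same multiplication written as a small Python helper. It is only an illustration of the arithmetic; the function name is hypothetical. It assumes decimal units by default (1 TB = 1,000,000 MB) and optionally reads the inputs as binary units (MiB in, TiB out), which shows how much the unit convention alone moves the result.

```python
def total_size_tb(repo_count, avg_size_mb, binary=False):
    """Estimated total storage for repo_count repositories averaging avg_size_mb each.

    Decimal units by default (1 TB = 1,000,000 MB); with binary=True the
    average is read as MiB and the result is returned in TiB (1 TiB = 2**20 MiB).
    """
    per_tb = 2**20 if binary else 1_000_000
    return repo_count * avg_size_mb / per_tb


print(f"{total_size_tb(35_000_000, 4.3):.1f} TB")                # 150.5 TB
print(f"{total_size_tb(35_000_000, 4.3, binary=True):.1f} TiB")  # 143.5 TiB
```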

+ +

Additional information