<p>Currently no one other than GitHub itself has a backup of github.com. This page tracks progress toward making one. If you have 150-200TB of disk space and really good internet, please <a href="https://za3k.com">contact me</a> about getting a copy of github.</p>
<h3>List of Repositories</h3>
-I host some metadata about github's repositories. This includes a lot of basic data about the repository, but NOT the issues, any wiki, downloads, or the git repository. As of Nov 2015, github has 28 million repositories.
+I host some metadata about github's repositories. It includes basic data about each repository, but NOT its issues, wiki, downloads, or git repository. As of Nov 2015, github had 28.3 million repositories.
<ul>
- <li><p>Full repository metadata is available in JSON format. The format is explained on the <a href="https://developer.github.com/v3/repos/#list-all-public-repositories">github API</a>.</p>
+ <li>You can grab greatly abbreviated metadata (recommended) as <a href="https://za3k.com/github/repos.json.gz">JSON</a>. This includes each repository's name and URL, whether it is a fork (and of which repository), and a short description.</li>
+ <li>You can get a txt file of just the repo names: <a href="https://za3k.com/github/repos.txt.gz">txt</a> (676M uncompressed, 332M compressed).</li>
+ <li><p>Finally, full repository metadata is available in JSON format. The format is explained on the <a href="https://developer.github.com/v3/repos/#list-all-public-repositories">github API</a>.</p>
<p>The files are available in batches of 10,000 ids, at the following URLs (replace &lt;X&gt; with the batch number): <pre>https://za3k.com/github/repos-&lt;X&gt;0000-&lt;X&gt;9999.json
https://za3k.com/github/repos-&lt;X&gt;0000-&lt;X&gt;9999.json.gz</pre>
To download all files, run <pre>
- for x in {0..4700}; do \
+ for x in {0..5000}; do \
echo "https://za3k.com/github/repos-${x}0000-${x}9999.json.gz"; \
done | wget -N -i -
</pre>
- The combined size of these files is <b>15G compressed</b>, 168G uncompressed. Files are grouped by github's internal id; since some repositories are deleted or privated, each file contains less than 10,000 repositories.
+ The combined size of these files is <b>9.7G compressed</b>, 115G uncompressed. Files are grouped by github's internal id; since some repositories have been deleted or made private, each file contains fewer than 10,000 repositories.
</li>
- <li>You can grab greatly abbreviated metadata (recommended) as <a href="https://za3k.com/github/repos.json">JSON</a>. This includes the repository name and URL, a short description, whether it is a fork (and what of), and the approximate size of the repository.</li>
- <li>Finally, you can get a txt file of just the repo names: <a href="https://za3k.com/github/repos.txt">txt</a>.</li>
<li>This data was downloaded using a <a href="https://github.com/za3k/github-backup">custom tool</a> I wrote, which fetches it from the github API v3 and modifies it as little as possible.</li>
</ul>
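<p>Because the batch files are named mechanically, locating and checking them can be scripted. The sketch below is hypothetical and not part of the custom tool: <code>batch_url_for_id</code> and <code>check_batches</code> are illustrative names, and it assumes that a repository with internal id N lands in batch X = N / 10000 (integer division), as the <code>repos-&lt;X&gt;0000-&lt;X&gt;9999</code> pattern suggests. The second function uses <code>gzip -t</code> to flag downloaded files that are truncated or corrupt.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helpers for working with the batch files; not part of github-backup.

# Print the batch-file URL that should hold repository id $1, assuming
# batch X covers ids X0000..X9999 (X = id / 10000, integer division).
batch_url_for_id() {
  local id=$1
  local x=$(( id / 10000 ))
  echo "https://za3k.com/github/repos-${x}0000-${x}9999.json.gz"
}

# Run gzip's integrity test on every downloaded batch file in the current
# directory and print the ones that fail (e.g. interrupted downloads).
# Returns 0 if every file passes, 1 if any file fails.
check_batches() {
  local f bad=0
  for f in repos-*.json.gz; do
    [ -e "$f" ] || continue   # the glob matched no files
    if ! gzip -t "$f" 2>/dev/null; then
      echo "corrupt: $f"
      bad=1
    fi
  done
  return "$bad"
}
```

<p>For example, <code>batch_url_for_id 123456</code> prints <code>https://za3k.com/github/repos-120000-129999.json.gz</code>. If <code>check_batches</code> reports a corrupt file, deleting it and rerunning the download loop should fetch it again.</p>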