From: Zachary Vance
Date: Fri, 20 Nov 2015 06:58:14 +0000 (-0800)
Subject: Clarify size
X-Git-Url: https://git.za3k.com/?a=commitdiff_plain;h=4d3be49039dc5584e3428c461c0787154d6649d9;p=za3k.git

Clarify size
---
diff --git a/github.html b/github.html
index b0d77cd..85c473a 100644
--- a/github.html
+++ b/github.html
@@ -15,7 +15,7 @@
 for x in {0..100}; do \
 wget "http://za3k.com/github/repos-$((x*10000))-$(((x+1)*10000)).json.gz"; \
 done
- These files are around 10G compressed, 100G uncompressed. Files are grouped by github's internal id; since some repositories are deleted or privated, each file contains less than 10,000 repositories.
+ The combined size of these files is 10G compressed, 100G uncompressed. Files are grouped by github's internal id; since some repositories are deleted or privated, each file contains less than 10,000 repositories.
  • You can grab greatly abbreviated metadata (recommended) as JSON. This includes the repository name and URL, a short description, whether it is a fork (and what of), and the approximate size of the repository.
  • Finally, you can get a txt file of just the repo names: txt.
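The changed line says the files are grouped by GitHub's internal id and that their combined size is 10G compressed, 100G uncompressed. The sketch below covers both points. It assumes, based only on the download loop's URL pattern, that `repos-A-B.json.gz` holds ids with A <= id < B. The helpers `repo_file_for_id` and `uncompressed_bytes` are hypothetical names, not part of the original page.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes dump file repos-A-B.json.gz holds repositories whose
# GitHub id satisfies A <= id < B (inferred from the download loop's URLs).

# Hypothetical helper: name of the dump file expected to contain a given id.
repo_file_for_id() {
  local id=$1
  local x=$(( id / 10000 ))   # same index x as in the download loop
  echo "repos-$(( x * 10000 ))-$(( (x + 1) * 10000 )).json.gz"
}

# Hypothetical helper: combined uncompressed size, in bytes, of the given
# .json.gz files. It decompresses everything, so it is slow on the full dump.
uncompressed_bytes() {
  gzip -dc -- "$@" | wc -c | tr -d ' '   # tr strips the padding some wc builds add
}

# Example: the download URL for the file that should contain id 123456.
echo "http://za3k.com/github/$(repo_file_for_id 123456)"
```

The loop fetches x = 0..100, so it only covers ids below (100 + 1) * 10000 = 1,010,000. Running `uncompressed_bytes repos-*.json.gz` in the download directory should land near the 100G figure. `gzip -l` is faster because it reads only each file's trailer. However, the gzip format stores that size modulo 2^32, so `gzip -l` can misreport any file larger than 4 GiB uncompressed.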