# GitHub Enumeration Tool
This tool is used to reliably enumerate projects on GitHub.
The output of this tool can be used as input for the `criticality_score` tool, or as input for the `collect_signals` worker.
## Example
$ export GITHUB_TOKEN=ghp_x # Personal Access Token Goes Here
$ enumerate_github \
-start 2008-01-01 \
-min-stars=10 \
-workers=1 \
-out=github_projects.txt
## Install
$ go install github.com/ossf/criticality_score/cmd/enumerate_github@latest
## Usage
$ enumerate_github [FLAGS]...
The URL for each repository is written to the output. By default, `stdout` is used for output.

`FLAGS` are optional. See below for documentation.
### Authentication
A comma-delimited environment variable containing one or more GitHub Personal Access Tokens must be set. Supported environment variables are `GITHUB_AUTH_TOKEN`, `GITHUB_TOKEN`, `GH_TOKEN`, or `GH_AUTH_TOKEN`.
Example:
$ export GITHUB_TOKEN=ghp_abc,ghp_123
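The token lookup described above can be sketched in Go as follows. The precedence order (the first non-empty variable, in the order listed above, wins) and trimming whitespace around each token are assumptions for illustration; `loadTokens` is a hypothetical helper, not part of enumerate_github's API.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// tokenEnvVars lists the supported variables. Checking them in this order
// (first non-empty wins) is an assumption, not documented behavior.
var tokenEnvVars = []string{"GITHUB_AUTH_TOKEN", "GITHUB_TOKEN", "GH_TOKEN", "GH_AUTH_TOKEN"}

// loadTokens splits a comma-delimited token list, dropping empty entries
// and surrounding whitespace. lookup is injectable for testing.
func loadTokens(lookup func(string) (string, bool)) ([]string, error) {
	for _, name := range tokenEnvVars {
		v, ok := lookup(name)
		if !ok || v == "" {
			continue
		}
		var tokens []string
		for _, t := range strings.Split(v, ",") {
			if t = strings.TrimSpace(t); t != "" {
				tokens = append(tokens, t)
			}
		}
		if len(tokens) > 0 {
			return tokens, nil
		}
	}
	return nil, errors.New("no GitHub Personal Access Token found in environment")
}

func main() {
	tokens, err := loadTokens(os.LookupEnv)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(tokens), "token(s) loaded")
}
```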
### Flags

#### Output flags
- `-out FILE` specifies the `FILE` to use for output. By default, `stdout` is used.
- `-append` appends output to `FILE` if it already exists.
- `-force` overwrites `FILE` if it already exists and `-append` is not set.
- `-format {text|scorecard}` indicates the format to use for output. `text` is used by default and consists of one URL per line. `scorecard` outputs a CSV file compatible with the scorecard project.

If `FILE` exists and neither `-append` nor `-force` is set, the command will fail.
#### Date flags
- `-start date` the start date to enumerate back to. Must be at or after `2008-01-01`. Defaults to `2008-01-01`.
- `-end date` the end date to enumerate from. Defaults to today's date.
#### Query/Star flags
- `-min-stars int` only enumerates repositories with this many stars or more. Defaults to `10`.
- `-query string` sets the base query to use for enumeration. Defaults to `is:public`. See GitHub's search help for more detail.
- `-require-min-stars` aborts execution if `-min-stars` can't be reached during enumeration. If not set, some repositories created on a certain date may not be included.
- `-star-overlap int` the number of stars to overlap between queries. Defaults to `5`. An overlap is used to avoid missing repositories whose star count changes during enumeration.
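To make `-star-overlap` and `-require-min-stars` more concrete, here is a hedged sketch of per-day query construction. `created:` and `stars:` are standard GitHub search qualifiers, and the GitHub Search API returns at most 1,000 results per query. The splitting strategy shown (sort by stars descending; when the result limit is reached, re-query with an upper bound of the lowest star count seen plus the overlap) is an assumption about how such an algorithm can work, not a description of the tool's code. `buildQuery` and `nextMaxStars` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// buildQuery builds a GitHub search query for repositories created on one
// day. maxStars <= 0 means "no upper bound" (a hypothetical convention).
func buildQuery(base, day string, minStars, maxStars int) string {
	starQual := fmt.Sprintf("stars:>=%d", minStars)
	if maxStars > 0 {
		starQual = fmt.Sprintf("stars:%d..%d", minStars, maxStars)
	}
	return strings.Join([]string{base, "created:" + day, starQual}, " ")
}

// nextMaxStars returns the upper star bound for the follow-up query once a
// query has hit the result limit, assuming results were sorted by stars in
// descending order. The overlap re-covers repositories whose star count rose
// between queries. ok is false when the bound cannot shrink, i.e. -min-stars
// cannot be reached (the case -require-min-stars would abort on).
func nextMaxStars(lowestSeen, prevMax, overlap int) (next int, ok bool) {
	next = lowestSeen + overlap
	if prevMax > 0 && next >= prevMax {
		return prevMax, false
	}
	return next, true
}

func main() {
	fmt.Println(buildQuery("is:public", "2022-06-14", 20, 0))
	// prints: is:public created:2022-06-14 stars:>=20

	// Suppose the first query hit the limit and its last result had 150 stars.
	hi, ok := nextMaxStars(150, 0, 5)
	fmt.Println(buildQuery("is:public", "2022-06-14", 20, hi), ok)
	// prints: is:public created:2022-06-14 stars:20..155 true
}
```

Because overlapping star ranges return some repositories more than once, a caller following this approach would also need to de-duplicate URLs before writing them out.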
#### Misc flags
- `-log level` sets the level of logging. Can be `debug`, `info` (default), `warn` or `error`.
- `-workers int` the total number of concurrent workers to use. Defaults to `1`.
- `-help` displays help text.
## How It Works
Refer to Milestone 1 for details on the algorithm.
## Q&A
### Q: What is the lowest practical setting for `-min-stars`?
10 has been successfully tested, although lower may be possible.
TODO -- more detail
### Q: How long does it take?
A single GitHub Personal Access Token took about 4 hours to return all projects with >= 20 stars.
Faster performance can be achieved with more Personal Access Tokens and additional workers.
### Q: How many workers should I use?
Generally, use 1 worker for each Personal Access Token.
More workers than tokens may result in secondary rate limits.
It is possible that more restricted searches will succeed with more workers per token.
## Development
Rather than installing the binary, use `go run` to run the command.
For example:
$ go run ./cmd/enumerate_github [FLAGS]...
Limiting the data allows for runs to be completed quickly. For example:
$ go run ./cmd/enumerate_github \
-log=debug \
-start=2022-06-14 \
-end=2022-06-21 \
-min-stars=20