This tool uses HTTP GET only and does not clear scrolls (which would probably require DELETE), so it works with read-only servers that only allow GET.

Install

    $ go install github.com/miku/esdump/cmd/esdump@latest

Or via a release.

Usage

esdump uses the Elasticsearch scroll API to stream documents to stdout. It was
first written to extract samples from https://search.fatcat.wiki (a scholarly
communications preservation and discovery project).

    $ esdump -s https://search.fatcat.wiki -i fatcat_release -q 'web archiving'
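
To illustrate how a GET-only scroll can work, here is a minimal sketch of the two
kinds of request URLs involved. It is not taken from the esdump source; the
function names initialSearchURL and nextScrollURL are hypothetical. It assumes
the server accepts URI search parameters (q, size, scroll) and a scroll_id query
parameter on /_search/scroll, which newer Elasticsearch versions may deprecate.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// initialSearchURL (hypothetical helper) builds the first request of a scroll:
// a plain GET search carrying the query, batch size and scroll keep-alive.
func initialSearchURL(server, index, query, scroll string, size int) string {
	v := url.Values{}
	v.Set("q", query)
	v.Set("scroll", scroll)
	v.Set("size", strconv.Itoa(size))
	return fmt.Sprintf("%s/%s/_search?%s", server, url.PathEscape(index), v.Encode())
}

// nextScrollURL (hypothetical helper) builds a follow-up request that fetches
// the next batch for a scroll id returned by the previous response.
// Assumption: the server accepts scroll_id as a query parameter on GET.
func nextScrollURL(server, scroll, scrollID string) string {
	v := url.Values{}
	v.Set("scroll", scroll)
	v.Set("scroll_id", scrollID)
	return server + "/_search/scroll?" + v.Encode()
}

func main() {
	// Values mirror the example invocation above; "abc123" is a placeholder id.
	fmt.Println(initialSearchURL("https://search.fatcat.wiki", "fatcat_release", "web archiving", "5m", 1000))
	fmt.Println(nextScrollURL("https://search.fatcat.wiki", "5m", "abc123"))
}
```

A client would issue the first URL once, then repeat the second with each newly
returned scroll id until a response contains no more hits. Since the scroll is
never cleared, it simply expires on the server after the keep-alive elapses.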

Usage of esdump:
  -i string
        index name (default "fatcat_release")
  -ids string
        a path to a file with one id per line to fetch
  -l int
        limit number of documents fetched, zero means no limit
  -mq string
        path to file, one lucene query per line
  -q string
        lucene syntax query to run, example: 'affiliation:"alberta"' (default "*")
  -s string
        elasticsearch server (default "https://search.fatcat.wiki")
  -scroll string
        context timeout (default "5m")
  -size int
        batch size (default 1000)
  -v    show version
  -verbose
        be verbose

Performance data points

925636 docs in 4m47.460217252s (3220 docs/s)