# README
esdump
Stream docs from Elasticsearch to stdout for ad-hoc data mangling using the Scroll API. Just like solrdump, but for elasticsearch.
Libraries can use both GET and POST requests to issue scroll requests.
- elasticsearch-py uses POST
- esapi uses GET
This tool uses HTTP GET only and does not clear scrolls (which would probably use DELETE), so it works with read-only servers that only allow GET.
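To illustrate what a GET-only scroll can look like, here is a minimal sketch that builds the two request URLs involved: one to open the scroll and one to fetch each following batch. It assumes the query, scroll timeout and batch size are passed as the `q`, `scroll` and `size` URL parameters, and that the scroll id is passed as `scroll_id`. The helper names `searchURL` and `scrollURL` are made up for this example and are not necessarily how esdump builds its requests.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// searchURL builds the initial GET URL that opens a scroll context.
// Assumption: query, scroll timeout and batch size all travel as URL
// parameters (q, scroll, size), so no request body is needed.
func searchURL(server, index, query, scroll string, size int) string {
	v := url.Values{}
	v.Set("q", query)
	v.Set("scroll", scroll)
	v.Set("size", strconv.Itoa(size))
	return strings.TrimRight(server, "/") + "/" + url.PathEscape(index) + "/_search?" + v.Encode()
}

// scrollURL builds the follow-up GET URL that fetches the next batch.
// Assumption: the scroll id is accepted as the scroll_id URL parameter.
func scrollURL(server, scroll, scrollID string) string {
	v := url.Values{}
	v.Set("scroll", scroll)
	v.Set("scroll_id", scrollID)
	return strings.TrimRight(server, "/") + "/_search/scroll?" + v.Encode()
}

func main() {
	fmt.Println(searchURL("https://search.fatcat.wiki", "fatcat_release", "web archiving", "5m", 1000))
	// "SCROLL_ID" is a placeholder; the real value would come from the
	// _scroll_id field of the previous response.
	fmt.Println(scrollURL("https://search.fatcat.wiki", "5m", "SCROLL_ID"))
}
```

Because the scroll is never cleared, the context simply expires on the server once the scroll timeout passes.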
## Install
$ go install github.com/miku/esdump/cmd/esdump@latest
Or via a release.
## Usage
esdump uses the Elasticsearch scroll API to stream documents to stdout. It was
first written to extract samples from https://search.fatcat.wiki (a scholarly
communications preservation and discovery project).
$ esdump -s https://search.fatcat.wiki -i fatcat_release -q 'web archiving'
Usage of esdump:
  -i string
        index name (default "fatcat_release")
  -ids string
        a path to a file with one id per line to fetch
  -l int
        limit number of documents fetched, zero means no limit
  -mq string
        path to file, one lucene query per line
  -q string
        lucene syntax query to run, example: 'affiliation:"alberta"' (default "*")
  -s string
        elasticsearch server (default "https://search.fatcat.wiki")
  -scroll string
        context timeout (default "5m")
  -size int
        batch size (default 1000)
  -v    show version
  -verbose
        be verbose
## Performance data points
925636 docs in 4m47.460217252s (3220 docs/s)
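
The rate follows directly from the document count and the elapsed time. As a small sketch for comparing your own runs, the following recomputes it from the two numbers above; `docsPerSecond` is a hypothetical helper for this example and not part of esdump.

```go
package main

import (
	"fmt"
	"time"
)

// docsPerSecond returns the average throughput for a finished run.
func docsPerSecond(docs int, elapsed time.Duration) float64 {
	return float64(docs) / elapsed.Seconds()
}

func main() {
	// Duration string copied from the data point above.
	elapsed, err := time.ParseDuration("4m47.460217252s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0f docs/s\n", docsPerSecond(925636, elapsed))
}
```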