Package: github.com/lmullen/chronam-ocr-debatcher
Version: 0.0.1
Repository: https://github.com/lmullen/chronam-ocr-debatcher.git
Documentation: pkg.go.dev

# README

## Chronicling America OCR debatcher

This program takes paths to one or more .tar.bz2 batches of OCR files from the Chronicling America bulk data downloads. It converts each batch into a CSV file, which you can load into a database or process however you like. Batches are processed concurrently.

Usage:

./chronam-ocr-debatcher [--processes=8] <path/to/a/batch.tar.bz2 ...>

You can download binaries from the releases page.