IndexPdfFilesOrReaders returns a BlevePdf and a bleve.Index over the PDF contents referenced by the io.ReadSeekers in `rsList` if `rsList` is not empty, or by the PDF filenames in `pathList` if `rsList` is empty.

IndexPdfFilesUsingReaders

IndexPdfFilesUsingReaders creates a bleve+BlevePdf index for `pathList`.

PagePositionsFromTextMarks

PagePositionsFromTextMarks converts extractor.TextMarkArray `textMarks` to a more compact PagePositions.

PageSizePt

PageSizePt returns the width and height of `page` in points.
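A PDF point is 1/72 of an inch, so sizes reported in points convert to physical units with simple arithmetic. The sketch below is independent of the package; `ptToMM` is a hypothetical helper and the page dimensions are approximate A4 values chosen for illustration.

```go
package main

import "fmt"

const (
	pointsPerInch = 72.0 // PDF default user space unit
	mmPerInch     = 25.4
)

// ptToMM converts a length in points to millimetres.
func ptToMM(pt float64) float64 {
	return pt * mmPerInch / pointsPerInch
}

func main() {
	w, h := 595.0, 842.0 // approximate A4 page size in points
	fmt.Printf("%.1f x %.1f mm\n", ptToMM(w), ptToMM(h))
}
```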

PdfOpenFile

PdfOpenFile opens PDF file `inPath` and attempts to handle null encryption schemes.

PdfOpenFileLazy

PdfOpenFileLazy opens PDF file `inPath` lazily and attempts to handle null encryption schemes.

PdfOpenReader

PdfOpenReader opens the PDF file accessed by `rs` and attempts to handle null encryption schemes.

ProcessPDFPagesFile

ProcessPDFPagesFile runs `processPage` on every page in PDF file `inPath`.

ProcessPDFPagesReader

ProcessPDFPagesReader runs `processPage` on every page in PDF file opened in `rs`.

SearchPersistentPdfIndex

SearchPersistentPdfIndex performs a bleve search on the persistent index in `persistDir`/bleve for `term` and returns up to `maxResults` matches.

# Constants

BorderWidth

BorderWidth is the width of rectangle sides in points.

ShadowWidth

ShadowWidth is the width of the shadow on the inside and outside of the rectangles.

# Variables

CheckConsistency

CheckConsistency should be set true to regularly check the BlevePdf consistency.

Debug

Debug can be set true to enable debug level logging.

ErrNoMatch

ErrNoMatch indicates there was no match for a bleve hit.

ErrNoPositions

ErrNoPositions indicates there were no positions for a bleve hit.

ExposeErrors

ExposeErrors can be set to true to not recover from errors in library functions.
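One common way to implement a switch like this is to guard a deferred `recover` with the flag. The sketch below assumes "errors" here means panics raised inside library code; `exposeErrors` and `safeCall` are hypothetical names, not the package's implementation.

```go
package main

import "fmt"

// exposeErrors is a local stand-in for a flag like ExposeErrors.
var exposeErrors = false

// safeCall runs fn. When exposeErrors is false, a panic in fn is recovered
// and returned as an error; when true, the panic propagates to the caller.
func safeCall(fn func()) (err error) {
	if !exposeErrors {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("recovered: %v", r)
			}
		}()
	}
	fn()
	return nil
}

func main() {
	err := safeCall(func() { panic("bad page") })
	fmt.Println("error:", err)
}
```

Letting the panic propagate keeps the full stack trace, which is usually what is wanted while debugging.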

Trace

Trace can be set true to enable trace level logging.

# Structs

BlevePdf

BlevePdf links a bleve index over texts to the PDF files that the texts were extracted from, using the hashDoc {file hash: DocPositions} map.
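The {file hash: DocPositions} mapping can be pictured as an ordinary Go map keyed by a content hash. In the sketch below, the use of SHA-256, the `docPositions` fields, and the `fileHash` helper are all assumptions for illustration; this reference does not say which hash the package uses.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// docPositions is a hypothetical, simplified stand-in for DocPositions.
type docPositions struct {
	path      string
	pageCount int
}

// fileHash returns a hex-encoded SHA-256 digest of the file contents.
// The hash algorithm is an assumption made for this sketch.
func fileHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	hashDoc := map[string]docPositions{}

	content := []byte("%PDF-1.7 example bytes")
	hashDoc[fileHash(content)] = docPositions{path: "report.pdf", pageCount: 3}

	// The same bytes always produce the same key, so the lookup succeeds
	// regardless of where the file was read from.
	if dp, ok := hashDoc[fileHash(content)]; ok {
		fmt.Println(dp.path, dp.pageCount)
	}
}
```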

DocPageText

DocPageText contains doc:page indexes, the PDF page number and the text extracted from a PDF page.

DocPositions

DocPositions is used to link the per-document data in a bleve index to the PDF file that the data was extracted from.

ExtractList

ExtractList is a list of PDF file:page inputs that are to be marked up and then combined in a specified order.

IDText

IDText is what bleve sees for each page of a PDF file.

PagePositions

PagePositions is used to link per-document data in a bleve index to the PDF file the data was extracted from.

PdfMatchSet

PdfMatchSet is the result of a search over a PdfIndex.

PdfPageMatch

PdfPageMatch describes the search results for a PDF page returned from a search over a PDF index.

PDFPageProcessor

PDFPageProcessor is used for processing a PDF file one page at a time.

Phrase

No description provided by the author

Span

Span gives the offsets in extracted text that span a phrase.
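Offsets that span a phrase can be illustrated with a small substring search. The `span` type, its field names, and the use of byte offsets with an exclusive end are assumptions for this sketch and may differ from the package's Span.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical stand-in: byte offsets into text, end exclusive.
type span struct {
	start, end int
}

// findSpans returns the non-overlapping occurrences of phrase in text.
func findSpans(text, phrase string) []span {
	if phrase == "" {
		return nil
	}
	var spans []span
	offset := 0
	for {
		i := strings.Index(text[offset:], phrase)
		if i < 0 {
			return spans
		}
		start := offset + i
		end := start + len(phrase)
		spans = append(spans, span{start, end})
		offset = end // continue searching after this occurrence
	}
}

func main() {
	text := "the cat and the hat"
	for _, s := range findSpans(text, "the") {
		fmt.Printf("[%d,%d) %q\n", s.start, s.end, text[s.start:s.end])
	}
}
```

Slicing the text with `text[s.start:s.end]` recovers the matched phrase, which is how offsets of this kind are typically mapped back to positions on a page.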

# README

doclib
