# README

doclib

doclib implements the bleve + unidoc interfaces

# Functions

BlevePdfFromHIPDs creates a BlevePdf from its serialized form `hipds`.
CreateExtractList returns an empty *ExtractList with `maxPages` maximum number of pages and `maxPerPage` maximum rectangles per page.
CreatePDFPageProcessorFile creates a PDFPageProcessor for reading the PDF file `inPath`.
CreatePDFPageProcessorReader creates a PDFPageProcessor for reading the PDF file referenced by `rs`.
ExportBleveMem serializes bleve index `index` to a byte slice.
ExtractPageTextMarks returns the extracted text and corresponding TextMarks on page `page`.
ImportBleveMem deserializes `data` to a bleve.Index.
IndexPdfFilesOrReaders returns a BlevePdf and a bleve.Index over the PDF contents referenced by the io.ReadSeekers in `rsList` if `rsList` is not empty, or over the PDF files named in `pathList` if `rsList` is empty.
IndexPdfFilesUsingReaders creates a bleve+BlevePdf index for `pathList`.
PagePositionsFromTextMarks converts extractor.TextMarkArray `textMarks` to a more compact PagePositions.
PageSizePt returns the width and height of `page` in points.
PdfOpenFile opens PDF file `inPath` and attempts to handle null encryption schemes.
PdfOpenFile opens PDF file `inPath` lazily and attempts to handle null encryption schemes.
PdfOpenReader opens the PDF file accessed by `rs` and attempts to handle null encryption schemes.
ProcessPDFPagesFile runs `processPage` on every page in PDF file `inPath`.
ProcessPDFPagesReader runs `processPage` on every page in PDF file opened in `rs`.
SearchPersistentPdfIndex performs a bleve search on the persistent index in `persistDir`/bleve for `term` and returns up to `maxResults` matches.

# Constants

BorderWidth is the width of rectangle sides in points.
ShadowWidth is the width of the shadow on the inside and outside of the rectangles.

# Variables

CheckConsistency should be set true to regularly check the BlevePdf consistency.
Debug can be set true to enable debug level logging.
ErrNoMatch indicates there was no match for a bleve hit.
ExposeErrors can be set to true to not recover from errors in library functions.
Trace can be set true to enable trace level logging.

# Structs

BlevePdf links a bleve index over texts to the PDF files that the texts were extracted from, using the hashDoc {file hash: DocPositions} map.
DocPageText contains doc:page indexes, the PDF page number and the text extracted from a PDF page.
DocPositions is used to link per-document data in a bleve index to the PDF file that the data was extracted from.
ExtractList is a list of PDF file:page inputs that are to be marked up then combined in a specified order.
IDText is what bleve sees for each page of a PDF file.
PagePositions is used to link per-document data in a bleve index to the PDF file the data was extracted from.
PdfMatchSet is the result of a search over a PdfIndex.
PdfPageMatch describes the search results for a PDF page returned from a search over a PDF index.
PDFPageProcessor is used for processing a PDF file one page at a time.
Span gives the offsets in extracted text that span a phrase.
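
As a rough illustration of how offsets into extracted text can identify a phrase, here is a minimal sketch. The `Span` type below is a stand-in defined for this example; its field names (`Start`, `End`), the use of byte offsets, and the half-open range are assumptions, not the actual doclib definition.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for illustration only.
// Start and End are assumed to be byte offsets into the extracted text,
// with End pointing one past the last byte of the phrase.
type Span struct {
	Start int
	End   int
}

// phrase returns the part of text covered by s, or "" if s is out of range.
func phrase(text string, s Span) string {
	if s.Start < 0 || s.End > len(text) || s.Start > s.End {
		return ""
	}
	return text[s.Start:s.End]
}

func main() {
	text := "the quick brown fox"
	fmt.Println(phrase(text, Span{Start: 4, End: 9})) // prints "quick"
}
```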