github.com/client9/plaintext
Version: 0.0.0-20180109203002-5bf47e7c0c45
Repository: https://github.com/client9/plaintext.git
Documentation: pkg.go.dev

# README

plaintext

Extract human-language text as plain UTF-8 from computer code and markup.

The output is (or should be) line-preserving: no lines are added or removed.

<p>
foo
</p>

becomes


foo

# Packages

No description provided by the author

# Functions

Collapse merges duplicate whitespace. It's not very smart but can be useful to clean up some output.
ExtractorByFilename returns a plaintext extractor based on a filename heuristic.
InspectImageAlt is a sample option (WIP).
NewGolangText creates a new extractor.
NewHTMLText creates a new HTMLText extractor, using options.
NewIdentity creates an identity-extractor.
NewMarkdownText creates a new extractor.
NewScriptText creates a new file extractor.
StraightQuotes converts maybe fancy typographical characters into their ASCII equivalent.
StripTemplate is a work in progress for removing Go template markup from a file.
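
To make the two cleanup helpers in the list above concrete, here is a rough standard-library sketch of what whitespace collapsing and quote straightening can look like. The names `collapseWhitespace` and `straightenQuotes` are hypothetical, and the exact rules are assumptions: this version also folds newlines into spaces and maps only four curly quote characters. The package's `Collapse` and `StraightQuotes` may behave differently.

```go
package main

import "strings"

// collapseWhitespace is a hypothetical stand-in for the idea behind Collapse.
// Assumption: every run of whitespace, newlines included, becomes one space.
func collapseWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// curlyToStraight maps a small, assumed set of typographic quotes to ASCII.
var curlyToStraight = strings.NewReplacer(
	"\u2018", "'", // left single quotation mark
	"\u2019", "'", // right single quotation mark
	"\u201c", `"`, // left double quotation mark
	"\u201d", `"`, // right double quotation mark
)

// straightenQuotes is a hypothetical stand-in for the idea behind StraightQuotes.
func straightenQuotes(s string) string {
	return curlyToStraight.Replace(s)
}
```

Both helpers are meant for post-processing extracted text, for example before passing it to a spell checker that only understands ASCII quotes.
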

# Structs

GolangText extracts plaintext from Go and other similar C-like or Java-like files. Need to study.
HTMLText extracts plain text from HTML markup.
Identity provides a pass-through plain text extractor.
MarkdownText extracts plain text from markdown sources.
ScriptText extracts plaintext from "generic script" languages that use the '#' character to denote a comment line. It's not so smart.

# Interfaces

Extractor is an interface for extracting plaintext.
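
The sketch below shows how an extractor interface, a pass-through implementation, and a filename heuristic can fit together. Everything in it is an assumption for illustration: the lowercase `extractor` interface and its `Text` method signature, the `identity` and `hashComments` types, and the extension table in `byFilename` are hypothetical and may not match the package's `Extractor`, `Identity`, or `ExtractorByFilename`.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// extractor mirrors the idea of an extraction interface.
// The method name and signature are assumptions for this sketch.
type extractor interface {
	Text(src []byte) []byte
}

// identity passes its input through unchanged.
type identity struct{}

func (identity) Text(src []byte) []byte { return src }

// hashComments keeps whole-line '#' comments, one output line per input line.
type hashComments struct{}

func (hashComments) Text(src []byte) []byte {
	lines := bytes.Split(src, []byte("\n"))
	for i, line := range lines {
		trimmed := bytes.TrimSpace(line)
		if bytes.HasPrefix(trimmed, []byte("#")) {
			lines[i] = bytes.TrimSpace(bytes.TrimLeft(trimmed, "#"))
		} else {
			lines[i] = nil // non-comment lines become empty lines
		}
	}
	return bytes.Join(lines, []byte("\n"))
}

// byFilename is a hypothetical heuristic that picks an extractor
// from the file extension, ignoring case.
func byFilename(name string) (extractor, error) {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".txt":
		return identity{}, nil
	case ".sh", ".py", ".rb":
		return hashComments{}, nil
	}
	return nil, fmt.Errorf("no extractor for %q", name)
}
```

A caller would ask `byFilename` for an extractor, check the error, and then call `Text` on the file contents. Keeping the interface to a single method makes it easy to add a new format without touching existing callers.
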