Version: 0.0.0-20151220083925-43f8fb998d06
Repository: https://github.com/aarzilli/sandblast.git
Documentation: pkg.go.dev
# README
Library that uses Readability-like heuristics to extract text from an HTML document.
Example:

    import "golang.org/x/net/html"
    …
    node, err := html.Parse(bytes.NewReader(raw_html))
    if err != nil {
    	log.Fatal("Parsing error: ", err)
    }
    title, text := sandblast.Extract(node)
    fmt.Printf("Title: %s\n%s", title, text)
    …
See also example/extract.go, a command line utility to extract text from a URL.
# Functions
Returns the body of resp as a decoded string, detecting its encoding.
No description provided by the author
No description provided by the author
No description provided by the author
# Constants
Not implemented.
Keeps link destinations for links embedded inside text blocks.
Not implemented.
Not implemented.
# Type aliases
No description provided by the author