Version: 0.0.0-20190422094628-0f3b4a11b312
Repository: https://github.com/philipjkim/goreadability.git
Documentation: pkg.go.dev
# README
goreadability
goreadability is a tool for extracting the primary readable content of a webpage. It is a Go port of arc90's readability project, based on ruby-readability.
Since v2.0, goreadability uses Open Graph tag values when they exist. You can disable Open Graph lookup and follow the traditional readability rules by setting `Option.LookupOpenGraphTags` to `false`.
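The Open Graph preference amounts to a fallback: use the `og:` value when lookup is enabled and the tag exists, and apply the traditional rules otherwise. The sketch below shows that shape for the title only. `pickTitle`, its regular expressions, and the boolean standing in for `Option.LookupOpenGraphTags` are hypothetical. Regex matching is also a simplification of real HTML parsing.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// Simplified patterns for illustration only. They assume attributes appear
// as property="..." content="..." in that order. Real code should use an
// HTML parser.
var (
	ogTitleRe = regexp.MustCompile(`(?i)<meta\s+property="og:title"\s+content="([^"]*)"`)
	titleRe   = regexp.MustCompile(`(?is)<title[^>]*>(.*?)</title>`)
)

// pickTitle prefers the og:title value when lookupOpenGraph is true and the
// tag exists. Otherwise it falls back to the document's <title> element.
func pickTitle(page string, lookupOpenGraph bool) string {
	if lookupOpenGraph {
		if m := ogTitleRe.FindStringSubmatch(page); m != nil {
			return html.UnescapeString(m[1])
		}
	}
	if m := titleRe.FindStringSubmatch(page); m != nil {
		return html.UnescapeString(strings.TrimSpace(m[1]))
	}
	return ""
}

func main() {
	page := `<html><head><title>Lego - Wikipedia</title>` +
		`<meta property="og:title" content="Lego"></head></html>`
	fmt.Println(pickTitle(page, true))  // Open Graph value
	fmt.Println(pickTitle(page, false)) // traditional <title> value
}
```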
Install
```shell
go get github.com/philipjkim/goreadability
```
Example
```go
package main

import (
	"log"

	readability "github.com/philipjkim/goreadability"
)

func main() {
	// URL to extract contents (title, description, images, ...)
	url := "https://en.wikipedia.org/wiki/Lego"
	// Default option
	opt := readability.NewOption()
	// You can modify some option values if needed.
	opt.ImageRequestTimeout = 3000 // ms
	content, err := readability.Extract(url, opt)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(content.Title)
	log.Println(content.Description)
	log.Println(content.Images)
}
```
Testing
```shell
go test
# or, to see verbose logs:
DEBUG=true go test -v
```
Command Line Tool
TODO
Related Projects
- ruby-readability is the base of this project.
- fastimage finds the type and/or size of a remote image given its URI, fetching as little as needed.
Potential Issues
TODO
License
# Functions
- `Debug` enables debug logging of the operations done by the library.
- `Extract` sends a request to `reqURL`, then returns the content extracted from the response.
- `ExtractFromDocument` returns the extracted `Content` when extraction succeeds, or an error otherwise.
- `NewOption` returns the default option.