Module: github.com/jakewarren/metascraper
Version: 0.0.0-20191111171752-9153a56cb0ae
Repository: https://github.com/jakewarren/metascraper.git
Documentation: pkg.go.dev

# README

Metascraper

Metascraper is a web scraping utility that transforms valid HTML markup into a hierarchy of Go structs. In addition to capturing the raw HTML at the given endpoint, it pulls meta tags out of the page's head and extracts schema.org metadata embedded in the document body.

Usage

p, err := metascraper.Scrape(url)
if err != nil {
    log.Fatal(err)
}
log.Println(p.Title)
pretty.Print(p.MetaData())
pretty.Print(p.SchemaData())

See the API documentation on pkg.go.dev.

Released under the MIT License

# Functions

AttrMap parses the attributes of the current element into a friendly map.
Scrape creates a new page and populates its fields from the content found at the given URL.
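
To illustrate what "a friendly map" could look like, here is a minimal sketch of collapsing an element's attributes into a map. The `Attr` type and the lowercase `attrMap` function are hypothetical stand-ins written for this example; the package's actual AttrMap signature is not shown in this listing and may well differ.

```go
package main

import "fmt"

// Attr is a hypothetical stand-in for one parsed HTML attribute.
// It is not a type from metascraper.
type Attr struct {
	Key, Val string
}

// attrMap collects attributes into a map keyed by attribute name.
// When a key repeats, the first value is kept, which mirrors the HTML
// parsing rule that later duplicate attributes are dropped.
func attrMap(attrs []Attr) map[string]string {
	m := make(map[string]string, len(attrs))
	for _, a := range attrs {
		if _, seen := m[a.Key]; !seen {
			m[a.Key] = a.Val
		}
	}
	return m
}

func main() {
	m := attrMap([]Attr{{"property", "og:title"}, {"content", "Example"}})
	fmt.Println(m["property"], m["content"])
}
```

A map makes lookups such as `m["content"]` direct, at the cost of losing the original attribute order.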

# Structs

ItemProp represents a simple schema.org itemprop.
ItemScope represents a schema.org itemscope.
Meta represents a `meta` tag in the head of an HTML document.
MetaReader implements the TokenReader interface; it maintains the necessary state for extracting structured metadata from a stream of HTML tokens.
Page represents an HTML document with metadata.
PageReader implements the TokenReader interface; it maintains the necessary state for extracting the body text and page title from a token stream.
ReaderList implements the TokenReader interface over a slice of TokenReaders.
SchemaReader implements the TokenReader interface; it maintains the necessary state for extracting schema.org metadata from the body of an HTML document.
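
As a rough illustration of the kind of work a meta-tag reader does, the sketch below walks a token stream and records each `meta` start tag. It makes two loud assumptions: it uses the standard library's `encoding/xml` decoder rather than a real HTML tokenizer, so the input must be well-formed XHTML, and the `Meta` type and `readMeta` function are hypothetical names invented for this example, not the package's own definitions.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Meta is a hypothetical, simplified view of a <meta> tag.
type Meta struct {
	Name    string // value of the name or property attribute
	Content string // value of the content attribute
}

// readMeta walks the token stream of a well-formed document and
// records every <meta> start tag it encounters.
func readMeta(doc string) ([]Meta, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []Meta
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "meta" {
			continue
		}
		var m Meta
		for _, a := range start.Attr {
			switch a.Name.Local {
			case "name", "property":
				m.Name = a.Value
			case "content":
				m.Content = a.Value
			}
		}
		out = append(out, m)
	}
}

func main() {
	doc := `<html><head><title>Hi</title>` +
		`<meta name="description" content="A page"/>` +
		`<meta property="og:title" content="Hi"/>` +
		`</head><body></body></html>`
	metas, err := readMeta(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range metas {
		fmt.Printf("%s = %s\n", m.Name, m.Content)
	}
}
```

Real-world HTML is rarely well-formed XML, so an actual scraper would need a tolerant HTML tokenizer in place of `encoding/xml`.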

# Interfaces

TokenReader presents a lightweight version of the usual SAX parser interface, with methods for handling the typical events in a token stream.
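
The SAX-style pattern described above, combined with a list that fans events out to several readers, can be sketched as follows. Every name here is an assumption made for illustration: the method set (`StartElement`, `Text`, `EndElement`), the `titleReader` and `tagCounter` types, and the lowercase `readerList` are hypothetical and are not taken from the package's real TokenReader or ReaderList.

```go
package main

import "fmt"

// TokenReader is a hypothetical SAX-style handler. The method names are
// illustrative and may differ from the package's real interface.
type TokenReader interface {
	StartElement(tag string, attrs map[string]string)
	Text(text string)
	EndElement(tag string)
}

// titleReader captures the text found inside <title>.
type titleReader struct {
	inTitle bool
	Title   string
}

func (t *titleReader) StartElement(tag string, _ map[string]string) {
	if tag == "title" {
		t.inTitle = true
	}
}

func (t *titleReader) Text(text string) {
	if t.inTitle {
		t.Title += text
	}
}

func (t *titleReader) EndElement(tag string) {
	if tag == "title" {
		t.inTitle = false
	}
}

// tagCounter counts start tags and ignores everything else.
type tagCounter struct{ N int }

func (c *tagCounter) StartElement(string, map[string]string) { c.N++ }
func (c *tagCounter) Text(string)                            {}
func (c *tagCounter) EndElement(string)                      {}

// readerList forwards every event to each reader in order, so a single
// pass over the token stream can feed several independent readers.
type readerList []TokenReader

func (l readerList) StartElement(tag string, attrs map[string]string) {
	for _, r := range l {
		r.StartElement(tag, attrs)
	}
}

func (l readerList) Text(text string) {
	for _, r := range l {
		r.Text(text)
	}
}

func (l readerList) EndElement(tag string) {
	for _, r := range l {
		r.EndElement(tag)
	}
}

func main() {
	title, count := &titleReader{}, &tagCounter{}
	var tr TokenReader = readerList{title, count}

	// Hand-built events standing in for a tokenizer's output.
	tr.StartElement("title", nil)
	tr.Text("Hello")
	tr.EndElement("title")
	tr.StartElement("p", nil)
	tr.Text("body text")
	tr.EndElement("p")

	fmt.Println(title.Title, count.N)
}
```

Because the list itself satisfies the same interface, the code driving the tokenizer only needs to know about one reader, and new extractors can be added without touching the parsing loop.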