# README
Scraper is a straightforward Go web scraper with a simple, flexible interface, inspired by BeautifulSoup.
## Quickstart

- Create a `Scraper` from any `io.ReadCloser`-compatible type:

  ```go
  // From an http.Response.Body:
  response, _ := http.Get("URL goes here")
  page, _ := scraper.NewFromBuffer(response.Body)

  // From an os.File:
  fileHandle, _ := os.Open("file name goes here")
  page, _ = scraper.NewFromBuffer(fileHandle)
  ```
- Construct a `scraper.Filter` with one or more criteria:

  ```go
  filter := scraper.Filter{
      Tag: "div",
      Attributes: scraper.Attributes{
          "id":    "div-1",
          "class": "tp-modal",
      },
  }
  ```
- Use the `Filter` to run a concurrent search on your `Scraper` page. Every returned element is itself a `Scraper` page that can be searched further:

  ```go
  for element := range page.FindAll(filter) {
      for link := range element.FindAll(scraper.Filter{Tag: "a"}) {
          fmt.Printf("URL: %v found under %v\n", link.Attributes()["href"], element.Type())
      }
  }
  ```
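Putting the three steps together, here is a minimal, self-contained sketch. The import path below is a placeholder (the README does not state the module path), and error handling is deliberately simplified:

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/user/scraper" // placeholder import path; use the real module path
)

func main() {
	// Fetch a page; the response body satisfies io.ReadCloser.
	response, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer response.Body.Close()

	// Parse the body into a searchable Scraper page.
	page, err := scraper.NewFromBuffer(response.Body)
	if err != nil {
		panic(err)
	}

	// Match <div> elements carrying a specific class.
	filter := scraper.Filter{
		Tag:        "div",
		Attributes: scraper.Attributes{"class": "tp-modal"},
	}

	// FindAll runs a concurrent search; each result is itself searchable.
	for element := range page.FindAll(filter) {
		for link := range element.FindAll(scraper.Filter{Tag: "a"}) {
			fmt.Printf("URL: %v found under %v\n", link.Attributes()["href"], element.Type())
		}
	}
}
```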
## Next steps

- Find and FindOne implementations
- Concurrent scraping
- Resilience for broken pages (BeautifulSoup-esque)
- Support for wildcards in attributes
- Tests
- Full documentation
# Functions
NewFromBuffer instantiates a new Scraper instance from any `io.ReadCloser`-compatible source, such as an `http.Response.Body` (net/http) or an `os.File`.
NewFromNode instantiates a new Scraper instance from a given `html.Node` (golang.org/x/net/html).
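For instance, a document parsed with golang.org/x/net/html can be handed to NewFromNode directly. The sketch below assumes NewFromNode takes a `*html.Node` and returns a `Scraper` without an error, which may differ from the actual signature; the import path is a placeholder:

```go
package main

import (
	"strings"

	"golang.org/x/net/html"

	"github.com/user/scraper" // placeholder import path; use the real module path
)

func main() {
	// Parse raw HTML into an *html.Node tree.
	node, err := html.Parse(strings.NewReader(`<div id="div-1"><a href="/home">Home</a></div>`))
	if err != nil {
		panic(err)
	}

	// Wrap the parsed tree in a Scraper and search it as usual.
	// Assumed call shape; the real constructor may also return an error.
	page := scraper.NewFromNode(node)
	for link := range page.FindAll(scraper.Filter{Tag: "a"}) {
		_ = link // each result is itself a Scraper page
	}
}
```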
# Structs
Filter is the input to the Scraper's Find methods.
Scraper is the base type used to scrape content.
# Interfaces
Target represents a scope that can be parsed into structured data or rendered as such.
# Type aliases
Attributes specifies tag attributes to be searched for using the Scraper's Find methods.