github.com/quittymr/scraper

Version: 0.0.3

Repository: https://github.com/quittymr/scraper.git

Documentation: pkg.go.dev

# README

Scraper

Scraper is a straightforward Go web scraper with a simple, flexible interface, inspired by BeautifulSoup.

Quickstart

Create a Scraper from any io.ReadCloser-compatible type:

// Errors are ignored here for brevity; check them in real code.

// From an http.Response.Body
response, _ := http.Get("URL goes here")
page, _ := scraper.NewFromBuffer(response.Body)

// From an os.File
fileHandle, _ := os.Open("file name goes here")
filePage, _ := scraper.NewFromBuffer(fileHandle)
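
The snippets above pass an `http.Response.Body` and an `os.File`, both of which satisfy `io.ReadCloser`. As a minimal sketch using only the standard library, in-memory HTML (for example in a test) can be wrapped the same way with `io.NopCloser`. The helpers `fromString` and `readAndClose` are hypothetical names introduced for illustration; `readAndClose` merely stands in for a consumer such as `NewFromBuffer`, whose exact signature is assumed here rather than confirmed.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// fromString wraps in-memory markup in an io.ReadCloser.
// io.NopCloser adds a no-op Close method to any io.Reader.
// (Hypothetical helper, not part of the scraper package.)
func fromString(markup string) io.ReadCloser {
	return io.NopCloser(strings.NewReader(markup))
}

// readAndClose stands in for any function that consumes an io.ReadCloser.
// It reads everything from src and then closes it.
// (Hypothetical helper, not part of the scraper package.)
func readAndClose(src io.ReadCloser) (string, error) {
	defer src.Close()
	data, err := io.ReadAll(src)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	markup, err := readAndClose(fromString(`<div id="div-1">hello</div>`))
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(markup)
}
```

With the real package, the value returned by `fromString` would presumably be passed to `scraper.NewFromBuffer` in place of `readAndClose`.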

Construct a scraper.Filter with one or more criteria:

filter := scraper.Filter{
   Tag: "div",
   Attributes: scraper.Attributes{
      "id":    "div-1",
      "class": "tp-modal",
   },
}
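
To make the role of the criteria concrete, here is a rough sketch of how such a filter could be evaluated against an element. It assumes that all criteria must hold at once, that attribute values are compared exactly, and that `Attributes` is a `map[string]string`; none of this is confirmed by the package itself. The `Filter`, `Attributes`, `element`, and `matches` definitions below are local stand-ins, not the scraper package's own code.

```go
package main

import "fmt"

// Attributes and Filter mirror the shapes used in the README example.
// They are local stand-ins, not the scraper package's own types.
type Attributes map[string]string

type Filter struct {
	Tag        string
	Attributes Attributes
}

// element is a hypothetical, simplified view of a parsed HTML element.
type element struct {
	Tag        string
	Attributes Attributes
}

// matches reports whether el satisfies every criterion set in f.
// An empty Tag or an empty Attributes map places no constraint.
// (Assumption: criteria are combined with AND and compared exactly.)
func matches(f Filter, el element) bool {
	if f.Tag != "" && f.Tag != el.Tag {
		return false
	}
	for key, want := range f.Attributes {
		if got, ok := el.Attributes[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	filter := Filter{
		Tag:        "div",
		Attributes: Attributes{"id": "div-1", "class": "tp-modal"},
	}
	modal := element{Tag: "div", Attributes: Attributes{"id": "div-1", "class": "tp-modal"}}
	other := element{Tag: "div", Attributes: Attributes{"id": "div-2"}}

	fmt.Println("modal matches:", matches(filter, modal))
	fmt.Println("other matches:", matches(filter, other))
}
```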

Use the Filter to run a concurrent search on your Scraper page.
Every returned element is itself a Scraper that can be searched in turn:

for element := range page.FindAll(filter) {
   // Search each matching element for nested links.
   for link := range element.FindAll(scraper.Filter{Tag: "a"}) {
      fmt.Printf("URL: %v found under %v\n", link.Attributes()["href"], element.Type())
   }
}
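
The loop above ranges directly over the result of a concurrent search, which is the shape a Go channel gives you. The sketch below shows one common way to build that pattern with the standard library: each subtree is searched in its own goroutine, matches are streamed over a channel, and the channel is closed once every search has finished. The `node` type and this `findAll` function are hypothetical and are not claimed to reflect how the scraper package implements `FindAll`.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical, simplified element with a tag and an id.
type node struct {
	Tag string
	ID  string
}

// findAll searches every subtree in its own goroutine and streams the
// nodes whose tag equals tag. The returned channel is closed after all
// searches complete, so callers can simply range over it.
// (Hypothetical stand-in, not the scraper package's FindAll.)
func findAll(subtrees [][]node, tag string) <-chan node {
	results := make(chan node)
	var wg sync.WaitGroup
	for _, subtree := range subtrees {
		wg.Add(1)
		go func(subtree []node) {
			defer wg.Done()
			for _, n := range subtree {
				if n.Tag == tag {
					results <- n
				}
			}
		}(subtree)
	}
	// Close the channel once every searcher is done.
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}

func main() {
	subtrees := [][]node{
		{{Tag: "a", ID: "home"}, {Tag: "div", ID: "box"}},
		{{Tag: "a", ID: "about"}},
	}
	count := 0
	// Results arrive in whatever order the goroutines produce them.
	for n := range findAll(subtrees, "a") {
		fmt.Println("found link:", n.ID)
		count++
	}
	fmt.Println("total:", count)
}
```

In this sketch the channel is unbuffered, so a caller that stops ranging early would leave the searching goroutines blocked; a real implementation would likely need a cancellation mechanism for that case.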

Next steps

~~Find and FindOne implementations~~
~~Concurrent scraping~~
~~Resilience for broken pages (BeautifulSoup-esque)~~
Support for wildcards in attributes
Tests
Full documentation
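
Wildcard support for attributes is listed above as planned rather than implemented, so the following is only one possible shape for it, not the package's behavior. The sketch assumes a single wildcard character, `*`, that stands for any run of characters (including none) in an attribute value. The function name `wildcardMatch` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardMatch reports whether value matches pattern, where each '*'
// in pattern stands for any run of characters, including an empty one.
// (Hypothetical helper illustrating one possible wildcard semantics.)
func wildcardMatch(pattern, value string) bool {
	parts := strings.Split(pattern, "*")
	if len(parts) == 1 {
		// No wildcard present: fall back to exact comparison.
		return pattern == value
	}
	// The text before the first '*' must be a prefix of value.
	first, last := parts[0], parts[len(parts)-1]
	if !strings.HasPrefix(value, first) {
		return false
	}
	value = value[len(first):]
	// The text after the last '*' must be a suffix of what remains.
	if !strings.HasSuffix(value, last) {
		return false
	}
	value = value[:len(value)-len(last)]
	// Every middle part must appear, in order, in what is left.
	for _, part := range parts[1 : len(parts)-1] {
		idx := strings.Index(value, part)
		if idx < 0 {
			return false
		}
		value = value[idx+len(part):]
	}
	return true
}

func main() {
	fmt.Println(wildcardMatch("tp-*", "tp-modal"))
	fmt.Println(wildcardMatch("*-modal", "tp-dialog"))
}
```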

# Functions

ContentMissingError

No description provided by the author

MarshallingError

No description provided by the author

NewFromBuffer instantiates a new Scraper instance from a given io.ReadCloser, such as the body of an `http.Response` (net/http).

NewFromNode instantiates a new Scraper instance from a given `html.Node` (golang.org/x/net/html).

# Structs

Filter is the input to the Scraper's Find methods.

Scraper is the base type used to scrape content.

# Interfaces

Target represents a scope that can be parsed into structured data or rendered as such.

# Type aliases

Attributes specifies tag attributes to be searched for using the Scraper's Find methods.