Package: github.com/quittymr/scraper
Version: 0.0.3
Repository: https://github.com/quittymr/scraper.git
Documentation: pkg.go.dev

# README

Scraper

Scraper is a straightforward Go web scraper with a simple, flexible interface, inspired by BeautifulSoup.

Quickstart

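  1. Create a Scraper from any io.ReadCloser-compatible value, such as an HTTP response body or an open file:

    // http.Response.Body
    response, err := http.Get("URL goes here")
    if err != nil {
       log.Fatal(err)
    }
    page, err := scraper.NewFromBuffer(response.Body) // check err as above
    
    // os.File
    fileHandle, err := os.Open("file name goes here")
    if err != nil {
       log.Fatal(err)
    }
    filePage, err := scraper.NewFromBuffer(fileHandle) // check err as above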
    
  2. Construct a scraper.Filter with one or more criteria:

    filter := scraper.Filter{
       Tag: "div",
       Attributes: scraper.Attributes{
          "id":    "div-1",
          "class": "tp-modal",
       },
    }
    
  3. Use the Filter to run a concurrent search on your Scraper page.
    Every returned element is itself a Scraper page, so it can be searched in turn:

    for element := range page.FindAll(filter) {
       for link := range element.FindAll(scraper.Filter{Tag: "a"}) {
          fmt.Printf("URL: %v found under %v\n", link.Attributes()["href"], element.Type())
       }
    }
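
The Quickstart ranges over the result of FindAll, which suggests it returns a channel that a background search fills. Below is a minimal, stdlib-only sketch of that pattern under stated assumptions: Node, Filter, matches and FindAll are hypothetical stand-ins written for illustration, not the package's actual types or implementation, and the matching rules (an empty Tag matches any element; every listed attribute must be present with exactly that value) are assumed.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a parsed HTML element.
// It is NOT the package's type; it only illustrates the search pattern.
type Node struct {
	Tag        string
	Attributes map[string]string
	Children   []*Node
}

// Filter mirrors the shape of scraper.Filter from the README (assumed fields).
type Filter struct {
	Tag        string
	Attributes map[string]string
}

// matches reports whether n satisfies every criterion in f (assumed rules):
// an empty Tag matches any element, and each listed attribute must be
// present on n with exactly the given value.
func matches(n *Node, f Filter) bool {
	if f.Tag != "" && n.Tag != f.Tag {
		return false
	}
	for key, want := range f.Attributes {
		if got, ok := n.Attributes[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// FindAll walks the tree depth-first in its own goroutine and streams each
// matching node over an unbuffered channel, closing it when the walk ends,
// so callers can simply range over the result.
func FindAll(root *Node, f Filter) <-chan *Node {
	out := make(chan *Node)
	go func() {
		defer close(out)
		var walk func(*Node)
		walk = func(n *Node) {
			if matches(n, f) {
				out <- n
			}
			for _, child := range n.Children {
				walk(child)
			}
		}
		walk(root)
	}()
	return out
}

func main() {
	link := &Node{Tag: "a", Attributes: map[string]string{"href": "/docs"}}
	modal := &Node{
		Tag:        "div",
		Attributes: map[string]string{"id": "div-1", "class": "tp-modal"},
		Children:   []*Node{link},
	}
	root := &Node{Tag: "body", Children: []*Node{modal}}

	filter := Filter{Tag: "div", Attributes: map[string]string{"id": "div-1"}}
	for element := range FindAll(root, filter) {
		// Each result is itself a tree that can be searched again.
		for a := range FindAll(element, Filter{Tag: "a"}) {
			fmt.Printf("URL: %v found under %v\n", a.Attributes["href"], element.Tag)
		}
	}
	// prints: URL: /docs found under div
}
```

One trade-off of this channel-based design: if the caller stops ranging early (for example with break), the producing goroutine blocks forever on its next send. A fuller version would typically accept a context.Context or a done channel so the walk can be cancelled.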
    

Next steps

  • Find and FindOne implementations
  • Concurrent scraping
  • Resilience for broken pages (BeautifulSoup-esque)
  • Support for wildcards in attributes
  • Tests
  • Full documentation

# Functions

NewFromBuffer instantiates a new Scraper instance from an `io.ReadCloser`, such as the body of an `http.Response` (net/http) or an `os.File`.
NewFromNode instantiates a new Scraper instance from a given `html.Node` (golang.org/x/net/html).

# Structs

Filter is the input to the Scraper's Find methods.
Scraper is the base type used to scrape content.

# Interfaces

Target represents a scope that can be parsed into structured data or rendered as such.

# Type aliases

Attributes specifies tag attributes to be searched for using the Scraper's Find methods.
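
Attributes is listed as a type alias. Assuming it aliases map[string]string (an inference from the README's usage, such as link.Attributes()["href"], not from the package source), an Attributes value is the very same type as a plain map, which a type switch makes visible. definedAttributes and kind are hypothetical names used only for contrast.

```go
package main

import "fmt"

// Attributes is sketched as an alias for map[string]string. The value
// type is an assumption inferred from the README, not the package source.
type Attributes = map[string]string

// definedAttributes is a hypothetical defined (non-alias) type, for contrast.
type definedAttributes map[string]string

// kind reports whether v's dynamic type is exactly map[string]string.
func kind(v any) string {
	switch v.(type) {
	case map[string]string:
		return "map[string]string"
	default:
		return "other type"
	}
}

func main() {
	alias := Attributes{"id": "div-1", "class": "tp-modal"}
	defined := definedAttributes{"id": "div-1"}

	fmt.Println(kind(alias))   // prints: map[string]string (an alias is the same type)
	fmt.Println(kind(defined)) // prints: other type (a defined type is distinct)
}
```

Under that assumption, type assertions, type switches and reflection treat scraper.Attributes and map[string]string as one type, which would not hold for a defined type such as definedAttributes.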