package
0.0.0-20200612205719-d33463d17312
Repository: https://github.com/slotix/dataflowkit.git
Documentation: pkg.go.dev

# Functions

AllowedByRobots checks if scraping of specified URL is allowed by robots.txt.
AssembleRobotstxtURL assembles the robots.txt URL from a given URL.
getCrawlDelay retrieves Crawl-delay directive from robots.txt.
LoggingMiddleware logs Service endpoints.
No description provided by the author
NewHTTPClient returns a Fetch Service backed by an HTTP server living at the remote instance.
RobotstxtData generates the robots.txt URL and retrieves its content through the API fetch endpoint.
Start func launches Parsing service.
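
As an illustration of what assembling a robots.txt URL from a page URL involves, here is a minimal sketch using only the standard library. The helper name `robotsURL` is hypothetical and is not the package's `AssembleRobotstxtURL`; its signature and error handling are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// robotsURL is a hypothetical helper (not part of the package API).
// It keeps the scheme and host of rawURL and replaces everything
// after the host with /robots.txt.
func robotsURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// A robots.txt location is only defined for absolute URLs.
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("URL %q has no scheme or host", rawURL)
	}
	return u.Scheme + "://" + u.Host + "/robots.txt", nil
}

func main() {
	s, err := robotsURL("https://example.com/catalog/page?id=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints "https://example.com/robots.txt"
}
```

The path and query of the input are discarded because robots.txt is served from the root of the host.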

# Constants

Base fetcher downloads HTML web pages using the Go standard library's http package.
Headless Chrome is used to download content from JS-driven web pages.
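
The two constants above select between fetcher implementations. A rough sketch of that pattern follows; the type name `FetcherKind`, the constant names, and their string values are assumptions for illustration and may not match the package's own declarations.

```go
package main

import "fmt"

// FetcherKind is a hypothetical stand-in for the package's fetcher type.
type FetcherKind string

const (
	KindBase   FetcherKind = "base"   // plain HTTP download
	KindChrome FetcherKind = "chrome" // headless Chrome for JS-driven pages
)

// describe reports which download mechanism a kind refers to and
// rejects values that are not one of the known constants.
func describe(k FetcherKind) (string, error) {
	switch k {
	case KindBase:
		return "standard library HTTP client", nil
	case KindChrome:
		return "headless Chrome", nil
	default:
		return "", fmt.Errorf("unknown fetcher kind %q", k)
	}
}

func main() {
	for _, k := range []FetcherKind{KindBase, KindChrome, "curl"} {
		d, err := describe(k)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %s\n", k, d)
	}
}
```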

# Structs

BaseFetcher is a Fetcher that uses the Go standard library's http client to fetch URLs.
ChromeFetcher is used to fetch JavaScript-rendered pages.
No description provided by the author
Config provides basic configuration.
FetchService implements service with empty struct.
HTMLServer represents the web service that serves up HTML.
LogCodec captures the output from writing RPC requests and reading responses on the connection.
No description provided by the author
Request struct contains request information sent to Fetchers.

# Interfaces

No description provided by the author
Fetcher is the interface that must be satisfied by things that can fetch remote URLs and return their contents.
Service defines Fetch service interface.

# Type aliases

ServiceMiddleware defines a middleware for a Fetch service.
Type represents types of fetcher.