package
0.0.0-20200612205719-d33463d17312
Repository: https://github.com/slotix/dataflowkit.git
Documentation: pkg.go.dev
# Functions
AllowedByRobots checks if scraping of specified URL is allowed by robots.txt.
AssembleRobotstxtURL assembles the robots.txt URL from a URL.
getCrawlDelay retrieves Crawl-delay directive from robots.txt.
LoggingMiddleware logs Service endpoints.
No description provided by the author
NewHTTPClient returns a Fetch Service backed by an HTTP server living at the remote instance.
RobotstxtData generates the robots.txt URL and retrieves its content through the API fetch endpoint.
Start launches the Parsing service.
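
As an illustration of the robots.txt helpers listed above, here is a minimal sketch of deriving a robots.txt URL from a page URL. The name `robotsTxtURL` and its signature are hypothetical and are not taken from this package; the sketch assumes the robots.txt file lives at the root of the same scheme and host, which is the usual convention.

```go
package main

import (
	"fmt"
	"net/url"
)

// robotsTxtURL is a hypothetical helper (not this package's AssembleRobotstxtURL).
// It keeps the scheme and host of rawURL and replaces everything else
// with the path /robots.txt.
func robotsTxtURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Relative URLs carry no host, so there is nowhere to look for robots.txt.
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("URL %q has no scheme or host", rawURL)
	}
	robots := url.URL{Scheme: u.Scheme, Host: u.Host, Path: "/robots.txt"}
	return robots.String(), nil
}

func main() {
	s, err := robotsTxtURL("https://example.com/catalog/items?page=2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints https://example.com/robots.txt
}
```

The query string and path of the input are dropped on purpose, since only the scheme and host determine where robots.txt is served.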
# Structs
BaseFetcher is a Fetcher that uses the Go standard library's http client to fetch URLs.
ChromeFetcher is used to fetch JavaScript-rendered pages.
No description provided by the author
Config provides basic configuration.
FetchService implements the service using an empty struct.
HTMLServer represents the web service that serves up HTML.
LogCodec captures the output from writing RPC requests and reading responses on the connection.
No description provided by the author
Request struct contains request information sent to Fetchers.
# Type aliases
ServiceMiddleware defines a middleware for a Fetch service.
Type represents types of fetcher.