0.0.0-20190310164430-3db1f35a3352
Repository: https://github.com/alpinemarmot/pulse.git
Documentation: pkg.go.dev
# README
Pulse
Pulse is a crawler built on top of gocolly/colly
Features:
- Expose all gocolly/colly options through a YAML configuration file
- Create rules that export crawled data to MongoDB
Installation
Go modules must be enabled
$ go build
Usage
$ pulse [-q][--no-logging] [-c configFile] [url entrypoint]
$ pulse -c conf.yml https://www.example.com
Configuration example
See default.yml
Grab HTML data
The rule below adds the value of the src attribute of every img tag to the MongoDB collection "images". The attribute named by context-attr (alt here) is also stored as metadata for the image.
collection: "images"
tag: "img"
attr: "src"
context-attr: "alt"
You can also grab HTML attributes with a selector instead of a tag.
collection: "images-test"
selector: "img[data-src]"
attr: "data-src"
context-attr: "alt"
More information about selectors: PuerkitoBio/goquery
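To make the two rule forms concrete, here is a minimal sketch of how a rule might be turned into a selector and an exported record. It is not Pulse's actual code: the Rule type, its methods, and the "value" and "context" record keys are all hypothetical, chosen only to mirror the four configuration keys shown above. It uses only the standard library; in a real crawler the string returned by Query is the kind of selector one would pass to colly's OnHTML callback.

```go
package main

import "fmt"

// Rule mirrors the keys used in the configuration examples.
// The type and field names are hypothetical, not Pulse's own.
type Rule struct {
	Collection  string // target MongoDB collection
	Tag         string // element name, e.g. "img"
	Selector    string // CSS selector, used instead of Tag when set
	Attr        string // attribute whose value is exported
	ContextAttr string // attribute stored alongside as metadata
}

// Query returns the selector the rule matches on: the explicit
// selector when one is given, otherwise the bare tag name.
func (r Rule) Query() string {
	if r.Selector != "" {
		return r.Selector
	}
	return r.Tag
}

// Record builds the document exported for one matched element from
// that element's attributes. ok is false when the element has no
// value for Attr. The "value" and "context" keys are assumptions.
func (r Rule) Record(attrs map[string]string) (doc map[string]string, ok bool) {
	value := attrs[r.Attr]
	if value == "" {
		return nil, false
	}
	doc = map[string]string{"value": value}
	if ctx := attrs[r.ContextAttr]; ctx != "" {
		doc["context"] = ctx
	}
	return doc, true
}

func main() {
	rule := Rule{
		Collection:  "images-test",
		Selector:    "img[data-src]",
		Attr:        "data-src",
		ContextAttr: "alt",
	}
	// Attributes of one hypothetical <img> element.
	attrs := map[string]string{"data-src": "/img/logo.png", "alt": "Logo"}

	if doc, ok := rule.Record(attrs); ok {
		fmt.Println(rule.Collection, rule.Query(), doc)
	}
}
```

Keeping tag and selector in one type means both rule forms go through the same export path; a rule with only tag: "img" simply yields "img" as its query.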
# Packages
No description provided by the author
# Functions
No description provided by the author