github.com/AlpineMarmot/pulse
Version: 0.0.0-20190310164430-3db1f35a3352
Repository: https://github.com/alpinemarmot/pulse.git
Documentation: pkg.go.dev

# README

Pulse

Pulse is a crawler built on top of gocolly/colly.

Features:

  • Expose all gocolly/colly options through a YAML configuration file
  • Create rules that export crawled data to MongoDB

Installation

Go modules must be enabled

$ go build

Usage

$ pulse [-q] [--no-logging] [-c configFile] [url entrypoint]

$ pulse -c conf.yml https://www.example.com

Configuration example

See default.yml.

Grab HTML data

The rule below adds the value of the src attribute of every img tag to the MongoDB collection "images". The value of the attribute named by context-attr (here, alt) is stored alongside it as image metadata.

collection: "images"
tag: "img"
attr: "src"
context-attr: "alt"

You can also grab HTML attributes with a selector instead of a tag.

collection: "images-test"
selector: "img[data-src]"
attr: "data-src"
context-attr: "alt"

More information about selectors: PuerkitoBio/goquery
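
To make the two rule forms above concrete, here is a minimal sketch of how a rule could be resolved and applied to one matched element. It is not Pulse's actual code: the Rule and Record types, their field names, and the Query and Extract methods are hypothetical and exist only for illustration. The sketch assumes that a rule with a selector uses it as-is and that a rule with only a tag falls back to the tag name, which is itself a valid CSS selector.

```go
package main

import "fmt"

// Rule mirrors the keys shown in the README examples.
// Hypothetical type: Pulse's real configuration structs may differ.
type Rule struct {
	Collection  string // target MongoDB collection, e.g. "images"
	Tag         string // element name, e.g. "img"
	Selector    string // CSS selector, e.g. "img[data-src]"
	Attr        string // attribute whose value is exported
	ContextAttr string // attribute stored as metadata
}

// Record is an assumed shape for one exported item.
type Record struct {
	Collection string
	Value      string
	Context    string
}

// Query returns the selector to match elements with: the explicit
// selector when set, otherwise the bare tag name.
func (r Rule) Query() string {
	if r.Selector != "" {
		return r.Selector
	}
	return r.Tag
}

// Extract builds a Record from the attributes of one matched element.
// It reports false when the element lacks the attribute named by Attr.
func (r Rule) Extract(attrs map[string]string) (Record, bool) {
	value, ok := attrs[r.Attr]
	if !ok || value == "" {
		return Record{}, false
	}
	return Record{
		Collection: r.Collection,
		Value:      value,
		Context:    attrs[r.ContextAttr],
	}, true
}

func main() {
	byTag := Rule{Collection: "images", Tag: "img", Attr: "src", ContextAttr: "alt"}
	bySelector := Rule{Collection: "images-test", Selector: "img[data-src]", Attr: "data-src", ContextAttr: "alt"}

	// Attributes of a made-up <img> element.
	element := map[string]string{"src": "/logo.png", "alt": "Logo"}

	for _, rule := range []Rule{byTag, bySelector} {
		rec, ok := rule.Extract(element)
		fmt.Printf("query=%q exported=%v record=%+v\n", rule.Query(), ok, rec)
	}
}
```

In this sketch the second rule exports nothing for the sample element, because the element has no data-src attribute. In a real crawler the selector itself would already have excluded that element.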
