# go-crawler

A simple crawler library implemented in Go, with some features similar to Scrapy (a Python crawler framework).
# Usage
import "github.com/qhzhyt/go-crawler"
func parse(res *crawler.Response, ctx *crawler.Context) {
title := res.CSS("a").Attrs("href")
ctx.Emit(map[string]interface{}{"title": title})
}
func StartRequests(ctx *crawler.Context) []*crawler.Request {
return crawler.GetURLS(
"http://www.baidu.com/",
"http://www.qq.com/"
)
}
func main() {
crawler.NewCrawler("test").
WithStartRequest(startRequest).
WithDefaultCallback(parse).
OnItem(func(item interface{}, ctx *crawler.Context) interface{} {
fmt.Println(item)
return nil
}).
WithSettings(&crawler.Settings{RequestDelay: 1000}).
Start(true)
}
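If you prefer to start from the library defaults, `DefaultSettings` (listed under Functions below) creates a baseline `Settings` value to tweak. A minimal sketch, assuming `DefaultSettings` returns `*crawler.Settings` and that `RequestDelay` is measured in milliseconds (neither is confirmed by the docs):

```go
// Assumption: DefaultSettings returns *crawler.Settings.
settings := crawler.DefaultSettings()
settings.RequestDelay = 500 // assumption: delay between requests, in milliseconds

crawler.NewCrawler("test").
	WithSettings(settings).
	Start(true)
```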
# Packages
- Package `htmlquery` extracts data from HTML documents using XPath expressions.
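The `NewResponse` source quoted under Functions shows that a `Response` wraps an `htmlquery` Selector built from the raw body, which is what powers calls like `res.CSS(...)` in the usage example. A rough sketch of using the selector directly (the import path and the availability of `CSS`/`Attrs` on a bare Selector are assumptions carried over from the `Response` usage above):

```go
package main

import (
	"fmt"

	"github.com/qhzhyt/go-crawler/htmlquery" // assumed import path for the bundled package
)

func main() {
	html := []byte(`<html><body><a href="/docs">Docs</a></body></html>`)

	// NewSelector is taken verbatim from the NewResponse source below.
	sel := htmlquery.NewSelector(html)

	// Mirrors res.CSS("a").Attrs("href") from the usage example; that the
	// same methods are available on a bare Selector is an assumption.
	fmt.Println(sel.CSS("a").Attrs("href"))
}
```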
# Functions
- `DefaultPipeLines` returns the default pipelines.
- `DefaultSettings` creates the default Settings.
- `FormRequest` creates a form POST request (see the sketch after this list).
- `FuncPipeline` builds a pipeline from a single function.
- `GetRequest` creates a GET request.
- `GetURL` creates a GET request for a single URL.
- `GetURLs` creates GET requests for multiple URLs.
- `NewCrawler` creates a crawler.
- `NewRequest` creates a new Request.
- `NewResponse` creates a Response from raw body content: `func NewResponse(content []byte) *Response { return &Response{Selector: htmlquery.NewSelector(content), Body: content} }`.
- `PostRequest` creates a basic POST request.
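The request constructors above map onto Scrapy-style request creation. A hedged sketch (`GetURLs` is taken from the usage example; the `FormRequest` signature and the `Args` map literal are assumptions):

```go
// Assumption: FormRequest(url string, form crawler.Args) *crawler.Request,
// with Args being a map-like form type (documented only as "an HTTP POST form").
func startRequests(ctx *crawler.Context) []*crawler.Request {
	reqs := crawler.GetURLs(
		"http://www.baidu.com/",
		"http://www.qq.com/",
	)
	login := crawler.FormRequest("http://example.com/login", crawler.Args{
		"user": "demo", // hypothetical form field
	})
	return append(reqs, login)
}
```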
# Structs
- `Context` is the crawler execution context.
- `CrawlEngine` is the crawl engine.
- `Crawler` is the crawler itself.
- `MongoPipeline` is the default MongoDB pipeline.
- `Request` is a Crawler request.
- `Response` is a Crawler response.
- `Settings` holds the crawler configuration.
# Interfaces
- `ItemPipeline` is the pipeline interface for item processing (see the sketch below).
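`FuncPipeline` under Functions suggests the interface can be satisfied by a single function rather than a full implementation. A sketch, assuming `ItemPipelineFunc` shares the `OnItem` callback signature from the usage example and that `FuncPipeline` wraps it into an `ItemPipeline`:

```go
// Assumption: FuncPipeline(fn ItemPipelineFunc) returns an ItemPipeline,
// and returning a non-nil item passes it on to later pipelines.
pipeline := crawler.FuncPipeline(func(item interface{}, ctx *crawler.Context) interface{} {
	fmt.Println("processed:", item)
	return item
})
```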
# Type aliases
- `Args` is an HTTP POST form.
- `Cookies` holds request or response cookies.
- `Headers` holds request or response headers.
- `ItemPipelineFunc` is a function that processes an item.
- `Meta` holds request or response metadata.
- `ResponseCallback` is the callback type for handling responses (see the sketch below).
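`ResponseCallback` presumably names the signature already used by `parse` and `WithDefaultCallback` in the usage example:

```go
// Assumption: ResponseCallback matches the parse function's signature.
var printLinks crawler.ResponseCallback = func(res *crawler.Response, ctx *crawler.Context) {
	ctx.Emit(map[string]interface{}{"links": res.CSS("a").Attrs("href")})
}
```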