Web scraping: crawling front-end-rendered sites (Vue, React)

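Recently I have been writing crawlers at work, but front-end-rendered sites (Vue, React) cannot be scraped with a plain HTTP request, because the content is rendered by JavaScript in the browser.
Tools like chromedp and Selenium felt too heavy for this,
so I wrote a general-purpose service with puppeteer and koa2: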
https://github.com/dollarkillerx/marionette
Run it with Docker:

docker run --name marionette -d -p3000:3000 dollarkiller/marionette:latest

A brief look at the service's RESTful API:

GET /ssr?q=http://google.com

The response code, HTML body, and cookies in the reply are all those returned by the target site.

Now let's call this API from Go.
For a Go HTTP client, I recommend urllib, which I wrote myself:
https://github.com/dollarkillerx/urllib

httpCode, bytes, err := urllib.Get("http://0.0.0.0:3000/ssr").Querys("q","http://google.com").Byte()
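
If you would rather not add a dependency, the same GET /ssr call can be sketched with only the Go standard library. This is a minimal sketch, not the service's official client: the helper names buildSSRURL and fetchRendered are made up for illustration, http://localhost:3000 assumes the Docker port mapping shown earlier, the 60-second timeout is an arbitrary choice, and reading cookies via resp.Cookies() assumes the service relays them as Set-Cookie headers.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// buildSSRURL (hypothetical helper) returns the /ssr endpoint URL for a
// target page, percent-encoding the target so its own query string survives.
func buildSSRURL(base, target string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/ssr"
	u.RawQuery = url.Values{"q": {target}}.Encode()
	return u.String(), nil
}

// fetchRendered (hypothetical helper) asks the service to render target and
// returns the status code, HTML body, and cookies found in the reply.
// Assumption: cookies arrive as Set-Cookie headers.
func fetchRendered(client *http.Client, base, target string) (int, []byte, []*http.Cookie, error) {
	endpoint, err := buildSSRURL(base, target)
	if err != nil {
		return 0, nil, nil, err
	}
	resp, err := client.Get(endpoint)
	if err != nil {
		return 0, nil, nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, nil, nil, err
	}
	return resp.StatusCode, body, resp.Cookies(), nil
}

func main() {
	// Rendering in a headless browser can be slow, so use a generous timeout.
	client := &http.Client{Timeout: 60 * time.Second}

	code, body, cookies, err := fetchRendered(client, "http://localhost:3000", "http://google.com")
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Println("status:", code, "body bytes:", len(body), "cookies:", len(cookies))
}
```

Encoding the target through url.Values matters once the page you crawl has its own query string; otherwise its parameters would be read as parameters of /ssr.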
