A Simple Node Crawler That Saves Data to Excel for Analysis

GitHub repository: https://github.com/lll618xxx/…

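An entry-level Node crawler

Too many articles on SegmentFault (思否)? Which one is the one you want? You could compare them by vote count, or by title.
Don't worry: build a crawler in Node yourself, and Mom will never have to worry about your indecision over what to study again!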

Core code

const superagent = require('superagent')
const cheerio = require('cheerio')
const xlsx = require('node-xlsx')
const fs = require('fs')
const options = require('./options')

superagent.get(options.url)
    .then(res => {
        // The first row of the sheet is the header row, built from each field's alias
        const bufferdata = [{
            name: 'sheet1',
            data: [options.attr.map(item => item[2])]
        }]

        const $ = cheerio.load(res.text)

        // Every element matched by options.ele becomes one row
        $(options.ele).each((index, item) => {
            const row = options.attr.map(([selector, attribute]) => {
                const target = $(item).find(selector)
                // Read the named attribute if one is configured, otherwise the element text
                return attribute ? target.attr(attribute) : target.text()
            })
            bufferdata[0].data.push(row)
        })

        fs.writeFile(options.excelPath, xlsx.build(bufferdata), err => {
            if (err) throw err
            console.log('Excel file written successfully')
        })
    })
    .catch(err => {
        console.log(err)
    })

The core code is only 36 lines!
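Since the point of the exercise is to compare articles by vote count, the rows could be sorted before they are written out. The following is only a minimal sketch, not part of the original repository: `sortByVotes` is a hypothetical helper, the sample rows are made up, and it assumes the vote counts are plain integer strings sitting in the second column, as in the configuration.

```javascript
// Hypothetical helper: sort scraped rows by vote count, highest first.
// Assumes data[0] is the header row and votes are plain integer strings.
function sortByVotes(data, votesIndex = 1) {
    const [header, ...rows] = data
    const toNumber = value => {
        const n = parseInt(String(value).trim(), 10)
        // Treat anything that is not a number as zero votes
        return Number.isNaN(n) ? 0 : n
    }
    // Copy the rows so the caller's array is left untouched
    const sorted = [...rows].sort(
        (a, b) => toNumber(b[votesIndex]) - toNumber(a[votesIndex])
    )
    return [header, ...sorted]
}

// Made-up sample rows, for illustration only
const sample = [
    ['Link', 'Votes', 'Title', 'Author'],
    ['/a/1', '12', 'First post', 'alice'],
    ['/a/2', '40', 'Second post', 'bob'],
    ['/a/3', ' 7 ', 'Third post', 'carol'],
]

console.log(sortByVotes(sample).map(row => row[2]))
```

In the core code, something like `bufferdata[0].data = sortByVotes(bufferdata[0].data)` just before the `xlsx.build` call would apply it.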

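Configuration code

const path = require('path')

// The page to crawl
const url = 'https://segmentfault.com/hottest/monthly'
// Where the Excel file is saved
const excelPath = path.join(__dirname, 'result.xlsx')
// The selector that bounds each record
const ele = 'div.wrapper div.news-list div.news__item-info'
// The data fields: ['selector', 'attribute', 'alias']
// An empty attribute means the element's text is used
const attr = [
    ['a', 'href', 'Link'],
    ['span.votes-num', '', 'Votes'],
    ['h4.news__item-title', '', 'Title'],
    ['span.author a', '', 'Author'],
]

// Export the settings so that require('./options') in index.js can read them
module.exports = { url, excelPath, ele, attr }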
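One thing to watch with the `href` field: links scraped from a page are often relative paths, which are not clickable once they land in a spreadsheet. Assuming that is the case here (the article does not say either way), they could be resolved against the page URL with Node's built-in `URL` class. `toAbsoluteUrl` is a hypothetical helper and the sample path is made up.

```javascript
// Hypothetical helper: turn a scraped href into an absolute URL.
// pageUrl plays the role of `url` in options.js.
function toAbsoluteUrl(href, pageUrl) {
    if (!href) return ''
    try {
        // Relative paths are resolved against pageUrl; absolute URLs pass through
        return new URL(href, pageUrl).href
    } catch (err) {
        // Leave anything that cannot be parsed as it was scraped
        return href
    }
}

const pageUrl = 'https://segmentfault.com/hottest/monthly'
console.log(toAbsoluteUrl('/a/example-article', pageUrl))
```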

Install dependencies

npm i

Run the project

cd node-reptile-simple && node index.js

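Configuration options (options.js)

url: the page to crawl
excelPath: where the Excel file is saved
ele: the selector that bounds each record (one matched element per row)
attr: the data fields, each written as ['selector', 'attribute', 'alias']; an empty attribute means the element's text is used, and the alias becomes the column header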
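Because every `attr` entry has to follow the three-part format, a typo in the configuration would only show up as an odd-looking spreadsheet. A small check before crawling could catch that earlier. This is just a sketch: `validateAttr` is a hypothetical helper that is not in the original repository, and the rules it enforces are my reading of the format described above.

```javascript
// Hypothetical helper: check that every entry is [selector, attribute, alias]
// and return the header row (the aliases) if everything looks right.
function validateAttr(attr) {
    if (!Array.isArray(attr) || attr.length === 0) {
        throw new Error('attr must be a non-empty array')
    }
    attr.forEach((entry, i) => {
        if (!Array.isArray(entry) || entry.length !== 3) {
            throw new Error(`attr[${i}] must be [selector, attribute, alias]`)
        }
        const [selector, attribute, alias] = entry
        if (typeof selector !== 'string' || selector.trim() === '') {
            throw new Error(`attr[${i}]: selector must be a non-empty string`)
        }
        // An empty string is allowed here: it means "use the element text"
        if (typeof attribute !== 'string') {
            throw new Error(`attr[${i}]: attribute must be a string`)
        }
        if (typeof alias !== 'string' || alias.trim() === '') {
            throw new Error(`attr[${i}]: alias must be a non-empty string`)
        }
    })
    return attr.map(entry => entry[2].trim())
}

console.log(validateAttr([
    ['a', 'href', 'Link'],
    ['span.votes-num', '', 'Votes'],
]))
```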

See the GitHub repository for the more complete version.
It is not limited to SegmentFault: the only limit is what you can think of!

Tags: excel, node.js, web crawler

Posted in: javascript

2019-07-23
