Golang: An Elasticsearch Example Project Built on Juejin Data

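This project crawls part of the information on Juejin's hot-recommended page and stores it in Elasticsearch (ES) as sample data to query.

It includes simple implementations of the basic ES operations: creating an index, searching, and deleting documents.

Project repository: thep0y/juejin-hot-es-example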

Introduction

Searches are run from the command line. The example project's commands are as follows:

juejin allows you to index and search hot-recommended article's titles

Usage:
  juejin [command]

Available Commands:
  delete      Delete item with id
  help        Help about any command
  index       Index juejin hot-recommended articles into Elasticsearch
  search      Search juejin hot recommended articles

Flags:
  -h, --help           help for juejin
  -i, --index string   Index name (default "juejin")

Use "juejin [command] --help" for more information about a command.

The available commands are index, search, and delete.

The index command also has options of its own:

      --pages int   The count of pages you want to crawl (default 5)
      --setup       Create Elasticsearch index
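The help text above appears to be the default output of the cobra library. As a dependency-free sketch of the same command layout, the following uses Go's standard `flag` package instead (a swapped-in technique, not the project's code). The names `parseArgs`, `parseIndex`, and `indexOptions` are hypothetical; only the command names, flags, and defaults come from the help text.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// indexOptions holds the flags listed for the index command (hypothetical type).
type indexOptions struct {
	Index string
	Pages int
	Setup bool
}

// parseArgs reads the global -i/--index flag, then returns the subcommand
// name and the arguments that follow it.
func parseArgs(args []string) (index, cmd string, rest []string, err error) {
	fs := flag.NewFlagSet("juejin", flag.ContinueOnError)
	// Both spellings write to the same variable, like "-i, --index" in the help text.
	fs.StringVar(&index, "i", "juejin", "Index name")
	fs.StringVar(&index, "index", "juejin", "Index name")
	if err = fs.Parse(args); err != nil {
		return "", "", nil, err
	}
	if fs.NArg() == 0 {
		return "", "", nil, fmt.Errorf("missing command: index, search or delete")
	}
	return index, fs.Arg(0), fs.Args()[1:], nil
}

// parseIndex parses the options of the index command with their documented defaults.
func parseIndex(index string, args []string) (indexOptions, error) {
	fs := flag.NewFlagSet("index", flag.ContinueOnError)
	pages := fs.Int("pages", 5, "The count of pages you want to crawl")
	setup := fs.Bool("setup", false, "Create Elasticsearch index")
	if err := fs.Parse(args); err != nil {
		return indexOptions{}, err
	}
	return indexOptions{Index: index, Pages: *pages, Setup: *setup}, nil
}

func main() {
	index, cmd, rest, err := parseArgs(os.Args[1:])
	switch {
	case err != nil:
		fmt.Fprintln(os.Stderr, err)
	case cmd == "index":
		opts, err := parseIndex(index, rest)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
		} else {
			fmt.Printf("index=%s pages=%d setup=%v\n", opts.Index, opts.Pages, opts.Setup)
		}
	case cmd == "search":
		fmt.Printf("search %q in index %s\n", strings.Join(rest, " "), index)
	case cmd == "delete":
		fmt.Printf("delete %v from index %s\n", rest, index)
	default:
		fmt.Fprintf(os.Stderr, "unknown command %q\n", cmd)
	}
}
```

For example, `go run main.go index --pages 10` would print `index=juejin pages=10 setup=false`. One difference from cobra: `flag` stops parsing at the first non-flag argument, so here `-i` must come before the subcommand (`-i demo search 前端`), whereas cobra can also accept persistent flags after the subcommand.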

The project uses a local ES instance, preferably created with Docker, and the ik Chinese analysis plugin must be installed in ES.

1 Create the index

go run main.go index --setup

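By default, this creates the index from the mapping defined in the project, then crawls and stores 5 pages, 100 items in total.

The output looks like this: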

8:10PM INF Creating index with mapping
8:10PM INF Starting the crawl with 0 workers at 0 offset
8:10PM INF Stored doc Article ID=6957974706943164447 title="算法篇01、排序算法"
8:10PM INF Stored doc Article ID=6953868764362309639 title="如何解决浏览器的断网情况？"
...
8:10PM INF Skipping existing doc ID=6957726578692341791
8:10PM INF Skipping existing doc ID=6957925118429364255
8:10PM INF Skipping existing doc ID=6953868764362309639
8:10PM INF Skipping existing doc ID=6957981912669519903
8:10PM INF Skipping existing doc ID=6953059119561441287
8:10PM INF Skipping existing doc ID=6955336007839383588
...
8:10PM INF Stored doc Article ID=6957930535574306847 title="Node系列-阻塞和非阻塞的理解"
8:10PM INF Stored doc Article ID=6956602138201948196 title="《前端领域的转译打包工具链》上篇"
8:10PM INF Stored doc Article ID=6957982556885090312 title="JS篇：事件流"

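[Screenshot of the terminal output]

Each page holds 20 items and 5 pages are crawled, so in theory 100 items should be stored. Some items can appear more than once, however, and duplicates are skipped (the "Skipping existing doc" lines above), so fewer than 100 may actually be saved.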
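The counting can be sketched with an in-memory map standing in for the index: an article is stored only if its ID has not been seen yet, otherwise it is skipped. `Article`, `store`, and `save` are hypothetical names, and the sample pages are made up; the real project writes to ES rather than to a map.

```go
package main

import "fmt"

// Article keeps the two fields visible in the crawl log.
type Article struct {
	ID    string
	Title string
}

// store is an in-memory stand-in for the Elasticsearch index (hypothetical).
type store map[string]Article

// save stores each article unless its ID is already present, printing lines
// shaped like the "Stored doc" / "Skipping existing doc" log entries.
func (s store) save(pages [][]Article) (stored, skipped int) {
	for _, page := range pages {
		for _, a := range page {
			if _, ok := s[a.ID]; ok {
				fmt.Printf("Skipping existing doc ID=%s\n", a.ID)
				skipped++
				continue
			}
			s[a.ID] = a
			fmt.Printf("Stored doc Article ID=%s title=%q\n", a.ID, a.Title)
			stored++
		}
	}
	return stored, skipped
}

func main() {
	s := store{}
	pages := [][]Article{
		{{"1", "a"}, {"2", "b"}},
		{{"2", "b"}, {"3", "c"}}, // "2" shows up on two pages
	}
	stored, skipped := s.save(pages)
	fmt.Println(stored, skipped, len(s)) // 3 1 3
}
```

Against Elasticsearch itself, a comparable per-document check could be made by indexing with `PUT /<index>/_create/<id>`, which fails with HTTP 409 when a document with that ID already exists; the article does not say which approach the project takes.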

2 Crawl 10 pages

go run main.go index --pages 10

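This command does not create the index again; it goes straight to crawling. Since this is only an example project, there is no option for choosing a start page and an end page; only the final page number can be given, as an optional parameter.

The output is essentially the same as in the previous section:

[Screenshot of the terminal output]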
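Because crawling always starts from the first page, `--pages 10` revisits the 5 pages stored in the previous step, and most of those documents would presumably be skipped as existing. The sketch below assumes offset-based paging with the 20 items per page mentioned earlier; `pageOffsets` is a hypothetical helper, and Juejin's real pagination parameters are not shown in this article.

```go
package main

import "fmt"

// pageSize is the number of items per hot-recommended page stated in the text.
const pageSize = 20

// pageOffsets returns the starting offset of each page when crawling `pages`
// pages from the beginning. There is deliberately no start-page parameter,
// matching the project's design.
func pageOffsets(pages int) []int {
	offsets := make([]int, 0, pages)
	for p := 0; p < pages; p++ {
		offsets = append(offsets, p*pageSize)
	}
	return offsets
}

func main() {
	// The first five offsets (0..80) are the pages already crawled by the default run.
	fmt.Println(pageOffsets(10)) // [0 20 40 60 80 100 120 140 160 180]
}
```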

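3 Search

Searches use a phrase query, which suits Chinese text better: otherwise the query is broken up into single-character queries, and the results are usually not what we want.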
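A phrase query corresponds to Elasticsearch's `match_phrase`, which requires all analyzed terms to appear adjacent and in order, whereas a plain `match` query by default matches documents containing any of the terms. The sketch below builds such a request body for a `title` field (an assumed field name) with highlighting enabled on that field; `searchBody` is a hypothetical helper, and the body would be sent with `POST /<index>/_search`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchBody builds a match_phrase query on one field and asks Elasticsearch
// to highlight matches in that same field.
func searchBody(field, text string) ([]byte, error) {
	q := map[string]any{
		"query": map[string]any{
			"match_phrase": map[string]any{
				field: text,
			},
		},
		"highlight": map[string]any{
			"fields": map[string]any{
				field: map[string]any{}, // default highlight settings
			},
		},
	}
	return json.Marshal(q) // map keys are emitted in sorted order
}

func main() {
	body, err := searchBody("title", "前端")
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println(string(body))
		// {"highlight":{"fields":{"title":{}}},"query":{"match_phrase":{"title":"前端"}}}
	}
}
```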

go run main.go search 前端

Matching query terms are highlighted in the results:

[Screenshot of highlighted search results]
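By default, Elasticsearch wraps highlighted terms in `<em>` and `</em>`. One way a command-line tool could display them is to swap those tags for ANSI color codes, as in this sketch. `highlights`, `colorize`, and the sample response are hypothetical, and `searchResponse` models only the `_id` and `highlight` parts of a real search response.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// searchResponse models only the parts of a search response used here.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			ID        string              `json:"_id"`
			Highlight map[string][]string `json:"highlight"`
		} `json:"hits"`
	} `json:"hits"`
}

// emReplacer turns the default highlight tags into red text on ANSI terminals.
var emReplacer = strings.NewReplacer("<em>", "\x1b[31m", "</em>", "\x1b[0m")

// colorize applies emReplacer to one highlight fragment.
func colorize(s string) string {
	return emReplacer.Replace(s)
}

// highlights returns one "id: fragment" line per highlight fragment of field.
func highlights(raw []byte, field string) ([]string, error) {
	var resp searchResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var out []string
	for _, h := range resp.Hits.Hits {
		for _, frag := range h.Highlight[field] {
			out = append(out, h.ID+": "+colorize(frag))
		}
	}
	return out, nil
}

func main() {
	// A hand-written, abbreviated response used only for illustration.
	raw := []byte(`{"hits":{"hits":[{"_id":"1","highlight":{"title":["<em>前端</em>工具链"]}}]}}`)
	lines, err := highlights(raw, "title")
	if err != nil {
		fmt.Println(err)
	} else {
		for _, l := range lines {
			fmt.Println(l)
		}
	}
}
```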

4 Delete

go run main.go delete [id]

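For example:

[Screenshot of a delete command]

Running delete again on an id that has already been deleted:

[Screenshot of the output]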
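Deleting a document maps to `DELETE /<index>/_doc/<id>` (Elasticsearch 7 and later). Elasticsearch answers 200 with `"result": "deleted"` the first time and 404 with `"result": "not_found"` for an id that no longer exists, which is the repeated-delete case described above. The sketch below uses a small fake server from `net/http/httptest` so it runs without a cluster; `deleteDoc` and `newFakeES` are hypothetical names, and the fake only imitates the `result` field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"path"
	"sync"
)

// deleteDoc sends DELETE /<index>/_doc/<id> and returns the "result" field:
// "deleted" on success, "not_found" if the id does not exist.
func deleteDoc(baseURL, index, id string) (string, error) {
	req, err := http.NewRequest(http.MethodDelete, baseURL+"/"+index+"/_doc/"+id, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusNotFound {
		return "", fmt.Errorf("delete %s: unexpected status %s", id, resp.Status)
	}
	var body struct {
		Result string `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	if body.Result == "" {
		// A 404 for a missing index carries an error object instead of "result".
		return "", fmt.Errorf("delete %s: %s without a result field", id, resp.Status)
	}
	return body.Result, nil
}

// newFakeES imitates just enough of the delete API for this sketch.
func newFakeES(ids ...string) *httptest.Server {
	var mu sync.Mutex
	docs := make(map[string]bool)
	for _, id := range ids {
		docs[id] = true
	}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		id := path.Base(r.URL.Path)
		w.Header().Set("Content-Type", "application/json")
		if r.Method == http.MethodDelete && docs[id] {
			delete(docs, id)
			fmt.Fprint(w, `{"result":"deleted"}`)
			return
		}
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, `{"result":"not_found"}`)
	}))
}

func main() {
	srv := newFakeES("6957974706943164447")
	defer srv.Close()
	for i := 0; i < 2; i++ {
		result, err := deleteDoc(srv.URL, "juejin", "6957974706943164447")
		fmt.Println(result, err)
	}
	// deleted <nil>
	// not_found <nil>
}
```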