Elasticsearch: Importing the MovieLens test data with Logstash

1. MovieLens data

https://grouplens.org/dataset…
For learning and practice, the smallest dataset is enough:
ml-latest-small: https://files.grouplens.org/d…
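Before writing the pipeline, it can help to look at how movies.csv is laid out. The following is a minimal sketch, assuming the archive has already been unpacked and that the file starts with a header line followed by rows of id, title with year, and pipe-separated genres. The `preview` helper and the in-memory sample are illustrative only; the data row is taken from the console output shown later in this post.

```python
import csv
import io
import itertools


def preview(lines, limit=3):
    """Return up to `limit` parsed rows from an iterable of CSV lines."""
    return list(itertools.islice(csv.reader(lines), limit))


# Illustrative stand-in for the real file. With the unpacked dataset, pass
# open("ml-latest-small/movies.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "movieId,title,genres\n"
    "193609,Andrew Dice Clay: Dice Rules (1991),Comedy\n"
)

for row in preview(sample):
    print(row)
```

A real CSV parser is used rather than splitting on commas because a title may itself contain a comma, in which case the field is quoted.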

2. Logstash configuration file

In the logstash/config directory, copy logstash-sample.conf to a new file named logstash-movies.conf and give it the following content:

# Logstash configuration for a simple
# CSV file -> Logstash -> Elasticsearch pipeline.

input {
  file {
    path => "/export/_backup/elk_bak/ml-latest-small/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

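filter {
  # Parse each line into three fields: id, content (title plus year), genre.
  csv {
    separator => ","
    columns => ["id", "content", "genre"]
  }

  # Split the pipe-separated genres into an array and drop unneeded fields.
  mutate {
    split => { "genre" => "|" }
    remove_field => ["path", "host", "@timestamp", "message"]
  }

  # Split "Title (year)" at "(" and copy the two pieces into new fields.
  mutate {
    split => { "content" => "(" }
    add_field => { "title" => "%{[content][0]}" }
    add_field => { "year" => "%{[content][1]}" }
  }

  # Convert year to an integer, trim the title, and drop the helper field.
  mutate {
    convert => { "year" => "integer" }
    strip => ["title"]
    remove_field => ["path", "host", "@timestamp", "content"]
  }
}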

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "movies"
    document_id => "%{id}"
    #user => "user"
    #password => "password"
  }
  stdout {}
}
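To make the filter stage easier to follow, here is a rough Python approximation of what the csv and mutate filters do to a single line. It is only a sketch for reasoning about the field layout, not how Logstash is implemented; the `transform` function is hypothetical, and the handling of a missing or non-numeric year (falling back to 0) is an assumption about integer conversion.

```python
import csv
import io
import re


def transform(line):
    """Approximate the csv + mutate filters for one line of movies.csv."""
    # csv filter: three columns named id, content, genre.
    movie_id, content, genre = next(csv.reader(io.StringIO(line)))

    # mutate split on "(": "Title (1991)" -> ["Title ", "1991)"].
    parts = content.split("(")

    # title comes from the first piece, with whitespace stripped.
    title = parts[0].strip()

    # year comes from the second piece; keep only its leading digits.
    # Falling back to 0 is an assumption made for this sketch.
    year_text = parts[1] if len(parts) > 1 else ""
    match = re.match(r"\d+", year_text)
    year = int(match.group()) if match else 0

    # mutate split on "|": "A|B" -> ["A", "B"].
    return {"id": movie_id, "title": title, "year": year, "genre": genre.split("|")}


print(transform("193609,Andrew Dice Clay: Dice Rules (1991),Comedy"))
```

One consequence of splitting at the first "(" is visible here: for a title that itself contains parentheses, the second piece is that earlier parenthesized text rather than the year, so such rows would need extra handling.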

3. Run the import

bin/logstash -f config/logstash-movies.conf
Startup takes a little while.
The console then prints output like the following:

......
{
          "id" => "193609",
       "genre" => [
        [0] "Comedy"
    ],
       "title" => "Andrew Dice Clay: Dice Rules",
    "@version" => "1",
        "year" => 1991
}

Once the console stops printing new events, press Ctrl+C to stop Logstash.

4. Check in Kibana that the data reached the index

If the imported index appears under Index Management, the import succeeded.
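The same check can also be made without Kibana by asking Elasticsearch how many documents the index holds. The sketch below uses the `_count` endpoint and assumes a node at http://localhost:9200 with no authentication, matching the commented-out user and password in the pipeline above. The helper names are hypothetical.

```python
import json
import urllib.request


def count_url(host, index):
    """Build the URL of the _count endpoint for one index."""
    return f"{host.rstrip('/')}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count JSON response."""
    return json.loads(body)["count"]


def fetch_count(host="http://localhost:9200", index="movies"):
    """Ask a running Elasticsearch node how many documents the index holds."""
    with urllib.request.urlopen(count_url(host, index), timeout=10) as resp:
        return parse_count(resp.read())
```

Calling `fetch_count()` while the node is running should return a number close to the line count of movies.csv. Because the pipeline sets document_id to the movie id, re-running the import overwrites existing documents rather than duplicating them.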
