Databases: A Complete Walkthrough of Deploying and Using Apache SeaTunnel 2.3.1

https://dlcdn.apache.org/incubator/seatunnel/2.3.1/apache-seatunnel-incubating-2.3.1-bin.tar.gz

Once the download finishes, upload the archive to the server and extract it.

# Extract into /opt/module
tar -zxvf apache-seatunnel-incubating-2.3.1-bin.tar.gz -C /opt/module

Download the connectors you need from the Apache repository. Each jar sits under its own path there. Put them in the /seatunnel-2.3.1/connectors/seatunnel directory.

https://repo.maven.apache.org/maven2/org/apache/seatunnel/

connector-assert-2.3.1.jar
connector-cdc-mysql-2.3.1.jar
connector-console-2.3.1.jar # bundled
connector-doris-2.3.1.jar
connector-elasticsearch-2.3.1.jar
connector-fake-2.3.1.jar # bundled
connector-file-hadoop-2.3.1.jar
connector-file-local-2.3.1.jar
connector-hive-2.3.1.jar
connector-iceberg-2.3.1.jar
connector-jdbc-2.3.1.jar
connector-kafka-2.3.1.jar
connector-redis-2.3.1.jar
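
Since every jar lives under its own path, it can help to build the download URLs instead of clicking through the repository. Below is a minimal sketch that assumes the standard Maven repository layout (artifact/version/artifact-version.jar) under the repository URL above. The function name connector_url is my own and not part of SeaTunnel.

```shell
# Build the Maven repository URL for a SeaTunnel connector jar.
# Assumption: standard Maven layout under org/apache/seatunnel.
BASE_URL="https://repo.maven.apache.org/maven2/org/apache/seatunnel"

connector_url() {
  artifact="$1"
  version="$2"
  printf '%s/%s/%s/%s-%s.jar\n' "$BASE_URL" "$artifact" "$version" "$artifact" "$version"
}

# Print one URL per connector; feed them to the downloader of your choice.
for name in connector-jdbc connector-cdc-mysql connector-hive; do
  connector_url "$name" 2.3.1
done
```

Each printed URL can then be fetched with a tool such as wget or curl and saved into the connectors/seatunnel directory.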

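Configure which plugins SeaTunnel should install:

vim seatunnel-2.3.1/config/plugin_config

When the install script runs, it downloads the corresponding jars from the Maven central repository. List as few connectors as you can, because the download is very slow. I listed these: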

--connectors-v2--
connector-assert
connector-cdc-mysql
connector-jdbc
connector-fake
connector-console
--end--

sh bin/install-plugin.sh 2.3.1

The whole process is very slow, presumably because everything is downloaded from the Maven central repository.
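
The plugin_config file above is just a list of connector names between two marker lines, so it can also be generated. This is a small sketch only: the function name write_plugin_config is hypothetical, and the demo writes to a temporary file rather than touching the real config/plugin_config.

```shell
# Write a plugin_config file listing the given connectors.
# Usage: write_plugin_config <output-file> <connector>...
write_plugin_config() {
  out="$1"
  shift
  {
    echo "--connectors-v2--"
    for c in "$@"; do
      echo "$c"
    done
    echo "--end--"
  } > "$out"
}

# Demo: write to a temporary file instead of the real config/plugin_config.
demo_file="$(mktemp)"
write_plugin_config "$demo_file" \
  connector-assert connector-cdc-mysql connector-jdbc connector-fake connector-console
cat "$demo_file"
```

Pointing the first argument at seatunnel-2.3.1/config/plugin_config would overwrite the real file, so back it up first if you try this.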

To use Hive, put these 3 jars into the seatunnel-2.3.1/lib directory:

hive-exec-2.3.9.jar
# Download link
# https://repo.maven.apache.org/maven2/org/apache/hive/hive-exec/2.3.9/hive-exec-2.3.9.jar
# Note: use exactly hive-exec-2.3.9.jar. Do not copy the newest jar from your own Hive lib directory.

seatunnel-hadoop3-3.1.4-uber-2.3.1.jar
# Download link
# https://repo.maven.apache.org/maven2/org/apache/seatunnel/seatunnel-hadoop3-3.1.4-uber/2.3.1/seatunnel-hadoop3-3.1.4-uber-2.3.1.jar

seatunnel-hadoop3-3.1.4-uber-2.3.1-optional.jar
# Download link
# https://repo.maven.apache.org/maven2/org/apache/seatunnel/seatunnel-hadoop3-3.1.4-uber/2.3.1/seatunnel-hadoop3-3.1.4-uber-2.3.1-optional.jar

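Along the way, for unrelated reasons, I also copied libfb303-0.9.3.jar from the Hive installation's /lib directory into SeaTunnel's lib directory.

To use MySQL, copy the MySQL driver over as well. It probably needs to be an 8.x driver; I used mysql-connector-java-8.0.21.jar.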
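
Before running any job it is worth confirming that the jars listed above actually landed in lib. Here is a quick sketch; the function name missing_jars is my own, and the demo uses a temporary directory as a stand-in for seatunnel-2.3.1/lib.

```shell
# Report which of the expected jars are missing from a lib directory.
# Usage: missing_jars <lib-dir> <jar>...
missing_jars() {
  dir="$1"
  shift
  for jar in "$@"; do
    [ -f "$dir/$jar" ] || echo "$jar"
  done
}

# Demo against a temporary directory standing in for seatunnel-2.3.1/lib.
lib_dir="$(mktemp -d)"
touch "$lib_dir/hive-exec-2.3.9.jar"
missing="$(missing_jars "$lib_dir" \
  hive-exec-2.3.9.jar \
  seatunnel-hadoop3-3.1.4-uber-2.3.1.jar \
  seatunnel-hadoop3-3.1.4-uber-2.3.1-optional.jar \
  mysql-connector-java-8.0.21.jar)"
echo "$missing"
```

In the demo only hive-exec-2.3.9.jar exists, so the other three names are printed. Against the real lib directory, empty output means everything is in place.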

seatunnel-2.3.1/config/v2.batch.config.template

env {
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}

source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}


sink {
  Console {}
}

Run the command:

cd /opt/module/seatunnel-2.3.1
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local

If the job runs successfully, the generated test data is printed to the console.
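
Every example in this article is launched the same way, so a tiny wrapper saves typing and catches a missing config file early. This is only a sketch: job_command and the SEATUNNEL_HOME variable are my own names, and the function prints the command line rather than executing it.

```shell
# Build the seatunnel.sh command line for a job config in local mode.
# SEATUNNEL_HOME and job_command are illustrative names, not part of SeaTunnel.
SEATUNNEL_HOME="${SEATUNNEL_HOME:-/opt/module/seatunnel-2.3.1}"

job_command() {
  config="$1"
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
  echo "$SEATUNNEL_HOME/bin/seatunnel.sh --config $config -e local"
}

# Demo with a throwaway config file; the command is printed, not executed.
demo_conf="$(mktemp)"
job_command "$demo_conf"
```

To actually run a job, the printed line can be passed to sh -c, or the echo can be replaced with a direct call.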

I created a directory to hold job configs: /opt/module/seatunnel-2.3.1/job

vim mysql_2console.conf

mysql_2console.conf

env {
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}
source {
  Jdbc {
    url = "jdbc:mysql://hadoop102/dim_db?useUnicode=true&characterEncoding=utf8&useSSL=false"
    driver = "com.mysql.cj.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "111111"
    query = "select * from dim_basicdata_date_a_d where date <'2010-12-31'"
  }
}

sink {
  Console {}
}

The query reads from a date dimension table.

Table creation statements:

CREATE DATABASE dim_db DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

drop table if exists dim_db.dim_basicdata_date_a_d;
create table if not exists dim_db.dim_basicdata_date_a_d
(
    `date`          varchar(40) comment 'date',
    `year`          varchar(40) comment 'year',
    `quarter`       varchar(40) comment 'quarter (1/2/3/4)',
    `season`        varchar(40) comment 'season (spring / summer / autumn / winter)',
    `month`         varchar(40) comment 'month',
    `day`           varchar(40) comment 'day',
    `week`          varchar(40) comment 'week of the year',
    `weekday`       varchar(40) comment 'day of week (1 = Mon / 2 = Tue / 3 = Wed / 4 = Thu / 5 = Fri / 6 = Sat / 7 = Sun)',
    `is_workday`    varchar(40) comment 'is workday (1 = yes, 0 = no)',
    `date_type`     varchar(40) comment 'day type (workday / statutory make-up workday / weekend / holiday)',
    `update_date`   varchar(40) comment 'update date'
);

Insert a few rows yourself to try it out.
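
If you would rather not type the rows by hand, the sketch below prints INSERT statements for a handful of dates. It is an illustration only: the function name date_row_sql is my own, only the columns that can be derived from the date string are filled (the rest stay NULL), and the quarter is simply computed from the month.

```shell
# Emit INSERT statements for dim_db.dim_basicdata_date_a_d from YYYY-MM-DD dates.
# Only columns derivable from the date string are filled; the others stay NULL.
date_row_sql() {
  d="$1"
  year="${d%%-*}"        # text before the first '-'
  rest="${d#*-}"
  month="${rest%%-*}"    # text between the two dashes
  day="${rest#*-}"       # text after the second '-'
  m="${month#0}"         # drop a leading zero so arithmetic is not parsed as octal
  quarter=$(( (m - 1) / 3 + 1 ))
  printf "insert into dim_db.dim_basicdata_date_a_d (\`date\`, \`year\`, \`quarter\`, \`month\`, \`day\`) values ('%s', '%s', '%s', '%s', '%s');\n" \
    "$d" "$year" "$quarter" "$month" "$day"
}

for d in 2010-01-01 2010-08-15 2011-12-31; do
  date_row_sql "$d"
done
```

The output can be piped into the mysql client. With the sample query's filter date < '2010-12-31', only the first two rows would come back.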

Run the command:

cd /opt/module/seatunnel-2.3.1
./bin/seatunnel.sh --config ./job/mysql_2console.conf  -e local

Create a Hive table:

CREATE database db_hive;

drop table if exists db_hive.dim_basicdata_date_a_d;
create table if not exists db_hive.dim_basicdata_date_a_d
(
    `date`          string comment 'date',
    `year`          string comment 'year',
    `quarter`       string comment 'quarter (1/2/3/4)',
    `season`        string comment 'season (spring / summer / autumn / winter)',
    `month`         string comment 'month',
    `day`           string comment 'day',
    `week`          string comment 'week of the year',
    `weekday`       string comment 'day of week (1 = Mon / 2 = Tue / 3 = Wed / 4 = Thu / 5 = Fri / 6 = Sat / 7 = Sun)',
    `is_workday`    string comment 'is workday (1 = yes, 0 = no)',
    `date_type`     string comment 'day type (workday / statutory make-up workday / weekend / holiday)',
    `update_date`   string comment 'update date'
);

Insert a few rows yourself.

Create the config file hive_2console.conf:

env {
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}
source {
  Hive {
    table_name = "db_hive.dim_basicdata_date_a_d"
    metastore_uri = "thrift://hadoop102:9083"
  }
}

sink {
  Console {}
}

In my setup Hive accesses its metadata over JDBC, so metastore_uri = "jdbc:hive2://hadoop102:10000" also worked for me.

Update hive-site.xml as follows; you may already have this configured.

    <!-- For convenience, connect to the Hive database directly; comment out the three entries below -->
    <!-- Address of the metastore to connect to -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>

    <!-- Host that hiveserver2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <!-- Port that hiveserver2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
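
When a Hive job fails to connect, the first thing I would check is whether the host and port in metastore_uri are the ones configured above. The sketch below only splits the URI into host and port. The function name metastore_host_port is my own, and it assumes the URI has the plain scheme://host:port form.

```shell
# Split a metastore URI such as thrift://hadoop102:9083 into "host port".
# Assumption: the URI has the plain form scheme://host:port.
metastore_host_port() {
  uri="$1"
  hostport="${uri#*://}"    # strip the scheme
  echo "${hostport%%:*} ${hostport##*:}"
}

metastore_host_port "thrift://hadoop102:9083"
```

The resulting pair can be handed to a port checker such as nc -z, where available, to see whether the metastore is reachable from the SeaTunnel host.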

Run the command:

cd /opt/module/seatunnel-2.3.1
./bin/seatunnel.sh --config ./job/hive_2console.conf -e local

Create the config file:

dim_basicdate_mysql_2hive.conf

env {
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}
source {
    Jdbc {
        url = "jdbc:mysql://hadoop102/dim_db?useUnicode=true&characterEncoding=utf8&useSSL=false"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "111111"
        query = "select * from dim_basicdata_date_a_d"
    }
}

sink {
    Hive {
        table_name = "db_hive.dim_basicdata_date_a_d"
        metastore_uri = "thrift://hadoop102:9083"
    }
}

Run the command:

cd /opt/module/seatunnel-2.3.1
./bin/seatunnel.sh --config ./job/dim_basicdate_mysql_2hive.conf -e local
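
With more than one table to sync, writing each job file by hand gets tedious. Here is a rough sketch that renders the same MySQL-to-Hive config per table. It assumes the Hive table has the same name as the MySQL table. render_mysql_to_hive and the MYSQL_PASSWORD environment variable are hypothetical names of mine, and the demo writes into a temporary directory rather than the real job directory.

```shell
# Render a MySQL-to-Hive job config for one table to stdout.
# Connection details mirror the example config; MYSQL_PASSWORD is a hypothetical
# environment variable, used so the password is not written into the script.
render_mysql_to_hive() {
  table="$1"
  cat <<EOF
env {
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
}
source {
  Jdbc {
    url = "jdbc:mysql://hadoop102/dim_db?useUnicode=true&characterEncoding=utf8&useSSL=false"
    driver = "com.mysql.cj.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "${MYSQL_PASSWORD:-}"
    query = "select * from ${table}"
  }
}
sink {
  Hive {
    table_name = "db_hive.${table}"
    metastore_uri = "thrift://hadoop102:9083"
  }
}
EOF
}

job_dir="$(mktemp -d)"    # stand-in for /opt/module/seatunnel-2.3.1/job
for t in dim_basicdata_date_a_d; do
  render_mysql_to_hive "$t" > "$job_dir/${t}_mysql_2hive.conf"
done
cat "$job_dir/dim_basicdata_date_a_d_mysql_2hive.conf"
```

Adding more table names to the for loop produces one job file per table, each of which can be run with the same seatunnel.sh command.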

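Apache SeaTunnel is a distributed, high-performance, easily extensible data integration platform for synchronizing and transforming massive volumes of data, both offline and real-time.

Repository:
https://github.com/apache/seatunnel

Website:
https://seatunnel.apache.org/

Proposal:
https://cwiki.apache.org/confluence/display/INCUBATOR/SeaTunn…

Apache SeaTunnel downloads:
https://seatunnel.apache.org/download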

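Everyone is warmly welcome to join!

We believe that, guided by The Apache Way, with its principles of "Community Over Code", "Open and Cooperation", "Meritocracy", and diversity with consensus-based decision making, we will build a more diverse and inclusive community ecosystem and advance technology together through the open-source spirit!

We sincerely invite everyone who wants to see home-grown open source gain a global footing to join the SeaTunnel contributor family and build open source together!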

Report issues and suggestions:
https://github.com/apache/seatunnel/issues

Contribute code:
https://github.com/apache/seatunnel/pulls

Subscribe to the community dev mailing list:
dev-subscribe@seatunnel.apache.org

Dev mailing list:
dev@seatunnel.apache.org

Join Slack:
https://join.slack.com/t/apacheseatunnel/shared_invite/zt-1kc…

Follow on Twitter:
https://twitter.com/ASFSeaTunnel

This article is published with support from 白鲸开源科技 (WhaleOps)!

1 Deployment

1.1 Download and extract

1.2 Download the connectors

1.3 Install SeaTunnel

⭐1.4 Add a few extra jars

2 Test examples

2.1 Official demo: fake to console

2.2 mysql to console

2.3 hive to console

2.4 mysql to hive

3 Follow Apache SeaTunnel