On data migration: a unified stream-batch data exchange engine solution

Market demand · Product overview · Application scenarios · Feasibility · Product value · References: [Free download](https://github.com/hw2499/etl-engine/releases) · [etl-engine user manual](https://github.com/hw2499/etl-engine) · [etl-crontab user manual](https://github.com/hw2499/etl-engine/wiki/etl-crontab%E8%B0%83%E5%BA%A6) · [Embedded script development](https://github.com/hw2499/etl-engine/wiki/%E5%B5%8C%E5%85%A5%E8%84%9A%E6%9C%AC%E5%BC%80%E5%8F%91) · [etl-engine configuration samples](https://github.com/hw2499/etl-engine/wiki/etl-engine%E4%BD%BF%E7%94%A8%E6%A0%B7%E4%BE%8B). Article source: https://pan.baidu.com/s/1eMkM85RPqI76VaJVReVDJQ?pwd=k4dy

May 11, 2023 · 1 min · jiezi

On data migration: Tech share - a first look at OMS

Author: Gao Peng, a DBA responsible for day-to-day troubleshooting on our projects; his advertising slot is available for long-term rent. Source: an original contribution. *Produced by the ActionTech open source community. Original content may not be used without authorization; for reprints, please contact the editor and credit the source. Main contributor: @操盛春 (北分之光), who did the OMS source code analysis.

I. Getting to know OMS

This article uses OMS Community Edition 3.3.1 as its example. Its architecture diagram is available at the official site and gives a rough picture of the tool (figure in the original post). An OMS data migration deployment contains quite a few components, including DBCat, Store, Connector, JDBCWriter, Checker, and Supervisor. I won't copy the official description of what each component does, since anyone can look that up. Instead, here is something the official docs don't cover.

A while back, when my lead asked me to run OMS performance tests, he also wanted flame graphs to analyze where the time went during migration. But after logging into the OMS container I saw many related Java processes and could not immediately tell which process did what. So let's go through those processes one by one.

1. Ghana-endpoint

```shell
[ActionTech ~]# ps uax | grep Ghana-endpoint
root 61 3.1 0.5 20918816 1582384 pts/0 Sl Feb07 1756:47 java -Dspring.config.location=/home/ds/ghana/config/application-oms.properties -server -Xms1g -Xmx2g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/home/admin/logs/ghana/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/ghana -jar /home/ds/ghana/boot/Ghana-endpoint-1.0.0-executable.jar
```

Ghana-endpoint serves the OMS admin web console and schedules the TaskJob and StructTaskJob programs.

Tips:

- StructTaskJob: the scheduler for schema migration tasks.
- TaskJob: TaskJob::scheduledTask() schedules the execution of the sub-tasks of the forward-switchover step; TaskJob::scheduleMigrationProject() schedules the initialization of all steps of a schema migration project and the monitoring of task execution progress.

2. commons-daemon (CM)

```shell
[ActionTech ~]# ps uax | grep commons-daemon
root 50 297 1.7 25997476 4711620 pts/0 Sl Feb07 163685:09 java -cp /home/ds/cm/package/deployapp/lib/commons-daemon.jar:/home/ds/cm/package/jetty/start.jar -server -Xmx4g -Xms4g -Xmn4g -Dorg.eclipse.jetty.util.URI.charset=utf-8 -Dorg.eclipse.jetty.server.Request.maxFormContentSize=0 -Dorg.eclipse.jetty.server.Request.maxFormKeys=20000 -DSTOP.PORT=8089 -DSTOP.KEY=cm -Djetty.base=/home/ds/cm/package/deployapp org.eclipse.jetty.start.Main
```

CM is the cluster management process. It provides the APIs behind the OMS admin console, used to create tasks such as incremental log pulling, full migration, incremental synchronization, and full verification, and to fetch the execution progress of those tasks. ...
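The post stops before the profiling step itself. As a hedged illustration (not from the original article), once the right PID has been identified from the ps output, a flame graph could be captured with async-profiler, assuming its tarball has been unpacked inside the container; the /opt path here is hypothetical:

```shell
# Hypothetical example: sample CPU on the Ghana-endpoint process (PID 61 in the
# ps output above) for 60 seconds and write an interactive flame graph to /tmp.
/opt/async-profiler/profiler.sh -e cpu -d 60 -f /tmp/ghana-cpu.html 61
```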

April 18, 2023 · 2 min · jiezi

On data migration: etl-engine, a unified stream-batch data exchange engine

In the early days of the internet, data volumes exploded and single-day fact tables reached tens of millions of rows, but customer scenarios were mostly "T+1": analyzing the current day's, week's, or month's data, needs that offline analysis alone could satisfy. As the big data field kept evolving, businesses moved from being satisfied with offline processing to demanding high real-time performance. The 数栈 product has kept iterating through this shift, which gave rise to the kafka + flink combination; meanwhile the kafka + etl-engine combination (backed by fusion compute) also implements a lightweight streaming compute engine.

Stream computing vs. batch computing

- Data timeliness: stream computing is real-time and low-latency, suited to presenting business data in "T+0" fashion; batch computing is non-real-time and high-latency, suited to presenting business data in "T+1" fashion.
- Data characteristics: stream data is generally dynamic, produced continuously as events occur; batch data is generally static, already stored in some medium beforehand.
- Application scenarios: stream computing serves real-time scenarios such as business monitoring and real-time recommendation; batch computing serves offline scenarios such as data analysis and offline reporting.
- Execution mode: a streaming job is blocking and runs continuously; a batch job runs once and finishes.

How etl-engine implements stream computing

etl-engine can consume messages through its built-in "kafka consumer node". While consuming the data stream (message stream), it calls its built-in "fusion query API" to read dimension-table data from multiple data sources into memory, runs various join queries between the message stream and those dimension tables, and finally writes the fused result set to the target source. A common use is joining several dimension tables with a real-time message stream to produce one wide table (a sketch of such a query appears at the end of this post).

Fusion query syntax

Fusion query syntax follows the ANSI SQL standard and is very similar to ordinary MySQL query syntax.

- Supports fusion queries over data read from multiple categories of databases.
- Supports stream-compute queries between data generated on the fly in a message stream and multiple types of databases.
- Follows the ANSI SQL standard.

SELECT supports:

```sql
SELECT [DISTINCT] field [, field ...]
FROM table [, table ...]
[WHERE condition]
[GROUP BY field [, field ...]]
[HAVING condition]
[ORDER BY order_item [, order_item ...]]
[LIMIT number_of_records [OFFSET number_of_records]]
```

table supports:

- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
- OUTER JOIN
- CROSS JOIN

What fusion queries support will be covered in detail in later posts.

References: [Free download](https://github.com/hw2499/etl-engine/releases) · [etl-engine user manual](https://github.com/hw2499/etl-engine) · [etl-crontab user manual](https://github.com/hw2499/etl-engine/wiki/etl-crontab%E8%B0%83%E5%BA%A6) · [Embedded script development](https://github.com/hw2499/etl-engine/wiki/%E5%B5%8C%E5%85%A5%E8%84%9A%E6%9C%AC%E5%BC%80%E5%8F%91) · [etl-engine configuration samples](https://github.com/hw2499/etl-engine/wiki/etl-engine%E4%BD%BF%E7%94%A8%E6%A0%B7%E4%BE%8B)
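As promised above, here is a hedged sketch of a wide-table fusion query, written within the SELECT/JOIN subset documented in this post. The stream and dimension-table names (msg_stream, dim_product) and their columns are hypothetical, not taken from etl-engine's docs:

```sql
-- Hypothetical wide-table query: msg_stream stands for the Kafka message stream
-- being consumed; dim_product stands for a dimension table preloaded into
-- memory from MySQL through the fusion query API.
SELECT s.order_id,
       s.product_id,
       d.product_name,
       d.category,
       s.amount
FROM msg_stream s
LEFT JOIN dim_product d ON s.product_id = d.product_id
WHERE s.amount > 0
ORDER BY s.order_id
LIMIT 100;
```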

April 8, 2023 · 1 min · jiezi

On data migration: Distributed - using TiDB DM to migrate data from MySQL to DBLE

Author: Shen Guangyu, a member of the ActionTech South China DBA team, mainly responsible for MySQL troubleshooting and performance optimization; persistent about technology and accountable to customers. Source: an original contribution. *Produced by the ActionTech open source community. Original content may not be used without authorization; for reprints, please contact the editor and credit the source.

I. Background

A customer's MySQL instance held a large amount of data (terabytes) that needed to be migrated to DBLE quickly, full plus incremental. TiDB DM supports multi-threaded data export/import as well as incremental synchronization, so TiDB DM was chosen as the migration tool. Using that case as background, this article walks through migrating data from MySQL to DBLE with TiDB DM and some problems encountered along the way.

II. Migration diagram and server environment

(1) Data migration diagram: see the figure in the original post.

(2) Server environment:

| Server IP | Role | Port | Version | Notes |
| --- | --- | --- | --- | --- |
| 10.186.65.83 | DM master | 8261/8291 | V2.0.7 | tiup install node |
| 10.186.65.118 | DM worker | 8262 | V2.0.7 | |
| 10.186.65.14 | source MySQL | 4490 | MySQL 8.0.18 | |
| 10.186.65.4 | DBLE | 8066 | DBLE 3.21.10.6 | |
| 10.186.65.61 | DBLE datanode | 4408 | MySQL 8.0.25 | |
| 10.186.65.65 | DBLE datanode | 4408 | MySQL 8.0.25 | |

III. Installing TiDB DM

(1) Prepare the environment and install tiup

```shell
# Create the tidb user (password dmadmin) on both the dm master and worker hosts, and configure sudo
shell> useradd tidb
shell> echo "dmadmin" | passwd tidb --stdin
shell> echo "tidb ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/tidb

# Switch to the tidb user, generate a key pair, and set up ssh trust; every host in the DM cluster needs ssh trust
shell> su - tidb
shell> ssh-keygen -t rsa
shell> ssh-copy-id -i ~/.ssh/id_rsa.pub tidb@10.186.65.118
shell> ssh-copy-id -i ~/.ssh/id_rsa.pub tidb@10.186.65.83

# Install tiup as the tidb user
shell> curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
shell> source .bash_profile
shell> which tiup
~/.tiup/bin/tiup

# Install dmctl
shell> tiup install dm dmctl:v2.0.7
component dm version v1.10.2 is already installed
download https://tiup-mirrors.pingcap.com/dmctl-v2.0.7-linux-amd64.tar.gz 26.92 MiB / 26.92 MiB 100.00% 10.01 MiB/s
```

(2) Install the DM cluster

```shell
# Generate the topology template
shell> tiup dm template > dm_topology.yaml

# The edited configuration file:
shell> cat dm_topology.yaml | grep -v ^[#-] | grep -v ^$
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/home/tidb/dm/deploy"
  data_dir: "/home/tidb/dm/data"
master_servers:
  - host: 10.186.65.83
worker_servers:
  - host: 10.186.65.118
monitoring_servers:
  - host: 10.186.65.83
grafana_servers:
  - host: 10.186.65.83
alertmanager_servers:
  - host: 10.186.65.83

# Deploy and start the DM cluster
shell> tiup dm deploy dm-test v2.0.7 ./dm_topology.yaml --user tidb -i /home/tidb/.ssh/id_rsa
shell> tiup dm start dm-test

# Check the DM cluster
[tidb@10_186_65_83 ~]$ tiup dm display dm-test
tiup is checking updates for component dm ...
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.10.2/tiup-dm display dm-test
Cluster type:    dm
Cluster name:    dm-test
Cluster version: v2.0.7
Deploy user:     tidb
SSH type:        builtin
Grafana URL:     http://10.186.65.83:3000
ID                  Role          Host           Ports      OS/Arch       Status     Data Dir                              Deploy Dir
--                  ----          ----           -----      -------       ------     --------                              ----------
10.186.65.83:9093   alertmanager  10.186.65.83   9093/9094  linux/x86_64  Up         /home/tidb/dm/data/alertmanager-9093  /home/tidb/dm/deploy/alertmanager-9093
10.186.65.83:8261   dm-master     10.186.65.83   8261/8291  linux/x86_64  Healthy|L  /home/tidb/dm/data/dm-master-8261     /home/tidb/dm/deploy/dm-master-8261
10.186.65.118:8262  dm-worker     10.186.65.118  8262       linux/x86_64  Free       /home/tidb/dm/data/dm-worker-8262     /home/tidb/dm/deploy/dm-worker-8262
10.186.65.83:3000   grafana       10.186.65.83   3000       linux/x86_64  Up         -                                     /home/tidb/dm/deploy/grafana-3000
10.186.65.83:9090   prometheus    10.186.65.83   9090       linux/x86_64  Up         /home/tidb/dm/data/prometheus-9090    /home/tidb/dm/deploy/prometheus-9090
Total nodes: 5
```
IV. DBLE-side configuration

(1) db.xml

```xml
<?xml version="1.0"?>
<dble:db xmlns:dble="http://dble.cloud/">
    <dbGroup name="dbGroup1" rwSplitMode="0" delayThreshold="-1">
        <heartbeat>select 1</heartbeat>
        <dbInstance name="host_1" url="10.186.65.61:4408" user="sgy"
                    password="QfnHoIeIYL7ZT+EdJMYNuiLw6glbx2hEyrqxK+uxFPU8vja8vZHCLEKvKBOt1vOJdXCCgW7wNidJaTYWORDaUg=="
                    maxCon="2000" minCon="50" primary="true" usingDecrypt="true"></dbInstance>
    </dbGroup>
    <dbGroup name="dbGroup2" rwSplitMode="0" delayThreshold="-1">
        <heartbeat>select 1</heartbeat>
        <dbInstance name="host_2" url="10.186.65.65:4408" user="sgy"
                    password="d+rG/82+4h21ARS8D6Gu5MIFQ2UBC0h+ZS0EI7bvWAtTJUBsKuFY8+AVZtW1pZk+qtISoI2WpVt29Z9eqMRP5A=="
                    maxCon="2000" minCon="50" primary="true" usingDecrypt="true"></dbInstance>
    </dbGroup>
</dble:db>
```

Note: the user passwords are encrypted strings.

(2) sharding.xml

```xml
<?xml version="1.0"?>
<dble:sharding xmlns:dble="http://dble.cloud/">
    <schema name="dm_meta" sqlMaxLimit="-1">
        <singleTable name="mysql_sync_to_dble_loader_checkpoint" shardingNode="dm_meta" />
        <singleTable name="mysql_sync_to_dble_syncer_checkpoint" shardingNode="dm_meta" />
    </schema>
    <schema name="sbtest" sqlMaxLimit="-1">
        <shardingTable name="sbtest1" shardingNode="dn1,dn2,dn3,dn4" sqlMaxLimit="-1" shardingColumn="id" function="pro_func_jumpHash"></shardingTable>
        <shardingTable name="sbtest2" shardingNode="dn1,dn2,dn3,dn4" sqlMaxLimit="-1" shardingColumn="id" function="pro_func_jumpHash"></shardingTable>
        <shardingTable name="t1" shardingNode="dn1,dn2,dn3,dn4" sqlMaxLimit="-1" shardingColumn="c1" function="pro_func_jumpHash"></shardingTable>
        <shardingTable name="t2" shardingNode="dn1,dn2,dn3,dn4" sqlMaxLimit="-1" shardingColumn="c1" function="pro_func_jumpHash"></shardingTable>
    </schema>
    <shardingNode name="dn1" dbGroup="dbGroup1" database="dh_dn1"></shardingNode>
    <shardingNode name="dn2" dbGroup="dbGroup1" database="dh_dn2"></shardingNode>
    <shardingNode name="dn3" dbGroup="dbGroup2" database="dh_dn3"></shardingNode>
    <shardingNode name="dn4" dbGroup="dbGroup2" database="dh_dn4"></shardingNode>
    <shardingNode name="dm_meta" dbGroup="dbGroup1" database="dm_meta"/>
    <function name="pro_func_jumpHash" class="jumpStringHash">
        <property name="partitionCount">4</property>
        <property name="hashSlice">0:0</property>
    </function>
</dble:sharding>
```

Note: the schema and the two tables in which DM saves its sync state must be configured in advance. The table names are the DM task name plus a suffix:

- task name + `_loader_checkpoint`, i.e. table mysql_sync_to_dble_loader_checkpoint
- task name + `_syncer_checkpoint`, i.e. table mysql_sync_to_dble_syncer_checkpoint

(3) user.xml

```xml
<?xml version="1.0"?>
<dble:user xmlns:dble="http://dble.cloud/">
    <managerUser name="root" password="RYQdYYnzbcZlDuhV4LhJxFTM8kbU3d0y183LU+FgRWzNscvvvFPcdmeEKMdswMyzIaA+kObcozliHEYAlT0AjA==" usingDecrypt="true"></managerUser>
    <shardingUser name="sz" schemas="sbtest,dm_meta" password="jyaXmGFU+mdTGbUZIVbdEcVwItLPI+Yjxdq4wkOzhAYRB29WGZJd0/PkTJh3ky/v4E2yYoqgUzJXPzPRPiQF0Q==" usingDecrypt="true" readOnly="false" maxCon="1000"></shardingUser>
</dble:user>
```

Note: the user passwords are encrypted strings.

(4) Log into the DBLE management port and create the physical databases

```shell
shell> /data/mysql/base/5.7.36/bin/mysql -uroot -padmin -h 10.186.65.4 -P 9066
dble> create database @@shardingNode = 'dn$1-4';
Query OK, 1 row affected (0.06 sec)
dble> create database @@shardingNode = 'dm_meta';
Query OK, 1 row affected (0.01 sec)
```
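A hedged sanity check, not in the original post: after the physical databases are created, you can log in through the DBLE traffic port as the sharding user and confirm that both logical schemas from sharding.xml are visible before starting DM:

```shell
# Hypothetical check via the traffic port (8066); the sz user's plaintext
# password is not shown in the article, so -p prompts for it interactively
shell> mysql -usz -p -h 10.186.65.4 -P 8066
dble> show databases;   -- should list sbtest and dm_meta
```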
V. Creating test data with sysbench on the source MySQL

```shell
# Run prepare to initialize 1,000,000 rows per table
shell> /usr/share/sysbench/oltp_insert.lua --mysql-db=sbtest --mysql-host=10.186.65.14 --mysql-port=4490 --mysql-user=sgy --mysql-password=admin --tables=2 --table-size=1000000 --threads=16 --time=60 --report-interval=1 --max-requests=0 --percentile=95 --mysql-ignore-errors=1062 prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Initializing worker threads...
Creating table 'sbtest1'...
Creating table 'sbtest2'...
Inserting 1000000 records into 'sbtest2'
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest1'...

# Keep writing data continuously with sysbench
shell> /usr/share/sysbench/oltp_read_write.lua --mysql-db=sbtest --mysql-host=10.186.65.14 --mysql-port=4490 --mysql-user=sgy --mysql-password=admin --tables=2 --table-size=2000000 --threads=16 --time=6000 --report-interval=1 --max-requests=0 --percentile=95 --mysql-ignore-errors=1062 run
sysbench 1.0.17 (using system LuaJIT 2.0.4)
Running the test with following options:
Number of threads: 16
Report intermediate results every 1 second(s)
Initializing random number generator from current time
Initializing worker threads...
Threads started!
[ 1s ] thds: 16 tps: 585.44 qps: 11916.49 (r/w/o: 8360.33/1282.63/2273.53) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 16 tps: 744.18 qps: 14819.02 (r/w/o: 10379.16/1609.36/2830.50) lat (ms,95%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 16 tps: 605.11 qps: 12144.20 (r/w/o: 8500.54/1321.24/2322.42) lat (ms,95%): 47.47 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 16 tps: 773.91 qps: 15558.16 (r/w/o: 10881.71/1737.79/2938.65) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00
```

VI. Starting the data sync task

(1) Configure the data source

```shell
# Use dmctl to generate the encrypted password string; each run produces a different ciphertext
shell> tiup dmctl encrypt 'admin'
dmXgktpuF18RP1mDN/B2UkA6pDN9PQ==

# The MySQL data source configuration file:
shell> cat mysql_source_14_4490.yaml
source-id: "mysql_source_14_4490"
from:
  host: "10.186.65.14"
  port: 4490
  user: "sgy"
  password: "dmXgktpuF18RP1mDN/B2UkA6pDN9PQ=="

# Create the data source
shell> tiup dmctl --master-addr 10.186.65.83:8261 operate-source create ./mysql_source_14_4490.yaml
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr 10.186.65.83:8261 operate-source create ./mysql_source_14_4490.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql_source_14_4490",
            "worker": "dm-10.186.65.118-8262"
        }
    ]
}

# Check the data source
shell> tiup dmctl --master-addr 10.186.65.83:8261 operate-source show
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql_source_14_4490",
            "worker": "dm-10.186.65.118-8262"
        }
    ]
}
```
(2) Configure the sync task

```shell
# The task configuration file:
shell> cat mysql_sync_to_dble.yaml
name: "mysql_sync_to_dble"            # sync task name
task-mode: "all"                      # full dump + binlog real-time sync
clean-dump-file: true
timezone: "Asia/Shanghai"
ignore-checking-items: ["auto_increment_ID"]
target-database:
  host: "10.186.65.4"                 # DBLE host IP
  port: 8066                          # DBLE traffic port
  user: "sz"                          # user for connecting to DBLE
  password: "KRfSNtdxe63tehpnCYoCz0ABdUGivg=="   # password for that user, generated with: tiup dmctl encrypt 'password'
mysql-instances:
- source-id: "mysql_source_14_4490"
  block-allow-list: "global"
  mydumper-config-name: "global"
  loader-config-name: "global"
  syncer-config-name: "global"
block-allow-list:
  global:
    do-dbs: ["sbtest"]                # migrate only the sbtest database
mydumpers:
  global:
    extra-args: "-B sbtest"           # dump only the sbtest database
loaders:
  global:
    pool-size: 8
syncers:
  global:
    worker-count: 8
# For a more detailed description of the options, see the official docs:
# https://docs.pingcap.com/zh/tidb-data-migration/v2.0/task-configuration-file-full#%E5%AE%8C%E6%95%B4%E9%85%8D%E7%BD%AE%E6%96%87%E4%BB%B6%E7%A4%BA%E4%BE%8B

# Check the configuration file
shell> tiup dmctl:v2.0.7 --master-addr=10.186.65.83:8261 check-task ./mysql_sync_to_dble.yaml
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr=10.186.65.83:8261 check-task ./mysql_sync_to_dble.yaml
{
    "result": true,
    "msg": "check pass!!!"
}

# Start the sync task
shell> tiup dmctl:v2.0.7 --master-addr=10.186.65.83:8261 start-task ./mysql_sync_to_dble.yaml
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr=10.186.65.83:8261 start-task ./mysql_sync_to_dble.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql_source_14_4490",
            "worker": "dm-10.186.65.118-8262"
        }
    ]
}

# Inspect the exported SQL files on the dm-worker node
shell> ls -lh /home/tidb/dm/deploy/dm-worker-8262/dumped_data.mysql_sync_to_dble
total 384M
-rw-r--r-- 1 tidb tidb  187 Aug 5 14:04 metadata
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest1.0000000000000.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest1.0000000010000.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest1.0000000020000.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest1.0000000030000.sql
-rw-r--r-- 1 tidb tidb  369 Aug 5 14:04 sbtest.sbtest1-schema.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest2.0000000000000.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest2.0000000010000.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest2.0000000020000.sql
-rw-r--r-- 1 tidb tidb  48M Aug 5 14:04 sbtest.sbtest2.0000000030000.sql
-rw-r--r-- 1 tidb tidb  369 Aug 5 14:04 sbtest.sbtest2-schema.sql
-rw-r--r-- 1 tidb tidb  152 Aug 5 14:04 sbtest-schema-create.sql

# The create-database SQL file
shell> head /home/tidb/dm/deploy/dm-worker-8262/dumped_data.mysql_sync_to_dble/sbtest-schema-create.sql
/*!40101 SET NAMES binary*/;
CREATE DATABASE `sbtest` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin */ /*!80016 DEFAULT ENCRYPTION='N' */;

# The create-table SQL file
shell> head /home/tidb/dm/deploy/dm-worker-8262/dumped_data.mysql_sync_to_dble/sbtest.sbtest1-schema.sql
/*!40101 SET NAMES binary*/;
CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `pad` char(60) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

# A batch-insert SQL file produced by the dump
shell> head /home/tidb/dm/deploy/dm-worker-8262/dumped_data.mysql_sync_to_dble/sbtest.sbtest1.0000000000000.sql
/*!40101 SET NAMES binary*/;
INSERT INTO `sbtest1` (`id`,`k`,`c`,`pad`) VALUES
(1,498670,'31451373586-15688153734-79729593694-96509299839-83724898275-86711833539-78981337422-35049690573-51724173961-87474696253','98996621624-36689827414-04092488557-09587706818-65008859162'),
(2,497778,'21472970079-70972780322-70018558993-71769650003-09270326047-32417012031-10768856803-14235120402-93989080412-18690312264','04776826683-45880822084-77922711547-29057964468-76514263618'),
(3,498956,'49376827441-24903985029-56844662308-79012577859-40518387141-60588419212-24399130405-42612257832-29494881732-71506024440','26843035807-96849339132-53943793991-69741192222-48634174017'),
(4,518727,'85762858421-36258200885-10758669419-44272723583-12529521893-95630803635-53907705724-07005352902-43001596772-53048338959','37979424284-37912826784-31868864947-42903702727-96097885121'),
(5,502480,'24805466175-85245528617-94635882649-46305216925-28637832581-03224489581-68883711727-95491561683-91969681472-12022277774','19288959552-55556468076-14192290426-55457672510-18043372364'),
(6,500774,'52892836230-54177743992-01821871718-48412537487-30066596248-87215430797-00375777469-64498831720-58542556455-90784765418','59487960480-08453890592-99628797439-16757639138-29377916560'),
(7,501466,'85820931248-14475640036-11980694501-86588543167-31029306229-09626867980-90685354565-02350460358-25863585366-53793794448','26081374730-86321700986-51212137094-30635959762-03880194434'),
(8,497838,'81578049255-33453976301-67096870761-27658738403-30546242249-53677469854-26594573136-34292002037-52736825353-99165193170','64289062455-51067794311-09919261228-11533354367-07401173317'),
```

(3) Check the sync task status

```shell
shell> tiup dmctl:v2.0.7 --master-addr=10.186.65.83:8261 query-status ./mysql_sync_to_dble.yaml
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.7/dmctl/dmctl --master-addr=10.186.65.83:8261 query-status ./mysql_sync_to_dble.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "mysql_source_14_4490",
                "worker": "dm-10.186.65.118-8262",
                "result": null,
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "mysql_sync_to_dble",
                    "stage": "Running",
                    "unit": "Sync",
                    "result": null,
                    "unresolvedDDLLockID": "",
                    "sync": {
                        "totalEvents": "425355",
                        "totalTps": "1203",
                        "recentTps": "2386",
                        "masterBinlog": "(mysql-bin.000027, 76114275)",
                        "masterBinlogGtid": "c7827165-bf89-11ec-92e6-02000aba410e:1-5091258",
                        "syncerBinlog": "(mysql-bin.000026, 229462770)",
                        "syncerBinlogGtid": "",
                        "blockingDDLs": [],
                        "unresolvedGroups": [],
                        "synced": false,
                        "binlogType": "remote",
                        "secondsBehindMaster": "38"
                    }
                }
            ]
        }
    ]
}
```

VII. Verifying the data

```shell
# Pause the sysbench run started earlier; once the DM sync shows no lag against the source, run count(*) on both sides

# On the source MySQL:
mysql> select count(*) from sbtest.sbtest1;
+----------+
| count(*) |
+----------+
|  1116471 |
+----------+
1 row in set (0.68 sec)

mysql> select count(*) from sbtest.sbtest2;
+----------+
| count(*) |
+----------+
|  1117020 |
+----------+
1 row in set (0.54 sec)

# Through DBLE:
dble> select count(*) from sbtest.sbtest1;
+----------+
| count(*) |
+----------+
|  1116471 |
+----------+
1 row in set (1.04 sec)

dble> select count(*) from sbtest.sbtest2;
+----------+
| count(*) |
+----------+
|  1117020 |
+----------+
1 row in set (1.58 sec)

# Update a row on the source
mysql> update sbtest.sbtest1 set c=uuid() where id=20;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from sbtest.sbtest1 where id=20;
+----+--------+--------------------------------------+-------------------------------------------------------------+
| id | k      | c                                    | pad                                                         |
+----+--------+--------------------------------------+-------------------------------------------------------------+
| 20 | 501448 | 24649f0f-14d1-11ed-b4f2-02000aba410e | 91052688950-96415657187-00012408429-12357288330-41295735957 |
+----+--------+--------------------------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

# Query the same row through DBLE
dble> select * from sbtest.sbtest1 where id=20;
+----+--------+--------------------------------------+-------------------------------------------------------------+
| id | k      | c                                    | pad                                                         |
+----+--------+--------------------------------------+-------------------------------------------------------------+
| 20 | 501448 | 24649f0f-14d1-11ed-b4f2-02000aba410e | 91052688950-96415657187-00012408429-12357288330-41295735957 |
+----+--------+--------------------------------------+-------------------------------------------------------------+
1 row in set (0.01 sec)
```
VIII. Notes

(1) TiDB DM version requirement: TiDB DM v2.0.7.

(2) Disable the DBLE slow query log during the full data import. The full import phase generates a large number of slow queries, so turn the slow query log off before running the sync task:

```shell
shell> mysql --prompt='dble>' -uroot -p -h 10.186.65.4 -P 9066

# Check the slow query log status: 1 = enabled, 0 = disabled
dble> show @@slow_query_log;
+------------------+
| @@slow_query_log |
+------------------+
| 1                |
+------------------+
1 row in set (0.00 sec)

# Before the full import, disable the slow query log
dble> disable @@slow_query_log;
Query OK, 1 row affected (0.01 sec)
disable slow_query_log success

# After the full import completes, re-enable it
dble> enable @@slow_query_log;
Query OK, 1 row affected (0.01 sec)
enable slow_query_log success
```

(3) Check whether a program like pt-kill is running against the source MySQL instance. If pt-kill or something similar is running on the source, the export threads may be killed during the full dump phase of the DM task, causing the full backup to fail. dm-worker.log then shows:

```
[ERROR] [subtask.go:311] ["unit process error"] [subtask=oms-sync-receiver] [unit=Dump] ["error information"="{\"ErrCode\":32001,\"ErrClass\":\"dump-unit\",\"ErrScope\":\"internal\",\"ErrLevel\":\"high\",\"Message\":\"mydumper/dumpling runs with error, with output (may empty): \",\"RawCause\":\"invalid connection\"}"]
```

Workaround: stop the pt-kill utility first, rerun the TiDB DM sync task, and re-enable pt-kill once the full dump has finished.

(4) If the target table schema must differ from the source (for example the source table is partitioned but the sharded target tables need no partitions, or the table character set is being changed, with utf8mb4 recommended), export the schema from the source, modify it, and import it into DBLE first (or import it and then alter it), and only then start the DM sync task.

(5) The column order of source and target tables must be identical, otherwise the data may end up inconsistent. When a pair of columns are both varchar but of unequal length, the sync task does not report an error, yet the value written to the target table may be silently truncated, as shown in the figure: ...
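Item (5) above is exactly the kind of silent drift that the row counts in section VII will not catch. As a hedged supplement (not part of the original workflow, and it assumes DBLE can push the aggregate functions down to the shards), a per-table CRC32 aggregate could be compared between the source and DBLE:

```shell
#!/bin/bash
# Hypothetical spot check: compare the row count plus a CRC32 aggregate of
# column c for each sbtest table on the source MySQL and through DBLE. The DBLE
# sharding user's plaintext password is not shown in the article, so -p prompts
# for it interactively.
for tbl in sbtest1 sbtest2; do
    q="SELECT COUNT(*), COALESCE(SUM(CRC32(c)), 0) FROM sbtest.${tbl}"
    src=$(mysql -h 10.186.65.14 -P 4490 -u sgy -padmin -N -e "$q")
    dst=$(mysql -h 10.186.65.4 -P 8066 -u sz -p -N -e "$q")
    if [ "$src" = "$dst" ]; then
        echo "${tbl}: OK (${src})"
    else
        echo "${tbl}: MISMATCH source=[${src}] dble=[${dst}]"
    fi
done
```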

August 16, 2022 · 6 min · jiezi

On data migration: DataWorks migration solution - migrating Azkaban jobs to DataWorks

Summary: DataWorks Migration Assistant provides a task-migration capability that can quickly move tasks from the open-source schedulers Oozie, Azkaban, and Airflow to DataWorks. This article focuses on migrating jobs from the open-source Azkaban workflow scheduler to DataWorks.

Supported Azkaban versions

All Azkaban versions are supported.

Overall migration flow

The basic flow for migrating big-data development tasks from an open-source workflow scheduler into the DataWorks ecosystem is shown in the figure in the original post. For each open-source scheduler, Migration Assistant provides a corresponding export scheme. The overall flow is: use Migration Assistant's scheduler-export capability to export the jobs from the open-source scheduler; upload the export package to Migration Assistant; and, after mapping the task types, import the mapped jobs into DataWorks. At import time, tasks can be converted to MaxCompute-type, EMR-type, or CDH-type tasks.

Exporting Azkaban jobs

Azkaban itself can export workflows through its own web console (screenshot in the original post). Steps:

1. Open the Project page.
2. Click Flows to list all workflows (Flows) under the project.
3. Click Download to get the project's export file.

The export package format is native Azkaban: the zip contains all jobs (Jobs) of one Azkaban project together with their dependency relations (a hedged sketch of what such jobs look like appears at the end of this article).

Importing Azkaban jobs

With the export package in hand, go to Migration Assistant -> Tasks to Cloud (任务上云) -> Scheduler Job Import (调度引擎作业导入), upload the package, and let it be analyzed. After the analysis succeeds, click confirm to enter the import settings page, which shows the scheduling tasks found by the analysis.

Import settings for open-source schedulers

Click Advanced Settings to define how Azkaban tasks convert to DataWorks tasks. The advanced settings screens are essentially the same for all the open-source schedulers. The settings are:

- spark-submit conversion: the import analyzes whether a task is a spark-submit task and, if so, converts it to the corresponding DataWorks task type, e.g. ODPS_SPARK, EMR_SPARK, CDH_SPARK.
- Command-line SQL task conversion: open-source engines run many tasks as command-line SQL, e.g. hive -e, beeline -e, impala-shell; Migration Assistant converts these according to the target type you choose, e.g. ODPS_SQL, EMR_HIVE, EMR_IMPALA, EMR_PRESTO, CDH_HIVE, CDH_PRESTO, CDH_IMPALA.
- Target compute engine type: this mainly affects the write configuration of the target end of Sqoop synchronization. Sqoop commands are converted into Data Integration tasks by default, and the compute engine type decides which engine's project the Data Integration task writes to.
- Shell type conversion: SHELL nodes in DataWorks come in several kinds depending on the compute engine, e.g. EMR_SHELL, CDH_SHELL, and DataWorks' own Shell node.
- Unknown task conversion: tasks Migration Assistant cannot yet handle are mapped to one default type; you can choose SHELL or the virtual node VIRTUAL.
- SQL node conversion: SQL node types on DataWorks also vary with the bound compute engine, e.g. EMR_HIVE, EMR_IMPALA, EMR_PRESTO, CDH_HIVE, CDH_IMPALA, CDH_PRESTO, ODPS_SQL, EMR_SPARK_SQL, CDH_SPARK_SQL; you choose which type to convert to.

Note: these import mapping values change dynamically with the compute engine bound to the current workspace. The original post lists the mapping tables for importing into DataWorks + MaxCompute, DataWorks + EMR, and DataWorks + CDH.

Executing the import

Once the mappings are set, click Start Import. When the import finishes, check the results in DataStudio (数据开发).

Data migration

For migrating the data on big-data clusters, see DataWorks Data Integration or MMA. Original link. This article is original Alibaba Cloud content and may not be reproduced without permission.
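As referenced above, here is a hedged sketch of the .job files inside an Azkaban project export zip; the job names and hive commands are hypothetical, while `type`, `command`, and `dependencies` are standard Azkaban job properties:

```properties
# load_raw.job -- a leaf job with no dependencies
type=command
command=hive -e "LOAD DATA INPATH '/staging/orders' INTO TABLE ods_orders"

# build_report.job -- runs only after load_raw succeeds
type=command
command=hive -e "INSERT OVERWRITE TABLE rpt_orders SELECT * FROM ods_orders"
dependencies=load_raw
```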

April 25, 2021 · 1 min · jiezi