关于数据校验:技术分享-使用-syncdiffinspector-对两个-MySQL-进行数据校验

29次阅读

共计 12339 个字符,预计需要花费 31 分钟才能阅读完成。

作者:沈光宇

爱可生南区 DBA 团队成员,次要负责 MySQL 故障解决和性能优化。对技术执着,为客户负责。

本文起源:原创投稿

* 爱可生开源社区出品,原创内容未经受权不得随便应用,转载请分割小编并注明起源。


一、sync-diff-inspector 简介

sync-diff-inspector 是由 PingCAP 开源的数据校验工具,用于校验 MySQL/TiDB 中两份数据是否统一。

次要性能如下:

  • 比照表构造和数据
  • 如果数据不统一,则生成用于修复数据的 SQL 语句
  • 反对不同库名或表名的数据校验
  • 反对分库分表场景下的数据校验
  • 反对 TiDB 主从集群的数据校验
  • 反对从 TiDB DM 拉取配置的数据校验

sync-diff-inspector 的应用限度

  • 对于 MySQL 和 TiDB 之间的数据同步不反对在线校验,须要保障上下游校验的表中没有数据写入,或者保障某个范畴内的数据不再变更,通过配置 range 来校验这个范畴内的数据。
  • FLOAT、DOUBLE 等浮点数类型在 TiDB 和 MySQL 中的实现形式不同,在计算 checksum 时会别离取 6 位和 15 位有效数字。如果不应用该个性,须要设置 ignore-columns 疏忽这些列的查看。
  • 反对对不蕴含主键或者惟一索引的表进行校验,然而如果数据不统一,生成的用于修复的 SQL 可能无奈正确修复数据。

本文将介绍应用 sync-diff-inspector 工具对两个 MySQL 实例中的数据进行校验,两个 MySQL 实例之间应用 DTS 工具来同步数据。

二、sync-diff-inspector 工具下载安装

#sync-diff-inspector 已集成在 TiDB 工具包中,间接下载 TiDB 工具包即可
shell> wget https://download.pingcap.org/tidb-community-toolkit-v6.4.0-linux-amd64.tar.gz
shell> tar zxvf tidb-community-toolkit-v6.4.0-linux-amd64.tar.gz
shell> ls -lh tidb-community-toolkit-v6.4.0-linux-amd64 | grep sync_diff_inspector
-rwxr-xr-x 1 tidb tidb  98M Nov 17 11:41 sync_diff_inspector
shell> ./sync_diff_inspector  -V
App Name: sync_diff_inspector v2.0
Release Version: v6.4.0
Git Commit Hash: f7e65073b35538def61ae094cd4a8e57e705344b
Git Branch: heads/refs/tags/v6.4.0
UTC Build Time: 2022-11-04 07:21:08
Go Version: go1.19.2

三、sync-diff-inspector 工具应用示例

1. 配置文件通用局部

shell> cat config.toml
check-thread-count = 4             # 检查数据的线程数量
export-fix-sql = true              # 如果开启,若表数据存在不统一,则输入用于修复的 SQL 语句
check-struct-only = false          # 只比照表构造而不比照数据
[data-sources]                    
[data-sources.mysql1]              # 上游 MySQL 数据库配置(源端)host = "10.186.65.57"          
    port = 3306
    user = "sgy"
    password = "admin"
    route-rules = ["rule1"]        # 映射匹配规定, 通过配置相应的规定能够对单个、多个 schema 或 table 进行校验
                                   # 如有多个 rule 时,可配置成 ["rule1", "rule2"]
[data-sources.mysql2]              # 上游 MySQL 数据库配置(指标端)host = "10.186.65.89"
    port = 3309
    user = "sgy"
    password = "admin"

2. 基于 schema 的数据校验

  • 对单个 schema 进行数据校验
# 映射匹配规定局部, 须要将此局部放到置配置文件通用局部的前面
[routes]                               # 映射关系, 如上下游 schema 不同名可在此配置
[routes.rule1]
    schema-pattern = "sbtest"          # 匹配上游数据库的的库名
    target-schema = "sbtest"           # 匹配上游数据库的库名

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]   # 上游数据库,内容是 data-sources 申明的惟一标识 id,分库分表场景下反对多个上游数据库,如:["mysql10", "mysql20"]
    target-instance = "mysql2"      # 上游数据库,内容是 data-sources 申明的惟一标识 id
    target-check-tables = ["sbtest.*"]    # 须要比对的上游数据库的表
    
 # 进行数据校验
shell> ./sync_diff_inspector  --config=./config.toml
A total of 8 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest`.`sbtest4`` ... equivalent     #表构造统一
Comparing the table structure of ``sbtest`.`sbtest5`` ... equivalent
.......................................................
Comparing the table data of ``sbtest`.`sbtest2`` ... equivalent          #表中数据统一
Comparing the table data of ``sbtest`.`sbtest8`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 8 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
  • 对多个 schema 进行数据校验
# 因为对多个 schema 进行数据校验,routes 蕴含了 rule1、rule2, 配置文件通用局部须要做以下批改
[data-sources.mysql1] 
    route-rules = ["rule1","rule2"]
    
#映射匹配规定局部, 须要将此局部放到配置文件通用局部的前面
[routes]
[routes.rule1]
    schema-pattern = "sbtest*"       #应用正则匹配 sbtest 结尾的 schema,如 sbtest,sbtest1,sbtest2,sbtest3
    target-schema = "sbtest*"

[routes.rule2]
    schema-pattern = "sgy"           #匹配 schema:sgy
    target-schema = "sgy"

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]     # 上游数据库,内容是 data-sources 申明的惟一标识 id,分库分表场景下反对多个上游数据库,如:["mysql10", "mysql20"]
    target-instance = "mysql2"        # 上游数据库,内容是 data-sources 申明的惟一标识 id
    target-check-tables = ["sbtest*.*","sgy.*"]   #对源、指标实例中的 sgy 及以 sbtest 结尾的 schema 所有表进行校验

#进行数据校验
shell> ./sync_diff_inspector --config=./config.toml
A total of 24 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sgy`.`sbtest2`` ... equivalent
Comparing the table data of ``sgy`.`sbtest2`` ... equivalent
.......................................................
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest3`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest2`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest1`.`sbtest2`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 24 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'

3. 基于 table 的数据校验

  • 对单个 table 进行数据校验
# 映射匹配规定局部, 须要将此局部搁置到配置文件通用局部的前面
[routes]
[routes.rule1]
    schema-pattern = "sbtest"
    target-schema = "sbtest"

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]      # 上游数据库,内容是 data-sources 申明的惟一标识 id,分库分表场景下反对多个上游数据库,如:["mysql10", "mysql20"] 
    target-instance = "mysql2"         # 上游数据库,内容是 data-sources 申明的惟一标识 id
    target-check-tables = ["sbtest.sbtest1"]      #只校验表:sbtest.sbtest1
 
#进行数据校验 
shell> ./sync_diff_inspector --config=./config.toml
A total of 1 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 1 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
  • 对多个 table 进行数据校验
# 对 schema 进行数据校验也是对多个 table 进行数据校验的一种,这里以指定多个具体表名为例
#因为对多个 table 进行数据校验,routes 蕴含了 rule1、rule2, 配置文件通用局部须要做以下批改
[data-sources.mysql1] 
    route-rules = ["rule1","rule2"]
    
#映射匹配规定局部, 须要将此局部搁置到配置文件通用局部的前面
[routes]
[routes.rule1]
    schema-pattern = "sbtest*"
    target-schema = "sbtest*"

[routes.rule2]
    schema-pattern = "sgy"
    target-schema = "sgy"

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]       # 上游数据库,内容是 data-sources 申明的惟一标识 id,分库分表场景下反对多个上游数据库,如:["mysql10", "mysql20"]
    target-instance = "mysql2"          # 上游数据库,内容是 data-sources 申明的惟一标识 id
    #只校验 sbtest.sbtest8,sgy.sbtest4,sbtest1.sbtest1,sbtest2.sbtest2 这四个表
    target-check-tables = ["sbtest.sbtest8","sgy.sbtest4","sbtest1.sbtest1","sbtest2.sbtest2"]
    
#进行数据校验 
shell> ./sync_diff_inspector --config=./config.toml
A total of 4 tables need to be compared

Progress [>------------------------------------------------------------] 0% 0/0
Comparing the table structure of ``sbtest2`.`sbtest2`` ... equivalent
Comparing the table data of ``sbtest2`.`sbtest2`` ... equivalent
Comparing the table structure of ``sgy`.`sbtest4`` ... equivalent
Comparing the table structure of ``sbtest1`.`sbtest1`` ... equivalent
Comparing the table data of ``sgy`.`sbtest4`` ... equivalent
Comparing the table data of ``sbtest1`.`sbtest1`` ... equivalent
Comparing the table structure of ``sbtest`.`sbtest8`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest8`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 4 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'
  • 对单表进行范畴校验

#映射匹配规定局部, 须要将此局部搁置到配置文件通用局部的前面
[routes]
[routes.rule1]
    schema-pattern = "sbtest"
    target-schema = "sbtest"

[task]
    output-dir = "./output"
    source-instances = ["mysql1"]      # 上游数据库,内容是 data-sources 申明的惟一标识 id,分库分表场景下反对多个上游数据库,如:["mysql10", "mysql20"]
    target-instance = "mysql2"         # 上游数据库,内容是 data-sources 申明的惟一标识 id
    target-check-tables = ["sbtest.sbtest1"]      # 指定校验指标实例上的 sbtest.sbtest1 表
    target-configs = ["config1"]                  # 对局部表的额定配置

[table-configs.config1] 
target-tables = ["sbtest.sbtest1"]                # 指定校验指标实例上的 sbtest.sbtest1 表
range = "id > 10 AND id < 100"                    # 指定校验指标表的具体范畴,相当于 SQL 中的 where 条件
#数据校验见数据修复性能局部 

4. 数据修复性能

# 为了演示数据修复,在指标端表中删除一些数据
shell> mysql -usgy -padmin -h 10.186.65.89 -P 3309 -e "delete from sbtest.sbtest1 where id in (11,22,33,44,55,66,77,88,99);"

# 应用对单表进行范畴校验配置,对 sbtest.sbtest1 表进行范畴校验
shell> ./sync_diff_inspector  --config=./config.toml
A total of 1 tables need to be compared

Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent     #sbtest.sbtest1 表构造统一
Comparing the table data of ``sbtest`.`sbtest1`` ... failure             #sbtest.sbtest1 表数据有差别
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
The data of `sbtest`.`sbtest1` is not equal

The rest of tables are all equal.
The patch file has been generated in 
        'output/fix-on-mysql2/'                 # 生成修复 SQL 输入到此目录
You can view the comparision details through './output/sync_diff.log'

# 查看输入修复 SQL
shell> cat output/fix-on-mysql2/sbtest\:sbtest1\:0\:0-0\:0.sql 
-- table: sbtest.sbtest1
-- range in sequence: Full
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (11,50148,'69183965773-14680923687-92934799461-07606242492-78811530738-23241332728-92911647895-70477201282-85254929997-06214236905','33737501839-63208420999-35708593012-95906952636-68691055996');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (22,41538,'14140185946-16271766410-68340573738-46226480462-08989140676-29936780681-56784925909-45742390296-67137862436-18242076592','25112986220-19824650341-42825248958-70186905082-33867163574');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (33,50105,'48402160130-78797253227-05588677001-93556313541-39295466587-91364622063-58862572731-27837539373-64526858273-89372384747','72073637794-12055602042-16862397531-87496431032-85451396141');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (44,49917,'98096713237-15265478716-72025332919-62964308854-01270604715-12000922788-50929365082-43513513022-28543412388-57790852446','33907865533-62267179125-36062850111-84091551774-69847376840');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (55,50259,'68826640374-18002055907-53999869701-72145793168-90893177888-85273641163-24331745145-62755454379-79511152711-99618812770','02724012569-91405199011-30257626349-21678066897-42535351703');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (66,41995,'58396552954-26907336026-99506693837-77815822050-42927030403-40927779227-58101279219-11438233008-00344004393-35806649113','02348992414-65327666387-20632806790-74456238429-90933031209');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (77,49757,'73705207329-00308504929-05904865650-29498186065-09990420614-84131302024-40320022420-77358683577-34731688411-70665402097','92567035674-84728177369-79087155038-84461952379-45481760225');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (88,50497,'77112982215-98755241853-36424062009-45217742824-83650985380-60232607362-47569976121-30091332050-97996374956-97911403909','88977046077-43519705750-51246090615-77629911610-94055348738');
REPLACE INTO `sbtest`.`sbtest1`(`id`,`k`,`c`,`pad`) VALUES (99,49977,'63255973711-62890656114-72914458941-22277906368-32619356110-31219579310-16762665782-69578495131-76043317830-28240408380','08589883275-03392784968-00244590156-39735355951-95769933801');

#将生成的修复 SQL 导入指标库
shell> mysql -usgy -padmin -h 10.186.65.89 -P 3309 < output/fix-on-mysql2/sbtest\:sbtest1\:0\:0-0\:0.sql 
mysql: [Warning] Using a password on the command line interface can be insecure.

#再次测验, 发现数据曾经统一
shell> ./sync_diff_inspector  --config=./config.toml
A total of 1 tables need to be compared

Comparing the table structure of ``sbtest`.`sbtest1`` ... equivalent
Comparing the table data of ``sbtest`.`sbtest1`` ... equivalent
_____________________________________________________________________________
Progress [============================================================>] 100% 0/0
A total of 1 table have been compared and all are equal.
You can view the comparision details through './output/sync_diff.log'

四、相干问题

  • 批改配置文件后须要手动删除 outputDir 目录
shell> vim config.toml
shell> ./sync_diff_inspector --config=./config.toml     #将 output-dir 删除即可解决
Fail to initialize config.
failed to init Task: config changes breaking the checkpoint, please use another outputDir and start over again!
  • 表倡议应用 utf8mb4 字符集,不反对 MySQL8.0 的 utf8mb3 字符集
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.30    |
+-----------+
1 row in set (0.00 sec)

mysql> alter table sgy.sbtest4 CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> show create table sgy.sbtest4 \G
*************************** 1. row ***************************
       Table: sbtest4
Create Table: CREATE TABLE `sbtest4` (
  `id` int NOT NULL AUTO_INCREMENT,
  `k` int NOT NULL DEFAULT '0',
  `c` char(120) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  `pad` char(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_4` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb3     
1 row in set (0.00 sec)   

#执行数据校验时报错
shell> ./sync_diff_inspector --config=./config.toml
There is something error when initialize diff, please check log info in output/sync_diff.log

#查看日志文件
shell> cat output/sync_diff.log  |grep utf8mb3
[2023/02/19 11:13:04.980 +08:00] [FATAL] [main.go:120] ["failed to initialize diff process"] [error="get table sgy.sbtest4's information error [parser:1115]Unknown character set: 'utf8mb3'\ngithub.com/pingcap/errors.

更具体的应用阐明,请参考 sync-diff-inspector 官网文档:https://docs.pingcap.com/zh/tidb/stable/sync-diff-inspector-o…

正文完
 0