共计 11541 个字符,预计需要花费 29 分钟才能阅读完成。
1、问题形容
前两天在群里看到共事反馈一个空格问题,大抵景象如下:
mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.25 |
+-----------+
1 row in set (0.00 sec)
mysql> create table t1(
-> c1 int,
-> c2 varchar(4) check(c2<>'') #单引号之间无空格
-> )engine=innodb;
Query OK, 0 rows affected (0.21 sec)
mysql> insert into t1 select 1,' '; #c2 字段插入两个空格
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.
check 定义 c2<>”,往 c2 字段插入空格,提醒违反 check 束缚。
为什么 insert 语句中的 ’ ‘(单引号之间有一个或多个空格) 会被判断为 ”(单引号之间无空格),导致插入失败?
2、波及常识
2.1、Stored and Retrieved
https://dev.mysql.com/doc/refman/8.0/en/char.html
When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.
VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.
CHAR(N):当插入的字符数小于 N,它会在字符串的左边补充空格,直到总字符数达到 N 再进行存储;当查问返回数据时默认会将字符串尾部的空格去掉,除非 SQL_MODE 设置 PAD_CHAR_TO_FULL_LENGTH(手册显示 8.0.13 deprecated,8.0.25 还能应用)。
VARCHAR(N):当插入的字符数小于 N,它不会在字符串的左边补充空格,insert 内容一成不变的进行存储;如果本来字符串左边有空格,在存储和查问返回时都会保留空格。
2.2、Collation Pad Attribute
https://dev.mysql.com/doc/refman/8.0/en/char.html
Values in CHAR, VARCHAR, and TEXT columns are sorted and compared according to the character set collation assigned to the column.
MySQL collations have a pad attribute of PAD SPACE, other than Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.
对于 CHAR、VARCHAR、TEXT 字段,排序和比拟运算依赖字段上的 Collation,Collation 的 Pad 属性管制字符串尾部空格解决形式。
能够通过 INFORMATION_SCHEMA.COLLATIONS 表,查看 Collation 所应用的 Pad 属性:
mysql> select collation_name,pad_attribute from information_schema.collations;
+----------------------------+---------------+
| collation_name | pad_attribute |
+----------------------------+---------------+
| armscii8_general_ci | PAD SPACE |
...
| utf8mb4_0900_bin | NO PAD |
+----------------------------+---------------+
272 rows in set (0.01 sec)
2.3、Trailing Space Handling in Comparisons
https://dev.mysql.com/doc/refman/8.0/en/charset-binary-collations.html#charset-binary-collations-trailing-space-comparisons
For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:
• For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.
• NO PAD collations treat trailing spaces as significant in comparisons, like any other character.
“Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant, regardless of collation.
PAD SPACE:在排序和比拟运算中,疏忽字符串尾部空格。
NO PAD:在排序和比拟运算中,字符串尾部空格当成一般字符,不能疏忽。
3、问题解决
以下操作基于 MySQL 8.0.25 社区版
3.1、查看字段应用的 Collation
mysql> show full fields in t1;
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| c1 | int | NULL | YES | | NULL | | select,insert,update,references | |
| c2 | varchar(4) | utf8mb4_unicode_ci | YES | | NULL | | select,insert,update,references | |
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)
c2 列的 Collation 是 utf8mb4_unicode_ci。
3.2、查看 Collation 的 Pad 属性
mysql> select COLLATION_NAME,PAD_ATTRIBUTE from INFORMATION_SCHEMA.COLLATIONS where COLLATION_NAME in('utf8mb4_unicode_ci','utf8mb4_0900_ai_ci');
+--------------------+---------------+
| COLLATION_NAME | PAD_ATTRIBUTE |
+--------------------+---------------+
| utf8mb4_0900_ai_ci | NO PAD |
| utf8mb4_unicode_ci | PAD SPACE |
+--------------------+---------------+
1 row in set (0.00 sec)
utf8mb4_unicode_ci 的 Pad 属性是 PAD SPACE,由 2.3 可知 c2 列在排序和比拟运算中,疏忽字符串尾部空格。
因而 check 比拟时,会将插入的 ’ ‘ 中的空格疏忽,显然疏忽空格后和 check 束缚存在抵触,插入失败。
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> select '' ='';
+--------+
| ''='' |
+--------+
| 1 |
+--------+
1 row in set (0.00 sec)
3.3、如何让 check 束缚按惯例逻辑失效
这里的惯例是指空格就是空格,不应该把空格疏忽。只需将 c2 字段批改为 NO PAD 的 Collation 后,就能将空格失常插入:
mysql> insert into t1 select 1,' '; #c2 字段插入两个空格
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.
mysql> alter table t1 modify c2 varchar(4) collate utf8mb4_0900_ai_ci; #批改为 NO PAD 的 Collation
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> insert into t1 select 1,' '; #c2 字段插入两个空格
Query OK, 1 row affected (0.12 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into t1 select 1,''; #'' 之间无空格
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.
mysql> select c1,c2,hex(c2) from t1;
+------+------+---------+
| c1 | c2 | hex(c2) |
+------+------+---------+
| 1 | | 2020 |
+------+------+---------+
1 row in set (0.01 sec)
4、扩大
4.1、如果 c2 列是 CHAR 类型,和后面的问题体现一样吗
一样。CHAR、VARCHAR、TEXT 在做排序和比拟运算时,都是根据列的 Collation 的 Pad 属性解决字符串尾部的空格。此时拿来做比拟运算的字符串是 insert 中的内容。
4.2、WHERE 条件中表现形式是怎么的
创立一张新表并插入数据
mysql> create table t3(
-> c1 int,
-> c2 char(4) collate utf8mb4_unicode_ci,
-> c3 char(4) collate utf8mb4_0900_ai_ci,
-> c4 varchar(4) collate utf8mb4_unicode_ci,
-> c5 varchar(4) collate utf8mb4_0900_ai_ci
-> )engine=innodb;
Query OK, 0 rows affected (0.29 sec)
mysql> insert into t3 select 1,'a','a','a','a';
Query OK, 1 row affected (0.09 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into t3 select 2,'a','a','a','a'; #各列蕴含 1 个空格
Query OK, 1 row affected (0.20 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into t3 select 3,'a','a','a','a'; #前两列 3 个空格,后两列 2 个空格
Query OK, 1 row affected (0.17 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> insert into t3 select 4,'a','a','a','a'; #前两列 2 个空格,后两列 3 个空格
Query OK, 1 row affected (0.14 sec)
Records: 1 Duplicates: 0 Warnings: 0
察看 WHERE 条件返回后果,CHAR 类型的返回受 PAD_CHAR_TO_FULL_LENGTH 影响 (参考 2.1)
mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+---------+---------+----------+----------+
| 1 | a | a | a | a | 61 | 61 | 61 | 61 |
| 2 | a | a | a | a | 61 | 61 | 6120 | 6120 |
| 3 | a | a | a | a | 61 | 61 | 612020 | 612020 |
| 4 | a | a | a | a | 61 | 61 | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.00 sec)
c2 char-> 返回数据去掉字符串尾部的空格
c2 utf8mb4_unicode_ci->PAD SPACE-> 排序和比拟运算,疏忽字符串尾部空格
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+---------+---------+----------+----------+
| 1 | a | a | a | a | 61 | 61 | 61 | 61 |
| 2 | a | a | a | a | 61 | 61 | 6120 | 6120 |
| 3 | a | a | a | a | 61 | 61 | 612020 | 612020 |
| 4 | a | a | a | a | 61 | 61 | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.01 sec)
c3 char-> 返回数据去掉字符串尾部的空格
c3 utf8mb4_0900_ai_ci->NO PAD-> 排序和比拟运算,字符串尾部空格当成一般字符
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+---------+---------+----------+----------+
| 1 | a | a | a | a | 61 | 61 | 61 | 61 |
| 2 | a | a | a | a | 61 | 61 | 6120 | 6120 |
| 3 | a | a | a | a | 61 | 61 | 612020 | 612020 |
| 4 | a | a | a | a | 61 | 61 | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.00 sec)
c4 varchar-> 返回数据保留插入时的空格
c4 utf8mb4_unicode_ci->PAD SPACE-> 排序和比拟运算,疏忽字符串尾部空格
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';
+------+------+------+------+------+---------+---------+---------+---------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+---------+---------+---------+---------+
| 1 | a | a | a | a | 61 | 61 | 61 | 61 |
+------+------+------+------+------+---------+---------+---------+---------+
1 row in set (0.00 sec)
c5 varchar-> 返回数据保留插入时的空格
c5 utf8mb4_0900_ai_ci->NO PAD-> 排序和比拟运算,字符串尾部空格当成一般字符
mysql> set sql_mode='PAD_CHAR_TO_FULL_LENGTH';
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';
+------+------+------+------+------+----------+----------+----------+----------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+----------+----------+----------+----------+
| 1 | a | a | a | a | 61202020 | 61202020 | 61 | 61 |
| 2 | a | a | a | a | 61202020 | 61202020 | 6120 | 6120 |
| 3 | a | a | a | a | 61202020 | 61202020 | 612020 | 612020 |
| 4 | a | a | a | a | 61202020 | 61202020 | 61202020 | 61202020 |
+------+------+------+------+------+----------+----------+----------+----------+
4 rows in set (0.00 sec)
c2 char->PAD_CHAR_TO_FULL_LENGTH-> 返回数据字符串左边补充空格
c2 utf8mb4_unicode_ci->PAD SPACE-> 排序和比拟运算,疏忽字符串尾部空格
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';
Empty set (0.00 sec)
c3 char->PAD_CHAR_TO_FULL_LENGTH-> 返回数据字符串左边补充空格
c3 utf8mb4_0900_ai_ci->NO PAD-> 排序和比拟运算,字符串尾部空格当成一般字符
1~4 行 c3 列返回值都蕴含空格,且 c3 列的 Collation 是 NO PAD,字符串尾部空格不能疏忽,where 过滤找不到记录
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';
+------+------+------+------+------+----------+----------+----------+----------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+----------+----------+----------+----------+
| 1 | a | a | a | a | 61202020 | 61202020 | 61 | 61 |
| 2 | a | a | a | a | 61202020 | 61202020 | 6120 | 6120 |
| 3 | a | a | a | a | 61202020 | 61202020 | 612020 | 612020 |
| 4 | a | a | a | a | 61202020 | 61202020 | 61202020 | 61202020 |
+------+------+------+------+------+----------+----------+----------+----------+
4 rows in set (0.00 sec)
c4 varchar-> 返回数据保留插入时的空格
c4 utf8mb4_unicode_ci->PAD SPACE-> 排序和比拟运算,疏忽字符串尾部空格
mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';
+------+------+------+------+------+----------+----------+---------+---------+
| c1 | c2 | c3 | c4 | c5 | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+----------+----------+---------+---------+
| 1 | a | a | a | a | 61202020 | 61202020 | 61 | 61 |
+------+------+------+------+------+----------+----------+---------+---------+
1 row in set (0.00 sec)
c5 varchar-> 返回数据保留插入时的空格
c5 utf8mb4_0900_ai_ci->NO PAD-> 排序和比拟运算,字符串尾部空格当成一般字符
此时拿来做比拟运算的字符串是 Retrieved 的内容,CHAR 和 VARCHAR 返回数据时对字符串尾部的空格解决形式不同,并且 PAD_CHAR_TO_FULL_LENGTH 只影响 CHAR 类型。
4.3、对惟一索引的影响
https://dev.mysql.com/doc/refman/8.0/en/char.html
For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains ‘a’, an attempt to store ‘a ‘ causes a duplicate-key error.
如果存在惟一索引 (单列、字符类型),插入的数据仅在尾部空格个数不同,有可能会报 duplicate-key 谬误:
mysql> select c1,c4,c5,hex(c4),hex(c5) from t3;
+------+------+------+----------+----------+
| c1 | c4 | c5 | hex(c4) | hex(c5) |
+------+------+------+----------+----------+
| 1 | a | a | 61 | 61 |
| 2 | a | a | 6120 | 6120 |
| 3 | a | a | 612020 | 612020 |
| 4 | a | a | 61202020 | 61202020 |
+------+------+------+----------+----------+
4 rows in set (0.00 sec)
mysql> alter table t3 add unique(c4);
ERROR 1062 (23000): Duplicate entry 'a' for key 't3.c4'
mysql> alter table t3 add unique(c5);
Query OK, 0 rows affected (0.44 sec)
Records: 0 Duplicates: 0 Warnings: 0
能够看到 c4 列创立惟一索引失败,c5 列创立惟一索引胜利。
c4 utf8mb4_unicode_ci->PAD SPACE-> 排序和比拟运算,疏忽字符串尾部空格,4 行数据反复。
c5 utf8mb4_0900_ai_ci->NO PAD-> 排序和比拟运算,字符串尾部空格当成一般字符,4 行数据不同。
5、总结
Stored
– | CHAR(N) | VARCHAR(N) |
---|---|---|
Stored | 字符有余N左边补空格 | 保留插入时的空格,不会在左边额定补充空格 |
Retrieved
SQL_MODE | CHAR(N) | VARCHAR(N) |
---|---|---|
Default Value | 去掉字符串尾部的空格 | 保留插入时的空格 |
PAD_CHAR_TO_FULL_LENGTH | 返回残缺字符串,有余N左边补空格 | 保留插入时的空格 |
Comparison(不包含 like)
Pad Attribute | CHAR(N)/VARCHAR(N) |
---|---|
PAD SPACE | 疏忽字符串尾部空格 |
NO PAD | 字符串尾部空格当成一般字符,不能疏忽 |
Enjoy GreatSQL :)
文章举荐:
技术分享 | MGR 最佳实际 (MGR Best Practice)
https://mp.weixin.qq.com/s/66…
技术分享 | 万里数据库 MGR Bug 修复之路
https://mp.weixin.qq.com/s/Ia…
Macos 零碎编译 percona 及局部函数在 Macos 零碎上运算差别
https://mp.weixin.qq.com/s/jA…
技术分享 | 利用 systemd 治理 MySQL 单机多实例
https://mp.weixin.qq.com/s/iJ…
产品 | GreatSQL,打造更好的 MGR 生态
https://mp.weixin.qq.com/s/By…
产品 | GreatSQL MGR 优化参考
https://mp.weixin.qq.com/s/5m…
对于 GreatSQL
GreatSQL 是由万里数据库保护的 MySQL 分支,专一于晋升 MGR 可靠性及性能,反对 InnoDB 并行查问个性,是实用于金融级利用的 MySQL 分支版本。
Gitee:
https://gitee.com/GreatSQL/Gr…
GitHub:
https://github.com/GreatSQL/G…
微信 &QQ 群:
可扫码增加 GreatSQL 社区助手微信好友,发送验证信息“加群”退出 GreatSQL/MGR 交换微信群,亦可间接扫码退出 GreatSQL/MGR 交换 QQ 群。
本文由博客一文多发平台 OpenWrite 公布!