1、问题形容

前两天在群里看到共事反馈一个空格问题,大抵景象如下:

mysql> select @@version;+-----------+| @@version |+-----------+| 8.0.25    |+-----------+1 row in set (0.00 sec)mysql> create table t1(    -> c1 int,    -> c2 varchar(4) check(c2<>'')  #单引号之间无空格    -> )engine=innodb;Query OK, 0 rows affected (0.21 sec)mysql> insert into t1 select 1,'  ';  #c2字段插入两个空格ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.

check定义c2<>'',往c2字段插入空格,提醒违反check束缚。

为什么insert语句中的' '(单引号之间有一个或多个空格)会被判断为''(单引号之间无空格),导致插入失败?

2、波及常识

2.1、Stored and Retrieved

https://dev.mysql.com/doc/refman/8.0/en/char.html

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.

CHAR(N):当插入的字符数小于N,它会在字符串的左边补充空格,直到总字符数达到N再进行存储;当查问返回数据时默认会将字符串尾部的空格去掉,除非SQL_MODE设置PAD_CHAR_TO_FULL_LENGTH(手册显示8.0.13 deprecated,8.0.25还能应用)。

VARCHAR(N):当插入的字符数小于N,它不会在字符串的左边补充空格,insert内容一成不变的进行存储;如果本来字符串左边有空格,在存储和查问返回时都会保留空格。

2.2、Collation Pad Attribute

https://dev.mysql.com/doc/refman/8.0/en/char.html

Values in CHAR, VARCHAR, and TEXT columns are sorted and compared according to the character set collation assigned to the column.

MySQL collations have a pad attribute of PAD SPACE, other than Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.

对于CHAR、VARCHAR、TEXT字段,排序和比拟运算依赖字段上的Collation,Collation的Pad属性管制字符串尾部空格解决形式。
能够通过INFORMATION_SCHEMA.COLLATIONS表,查看Collation所应用的Pad属性:

mysql> select collation_name,pad_attribute from information_schema.collations;+----------------------------+---------------+| collation_name             | pad_attribute |+----------------------------+---------------+| armscii8_general_ci        | PAD SPACE     |...| utf8mb4_0900_bin           | NO PAD        |+----------------------------+---------------+272 rows in set (0.01 sec)

2.3、Trailing Space Handling in Comparisons

https://dev.mysql.com/doc/refman/8.0/en/charset-binary-collations.html#charset-binary-collations-trailing-space-comparisons

For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:

• For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.

• NO PAD collations treat trailing spaces as significant in comparisons, like any other character.

"Comparison" in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant, regardless of collation.

PAD SPACE:在排序和比拟运算中,疏忽字符串尾部空格。
NO PAD:在排序和比拟运算中,字符串尾部空格当成一般字符,不能疏忽。

3、问题解决

以下操作基于MySQL 8.0.25 社区版

3.1、查看字段应用的Collation

mysql> show full fields in t1;+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+| Field | Type       | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+| c1    | int        | NULL               | YES  |     | NULL    |       | select,insert,update,references |         || c2    | varchar(4) | utf8mb4_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+2 rows in set (0.00 sec)

c2列的Collation是utf8mb4_unicode_ci。

3.2、查看Collation的Pad属性

mysql> select COLLATION_NAME,PAD_ATTRIBUTE from INFORMATION_SCHEMA.COLLATIONS where COLLATION_NAME in('utf8mb4_unicode_ci','utf8mb4_0900_ai_ci');+--------------------+---------------+| COLLATION_NAME     | PAD_ATTRIBUTE |+--------------------+---------------+| utf8mb4_0900_ai_ci | NO PAD        || utf8mb4_unicode_ci | PAD SPACE     |+--------------------+---------------+1 row in set (0.00 sec)

utf8mb4_unicode_ci的Pad属性是PAD SPACE,由2.3可知c2列在排序和比拟运算中,疏忽字符串尾部空格。

因而check比拟时,会将插入的' '中的空格疏忽,显然疏忽空格后和check束缚存在抵触,插入失败。

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;Query OK, 0 rows affected (0.00 sec)mysql> select '  ' = '';+--------+| ' '='' |+--------+|      1 |+--------+1 row in set (0.00 sec)

3.3、如何让check束缚按惯例逻辑失效

这里的惯例是指空格就是空格,不应该把空格疏忽。只需将c2字段批改为NO PAD的Collation后,就能将空格失常插入:

mysql> insert into t1 select 1,'  ';  #c2字段插入两个空格ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated. mysql> alter table t1 modify c2 varchar(4) collate utf8mb4_0900_ai_ci;  #批改为NO PAD的CollationQuery OK, 0 rows affected (0.15 sec)Records: 0  Duplicates: 0  Warnings: 0mysql> insert into t1 select 1,'  ';  #c2字段插入两个空格Query OK, 1 row affected (0.12 sec)Records: 1  Duplicates: 0  Warnings: 0mysql> insert into t1 select 1,'';  #''之间无空格ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.mysql> select c1,c2,hex(c2) from t1;+------+------+---------+| c1   | c2   | hex(c2) |+------+------+---------+|    1 |      | 2020    |+------+------+---------+1 row in set (0.01 sec)

4、扩大

4.1、如果c2列是CHAR类型,和后面的问题体现一样吗

一样。CHAR、VARCHAR、TEXT在做排序和比拟运算时,都是根据列的Collation的Pad属性解决字符串尾部的空格。此时拿来做比拟运算的字符串是insert中的内容。

4.2、WHERE条件中表现形式是怎么的

创立一张新表并插入数据

mysql> create table t3(    -> c1 int,    -> c2 char(4) collate utf8mb4_unicode_ci,    -> c3 char(4) collate utf8mb4_0900_ai_ci,    -> c4 varchar(4) collate utf8mb4_unicode_ci,    -> c5 varchar(4) collate utf8mb4_0900_ai_ci    -> )engine=innodb;Query OK, 0 rows affected (0.29 sec)mysql> insert into t3 select 1,'a','a','a','a';Query OK, 1 row affected (0.09 sec)Records: 1  Duplicates: 0  Warnings: 0mysql> insert into t3 select 2,'a ','a ','a ','a ';  #各列蕴含1个空格Query OK, 1 row affected (0.20 sec)Records: 1  Duplicates: 0  Warnings: 0mysql> insert into t3 select 3,'a   ','a   ','a  ','a  ';  #前两列3个空格,后两列2个空格Query OK, 1 row affected (0.17 sec)Records: 1  Duplicates: 0  Warnings: 0mysql> insert into t3 select 4,'a  ','a  ','a   ','a   ';  #前两列2个空格,后两列3个空格Query OK, 1 row affected (0.14 sec)Records: 1  Duplicates: 0  Warnings: 0

察看WHERE条件返回后果,CHAR类型的返回受PAD_CHAR_TO_FULL_LENGTH影响(参考2.1)

mysql> set sql_mode='';Query OK, 0 rows affected (0.00 sec)mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';+------+------+------+------+------+---------+---------+----------+----------+| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |+------+------+------+------+------+---------+---------+----------+----------+|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       ||    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     ||    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   ||    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |+------+------+------+------+------+---------+---------+----------+----------+4 rows in set (0.00 sec)c2 char->返回数据去掉字符串尾部的空格c2 utf8mb4_unicode_ci->PAD SPACE->排序和比拟运算,疏忽字符串尾部空格mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';+------+------+------+------+------+---------+---------+----------+----------+| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |+------+------+------+------+------+---------+---------+----------+----------+|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       ||    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     ||    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   ||    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |+------+------+------+------+------+---------+---------+----------+----------+4 rows in set (0.01 sec)c3 char->返回数据去掉字符串尾部的空格c3 utf8mb4_0900_ai_ci->NO PAD->排序和比拟运算,字符串尾部空格当成一般字符mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';+------+------+------+------+------+---------+---------+----------+----------+| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |+------+------+------+------+------+---------+---------+----------+----------+|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       ||    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     ||    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   ||    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |+------+------+------+------+------+---------+---------+----------+----------+4 rows in set (0.00 sec)c4 varchar->返回数据保留插入时的空格c4 utf8mb4_unicode_ci->PAD SPACE->排序和比拟运算,疏忽字符串尾部空格mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';+------+------+------+------+------+---------+---------+---------+---------+| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4) | hex(c5) |+------+------+------+------+------+---------+---------+---------+---------+|    1 | a    | a    | a    | a    | 61      | 61      | 61      | 61      |+------+------+------+------+------+---------+---------+---------+---------+1 row in set (0.00 sec)c5 varchar->返回数据保留插入时的空格c5 utf8mb4_0900_ai_ci->NO PAD->排序和比拟运算,字符串尾部空格当成一般字符mysql> set sql_mode='PAD_CHAR_TO_FULL_LENGTH';Query OK, 0 rows affected, 1 warning (0.01 sec)mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';+------+------+------+------+------+----------+----------+----------+----------+| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4)  | hex(c5)  |+------+------+------+------+------+----------+----------+----------+----------+|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61       | 61       ||    2 | a    | a    | a    | a    | 61202020 | 61202020 | 6120     | 6120     ||    3 | a    | a    | a    | a    | 61202020 | 61202020 | 612020   | 612020   ||    4 | a    | a    | a    | a    | 61202020 | 61202020 | 61202020 | 61202020 |+------+------+------+------+------+----------+----------+----------+----------+4 rows in set (0.00 sec)c2 char->PAD_CHAR_TO_FULL_LENGTH->返回数据字符串左边补充空格c2 utf8mb4_unicode_ci->PAD SPACE->排序和比拟运算,疏忽字符串尾部空格mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';Empty set (0.00 sec)c3 char->PAD_CHAR_TO_FULL_LENGTH->返回数据字符串左边补充空格c3 utf8mb4_0900_ai_ci->NO PAD->排序和比拟运算,字符串尾部空格当成一般字符1~4行c3列返回值都蕴含空格,且c3列的Collation是NO PAD,字符串尾部空格不能疏忽,where过滤找不到记录mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';+------+------+------+------+------+----------+----------+----------+----------+| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4)  | hex(c5)  |+------+------+------+------+------+----------+----------+----------+----------+|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61       | 61       ||    2 | a    | a    | a    | a    | 61202020 | 61202020 | 6120     | 6120     ||    3 | a    | a    | a    | a    | 61202020 | 61202020 | 612020   | 612020   ||    4 | a    | a    | a    | a    | 61202020 | 61202020 | 61202020 | 61202020 |+------+------+------+------+------+----------+----------+----------+----------+4 rows in set (0.00 sec)c4 varchar->返回数据保留插入时的空格c4 utf8mb4_unicode_ci->PAD SPACE->排序和比拟运算,疏忽字符串尾部空格mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';+------+------+------+------+------+----------+----------+---------+---------+| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4) | hex(c5) |+------+------+------+------+------+----------+----------+---------+---------+|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61      | 61      |+------+------+------+------+------+----------+----------+---------+---------+1 row in set (0.00 sec)c5 varchar->返回数据保留插入时的空格c5 utf8mb4_0900_ai_ci->NO PAD->排序和比拟运算,字符串尾部空格当成一般字符

此时拿来做比拟运算的字符串是Retrieved的内容,CHAR和VARCHAR返回数据时对字符串尾部的空格解决形式不同,并且PAD_CHAR_TO_FULL_LENGTH只影响CHAR类型。

4.3、对惟一索引的影响

https://dev.mysql.com/doc/refman/8.0/en/char.html

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

如果存在惟一索引(单列、字符类型),插入的数据仅在尾部空格个数不同,有可能会报duplicate-key谬误:

mysql> select c1,c4,c5,hex(c4),hex(c5) from t3;+------+------+------+----------+----------+| c1   | c4   | c5   | hex(c4)  | hex(c5)  |+------+------+------+----------+----------+|    1 | a    | a    | 61       | 61       ||    2 | a    | a    | 6120     | 6120     ||    3 | a    | a    | 612020   | 612020   ||    4 | a    | a    | 61202020 | 61202020 |+------+------+------+----------+----------+4 rows in set (0.00 sec)mysql> alter table t3 add unique(c4);ERROR 1062 (23000): Duplicate entry 'a' for key 't3.c4'mysql> alter table t3 add unique(c5);Query OK, 0 rows affected (0.44 sec)Records: 0  Duplicates: 0  Warnings: 0

能够看到c4列创立惟一索引失败,c5列创立惟一索引胜利。

c4 utf8mb4_unicode_ci->PAD SPACE->排序和比拟运算,疏忽字符串尾部空格,4行数据反复。

c5 utf8mb4_0900_ai_ci->NO PAD->排序和比拟运算,字符串尾部空格当成一般字符,4行数据不同。

5、总结

Stored

-CHAR(N)VARCHAR(N)
Stored字符有余N左边补空格保留插入时的空格,不会在左边额定补充空格

Retrieved

SQL_MODECHAR(N)VARCHAR(N)
Default Value去掉字符串尾部的空格保留插入时的空格
PAD_CHAR_TO_FULL_LENGTH返回残缺字符串,有余N左边补空格保留插入时的空格

Comparison(不包含like)

Pad AttributeCHAR(N)/VARCHAR(N)
PAD SPACE疏忽字符串尾部空格
NO PAD字符串尾部空格当成一般字符,不能疏忽

Enjoy GreatSQL :)

文章举荐:

技术分享 | MGR最佳实际(MGR Best Practice)
https://mp.weixin.qq.com/s/66...

技术分享 | 万里数据库MGR Bug修复之路
https://mp.weixin.qq.com/s/Ia...

Macos零碎编译percona及局部函数在Macos零碎上运算差别
https://mp.weixin.qq.com/s/jA...

技术分享 | 利用systemd治理MySQL单机多实例
https://mp.weixin.qq.com/s/iJ...

产品 | GreatSQL,打造更好的MGR生态
https://mp.weixin.qq.com/s/By...

产品 | GreatSQL MGR优化参考
https://mp.weixin.qq.com/s/5m...

对于 GreatSQL

GreatSQL是由万里数据库保护的MySQL分支,专一于晋升MGR可靠性及性能,反对InnoDB并行查问个性,是实用于金融级利用的MySQL分支版本。

Gitee:
https://gitee.com/GreatSQL/Gr...

GitHub:
https://github.com/GreatSQL/G...

微信&QQ群:

可扫码增加GreatSQL社区助手微信好友,发送验证信息“加群”退出GreatSQL/MGR交换微信群,亦可间接扫码退出GreatSQL/MGR交换QQ群。

本文由博客一文多发平台 OpenWrite 公布!