关于字符集:故障分析-MySQL-convert-函数导致的字符集报错处理

3次阅读

共计 6764 个字符,预计需要花费 17 分钟才能阅读完成。

作者:徐耀荣

爱可生南区交付服务部 DBA 团队成员,次要负责 MySQL 故障解决以及相干技术支持。喜好电影,游戏,游览以及桌球。

本文起源:原创投稿

* 爱可生开源社区出品,原创内容未经受权不得随便应用,转载请分割小编并注明起源。


一、问题背景

有客户之前遇到一个 mysql8.0.21 实例中排序规定的报错,是在调用视图时抛出,报错信息如下:

ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

二、问题模仿

mysql> show create table t1\G;
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (`name1` varchar(12) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec)

mysql> show create table t2\G;
*************************** 1. row ***************************
       Table: t2
Create Table: CREATE TABLE `t2` (`name2` varchar(12) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> CREATE VIEW t3 as select * from t1,t2 where `t1`.`name1`= `t2`.`name2`;
Query OK, 0 rows affected (0.06 sec)

mysql> select * from t3;
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

三、问题剖析

通过查看视图定义,能够发现因为视图中波及到的两张表字符集不同,所以创立视图时 MySQL 会主动应用 convert 函数转换字符集。

mysql> show create view t3\G;
*************************** 1. row ***************************
                View: t3
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `t3` AS select `t1`.`name1` AS `name1`,`t2`.`name2` AS `name2` from (`t1` join `t2`) where (`t1`.`name1` = convert(`t2`.`name2` using utf8mb4))
character_set_client: utf8mb4
collation_connection: utf8mb4_general_ci
1 row in set (0.00 sec)

在 MySQL 8.0 中 utf8mb4 的默认排序规定为 utf8mb4_0900_ai_ci,而在 t1 表的排序规定为 utf8mb4_general_ci,那么咱们试着将排序规定相干的参数批改后再执行 SQL 看看,批改后的环境参数如下

mysql> show variables like '%collat%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_general_ci |
| collation_database            | utf8mb4_bin        |
| collation_server              | utf8mb4_bin        |
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+

再次执行 sql 发现还是会报一样的错。

mysql> select * from t1,t2 where `t1`.`name1`=convert(`t2`.`name2` using utf8mb4);
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='

通过 show collation 来查看 utf8mb4 字符集对应的默认排序规定,输入显示默认规定为 utf8mb4_general_ci,并不是 utf8mb4_0900_ai_ci。

mysql> show collation like '%utf8mb4%';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_general_ci         | utf8mb4 |  45 | Yes     | Yes      |       1 | PAD SPACE     |
+----------------------------+---------+-----+---------+----------+---------+---------------+

mysql> show character set like '%utf8mb4%';
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+
1 row in set (0.00 sec)

持续排查发现元数据中的字符集默认排序规定如下,默认规定为 utf8mb4_0900_ai_ci。

mysql>  select * from INFORMATION_SCHEMA.COLLATIONS where IS_DEFAULT='Yes' and CHARACTER_SET_NAME='utf8mb4'\G;
*************************** 1. row ***************************
    COLLATION_NAME: utf8mb4_0900_ai_ci
CHARACTER_SET_NAME: utf8mb4
                ID: 255
        IS_DEFAULT: Yes
       IS_COMPILED: Yes
           SORTLEN: 0
     PAD_ATTRIBUTE: NO PAD
1 row in set (0.00 sec)

查看参数发现,元数据信息中 utf8mb4 字符集默认排序规定是 utf8mb4_0900_ai_ci,show collation/show character 输入的都是 utf8mb4_general_ci。为什么 show 显示的后果和 INFORMATION_SCHEMA.COLLATIONS 表查到的信息还不一样呢?此处咱们暂且按下不表,咱们先看看官网文档中 convert 函数用法,其中有上面这段原文:

If you specify CHARACTER SET charset_name as just shown, the character set and collation of the result are charset_name and the default collation of charset_name. If you omit CHARACTER SET charset_name, the character set and collation of the result are defined by the character_set_connection and collation_connection system variables that determine the default connection character set and collation (see Section 10.4,“Connection Character Sets and Collations”).

从上述原文可知如果 convert 只指定了字符集,那么该后果的排序规定就是所指定字符集的默认规定,由之前的测试状况可知,convert 应用的是 INFORMATION_SCHEMA.COLLATIONS 的排序规定,而不是 default_collation_for_utf8mb4 指定的 utf8mb4_general_ci,那咱们来看看 default_collation_for_utf8mb4 参数次要作用场景:

  1. SHOW COLLATION and SHOW CHARACTER SET.
  2. CREATE TABLE and ALTER TABLE having a CHARACTER SET utf8mb4 clause without a COLLATION clause, either for the table character set or for a column character set.
  3. CREATE DATABASE and ALTER DATABASE having a CHARACTER SET utf8mb4 clause without a COLLATION clause.
  4. Any statement containing a string literal of the form _utf8mb4’some text’ without a COLLATE clause.

其中,第一点解释了为什么 show 查到的信息和元数据中信息不一样,default_collation_for_utf8mb4 批改后影响 show COLLATION and SHOW CHARACTER SET 的查问后果,并不会扭转字符集的默认排序规定,所以 utf8mb4 的默认规定还是 utf8mb4_0900_ai_ci,sql 执行仍然会报错。

将 convert 函数指定为 t1.name1 字段的排序规定后,sql 执行失常。

mysql> select * from t1,t2 where `t1`.`name1` = convert(`t2`.`name2` using utf8mb4) collate utf8mb4_general_ci;
+-------+-------+
| name1 | name2 |
+-------+-------+
| jack  | jack  |
+-------+-------+
1 row in set (0.00 sec)

另外,上面测试能够验证 default_collation_for_utf8mb4 的第四个场景。

mysql> select * from INFORMATION_SCHEMA.COLLATIONS where IS_DEFAULT='Yes' and CHARACTER_SET_NAME='utf8mb4'\G;
*************************** 1. row ***************************
    COLLATION_NAME: utf8mb4_0900_ai_ci
CHARACTER_SET_NAME: utf8mb4
                ID: 255
        IS_DEFAULT: Yes
       IS_COMPILED: Yes
           SORTLEN: 0
     PAD_ATTRIBUTE: NO PAD
1 row in set (0.00 sec)

mysql> show variables like '%default_collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| default_collation_for_utf8mb4 | utf8mb4_general_ci |
+-------------------------------+--------------------+
1 row in set (0.01 sec)

mysql> set @s1 = _utf8mb4 'jack',@s2 = _utf8mb4 'jack';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @s1 = @s2;
+-----------+
| @s1 = @s2 |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)
_utf8mb4 申明的 @s1 和 @s2 排序规定是 default_collation_for_utf8mb4 参数值,为 utf8mb4_general_ci

mysql> SELECT @s1 = CONVERT(@s2 USING utf8mb4);
ERROR 1267 (HY000): Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_0900_ai_ci,IMPLICIT) for operation '='
此时,通过 CONVERT 函数解决的 @s2 排序规定是 utf8mb4_0900_ai_ci,所以会报错

mysql> SELECT @s1 = CONVERT(@s2 USING utf8mb4) collate utf8mb4_general_ci;
+-------------------------------------------------------------+
| @s1 = CONVERT(@s2 USING utf8mb4) collate utf8mb4_general_ci |
+-------------------------------------------------------------+
|                                                                 1 |
+-------------------------------------------------------------+
1 row in set (0.00 sec)

四、问题总结

运维中为防止字符集引起的报错问题,有如下倡议可供参考:(具体参数值依据业务需要抉择)

  • 创立数据库实例时需指定参数 character_set_database(默认值:utf8mb4),character_set_server(默认值:utf8mb4)。
  • 当须要创立非默认字符集 database / table 时,须要在 sql 中明确指定字符集和排序规定。
  • 应用 convert 函数转换字符集时,当字段排序规定不是转换后字符集的默认排序规定,须要指定具体的排序规定。SELECT @s1 = CONVERT(@s2 USING utf8mb4) collate utf8mb4_general_ci
  • MySQL 5.7 迁徙至 MySQL 8.0 时,需注意 MySQL 5.7 版本中 utf8mb4 默认排序规定是 utf8mb4_general_ci,MySQL 8.0 中 utf8mb4 默认排序规定是 utf8mb4_0900_ai_ci。

参考

https://dev.mysql.com/doc/ref…

https://dev.mysql.com/doc/ref…

https://dev.mysql.com/doc/ref…

正文完
 0