关于后端:MySQL统计信息不准导致的性能问题

简介：统计信息不准导致谬误的执行打算，引发性能问题
表的统计信息谬误导致优化器抉择谬误的执行打算。
一个客户的性能优化案例: 没有批改数据库实例的任何配置参数以及业务代码没有变更的状况下，一条 sql 呈现大幅性能降落。

咱们来看看出问题的sql 以及他的执行打算:

mysql> explain
-> SELECT count(con.id) ,
-> MAX(DAYNAME(con.date)) ,
-> now() ,
-> pcz.type,
-> pcz.c_c
-> FROM con AS con
-> join orders o on con.o_id = o.id
-> JOIN pcz AS pcz ON o.d_p_c_z_id = pcz.id
-> left join c c on con.c_id = c.id
-> WHERE con.date = current_date() and pcz.type = "T_D"

-> GROUP BY con.date, pcz.c_c, pcz.type;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	pcz	NULL	ALL	PRIMARY	NULL	NULL	NULL	194	10.00	Using where; Using temporary; Using filesort
1	SIMPLE	o	NULL	ref	PRIMARY,dpcz_FK	dpcz_FK	9	custom.pcz.id	1642	100.00	Using index
1	SIMPLE	con	NULL	ref	FK_order,IDX_date	FK_order	8	custom.o.id	1	4.23	Using where
1	SIMPLE	c	NULL	eq_ref	PRIMARY	PRIMARY	8	custom.con.c_id	1	100.00	Using index

执行打算显示 rows examined = (19410%)1642(14.23%)=1347 查看执行打算咱们就发现 where 条件 con.date = current_date() 。这个条件看起来更适宜作为索引过滤数据。然而为什么 MySQL 优化器不抉择该索引呢？接下来应用 force index 强制执行打算应用 con.date 字段上的索引。执行打算如下:

mysql> explain
-> SELECT count(con.id) ,
-> MAX(DAYNAME(con.date)) ,
-> now() ,
-> pcz.type,
-> pcz.c_c
-> FROM con AS con USE INDEX(IDX_date)
-> join orders o on con.o_id = o.id
-> JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id
-> left join c c on con.c_id = c.id
-> WHERE con.date = current_date() and pcz.type = "T_D"

-> GROUP BY con.date, pcz.c_c, pcz.type;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	con	NULL	ref	IDX_date	IDX_date	3	const	110446	100.00	Using temporary; Using filesort
1	SIMPLE	c	NULL	eq_ref	PRIMARY	PRIMARY	8	custom.con.c_id	1	100.00	Using index
1	SIMPLE	o	NULL	eq_ref	PRIMARY,dpcz_FK	PRIMARY	8	custom.con.o_id	1	100.00	Using where
1	SIMPLE	pcz	NULL	eq_ref	PRIMARY	PRIMARY	8	custom.o.d_p_c_z_id	1	10.00	Using where

问题来了 rows examined = 110446(110%)=11045 rows依据计算评估，第一个执行打算的 1347 大略是 110446 的十分之一，至多从外表上看来这个是MySQL 优化器抉择第一个执行打算的起因。

然而比照理论的查问后果的响应工夫，必定粗问题了。因为执行打算二的sql 的响应工夫在预期之内，然而执行打算一对应的响应工夫反而更慢。

进一步来看表 orders 的创立语句以及执行打算1，咱们发现表pcz确实有194行。而后查看索引 orders.dpcz_FK，表 orders 返回 1642行，因为外键束缚 orders_ibfk_10的定义，也就意味着表 orders 的记录数应该是 194*1642=318548 ，然而理论的行数是 32508150，百倍于执行打算预计的值 318548 。

CREATE TABLE orders (
id bigint(20) NOT NULL AUTO_INCREMENT,
...
d_p_c_z_id bigint(20) DEFAULT NULL,
...,
PRIMARY KEY (id),
...
KEY dpcz_FK (d_p_c_z_id),
...
CONSTRAINT orders_ibfk_10 FOREIGN KEY (d_p_c_z_id) REFERENCES p_c_z (id) ON DELETE CASCADE ON UPDATE CASCADE,
...
) ENGINE=InnoDB ....

mysql> select * from mysql.innodb_table_stats where database_name='cutom' and table_name='orders';
database_name	table_name	last_update	n_rows	clustered_index_size	sum_of_other_index_sizes
custom	orders	2022-03-03 21:58:18	32508150	349120	697618

剖析至此，咱们能够判定 orders.dpcz_FK 的统计信息是不精确的，于是乎咱们应用如下语句确认它的理论数据量:

mysql> select * from mysql.innodb_index_stats where database_name='cutom' and table_name='orders' and index_name='dpcz_FK';

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
database_name	table_name	index_name	last_update	stat_name	stat_value	sample_size	stat_description
custom	orders	dpcz_FK	2022-02-28 12:35:30	n_diff_pfx01	19498	50	d_p_c_z_id
custom	orders	dpcz_FK	2022-02-28 12:35:30	n_diff_pfx02	32283087	128	d_p_c_z_id,id
custom	orders	dpcz_FK	2022-02-28 12:35:30	n_leaf_pages	55653	NULL	Number of leaf pages in the index
custom	orders	dpcz_FK	2022-02-28 12:35:30	size	63864	NULL	Number of pages in the index

mysql> select count(distinct d_p_c_z_id) from orders;
count(distinct d_p_c_z_id)
195

Bingo！从下面来看表 orders 字段 d_p_c_z_id 的区分度(不一样的值的总数)为 195 。在信息统计表外面 dpcz_FK的 stat_value 值是 19498 ，显然这个值是不精确的并且比理论值大的多，100倍。索引的 state_value 值应该等于这个字段的在表外面的区分度。

如果应用正确的索引 dpcz_FK 的值 stat_value 195 去从新评估执行打算的老本，咱们将失去执行打算1 的后果 32508150/195=166708 ，并且执行打算预估的扫描的行数应该是 (19410%)166708(14.23%)=136804。因为该值是10倍于执行打算2 的值 11045 。MySQL 在没有应用force index的状况下就能走到正确的执行打算。

这个sql的问题解决了，然而为什么 MySQL 的统计信息会计算错误，咱们如何修复它呢？

答复这个问题之前，咱们先理解一下 MySQL 是如何收集统计信息以及哪些参数管制这个动作。

InnoDB 是如何收集表的统计信息
咱们能够通过显式的形式或者零碎主动采集表的统计信息。

通过开启参数innodb_stats_auto_recalc =on(默认也是关上的) 以便在表的数据产生重大变动当前来主动收集表的统计信息。比方当表中的10% 的行发生变化，InnoDB 将从新计算统计信息。或者咱们能够应用ANALYZE TABLE显式地从新计算统计信息。

InnoDB 应用随机采样技术的办法采集统计信息-- 随机抽取索引页，预计索引的基数。参数 innodb_stats_persistent_sample_pages 管制采样页面的数量。参考 https://dev.mysql.com/doc/ref...

依据代码和形容，随机抽样并不是齐全随机的。采样页面实际上是依据采样算法抉择的。最终，不同键值的总数，即索引的 stat_value 将通过以下公式计算

N R N_DIFF_AVG_LEAF。其中

N : 叶页数
R : level LA上不同key值的个数与level LA上记录总数的比值
N_DIFF_AVG_LEAF：在所有 A 叶页中找到的不同键值的平均数。
采样算法代码的详细信息能够在链接中找到：https://github.com/mysql/mysq...

基于下面的介绍，咱们晓得当一个表的索引产生决裂时，无论是叶子页数（N），还是层LA 上不同键值的个数占层LA 总记录数的比值（R ) 变得越来越不精确，因而 stat_value 的计算可能不正确。一旦产生这种状况，除非更改参数innodb_stats_persistent_sample_pages或重建索引，否则显式从新计算（手动运行 ANALYZE TABLE）将无奈生成正确的 stat_value。

解决办法
咱们怎么修改表的统计信息，并且阻止这类状况进一步产生。

通过后面的剖析和探讨，咱们晓得有两个因素影响数据库收集表的统计信息，

innodb_stats_persistent_sample_pages: A

索引的组织形式

为了可能让 InnoDB 失去正确的统计信息，咱们须要调整 innodb_stats_persistent_sample_pages或者重建索引。

1 通过命令 analyze table 不重建的形式，放弃 innodb_stats_persistent_sample_pages =128，stat_value 稍微更改为 19582，靠近原始不正确的 19498，依然敞开。索引中的叶子页数从 55653 稍微更改为 55891，索引中的页数也从 63864 稍微更改为 64248

mysql> show variables = 'innodb_stats_persistent_sample_pages;
Variable_name	Value
innodb_stats_persistent_sample_pages	128

mysql> analyze table orders;
Table	Op	Msg_type	Msg_text
custom.orders	analyze	status	OK

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
database_name	table_name	index_name	last_update	stat_name	stat_value	sample_size	stat_description
custom	orders	dpcz_FK	2022-03-03 21:58:18	n_diff_pfx01	19582	50	d_p_c_z_id
custom	orders	dpcz_FK	2022-03-03 21:58:18	n_diff_pfx02	32425512	128	d_p_c_z_id,id
custom	orders	dpcz_FK	2022-03-03 21:58:18	n_leaf_pages	55891	NULL	Number of leaf pages in the index
custom	orders	dpcz_FK	2022-03-03 21:58:18	size	64248	NULL	Number of pages in the index

2 ANALYZE TABLE 不重建，但将 innodb_stats_persistent_sample_pages 从 128 减少到 512，使 stat_value 到192十分靠近理论基数 195。索引中的叶页数产生了很大变动，从 55653 到 44188。索引中的页数也从也产生了巨大变化，从 63864 变为 50304。

mysql> show variables like '%persistent_sample%';
Variable_name	Value
innodb_stats_persistent_sample_pages	512

mysql> analyze table orders;
Table	Op	Msg_type	Msg_text
custom.orders	analyze	status	OK

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
database_name	table_name	index_name	last_update	stat_name	stat_value	sample_size	stat_description
custom	orders	dpcz_FK	2022-03-09 06:54:29	n_diff_pfx01	192	179	d_p_c_z_id
custom	orders	dpcz_FK	2022-03-09 06:54:29	n_diff_pfx02	31751321	512	d_p_c_z_id,id
custom	orders	dpcz_FK	2022-03-09 06:54:29	n_leaf_pages	44188	NULL	Number of leaf pages in the index
custom	orders	dpcz_FK	2022-03-09 06:54:29	size	50304	NULL	Number of pages in the index

3 重建表，放弃 innodb_stats_persistent_sample_pages 为128，同样失去了正确的 stat_value 187，靠近实在基数195。索引中的叶子页数大幅变动，从55653变为43733，索引中的页数也从63864变动到 50111。

mysql> show variables = 'innodb_stats_persistent_sample_pages';
Variable_name	Value
innodb_stats_persistent_sample_pages	128

mysql> alter table orders engine=innodb;
Query OK, 0 rows affected (11 min 16.37 sec)

mysql> select * from mysql.innodb_index_stats where database_name='custom' and table_name='orders' and index_name='dpcz_FK';
database_name	table_name	index_name	last_update	stat_name	stat_value	sample_size	stat_description
custom	orders	dpcz_FK	2022-03-07 18:44:43	n_diff_pfx01	187	128	d_p_c_z_id
custom	orders	dpcz_FK	2022-03-07 18:44:43	n_diff_pfx02	31531493	128	d_p_c_z_id,id
custom	orders	dpcz_FK	2022-03-07 18:44:43	n_leaf_pages	43733	NULL	Number of leaf pages in the index
custom	orders	dpcz_FK	2022-03-07 18:44:43	size	50111	NULL	Number of pages in the index

在更正表统计数据后，MySQL 优化器也会抉择正确的执行打算：

mysql> explain
SELECT count(con.id) ,

   MAX(DAYNAME(con.date)) ,   now()                  ,   pcz.type,   pcz.c_c

FROM con AS con

     join orders o on con.order_id = o.id     JOIN p_c_z AS pcz ON o.d_p_c_z_id = pcz.id     left join c c on con.c_id = c.id

WHERE con.date = current_date()
and pcz.type = "T_D"

GROUP BY con.date, pcz.c_c, pcz.type;
id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	con	NULL	ref	FK_order,IDX_date	IDX_date	3	const	3074	100.00	Using temporary; Using filesort
1	SIMPLE	c	NULL	eq_ref	PRIMARY	PRIMARY	8	custom.con.c_id	1	100.00	Using index
1	SIMPLE	o	NULL	eq_ref	PRIMARY,dpcz_FK	PRIMARY	8	custom.con.order_id	1	100.00	Using where
1	SIMPLE	pcz	NULL	eq_ref	PRIMARY	PRIMARY	8	custom.o.d_p_c_z_id	1	10.00	Using where

4 rows in set, 1 warning (0.01 sec)
论断
MySQL优化器依赖于表的统计信息的准确性来抉择最优的执行打算。咱们能够通过更改参数 innodb_stats_persistent_sample_pages 来控制系统采集表统计信息的准确性。

咱们还能够抉择通过在对索引进行碎片整顿的同时重建/重建表来强制从新计算表统计信息，这有助于进步表统计信息的准确性。重构表，咱们能够间接用 alter table xx;批改表或者应用 pt-online-schema-change 达到同样的成果。

原文链接：https://click.aliyun.com/m/10...

本文为阿里云原创内容，未经容许不得转载。