关于索引:第31期索引设计索引数量探讨

31次阅读

共计 8807 个字符,预计需要花费 23 分钟才能阅读完成。

个别提到索引,大家都只关注索引的长处,一个优良的索引确实会让查问申请的效率大幅晋升、亦或是大幅晋升带有索引键进行过滤的写入申请(update、delete) 效率。不可否认,索引在这样的场景下带来微小的性能晋升,然而对于整张表的写申请来讲,则刚好相同,索引的存在会在肯定水平上升高写入申请的效率。为了保护索引数据的准实时性而让优化器基于索引做出更优化的执行打算,数据库会自动更新索引数据分布,比方带来索引页的拆分与合并等。

那索引的存在对写入申请影响到底有多大?也就是说,索引的数量多少与写申请效率的升高有没有一个对应的关系?这篇我用几个简略的例子看看是否失去一个明确的论断。(简略基于笔记本虚拟机测试,次要是基于 MySQL 单实例。)

我的例子次要针对四类语句:insert、update、delete、load data。
用 mysqlsh 部署一个洁净的实例:(带入端口,近程管理员,report_host 选项)
 MySQL  Py > dba.deploy_sandbox_instance(3500,{"password":"root","allowRootFrom":"%","mysqldOptions":["report_host=debian-ytt1"]})
A new MySQL sandbox instance will be created on this host in
/home/ytt/mysql-sandboxes/3500

...

Deploying new MySQL instance...

Instance localhost:3500 successfully deployed and started.
Use shell.connect('root@localhost:3500') to connect to the instance.

这里用来测试索引数量的表有 10 个,别离为 t1 到 t10,除了每张表索引数量不一样外,字段定义等都一样:t1 有 1 个索引,t2 有个 2 个,顺次类推,t10 有 10 个索引,字段类型都是整形(索引数量不蕴含主键,只蕴含二级索引)。表 t0 表构造如下:

(debian-ytt1:3500)|(ytt)>show create table t0\G
*************************** 1. row ***************************
       Table: t0
Create Table: CREATE TABLE `t0` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `r0` int DEFAULT NULL,
  `r1` int DEFAULT NULL,
  `r2` int DEFAULT NULL,
  `r3` int DEFAULT NULL,
  `r4` int DEFAULT NULL,
  `r5` int DEFAULT NULL,
  `r6` int DEFAULT NULL,
  `r7` int DEFAULT NULL,
  `r8` int DEFAULT NULL,
  `r9` int DEFAULT NULL,
  `r10` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

在 shell 命令行批量建这 10 张表,并且别离加上对应的索引:

root@debian-ytt1:~# for i in `seq 1 10`;do mysql --login-path=root_ytt -e "use ytt;create table t$i like t0";done;
root@debian-ytt1:~# for i in `seq 1 10`; do for j in `seq 1 "$i"`; do mysql --login-path=root_ytt -e"use ytt;alter table t$i add key idx_r$j (r$j)"; done; done;
这里我简略写一个存储过程,用来模仿 Insert、update、delete 三种写入操作:
DELIMITER $$

USE `ytt`$$

DROP PROCEDURE IF EXISTS `sp_batch_write`$$

CREATE DEFINER=`root`@`%` PROCEDURE `sp_batch_write`(IN f_write ENUM('insert','update','delete'),
    IN f_table_name VARCHAR(64),
    IN f_num INT UNSIGNED
    )
BEGIN
  DECLARE i INT UNSIGNED DEFAULT 0;
  
  IF f_write = 'insert' THEN
    SET @stmt = CONCAT('insert into',f_table_name,'(r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10)  
                        values (ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),
                        ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000),ceil(rand()*10000))');
   
    SET @@autocommit=0;
    WHILE i < f_num
    DO
      PREPARE s1 FROM @stmt;
      EXECUTE s1;
      IF MOD(i,50) = 0 THEN
        COMMIT;
      END IF;
      SET i = i + 1;
    END WHILE;  
    DROP PREPARE s1;
    COMMIT;
    SET @@autocommit=1;

  ELSEIF f_write = 'update' THEN
    SET @stmt = CONCAT('update',f_table_name,' set r0=ceil(rand()*10000),r1 = ceil(rand()*10000),
    r2 = ceil(rand()*10000),r3 = ceil(rand()*10000),r4 = ceil(rand()*10000),
    r5 = ceil(rand()*10000),r6 = ceil(rand()*10000),r7 = ceil(rand()*10000),
    r8 = ceil(rand()*10000),r9 = ceil(rand()*10000),r10 = ceil(rand()*10000)');
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  ELSEIF f_write = 'delete' THEN
    SET @stmt = CONCAT('delete from',f_table_name);
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DROP PREPARE s1;
  END IF;

END$$

DELIMITER ;
  1. INSERT 操作这 10 张表的工夫比照:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t1',100000);
Query OK, 0 rows affected (1 min 24.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t2',100000);
Query OK, 0 rows affected (1 min 25.40 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t3',100000);
Query OK, 0 rows affected (1 min 29.12 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t4',100000);
Query OK, 0 rows affected (1 min 36.65 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t5',100000);
Query OK, 0 rows affected (1 min 39.85 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t6',100000);
Query OK, 0 rows affected (1 min 41.97 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t7',100000);
Query OK, 0 rows affected (2 min 29.07 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t8',100000);
Query OK, 0 rows affected (2 min 17.62 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t9',100000);
Query OK, 0 rows affected (2 min 14.14 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('insert','t10',100000);
Query OK, 0 rows affected (2 min 32.32 sec)

基本上,越到前面,工夫越长。简略画了张图,见下图:

  1. UPDATE 操作这 10 张表工夫比照:
(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t1',100000);
Query OK, 0 rows affected (22.81 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t2',100000);
Query OK, 0 rows affected (47.42 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t3',100000);
Query OK, 0 rows affected (1 min 8.96 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t4',100000);
Query OK, 0 rows affected (1 min 55.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t5',100000);
Query OK, 0 rows affected (2 min 13.13 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t6',100000);
Query OK, 0 rows affected (2 min 6.79 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t7',100000);
Query OK, 0 rows affected (2 min 35.01 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t8',100000);
Query OK, 0 rows affected (2 min 44.50 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t9',100000);
Query OK, 0 rows affected (3 min 1.27 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('update','t10',100000);
Query OK, 0 rows affected (4 min 43.74 sec)

对表更新的影响最大,随着索引数量的增多,表更新工夫越来越慢。见下图:

  1. DELETE 操作最这 10 张表的工夫比照:

删除前,先看下每张表对应磁盘文件大小,也是依照索引数量多少顺次由小到大排列,所以数量越少,占用磁盘空间越小。

root@debian-ytt1:/home/ytt/mysql-sandboxes/3500/sandboxdata/ytt# ls -Sihlr
总用量 521M
2229717 -rw-r----- 1 ytt ytt 112K 6 月   1 14:03 t0.ibd
2229718 -rw-r----- 1 ytt ytt  28M 6 月   1 16:14 t1.ibd
2229719 -rw-r----- 1 ytt ytt  36M 6 月   1 16:16 t2.ibd
2229720 -rw-r----- 1 ytt ytt  40M 6 月   1 16:16 t3.ibd
2229721 -rw-r----- 1 ytt ytt  44M 6 月   1 16:16 t4.ibd
2229722 -rw-r----- 1 ytt ytt  52M 6 月   1 16:03 t5.ibd
2229723 -rw-r----- 1 ytt ytt  56M 6 月   1 16:06 t6.ibd
2229725 -rw-r----- 1 ytt ytt  60M 6 月   1 15:25 t8.ibd
2229724 -rw-r----- 1 ytt ytt  64M 6 月   1 16:08 t7.ibd
2229734 -rw-r----- 1 ytt ytt  68M 6 月   1 15:30 t9.ibd
2229735 -rw-r----- 1 ytt ytt  72M 6 月   1 15:32 t10.ibd

删除工夫必定随着磁盘文件大小递增:

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t1',100000);
Query OK, 0 rows affected (31.80 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t2',100000);
Query OK, 0 rows affected (59.89 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t3',100000);
Query OK, 0 rows affected (1 min 21.83 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t4',100000);
Query OK, 0 rows affected (2 min 0.33 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t5',100000);
Query OK, 0 rows affected (2 min 17.66 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t6',100000);
Query OK, 0 rows affected (2 min 9.37 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t7',100000);
Query OK, 0 rows affected (2 min 53.61 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t8',100000);
Query OK, 0 rows affected (2 min 39.29 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t9',100000);
Query OK, 0 rows affected (3 min 38.26 sec)

(debian-ytt1:3500)|(ytt)>call sp_batch_write('delete','t10',100000);
Query OK, 0 rows affected (4 min 22.88 sec)

后果和预期一样,见下图:

这也就是为什么对一张表记录清空时,不倡议应用 delete 的起因!(delete 每行打标记,前期 purge 线程定期清理,空间也不会立刻开释,用 truncate 代替 delete!)

  1. LOAD DATA 导入操作对这 10 张表工夫比照:
(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t1;
Query OK, 100000 rows affected (11.01 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t2;
Query OK, 100000 rows affected (14.94 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t3;
Query OK, 100000 rows affected (19.33 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t4;
Query OK, 100000 rows affected (22.06 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t5;
Query OK, 100000 rows affected (28.86 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t6;
Query OK, 100000 rows affected (30.57 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t7;
Query OK, 100000 rows affected (30.83 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t8;
Query OK, 100000 rows affected (35.88 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t9;
Query OK, 100000 rows affected (36.76 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

(debian-ytt1:3500)|(ytt)>load data infile '/home/ytt/mysql-sandboxes/3500/mysql-files/t0.txt' into table t10;
Query OK, 100000 rows affected (45.79 sec)
Records: 100000  Deleted: 0  Skipped: 0  Warnings: 0

导入工夫也是随着索引数量增多而减少,见下图:

从以上简略测试后果能够失去,新增一个索引,就会导致额定的写入工夫耗费。在写入优先的业务中,索引并不是越多越好,最好是保留必要的索引,删除掉不必要的索引。说到这里,可能在互联网中最罕用的一种场景,比方主从架构,主机和从机表构造一样,然而索引数量不一样;主机只有主键,从机除了主键外,可能有多个二级索引。其实这样对于写申请在主机,读申请在从机,并且从机能够承受肯定水平的数据延迟时间的场景,那其实十分适合;然而如果从机不能承受太长时间的数据提早,此时就得舍弃掉一堆不是很罕用的索引来晋升从机的日志生产效率。


对于 MySQL 的技术内容,你们还有什么想晓得的吗?连忙留言通知小编吧!

正文完
 0