Coinciding with the new native data dictionary in MySQL 8.0, we have made a number of useful enhancements to our INFORMATION_SCHEMA subsystem design in MySQL 8.0. In this post I will first go through our legacy implementation as it has stood since MySQL 5.1, and then cover what’s changed.
与 MySQL 8.0 原生数据字典统一,在 MySQL 8.0 的 INFORMATION_SCHEMA 子系统设计中,咱们做了一些很有用的加强。在这篇文章中,我将会介绍自 MySQL 5.1 以来的旧的实现形式,而后介绍咱们做了什么扭转。
Background
INFORMATION_SCHEMA was first introduced into MySQL 5.0, as a standards compliant way of retrieving meta data from a running MySQL server. When we look at the history of INFORMATION_SCHEMA there have been a number of complaints about the performance of certain queries, particularly in the case that there are many database objects (schemas, tables etc).
INFORMATION_SCHEMA 首次引入 MySQL 5.0,作为一种从正在运行的 MySQL 服务器检索元数据的规范兼容形式。当咱们回顾 INFORMATION_SCHEMA 的历史时,对于某些特定查问性能总是有很多的埋怨,特地是在有许多数据库对象(schema,表等)的状况下。
In an effort to address these reported issues, since MySQL 5.1 we have made a number of performance optimizations to speed up the execution of INFORMATION_SCHEMA queries. The optimizations are described in the MySQL manual, and apply when the user provides an explicit schema name or table name in the query.
为了解决这些上报的问题,从 MySQL 5.1 开始,咱们进行了许多性能优化来放慢 INFORMATION_SCHEMA 查问的执行速度。MySQL 手册 < 链接 1 > 中形容了这些优化,当用户在查问中提供显式 schema 名称或表名时,将会利用这些。
Alas, despite these improvements INFORMATION_SCHEMA performance is still a major pain point for many of our users. The key reason behind these performance issues in the current INFORMATION_SCHEMA implementation is that INFORMATION_SCHEMA tables are implemented as temporary tables that are created on-the-fly during query execution. These temporary tables are populated via:
Meta data from files, e.g. table definitions from .FRM files.
Details from storage engines, e.g. dynamic table statistics.
Data from global data structures in the MySQL server.
只管有这些改良,INFORMATION_SCHEMA 的 性能依然是咱们许多用户的次要痛点。在以后 INFORMATION_SCHEMA 实现形式下产生的性能问题背地的要害起因是,INFORMATION_SCHEMA 表的查问实现形式是在查问执行期间创立长期表。这些长期表通过以下形式填充:
元数据来自文件,例如:表定义来自 FRM 文件
细节来自于存储引擎,例如:动静表的统计信息
来自 MySQL server 层中全局数据结构的数据
For a MySQL server having hundreds of database, each with hundreds of tables within them, the INFORMATION_SCHEMA query would end-up doing lot of I/O reading each individual FRM files from the file system. And it would also end-up using more CPU cycles in effort to open the table and prepare related in-memory data structures. It does attempt to use the MySQL server table cache (the system variable‘table_definition_cache‘), however in large server instances it’s very rare to have a table cache that is large enough to accommodate all of these tables.
对于一个 MySQL 实例来说可能有上百个库,每个库又有上百张表,INFORMATION_SCHEMA 查问最终会从文件系统中读取每个独自的 FRM 文件,造成很多 I / O 读取。并且最终还会耗费更多的 CPU 来关上表并筹备相干的内存数据结构。它的确尝试应用 MySQL server 层的表缓存(零碎变量 table_definition_cache),然而在大型实例中,很少有一个足够大的表缓存来包容所有的表。
One can easily face the above mentioned performance issue if the optimization is not used by the INFORMATION_SCHEMA query. For example, let us consider the two queries below
如果 INFORMATION_SCHEMA 查问未应用优化,则能够很容易碰到下面的性能问题。例如,让咱们思考上面的两个查问
1. row **
id: 1
select_type: SIMPLE
table: TABLES
partitions: NULL
type: ALL
possible_keys: NULL
key: TABLE_SCHEMA,TABLE_NAME
key_len: NULL
ref: NULL
rows: NULL
filtered: NULL
Extra: Using where; Skip_open_table; Scanned 0 databases
1 row in set, 1 warning (0. 00 sec)
mysql > EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA. TABLES WHERE
-> TABLE_SCHEMA like ‘test% ‘ AND TABLE_NAME like ‘t% ‘\G
1. row **
id: 1
select_type: SIMPLE
table: TABLES
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
filtered: NULL
Extra: Using where; Skip_open_table; Scanned all databases
1 row in set, 1 warning (0. 00 sec)
As we can see from the EXPLAIN output, we see that the former query would use the values provided in WHERE clause for the TABLE_SCHEMA and TABLE_NAME field as a key to read the desired FRM files from the file system. However, the latter query would end up reading all thewww.sangpi.com FRM in the entire data directory, which is very costly and does not scale.
从 EXPLAIN 的输入能够看到。咱们看到前一个查问将应用 WHERE 子句中为 TABLE_SCHEMA 和 TABLE_NAME 字段提供的值作为键,从文件系统读取所需 FRM 文件。然而,后一个查问最终将读取整个数据目录中的所有 FRM,这十分低廉且无奈扩大。
Changes in MySQL 8.0
One of the major changes in 8.0 is the introduction of a native data dictionary based on InnoDB. This change has enabled us to get rid of file-based metadata store (FRM files) and also help MySQL to move towards supporting transactional DDL. For more details on introduction of data dictionary feature in 8.0 and its benefits, please look at Staale’s post here.
8.0 中的一个次要变动是引入了基于 InnoDB 的数据字典。这一变动使咱们可能解脱基于文件的游戏元数据存储(FRM 文件),并帮忙 MySQL 转向反对事务 DDL。无关在 8.0 中引入数据字典性能及其长处的更多详细信息,请在此处查看 Staale 的文章 < 链接 2 >。
Now that the metadata of all database tables is stored in transactional data dictionary tables, it enables us to design an INFORMATION_SCHEMA table as a database VIEW over the data dictionary tables. This eliminates costs such as the creation of temporary tables for each INFORMATION_SCHEMA query during execution on-the-fly, and also scanning file-system directories to find FRM files. It is also now possible to utilize the full power of the MySQL optimizer to prepare better query execution plans using indexes on data dictionary tables.
既然所有数据库表的元数据都存储在事务数据字典表中,它使咱们可能将 INFORMATION_SCHEMA 表设计为数据字典表上的数据库视图。这打消了老本,例如在执行期间为每个 INFORMATION_SCHEMA 查问创立长期表,以及扫描文件系统目录以查找 FRM 文件。当初还能够利用 MySQL 优化器的全副性能,应用数据字典表上的索引来取得更好的执行打算。