关于data:Amazon-Aurora-并行查询加速分析处理的利器

33次阅读

共计 22555 个字符,预计需要花费 57 分钟才能阅读完成。

Amazon Aurora 既具备高端商用数据库的性能和可用性,又具备开源数据库的简略性和老本效益。它提供了比规范 MySQL 高五倍的吞吐量,并且具备更高的可扩展性、持久性和安全性。 Amazon Aurora应用了计算和存储拆散的架构,数据库集群蕴含一个或多个数据库计算实例以及一个跨多可用区的数据存储层。


Amazon Aurora 集群架构图

Amazon Aurora Parallel Query(并行查问)是 Aurora数据库的一项性能,实用于兼容 MySQL 的 Amazon AuroraAurora 最新的 MySQL 5.6 和 MySQL 5.7 兼容版本均反对并行查问。并行查问充分利用了 Aurora 的架构,将解决向下推送到 Aurora 存储层,将计算散布到数千个节点上。通过将剖析查询处理卸载到 Aurora 存储层,并行查问缩小了与事务工作负载对网络、CPU 和缓冲池的争用,能够将查问速度进步多达两个数量级,同时放弃外围事务工作负载的高吞吐量。

并行查问非常适合 Aurora MySQL 数据库集群里具备蕴含数百万行的表以及须要数分钟或数小时能力实现的剖析查问。本文通过测试了各场景下的查问耗时,并行查问的启用对于 OLTP 事务性查问影响甚微,而对于 OLAP 剖析性查问则能显著进步速度。例如在试验场景下:在千万行至亿行的数据集里,应用 db.r5.2xlarge 机型,运行一个多表联结剖析查问,禁用并行查问时,耗时为 22 分 1.33 秒,开启并行查问后,耗时仅为 39.74 秒。

以下为试验内容,具体阐明了试验步骤并展示了并行查问所带来的优化成果。

试验

环境筹备

试验须要预置 Aurora MySQL 数据库集群以及 MySQL 客户端实例。

并行查问现已在由光环新网经营的亚马逊云科技中国(北京)区域以及由西云数据经营的亚马逊云科技中国(宁夏)区域正式推出。本试验应用了亚马逊云科技中国宁夏区域。

1. Aurora MySQL 数据库集群

要创立具备并行查问的 Aurora MySQL 集群,能够应用与其余 Aurora MySQL 集群雷同的亚马逊云科技治理控制台和 Amazon CLI 办法。您能够创立新的集群以应用并行查问,也能够通过从 MySQL 兼容的 Aurora 数据库集群的快照还原,创立一个数据库集群以应用并行查问。

在抉择 Aurora MySQL 引擎版本时,建议您抉择与 MySQL 5.7 兼容的最新引擎 Aurora MySQL 2.09 或更高版本,以及与 MySQL 5.6 兼容的 Aurora MySQL 1.23 或更高版本。应用这些版本,应用并行查问的限度起码,这些版本还具备最大的灵活性,能够随时关上或敞开并行查问。

本试验采纳了 Aurora MySQL 2.09 版本。


抉择反对并行查问性能的版本

集群中的数据库实例必须应用 db.r* 实例类才可反对并行查问。

本试验针对大小两种数据集,选用了大小两种机型。针对 10G 级数据集,选用了 db.r5.xlarge 实例类型。针对 100G 级数据集,选用了 db.r5.2xlarge 实例类型。


抉择反对并行查问性能的实例类型

更多详情请参考 Aurora 用户指南中的创立应用并行查问的数据库集群。https://docs.aws.amazon.com/z…

2. MySQL 客户端实例

依据 Aurora MySQL 集群连贯配置的选项,在可拜访数据库集群的网络环境内创立一台 EC2 实例。因为须要生成预加载测量数据,留神保障实例配置的存储空间短缺。

更多详情请参考启用实例以及连贯到 Amazon Aurora 数据库集群。

启用实例: https://docs.aws.amazon.com/z…  
Amazon Aurora 数据库集群:https://docs.aws.amazon.com/z…

数据筹备

本试验应用了 TPC-H 数据集。您能够从 TPC-H 网站获取生成示例数据的表定义、查问以及 dbgen 程序。请参考以下步骤将数据集导入到 Aurora MySQL 集群。
http://www.tpc.org/tpch/

1. 下载 TPC- H 工具

关上 TPC Download Current 页面,找到 TPC- H 项,以后最新版本为 2.18.0,点击下载
http://tpc.org/tpc_documents_…


下载 TPC- H 工具

将下载的工具压缩包拷贝到 MySQL 客户端实例,解压。

2. 生成测试数据

进入子目录 dbgen,编辑 makefile 文件:

[ec2-user@ip-172-31-42-211 dbgen]$ cp makefile.suite makefile
[ec2-user@ip-172-31-42-211 dbgen]$ vi makefile

更新 makefile 文件中的编译器 cc 为 gcc,数据库引擎 DATABASE 为 MYSQL,操作系统 MACHINE 为 LINUX,负载 WORKLOAD 为 TPCH:

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS,
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are:  TPCH
DATABASE=MYSQL
MACHINE = LINUX
WORKLOAD = TPCH
#

因为 makefile 中的默认数据库选项没有 MYSQL 选项,须要在 tpch.h 文件中手动增加依赖。

关上 tpcd.h,在文件中增加代码片段:

#ifdef MYSQL
#define GEN_QUERY_PLAN ""#define START_TRAN"START TRANSACTION"#define END_TRAN"COMMIT"#define SET_OUTPUT""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif

执行编译命令 make,生成 dbgen 数据生成工具:

[ec2-user@ip-172-31-42-211 dbgen]$ make

应用 dbgen 工具生成数据,在参数中可指定生成数据的大小。运行完结后,产生了 8 个 tbl 文件,对应到 8 个表。

本试验测试了大小两种数据集,别离分成了 10G 的数据文件以及 100G 的数据文件。

小数据集指定了 10 的参数,生成数据文件如下:

[ec2-user@ip-172-31-42-211 dbgen]$ ./dbgen -s 100
[ec2-user@ip-172-31-42-211 dbgen]$ ls *.tbl -lh
-rw-rw-r-- 1 ec2-user ec2-user 234M customer.tbl
-rw-rw-r-- 1 ec2-user ec2-user 7.3G lineitem.tbl
-rw-rw-r-- 1 ec2-user ec2-user 2.2K nation.tbl
-rw-rw-r-- 1 ec2-user ec2-user 1.7G orders.tbl
-rw-rw-r-- 1 ec2-user ec2-user 233M part.tbl
-rw-rw-r-- 1 ec2-user ec2-user 1.2G partsupp.tbl
-rw-rw-r-- 1 ec2-user ec2-user  389 region.tbl
-rw-rw-r-- 1 ec2-user ec2-user  14M supplier.tbl

大数据集指定了 100 的参数,生成数据文件如下:

[ec2-user@ip-172-31-42-211 dbgen]$ ./dbgen -s 100
[ec2-user@ip-172-31-42-211 dbgen]$ ls *.tbl -lh
-rw-rw-r-- 1 ec2-user ec2-user 2.3G customer.tbl
-rw-rw-r-- 1 ec2-user ec2-user  75G lineitem.tbl
-rw-rw-r-- 1 ec2-user ec2-user 2.2K nation.tbl
-rw-rw-r-- 1 ec2-user ec2-user  17G orders.tbl
-rw-rw-r-- 1 ec2-user ec2-user 2.3G part.tbl
-rw-rw-r-- 1 ec2-user ec2-user  12G partsupp.tbl
-rw-rw-r-- 1 ec2-user ec2-user  389 region.tbl
-rw-rw-r-- 1 ec2-user ec2-user 137M supplier.tbl

3. 导入测试数据

tpch 工具包里提供了两个脚本:dss.ddl 数据库及表的初始化脚本,dss.ri 数据表的主键索引及外键脚本。咱们将先执行 dss.ddl 脚本创立数据库及表,再执行 dss.ri 创立对应表的主键索引及外键关联,最初导入数据。

因为脚本不适用于 MySQL,须要做相应调整。

更新 dss.ddl 头部:

create database tpch;
use tpch;

连贯 MySQL:

mysql --host=database-test-instance-1.xxx.rds.cn-northwest-1.amazonaws.com.cn --user=admin --password=xxx

运行 dss.ddl 脚本,留神替换对应 dss.ri 文件门路:

MySQL [(none)]> \. /home/ec2-user/2.18.0_rc2/dbgen/dss.ddl

查看已建胜利的 tpch 数据库:

MySQL [tpch]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| tpch               |
+--------------------+

查看已建胜利的数据库表:

MySQL [tpch]> use tpch;
Database changed
MySQL [tpch]> show tables;
+----------------+
| Tables_in_tpch |
+----------------+
| CUSTOMER       |
| LINEITEM       |
| NATION         |
| ORDERS         |
| PART           |
| PARTSUPP       |
| REGION         |
| SUPPLIER       |
+----------------+

更新 dss.ri,包含批改连贯形式,将 CONNECT TO TPCD 更新为 use tpch;替换数据库名 TPCD. 为 tpch.;减少外键关联列;将大写表名改为小写表名,以适配查问语句;更新完的 dss.ri 残缺内容为:

use tpch;
-- For table REGION
ALTER TABLE tpch.REGION
ADD PRIMARY KEY (R_REGIONKEY);

-- For table NATION
ALTER TABLE tpch.NATION
ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE tpch.NATION
ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references tpch.REGION(R_REGIONKEY);
COMMIT WORK;

-- For table PART
ALTER TABLE tpch.PART
ADD PRIMARY KEY (P_PARTKEY);
COMMIT WORK;

-- For table SUPPLIER
ALTER TABLE tpch.SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE tpch.SUPPLIER
ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references tpch.NATION(N_NATIONKEY);
COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE tpch.PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;

-- For table CUSTOMER
ALTER TABLE tpch.CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE tpch.CUSTOMER
ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references tpch.NATION(N_NATIONKEY);
COMMIT WORK;

-- For table LINEITEM
ALTER TABLE tpch.LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
COMMIT WORK;

-- For table ORDERS
ALTER TABLE tpch.ORDERS
ADD PRIMARY KEY (O_ORDERKEY);
COMMIT WORK;

-- For table PARTSUPP
ALTER TABLE tpch.PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references tpch.SUPPLIER(S_SUPPKEY);
COMMIT WORK;
ALTER TABLE tpch.PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references tpch.PART(P_PARTKEY);
COMMIT WORK;

-- For table ORDERS
ALTER TABLE tpch.ORDERS
ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references tpch.CUSTOMER(C_CUSTKEY);
COMMIT WORK;

-- For table LINEITEM
ALTER TABLE tpch.LINEITEM
ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY)  references tpch.ORDERS(O_ORDERKEY);
COMMIT WORK;
ALTER TABLE tpch.LINEITEM
ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references
        tpch.PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;

alter table CUSTOMER    rename to customer ;
alter table LINEITEM    rename to lineitem ;
alter table NATION    rename to nation   ;
alter table ORDERS    rename to orders   ;
alter table PART    rename to part     ;
alter table PARTSUPP    rename to partsupp ;
alter table REGION    rename to region   ;
alter table SUPPLIER    rename to supplier ;

运行 dss.ri 脚本建设数据库表主键及外键,留神替换对应 dss.ri 文件门路:

MySQL [(none)]> \. /home/ec2-user/2.18.0_rc2/dbgen/dss.ri

按程序执行脚本将 tbl 文件导入数据库相应表,留神替换对应 tbl 文件门路:

load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/region.tbl' into table region fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/nation.tbl' into table nation fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/part.tbl' into table part fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/supplier.tbl' into table supplier fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/partsupp.tbl' into table partsupp fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/customer.tbl' into table customer fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/orders.tbl' into table orders fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/lineitem.tbl' into table lineitem fields terminated by '|' lines terminated by '|\n';

导入实现后,可查看已导入的数据。测试查问中应用的表包含:customer, lineitem, orders,请察看各表的导入情况。

以下为 10G 级的数据表导入情况,数据表有百万行至千万行级数据:

MySQL [tpch]> show table status;
+----------+--------+---------+------------+----------+----------------+-------------+
| Name     | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length |
+----------+--------+---------+------------+----------+----------------+-------------+
| customer | InnoDB |      10 | Dynamic    |  1480453 |            194 |   288112640 |
| lineitem | InnoDB |      10 | Dynamic    | 55000836 |            148 |  8158969856 |
| nation   | InnoDB |      10 | Dynamic    |       25 |            655 |       16384 |
| orders   | InnoDB |      10 | Dynamic    | 14213703 |            131 |  1873805312 |
| part     | InnoDB |      10 | Dynamic    |  1912304 |            162 |   310149120 |
| partsupp | InnoDB |      10 | Dynamic    |  7467820 |            268 |  2005925888 |
| region   | InnoDB |      10 | Dynamic    |        5 |           3276 |       16384 |
| supplier | InnoDB |      10 | Dynamic    |    98503 |            186 |    18366464 |
+----------+--------+---------+------------+----------+----------------+-------------+

以下为 100G 级的数据表导入情况,数据表有千万行至亿行级数据:

+----------+--------+---------+------------+-----------+----------------+-------------+
| Name     | Engine | Version | Row_format | Rows      | Avg_row_length | Data_length |
+----------+--------+---------+------------+-----------+----------------+-------------+
| customer | InnoDB |      10 | Dynamic    |  13528638 |            194 |  2631925760 |
| lineitem | InnoDB |      10 | Dynamic    | 106592268 |            137 | 14687404032 |
| nation   | InnoDB |      10 | Dynamic    |        25 |            655 |       16384 |
| orders   | InnoDB |      10 | Dynamic    | 141958614 |            131 | 18708692992 |
| part     | InnoDB |      10 | Dynamic    |  18240054 |            162 |  2970615808 |
| partsupp | InnoDB |      10 | Dynamic    |  79991226 |            252 | 20198719488 |
| region   | InnoDB |      10 | Dynamic    |         5 |           3276 |       16384 |
| supplier | InnoDB |      10 | Dynamic    |    988185 |            178 |   176865280 |
+----------+--------+---------+------------+-----------+----------------+-------------+

查问测试

在 Aurora MySQL 1.23 或 2.09 及更高版本中,默认状况下,并行查问和哈希联接设置都处于敞开状态。并行查问的启用或禁用可通过批改数据库的参数 aurora_parallel_query。默认参数组为只读,批改参数组请参考应用数据库参数组和数据库集群参数组。https://docs.aws.amazon.com/z…

查看集群并行查问启动状态可用以下命令:

mysql> select @@aurora_parallel_query;
+-------------------------+
| @@aurora_parallel_query |
+-------------------------+
|                       1 |
+-------------------------+

默认状况下,即便启用了并行查问,Aurora 查问优化器会依据查问主动决定是否应用并行查问,可在会话级别应用命令强制开启,笼罩查问优化器的主动抉择:
mysql> set aurora_pq_force = 1;

本试验别离针对单表事务查问,单表剖析查问,以及多表剖析查问,应用了不同查问脚本进行了测试。

查问脚本里波及到的表包含:orders, customer, lineitem.

orders 表构造如下:

MySQL [tpch]> describe orders;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| O_ORDERKEY      | int(11)       | NO   | PRI | NULL    |       |
| O_CUSTKEY       | int(11)       | NO   | MUL | NULL    |       |
| O_ORDERSTATUS   | char(1)       | NO   |     | NULL    |       |
| O_TOTALPRICE    | decimal(15,2) | NO   |     | NULL    |       |
| O_ORDERDATE     | date          | NO   |     | NULL    |       |
| O_ORDERPRIORITY | char(15)      | NO   |     | NULL    |       |
| O_CLERK         | char(15)      | NO   |     | NULL    |       |
| O_SHIPPRIORITY  | int(11)       | NO   |     | NULL    |       |
| O_COMMENT       | varchar(79)   | NO   |     | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+

customer 表构造如下:

MySQL [tpch]> describe customer;
+--------------+---------------+------+-----+---------+-------+
| Field        | Type          | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+-------+
| C_CUSTKEY    | int(11)       | NO   | PRI | NULL    |       |
| C_NAME       | varchar(25)   | NO   |     | NULL    |       |
| C_ADDRESS    | varchar(40)   | NO   |     | NULL    |       |
| C_NATIONKEY  | int(11)       | NO   | MUL | NULL    |       |
| C_PHONE      | char(15)      | NO   |     | NULL    |       |
| C_ACCTBAL    | decimal(15,2) | NO   |     | NULL    |       |
| C_MKTSEGMENT | char(10)      | NO   |     | NULL    |       |
| C_COMMENT    | varchar(117)  | NO   |     | NULL    |       |
+--------------+---------------+------+-----+---------+-------+

lineitem 表构造如下:

MySQL [tpch]> describe lineitem;
+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| L_ORDERKEY      | int(11)       | NO   | PRI | NULL    |       |
| L_PARTKEY       | int(11)       | NO   | MUL | NULL    |       |
| L_SUPPKEY       | int(11)       | NO   |     | NULL    |       |
| L_LINENUMBER    | int(11)       | NO   | PRI | NULL    |       |
| L_QUANTITY      | decimal(15,2) | NO   |     | NULL    |       |
| L_EXTENDEDPRICE | decimal(15,2) | NO   |     | NULL    |       |
| L_DISCOUNT      | decimal(15,2) | NO   |     | NULL    |       |
| L_TAX           | decimal(15,2) | NO   |     | NULL    |       |
| L_RETURNFLAG    | char(1)       | NO   |     | NULL    |       |
| L_LINESTATUS    | char(1)       | NO   |     | NULL    |       |
| L_SHIPDATE      | date          | NO   |     | NULL    |       |
| L_COMMITDATE    | date          | NO   |     | NULL    |       |
| L_RECEIPTDATE   | date          | NO   |     | NULL    |       |
| L_SHIPINSTRUCT  | char(25)      | NO   |     | NULL    |       |
| L_SHIPMODE      | char(10)      | NO   |     | NULL    |       |
| L_COMMENT       | varchar(44)   | NO   |     | NULL    |       |
+-----------------+---------------+------+-----+---------+-------+

单表事务查问脚本如下,针对 orders 表的不同列的指定了条件进行查问:

SELECT *
FROM orders
WHERE o_custkey = 3689999
  AND o_orderdate > date '1995-03-14'
  AND o_orderstatus = 'O'
  LIMIT 15;

单表剖析查问脚本如下,针对 orders 表的不同列的指定了条件,并应用统计函数 avg 进行剖析查问:

SELECT avg(o_totalprice)
FROM orders
WHERE o_orderdate > date '1995-03-14'
  AND o_orderstatus = 'O'
  AND o_orderpriority not in ('1-URGENT', '2-HIGH');

多表剖析查问脚本如下,关联了多表 customer, orders, lineitem,并指定条件进行剖析查问:

SELECT
  l_orderkey,
  SUM(l_extendedprice * (1-l_discount)) AS revenue,
  o_orderdate,
  o_shippriority
FROM customer, orders, lineitem
WHERE
  c_mktsegment='AUTOMOBILE'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < date '1995-03-14'
  AND l_shipdate > date '1995-03-14'
GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority
ORDER BY
  revenue DESC,
  o_orderdate LIMIT 15;

在多表联结查问场景下,Aurora MySQL 在须要应用 equijoin 联接大量数据时,哈希联接能够进步查问性能。为并行查问集群启用哈希联接,可设置集群配置参数 aurora_disable_hash_join=OFF,启用与并行查问联合应用的哈希联接优化。

查看集群哈希联接优化启动状态可用以下命令:

mysql> select @@aurora_disable_hash_join;
+----------------------------+
| @@aurora_disable_hash_join |
+----------------------------+
|                          0 |
+----------------------------+

Aurora 查问优化器会依据查问主动决定是否应用哈希联接,可在会话级别应用命令强制开启,笼罩查问优化器的主动抉择:

mysql> SET optimizer_switch='hash_join=on';
mysql> SET optimizer_switch='hash_join_cost_based=off';

本试验别离在导入了大小两种数据集的两个数据库集群上(别离为:导入了 10G 测试数据集的 db.r5.xlarge 数据库集群,导入了 100G 测试数据集的 db.r5.2xlarge 的数据库集群),参照以下步骤进行了测试:

  • 禁用并行查问,查看确认并行查问的禁用状态
  • 运行单表事务查问脚本
  • 剖析单表事务查问执行打算
  • 运行单表剖析查问脚本
  • 剖析单表剖析查问执行打算
  • 运行多表剖析查问脚本
  • 剖析多表剖析查问执行打算
    启用并行查问,启用哈希连贯,查看确认并行查问及哈希连贯的启用状态
  • 运行单表事务查问脚本
  • 剖析单表事务查问执行打算
  • 运行单表剖析查问脚本
  • 剖析单表剖析查问执行打算
  • 运行多表剖析查问脚本
  • 剖析多表剖析查问执行打算
    通过剖析查问打算,可看到传统查问打算与并行查问打算之间的区别。在启用并行查问后,查问中的步骤能够应用并行查问优化,如 EXPLAIN 输入中的 Extra 列所示。这些步骤的 I/O 密集型和 CPU 密集型解决将向下推送到存储层。

以下为 100G 测试数据集的查问后果及执行打算剖析:

单表事务查问(禁用并行查问)的后果:

MySQL [tpch]> SELECT *
    -> FROM orders
    -> WHERE o_custkey = 3689999
    ->   AND o_orderdate > date '1995-03-14'
    ->   AND o_orderstatus = 'O'
    ->   LIMIT 15;
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
| O_ORDERKEY | O_CUSTKEY | O_ORDERSTATUS | O_TOTALPRICE | O_ORDERDATE | O_ORDERPRIORITY | O_CLERK         | O_SHIPPRIORITY | O_COMMENT                                                       |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
|          1 |   3689999 | O             |    224560.83 | 1996-01-02  | 5-LOW           | Clerk#000095055 |              0 | nstructions sleep furiously among                               |
|   37007300 |   3689999 | O             |    189889.17 | 1998-05-12  | 5-LOW           | Clerk#000097117 |              0 | ully. carefully busy accoun                                     |
|  110398694 |   3689999 | O             |     40190.10 | 1998-06-08  | 3-MEDIUM        | Clerk#000071343 |              0 |  ideas? quickly thin accounts wake slyly. blithely              |
|  166279651 |   3689999 | O             |      8270.88 | 1997-10-23  | 1-URGENT        | Clerk#000039384 |              0 | ic, final accounts sleep. blithely pending requests nag slyly u |
|  276450979 |   3689999 | O             |     43595.81 | 1997-04-26  | 3-MEDIUM        | Clerk#000010520 |              0 |  quickly alongside of the furiously expr                        |
|  404928295 |   3689999 | O             |     20719.85 | 1996-11-05  | 2-HIGH          | Clerk#000027012 |              0 | y regular platelets                                             |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
6 rows in set (0.00 sec)

单表事务查问(禁用并行查问)的执行打算剖析:

+----+-------------+--------+...+-------------+
| id | select_type | table  |...| Extra       |
+----+-------------+--------+...+-------------+
|  1 | SIMPLE      | orders |...| Using where |
+----+-------------+--------+...+-------------+

单表事务查问(启用并行查问)的后果:

MySQL [tpch]> SELECT *
    -> FROM orders
    -> WHERE o_custkey = 3689999
    ->   AND o_orderdate > date '1995-03-14'
    ->   AND o_orderstatus = 'O'
    ->   LIMIT 15;
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
| O_ORDERKEY | O_CUSTKEY | O_ORDERSTATUS | O_TOTALPRICE | O_ORDERDATE | O_ORDERPRIORITY | O_CLERK         | O_SHIPPRIORITY | O_COMMENT                                                       |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
|          1 |   3689999 | O             |    224560.83 | 1996-01-02  | 5-LOW           | Clerk#000095055 |              0 | nstructions sleep furiously among                               |
|   37007300 |   3689999 | O             |    189889.17 | 1998-05-12  | 5-LOW           | Clerk#000097117 |              0 | ully. carefully busy accoun                                     |
|  110398694 |   3689999 | O             |     40190.10 | 1998-06-08  | 3-MEDIUM        | Clerk#000071343 |              0 |  ideas? quickly thin accounts wake slyly. blithely              |
|  166279651 |   3689999 | O             |      8270.88 | 1997-10-23  | 1-URGENT        | Clerk#000039384 |              0 | ic, final accounts sleep. blithely pending requests nag slyly u |
|  276450979 |   3689999 | O             |     43595.81 | 1997-04-26  | 3-MEDIUM        | Clerk#000010520 |              0 |  quickly alongside of the furiously expr                        |
|  404928295 |   3689999 | O             |     20719.85 | 1996-11-05  | 2-HIGH          | Clerk#000027012 |              0 | y regular platelets                                             |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
6 rows in set (0.00 sec)

单表事务查问(启用并行查问)的执行打算剖析:

+----+-------------+--------+...+-------------+
| id | select_type | table  |...| Extra       |
+----+-------------+--------+...+-------------+
|  1 | SIMPLE      | orders |...| Using where |
+----+-------------+--------+...+-------------+

单表剖析查问(禁用并行查问)的后果:

MySQL [tpch]> SELECT avg(o_totalprice)
    -> FROM orders
    -> WHERE o_orderdate > date '1995-03-14'
    ->   AND o_orderstatus = 'O'
    ->   AND o_orderpriority not in ('1-URGENT', '2-HIGH');
+-------------------+
| avg(o_totalprice) |
+-------------------+
|     150271.119856 |
+-------------------+
1 row in set (5 min 39.08 sec)

单表剖析查问(禁用并行查问)的执行打算剖析:

+----+-------------+--------+...+-------------+
| id | select_type | table  |...| Extra       |
+----+-------------+--------+...+-------------+
|  1 | SIMPLE      | orders |...| Using where |
+----+-------------+--------+...+-------------+

单表剖析查问(启用并行查问)的后果:

MySQL [tpch]> SELECT avg(o_totalprice)
    -> FROM orders
    -> WHERE o_orderdate > date '1995-03-14'
    ->   AND o_orderstatus = 'O'
    ->   AND o_orderpriority not in ('1-URGENT', '2-HIGH');
+-------------------+
| avg(o_totalprice) |
+-------------------+
|     150271.119856 |
+-------------------+
1 row in set (20.24 sec)

单表剖析查问(启用并行查问)的执行打算剖析:

+----+-------------+--------+...+----------------------------------------------------------------------------+
| id | select_type | table  |...| Extra                                                                      |
+----+-------------+--------+...+----------------------------------------------------------------------------+
|  1 | SIMPLE      | orders |...| Using where; Using parallel query (5 columns, 2 filters, 1 exprs; 0 extra) |
+----+-------------+--------+...+----------------------------------------------------------------------------+

多表剖析查问(禁用并行查问)的后果:

MySQL [tpch]> SELECT
    ->   l_orderkey,
    ->   SUM(l_extendedprice * (1-l_discount)) AS revenue,
    ->   o_orderdate,
    ->   o_shippriority
    -> FROM customer, orders, lineitem
    -> WHERE
    ->   c_mktsegment='AUTOMOBILE'
    ->   AND c_custkey = o_custkey
    ->   AND l_orderkey = o_orderkey
    ->   AND o_orderdate < date '1995-03-14'
    ->   AND l_shipdate > date '1995-03-14'
    -> GROUP BY
    ->   l_orderkey,
    ->   o_orderdate,
    ->   o_shippriority
    -> ORDER BY
    ->   revenue DESC,
    ->   o_orderdate LIMIT 15;
+------------+-------------+-------------+----------------+
| l_orderkey | revenue     | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
|   81011334 | 455300.0146 | 1995-03-07  |              0 |
|   28840519 | 454748.2485 | 1995-03-08  |              0 |
|   16384100 | 450935.1906 | 1995-03-02  |              0 |
|   72587110 | 443895.1245 | 1995-03-01  |              0 |
|   11982337 | 433364.5961 | 1995-02-15  |              0 |
|   34736612 | 428316.3377 | 1995-02-19  |              0 |
|   62597284 | 425985.1162 | 1995-03-04  |              0 |
|   59481859 | 421696.5251 | 1995-03-12  |              0 |
|   76740996 | 421355.8745 | 1995-02-25  |              0 |
|   20601378 | 419369.0300 | 1995-03-13  |              0 |
|   23482308 | 418992.5933 | 1995-02-14  |              0 |
|    3400066 | 418830.9286 | 1995-03-06  |              0 |
|   53367108 | 413322.3462 | 1995-03-06  |              0 |
|   44846022 | 412002.8474 | 1995-03-06  |              0 |
|   41160167 | 409386.8393 | 1995-03-09  |              0 |
+------------+-------------+-------------+----------------+
15 rows in set (22 min 1.33 sec)

多表剖析查问(禁用并行查问)的执行打算剖析:

+----+-------------+----------+...+----------------------------------------------+
| id | select_type | table    |...| Extra                                        |
+----+-------------+----------+...+----------------------------------------------+
|  1 | SIMPLE      | customer |...| Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | orders   |...| Using where                                  |
|  1 | SIMPLE      | lineitem |...| Using where                                  |
+----+-------------+----------+...+----------------------------------------------+

多表剖析查问(启用并行查问)的后果:

MySQL [tpch]> SELECT
    ->   l_orderkey,
    ->   SUM(l_extendedprice * (1-l_discount)) AS revenue,
    ->   o_orderdate,
    ->   o_shippriority
    -> FROM customer, orders, lineitem
    -> WHERE
    ->   c_mktsegment='AUTOMOBILE'
    ->   AND c_custkey = o_custkey
    ->   AND l_orderkey = o_orderkey
    ->   AND o_orderdate < date '1995-03-14'
    ->   AND l_shipdate > date '1995-03-14'
    -> GROUP BY
    ->   l_orderkey,
    ->   o_orderdate,
    ->   o_shippriority
    -> ORDER BY
    ->   revenue DESC,
    ->   o_orderdate LIMIT 15;
+------------+-------------+-------------+----------------+
| l_orderkey | revenue     | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
|   81011334 | 455300.0146 | 1995-03-07  |              0 |
|   28840519 | 454748.2485 | 1995-03-08  |              0 |
|   16384100 | 450935.1906 | 1995-03-02  |              0 |
|   72587110 | 443895.1245 | 1995-03-01  |              0 |
|   11982337 | 433364.5961 | 1995-02-15  |              0 |
|   34736612 | 428316.3377 | 1995-02-19  |              0 |
|   62597284 | 425985.1162 | 1995-03-04  |              0 |
|   59481859 | 421696.5251 | 1995-03-12  |              0 |
|   76740996 | 421355.8745 | 1995-02-25  |              0 |
|   20601378 | 419369.0300 | 1995-03-13  |              0 |
|   23482308 | 418992.5933 | 1995-02-14  |              0 |
|    3400066 | 418830.9286 | 1995-03-06  |              0 |
|   53367108 | 413322.3462 | 1995-03-06  |              0 |
|   44846022 | 412002.8474 | 1995-03-06  |              0 |
|   41160167 | 409386.8393 | 1995-03-09  |              0 |
+------------+-------------+-------------+----------------+
15 rows in set (39.74 sec)

多表剖析查问(启用并行查问)的执行打算剖析:

+----+-------------+----------+...+--------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table    |...| Extra                                                                                                                          |
+----+-------------+----------+...+--------------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | customer |...| Using where; Using temporary; Using filesort; Using parallel query (2 columns, 1 filters, 0 exprs; 0 extra)                    |
|  1 | SIMPLE      | orders   |...| Using where; Using join buffer (Hash Join Outer table orders); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)   |
|  1 | SIMPLE      | lineitem |...| Using where; Using join buffer (Hash Join Outer table lineitem); Using parallel query (5 columns, 1 filters, 1 exprs; 0 extra) |
+----+-------------+----------+...+--------------------------------------------------------------------------------------------------------------------------------+

除了监控 Amazon Aurora 数据库集群指标中所述的 Amazon CloudWatch 指标以外,Aurora 还提供了其余全局状态变量。能够应用这些全局状态变量来帮忙监督并行查问执行状况。它们能够让您深刻理解为什么优化程序在给定状况下可能应用或不应用并行查问。残缺变量列表,请参见监控并行查问。
https://docs.aws.amazon.com/z…

运行以下命令来查看并行查问相干的状态:

MySQL [tpch]> SHOW GLOBAL STATUS LIKE 'Aurora_pq%';
+--------------------------------------------------------------+-------+
| Variable_name                                                | Value |
+--------------------------------------------------------------+-------+
| Aurora_pq_max_concurrent_requests                            | 4     |
| Aurora_pq_request_attempted                                  | 1     |
| Aurora_pq_request_attempted_grouping_aggr                    | 0     |
| Aurora_pq_request_attempted_partition_table                  | 0     |
| Aurora_pq_request_by_force_config                            | 1     |
| Aurora_pq_request_by_global_config                           | 0     |
| Aurora_pq_request_by_hint                                    | 0     |
| Aurora_pq_request_by_session_config                          | 0     |
| Aurora_pq_request_executed                                   | 1     |

总结

本文通过测试工作负载,展现了各场景下的查问耗时。并行查问的启用对于 OLTP 事务性查问影响甚微,而对于 OLAP 剖析性查问则显著进步了查问速度。


阐明:单表事务查问为 OLTP 类查问语句,应用了外键,查问速度很快;单表剖析查问和多表剖析查问为 OLAP 类查问语句,须要做全表扫描及关联查问,查问耗时较长。

应用并行查问,您能够对 Aurora MySQL 表运行数据密集型剖析查问。在很多状况下,与传统的查询处理分工相比,性能进步了一个数量级,同时还能放弃外围事务工作负载的高吞吐量。

只需在 Amazon RDS 治理控制台中单击几次或者下载最新的亚马逊云科技开发工具包或 CLI 即可在新的或现有的 Aurora 集群中启用并行查问,请浏览 Aurora 文档理解更多信息。

Amazon RDS 治理控制台:
https://console.aws.amazon.co…

urora 文档:https://docs.aws.amazon.com/A…

您能够将并行查问与寰球数据库等其余 Aurora MySQL 5.7 性能联合应用。此外,此性能对 MySQL 5.7 和 MySQL 5.6 数据库的适用性已扩大到包含中国北京和宁夏的 20 几个亚马逊云科技区域。无关提供并行查问的区域残缺列表,请参阅 Aurora 定价。
https://aws.amazon.com/cn/rds…

本篇作者

林煜晨
亚马逊云科技解决方案架构师

负责互联网行业云端架构征询和设计。从事从年微软解决方案开发及征询,微软认证技术专家。之前就任于汤森路透负责技术专家,参于金融数据平台设计研发,在元数据管理系统架构设计和流程自动化畛域有丰盛教训。

正文完
 0