关于mysql:PostgreSQL-vs-MySQL-TPCH-测试

剖析类查问即使是 TPC- H 中的简略剖析查问，PostgreSQL 的性能都要远远优于 MySQL。昆仑数据库继承并扩大了 PostgreSQL 在剖析类 SQL 查询处理方面的弱小能力，能够反对 TPC- H 和 TPC-DS 的所有查问。同时，一个昆仑数据库集群能够治理的数据规模远远大于一个 PostgreSQL 实例。

服务器配置：PostgreSQL 和 MySQL 别离部署在一台：亚马逊 m5.4xlarge（CPU 8cores 16 Threads, 内存 64G，存储 gp3, 通用型 SSD 卷 3000IOPS，125 MB/s 吞吐量）上。

软件版本：

PostgreSQL：PostgreSQL 12.7 on x86_64-koji-linux-gn

MySQL: percona 8.0.26-16

数据库参数配置：

PostgreSQL：shared_buffers 8192MB

MySQL: innodb_buffer_pool_size 8192MB

测试背景：PostgreSQL 和 MySQL 采纳默认的装置配置，只调整了内存参数，整个测试过程 PostgreSQL 没有任何优化行为。

表的信息：

1 张事实表：lineorder

4 张维度表：customer，part，supplier，dates

表占用操作系统存储空间：19G

数据查问：11 条规范 SQL 查问测试语句（统计查问、多表关联、sum、简单条件、group by、order by 等组合形式）。

具体 SQL 语句：附录 1 TPC- H 测试 SQL。

MySQL 错误信息：ERROR 3 (HY000): Error writing file ‘/kunlun2/data10/6010/tmp/MYfd=332′(OS errno 28-No space left on device)

剖析及总结： 通过比照同一 SQL 语句在 PostgreSQL 和 MySQL 执行打算，会发现 MySQL 的执行打算没有采纳最优的 join 秩序及并行操作，导致性能差。

譬如 Q2.1 MySQL 执行打算：

执行打算剖析： 上述执行打算首先将几个维度表做 join（dates 和 supplier 和 part), 失去的后果再与事实表 lineorder join, 因此得出了一个超级大的两头后果集，数据量达到 10 的 15 次方的数量（查问打算第五行返回的 rows），最终导致临时文件耗尽磁盘空间而未能实现查问。

优化计划： 通过在 SQL 语句中强制指定表 join 秩序：首先与 part 表 join 失去一个最小的数据子集，而后再与 supplier 和 dates join, 逐渐放大范畴，查问语句及查问打算的成果如下：

explain format=tree  selectsum(lo_revenue) as lo_revenue, d_year as year, p_brand from ((lineorderstraight_join part on lo_partkey = p_partkey) straight_join  supplier on  lo_suppkey = s_suppkey)  straight_join dates ON  lo_orderdate = d_datekey where p_category ='MFGR#12' and s_region = 'AMERICA' group by year, p_brand order by year,p_brand;

在 MySQL 上指定 join 秩序，

同时开启动并行查问：set local innodb_parallel_read_threads=16

执行工夫：1 min 53.72 sec

而 PostgreSQL 的执行打算：

执行工夫：1133ms

PostgreSQL 主动采纳最优的 join 秩序，大大减少了最初排序的数据量，并且启动了 3 个并行宰割数据集，因而执行的性能比手动优化后的 MySQL 还快 10 倍。

论断：MySQL 执行 TPC- H 的测试，须要手动优化查问语句，即使如此，性能任然远远低于 PostgreSQL，而 PostgreSQL 默认的配置就能够达到绝对好的性能。

Q1.1

select sum(lo_revenue) as revenue from lineorder 
join dates on lo_orderdate = d_datekey
where d_year  = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;

PostgreSQL 执行打算：

MySQL 执行打算：

Q1.2

 select sum(lo_revenue) as revenue   from lineorder
join dates on lo_orderdate = d_datekey
where d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;

PostgreSQL 执行打算：

MySQL 执行打算：

Q1.3

select sum(lo_revenue) as revenue   from lineorder
join dates on lo_orderdate = d_datekey
where d_weeknuminyear = 6 and year(d_datekey)  = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;

PostgreSQL 执行打算：

MySQL 执行打算：

Q2.1

select sum(lo_revenue) as lo_revenue, d_year as year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by year, p_brand
order by year, p_brand;

PostgreSQL 执行打算：

MySQL 执行打算：

Q2.2

select sum(lo_revenue) as lo_revenue, year(d_datekey) as year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'
group by year, p_brand
order by year, p_brand;

PostgreSQL 执行打算：

MySQL 执行打算：

Q2.3

select sum(lo_revenue) as lo_revenue, year(d_datekey) as year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand = 'MFGR#2239' and s_region = 'EUROPE'
group by year, p_brand
order by year, p_brand;

PostgreSQL 执行打算：

MySQL 执行打算：

Q3.1

select c_nation, s_nation, year(d_datekey) as year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_region = 'ASIA' and s_region = 'ASIA' and year(d_datekey) between 1992 and 1997
group by c_nation, s_nation, year
order by year asc, lo_revenue desc;

PostgreSQL 执行打算：

MySQL 执行打算：

Q3.2

select c_city, s_city, year(d_datekey) as year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and year(d_datekey) between 1992 and 1997
group by c_city, s_city, year
order by year asc, lo_revenue desc;

PostgreSQL 执行打算：

MySQL 执行打算：

Q3.3

select c_city, s_city, year(d_datekey) as year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and year(d_datekey) between 1992 and 1997
group by c_city, s_city, year
order by year asc, lo_revenue desc;

PostgreSQL 执行打算：

MySQL 执行打算：

Q4.1

select year(d_datekey) as year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by year, c_nation
order by year, c_nation;

PostgreSQL 执行打算：

MySQL 执行打算：

Q4.2

select year(d_datekey) as year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_region = 'AMERICA'
and (year(d_datekey) = 1997 or year(d_datekey) = 1998)
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by year, s_nation, p_category
order by year, s_nation, p_category;

PostgreSQL 执行打算：

MySQL 执行打算：

昆仑数据库是一个 HTAP NewSQL 分布式数据库管理系统，能够满足用户对海量关系数据的存储管理和利用的全方位需要。

利用开发者和 DBA 的应用昆仑数据库的体验与单机 MySQL 和单机 PostgreSQL 简直完全相同，因为首先昆仑数据库反对 PostgreSQL 和 MySQL 双协定，反对规范 SQL:2011 的 DML 语法和性能以及 PostgreSQL 和 MySQL 对规范 SQL 的扩大。同时，昆仑数据库集群反对程度弹性扩容，数据主动拆分，分布式事务处理和分布式查询处理，强壮的容错容灾能力，欠缺直观的监测剖析告警能力，集群数据备份和复原等罕用的 DBA 数据管理和操作。所有这些性能无需任何利用零碎侧的编码工作，也无需 DBA 人工染指，不停服不影响业务失常运行。

昆仑数据库具备全面的 OLAP 数据分析能力，通过了 TPC- H 和 TPC-DS 规范测试集，能够实时剖析最新的业务数据，帮忙用户发掘出数据的价值。昆仑数据库反对私有云和公有云环境的部署，能够与 docker，k8s 等云基础设施无缝合作，能够轻松搭建云数据库服务。

请拜访 http://www.zettadb.com/ 获取更多信息并且下载昆仑数据库软件、文档和材料。

KunlunDB 我的项目已开源
【GitHub：】
https://github.com/zettadb
【Gitee：】
https://gitee.com/zettadb

关于mysql:PostgreSQL-vs-MySQL-TPCH-测试

一、测试环境

二、测试数据

三、测试后果

四、附录：测试 SQL 及查问打算

END