剖析类查问即使是TPC-H中的简略剖析查问,PostgreSQL的性能都要远远优于MySQL。 昆仑数据库继承并扩大了PostgreSQL在剖析类SQL查询处理方面的弱小能力,能够反对TPC-H和TPC-DS的所有查问。同时,一个昆仑数据库集群能够治理的数据规模远远大于 一个PostgreSQL 实例。
一、测试环境
服务器配置:PostgreSQL和MySQL别离部署在一台:亚马逊m5.4xlarge(CPU 8cores 16 Threads,内存 64G,存储gp3, 通用型SSD卷 3000IOPS ,125 MB/s 吞吐量)上。
软件版本:
PostgreSQL:PostgreSQL 12.7 on x86_64-koji-linux-gn
MySQL: percona 8.0.26-16
数据库参数配置:
PostgreSQL:shared_buffers 8192MB
MySQL: innodb_buffer_pool_size 8192MB
测试背景:PostgreSQL和MySQL采纳默认的装置配置,只调整了内存参数, 整个测试过程PostgreSQL没有任何优化行为。
二、测试数据
表的信息:
1张事实表:lineorder
4张维度表:customer,part,supplier,dates
表占用操作系统存储空间:19G
数据查问:11条规范SQL查问测试语句(统计查问、多表关联、sum、简单条件、group by、order by等组合形式)。
具体SQL语句:附录1 TPC-H测试SQL。
三、测试后果
MySQL错误信息:ERROR 3 (HY000): Error writing file '/kunlun2/data10/6010/tmp/MYfd=332'(OS errno 28-No space left on device)
剖析及总结:通过比照同一SQL语句在PostgreSQL和MySQL执行打算,会发现MySQL的执行打算没有采纳最优的join秩序及并行操作,导致性能差。
譬如Q2.1 MySQL执行打算:
执行打算剖析:上述执行打算首先将几个维度表做join(dates和supplier和part),失去的后果再与事实表lineorder join,因此得出了一个超级大的两头后果集,数据量达到10的15次方的数量(查问打算第五行返回的 rows),最终导致临时文件耗尽磁盘空间而未能实现查问。
优化计划:通过在SQL语句中强制指定表join秩序:首先与part表join失去一个最小的数据子集,而后再与supplier和dates join,逐渐放大范畴,查问语句及查问打算的成果如下:
explain format=tree selectsum(lo_revenue) as lo_revenue, d_year as year, p_brand from ((lineorderstraight_join part on lo_partkey = p_partkey) straight_join supplier on lo_suppkey = s_suppkey) straight_join dates ON lo_orderdate = d_datekey where p_category ='MFGR#12' and s_region = 'AMERICA' group by year, p_brand order by year,p_brand;
在MySQL上指定join秩序,
同时开启动并行查问:set local innodb_parallel_read_threads=16
执行工夫:1 min 53.72 sec
而PostgreSQL的执行打算:
执行工夫:1133ms
PostgreSQL主动采纳最优的join秩序,大大减少了最初排序的数据量,并且启动了3个并行宰割数据集,因而执行的性能比手动优化后的MySQL还快10倍。
论断:MySQL执行TPC-H的测试,须要手动优化查问语句,即使如此,性能任然远远低于PostgreSQL,而PostgreSQL默认的配置就能够达到绝对好的性能。
四、附录:测试SQL及查问打算
Q1.1
select sum(lo_revenue) as revenue from lineorder join dates on lo_orderdate = d_datekeywhere d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
PostgreSQL执行打算:
MySQL执行打算:
Q1.2
select sum(lo_revenue) as revenue from lineorderjoin dates on lo_orderdate = d_datekeywhere d_yearmonthnum = 199401and lo_discount between 4 and 6and lo_quantity between 26 and 35;
PostgreSQL执行打算:
MySQL执行打算:
Q1.3
select sum(lo_revenue) as revenue from lineorderjoin dates on lo_orderdate = d_datekeywhere d_weeknuminyear = 6 and year(d_datekey) = 1994and lo_discount between 5 and 7and lo_quantity between 26 and 35;
PostgreSQL执行打算:
MySQL执行打算:
Q2.1
select sum(lo_revenue) as lo_revenue, d_year as year, p_brandfrom lineorderjoin dates on lo_orderdate = d_datekeyjoin part on lo_partkey = p_partkeyjoin supplier on lo_suppkey = s_suppkeywhere p_category = 'MFGR#12' and s_region = 'AMERICA'group by year, p_brandorder by year, p_brand;
PostgreSQL执行打算:
MySQL执行打算:
Q2.2
select sum(lo_revenue) as lo_revenue, year(d_datekey) as year, p_brandfrom lineorderjoin dates on lo_orderdate = d_datekeyjoin part on lo_partkey = p_partkeyjoin supplier on lo_suppkey = s_suppkeywhere p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'group by year, p_brandorder by year, p_brand;
PostgreSQL执行打算:
MySQL执行打算:
Q2.3
select sum(lo_revenue) as lo_revenue, year(d_datekey) as year, p_brandfrom lineorderjoin dates on lo_orderdate = d_datekeyjoin part on lo_partkey = p_partkeyjoin supplier on lo_suppkey = s_suppkeywhere p_brand = 'MFGR#2239' and s_region = 'EUROPE'group by year, p_brandorder by year, p_brand;
PostgreSQL执行打算:
MySQL执行打算:
Q3.1
select c_nation, s_nation, year(d_datekey) as year, sum(lo_revenue) as lo_revenuefrom lineorderjoin dates on lo_orderdate = d_datekeyjoin customer on lo_custkey = c_custkeyjoin supplier on lo_suppkey = s_suppkeywhere c_region = 'ASIA' and s_region = 'ASIA' and year(d_datekey) between 1992 and 1997group by c_nation, s_nation, yearorder by year asc, lo_revenue desc;
PostgreSQL执行打算:
MySQL执行打算:
Q3.2
select c_city, s_city, year(d_datekey) as year, sum(lo_revenue) as lo_revenuefrom lineorderjoin dates on lo_orderdate = d_datekeyjoin customer on lo_custkey = c_custkeyjoin supplier on lo_suppkey = s_suppkeywhere c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'and year(d_datekey) between 1992 and 1997group by c_city, s_city, yearorder by year asc, lo_revenue desc;
PostgreSQL执行打算:
MySQL执行打算:
Q3.3
select c_city, s_city, year(d_datekey) as year, sum(lo_revenue) as lo_revenuefrom lineorderjoin dates on lo_orderdate = d_datekeyjoin customer on lo_custkey = c_custkeyjoin supplier on lo_suppkey = s_suppkeywhere (c_city='UNITED KI1' or c_city='UNITED KI5')and (s_city='UNITED KI1' or s_city='UNITED KI5')and year(d_datekey) between 1992 and 1997group by c_city, s_city, yearorder by year asc, lo_revenue desc;
PostgreSQL执行打算:
MySQL执行打算:
Q4.1
select year(d_datekey) as year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profitfrom lineorderjoin dates on lo_orderdate = d_datekeyjoin customer on lo_custkey = c_custkeyjoin supplier on lo_suppkey = s_suppkeyjoin part on lo_partkey = p_partkeywhere c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')group by year, c_nationorder by year, c_nation;
PostgreSQL执行打算:
MySQL执行打算:
Q4.2
select year(d_datekey) as year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profitfrom lineorderjoin dates on lo_orderdate = d_datekeyjoin customer on lo_custkey = c_custkeyjoin supplier on lo_suppkey = s_suppkeyjoin part on lo_partkey = p_partkeywhere c_region = 'AMERICA'and s_region = 'AMERICA'and (year(d_datekey) = 1997 or year(d_datekey) = 1998)and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')group by year, s_nation, p_categoryorder by year, s_nation, p_category;
PostgreSQL执行打算:
MySQL执行打算:
END
昆仑数据库是一个HTAP NewSQL分布式数据库管理系统,能够满足用户对海量关系数据的存储管理和利用的全方位需要。
利用开发者和DBA的应用昆仑数据库的体验与单机MySQL和单机PostgreSQL简直完全相同,因为首先昆仑数据库反对PostgreSQL和MySQL双协定,反对规范SQL:2011的 DML 语法和性能以及PostgreSQL和MySQL对规范 SQL的扩大。同时,昆仑数据库集群反对程度弹性扩容,数据主动拆分,分布式事务处理和分布式查询处理,强壮的容错容灾能力,欠缺直观的监测剖析告警能力,集群数据备份和复原等 罕用的DBA 数据管理和操作。所有这些性能无需任何利用零碎侧的编码工作,也无需DBA人工染指,不停服不影响业务失常运行。
昆仑数据库具备全面的OLAP 数据分析能力,通过了TPC-H和TPC-DS规范测试集,能够实时剖析最新的业务数据,帮忙用户发掘出数据的价值。昆仑数据库反对私有云和公有云环境的部署,能够与docker,k8s等云基础设施无缝合作,能够轻松搭建云数据库服务。
请拜访 http://www.zettadb.com/ 获取更多信息并且下载昆仑数据库软件、文档和材料。
KunlunDB我的项目已开源
【GitHub:】
https://github.com/zettadb
【Gitee:】
https://gitee.com/zettadb