简介:本文将会介绍在Hologres中如何基于TPCH数据集做性能测试,并提供测试后果参考,不便您进行产品规格选型。
背景信息
TPC-H(商业智能计算测试)是美国交易解决效力委员会(TPC,Transaction Processing Performance Council)组织制订的用来模仿决策反对类利用的一个测试集。目前在学术界和工业界广泛采纳它来评估决策反对技术方面利用的性能。TPC-H 是依据实在的生产运行环境来建模,模仿了一套销售零碎的数据仓库。其共蕴含 8 张表,数据量可设定从 1G~3T 不等。其基准测试共蕴含了22个查问,次要评估指标各个查问的响应工夫,即从提交查问到后果返回所需工夫。其测试后果可综合反映零碎解决查问时的能力。详情参考TPCH 文档。
数据集介绍
该数据集蕴含如下 8 张表,互相间的关系如下图所示。
测试详情
测试数据量阐明
测试数据量会间接影响测试后果,TPC-H 的生成工具中应用 SF ( scale factor ) 管制生成数据的数据量的大小,1 SF 对应 1 GB。
留神:以上提及的数据量仅仅为原始数据的数据量,不包含索引等空间占用,所以筹备环境时,须要预留更多的空间。
测试环境
本次测试应用了独享实例(按量付费)的实例,因为仅为测试示意应用,所以计算资源配置抉择了8核32G。
测试场景
本测试场景次要蕴含3局部:
- OLAP查问场景测试,次要应用列存表,间接应用TPCH测试中的22条查问;
- Key/Value点查场景测试,次要应用行存表,针对orders应用行存表后,进行主键过滤的点查;
根底环境筹备
- 该步骤次要用于筹备OLAP查问场景和Key/Value点查场景所需的数据;
根底环境筹备
1. 创立 ECS 实例
登陆阿里云,创立一个 ECS 实例,用于数据生成、向 Hologres 导入数据、客户端测试。倡议规格:
- ecs.g6.4xlarge 规格
- CentOS 7.9 零碎
- ESSD 数据盘,具体数据容量依据须要测试的数据量大小决定
- 倡议 ECS 与 Hologres 实例用雷同 Region 和 VPC 网络
2. 创立 Hologres 实例
- 登陆阿里云,进入 Hologres 产品控制台,点击新增引擎实例
- 抉择配置,并填写实例名称,具体阐明请参考官网文档。
3. 创立测试数据库
- 在创立实例后,您须要登陆您创立的 Hologres 实例,创立一个数据库,本测试中命名数据库为
tpch_1sf
,具体操作步骤请参考官网文档
生成 TPC-H 数据
1. 筹备数据生成工具
- 近程链接 ECS 实例
- 更新所有库
yum update
- 装置 git
yum install git
- 装置gcc
yum install gcc
- 下载 TPC-H 数据生成代码
git clone https://github.com/gregrahn/tpch-kit.git
- 进入数据生成工具代码目录
cd tpch-kit/dbgen
- 编译数据生成工具代码
make
2. 生成数据
- 编译胜利后,您能够应用如下代码查看代码生成工具的相干参数。
./dbgen --help
- 本次测试仅生成 1 GB 数据,所以运行如下代码生成数据。
./dbgen -vf -s 1
如您须要生成更多数据量的数据,能够调整 SF 的参数,例如您能够应用如下代码生成 1 T 数据
./dbgen -vf -s 1000
- 个别状况下,32CU 能够跑 TPCH SF10,256CU 能够跑 TPCH SF50
- 数据生成后,您能够应用如下代码查看生成的文件。能够看到生成工具生成了 8 个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'
OLAP查问场景测试
筹备数据
1. 创立表
- 因为本文次要应用 psql 进行数据导入操作,须要先在 ECS 中运行如下命令装置 psql
yum install postgresql-server
- 装置 psql 后,您能够应用如下命令登陆 Hologres 实例
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
- 应用psql连贯Hologres后,您能够应用如下建表语句创立数据库表
DROP TABLE IF EXISTS LINEITEM;BEGIN;CREATE TABLE LINEITEM( L_ORDERKEY INT NOT NULL, L_PARTKEY INT NOT NULL, L_SUPPKEY INT NOT NULL, L_LINENUMBER INT NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG TEXT NOT NULL, L_LINESTATUS TEXT NOT NULL, L_SHIPDATE TIMESTAMPTZ NOT NULL, L_COMMITDATE TIMESTAMPTZ NOT NULL, L_RECEIPTDATE TIMESTAMPTZ NOT NULL, L_SHIPINSTRUCT TEXT NOT NULL, L_SHIPMODE TEXT NOT NULL, L_COMMENT TEXT NOT NULL, PRIMARY KEY (L_ORDERKEY,L_LINENUMBER));CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS ORDERS;BEGIN;CREATE TABLE ORDERS( O_ORDERKEY INT NOT NULL PRIMARY KEY, O_CUSTKEY INT NOT NULL, O_ORDERSTATUS TEXT NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE timestamptz NOT NULL, O_ORDERPRIORITY TEXT NOT NULL, O_CLERK TEXT NOT NULL, O_SHIPPRIORITY INT NOT NULL, O_COMMENT TEXT NOT NULL);CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS PARTSUPP;BEGIN;CREATE TABLE PARTSUPP( PS_PARTKEY INT NOT NULL, PS_SUPPKEY INT NOT NULL, PS_AVAILQTY INT NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT TEXT NOT NULL, PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY));CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS PART;BEGIN;CREATE TABLE PART( P_PARTKEY INT NOT NULL PRIMARY KEY, P_NAME TEXT NOT NULL, P_MFGR TEXT NOT NULL, P_BRAND TEXT NOT NULL, P_TYPE TEXT NOT NULL, P_SIZE INT NOT NULL, P_CONTAINER TEXT NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT TEXT NOT NULL);CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');CALL set_table_property('PART', 'colocate_with', 'LINEITEM');CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS CUSTOMER;BEGIN;CREATE TABLE CUSTOMER( C_CUSTKEY INT NOT NULL PRIMARY KEY, C_NAME TEXT NOT NULL, C_ADDRESS TEXT NOT NULL, C_NATIONKEY INT NOT NULL, C_PHONE TEXT NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT TEXT NOT NULL, C_COMMENT TEXT NOT NULL);CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS SUPPLIER;BEGIN;CREATE TABLE SUPPLIER( S_SUPPKEY INT NOT NULL PRIMARY KEY, S_NAME TEXT NOT NULL, S_ADDRESS TEXT NOT NULL, S_NATIONKEY INT NOT NULL, S_PHONE TEXT NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT TEXT NOT NULL);CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS NATION;BEGIN;CREATE TABLE NATION( N_NATIONKEY INT NOT NULL PRIMARY KEY, N_NAME text NOT NULL, N_REGIONKEY INT NOT NULL, N_COMMENT text NOT NULL);CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS REGION;BEGIN;CREATE TABLE REGION( R_REGIONKEY INT NOT NULL PRIMARY KEY, R_NAME TEXT NOT NULL, R_COMMENT TEXT);CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');COMMIT;
- 创立结束后,您能在 psql 中应用如下代码查看是否创立胜利
tpch_1sf=# \dt
- 若胜利,事实成果如下
tpch_1sf=# \dt List of relations Schema | Name | Type | Owner --------+----------+-------+-------------------- public | customer | table | tpch_1sf_developer public | lineitem | table | tpch_1sf_developer public | nation | table | tpch_1sf_developer public | orders | table | tpch_1sf_developer public | part | table | tpch_1sf_developer public | partsupp | table | tpch_1sf_developer public | region | table | tpch_1sf_developer public | supplier | table | tpch_1sf_developer(8 rows)
2. 导入数据
- 本测试计划次要应用
COPY FROM STDIN
的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。 - 您能够在数据生成工具的目录中参考如下
shell
脚本导入数据
for i in `ls *.tbl`; do echo $i; name=`echo $i| cut -d'.' -f1`; PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;done
- 至此您已实现数据导入
3. 收集统计信息
- 为了更好的执行查问,能够在 psql 中应用如下语句,使 Hologres 收集各张表特色信息。
vacuum region;vacuum nation;vacuum supplier;vacuum customer;vacuum part;vacuum partsupp;vacuum orders;vacuum lineitem;analyze nation;analyze region;analyze lineitem;analyze orders;analyze customer;analyze part;analyze partsupp;analyze supplier;
执行查问
- 为了不便统计查问信息,须要应用pgbench工具,您能够应用如下命令装置pgbench(如果测试机上已有pgbench,请确保版本大于9.6以上,最好大版本是13以上,否则以下测试会遇到各种不兼容)
yum install postgresql-contrib
- 为了不便查问,您能够间接通过以下连贯,下载所需的22条SQL
tpch\_data\_tpch\_query.zip
- 而后上传至ECS
- 进入ECS,并进入上传文件的目录,应用如下shell命令解压缩文件
unzip tpch_data_tpch_query
- 至此,您曾经实现了筹备工作,即可应用pgbench进行测试,您能够应用如下命令执行单条查问
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
- 参数解释
配置项 | 参数 | 阐明 |
---|---|---|
-h | Hologres实例的endpoint | 在Hologres管控台查看 |
-p | Hologres实例的端口地址 | 在Hologres管控台查看 |
-d | Hologres指定实例中的数据库名 | |
-c | 客户端数目(并发度) | 示例:1,因为该测试仅测试查问性能,不测试并发,所以并发度置为1即可 |
-t | 每个客户端须要执行的压测query数目 | 50 |
-f | 压测的sql | 示例:6.sql |
- 也能够间接执行如下
shell
脚本,间接批量执行22条查问,并将后果输入到文件hologres\_tpch\_test.out中
rm -f hologres_tpch_test.outecho `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.outfor i in {1..22}do PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.outdone
查看hologres\_tpch\_test.out即可失去查问后果,样例如下
- transaction type:阐明了执行的具体的SQL文件
- latency average:记录了对应SQL文件的3次查问的均匀工夫
2021-03-23 03:50:54 beginpghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100transaction type: ./tpch_data_tpch_query/1.sqlscaling factor: 1query mode: simplenumber of clients: 1number of threads: 1number of transactions per client: 3number of transactions actually processed: 3/3latency average = 76.936 mstps = 12.997850 (including connections establishing)tps = 15.972757 (excluding connections establishing)...
TPCH 22条查问语句
Q1
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_orderfrom lineitemwhere l_shipdate <= date '1998-12-01' - interval '90' day group by l_returnflag, l_linestatusorder by l_returnflag, l_linestatus;
Q2
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_commentfrom part, supplier, partsupp, nation, regionwhere p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' )order by s_acctbal desc, n_name, s_name, p_partkeylimit 100;
Q3
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriorityfrom customer, orders, lineitemwhere c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-15' and l_shipdate > date '1995-03-15'group by l_orderkey, o_orderdate, o_shippriorityorder by revenue desc, o_orderdatelimit 10;
Q4
select o_orderpriority, count(*) as order_countfrom orderswhere o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate )group by o_orderpriorityorder by o_orderpriority;
Q5
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenuefrom customer, orders, lineitem, supplier, nation, regionwhere c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' yeargroup by n_nameorder by revenue desc;
Q6
select sum(l_extendedprice * l_discount) as revenuefrom lineitemwhere l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between 6 - 1 and 6 + 1 and l_quantity < 2400
Q7
set hg_experimental_enable_double_equivalent=on;select supp_nation, cust_nation, l_year, sum(volume) as revenuefrom ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shippinggroup by supp_nation, cust_nation, l_yearorder by supp_nation, cust_nation, l_year;
Q8
set hg_experimental_enable_double_equivalent=on;select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_sharefrom ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'STANDARD POLISHED TIN' ) as all_nationsgroup by o_yearorder by o_year;
Q9
set hg_experimental_enable_double_equivalent=on;select nation, o_year, sum(amount) as sum_profitfrom ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%green%' ) as profitgroup by nation, o_yearorder by nation, o_year desc;
Q10
select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_commentfrom customer, orders, lineitem, nationwhere c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkeygroup by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_commentorder by revenue desclimit 20;
Q11
select ps_partkey, sum(ps_supplycost * ps_availqty) as valuefrom partsupp, supplier, nationwhere ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY'group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0000010000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' )order by value desc limit 100;
Q12
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_countfrom orders, lineitemwhere o_orderkey = l_orderkey and l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1994-01-01' and l_receiptdate < date '1994-01-01' + interval '1' yeargroup by l_shipmodeorder by l_shipmode;
Q13
select c_count, count(*) as custdistfrom ( select c_custkey, count(o_orderkey) from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ) as c_orders (c_custkey, c_count)group by c_countorder by custdist desc, c_count desc;
Q14
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenuefrom lineitem, partwhere l_partkey = p_partkey and l_shipdate >= date '1995-09-01' and l_shipdate < date '1995-09-01' + interval '1' month;
Q15
with revenue0(SUPPLIER_NO, TOTAL_REVENUE) as ( select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '1995-12-01' and l_shipdate < date '1995-12-01' + interval '3' month group by l_suppkey )select s_suppkey, s_name, s_address, s_phone, total_revenuefrom supplier, revenue0where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 )order by s_suppkey;
Q16
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cntfrom partsupp, partwhere p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' )group by p_brand, p_type, p_sizeorder by supplier_cnt desc, p_brand, p_type, p_size;
Q17
select sum(l_extendedprice) / 7.0 as avg_yearlyfrom lineitem, partwhere p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey and l_partkey in( select p_partkey from part where p_brand = 'Brand#23' and p_container = 'MED BOX') );
Q18
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)from customer, orders, lineitemwhere o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) and c_custkey = o_custkey and o_orderkey = l_orderkeygroup by c_name, c_custkey, o_orderkey, o_orderdate, o_totalpriceorder by o_totalprice desc, o_orderdatelimit 100;
Q19
select sum(l_extendedprice* (1 - l_discount)) as revenuefrom lineitem, partwhere ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 1 and l_quantity <= 1 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#34' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );
Q20
select s_name, s_addressfrom supplier, nationwhere s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'CANADA'order by s_name;
Q21
select s_name, count(*) as numwaitfrom supplier, lineitem l1, orders, nationwhere s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA'group by s_nameorder by numwait desc, s_namelimit 100;
Q22
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbalfrom ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where substring(c_phone from 1 for 2) in ('13', '31', '23', '29', '30', '18', '17') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone from 1 for 2) in ('13', '31', '23', '29', '30', '18', '17') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsalegroup by cntrycodeorder by cntrycode;
Key/Value点查场景测试
筹备数据
1. 创立表
- 持续应用OLAP查问场景创立的数据库,咱们会应用TPCH数据集中的orders表进行测试,应用psql连贯Hologres后,您能够应用如下建表语句创立数据库表;
留神:点查场景须要应用行存表,所以须要创立一张新表,不能应用OLAP查问场景中应用的表
DROP TABLE IF EXISTS orders_row;BEGIN;CREATE TABLE public.orders_row ( "o_orderkey" int8 NOT NULL, "o_custkey" int8, "o_orderstatus" bpchar(1), "o_totalprice" numeric(15,2), "o_orderdate" date, "o_orderpriority" bpchar(15), "o_clerk" bpchar(15), "o_shippriority" int8, "o_comment" varchar(79),PRIMARY KEY (o_orderkey));CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');COMMIT;
2. COPY形式导入数据
- 本测试计划次要应用
COPY FROM STDIN
的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。 - 您能够在数据生成工具的目录中参考如下命令导入数据
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl
3. INSERT INTO形式导入数据
- 因为OLAP场景时您曾经导入了orders表的数据,您能够运行如下SQL语句导入数据
INSERT INTO public.orders_rowSELECT *FROM public.orders;
查问
1. 生成查问语句
- Key/Value点查场景次要的查问语句特色如下
SELECT column_a ,column_b ,... ,column_xFROM table_xWHERE pk = value_x;
或
SELECT column_a ,column_b ,... ,column_xFROM table_xWHERE pk IN ( value_a, value_b,..., value_x );
您能够应用如下脚本生成所需的sql,该脚本会生成2条sql
- kv\_query\_single.sql 针对单值筛选的SQL
- kv\_query\_in.sql 针对多值筛选的SQL,该脚本会随机生成一个针对10个值筛选的SQL
rm -rf kv_querymkdir kv_querycd kv_queryecho '\set column_values random(1,99999999)select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sqlecho '\set column_values1 random(1,99999999)\set column_values2 random(1,99999999)\set column_values3 random(1,99999999)\set column_values4 random(1,99999999)\set column_values5 random(1,99999999)\set column_values6 random(1,99999999)\set column_values7 random(1,99999999)\set column_values8 random(1,99999999)\set column_values9 random(1,99999999)\set column_values10 random(1,99999999)select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql
2. 进行查问
- 查问须要应用pgbench,您能够应用如下命令装置pgbench
yum install postgresql-contrib
- 之后您即可应用pgbench进行压测,针对单值筛选的场景
留神,请在生成SQL的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
- 针对多值筛选的场景
留神,请在生成SQL的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
- 参数解释
配置项 | 参数 | 阐明 |
---|---|---|
-h | Hologres实例的endpoint | 在Hologres管控台查看 |
-p | Hologres实例的端口地址 | 在Hologres管控台查看 |
-d | Hologres指定实例中的数据库名 | |
-c | 客户端数目(并发度) | 示例:8 |
-t | 每个客户端须要执行的压测query数目 | 50 |
-f | 压测的sql | 示例:6.sql |
测试后果参考
测试数据量:
- 本测试基于TPCH 100G的数据集进行测试,具体数据量如下表所示
表名 | 行数 |
---|---|
LINEITEM | 600,037,902 |
ORDERS | 150,000,000 |
PARTSUPP | 80,000,000 |
PART | 20,000,000 |
CUSTOMER | 15,000,000 |
SUPPLIER | 1,000,000 |
NATION | 25 |
REGION | 5 |
集群规格
计算资源 | 存储容量 | 软件版本 | 备注 |
---|---|---|---|
64 CU | |||
(CPU:64 Core 内存:256 GB) | 100 GB | r0.10.20 | 应用集群默认配置,Shard数量:40 |
128 CU | |||
(CPU:128 Core 内存:512 GB) | 100 GB | r0.10.20 | 应用集群默认配置,Shard数量:80 |
测试工夫:2021年6月
测试后果
数据导入工夫
- 数据导入执行工夫以秒(s)为单位。
- 导入工夫指将数据导入Hologres内表
- 在应用COPY办法导入数据时,一张表对应一个数据文件,并未应用并发导入形式。
- 具体数值如下表所示
阐明:应用COPY形式导入时一张表对应一个数据文件,并未应用并发导入形式
表名 | 行数 | 数据量 | Hologres 64CU | ||
---|---|---|---|---|---|
应用COPY形式导入(公网网络) | 应用COPY形式导入(VPC网络导入) | 应用MaxCompute表面导入 | |||
LINEITEM | 600,037,902 | 73.6GB | 3,070.453 | 694.364 | 148.165 |
ORDERS | 150,000,000 | 16.4GB | 691.060 | 172.529 | 37.741 |
PARTSUPP | 80,000,000 | 2.3GB | 468.560 | 107.092 | 18.488 |
PART | 20,000,000 | 11.3GB | 96.342 | 24.020 | 8.083 |
CUSTOMER | 15,000,000 | 2.3GB | 95.190 | 22.937 | 10.363 |
SUPPLIER | 1,000,000 | 132MB | 5.057 | 1.803 | 1.503 |
NATION | 25 | 2KB | 0.580 | 0.584 | 0.747 |
REGION | 5 | 0.375KB | 0.168 | 0.153 | 0.430 |
Total | 106G | 4427.410 | 1023.482 | 225.52 |
- 下图中蓝色为应用COPY形式在公网条件下导入数据的工夫,绿色为应用COPY形式在VPC网络条件下导入数据的工夫,灰色为应用MaxCompute表面形式导入的工夫
- 纵坐标数值越低,示意导入速度越快
- 横轴:表名。纵轴:数据导入工夫(s)
- 能够看出,因为网络带宽影响,应用COPY形式导入本地文件数据时,应用VPC网络导入数据工夫显著短于应用公网导入数据工夫;应用MaxCompute导入数据工夫显著短于应用COPY形式导入本地文件数据工夫。
查问工夫
- 查问执行工夫以秒(s)为单位。
- 查问后果均基于Hologres内表
- 具体数值如下表所示
TPCH Query编号 | Hologres 64CU | Hologres 128CU |
---|---|---|
1 | 3.120 | 2.150 |
2 | 0.581 | 0.467 |
3 | 1.735 | 1.005 |
4 | 1.558 | 0.836 |
5 | 2.921 | 1.917 |
6 | 0.297 | 0.096 |
7 | 2.006 | 1.029 |
8 | 2.674 | 1.679 |
9 | 5.298 | 2.796 |
10 | 1.944 | 0.924 |
11 | 0.397 | 0.297 |
12 | 1.531 | 0.852 |
13 | 1.741 | 0.971 |
14 | 0.286 | 0.160 |
15 | 0.293 | 0.177 |
16 | 1.223 | 1.020 |
17 | 1.405 | 0.607 |
18 | 3.817 | 2.169 |
19 | 1.400 | 0.622 |
20 | 1.358 | 0.868 |
21 | 4.164 | 2.047 |
22 | 1.121 | 0.654 |
Total | 40.870 | 23.343 |
- 下图中蓝色为64CU的实例的查问后果,绿色为128CU实例的查问后果
- 纵坐标数值越低,示意 TPC-H 性能越好。
- 能够看出随着实例规模的老本增长,查问工夫也在成线性降落趋势
- 横轴:query在文档中的编号。纵轴:query执行工夫(s)
版权申明:本文内容由阿里云实名注册用户自发奉献,版权归原作者所有,阿里云开发者社区不领有其著作权,亦不承当相应法律责任。具体规定请查看《阿里云开发者社区用户服务协定》和《阿里云开发者社区知识产权爱护指引》。如果您发现本社区中有涉嫌剽窃的内容,填写侵权投诉表单进行举报,一经查实,本社区将立即删除涉嫌侵权内容。