简介:本文将会介绍在 Hologres 中如何基于 TPCH 数据集做性能测试,并提供测试后果参考,不便您进行产品规格选型。
背景信息
TPC-H(商业智能计算测试)是美国交易解决效力委员会(TPC,Transaction Processing Performance Council)组织制订的用来模仿决策反对类利用的一个测试集。目前在学术界和工业界广泛采纳它来评估决策反对技术方面利用的性能。TPC-H 是依据实在的生产运行环境来建模,模仿了一套销售零碎的数据仓库。其共蕴含 8 张表,数据量可设定从 1G~3T 不等。其基准测试共蕴含了 22 个查问,次要评估指标各个查问的响应工夫,即从提交查问到后果返回所需工夫。其测试后果可综合反映零碎解决查问时的能力。详情参考 TPCH 文档。
数据集介绍
该数据集蕴含如下 8 张表,互相间的关系如下图所示。
测试详情
测试数据量阐明
测试数据量会间接影响测试后果,TPC-H 的生成工具中应用 SF (scale factor) 管制生成数据的数据量的大小,1 SF 对应 1 GB。
留神:以上提及的数据量仅仅为 原始数据的数据量,不包含索引等空间占用,所以筹备环境时,须要预留更多的空间。
测试环境
本次测试应用了独享实例(按量付费)的实例,因为仅为测试示意应用,所以计算资源配置抉择了 8 核 32G。
测试场景
本测试场景次要蕴含 3 局部:
- OLAP 查问场景测试,次要应用列存表,间接应用 TPCH 测试中的 22 条查问;
- Key/Value 点查场景测试,次要应用行存表,针对 orders 应用行存表后,进行主键过滤的点查;
根底环境筹备
- 该步骤次要用于筹备 OLAP 查问场景和 Key/Value 点查场景所需的数据;
根底环境筹备
1. 创立 ECS 实例
登陆阿里云,创立一个 ECS 实例,用于数据生成、向 Hologres 导入数据、客户端测试。倡议规格:
- ecs.g6.4xlarge 规格
- CentOS 7.9 零碎
- ESSD 数据盘,具体数据容量依据须要测试的数据量大小决定
- 倡议 ECS 与 Hologres 实例用雷同 Region 和 VPC 网络
2. 创立 Hologres 实例
- 登陆阿里云,进入 Hologres 产品控制台,点击 新增引擎实例
- 抉择配置,并填写实例名称,具体阐明请参考官网文档。
3. 创立测试数据库
- 在创立实例后,您须要登陆您创立的 Hologres 实例,创立一个数据库,本测试中命名数据库为
tpch_1sf
,具体操作步骤请参考官网文档
生成 TPC-H 数据
1. 筹备数据生成工具
- 近程链接 ECS 实例
- 更新所有库
yum update
- 装置 git
yum install git
- 装置 gcc
yum install gcc
- 下载 TPC-H 数据生成代码
git clone https://github.com/gregrahn/tpch-kit.git
- 进入数据生成工具代码目录
cd tpch-kit/dbgen
- 编译数据生成工具代码
make
2. 生成数据
- 编译胜利后,您能够应用如下代码查看代码生成工具的相干参数。
./dbgen --help
- 本次测试仅生成 1 GB 数据,所以运行如下代码生成数据。
./dbgen -vf -s 1
如您须要生成更多数据量的数据,能够调整 SF 的参数,例如您能够应用如下代码生成 1 T 数据
./dbgen -vf -s 1000
- 个别状况下,32CU 能够跑 TPCH SF10,256CU 能够跑 TPCH SF50
- 数据生成后,您能够应用如下代码查看生成的文件。能够看到生成工具生成了 8 个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'
OLAP 查问场景测试
筹备数据
1. 创立表
- 因为本文次要应用 psql 进行数据导入操作,须要先在 ECS 中运行如下命令装置 psql
yum install postgresql-server
- 装置 psql 后,您能够应用如下命令登陆 Hologres 实例
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
- 应用 psql 连贯 Hologres 后,您能够应用如下建表语句创立数据库表
DROP TABLE IF EXISTS LINEITEM;
BEGIN;
CREATE TABLE LINEITEM
(
L_ORDERKEY INT NOT NULL,
L_PARTKEY INT NOT NULL,
L_SUPPKEY INT NOT NULL,
L_LINENUMBER INT NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG TEXT NOT NULL,
L_LINESTATUS TEXT NOT NULL,
L_SHIPDATE TIMESTAMPTZ NOT NULL,
L_COMMITDATE TIMESTAMPTZ NOT NULL,
L_RECEIPTDATE TIMESTAMPTZ NOT NULL,
L_SHIPINSTRUCT TEXT NOT NULL,
L_SHIPMODE TEXT NOT NULL,
L_COMMENT TEXT NOT NULL,
PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS ORDERS;
BEGIN;
CREATE TABLE ORDERS
(
O_ORDERKEY INT NOT NULL PRIMARY KEY,
O_CUSTKEY INT NOT NULL,
O_ORDERSTATUS TEXT NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE timestamptz NOT NULL,
O_ORDERPRIORITY TEXT NOT NULL,
O_CLERK TEXT NOT NULL,
O_SHIPPRIORITY INT NOT NULL,
O_COMMENT TEXT NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PARTSUPP;
BEGIN;
CREATE TABLE PARTSUPP
(
PS_PARTKEY INT NOT NULL,
PS_SUPPKEY INT NOT NULL,
PS_AVAILQTY INT NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT TEXT NOT NULL,
PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PART;
BEGIN;
CREATE TABLE PART
(
P_PARTKEY INT NOT NULL PRIMARY KEY,
P_NAME TEXT NOT NULL,
P_MFGR TEXT NOT NULL,
P_BRAND TEXT NOT NULL,
P_TYPE TEXT NOT NULL,
P_SIZE INT NOT NULL,
P_CONTAINER TEXT NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT TEXT NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'colocate_with', 'LINEITEM');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
C_CUSTKEY INT NOT NULL PRIMARY KEY,
C_NAME TEXT NOT NULL,
C_ADDRESS TEXT NOT NULL,
C_NATIONKEY INT NOT NULL,
C_PHONE TEXT NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT TEXT NOT NULL,
C_COMMENT TEXT NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS SUPPLIER;
BEGIN;
CREATE TABLE SUPPLIER
(
S_SUPPKEY INT NOT NULL PRIMARY KEY,
S_NAME TEXT NOT NULL,
S_ADDRESS TEXT NOT NULL,
S_NATIONKEY INT NOT NULL,
S_PHONE TEXT NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT TEXT NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS NATION;
BEGIN;
CREATE TABLE NATION(
N_NATIONKEY INT NOT NULL PRIMARY KEY,
N_NAME text NOT NULL,
N_REGIONKEY INT NOT NULL,
N_COMMENT text NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS REGION;
BEGIN;
CREATE TABLE REGION
(
R_REGIONKEY INT NOT NULL PRIMARY KEY,
R_NAME TEXT NOT NULL,
R_COMMENT TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
- 创立结束后,您能 在 psql 中 应用如下代码查看是否创立胜利
tpch_1sf=# \dt
- 若胜利,事实成果如下
tpch_1sf=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------+-------+--------------------
public | customer | table | tpch_1sf_developer
public | lineitem | table | tpch_1sf_developer
public | nation | table | tpch_1sf_developer
public | orders | table | tpch_1sf_developer
public | part | table | tpch_1sf_developer
public | partsupp | table | tpch_1sf_developer
public | region | table | tpch_1sf_developer
public | supplier | table | tpch_1sf_developer
(8 rows)
2. 导入数据
- 本测试计划次要应用
COPY FROM STDIN
的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。 - 您能够在数据生成工具的目录中参考如下
shell
脚本导入数据
for i in `ls *.tbl`; do
echo $i;
name=`echo $i| cut -d'.' -f1`;
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter'|'csv;" < $i;
done
- 至此您已实现数据导入
3. 收集统计信息
- 为了更好的执行查问,能够在 psql 中应用如下语句,使 Hologres 收集各张表特色信息。
vacuum region;
vacuum nation;
vacuum supplier;
vacuum customer;
vacuum part;
vacuum partsupp;
vacuum orders;
vacuum lineitem;
analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;
执行查问
- 为了不便统计查问信息,须要应用 pgbench 工具,您能够应用如下命令装置 pgbench(如果测试机上已有 pgbench,请确保版本大于 9.6 以上,最好大版本是 13 以上,否则以下测试会遇到各种不兼容)
yum install postgresql-contrib
- 为了不便查问,您能够间接通过以下连贯,下载所需的 22 条 SQL
tpch\_data\_tpch\_query.zip
- 而后上传至 ECS
- 进入 ECS,并进入上传文件的目录,应用如下 shell 命令解压缩文件
unzip tpch_data_tpch_query
- 至此,您曾经实现了筹备工作,即可应用 pgbench 进行测试,您能够应用如下命令执行单条查问
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
- 参数解释
配置项 | 参数 | 阐明 |
---|---|---|
-h | Hologres 实例的 endpoint | 在 Hologres 管控台查看 |
-p | Hologres 实例的端口地址 | 在 Hologres 管控台查看 |
-d | Hologres 指定实例中的数据库名 | |
-c | 客户端数目(并发度) | 示例:1,因为该测试仅测试查问性能,不测试并发,所以并发度置为 1 即可 |
-t | 每个客户端须要执行的压测 query 数目 | 50 |
-f | 压测的 sql | 示例:6.sql |
- 也能够间接执行如下
shell
脚本,间接批量执行 22 条查问,并将后果输入到文件 hologres\_tpch\_test.out 中
rm -f hologres_tpch_test.out
echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
for i in {1..22}
do
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.out
done
-
查看 hologres\_tpch\_test.out 即可失去查问后果,样例如下
- transaction type:阐明了执行的具体的 SQL 文件
- latency average:记录了对应 SQL 文件的 3 次查问的均匀工夫
2021-03-23 03:50:54 begin
pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 3
number of transactions actually processed: 3/3
latency average = 76.936 ms
tps = 12.997850 (including connections establishing)
tps = 15.972757 (excluding connections establishing)
...
TPCH 22 条查问语句
Q1
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Q2
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;
Q3
select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-15'
and l_shipdate > date '1995-03-15'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
limit 10;
Q4
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
Q5
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and o_orderdate >= date '1994-01-01'
and o_orderdate < date '1994-01-01' + interval '1' year
group by
n_name
order by
revenue desc;
Q6
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
and l_discount between 6 - 1 and 6 + 1
and l_quantity < 2400
Q7
set hg_experimental_enable_double_equivalent=on;
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;
Q8
set hg_experimental_enable_double_equivalent=on;
select
o_year,
sum(case
when nation = 'BRAZIL' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'STANDARD POLISHED TIN'
) as all_nations
group by
o_year
order by
o_year;
Q9
set hg_experimental_enable_double_equivalent=on;
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
Q10
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-10-01'
and o_orderdate < date '1993-10-01' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
limit 20;
Q11
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.0000010000
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
)
order by
value desc
limit 100;
Q12
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'SHIP')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;
Q13
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc;
Q14
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-09-01' + interval '1' month;
Q15
with revenue0(SUPPLIER_NO, TOTAL_REVENUE) as
(
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1995-12-01'
and l_shipdate < date '1995-12-01' + interval '3' month
group by
l_suppkey
)
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey;
Q16
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#45'
and p_type not like 'MEDIUM POLISHED%'
and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
Q17
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
and l_partkey in(
select p_partkey
from part
where p_brand = 'Brand#23' and p_container = 'MED BOX')
);
Q18
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 300
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
limit 100;
Q19
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 1 and l_quantity <= 1 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#34'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
Q20
select
s_name,
s_address
from
supplier,
nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp
where
ps_partkey in (
select
p_partkey
from
part
where
p_name like 'forest%'
)
and ps_availqty > (
select
0.5 * sum(l_quantity)
from
lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
)
)
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by
s_name;
Q21
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100;
Q22
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone from 1 for 2) as cntrycode,
c_acctbal
from
customer
where
substring(c_phone from 1 for 2) in
('13', '31', '23', '29', '30', '18', '17')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substring(c_phone from 1 for 2) in
('13', '31', '23', '29', '30', '18', '17')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) as custsale
group by
cntrycode
order by
cntrycode;
Key/Value 点查场景测试
筹备数据
1. 创立表
- 持续应用 OLAP 查问场景创立的数据库,咱们会应用 TPCH 数据集中的 orders 表进行测试,应用 psql 连贯 Hologres 后,您能够应用如下建表语句创立数据库表;
留神:点查场景须要应用行存表,所以须要创立一张新表,不能应用 OLAP 查问场景中应用的表
DROP TABLE IF EXISTS orders_row;
BEGIN;
CREATE TABLE public.orders_row (
"o_orderkey" int8 NOT NULL,
"o_custkey" int8,
"o_orderstatus" bpchar(1),
"o_totalprice" numeric(15,2),
"o_orderdate" date,
"o_orderpriority" bpchar(15),
"o_clerk" bpchar(15),
"o_shippriority" int8,
"o_comment" varchar(79),
PRIMARY KEY (o_orderkey)
);
CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');
CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
COMMIT;
2. COPY 形式导入数据
- 本测试计划次要应用
COPY FROM STDIN
的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。 - 您能够在数据生成工具的目录中参考如下命令导入数据
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter'|'csv;" < orders.tbl
3. INSERT INTO 形式导入数据
- 因为 OLAP 场景时您曾经导入了 orders 表的数据,您能够运行如下 SQL 语句导入数据
INSERT INTO public.orders_row
SELECT *
FROM public.orders;
查问
1. 生成查问语句
- Key/Value 点查场景次要的查问语句特色如下
SELECT column_a
,column_b
,...
,column_x
FROM table_x
WHERE pk = value_x
;
或
SELECT column_a
,column_b
,...
,column_x
FROM table_x
WHERE pk IN (value_a, value_b,..., value_x)
;
-
您能够应用如下脚本生成所需的 sql,该脚本会生成 2 条 sql
- kv\_query\_single.sql 针对单值筛选的 SQL
- kv\_query\_in.sql 针对多值筛选的 SQL,该脚本会随机生成一个针对 10 个值筛选的 SQL
rm -rf kv_query
mkdir kv_query
cd kv_query
echo '\set column_values random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sql
echo '\set column_values1 random(1,99999999)
\set column_values2 random(1,99999999)
\set column_values3 random(1,99999999)
\set column_values4 random(1,99999999)
\set column_values5 random(1,99999999)
\set column_values6 random(1,99999999)
\set column_values7 random(1,99999999)
\set column_values8 random(1,99999999)
\set column_values9 random(1,99999999)
\set column_values10 random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql
2. 进行查问
- 查问须要应用 pgbench,您能够应用如下命令装置 pgbench
yum install postgresql-contrib
- 之后您即可应用 pgbench 进行压测,针对单值筛选的场景
留神,请在生成 SQL 的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
- 针对多值筛选的场景
留神,请在生成 SQL 的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
- 参数解释
配置项 | 参数 | 阐明 |
---|---|---|
-h | Hologres 实例的 endpoint | 在 Hologres 管控台查看 |
-p | Hologres 实例的端口地址 | 在 Hologres 管控台查看 |
-d | Hologres 指定实例中的数据库名 | |
-c | 客户端数目(并发度) | 示例:8 |
-t | 每个客户端须要执行的压测 query 数目 | 50 |
-f | 压测的 sql | 示例:6.sql |
测试后果参考
测试数据量:
- 本测试基于 TPCH 100G 的数据集进行测试,具体数据量如下表所示
表名 | 行数 |
---|---|
LINEITEM | 600,037,902 |
ORDERS | 150,000,000 |
PARTSUPP | 80,000,000 |
PART | 20,000,000 |
CUSTOMER | 15,000,000 |
SUPPLIER | 1,000,000 |
NATION | 25 |
REGION | 5 |
集群规格
计算资源 | 存储容量 | 软件版本 | 备注 |
---|---|---|---|
64 CU | |||
(CPU:64 Core 内存:256 GB) | 100 GB | r0.10.20 | 应用集群默认配置,Shard 数量:40 |
128 CU | |||
(CPU:128 Core 内存:512 GB) | 100 GB | r0.10.20 | 应用集群默认配置,Shard 数量:80 |
测试工夫:2021 年 6 月
测试后果
数据导入工夫
- 数据导入执行工夫以秒(s)为单位。
- 导入工夫指将数据导入 Hologres 内表
- 在应用 COPY 办法导入数据时,一张表对应一个数据文件,并未应用并发导入形式。
- 具体数值如下表所示
阐明:应用 COPY 形式导入时一张表对应一个数据文件,并未应用并发导入形式
表名 | 行数 | 数据量 | Hologres 64CU | ||
---|---|---|---|---|---|
应用 COPY 形式导入(公网网络) | 应用 COPY 形式导入(VPC 网络导入) | 应用 MaxCompute 表面导入 | |||
LINEITEM | 600,037,902 | 73.6GB | 3,070.453 | 694.364 | 148.165 |
ORDERS | 150,000,000 | 16.4GB | 691.060 | 172.529 | 37.741 |
PARTSUPP | 80,000,000 | 2.3GB | 468.560 | 107.092 | 18.488 |
PART | 20,000,000 | 11.3GB | 96.342 | 24.020 | 8.083 |
CUSTOMER | 15,000,000 | 2.3GB | 95.190 | 22.937 | 10.363 |
SUPPLIER | 1,000,000 | 132MB | 5.057 | 1.803 | 1.503 |
NATION | 25 | 2KB | 0.580 | 0.584 | 0.747 |
REGION | 5 | 0.375KB | 0.168 | 0.153 | 0.430 |
Total | 106G | 4427.410 | 1023.482 | 225.52 |
- 下图中蓝色为应用 COPY 形式在公网条件下导入数据的工夫,绿色为应用 COPY 形式在 VPC 网络条件下导入数据的工夫,灰色为应用 MaxCompute 表面形式导入的工夫
- 纵坐标数值越低,示意导入速度越快
- 横轴:表名。纵轴:数据导入工夫(s)
- 能够看出,因为网络带宽影响,应用 COPY 形式导入本地文件数据时,应用 VPC 网络导入数据工夫显著短于应用公网导入数据工夫;应用 MaxCompute 导入数据工夫显著短于应用 COPY 形式导入本地文件数据工夫。
查问工夫
- 查问执行工夫以秒(s)为单位。
- 查问后果均基于 Hologres 内表
- 具体数值如下表所示
TPCH Query 编号 | Hologres 64CU | Hologres 128CU |
---|---|---|
1 | 3.120 | 2.150 |
2 | 0.581 | 0.467 |
3 | 1.735 | 1.005 |
4 | 1.558 | 0.836 |
5 | 2.921 | 1.917 |
6 | 0.297 | 0.096 |
7 | 2.006 | 1.029 |
8 | 2.674 | 1.679 |
9 | 5.298 | 2.796 |
10 | 1.944 | 0.924 |
11 | 0.397 | 0.297 |
12 | 1.531 | 0.852 |
13 | 1.741 | 0.971 |
14 | 0.286 | 0.160 |
15 | 0.293 | 0.177 |
16 | 1.223 | 1.020 |
17 | 1.405 | 0.607 |
18 | 3.817 | 2.169 |
19 | 1.400 | 0.622 |
20 | 1.358 | 0.868 |
21 | 4.164 | 2.047 |
22 | 1.121 | 0.654 |
Total | 40.870 | 23.343 |
- 下图中蓝色为 64CU 的实例的查问后果,绿色为 128CU 实例的查问后果
- 纵坐标数值越低,示意 TPC-H 性能越好。
- 能够看出随着实例规模的老本增长,查问工夫也在成线性降落趋势
- 横轴:query 在文档中的编号。纵轴:query 执行工夫(s)
版权申明:本文内容由阿里云实名注册用户自发奉献,版权归原作者所有,阿里云开发者社区不领有其著作权,亦不承当相应法律责任。具体规定请查看《阿里云开发者社区用户服务协定》和《阿里云开发者社区知识产权爱护指引》。如果您发现本社区中有涉嫌剽窃的内容,填写侵权投诉表单进行举报,一经查实,本社区将立即删除涉嫌侵权内容。