简介:本文将会介绍在Hologres中如何基于TPCH数据集做性能测试,并提供测试后果参考,不便您进行产品规格选型。

背景信息

TPC-H(商业智能计算测试)是美国交易解决效力委员会(TPC,Transaction Processing Performance Council)组织制订的用来模仿决策反对类利用的一个测试集。目前在学术界和工业界广泛采纳它来评估决策反对技术方面利用的性能。TPC-H 是依据实在的生产运行环境来建模,模仿了一套销售零碎的数据仓库。其共蕴含 8 张表,数据量可设定从 1G~3T 不等。其基准测试共蕴含了22个查问,次要评估指标各个查问的响应工夫,即从提交查问到后果返回所需工夫。其测试后果可综合反映零碎解决查问时的能力。详情参考TPCH 文档。

数据集介绍

该数据集蕴含如下 8 张表,互相间的关系如下图所示。

测试详情

测试数据量阐明

测试数据量会间接影响测试后果,TPC-H 的生成工具中应用 SF ( scale factor ) 管制生成数据的数据量的大小,1 SF 对应 1 GB。

留神:以上提及的数据量仅仅为原始数据的数据量,不包含索引等空间占用,所以筹备环境时,须要预留更多的空间。

测试环境

本次测试应用了独享实例(按量付费)的实例,因为仅为测试示意应用,所以计算资源配置抉择了8核32G。

测试场景

本测试场景次要蕴含3局部:

  1. OLAP查问场景测试,次要应用列存表,间接应用TPCH测试中的22条查问;
  2. Key/Value点查场景测试,次要应用行存表,针对orders应用行存表后,进行主键过滤的点查;

根底环境筹备

  • 该步骤次要用于筹备OLAP查问场景和Key/Value点查场景所需的数据;

根底环境筹备

1. 创立 ECS 实例

登陆阿里云,创立一个 ECS 实例,用于数据生成、向 Hologres 导入数据、客户端测试。倡议规格:

  • ecs.g6.4xlarge 规格
  • CentOS 7.9 零碎
  • ESSD 数据盘,具体数据容量依据须要测试的数据量大小决定
  • 倡议 ECS 与 Hologres 实例用雷同 Region 和 VPC 网络

2. 创立 Hologres 实例

  • 登陆阿里云,进入 Hologres 产品控制台,点击新增引擎实例
  • 抉择配置,并填写实例名称,具体阐明请参考官网文档。

3. 创立测试数据库

  • 在创立实例后,您须要登陆您创立的 Hologres 实例,创立一个数据库,本测试中命名数据库为tpch_1sf,具体操作步骤请参考官网文档

生成 TPC-H 数据

1. 筹备数据生成工具

  • 近程链接 ECS 实例
  • 更新所有库
yum update
  • 装置 git
yum install git
  • 装置gcc
yum install gcc
  • 下载 TPC-H 数据生成代码
git clone https://github.com/gregrahn/tpch-kit.git
  • 进入数据生成工具代码目录
cd tpch-kit/dbgen
  • 编译数据生成工具代码
make

2. 生成数据

  • 编译胜利后,您能够应用如下代码查看代码生成工具的相干参数。
./dbgen --help
  • 本次测试仅生成 1 GB 数据,所以运行如下代码生成数据。
./dbgen -vf -s 1
如您须要生成更多数据量的数据,能够调整 SF 的参数,例如您能够应用如下代码生成 1 T 数据
./dbgen -vf -s 1000
  • 个别状况下,32CU 能够跑 TPCH SF10,256CU 能够跑 TPCH SF50
  • 数据生成后,您能够应用如下代码查看生成的文件。能够看到生成工具生成了 8 个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'

OLAP查问场景测试

筹备数据

1. 创立表

  • 因为本文次要应用 psql 进行数据导入操作,须要先在 ECS 中运行如下命令装置 psql
yum install postgresql-server
  • 装置 psql 后,您能够应用如下命令登陆 Hologres 实例
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
  • 应用psql连贯Hologres后,您能够应用如下建表语句创立数据库表
DROP TABLE IF EXISTS LINEITEM;BEGIN;CREATE TABLE LINEITEM(    L_ORDERKEY      INT         NOT NULL,    L_PARTKEY       INT         NOT NULL,    L_SUPPKEY       INT         NOT NULL,    L_LINENUMBER    INT         NOT NULL,    L_QUANTITY      DECIMAL(15,2) NOT NULL,    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,    L_DISCOUNT      DECIMAL(15,2) NOT NULL,    L_TAX           DECIMAL(15,2) NOT NULL,    L_RETURNFLAG    TEXT        NOT NULL,    L_LINESTATUS    TEXT        NOT NULL,    L_SHIPDATE      TIMESTAMPTZ NOT NULL,    L_COMMITDATE    TIMESTAMPTZ NOT NULL,    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,    L_SHIPINSTRUCT  TEXT        NOT NULL,    L_SHIPMODE      TEXT        NOT NULL,    L_COMMENT       TEXT        NOT NULL,    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER));CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS ORDERS;BEGIN;CREATE TABLE ORDERS(    O_ORDERKEY      INT         NOT NULL PRIMARY KEY,    O_CUSTKEY       INT         NOT NULL,    O_ORDERSTATUS   TEXT        NOT NULL,    O_TOTALPRICE    DECIMAL(15,2) NOT NULL,    O_ORDERDATE     timestamptz NOT NULL,    O_ORDERPRIORITY TEXT        NOT NULL,    O_CLERK         TEXT        NOT NULL,    O_SHIPPRIORITY  INT         NOT NULL,    O_COMMENT       TEXT        NOT NULL);CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS PARTSUPP;BEGIN;CREATE TABLE PARTSUPP(    PS_PARTKEY    INT    NOT NULL,    PS_SUPPKEY    INT    NOT NULL,    PS_AVAILQTY   INT    NOT NULL,    PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,    PS_COMMENT    TEXT   NOT NULL,    PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY));CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS PART;BEGIN;CREATE TABLE PART(    P_PARTKEY     INT    NOT NULL PRIMARY KEY,    P_NAME        TEXT   NOT NULL,    P_MFGR        TEXT   NOT NULL,    P_BRAND       TEXT   NOT NULL,    P_TYPE        TEXT   NOT NULL,    P_SIZE        INT    NOT NULL,    P_CONTAINER   TEXT   NOT NULL,    P_RETAILPRICE DECIMAL(15,2) NOT NULL,    P_COMMENT     TEXT   NOT NULL);CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');CALL set_table_property('PART', 'colocate_with', 'LINEITEM');CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS CUSTOMER;BEGIN;CREATE TABLE CUSTOMER(    C_CUSTKEY    INT    NOT NULL PRIMARY KEY,    C_NAME       TEXT   NOT NULL,    C_ADDRESS    TEXT   NOT NULL,    C_NATIONKEY  INT    NOT NULL,    C_PHONE      TEXT   NOT NULL,    C_ACCTBAL    DECIMAL(15,2) NOT NULL,    C_MKTSEGMENT TEXT   NOT NULL,    C_COMMENT    TEXT   NOT NULL);CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS SUPPLIER;BEGIN;CREATE TABLE SUPPLIER(    S_SUPPKEY   INT    NOT NULL PRIMARY KEY,    S_NAME      TEXT   NOT NULL,    S_ADDRESS   TEXT   NOT NULL,    S_NATIONKEY INT    NOT NULL,    S_PHONE     TEXT   NOT NULL,    S_ACCTBAL   DECIMAL(15,2) NOT NULL,    S_COMMENT   TEXT   NOT NULL);CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS NATION;BEGIN;CREATE TABLE NATION(  N_NATIONKEY INT NOT NULL PRIMARY KEY,  N_NAME text NOT NULL,  N_REGIONKEY INT NOT NULL,  N_COMMENT text NOT NULL);CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');COMMIT;DROP TABLE IF EXISTS REGION;BEGIN;CREATE TABLE REGION(    R_REGIONKEY INT  NOT NULL PRIMARY KEY,    R_NAME      TEXT NOT NULL,    R_COMMENT   TEXT);CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');COMMIT;
  • 创立结束后,您能在 psql 中应用如下代码查看是否创立胜利
tpch_1sf=# \dt
  • 若胜利,事实成果如下
tpch_1sf=# \dt               List of relations Schema |   Name   | Type  |       Owner        --------+----------+-------+-------------------- public | customer | table | tpch_1sf_developer public | lineitem | table | tpch_1sf_developer public | nation   | table | tpch_1sf_developer public | orders   | table | tpch_1sf_developer public | part     | table | tpch_1sf_developer public | partsupp | table | tpch_1sf_developer public | region   | table | tpch_1sf_developer public | supplier | table | tpch_1sf_developer(8 rows)

2. 导入数据

  • 本测试计划次要应用 COPY FROM STDIN 的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。
  • 您能够在数据生成工具的目录中参考如下 shell脚本导入数据
for i in `ls *.tbl`; do    echo $i;    name=`echo $i| cut -d'.' -f1`;    PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;done
  • 至此您已实现数据导入

3. 收集统计信息

  • 为了更好的执行查问,能够在 psql 中应用如下语句,使 Hologres 收集各张表特色信息。
vacuum  region;vacuum  nation;vacuum  supplier;vacuum  customer;vacuum  part;vacuum  partsupp;vacuum  orders;vacuum  lineitem;analyze nation;analyze region;analyze lineitem;analyze orders;analyze customer;analyze part;analyze partsupp;analyze supplier;

执行查问

  • 为了不便统计查问信息,须要应用pgbench工具,您能够应用如下命令装置pgbench(如果测试机上已有pgbench,请确保版本大于9.6以上,最好大版本是13以上,否则以下测试会遇到各种不兼容)
yum install postgresql-contrib
  • 为了不便查问,您能够间接通过以下连贯,下载所需的22条SQL

tpch\_data\_tpch\_query.zip

  • 而后上传至ECS
  • 进入ECS,并进入上传文件的目录,应用如下shell命令解压缩文件
unzip tpch_data_tpch_query
  • 至此,您曾经实现了筹备工作,即可应用pgbench进行测试,您能够应用如下命令执行单条查问
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
  • 参数解释
配置项参数阐明
-hHologres实例的endpoint在Hologres管控台查看
-pHologres实例的端口地址在Hologres管控台查看
-dHologres指定实例中的数据库名
-c客户端数目(并发度)示例:1,因为该测试仅测试查问性能,不测试并发,所以并发度置为1即可
-t每个客户端须要执行的压测query数目50
-f压测的sql示例:6.sql
  • 也能够间接执行如下 shell 脚本,间接批量执行22条查问,并将后果输入到文件hologres\_tpch\_test.out中
rm -f hologres_tpch_test.outecho `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.outfor i in {1..22}do    PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.outdone
  • 查看hologres\_tpch\_test.out即可失去查问后果,样例如下

    • transaction type:阐明了执行的具体的SQL文件
    • latency average:记录了对应SQL文件的3次查问的均匀工夫
2021-03-23 03:50:54 beginpghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100transaction type: ./tpch_data_tpch_query/1.sqlscaling factor: 1query mode: simplenumber of clients: 1number of threads: 1number of transactions per client: 3number of transactions actually processed: 3/3latency average = 76.936 mstps = 12.997850 (including connections establishing)tps = 15.972757 (excluding connections establishing)...

TPCH 22条查问语句

Q1

select    l_returnflag,    l_linestatus,    sum(l_quantity) as sum_qty,    sum(l_extendedprice) as sum_base_price,    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,    avg(l_quantity) as avg_qty,    avg(l_extendedprice) as avg_price,    avg(l_discount) as avg_disc,    count(*) as count_orderfrom    lineitemwhere    l_shipdate <= date '1998-12-01' - interval '90' day group by    l_returnflag,    l_linestatusorder by    l_returnflag,    l_linestatus;

Q2

select    s_acctbal,    s_name,    n_name,    p_partkey,    p_mfgr,    s_address,    s_phone,    s_commentfrom    part,    supplier,    partsupp,    nation,    regionwhere    p_partkey = ps_partkey    and s_suppkey = ps_suppkey    and p_size = 15    and p_type like '%BRASS'    and s_nationkey = n_nationkey    and n_regionkey = r_regionkey    and r_name = 'EUROPE'    and ps_supplycost = (        select            min(ps_supplycost)        from            partsupp,            supplier,            nation,            region        where            p_partkey = ps_partkey            and s_suppkey = ps_suppkey            and s_nationkey = n_nationkey            and n_regionkey = r_regionkey            and r_name = 'EUROPE'    )order by    s_acctbal desc,    n_name,    s_name,    p_partkeylimit 100;

Q3

select    l_orderkey,    sum(l_extendedprice * (1 - l_discount)) as revenue,    o_orderdate,    o_shippriorityfrom    customer,    orders,    lineitemwhere    c_mktsegment = 'BUILDING'    and c_custkey = o_custkey    and l_orderkey = o_orderkey    and o_orderdate < date '1995-03-15'    and l_shipdate > date '1995-03-15'group by    l_orderkey,    o_orderdate,    o_shippriorityorder by    revenue desc,    o_orderdatelimit 10;

Q4

select    o_orderpriority,    count(*) as order_countfrom    orderswhere    o_orderdate >= date '1993-07-01'    and o_orderdate < date '1993-07-01' + interval '3' month    and exists (        select            *        from            lineitem        where            l_orderkey = o_orderkey            and l_commitdate < l_receiptdate    )group by    o_orderpriorityorder by    o_orderpriority;

Q5

select    n_name,    sum(l_extendedprice * (1 - l_discount)) as revenuefrom    customer,    orders,    lineitem,    supplier,    nation,    regionwhere    c_custkey = o_custkey    and l_orderkey = o_orderkey    and l_suppkey = s_suppkey    and c_nationkey = s_nationkey    and s_nationkey = n_nationkey    and n_regionkey = r_regionkey    and r_name = 'ASIA'    and o_orderdate >= date '1994-01-01'    and o_orderdate < date '1994-01-01' + interval '1' yeargroup by    n_nameorder by    revenue desc;

Q6

select    sum(l_extendedprice * l_discount) as revenuefrom    lineitemwhere    l_shipdate >= date '1994-01-01'    and l_shipdate < date '1994-01-01' + interval '1' year    and l_discount between 6 - 1 and 6 + 1    and l_quantity < 2400

Q7

set hg_experimental_enable_double_equivalent=on;select    supp_nation,    cust_nation,    l_year,    sum(volume) as revenuefrom    (        select            n1.n_name as supp_nation,            n2.n_name as cust_nation,            extract(year from l_shipdate) as l_year,            l_extendedprice * (1 - l_discount) as volume        from            supplier,            lineitem,            orders,            customer,            nation n1,            nation n2        where            s_suppkey = l_suppkey            and o_orderkey = l_orderkey            and c_custkey = o_custkey            and s_nationkey = n1.n_nationkey            and c_nationkey = n2.n_nationkey            and (                (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')                or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')            )            and l_shipdate between date '1995-01-01' and date '1996-12-31'    ) as shippinggroup by    supp_nation,    cust_nation,    l_yearorder by    supp_nation,    cust_nation,    l_year;

Q8

set hg_experimental_enable_double_equivalent=on;select    o_year,    sum(case        when nation = 'BRAZIL' then volume        else 0    end) / sum(volume) as mkt_sharefrom    (        select            extract(year from o_orderdate) as o_year,            l_extendedprice * (1 - l_discount) as volume,            n2.n_name as nation        from            part,            supplier,            lineitem,            orders,            customer,            nation n1,            nation n2,            region        where            p_partkey = l_partkey            and s_suppkey = l_suppkey            and l_orderkey = o_orderkey            and o_custkey = c_custkey            and c_nationkey = n1.n_nationkey            and n1.n_regionkey = r_regionkey            and r_name = 'AMERICA'            and s_nationkey = n2.n_nationkey            and o_orderdate between date '1995-01-01' and date '1996-12-31'            and p_type = 'STANDARD POLISHED TIN'    ) as all_nationsgroup by    o_yearorder by    o_year;

Q9

set hg_experimental_enable_double_equivalent=on;select    nation,    o_year,    sum(amount) as sum_profitfrom    (        select            n_name as nation,            extract(year from o_orderdate) as o_year,            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount        from            part,            supplier,            lineitem,            partsupp,            orders,            nation        where            s_suppkey = l_suppkey            and ps_suppkey = l_suppkey            and ps_partkey = l_partkey            and p_partkey = l_partkey            and o_orderkey = l_orderkey            and s_nationkey = n_nationkey            and p_name like '%green%'    ) as profitgroup by    nation,    o_yearorder by    nation,    o_year desc;

Q10

select    c_custkey,    c_name,    sum(l_extendedprice * (1 - l_discount)) as revenue,    c_acctbal,    n_name,    c_address,    c_phone,    c_commentfrom    customer,    orders,    lineitem,    nationwhere    c_custkey = o_custkey    and l_orderkey = o_orderkey    and o_orderdate >= date '1993-10-01'    and o_orderdate < date '1993-10-01' + interval '3' month    and l_returnflag = 'R'    and c_nationkey = n_nationkeygroup by    c_custkey,    c_name,    c_acctbal,    c_phone,    n_name,    c_address,    c_commentorder by    revenue desclimit 20;

Q11

select    ps_partkey,    sum(ps_supplycost * ps_availqty) as valuefrom    partsupp,    supplier,    nationwhere    ps_suppkey = s_suppkey    and s_nationkey = n_nationkey    and n_name = 'GERMANY'group by    ps_partkey having        sum(ps_supplycost * ps_availqty) > (            select                sum(ps_supplycost * ps_availqty) * 0.0000010000            from                partsupp,                supplier,                nation            where                ps_suppkey = s_suppkey                and s_nationkey = n_nationkey                and n_name = 'GERMANY'        )order by    value desc    limit 100;

Q12

select    l_shipmode,    sum(case        when o_orderpriority = '1-URGENT'            or o_orderpriority = '2-HIGH'            then 1        else 0    end) as high_line_count,    sum(case        when o_orderpriority <> '1-URGENT'            and o_orderpriority <> '2-HIGH'            then 1        else 0    end) as low_line_countfrom    orders,    lineitemwhere    o_orderkey = l_orderkey    and l_shipmode in ('MAIL', 'SHIP')    and l_commitdate < l_receiptdate    and l_shipdate < l_commitdate    and l_receiptdate >= date '1994-01-01'    and l_receiptdate < date '1994-01-01' + interval '1' yeargroup by    l_shipmodeorder by    l_shipmode;

Q13

select    c_count,    count(*) as custdistfrom    (        select            c_custkey,            count(o_orderkey)        from            customer left outer join orders on                c_custkey = o_custkey                and o_comment not like '%special%requests%'        group by            c_custkey    ) as c_orders (c_custkey, c_count)group by    c_countorder by    custdist desc,    c_count desc;

Q14

select    100.00 * sum(case        when p_type like 'PROMO%'            then l_extendedprice * (1 - l_discount)        else 0    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenuefrom    lineitem,    partwhere    l_partkey = p_partkey    and l_shipdate >= date '1995-09-01'    and l_shipdate < date '1995-09-01' + interval '1' month;

Q15

with revenue0(SUPPLIER_NO, TOTAL_REVENUE)  as    (    select        l_suppkey,        sum(l_extendedprice * (1 - l_discount))    from        lineitem    where        l_shipdate >= date '1995-12-01'        and l_shipdate < date '1995-12-01' + interval '3' month    group by        l_suppkey    )select    s_suppkey,    s_name,    s_address,    s_phone,    total_revenuefrom    supplier,    revenue0where    s_suppkey = supplier_no    and total_revenue = (        select            max(total_revenue)        from            revenue0    )order by    s_suppkey;

Q16

select    p_brand,    p_type,    p_size,    count(distinct ps_suppkey) as supplier_cntfrom    partsupp,    partwhere    p_partkey = ps_partkey    and p_brand <> 'Brand#45'    and p_type not like 'MEDIUM POLISHED%'    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)    and ps_suppkey not in (        select            s_suppkey        from            supplier        where            s_comment like '%Customer%Complaints%'    )group by    p_brand,    p_type,    p_sizeorder by    supplier_cnt desc,    p_brand,    p_type,    p_size;

Q17

select    sum(l_extendedprice) / 7.0 as avg_yearlyfrom    lineitem,    partwhere    p_partkey = l_partkey    and p_brand = 'Brand#23'    and p_container = 'MED BOX'    and l_quantity < (        select            0.2 * avg(l_quantity)        from            lineitem        where            l_partkey = p_partkey            and l_partkey in(        select p_partkey         from part         where p_brand = 'Brand#23' and p_container = 'MED BOX')    );

Q18

select    c_name,    c_custkey,    o_orderkey,    o_orderdate,    o_totalprice,    sum(l_quantity)from    customer,    orders,    lineitemwhere    o_orderkey in (        select            l_orderkey        from            lineitem        group by            l_orderkey having                sum(l_quantity) > 300    )    and c_custkey = o_custkey    and o_orderkey = l_orderkeygroup by    c_name,    c_custkey,    o_orderkey,    o_orderdate,    o_totalpriceorder by    o_totalprice desc,    o_orderdatelimit 100;

Q19

select    sum(l_extendedprice* (1 - l_discount)) as revenuefrom    lineitem,    partwhere    (        p_partkey = l_partkey        and p_brand = 'Brand#12'        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')        and l_quantity >= 1 and l_quantity <= 1 + 10        and p_size between 1 and 5        and l_shipmode in ('AIR', 'AIR REG')        and l_shipinstruct = 'DELIVER IN PERSON'    )    or    (        p_partkey = l_partkey        and p_brand = 'Brand#23'        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')        and l_quantity >= 10 and l_quantity <= 10 + 10        and p_size between 1 and 10        and l_shipmode in ('AIR', 'AIR REG')        and l_shipinstruct = 'DELIVER IN PERSON'    )    or    (        p_partkey = l_partkey        and p_brand = 'Brand#34'        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')        and l_quantity >= 20 and l_quantity <= 20 + 10        and p_size between 1 and 15        and l_shipmode in ('AIR', 'AIR REG')        and l_shipinstruct = 'DELIVER IN PERSON'    );

Q20

select    s_name,    s_addressfrom    supplier,    nationwhere    s_suppkey in (        select            ps_suppkey        from            partsupp        where            ps_partkey in (                select                    p_partkey                from                    part                where                    p_name like 'forest%'            )            and ps_availqty > (                select                    0.5 * sum(l_quantity)                from                    lineitem                where                    l_partkey = ps_partkey                    and l_suppkey = ps_suppkey                    and l_shipdate >= date '1994-01-01'                    and l_shipdate < date '1994-01-01' + interval '1' year            )    )    and s_nationkey = n_nationkey    and n_name = 'CANADA'order by    s_name;

Q21

select    s_name,    count(*) as numwaitfrom    supplier,    lineitem l1,    orders,    nationwhere    s_suppkey = l1.l_suppkey    and o_orderkey = l1.l_orderkey    and o_orderstatus = 'F'    and l1.l_receiptdate > l1.l_commitdate    and exists (        select            *        from            lineitem l2        where            l2.l_orderkey = l1.l_orderkey            and l2.l_suppkey <> l1.l_suppkey    )    and not exists (        select            *        from            lineitem l3        where            l3.l_orderkey = l1.l_orderkey            and l3.l_suppkey <> l1.l_suppkey            and l3.l_receiptdate > l3.l_commitdate    )    and s_nationkey = n_nationkey    and n_name = 'SAUDI ARABIA'group by    s_nameorder by    numwait desc,    s_namelimit 100;

Q22

select    cntrycode,    count(*) as numcust,    sum(c_acctbal) as totacctbalfrom    (        select            substring(c_phone from 1 for 2) as cntrycode,            c_acctbal        from            customer        where            substring(c_phone from 1 for 2) in                ('13', '31', '23', '29', '30', '18', '17')            and c_acctbal > (                select                    avg(c_acctbal)                from                    customer                where                    c_acctbal > 0.00                    and substring(c_phone from 1 for 2) in                        ('13', '31', '23', '29', '30', '18', '17')            )            and not exists (                select                    *                from                    orders                where                    o_custkey = c_custkey            )    ) as custsalegroup by    cntrycodeorder by    cntrycode;

Key/Value点查场景测试

筹备数据

1. 创立表

  • 持续应用OLAP查问场景创立的数据库,咱们会应用TPCH数据集中的orders表进行测试,应用psql连贯Hologres后,您能够应用如下建表语句创立数据库表;
留神:点查场景须要应用行存表,所以须要创立一张新表,不能应用OLAP查问场景中应用的表
DROP TABLE IF EXISTS orders_row;BEGIN;CREATE TABLE public.orders_row ( "o_orderkey" int8 NOT NULL, "o_custkey" int8, "o_orderstatus" bpchar(1), "o_totalprice" numeric(15,2), "o_orderdate" date, "o_orderpriority" bpchar(15), "o_clerk" bpchar(15), "o_shippriority" int8, "o_comment" varchar(79),PRIMARY KEY (o_orderkey));CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');COMMIT;

2. COPY形式导入数据

  • 本测试计划次要应用 COPY FROM STDIN 的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。
  • 您能够在数据生成工具的目录中参考如下命令导入数据
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl

3. INSERT INTO形式导入数据

  • 因为OLAP场景时您曾经导入了orders表的数据,您能够运行如下SQL语句导入数据
INSERT INTO public.orders_rowSELECT  *FROM    public.orders;

查问

1. 生成查问语句

  • Key/Value点查场景次要的查问语句特色如下
SELECT  column_a        ,column_b        ,...        ,column_xFROM    table_xWHERE   pk = value_x;

SELECT  column_a        ,column_b        ,...        ,column_xFROM    table_xWHERE   pk IN ( value_a, value_b,..., value_x );
  • 您能够应用如下脚本生成所需的sql,该脚本会生成2条sql

    • kv\_query\_single.sql 针对单值筛选的SQL
    • kv\_query\_in.sql 针对多值筛选的SQL,该脚本会随机生成一个针对10个值筛选的SQL
rm -rf kv_querymkdir kv_querycd kv_queryecho '\set column_values random(1,99999999)select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sqlecho '\set column_values1 random(1,99999999)\set column_values2 random(1,99999999)\set column_values3 random(1,99999999)\set column_values4 random(1,99999999)\set column_values5 random(1,99999999)\set column_values6 random(1,99999999)\set column_values7 random(1,99999999)\set column_values8 random(1,99999999)\set column_values9 random(1,99999999)\set column_values10 random(1,99999999)select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql

2. 进行查问

  • 查问须要应用pgbench,您能够应用如下命令装置pgbench
yum install postgresql-contrib
  • 之后您即可应用pgbench进行压测,针对单值筛选的场景
留神,请在生成SQL的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
  • 针对多值筛选的场景
留神,请在生成SQL的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
  • 参数解释
配置项参数阐明
-hHologres实例的endpoint在Hologres管控台查看
-pHologres实例的端口地址在Hologres管控台查看
-dHologres指定实例中的数据库名
-c客户端数目(并发度)示例:8
-t每个客户端须要执行的压测query数目50
-f压测的sql示例:6.sql

测试后果参考

测试数据量:

  • 本测试基于TPCH 100G的数据集进行测试,具体数据量如下表所示
表名行数
LINEITEM600,037,902
ORDERS150,000,000
PARTSUPP80,000,000
PART20,000,000
CUSTOMER15,000,000
SUPPLIER1,000,000
NATION25
REGION5

集群规格

计算资源存储容量软件版本备注
64 CU
(CPU:64 Core 内存:256 GB)100 GBr0.10.20应用集群默认配置,Shard数量:40
128 CU
(CPU:128 Core 内存:512 GB)100 GBr0.10.20应用集群默认配置,Shard数量:80
测试工夫:2021年6月

测试后果

数据导入工夫

  • 数据导入执行工夫以秒(s)为单位。
  • 导入工夫指将数据导入Hologres内表
  • 在应用COPY办法导入数据时,一张表对应一个数据文件,并未应用并发导入形式。
  • 具体数值如下表所示
阐明:应用COPY形式导入时一张表对应一个数据文件,并未应用并发导入形式
表名行数数据量Hologres 64CU
应用COPY形式导入(公网网络)应用COPY形式导入(VPC网络导入)应用MaxCompute表面导入
LINEITEM600,037,90273.6GB3,070.453694.364148.165
ORDERS150,000,00016.4GB691.060172.52937.741
PARTSUPP80,000,0002.3GB468.560107.09218.488
PART20,000,00011.3GB96.34224.0208.083
CUSTOMER15,000,0002.3GB95.19022.93710.363
SUPPLIER1,000,000132MB5.0571.8031.503
NATION252KB0.5800.5840.747
REGION50.375KB0.1680.1530.430
Total106G4427.4101023.482225.52
  • 下图中蓝色为应用COPY形式在公网条件下导入数据的工夫,绿色为应用COPY形式在VPC网络条件下导入数据的工夫,灰色为应用MaxCompute表面形式导入的工夫
  • 纵坐标数值越低,示意导入速度越快
  • 横轴:表名。纵轴:数据导入工夫(s)

  • 能够看出,因为网络带宽影响,应用COPY形式导入本地文件数据时,应用VPC网络导入数据工夫显著短于应用公网导入数据工夫;应用MaxCompute导入数据工夫显著短于应用COPY形式导入本地文件数据工夫。

查问工夫

  • 查问执行工夫以秒(s)为单位。
  • 查问后果均基于Hologres内表
  • 具体数值如下表所示
TPCH Query编号Hologres 64CUHologres 128CU
13.1202.150
20.5810.467
31.7351.005
41.5580.836
52.9211.917
60.2970.096
72.0061.029
82.6741.679
95.2982.796
101.9440.924
110.3970.297
121.5310.852
131.7410.971
140.2860.160
150.2930.177
161.2231.020
171.4050.607
183.8172.169
191.4000.622
201.3580.868
214.1642.047
221.1210.654
Total40.87023.343
  • 下图中蓝色为64CU的实例的查问后果,绿色为128CU实例的查问后果
  • 纵坐标数值越低,示意 TPC-H 性能越好。
  • 能够看出随着实例规模的老本增长,查问工夫也在成线性降落趋势
  • 横轴:query在文档中的编号。纵轴:query执行工夫(s)

版权申明:本文内容由阿里云实名注册用户自发奉献,版权归原作者所有,阿里云开发者社区不领有其著作权,亦不承当相应法律责任。具体规定请查看《阿里云开发者社区用户服务协定》和《阿里云开发者社区知识产权爱护指引》。如果您发现本社区中有涉嫌剽窃的内容,填写侵权投诉表单进行举报,一经查实,本社区将立即删除涉嫌侵权内容。