关于sql:Hologres基于TPCH的性能测试介绍

33次阅读

共计 23304 个字符,预计需要花费 59 分钟才能阅读完成。

简介:本文将会介绍在 Hologres 中如何基于 TPCH 数据集做性能测试,并提供测试后果参考,不便您进行产品规格选型。

背景信息

TPC-H(商业智能计算测试)是美国交易解决效力委员会(TPC,Transaction Processing Performance Council)组织制订的用来模仿决策反对类利用的一个测试集。目前在学术界和工业界广泛采纳它来评估决策反对技术方面利用的性能。TPC-H 是依据实在的生产运行环境来建模,模仿了一套销售零碎的数据仓库。其共蕴含 8 张表,数据量可设定从 1G~3T 不等。其基准测试共蕴含了 22 个查问,次要评估指标各个查问的响应工夫,即从提交查问到后果返回所需工夫。其测试后果可综合反映零碎解决查问时的能力。详情参考 TPCH 文档。

数据集介绍

该数据集蕴含如下 8 张表,互相间的关系如下图所示。

测试详情

测试数据量阐明

测试数据量会间接影响测试后果,TPC-H 的生成工具中应用 SF (scale factor) 管制生成数据的数据量的大小,1 SF 对应 1 GB。

留神:以上提及的数据量仅仅为 原始数据的数据量,不包含索引等空间占用,所以筹备环境时,须要预留更多的空间。

测试环境

本次测试应用了独享实例(按量付费)的实例,因为仅为测试示意应用,所以计算资源配置抉择了 8 核 32G。

测试场景

本测试场景次要蕴含 3 局部:

  1. OLAP 查问场景测试,次要应用列存表,间接应用 TPCH 测试中的 22 条查问;
  2. Key/Value 点查场景测试,次要应用行存表,针对 orders 应用行存表后,进行主键过滤的点查;

根底环境筹备

  • 该步骤次要用于筹备 OLAP 查问场景和 Key/Value 点查场景所需的数据;

根底环境筹备

1. 创立 ECS 实例

登陆阿里云,创立一个 ECS 实例,用于数据生成、向 Hologres 导入数据、客户端测试。倡议规格:

  • ecs.g6.4xlarge 规格
  • CentOS 7.9 零碎
  • ESSD 数据盘,具体数据容量依据须要测试的数据量大小决定
  • 倡议 ECS 与 Hologres 实例用雷同 Region 和 VPC 网络

2. 创立 Hologres 实例

  • 登陆阿里云,进入 Hologres 产品控制台,点击 新增引擎实例
  • 抉择配置,并填写实例名称,具体阐明请参考官网文档。

3. 创立测试数据库

  • 在创立实例后,您须要登陆您创立的 Hologres 实例,创立一个数据库,本测试中命名数据库为tpch_1sf,具体操作步骤请参考官网文档

生成 TPC-H 数据

1. 筹备数据生成工具

  • 近程链接 ECS 实例
  • 更新所有库
yum update
  • 装置 git
yum install git
  • 装置 gcc
yum install gcc
  • 下载 TPC-H 数据生成代码
git clone https://github.com/gregrahn/tpch-kit.git
  • 进入数据生成工具代码目录
cd tpch-kit/dbgen
  • 编译数据生成工具代码
make

2. 生成数据

  • 编译胜利后,您能够应用如下代码查看代码生成工具的相干参数。
./dbgen --help
  • 本次测试仅生成 1 GB 数据,所以运行如下代码生成数据。
./dbgen -vf -s 1

如您须要生成更多数据量的数据,能够调整 SF 的参数,例如您能够应用如下代码生成 1 T 数据

./dbgen -vf -s 1000
  • 个别状况下,32CU 能够跑 TPCH SF10,256CU 能够跑 TPCH SF50
  • 数据生成后,您能够应用如下代码查看生成的文件。能够看到生成工具生成了 8 个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'

OLAP 查问场景测试

筹备数据

1. 创立表

  • 因为本文次要应用 psql 进行数据导入操作,须要先在 ECS 中运行如下命令装置 psql
yum install postgresql-server
  • 装置 psql 后,您能够应用如下命令登陆 Hologres 实例
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
  • 应用 psql 连贯 Hologres 后,您能够应用如下建表语句创立数据库表
DROP TABLE IF EXISTS LINEITEM;

BEGIN;
CREATE TABLE LINEITEM
(
    L_ORDERKEY      INT         NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS ORDERS;

BEGIN;
CREATE TABLE ORDERS
(
    O_ORDERKEY      INT         NOT NULL PRIMARY KEY,
    O_CUSTKEY       INT         NOT NULL,
    O_ORDERSTATUS   TEXT        NOT NULL,
    O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
    O_ORDERDATE     timestamptz NOT NULL,
    O_ORDERPRIORITY TEXT        NOT NULL,
    O_CLERK         TEXT        NOT NULL,
    O_SHIPPRIORITY  INT         NOT NULL,
    O_COMMENT       TEXT        NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS PARTSUPP;

BEGIN;
CREATE TABLE PARTSUPP
(
    PS_PARTKEY    INT    NOT NULL,
    PS_SUPPKEY    INT    NOT NULL,
    PS_AVAILQTY   INT    NOT NULL,
    PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
    PS_COMMENT    TEXT   NOT NULL,
    PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS PART;

BEGIN;
CREATE TABLE PART
(
    P_PARTKEY     INT    NOT NULL PRIMARY KEY,
    P_NAME        TEXT   NOT NULL,
    P_MFGR        TEXT   NOT NULL,
    P_BRAND       TEXT   NOT NULL,
    P_TYPE        TEXT   NOT NULL,
    P_SIZE        INT    NOT NULL,
    P_CONTAINER   TEXT   NOT NULL,
    P_RETAILPRICE DECIMAL(15,2) NOT NULL,
    P_COMMENT     TEXT   NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'colocate_with', 'LINEITEM');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;



DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
    C_CUSTKEY    INT    NOT NULL PRIMARY KEY,
    C_NAME       TEXT   NOT NULL,
    C_ADDRESS    TEXT   NOT NULL,
    C_NATIONKEY  INT    NOT NULL,
    C_PHONE      TEXT   NOT NULL,
    C_ACCTBAL    DECIMAL(15,2) NOT NULL,
    C_MKTSEGMENT TEXT   NOT NULL,
    C_COMMENT    TEXT   NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS SUPPLIER;

BEGIN;
CREATE TABLE SUPPLIER
(
    S_SUPPKEY   INT    NOT NULL PRIMARY KEY,
    S_NAME      TEXT   NOT NULL,
    S_ADDRESS   TEXT   NOT NULL,
    S_NATIONKEY INT    NOT NULL,
    S_PHONE     TEXT   NOT NULL,
    S_ACCTBAL   DECIMAL(15,2) NOT NULL,
    S_COMMENT   TEXT   NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS NATION;

BEGIN;
CREATE TABLE NATION(
  N_NATIONKEY INT NOT NULL PRIMARY KEY,
  N_NAME text NOT NULL,
  N_REGIONKEY INT NOT NULL,
  N_COMMENT text NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS REGION;

BEGIN;
CREATE TABLE REGION
(
    R_REGIONKEY INT  NOT NULL PRIMARY KEY,
    R_NAME      TEXT NOT NULL,
    R_COMMENT   TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
  • 创立结束后,您能 在 psql 中 应用如下代码查看是否创立胜利
tpch_1sf=# \dt
  • 若胜利,事实成果如下
tpch_1sf=# \dt
               List of relations
 Schema |   Name   | Type  |       Owner        
--------+----------+-------+--------------------
 public | customer | table | tpch_1sf_developer
 public | lineitem | table | tpch_1sf_developer
 public | nation   | table | tpch_1sf_developer
 public | orders   | table | tpch_1sf_developer
 public | part     | table | tpch_1sf_developer
 public | partsupp | table | tpch_1sf_developer
 public | region   | table | tpch_1sf_developer
 public | supplier | table | tpch_1sf_developer
(8 rows)

2. 导入数据

  • 本测试计划次要应用 COPY FROM STDIN 的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。
  • 您能够在数据生成工具的目录中参考如下 shell脚本导入数据
for i in `ls *.tbl`; do
    echo $i;
    name=`echo $i| cut -d'.' -f1`;
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter'|'csv;" < $i;
done
  • 至此您已实现数据导入

3. 收集统计信息

  • 为了更好的执行查问,能够在 psql 中应用如下语句,使 Hologres 收集各张表特色信息。
vacuum  region;
vacuum  nation;
vacuum  supplier;
vacuum  customer;
vacuum  part;
vacuum  partsupp;
vacuum  orders;
vacuum  lineitem;

analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;

执行查问

  • 为了不便统计查问信息,须要应用 pgbench 工具,您能够应用如下命令装置 pgbench(如果测试机上已有 pgbench,请确保版本大于 9.6 以上,最好大版本是 13 以上,否则以下测试会遇到各种不兼容)
yum install postgresql-contrib
  • 为了不便查问,您能够间接通过以下连贯,下载所需的 22 条 SQL

tpch\_data\_tpch\_query.zip

  • 而后上传至 ECS
  • 进入 ECS,并进入上传文件的目录,应用如下 shell 命令解压缩文件
unzip tpch_data_tpch_query
  • 至此,您曾经实现了筹备工作,即可应用 pgbench 进行测试,您能够应用如下命令执行单条查问
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
  • 参数解释
配置项 参数 阐明
-h Hologres 实例的 endpoint 在 Hologres 管控台查看
-p Hologres 实例的端口地址 在 Hologres 管控台查看
-d Hologres 指定实例中的数据库名
-c 客户端数目(并发度) 示例:1,因为该测试仅测试查问性能,不测试并发,所以并发度置为 1 即可
-t 每个客户端须要执行的压测 query 数目 50
-f 压测的 sql 示例:6.sql
  • 也能够间接执行如下 shell 脚本,间接批量执行 22 条查问,并将后果输入到文件 hologres\_tpch\_test.out 中
rm -f hologres_tpch_test.out
echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
for i in {1..22}
do
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.out
done
  • 查看 hologres\_tpch\_test.out 即可失去查问后果,样例如下

    • transaction type:阐明了执行的具体的 SQL 文件
    • latency average:记录了对应 SQL 文件的 3 次查问的均匀工夫
2021-03-23 03:50:54 begin
pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 3
number of transactions actually processed: 3/3
latency average = 76.936 ms
tps = 12.997850 (including connections establishing)
tps = 15.972757 (excluding connections establishing)
...

TPCH 22 条查问语句

Q1

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '90' day 
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

Q2

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 15
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
limit 100;

Q3

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-15'
    and l_shipdate > date '1995-03-15'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
limit 10;

Q4

select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;

Q5

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date '1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc;

Q6

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year
    and l_discount between 6 - 1 and 6 + 1
    and l_quantity < 2400

Q7

set hg_experimental_enable_double_equivalent=on;
select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
            )
            and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

Q8

set hg_experimental_enable_double_equivalent=on;
select
    o_year,
    sum(case
        when nation = 'BRAZIL' then volume
        else 0
    end) / sum(volume) as mkt_share
from
    (
        select
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) as volume,
            n2.n_name as nation
        from
            part,
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2,
            region
        where
            p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = 'AMERICA'
            and s_nationkey = n2.n_nationkey
            and o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p_type = 'STANDARD POLISHED TIN'
    ) as all_nations
group by
    o_year
order by
    o_year;

Q9

set hg_experimental_enable_double_equivalent=on;
select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%green%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;

Q10

select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1993-10-01'
    and o_orderdate < date '1993-10-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
limit 20;

Q11

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY'
group by
    ps_partkey having
        sum(ps_supplycost * ps_availqty) > (
            select
                sum(ps_supplycost * ps_availqty) * 0.0000010000
            from
                partsupp,
                supplier,
                nation
            where
                ps_suppkey = s_suppkey
                and s_nationkey = n_nationkey
                and n_name = 'GERMANY'
        )
order by
    value desc
    limit 100;

Q12

select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
            then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
            then 1
        else 0
    end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('MAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode;

Q13

select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                c_custkey = o_custkey
                and o_comment not like '%special%requests%'
        group by
            c_custkey
    ) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc;

Q14

select
    100.00 * sum(case
        when p_type like 'PROMO%'
            then l_extendedprice * (1 - l_discount)
        else 0
    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date '1995-09-01' + interval '1' month;

Q15

with revenue0(SUPPLIER_NO, TOTAL_REVENUE)  as
    (
    select
        l_suppkey,
        sum(l_extendedprice * (1 - l_discount))
    from
        lineitem
    where
        l_shipdate >= date '1995-12-01'
        and l_shipdate < date '1995-12-01' + interval '3' month
    group by
        l_suppkey
    )
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey;

Q16

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type not like 'MEDIUM POLISHED%'
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
    and ps_suppkey not in (
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

Q17

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container = 'MED BOX'
    and l_quantity < (
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
            and l_partkey in(
        select p_partkey 
        from part 
        where p_brand = 'Brand#23' and p_container = 'MED BOX')
    );

Q18

select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 300
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
limit 100;

Q19

select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#12'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 1 and l_quantity <= 1 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 10 and l_quantity <= 10 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#34'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 20 and l_quantity <= 20 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

Q20

select
    s_name,
    s_address
from
    supplier,
    nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'forest%'
            )
            and ps_availqty > (
                select
                    0.5 * sum(l_quantity)
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date '1994-01-01'
                    and l_shipdate < date '1994-01-01' + interval '1' year
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CANADA'
order by
    s_name;

Q21

select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists (
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists (
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA'
group by
    s_name
order by
    numwait desc,
    s_name
limit 100;

Q22

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from
    (
        select
            substring(c_phone from 1 for 2) as cntrycode,
            c_acctbal
        from
            customer
        where
            substring(c_phone from 1 for 2) in
                ('13', '31', '23', '29', '30', '18', '17')
            and c_acctbal > (
                select
                    avg(c_acctbal)
                from
                    customer
                where
                    c_acctbal > 0.00
                    and substring(c_phone from 1 for 2) in
                        ('13', '31', '23', '29', '30', '18', '17')
            )
            and not exists (
                select
                    *
                from
                    orders
                where
                    o_custkey = c_custkey
            )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

Key/Value 点查场景测试

筹备数据

1. 创立表

  • 持续应用 OLAP 查问场景创立的数据库,咱们会应用 TPCH 数据集中的 orders 表进行测试,应用 psql 连贯 Hologres 后,您能够应用如下建表语句创立数据库表;

留神:点查场景须要应用行存表,所以须要创立一张新表,不能应用 OLAP 查问场景中应用的表

DROP TABLE IF EXISTS orders_row;

BEGIN;
CREATE TABLE public.orders_row (
 "o_orderkey" int8 NOT NULL,
 "o_custkey" int8,
 "o_orderstatus" bpchar(1),
 "o_totalprice" numeric(15,2),
 "o_orderdate" date,
 "o_orderpriority" bpchar(15),
 "o_clerk" bpchar(15),
 "o_shippriority" int8,
 "o_comment" varchar(79),
PRIMARY KEY (o_orderkey)
);
CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');
CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
COMMIT;

2. COPY 形式导入数据

  • 本测试计划次要应用 COPY FROM STDIN 的形式导入数据具体能够参考官网文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创立的表中。
  • 您能够在数据生成工具的目录中参考如下命令导入数据
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter'|'csv;" < orders.tbl

3. INSERT INTO 形式导入数据

  • 因为 OLAP 场景时您曾经导入了 orders 表的数据,您能够运行如下 SQL 语句导入数据
INSERT INTO public.orders_row
SELECT  *
FROM    public.orders;

查问

1. 生成查问语句

  • Key/Value 点查场景次要的查问语句特色如下
SELECT  column_a
        ,column_b
        ,...
        ,column_x
FROM    table_x
WHERE   pk = value_x
;

SELECT  column_a
        ,column_b
        ,...
        ,column_x
FROM    table_x
WHERE   pk IN (value_a, value_b,..., value_x)
;
  • 您能够应用如下脚本生成所需的 sql,该脚本会生成 2 条 sql

    • kv\_query\_single.sql 针对单值筛选的 SQL
    • kv\_query\_in.sql 针对多值筛选的 SQL,该脚本会随机生成一个针对 10 个值筛选的 SQL
rm -rf kv_query
mkdir kv_query
cd kv_query
echo '\set column_values random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sql
echo '\set column_values1 random(1,99999999)
\set column_values2 random(1,99999999)
\set column_values3 random(1,99999999)
\set column_values4 random(1,99999999)
\set column_values5 random(1,99999999)
\set column_values6 random(1,99999999)
\set column_values7 random(1,99999999)
\set column_values8 random(1,99999999)
\set column_values9 random(1,99999999)
\set column_values10 random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql

2. 进行查问

  • 查问须要应用 pgbench,您能够应用如下命令装置 pgbench
yum install postgresql-contrib
  • 之后您即可应用 pgbench 进行压测,针对单值筛选的场景

留神,请在生成 SQL 的目录执行如下命令

PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
  • 针对多值筛选的场景

留神,请在生成 SQL 的目录执行如下命令

PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
  • 参数解释
配置项 参数 阐明
-h Hologres 实例的 endpoint 在 Hologres 管控台查看
-p Hologres 实例的端口地址 在 Hologres 管控台查看
-d Hologres 指定实例中的数据库名
-c 客户端数目(并发度) 示例:8
-t 每个客户端须要执行的压测 query 数目 50
-f 压测的 sql 示例:6.sql

测试后果参考

测试数据量:

  • 本测试基于 TPCH 100G 的数据集进行测试,具体数据量如下表所示
表名 行数
LINEITEM 600,037,902
ORDERS 150,000,000
PARTSUPP 80,000,000
PART 20,000,000
CUSTOMER 15,000,000
SUPPLIER 1,000,000
NATION 25
REGION 5

集群规格

计算资源 存储容量 软件版本 备注
64 CU
(CPU:64 Core 内存:256 GB) 100 GB r0.10.20 应用集群默认配置,Shard 数量:40
128 CU
(CPU:128 Core 内存:512 GB) 100 GB r0.10.20 应用集群默认配置,Shard 数量:80

测试工夫:2021 年 6 月

测试后果

数据导入工夫

  • 数据导入执行工夫以秒(s)为单位。
  • 导入工夫指将数据导入 Hologres 内表
  • 在应用 COPY 办法导入数据时,一张表对应一个数据文件,并未应用并发导入形式。
  • 具体数值如下表所示

阐明:应用 COPY 形式导入时一张表对应一个数据文件,并未应用并发导入形式

表名 行数 数据量 Hologres 64CU
应用 COPY 形式导入(公网网络) 应用 COPY 形式导入(VPC 网络导入) 应用 MaxCompute 表面导入
LINEITEM 600,037,902 73.6GB 3,070.453 694.364 148.165
ORDERS 150,000,000 16.4GB 691.060 172.529 37.741
PARTSUPP 80,000,000 2.3GB 468.560 107.092 18.488
PART 20,000,000 11.3GB 96.342 24.020 8.083
CUSTOMER 15,000,000 2.3GB 95.190 22.937 10.363
SUPPLIER 1,000,000 132MB 5.057 1.803 1.503
NATION 25 2KB 0.580 0.584 0.747
REGION 5 0.375KB 0.168 0.153 0.430
Total 106G 4427.410 1023.482 225.52
  • 下图中蓝色为应用 COPY 形式在公网条件下导入数据的工夫,绿色为应用 COPY 形式在 VPC 网络条件下导入数据的工夫,灰色为应用 MaxCompute 表面形式导入的工夫
  • 纵坐标数值越低,示意导入速度越快
  • 横轴:表名。纵轴:数据导入工夫(s)

  • 能够看出,因为网络带宽影响,应用 COPY 形式导入本地文件数据时,应用 VPC 网络导入数据工夫显著短于应用公网导入数据工夫;应用 MaxCompute 导入数据工夫显著短于应用 COPY 形式导入本地文件数据工夫。

查问工夫

  • 查问执行工夫以秒(s)为单位。
  • 查问后果均基于 Hologres 内表
  • 具体数值如下表所示
TPCH Query 编号 Hologres 64CU Hologres 128CU
1 3.120 2.150
2 0.581 0.467
3 1.735 1.005
4 1.558 0.836
5 2.921 1.917
6 0.297 0.096
7 2.006 1.029
8 2.674 1.679
9 5.298 2.796
10 1.944 0.924
11 0.397 0.297
12 1.531 0.852
13 1.741 0.971
14 0.286 0.160
15 0.293 0.177
16 1.223 1.020
17 1.405 0.607
18 3.817 2.169
19 1.400 0.622
20 1.358 0.868
21 4.164 2.047
22 1.121 0.654
Total 40.870 23.343
  • 下图中蓝色为 64CU 的实例的查问后果,绿色为 128CU 实例的查问后果
  • 纵坐标数值越低,示意 TPC-H 性能越好。
  • 能够看出随着实例规模的老本增长,查问工夫也在成线性降落趋势
  • 横轴:query 在文档中的编号。纵轴:query 执行工夫(s)

版权申明:本文内容由阿里云实名注册用户自发奉献,版权归原作者所有,阿里云开发者社区不领有其著作权,亦不承当相应法律责任。具体规定请查看《阿里云开发者社区用户服务协定》和《阿里云开发者社区知识产权爱护指引》。如果您发现本社区中有涉嫌剽窃的内容,填写侵权投诉表单进行举报,一经查实,本社区将立即删除涉嫌侵权内容。

正文完
 0