TPC与TPC-H 介绍
据TPC官网(http://tpc.org)介绍,TPC是一家非盈利公司,专一于开发以数据为核心的基准规范,并向行业流传主观、可验证的数据。TPC公布的被业界宽泛承受应用的测试基准有TPC-C 、TPC-H、TPC-DS等,TPC-C是针对OLTP的测试基准,TPC-H和TPC-DS则通常是针对OLAP的测试基准。
TPC-H是一个决策反对基准(Decision Support Benchmark),它由一套面向业务的特地查问和并发数据批改组成。查问和填充数据库的数据具备宽泛的行业相关性。这个基准测试演示了查看大量数据、执行高度简单的查问并答复要害业务问题的决策支持系统。TPC-H报告的性能指标称为TPC-H每小时复合查问性能指标(QphH@Size),反映了零碎解决查问能力的多个方面。这些方面包含执行查问时所抉择的数据库大小,由单个流提交查问时的查询处理能力,以及由多个并发用户提交查问时的查问吞吐量。
在2021年2月18日,TPC-H 标准规范公布了V3版本,下载链接:http://tpc.org/tpc_documents_...
TPC-H 表构造与关系
TPC-H 测试规范模仿了一个零部件在线销售的零碎,共定义了8个表:
- REGION 区域表
- NATION 国家表
- SUPPLIER 供应商表
- PART 零部件表
- PARTSUPP 零部件供给表
- CUSTOMER 客户表
- ORDERS 订单表
- LINEITEM 订单明细表
各表的构造、数据量和相互之间的关系如下图,其中表名前面的括号内是该表的列名前缀,箭头指向为表之间一对多关系的方向,表名下方的数字或者公式是表的基数(行数)。 LINEITEM表中的基数是近似值,基数中的sf是比例因子,用于生成不同数据规模的数据集,sf必须从下列固定值中抉择:1/10/30/100/1000/3000/10000,相当于1GB/10GB/30GB/100GB/1TB/3TB/1PB。
关注公众号大数据研学社:BigDataRLC,第一工夫看最新文章。
TPC-H 查问SQL语句
Q1:定价汇总报表查问
定价汇总报表查问是分组、排序、汇集操作并存的单表查问操作,报告已开票、发货和退回的业务量; 为给定日期发货的所有订单项提供汇总定价报告,按l_returnflag和l_linestatus分组并排序,列出了扩大价格、折扣扩大价格、折扣扩大价格加税、均匀数量、均匀扩大价格、均匀折扣、各个分组的总行数。
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_orderfrom lineitemwhere l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3) //DELTA是60~120内的值group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
Q2:最低老本供应商查问
最低老本供应商查问是带有排序、汇集操作、子查问并存的多表查问操作,查找应该抉择那个供应商来为给定地区的给定整机下订单;在给定的区域中,针对特定类型和尺寸的每个整机,找到可能以最低老本供给的供应商。如果该地区的多个供应商以雷同的(最低)老本供货。查问列出了供应商的账户余额、名称和国家,零部件编号、制造商,供应商的地址、电话号码和评论信息。
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment /*查问供应者的帐户余额、名字、国家、整机的号码、生产者、供应者的地址、电话号码、备注信息 */from part, supplier, partsupp, nation, region //五表连贯where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = [SIZE] //指定大小,在区间[1, 50]内随机抉择 and p_type like '%[TYPE]' //指定类型,在TPC-H规范指定的范畴内随机抉择 and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = '[REGION]' //指定地区,在TPC-H规范指定的范畴内随机抉择 and ps_supplycost = ( //子查问 select min(ps_supplycost) //汇集函数 from partsupp, supplier, nation, region //与父查问的表有重叠 where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = '[REGION]' )order by //排序 s_acctbal desc, n_name, s_name, p_partkey;
Q3:运输优先级查问
运输优先级查问是带有分组、排序、汇集操作并存的三表查问操作,检索价值最高的未发货订单;在给定日期尚未发货的订单中支出最大的订单的运输优先级和潜在支出(l_extendedprice * (1-l_discount)的总和),订单依照支出的降序排序。
select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, //潜在的支出,汇集操作 o_orderdate, o_shippriorityfrom customer, orders, lineitem //三表连贯where c_mktsegment = '[SEGMENT]' //在TPC-H规范指定的范畴内随机抉择 and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '[DATE]' //指定日期段,在在[1995-03-01, 1995-03-31]中随机抉择 and l_shipdate > date '[DATE]'group by //分组操作 l_orderkey, //订单标识 o_orderdate, //订单日期 o_shippriority //运输优先级order by //排序操作 revenue desc, //降序排序,把潜在最大支出列在后面 o_orderdate;
Q4:订单优先级查问
订单优先级查问是带有分组、排序、汇集操作、子查问并存的单表查问操作,确定订单优先零碎的工作状况并评估客户满意度;在给定季度中的订购数量,其中客户在承诺日志之后收到至多一个订单项,列出了按优先级升序排序的每个订单优先级的此类订单的计数。
select o_orderpriority, //订单优先级 count(*) as order_count //订单优先级计数from orders //单表查问where o_orderdate >= date '[DATE]' and o_orderdate < date '[DATE]' + interval '3' month //指定订单的时间段+三个月,DATE是在1993年1月和1997年10月之间随机抉择的一个月的第一天 and exists ( //子查问 select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate )group by //按订单优先级分组 o_orderpriorityorder by //按订单优先级排序 o_orderpriority;
Q5:本地供应商支出量查问
本地供应商支出量查问是带有分组、排序、汇集操作、子查问并存的多表连贯查问操作,列出了通过本地供应商实现的支出量;为一个地区的每个国家列出了由订单项交易产生的支出量,以确定是否在给定区域建设本地配送核心。
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue //汇集操作from customer,orders,lineitem,supplier,nation,region //六表连贯where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = '[REGION]' //指定地区,在TPC-H规范指定的范畴内随机抉择 and o_orderdate >= date '[DATE]' //DATE是从1993年到1997年中随机抉择的一年的1月1日 and o_orderdate < date '[DATE]' + interval '1' yeargroup by //按名字分组 n_nameorder by //按支出降序排序,留神分组和排序子句不同 revenue desc;
Q6:预测支出变动查问
预测支出变动查问是带有汇集操作的单表查问操作,量化了在给定年份中给定百分比范畴内打消某些公司范畴内的折扣所导致的支出增长量,用于寻找增加收入的办法;在给定的年份发货的所有订单项,折扣介于“DISCOUNT-0.01”和“DISCOUNT+0.01”之间,如果l_quantity小于quantity的订单的折扣被打消后总收入减少的数量。
select sum(l_extendedprice*l_discount) as revenue //潜在的支出增加量from lineitem //单表查问where l_shipdate >= date '[DATE]' //DATE是从[1993, 1997]中随机抉择的一年的1月1日 and l_shipdate < date '[DATE]' + interval '1' year //一年内 and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 //between and l_quantity < [QUANTITY]; // QUANTITY在区间[24, 25]中随机抉择
Q7:批量出货查问
批量出货查问是带有分组、排序、汇集、子查问操作并存的多表查问操作,确定在某些国家间接运输的货物的价值以帮忙从新会谈运输合同;对于两个给定的国家,在1995年到1996年期间,从其中一个国家的供应商想另一个国家的客户运送整机的订单项取得的总折扣支出。
select supp_nation, //供货商国家 cust_nation, //顾客国家 l_year, sum(volume) as revenue //年度、年度的货运支出from ( //子查问 select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier,lineitem,orders,customer,nation n1,nation n2 //六表连贯 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( // NATION2和NATION1的值不同,示意查问的是跨国的货运状况 (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]') or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shippinggroup by supp_nation, cust_nation, l_yearorder by supp_nation, cust_nation, l_year;
Q8:全国市场份额查问
全国市场份额查问带有分组、排序、汇集、子查问操作并存的查问操作,确定给定区域内给定国家的市场份额在两年内对于给定整机类型的变动状况; 给定区域内给定国家的市场份额被定义为支出的一部分,即 l_extendedprice * (1-l_discount)的总和。
select o_year, //年份 sum(case when nation = '[NATION]'//指定国家,在TPC-H规范指定的范畴内随机抉择 then volume else 0 end) / sum(volume) as mkt_share //市场份额:特定品种的产品支出的百分比;汇集操作from ( //子查问 select extract(year from o_orderdate) as o_year, //合成出年份 l_extendedprice * (1-l_discount) as volume, //特定品种的产品支出 n2.n_name as nation from part,supplier,lineitem,orders,customer,nation n1,nation n2,region //八表连贯 where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = '[REGION]' //指定地区,在TPC-H规范指定的范畴内随机抉择 and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' //只查95、96年的状况 and p_type = '[TYPE]' //指定整机类型,在TPC-H规范指定的范畴内随机抉择) as all_nationsgroup by //按年分组 o_yearorder by //按年排序 o_year;
Q9:产品类型利润度量查问
产品类型利润度量查问是带有分组、排序、汇集、子查问操作并存的查问操作,确定按供应商国家和年份细分的给定整机系列的利润。
select nation, o_year, sum(amount) as sum_profit //每个国家每一年所有被定购的整机在一年中的总利润from( select n_name as nation, //国家 extract(year from o_orderdate) as o_year, //取出年份 l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount //利润 from part,supplier,lineitem,partsupp,orders,nation //六表连贯 where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%[COLOR]%' //LIKE操作,查问优化器可能进行优化) as profitgroup by //按国家和年份分组 nation, o_yearorder by //按国家和年份排序,年份大者靠前 nation, o_year desc;
Q10:退货报告查问
退货报告查问是带有分组、排序、汇集操作并存的多表连贯查问操作,辨认可能对运送给他们的整机有问题的客户;依据对给定季度支出损失的影响,找出排名靠前的已退回整机的客户。
select c_custkey, c_name, //客户信息 sum(l_extendedprice * (1 - l_discount)) as revenue, //支出损失 c_acctbal, n_name, c_address, c_phone, c_comment //国家、地址、电话、意见信息等from customer, orders, lineitem, nationwhere c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '[DATE]' // DATE是位于1993年一月到1994年十二月中任一月的一号 and o_orderdate < date '[DATE]' + interval '3' month //3个月内 and l_returnflag = 'R' //货物被回退 and c_nationkey = n_nationkeygroup by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_commentorder by revenue desc;
Q11:库存价值查问
库存价值查问是带有分组、排序、汇集、子查问操作并存的多表连贯查问操作,查找给定国家/地区供应商的库存价值; 扫描给定国家的供应商的可用库存来查找占有所有可用整机总价值的重要百分比的所有整机。
select ps_partkey, sum(ps_supplycost * ps_availqty) as value //汇集操作,商品的总价值from partsupp, supplier, nationwhere ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = '[NATION]'group by ps_partkey having //带有HAVING子句的分组操作 sum(ps_supplycost * ps_availqty) > ( //HAVING子句中包含有子查问 select sum(ps_supplycost * ps_availqty) * [FRACTION] //子查问中存在汇集操作;FRACTION为0.0001/SF1 from partsupp, supplier, nation //与父查问的表连贯统一 where //与父查问的WHEWR条件统一 ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = '[NATION]' //指定国家 )order by //按商品的价值降序排序 value desc;
Q12:运送形式和订单优先级查问
运送形式和订单优先级查问是带有分组、排序、汇集操作并存的两表连贯查问操作,确定抉择较便宜的运送形式是否会对要害优先级订单产生负面影响,导致消费者更多的在合同日期之后收到货物。
select l_shipmode, sum(//汇集操作 case when o_orderpriority ='1-URGENT' //OR运算,二者满足其一即可,选出URGENT或HIGH的 or o_orderpriority ='2-HIGH' then 1 else 0 end) as high_line_count, sum( case when o_orderpriority <> '1-URGENT' //AND运算,二者都不满足,非URGENT非HIGH的 and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_countfrom orders,lineitemwhere o_orderkey = l_orderkey and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') /* 指定货运模式的类型,在TPC-H规范指定的范畴内随机抉择,SHIPMODE2必须有别于SHIPMODE1 */ and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '[DATE]' //从1993年到1997年中任一年的一月一号 and l_receiptdate < date '[DATE]' + interval '1' year //1年之内group by //分组操作 l_shipmodeorder by //排序操作 l_shipmode;
Q13:客户散布查问
客户散布查问是带有分组、排序、汇集、子查问、左外连贯操作并存的查问操作;根究客户的订单数量确定客户的散布,计算并报告有多少客户没有订单,有多个客户有1/2/3 等个订单。
select c_count, count(*) as custdist //汇集操作,统计每个组的个数from (//子查问 select c_custkey, count(o_orderkey) from customer left outer join orders on //子查问中包含左外连贯操作 c_custkey = o_custkey and o_comment not like ‘%[WORD1]%[WORD2]%’ //LIKE操作 //WORD1 为以下四个可能值中任意一个:special、pending、unusual、express //WORD2 为以下四个可能值中任意一个:packages、requests、accounts、deposits group by //子查问中的分组操作 c_custkey) as c_orders (c_custkey, c_count)group by //分组操作 c_countorder by //排序操作 custdist desc, //从大到小降序排序 c_count desc;
Q14:促销成果查问
促销成果查问是带有分组、排序、汇集、子查问、左外连贯操作并存的查问操作,监控市场对促销的反馈;确定在给定年份和月份的支出中有多少百分比来之与促销局部。
select 100.00 * sum( case when p_type like 'PROMO%' //促销整机 then l_extendedprice*(1-l_discount) //某一特定工夫的支出 else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenuefrom lineitem, partwhere l_partkey = p_partkey and l_shipdate >= date '[DATE]' // DATE是从1993年到1997年中任一年的任一月的一号 and l_shipdate < date '[DATE]' + interval '1' month;
Q15:顶级供应商查问
顶级供应商查问是带有分排序、汇集、汇集子查问操作并存的一般表与视图的连贯操作,确定顶级供应商以对其进行处分、给予更多业务或取得非凡认可;在给定的年费的给定季度对总出货量奉献最大的供应商。
create view revenue[STREAM_ID](supplier_no, total_revenue) as //创立简单视图(带有分组操作)select l_suppkey, sum(l_extendedprice * (1 - l_discount)) //获取供货商为公司带来的总利润from lineitemwhere l_shipdate >= date '[DATE]' //DATE 是从1993年一月到1997年十月中任一月的一号 and l_shipdate < date '[DATE]' + interval '3' month //3个月内group by //分组键与查问对象之一雷同 l_suppkey;//查问语句select s_suppkey, s_name, s_address, s_phone, total_revenuefrom supplier,revenue[STREAM_ID] //一般表与简单视图进行连贯操作where s_suppkey = supplier_no and total_revenue = (//汇集子查问 select max(total_revenue) from revenue[STREAM_ID] //汇集子查问从视图取得数据 )order by s_suppkey; //删除视图drop view revenue[STREAM_ID];
TPC-H规范定义了Q15语句等价的变形SQL,应用了WITH子句,而后用WITH的对象与表进行连贯。变形SQL的语句如下:
WITH revenue (supplier_no, total_revenue) as ( SELECT l_suppkey, SUM(l_extendedprice * (1-l_discount)) FROM lineitem WHERE l_shipdate >= date ':1' AND l_shipdate < date ':1' + interval '3' month GROUP BY l_suppkey)SELECT s_suppkey, s_name, s_address, s_phone, total_revenueFROM supplier, revenueWHERE s_suppkey = supplier_no AND total_revenue = ( SELECT MAX(total_revenue) FROM revenue )ORDER BY s_suppkey;
Q16:零部件/供货商关系查问
零部件/供货商关系查问是带有分组、排序、汇集、去重、NOT IN子查问操作并存的两表连贯操作;找出有多少供应商能够提供具备给定属性的整机。
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt //汇集、去重操作from partsupp, partwhere p_partkey = ps_partkey and p_brand <> '[BRAND]' // BRAND=Brand#MN ,M和N是两个字母,代表两个数值,互相独立,取值在1到5之间 and p_type not like '[TYPE]%' //消费者不感兴趣的类型和尺寸 and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8]) //TYPEX是在1到50之间任意抉择的一组八个不同的值 and ps_suppkey not in ( //NOT IN子查问,消费者排除某些供货商 select s_suppkey from supplier where s_comment like '%Customer%Complaints%' )group by //分组操作 p_brand, p_type, p_sizeorder by //排序操作 supplier_cnt desc, //按数量降序排列,按品牌、品种、尺寸升序排列 p_brand, p_type, p_size;
Q17:小额订单支出查问
小额订单支出查问是带有汇集、汇集子查问操作并存的两表连贯操作;确定如果不在为大量特定整机实现订单,均匀每年将损失多少支出,通过集中在更大的出货量上来缩小管理费用;给定品牌和给定容器类型的零部件,确定在7年内数据库中所有订单订购的此类整机的均匀订单数量,如果不再接管低于20%平均水平的订单,均匀每年的总收入损失多少。
select sum(l_extendedprice) / 7.0 as avg_yearly //汇集操作from lineitem, partwhere p_partkey = l_partkey and p_brand = '[BRAND]' /*指定品牌。 BRAND=’Brand#MN’ ,M和N是两个字母,代表两个数值,互相独立,取值在1到5之间 */ and p_container = '[CONTAINER]' //指定包装类型。在TPC-H规范指定的范畴内随机抉择 and l_quantity < ( //汇集子查问 select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey);
Q18:大批量客户查问
大批量客户查问是带有分组、排序、汇集、IN子查问操作并存的三表连贯操作,依据客户的大批量订单对客户进行排名。
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, //根本信息 sum(l_quantity) //订货总数from customer, orders, lineitemwhere o_orderkey in ( //带有分组操作的IN子查问 select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > [QUANTITY] // QUANTITY是位于312到315之间的任意值 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalpriceorder by o_totalprice desc, o_orderdate;
Q19:折扣支出查问
折扣支出查问是带有分组、排序、汇集、IN子查问操作并存的三表连贯操作,归因于以特定形式解决的选定零部件的总折扣支出。本查问是用数据挖掘工具产生格式化代码的一个例子;对一些空运或人工运输整机三个不同品种的所有订单的总折扣支出。整机的抉择思考特定品牌、包装和尺寸范畴。
select sum(l_extendedprice * (1 - l_discount) ) as revenuefrom lineitem, partwhere ( p_partkey = l_partkey and p_brand = ‘[BRAND1]’ /*特定品牌。BRAND1、BRAND2、BRAND3=‘Brand#MN’,M和N是两个字母,代表两个数值,互相独立,取值在1到5之间 */ and p_container in ( ‘SM CASE’, ‘SM BOX’, ‘SM PACK’, ‘SM PKG’) //包装范畴 and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 /* QUANTITY1 是1到10之间的任意取值 */ and p_size between 1 and 5 //尺寸范畴 and l_shipmode in (‘AIR’, ‘AIR REG’) //运输模式,如下带有暗影的粗体示意的条件是雷同的,存在条件化简的可能 and l_shipinstruct = ‘DELIVER IN PERSON’ ) or ( p_partkey = l_partkey and p_brand = ‘[BRAND2]’ and p_container in (‘MED BAG’, ‘MED BOX’, ‘MED PKG’, ‘MED PACK’) and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 /* QUANTITY2 是10到20之间的任意取值 */ and p_size between 1 and 10 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’ ) or ( p_partkey = l_partkey and p_brand = ‘[BRAND3]’ and p_container in ( ‘LG CASE’, ‘LG BOX’, ‘LG PACK’, ‘LG PKG’) and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 /* QUANTITY3 是20到30之间的任意取值 */ and p_size between 1 and 15 and l_shipmode in (‘AIR’, ‘AIR REG’) and l_shipinstruct = ‘DELIVER IN PERSON’ );
Q20:潜在零部件促销查问
Q20语句查问确定在某一年内,找出指定国家的能对某一整机商品提供更有竞争力价格的供货货。所谓更有竞争力的供货商,是指那些整机有过剩的供货商,超过供或商在某一年中货运给定国的某一整机的50%则为过剩。
潜在零部件促销查问是带有排序、汇集、IN子查问、一般子查问操作并存的两表连贯操作,辨认特定国家的供应商,他们抉择的零部件可能是促销报价的候选者;可辨认提供给定整机过多的供应商,过多被定义为超过50%的零部件。
select s_name, s_addressfrom supplier, nationwhere s_suppkey in ( //第一层的IN子查问 select ps_suppkey from partsupp where ps_partkey in ( //第二层嵌套的IN子查问 select p_partkey from part where p_name like '[COLOR]%' //COLOR为产生P_NAME的值的列表中的任意值 ) and ps_availqty > (//第二层嵌套的子查问 select 0.5 * sum(l_quantity) //汇集子查问 from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('[DATE]') // DATE为在1993年至1997年的任一年的一月一号 and l_shipdate < date('[DATE]') + interval ‘1’ year //1年内 ) ) and s_nationkey = n_nationkey and n_name = '[NATION]' //TPC-H规范定义的任意值order by s_name;
Q21:供应商留单期待查问
供应商留单期待查问是带有分组、排序、汇集、EXISTS子查问、NOT EXISTS子查问操作并存的四表连贯操作,标识无奈及时发货的供应商。
select s_name, count(*) as numwaitfrom supplier, lineitem l1, orders, nationwhere s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( //EXISTS子查问 select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( //NOT EXISTS子查问 select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = '[NATION]' //TPC-H规范定义的任意值group by s_nameorder by numwait desc, s_name;
Q22:寰球销售机会查问
寰球销售机会查问是带有分组、排序、汇集、EXISTS子查问、NOT EXISTS子查问操作并存的四表连贯操作,确定有可能进行购买的客户所在的地理位置,计算特定国家代码范畴内有多少客户7年未下订单,但他们的“正”账户余额高于平均水平。
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbalfrom ( //第一层子查问 select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where // I1…I7是在TPC-H中定义国家代码的可能值中不反复的任意值 substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]') and c_acctbal > (//第二层汇集子查问 select avg(c_acctbal) from customer where c_acctbal > 0.00 and substr (c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]') ) and not exists (//第二层NOT EXISTS子查问 select * from orders where o_custkey = c_custkey ) ) as custsalegroup by cntrycodeorder by cntrycode;
参考资料
- http://tpc.org/tpch/default5.asp
- https://developer.aliyun.com/...
关注公众号大数据研学社:BigDataRLC,第一工夫看最新文章。