1. 简介

在数据库SQL处理中,常常有行转列(Pivot)和列转行(Unpivot)的数据处理需求。本文以示例说明在Data Lake Analytics(https://www.aliyun.com/product/datalakeanalytics)中,如何使用SQL的一些技巧,达到行转列(Pivot)和列转行(Unpivot)的目的。另外,DLA支持函数式表达式的处理逻辑、丰富的JSON数据处理函数和UNNEST的SQL语法,结合这些功能,能够实现非常丰富、强大的SQL数据处理语义和能力,本文也以JSON数据列展开为示例,说明在DLA中使用这种SQL的技巧。

2. 行转列(Pivot)

2.1 样例数据

test_pivot表内容:

+------+----------+---------+--------+| id   | username | subject | source |+------+----------+---------+--------+| 1    | 张三     | 语文    | 60     || 2    | 李四     | 数学    | 70     || 3    | 王五     | 英语    | 80     || 4    | 王五     | 数学    | 75     || 5    | 王五     | 语文    | 57     || 6    | 李四     | 语文    | 80     || 7    | 张三     | 英语    | 100    |+------+----------+---------+--------+

2.2 方法一:通过CASE WHEN语句

SQL语句:

SELECT    username,   max(CASE WHEN subject = '语文' THEN source END) AS `语文`,   max(CASE WHEN subject = '数学' THEN source END) AS `数学`,   max(CASE WHEN subject = '英语' THEN source END) AS `英语`FROM test_pivotGROUP BY usernameORDER BY username;

结果:

+----------+--------+--------+--------+| username | 语文   | 数学   | 英语   |+----------+--------+--------+--------+| 张三     | 60     | NULL   | 100    || 李四     | 80     | 70     | NULL   || 王五     | 57     | 75     | 80     |+----------+--------+--------+--------+

2.3 方法二:通过map_agg函数

该方法思路上分为两个步骤:
第一步,通过map_agg函数把两个列的多行的值,映射为map;
第二步,通过map的输出,达到多列输出的目的。

第一步SQL:

SELECT username, map_agg(subject, source) kvFROM test_pivotGROUP BY usernameORDER BY username;

第一步输出:

+----------+-----------------------------------+| username | kv                                |+----------+-----------------------------------+| 张三     | {语文=60, 英语=100}               || 李四     | {数学=70, 语文=80}                || 王五     | {数学=75, 语文=57, 英语=80}       |+----------+-----------------------------------+

可以看到map_agg的输出效果。

最终,该方法的SQL:

SELECT  username,  if(element_at(kv, '语文') = null, null, kv['语文']) AS `语文`,  if(element_at(kv, '数学') = null, null, kv['数学']) AS `数学`,  if(element_at(kv, '英语') = null, null, kv['英语']) AS `英语`FROM (  SELECT username, map_agg(subject, source) kv  FROM test_pivot  GROUP BY username) tORDER BY username;

结果:

+----------+--------+--------+--------+| username | 语文   | 数学   | 英语   |+----------+--------+--------+--------+| 张三     | 60     | NULL   | 100    || 李四     | 80     | 70     | NULL   || 王五     | 57     | 75     | 80     |+----------+--------+--------+--------+

3. 列转行(Unpivot)

3.1 样例数据

test_unpivot表内容:

+----------+--------+--------+--------+| username | 语文   | 数学   | 英语   |+----------+--------+--------+--------+| 张三     | 60     | NULL   | 100    || 李四     | 80     | 70     | NULL   || 王五     | 57     | 75     | 80     |+----------+--------+--------+--------+

3.2 方法一:通过UNION语句

SQL语句:

SELECT username, subject, sourceFROM (  SELECT username, '语文' AS subject, `语文` AS source FROM test_unpivot WHERE `语文` is not null  UNION  SELECT username, '数学' AS subject, `数学` AS source FROM test_unpivot WHERE `数学` is not null  UNION  SELECT username, '英语' AS subject, `英语` AS source FROM test_unpivot WHERE `英语` is not null)ORDER BY username;

结果:

+----------+---------+--------+| username | subject | source |+----------+---------+--------+| 张三     | 语文    | 60     || 张三     | 英语    | 100    || 李四     | 语文    | 80     || 李四     | 数学    | 70     || 王五     | 英语    | 80     || 王五     | 语文    | 57     || 王五     | 数学    | 75     |+----------+---------+--------+

3.3 方法二:通过CROSS JOIN UNNEST语句

SQL语句:

SELECT t1.username, t2.subject, t2.sourceFROM test_unpivot t1CROSS JOIN UNNEST (  array['语文', '数学', '英语'],  array[`语文`, `数学`, `英语`]) t2 (subject, source)WHERE t2.source is not null

结果:

+----------+---------+--------+| username | subject | source |+----------+---------+--------+| 张三     | 语文    | 60     || 张三     | 英语    | 100    || 李四     | 语文    | 80     || 李四     | 数学    | 70     || 王五     | 语文    | 57     || 王五     | 数学    | 75     || 王五     | 英语    | 80     |+----------+---------+--------+

4. JSON数据列展开

JSON数据的表达能力非常灵活,因此在数据库和SQL中,常常需要处理JSON数据,常常碰到稍复杂的需求,就是将JSON数据中的某些属性字段,进行展开转换,转成行、列的关系型表达。

4.1 基本思路和步骤

  • 使用JSON函数,对JSON字符串进行解析和数据提取;
  • 提取、转换为ARRAY或者MAP的数据结构,如有需要,可以使用Lambda函数式表达式进行转换处理;
  • 利用UNNEST语法进行列展开。

下面以多个示例说明。

4.2 用UNNEST对MAP进行关系型展开

SQL示例:

SELECT t.m, t.nFROM (  SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data)CROSS JOIN unnest(map_data) AS t(m, n);

结果:

+------+------+| m    | n    |+------+------+| foo  |    1 || bar  |    2 |+------+------+

4.3 用UNNEST对JSON数据进行关系型展开

SQL示例:

SELECT json_extract(t.a, '$.a') AS a,        json_extract(t.a, '$.b') AS bFROM (    SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')            AS array<JSON>) AS package_array)CROSS JOIN UNNEST(package_array) AS t(a);

结果:

+------+------+| a    | b    |+------+------+| 1    | 2    || 3    | 4    |+------+------+

SQL示例:

SELECT t.m AS _col1, t.n AS _col2FROM (    SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')            AS array<JSON>) AS array_1,            cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x')            AS array<JSON>) AS array_2)CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

结果:

+---------------+-----------------+| _col1         | _col2           |+---------------+-----------------+| {"a":1,"b":2} | {"a":5,"b":6}   || {"a":3,"b":4} | {"a":7,"b":8}   || NULL          | {"a":9,"b":10}  || NULL          | {"a":11,"b":12} |+---------------+-----------------+

SQL示例:

SELECT json_extract(t.m, '$.a') AS _col1,        json_extract(t.m, '$.b') AS _col2,        json_extract(t.n, '$.a') AS _col3,        json_extract(t.n, '$.b') AS _col4 FROM (    SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')            AS array<JSON>) AS array_1,            cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x')            AS array<JSON>) AS array_2)CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

结果:

+-------+-------+-------+-------+| _col1 | _col2 | _col3 | _col4 |+-------+-------+-------+-------+| 1     | 2     | 5     | 6     || 3     | 4     | 7     | 8     || NULL  | NULL  | 9     | 10    || NULL  | NULL  | 11    | 12    |+-------+-------+-------+-------+

4.4 结合Lambda表达式,用UNNEST对JSON数据进行关系型展开

SQL示例:

SELECT count(*) AS cnt,        package_name FROM (     SELECT t.a AS package_name     FROM (         SELECT transform(packages_map_array, x -> Element_at(x, 'packageName'))                AS package_array         FROM (            SELECT cast(Json_extract(data_json, '$.packages')                    AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array            FROM (                SELECT json_parse(data) AS data_json                FROM (                     SELECT '{                              "packages": [                                {                                  "appName": "铁路12306",                                  "packageName": "com.MobileTicket",                                  "versionName": "4.1.9",                                  "versionCode": "194"                                },                                {                                  "appName": "QQ飞车",                                  "packageName": "com.tencent.tmgp.speedmobile",                                  "versionName": "1.11.0.13274",                                  "versionCode": "1110013274"                                },                                {                                  "appName": "掌阅",                                  "packageName": "com.chaozh.iReaderFree",                                  "versionName": "7.11.0",                                  "versionCode": "71101"                                }                             ]                           }'                    AS data                 )            )        )     ) AS x (package_array)    CROSS JOIN UNNEST(package_array) AS t (a))GROUP BY package_name ORDER BY cnt DESC;

结果:

+------+------------------------------+| cnt  | package_name                 |+------+------------------------------+|    1 | com.MobileTicket             ||    1 | com.tencent.tmgp.speedmobile ||    1 | com.chaozh.iReaderFree       |+------+------------------------------+



本文作者:julian.zhou

阅读原文

本文为云栖社区原创内容,未经允许不得转载。