作者:杨杰

简介

PostgreSQL FDW是一种内部拜访接口,它能够被用来拜访存储在内部的数据,这些数据能够是内部的PG数据库,也能够mysql、ClickHouse等数据库。

ClickHouse是一款疾速的开源OLAP数据库管理系统,它是面向列的,容许应用SQL查问实时生成剖析报告。

clickhouse_fdw是一个开源的内部数据包装器(FDW)用于拜访ClickHouse列存数据库。

目前有以下两款clickhouse_fdw:
https://github.com/adjust/cli...

始终继续一直的有提交,目前反对PostgreSQL 11-13
https://github.com/Percona-La...

之前有一年工夫没有动静,最近一段时间刚从adjust/clickhouse_fdw merge了一下,目前也反对PostgreSQL 11-13。

本文就以adjust/clickhouse_fdw为例。

装置

# libcurl >= 7.43.0yum install libcurl-devel libuuid-develgit clone https://github.com/adjust/clickhouse_fdw.gitcd clickhouse_fdwmkdir build && cd buildcmake ..make && make install

应用

CH端:
生成测试表及数据,这里咱们应用CH官网提供的Star Schema Benchmark

https://clickhouse.tech/docs/...

模仿数据量:5张数据表,数据次要集中在lineorder*表,单表9000w rows左右、22G存储。

[root@vm101 ansible]# clickhouse clientClickHouse client version 20.8.9.6.Connecting to localhost:9000 as user default.Connected to ClickHouse server version 20.8.9 revision 54438.vm101 :) show tables;SHOW TABLES┌─name───────────┐│ customer ││ lineorder ││ lineorder_flat ││ part ││ supplier │└────────────────┘5 rows in set. Elapsed: 0.004 sec. vm101 :) select count(*) from lineorder_flat;SELECT count(*)FROM lineorder_flat┌──count()─┐│ 89987373 │└──────────┘1 rows in set. Elapsed: 0.005 sec. [root@vm101 ansible]# du -sh /clickhouse/data/default/lineorder_flat/22G /clickhouse/data/default/lineorder_flat/

PG端:
创立FDW插件

postgres=# create extension clickhouse_fdw ;CREATE EXTENSIONpostgres=# \dewList of foreign-data wrappersName | Owner | Handler | Validator ----------------+----------+--------------------------+----------------------------clickhouse_fdw | postgres | clickhousedb_fdw_handler | clickhousedb_fdw_validator(1 row)

创立CH内部服务器

postgres=# CREATE SERVER clickhouse_svr FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(host '10.0.0.101', port '9000', dbname 'default', driver 'binary');CREATE SERVERpostgres=# \desList of foreign serversName | Owner | Foreign-data wrapper ----------------+----------+----------------------clickhouse_svr | postgres | clickhouse_fdw(1 row)

创立用户映射

postgres=# CREATE USER MAPPING FOR CURRENT_USER SERVER clickhouse_svr OPTIONS (user 'default', password '');CREATE USER MAPPINGpostgres=# \deuList of user mappingsServer | User name ----------------+-----------clickhouse_svr | postgres(1 row)

创立内部表

postgres=# IMPORT FOREIGN SCHEMA "default" FROM SERVER clickhouse_svr INTO public;IMPORT FOREIGN SCHEMApostgres=# \detList of foreign tablesSchema | Table | Server --------+----------------+----------------public | customer | clickhouse_svrpublic | lineorder | clickhouse_svrpublic | lineorder_flat | clickhouse_svrpublic | part | clickhouse_svrpublic | supplier | clickhouse_svr(5 rows)

查问

postgres=# select count(*) from lineorder_flat ;count ----------89987373(1 row)postgres=# select "LO_ORDERKEY","C_NAME" from lineorder_flat limit 5;LO_ORDERKEY | C_NAME -------------+--------------------3271 | Customer#0000991733271 | Customer#0000991733271 | Customer#0000991733271 | Customer#0000991735607 | Customer#000273061(5 rows)

须要留神的是CH是辨别大小写的以及一些函数兼容问题,下面的示例也有展现。

测试SQL间接应用CH SSB提供的13条SQL,SQL根本相似,选一条做下测试,运行工夫根本是统一的。

CH:

vm101 :) SELECT:-] toYear(LO_ORDERDATE) AS year,:-] C_NATION,:-] sum(LO_REVENUE - LO_SUPPLYCOST) AS profit:-] FROM lineorder_flat:-] WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2'):-] GROUP BY:-] year,:-] C_NATION:-] ORDER BY:-] year ASC,:-] C_NATION ASC;SELECT toYear(LO_ORDERDATE) AS year,C_NATION,sum(LO_REVENUE - LO_SUPPLYCOST) AS profitFROM lineorder_flatWHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))GROUP BY year,C_NATIONORDER BY year ASC,C_NATION ASC┌─year─┬─C_NATION──────┬───────profit─┐│ 1992 │ ARGENTINA │ 157402521853 │...│ 1998 │ UNITED STATES │ 89854580268 │└──────┴───────────────┴──────────────┘35 rows in set. Elapsed: 0.195 sec. Processed 89.99 million rows, 1.26 GB (460.70 million rows/s., 6.46 GB/s.)PG:postgres=# SELECTdate_part('year', "LO_ORDERDATE") AS year,"C_NATION",sum("LO_REVENUE" - "LO_SUPPLYCOST") AS profitFROM lineorder_flatWHERE "C_REGION" = 'AMERICA' AND "S_REGION" = 'AMERICA' AND ("P_MFGR" = 'MFGR#1' OR "P_MFGR" = 'MFGR#2')GROUP BYyear,"C_NATION"ORDER BYyear ASC,"C_NATION" ASC;year | C_NATION | profit ------+---------------+--------------1992 | ARGENTINA | 157402521853...1998 | UNITED STATES | 89854580268(35 rows)Time: 195.102 ms

相干

https://github.com/adjust/cli...
https://github.com/Percona-La...
https://github.com/ClickHouse...
https://clickhouse.tech/docs/...

理解更多PostgreSQL技术干货、热点文集、行业动态、新闻资讯、精彩流动,请拜访中国PostgreSQL社区网站:www.postgresqlchina.com