ClickHouse Performance Test
To verify ClickHouse's performance, this article tests ClickHouse along several dimensions against a realistic business scenario.
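Generating the test data
The most common scenario in real business involves two tables: an order master table and an order detail table. The two are usually queried together with a JOIN or a GROUP BY, so the tests below run ClickHouse against exactly this case.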
Defining the table structures
test_order: the master table
Table structure:
CREATE TABLE `test_order` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `field_name_1` varchar(60) NOT NULL,
  `field_name_2` varchar(60) NOT NULL,
  `field_name_3` varchar(60) NOT NULL,
  `field_name_4` varchar(60) NOT NULL,
  `field_name_5` varchar(60) NOT NULL,
  `field_name_6` varchar(60) NOT NULL,
  `field_name_7` varchar(60) NOT NULL,
  `field_name_8` varchar(60) NOT NULL,
  `field_name_9` varchar(60) NOT NULL,
  `field_name_10` varchar(60) NOT NULL,
  `field_id_1` int(11) NOT NULL,
  `field_id_2` int(11) NOT NULL,
  `field_id_3` int(11) NOT NULL,
  `field_id_4` int(11) NOT NULL,
  `field_id_5` int(11) NOT NULL,
  `field_id_6` int(11) NOT NULL,
  `field_id_7` int(11) NOT NULL,
  `field_id_8` int(11) NOT NULL,
  `field_id_9` int(11) NOT NULL,
  `field_id_10` int(11) NOT NULL,
  `field_date_1` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_2` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_3` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_4` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_5` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_6` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_7` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_8` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_9` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_field_1` (`field_name_1`,`field_id_1`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1043 DEFAULT CHARSET=utf8mb4;
test_order_detail: the detail table. To make the SQL queries less trivial, it is defined with 41 columns.
Table structure:
CREATE TABLE `test_order_detail` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `order_id` bigint(11) NOT NULL,
  `field_name_1` varchar(60) NOT NULL,
  `field_name_2` varchar(60) NOT NULL,
  `field_name_3` varchar(60) NOT NULL,
  `field_name_4` varchar(60) NOT NULL,
  `field_name_5` varchar(60) NOT NULL,
  `field_name_6` varchar(60) NOT NULL,
  `field_name_7` varchar(60) NOT NULL,
  `field_name_8` varchar(60) NOT NULL,
  `field_name_9` varchar(60) NOT NULL,
  `field_name_10` varchar(60) NOT NULL,
  `field_name_11` varchar(60) NOT NULL,
  `field_name_12` varchar(60) NOT NULL,
  `field_name_13` varchar(60) NOT NULL,
  `field_name_14` varchar(60) NOT NULL,
  `field_name_15` varchar(60) NOT NULL,
  `field_name_16` varchar(60) NOT NULL,
  `field_name_17` varchar(60) NOT NULL,
  `field_name_18` varchar(60) NOT NULL,
  `field_name_19` varchar(60) NOT NULL,
  `field_name_20` varchar(60) NOT NULL,
  `field_id_1` int(11) NOT NULL,
  `field_id_2` int(11) NOT NULL,
  `field_id_3` int(11) NOT NULL,
  `field_id_4` int(11) NOT NULL,
  `field_id_5` int(11) NOT NULL,
  `field_id_6` int(11) NOT NULL,
  `field_id_7` int(11) NOT NULL,
  `field_id_8` int(11) NOT NULL,
  `field_id_9` int(11) NOT NULL,
  `field_id_10` int(11) NOT NULL,
  `field_date_1` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_2` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_3` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_4` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_5` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_6` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_7` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_8` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `field_date_9` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `idx_order_id` (`order_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=18129081 DEFAULT CHARSET=utf8mb4;
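Writing the test data into MySQL
test_order is the master table; 1024 rows were inserted.
The test_order_detail table is the main event: 18 million rows were written in batches, with every column generated by a random function. The code is simple, so it is not shown here.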
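For readers who want a starting point, here is a minimal Python sketch of how such random rows might be produced in batches. It is not the author's script: the function names, the lowercase-letter strings, the 2020 base date and the `max_order_id` default are all assumptions made for illustration; only the column counts (20 name columns, 10 id columns, 9 date columns, plus `id` and `order_id`) come from the table definition.

```python
import random
import string
from datetime import datetime, timedelta

NAME_COLS = 20  # field_name_1 .. field_name_20
ID_COLS = 10    # field_id_1 .. field_id_10
DATE_COLS = 9   # field_date_1 .. field_date_9


def random_detail_row(row_id, max_order_id=1024, rng=random):
    """Build one test_order_detail row as a flat list of 41 values."""
    # Strings stay within the varchar(60) limit of the MySQL schema.
    names = ["".join(rng.choices(string.ascii_lowercase, k=rng.randint(5, 60)))
             for _ in range(NAME_COLS)]
    ids = [rng.randint(1, 2**31 - 1) for _ in range(ID_COLS)]
    base = datetime(2020, 1, 1)  # assumed base date, one year of spread
    dates = [(base + timedelta(seconds=rng.randint(0, 365 * 86400)))
             .strftime("%Y-%m-%d %H:%M:%S") for _ in range(DATE_COLS)]
    order_id = rng.randint(1, max_order_id)
    return [row_id, order_id, *names, *ids, *dates]


def random_batches(total_rows, batch_size):
    """Yield lists of rows so each batch can go into one multi-row INSERT."""
    batch = []
    for row_id in range(1, total_rows + 1):
        batch.append(random_detail_row(row_id))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```

Each yielded batch would then be handed to whatever MySQL client is in use; that part is deliberately left out because the source does not say which one was used.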
In the MySQL data directory, the .ibd file holds the data and indexes of the test_order_detail table. It has already reached 13 GB, which is a sizable data set.
-rw-r-----@ 1 jiao staff 14K 8 15 12:46 test_order_detail.frm
-rw-r-----@ 1 jiao staff 13G 8 16 20:30 test_order_detail.ibd
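Exporting the data from MySQL to .csv
This step relies on ClickHouse's ability to read CSV files directly and insert them into a table.
Here 100,000 rows are read from MySQL at a time and written to one CSV file, which produces more than 180 .csv files.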
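The chunked export can be sketched as follows. This is an assumed Python illustration, not the script used for the test: `write_csv_chunks` is a hypothetical helper, and `rows` stands for any iterable of rows, for example a MySQL cursor. The numbered file names (1.csv, 2.csv, ...) and the 100,000-row chunk size follow the description above.

```python
import csv
import os
from itertools import islice


def write_csv_chunks(rows, out_dir, chunk_size=100_000):
    """Write rows to out_dir/1.csv, 2.csv, ... with chunk_size rows per file."""
    os.makedirs(out_dir, exist_ok=True)
    iterator = iter(rows)
    paths = []
    file_no = 0
    while True:
        # Pull at most chunk_size rows so memory use stays bounded.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        file_no += 1
        path = os.path.join(out_dir, f"{file_no}.csv")
        with open(path, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows(chunk)
        paths.append(path)
    return paths
```

Streaming through `islice` avoids loading all 18 million rows at once, which matters more than the exact chunk size.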
➜ csv ll
total 29852872
-rw-r--r-- 1 jiao staff 71M 8 21 18:10 1.csv
-rw-r--r-- 1 jiao staff 74M 8 21 18:10 10.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:15 100.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:15 101.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:15 102.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:15 103.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:15 104.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 105.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 106.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 107.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 108.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 109.csv
-rw-r--r-- 1 jiao staff 75M 8 21 18:10 11.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 110.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 111.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 112.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 113.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 114.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 115.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 116.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 117.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:16 118.csv
-rw-r--r-- 1 jiao staff 78M 8 21 18:17 119.csv
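Inserting the CSV files into ClickHouse with PHP
Install the third-party ClickHouse package for PHP: https://github.com/smi2/phpClickHouse
This package talks to ClickHouse over the HTTP protocol.
First, create the table in ClickHouse: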
CREATE TABLE test.test_order_detail
(
    `id` Int64,
    `order_id` Int64,
    `field_name_1` String,
    `field_name_2` String,
    `field_name_3` String,
    `field_name_4` String,
    `field_name_5` String,
    `field_name_6` String,
    `field_name_7` String,
    `field_name_8` String,
    `field_name_9` String,
    `field_name_10` String,
    `field_name_11` String,
    `field_name_12` String,
    `field_name_13` String,
    `field_name_14` String,
    `field_name_15` String,
    `field_name_16` String,
    `field_name_17` String,
    `field_name_18` String,
    `field_name_19` String,
    `field_name_20` String,
    `field_id_1` Int64,
    `field_id_2` Int64,
    `field_id_3` Int64,
    `field_id_4` Int64,
    `field_id_5` Int64,
    `field_id_6` Int64,
    `field_id_7` Int64,
    `field_id_8` Int64,
    `field_id_9` Int64,
    `field_id_10` Int64,
    `field_date_1` DateTime,
    `field_date_2` DateTime,
    `field_date_3` DateTime,
    `field_date_4` DateTime,
    `field_date_5` DateTime,
    `field_date_6` DateTime,
    `field_date_7` DateTime,
    `field_date_8` DateTime,
    `field_date_9` DateTime
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 8192
Then run the PHP script. The code is simple; an excerpt follows:
use ClickHouseDB\Client;

$begin = microtime(true);
$config = [
    'host' => '172.16.101.134',
    'port' => '8123',
    'username' => 'caps',
    'password' => '123456'
];
$db = new Client($config);
$db->database('test');
$db->setTimeout(60);        // request timeout: 60 seconds
$db->setConnectTimeOut(50); // connect timeout: 50 seconds
// $tables = $db->showTables();

// insert from csv
$connect = microtime(true);
for ($j = 1; $j <= 1; $j++) {
    // Collect the CSV file(s) for this batch; here one file per batch.
    $file_data_names = [];
    for ($i = 1; $i <= 1; $i++) {
        $file_data_names[] = __DIR__ . DIRECTORY_SEPARATOR . 'csv' . DIRECTORY_SEPARATOR . ($j) . '.csv';
    }
    $db->insertBatchFiles('test_order_detail_tmp', $file_data_names);
    usleep(1000);
}
echo microtime(true) - $begin . PHP_EOL;   // total time, including client setup
echo microtime(true) - $connect . PHP_EOL; // time spent on the inserts only
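Insert performance test
The table has no partitioning defined. Every row is randomly generated, with 41 columns in total and roughly 0.8 KB of data per row.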
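Rows per batch insert | Elapsed time | Data size |
---|---|---|
1,000 | 0.05s | 0.7 MB |
10,000 | 0.25s | 7.1 MB |
50,000 | 1.0s | 36 MB |
100,000 | 2.0s | 73 MB |
200,000 | 3.6s | 146 MB |
Results may differ widely from one machine to another. Judging from the results on this machine, batches of 1,000 to 50,000 rows are a good fit, since each insert then completes within about 1 second.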
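To make the batch-size trade-off easier to see, the measurements in the table can be converted into throughput. The following Python sketch only does arithmetic on the figures reported above; the function names are illustrative.

```python
# (rows per batch, elapsed seconds, data size in MB), copied from the table
MEASUREMENTS = [
    (1_000, 0.05, 0.7),
    (10_000, 0.25, 7.1),
    (50_000, 1.0, 36),
    (100_000, 2.0, 73),
    (200_000, 3.6, 146),
]


def throughput(rows, seconds, megabytes):
    """Return (rows per second, MB per second) for one measurement."""
    return rows / seconds, megabytes / seconds


def summarize(measurements=MEASUREMENTS):
    """Return (batch size, rows/s, MB/s) for every measurement."""
    return [(rows, *throughput(rows, secs, mb)) for rows, secs, mb in measurements]


if __name__ == "__main__":
    for rows, rps, mbps in summarize():
        print(f"{rows:>7} rows/batch: {rps:>9.0f} rows/s, {mbps:5.1f} MB/s")
```

On these numbers, throughput climbs from about 20,000 rows/s at 1,000 rows per batch to about 50,000 rows/s at 50,000 rows per batch and then largely levels off, so larger batches mainly add latency per insert.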
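Errors that may occur when inserting data
1. If a partition key is set and a single insert would create too many partitions, the insert fails. The default limit is 100 partitions per insert block (the `max_partitions_per_insert_block` setting).
2. Inserting too much data at once can exhaust memory.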
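One way to guard against the first error is to count how many partitions a batch would touch before sending it. The Python sketch below assumes a hypothetical table partitioned by month on one of the date columns; the table tested in this article has no partition key, so this is purely illustrative, and the helper names are invented.

```python
from datetime import datetime

DEFAULT_PARTITION_LIMIT = 100  # the default limit mentioned above


def partitions_in_batch(rows, date_index, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the set of YYYYMM partition ids that a batch of rows would touch."""
    return {datetime.strptime(row[date_index], fmt).strftime("%Y%m") for row in rows}


def batch_fits(rows, date_index, limit=DEFAULT_PARTITION_LIMIT):
    """True if the batch stays within the per-insert partition limit."""
    return len(partitions_in_batch(rows, date_index)) <= limit
```

A batch that fails the check can be split by partition id and sent as several smaller inserts, which also helps with the memory issue in the second point.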
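Data compression ratio
Data volume: 18 million rows
Storage used in MySQL: 13 GB
Storage used in ClickHouse: 4.1 GB
Since every field is randomly generated, a compression ratio of more than 3x is already high, and the LZ4 compression algorithm is also very fast to decompress.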
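Query performance test
The test_order_detail table holds 18 million rows and the test_order table about 1,000 rows.
The tests below cover SQL statements that are commonly used in the business.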
Test1
select count(*) from test.test_order_detail
Counting the total number of rows is a very common SQL statement. ClickHouse stores the row count in count.txt files, so the result comes back very quickly.
MySQL time | ClickHouse time |
---|---|
20s | 0.003s |
ClickHouse query result:
1 rows in set. Elapsed: 0.003 sec.
Test2
select a.order_id,sum(a.field_id_1),sum(a.field_id_2) from test.test_order_detail as a join test.test_order as b on a.order_id = b.id group by a.order_id;
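Aggregating over joined tables. MySQL can no longer cope with data at this scale.
MySQL time | ClickHouse time |
---|---|
-- | 0.450s |
ClickHouse query result. Because no index was used, the whole table was scanned, processing 18 million rows in total. ClickHouse handled about 40 million rows per second, which is very efficient.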
1042 rows in set. Elapsed: 0.450 sec. Processed 18.13 million rows, 435.11 MB (40.28 million rows/s., 966.66 MB/s.)
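The rows-per-second figure can be re-derived from the summary line itself. The Python sketch below parses a summary line in the format quoted in this article; the regular expression is an assumption based on those samples only and handles just the thousand, million and billion units.

```python
import re

_UNITS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
_SUMMARY = re.compile(
    r"Elapsed: (?P<sec>[\d.]+) sec\. "
    r"Processed (?P<rows>[\d.]+) (?P<unit>thousand|million|billion) rows"
)


def rows_per_second(summary_line):
    """Parse a client summary line and return processed rows per second."""
    match = _SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("unrecognized summary line")
    rows = float(match["rows"]) * _UNITS[match["unit"]]
    return rows / float(match["sec"])
```

Applied to the Test2 line, 18.13 million rows in 0.450 s works out to roughly 40 million rows per second, in line with the rate the client prints.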
Test3
select a.order_id,sum(a.field_id_1),sum(a.field_id_2) from test.test_order_detail as a join test.test_order as b on a.order_id = b.id group by a.order_id limit 1,20;
Now add a LIMIT. After a long wait, MySQL still had not returned a result.
MySQL time | ClickHouse time |
---|---|
-- | 0.574s |
ClickHouse query result:
20 rows in set. Elapsed: 0.574 sec. Processed 18.13 million rows, 435.11 MB (31.60 million rows/s., 758.37 MB/s.)
Test4
select count(*) from test.test_order_detail
Aggregating over a single table. After a long wait, MySQL still had not returned a result.
MySQL time | ClickHouse time |
---|---|
-- | 0.212s |
ClickHouse query result:
20 rows in set. Elapsed: 0.212 sec. Processed 18.13 million rows, 435.10 MB (85.63 million rows/s., 2.06 GB/s.)
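Summary
When the data volume is small and the SQL is simple, MySQL is still very convenient, but in big-data scenarios it falls short. The simple tests above show that ClickHouse is well suited to querying data at that scale: columnar storage and data compression let it process data efficiently, and the SummingMergeTree and AggregatingMergeTree engines can pre-aggregate data even more efficiently. More will be shared when time permits.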