作者:杨涛涛
资深数据库专家,专研 MySQL 十余年。善于 MySQL、PostgreSQL、MongoDB 等开源数据库相干的备份复原、SQL 调优、监控运维、高可用架构设计等。目前任职于爱可生,为各大运营商及银行金融企业提供 MySQL 相干技术支持、MySQL 相干课程培训等工作。
本文起源:原创投稿
* 爱可生开源社区出品,原创内容未经受权不得随便应用,转载请分割小编并注明起源。
MySQL 数据库能够用来解决绝大多数在线业务场景,而且能解决得很好,无论从单节点性能,或者说从多机扩大后的总体吞吐量来讲,都很占优势。不过,万事无相对,MySQL 在某些场景下性能并不能达到预期。例如在各种繁冗的关系解决方面,MySQL 解决起来就有些吃力。此类场景下,NoSQL 就比关系型数据库要更加适合。本篇咱们用图数据库 Neo4J(NoSQL 的一种)来举例说明在这种场景下如何补救 MySQL 的有余。
这里先用简略的人物关系来举例说明。
对于简略的人物关系,比如说小宋、小李、小杨、小爱、小徐、小娜、小乔几个人之间的关系链条可能如下:
- 小宋“意识”小李。
- 小李”意识“小杨。
- 小杨“意识“小爱。
- 小爱”意识”小徐。
- 小徐“意识“小娜。
- 小娜”意识”小乔。
其中“意识”即为几个人之间的关系。这样的关系有很多种,比方“意识”、“见过”、”好友“、”共事“,”暗恋“、”恋人“等等。本篇咱们先来看根本的关系:”意识“。也就是说关系仅仅“意识”而已,别无其他。
假如有如下三个需要:
-
找出用户总数目。
-
找出“意识”小杨 并且 小杨 又“意识”的人。
-
找出小杨“意识”的“意识”的“意识”的“意识”的人。
对于这样几个需要,咱们先基于 MySQL 来设计两张表:(如果仅仅实现最初两个需要,只须要表 2 即可。)
表 1:用户表,存储用户记录;表 2:用户关系表,存储用户之间的关系。为了简化起见,两张表无主键,由 MySQL 外部调配。
mysql:ytt>create table user1 (name varchar(64));
Query OK, 0 rows affected (0.09 sec)
mysql:ytt>insert user1 values ("小宋"),("小李"),("小杨"),("小爱"),("小徐"),("小娜"),("小乔");
Query OK, 7 rows affected (0.01 sec)
Records: 7 Duplicates: 0 Warnings: 0
mysql:ytt>create table relation1 (user_name varchar(64),friend_name varchar(64));
Query OK, 0 rows affected (0.07 sec)
mysql:ytt>insert relation1 values ("小宋","小李"),("小李","小杨"),("小杨","小爱"),("小爱","小徐"),("小徐","小娜"),("小娜","小乔");
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
那咱们来实现以上三个需要:
- 找出用户总数目:非常简单,间接求 count(*) 即可。
mysql:ytt>select count(*) as total_number from user1;
+--------------+
| total_number |
+--------------+
| 7 |
+--------------+
1 row in set (0.00 sec)
- 找出“意识”小杨 并且 小杨 又“意识”的人:因为记录条数少,间接全表过滤。
mysql:ytt>select * from (select (case when friend_name = "小杨" then user_name when user_name = "小杨" then friend_name end) as user_name from relation1) as sub_table where user_name is not null;
+-----------+
| user_name |
+-----------+
| 小李 |
| 小爱 |
+-----------+
2 rows in set (0.00 sec)
- 找出小杨“意识”的“意识”的“意识”的“意识”的人:也就是找到以小杨为终点的四层关系网的最终用户名。
mysql:ytt>select d.friend_name from relation1 b
-> inner join relation1 a on b.user_name = "小杨" and b.friend_name = a.user_name
-> inner join relation1 c on a.friend_name = c.user_name
-> inner join relation1 d on c.friend_name = d.user_name;
+-------------+
| friend_name |
+-------------+
| 小乔 |
+-------------+
1 row in set (0.00 sec)
以上三个需要,特地是第三个。找出以小杨为终点的关系网,层数越多,须要 join 的表个数越多。在 MySQL 里,表关联数量越多,性能也就越差,前期我将会在“SQL 优化”主题里持续延长探讨这个问题。
咱们用图数据库 Neo4J 来解决同样的需要。
创立用户节点以及用户之间的关系,
neo4j@ytt> create (x1:user {name:"小宋"}),
(x2:user {name:"小李"}),
(x3:user {name:"小杨"}),
(x4:user {name:"小爱"}),
(x5:user {name:"小徐"}),
(x6:user {name:"小娜"}),
(x7:user {name:"小乔"})
with x1,x2,x3,x4,x5,x6,x7
create (x1)-[: 意识]->(x2),
(x2)-[: 意识]->(x3),
(x3)-[: 意识]->(x4),
(x4)-[: 意识]->(x5),
(x5)-[: 意识]->(x6),
(x6)-[: 意识]->(x7);
0 rows
ready to start consuming query after 269 ms, results consumed after another 0 ms
Added 7 nodes, Created 6 relationships, Set 7 properties, Added 7 labels
对应的关系图展现如下:
以上需要在 Neo4J 里的实现如下:
- 需要一:
neo4j@ytt> match(x:user) return count(*) as total_number;
+--------------+
| total_number |
+--------------+
| 7 |
+--------------+
1 row
ready to start consuming query after 21 ms, results consumed after another 1 ms
- 需要二:
neo4j@ytt> match (y1:user)-[: 意识]->(x:user {name:"小杨"})-[: 意识]->(y2:user) return y1.name,y2.name;
+-------------------+
| y1.name | y2.name |
+-------------------+
| "小李" | "小爱" |
+-------------------+
1 row
ready to start consuming query after 95 ms, results consumed after another 2 ms
- 需要三:
neo4j@ytt> match(x:user {name:"小杨"})-[: 意识 *4]->(y) return y.name;
+--------+
| y.name |
+--------+
| "小乔" |
+--------+
1 row
ready to start consuming query after 398 ms, results consumed after another 174 ms
单从三个需要的查问后果上看,MySQL 和 Neo4J 性能差不多,区别只是写法上有些差别。然而如果把数据量放大,特地是对需要三的解决,MySQL 就有点吃力了。
咱们来把数据放大数倍,用户表记录数减少到千条,关系表记录数减少到十万条。
别离给用户表和关系表造点数据:(user1.csv 里蕴含 1100 个 用户,relation1.csv 里蕴含 1W 条记录,每个用户大略“意识”100 集体,并且给关系表加上索引。)
mysql:ytt>truncate table user1;
Query OK, 0 rows affected (0.19 sec)
mysql:ytt>load data infile '/var/lib/mysql-files/user1.csv' into table user1 fields terminated by ',' enclosed by '"';
Query OK, 1100 rows affected (0.10 sec)
Records: 1100 Deleted: 0 Skipped: 0 Warnings: 0
mysql:ytt>truncate table relation1;
Query OK, 0 rows affected (0.11 sec)
mysql:ytt>load data infile '/var/lib/mysql-files/relation1.csv' into table relation1 fields terminated by ',' enclosed by '"';
Query OK, 100000 rows affected (1.60 sec)
Records: 100000 Deleted: 0 Skipped: 0 Warnings: 0
mysql:ytt>alter table relation1 add key idx_user_name (user_name), add key idx_friend_name(friend_name);
Query OK, 0 rows affected (4.09 sec)
Records: 0 Duplicates: 0 Warnings: 0
表局部数据如下:
mysql:ytt>table user1 limit 2;
+-------+
| name |
+-------+
| user1 |
| user2 |
+-------+
2 rows in set (0.00 sec)
mysql:ytt>table relation1 limit 2;
+-----------+-------------+
| user_name | friend_name |
+-----------+-------------+
| user1 | user101 |
| user2 | user101 |
+-----------+-------------+
2 rows in set (0.00 sec)
接下来实现第三个需要:
这里把用户“小杨”替换为“user1”,因为后果很多,所以只求个总数即可,破费工夫 4 分多钟,对于用户端来讲,曾经无奈承受。
mysql:ytt>select count(distinct d.friend_name) as cnt from relation1 b
-> inner join relation1 a on b.user_name = "user1" and b.friend_name = a.user_name
-> inner join relation1 c on a.friend_name = c.user_name
-> inner join relation1 d on c.friend_name = d.user_name;
+-----+
| cnt |
+-----+
| 100 |
+-----+
1 row in set (4 min 15.47 sec)
接下来把 MySQL 数据导入到 Neo4J,来持续实现同样的需要:
导入节点:
neo4j@ytt> load csv from "file:///user1.csv" as x create (a:user {name:x[0]});
0 rows
ready to start consuming query after 523 ms, results consumed after another 0 ms
Added 1100 nodes, Set 1100 properties, Added 1100 labels
导入关系:
neo4j@ytt> load csv from "file:///relation1.csv" as x with x match (a:user {name:x[0]}), (b:user {name:x[1]}) merge (a)-[: 意识]-(b);
Created 100000 relationships, completed after 94636 ms.
给节点加上索引:
neo4j@ytt> create index on :user(name);
0 rows
ready to start consuming query after 31 ms, results consumed after another 0 ms
Added 1 indexes
Neo4J 实现第三个需要:执行工夫小于 1 秒,曾经比 MySQL 快了很多倍。
neo4j@ytt> match(x:user {name:"user1"})-[*4]->(y)
return count(distinct y.name) as total_friend;
+--------------+
| total_friend |
+--------------+
| 100 |
+--------------+
1 row
ready to start consuming query after 44 ms,
results consumed after another 692 ms
总结:
本篇基于图数据库在解决人物关系上优于关系型数据库的根底上做了简略介绍,更多关系解决,请持续浏览后续篇章。