关于图数据库:用-NetworkX-Gephi-Nebula-Graph-分析权力的游戏人物关系下篇

23次阅读

共计 8427 个字符,预计需要花费 22 分钟才能阅读完成。

在上一篇 [1] 中,咱们通过 NetworkX 和 Gephi 展现了 < 势力的游戏 > 中的人物关系。在本篇中,咱们将展现如何通过 NetworkX 拜访图数据库 Nebula Graph。

NetworkX

NetworkX [2] 是一个用 Python 语言开发的图论与简单网络建模工具,内置了大量罕用的图与简单网络分析算法,能够不便地进行简单网络数据分析、仿真建模等工作,功能丰富,简略易用。

在 NetworkX 中,图是由顶点、边和可选的属性形成的数据结构。顶点示意数据,边是由两个顶点惟一确定的,示意两个顶点之间的关系。顶点和边也能够领有更多的属性,以存储更多的信息。

NetworkX 反对 4 种类型的图:

  • Graph:无向图
  • DiGraph: 有向图
  • MultiGraph: 多重无向图
  • MultiDiGraph: 多重有向图

在 NetworkX 中创立一个无向图:

import networkx as nx
G = nx.Graph()

增加顶点:

G.add_node(1)
G.add_nodes_from([2,3,4])
G.add_node(2,name='Tom',age=23)

增加边:

G.add_edge(2,3)
G.add_edges_from([(1,2),(1,3)])
g.add_edge(1, 2, start_year=1996, end_year=2019)

在上一篇文章(一)中,咱们曾经演示了 NetworkX 的 Girvan-Newman 社区发现算法。

图数据库 Nebula Graph

NetworkX 通常应用本地文件作为数据源,这在动态网络钻研的时候没什么问题,但如果图网络常常会发生变化——例如某些核心节点曾经不存在 (Fig.1) 或者引入了重要的网络拓扑变动 (Fig.2)——每次生成全新的动态文件再加载剖析就有些麻烦,最好整个变动过程能够长久化在一个数据库中,并且能够实时地间接从数据库中加载子图或者全图做剖析。本文选用 Nebula Graph [3] 作为存储图数据的图数据库。


Fig. 1


Fig. 2

Nebula Graph 提供了两种形式来获取图构造:

  1. 编写一个查问语句,拉取一个子图;
  2. 全量扫描底层存储,获取一个残缺的全图。

第一种形式适宜在一个大规模的图网络中通过精密的过滤和剪枝条件来获取合乎需要的若干个点和边。第二种形式更适宜于全图的剖析,这通常是在项目前期对全图进行一些启发式摸索,当有进一步认知后再用第一种形式做精密的剪枝剖析。

剖析完 Nebula Graph 两种获取图构造形式后,上面来查看 Nebula Graph 的 Python 客户端代码,nebula-python/nebula/ngStorage/StorageClient.py 与 nebula-python/nebula/ngMeta/MetaClient.py 就是和底层存储交互的 API, 外面有扫描点、扫描边、读取一堆属性等等一系列丰盛的接口。

上面两个接口能够用来读取所有的点、边数据:

def scan_vertex(self, space, return_cols, all_cols, limit, start_time, end_time)
def scan_edge(self, space, return_cols, all_cols, limit, start_time, end_time)

1) 初始化一个客户端,和一个 scan_edge_processor。scan_edge_processor 用来对读出来的边数据进行解码:

meta_client = MetaClient([('192.168.8.16', 45500)])
meta_client.connect()
storage_client = StorageClient(meta_client)
scan_edge_processor = ScanEdgeProcessor(meta_client)

2) 初始化 scan_edge 接口的各项参数:

space_name = 'nba' # 要读取的图空间名称
return_cols = {} # 要返回的边(或点)及其属性列
return_cols['serve'] = ['start_year', 'end_year']
return_cols['follow'] = ['degree']
allCols = False # 是否返回所有属性列,当该值为 False 时,仅返回在 returnCols 里指定的属性列,当为 True 时,返回所有属性列
limit = 100 # 最多返回的数据条数
start_time = 0 
end_time = sys.maxsize

3) 调用 scan_part_edge 接口,该接口会返回一个 scan_edge_response 对象的迭代器:

scan_edge_response_iterator = storage_client.scan_edge(space_name, return_cols, all_cols, limit, start_time, end_time)

4) 一直读取该迭代器所指向的 scan_edge_response 对象中的数据,直到读取完所有数据:

while scan_edge_response_iterator.has_next():
    scan_edge_response = scan_edge_response_iterator.next()
    if scan_edge_response is None:
        print("Error occurs while scaning edge")
        break
    process_edge(space, scan_edge_response)

其中,process_edge 是自定义的一个解决读出来边数据的函数,该函数能够先应用 scan_edge_processor 对 scan_edge_response 中的数据进行解码,解码后的数据能够间接打印进去,也能够做一些简略解决,另作他用,比方:将这些数据读入计算框架 NetworkX 里。

5) 解决数据。在这里咱们将读出来的所有边都增加到 NetworkX 中的图 G 里:

def process_edge(space, scan_edge_response):
    result = scan_edge_processor.process(space, scan_edge_response)
    # Get the corresponding rows by edge_name
    for edge_name, edge_rows in result.rows.items():
        for row in edge_rows:
            srcId = row.default_properties[0].get_value()
            dstId = row.default_properties[2].get_value()
            print('%d -> %d' % (srcId, dstId))
            props = {}
            for prop in row.properties:
                prop_name = prop.get_name()
                prop_value = prop.get_value()
                props[prop_name] = prop_value
            G.add_edges_from([(srcId, dstId, props)]) # 增加边到 NetworkX 中的图 G 

读取顶点数据的办法和下面的流程相似。

此外,对于分布式的一些图计算框架 [4] 来说,Nebula Graph 还提供了依据分片 (partition) 并发地批量读取存储的性能,这会在之后的文章中演示。

在 NetworkX 中进行图剖析

当咱们把所有点和边数据都依照上述流程读入 NetworkX 后,咱们还能够做一些根本的图剖析和图计算:

1) 绘制图:

nx.draw(G, with_labels=True, font_weight='bold')
import matplotlib.pyplot as plt
plt.show()
plt.savefig('./test.png')

绘制进去的图:

2) 打印出图中的所有点和边:

print('nodes:', list(G.nodes))
print('edges:', list(G.edges))
输入的后果:
nodes:  [109, 119, 129, 139, 149, 209, 219, 229, 108, 118, 128, 138, 148, 208, 218, 228, 107, 117, 127, 137, 147, 207, 217, 227, 106, 116, 126, 136, 146, 206, 216, 226, 101, 111, 121, 131, 141, 201, 211, 221, 100, 110, 120, 130, 140, 150, 200, 210, 220, 102, 112, 122, 132, 142, 202, 212, 222, 103, 113, 123, 133, 143, 203, 213, 223, 104, 114, 124, 134, 144, 204, 214, 224, 105, 115, 125, 135, 145, 205, 215, 225]
edges:  [(109, 100), (109, 125), (109, 204), (109, 219), (109, 222), (119, 200), (119, 205), (119, 113), (129, 116), (129, 121), (129, 128), (129, 216), (129, 221), (129, 229), (129, 137), (139, 138), (139, 212), (139, 218), (149, 130), (149, 219), (209, 123), (219, 130), (219, 112), (219, 104), (229, 147), (229, 116), (229, 141), (229, 144), (108, 100), (108, 101), (108, 204), (108, 206), (108, 214), (108, 215), (108, 222), (118, 120), (118, 131), (118, 205), (118, 113), (128, 116), (128, 121), (128, 201), (128, 202), (128, 205), (128, 223), (138, 115), (138, 204), (138, 210), (138, 212), (138, 221), (138, 225), (148, 127), (148, 136), (148, 137), (148, 214), (148, 223), (148, 227), (148, 213), (208, 127), (208, 103), (208, 104), (208, 124), (218, 127), (218, 110), (218, 103), (218, 104), (218, 114), (218, 105), (228, 146), (228, 145), (107, 100), (107, 204), (107, 217), (107, 224), (117, 200), (117, 136), (117, 142), (127, 114), (127, 212), (127, 213), (127, 214), (127, 222), (127, 226), (127, 227), (137, 136), (137, 213), (137, 150), (147, 136), (147, 214), (147, 223), (207, 121), (207, 140), (207, 122), (207, 134), (217, 126), (217, 141), (217, 124), (217, 144), (106, 204), (106, 212), (106, 113), (116, 141), (116, 126), (116, 210), (116, 216), (116, 121), (116, 113), (116, 105), (126, 216), (136, 210), (136, 213), (136, 214), (146, 202), (146, 210), (146, 215), (146, 222), (146, 226), (206, 123), (216, 144), (216, 105), (226, 140), (226, 112), (226, 114), (226, 144), (101, 100), (101, 102), (101, 125), (101, 204), (101, 215), (101, 113), (101, 104), (111, 200), (111, 204), (111, 215), (111, 220), (121, 202), (121, 215), (121, 113), (121, 134), (131, 205), (131, 220), (141, 124), (141, 205), (141, 225), (201, 145), (211, 124), (221, 104), (221, 124), (100, 125), (100, 204), (100, 102), (100, 113), (100, 104), (100, 144), (100, 105), (110, 204), (110, 220), (120, 150), (120, 202), (120, 205), (120, 113), (140, 114), (140, 214), (140, 224), (150, 143), (150, 213), (200, 142), (200, 104), (200, 145), (210, 124), (210, 144), (210, 115), (210, 145), (102, 203), (102, 204), (102, 103), (102, 135), (112, 204), (122, 213), (122, 223), (132, 225), (202, 133), (202, 114), (212, 103), (222, 104), (103, 204), (103, 114), (113, 104), (113, 105), (113, 125), (113, 204), (133, 114), (133, 144), (143, 213), (143, 223), (203, 135), (213, 124), (213, 145), (104, 105), (104, 204), (104, 215), (114, 115), (114, 204), (134, 224), (144, 145), (144, 214), (204, 105), (204, 125)]

3) 常见的,能够计算两个点之间的最短门路:

p1 = nx.shortest_path(G, source=114, target=211)
print('顶点 114 到顶点 211 的最短门路:', p1)
输入的后果:
顶点 114 到顶点 211 的最短门路:  [114, 127, 208, 124, 211]

4) 也计算图中每个点的 PageRank 值,来看各自的影响力:

print(nx.pagerank(G))

输入的后果:

{109: 0.011507076520104863, 119: 0.007835838669313514, 129: 0.015304593799331218, 139: 0.007772926737873626, 149: 0.0073896601012629825, 209: 0.0065558926178649985, 219: 0.014100908598251508, 229: 0.011454115940170253, 108: 0.01645334474680034, 118: 0.01010598371500564, 128: 0.01594717876199238, 138: 0.01671097227127263, 148: 0.015898676579503977, 208: 0.009437234075904938, 218: 0.0153795416919104, 228: 0.005900393773635255, 107: 0.009745182763645681, 117: 0.008716335675518244, 127: 0.021565565312365507, 137: 0.011642680498867146, 147: 0.009721031073465738, 207: 0.01040504770909835, 217: 0.012054472529765329, 227: 0.005615576255373405, 106: 0.007371191843767635, 116: 0.020955704443679106, 126: 0.007589432032220849, 136: 0.015987209357117116, 146: 0.013922108926721374, 206: 0.008554794629575304, 216: 0.011219193251536395, 226: 0.013613173390725904, 101: 0.016680863106330837, 111: 0.010121524312495604, 121: 0.017545503989576015, 131: 0.008531567756846938, 141: 0.014598319866130227, 201: 0.0058643663430632525, 211: 0.003936285336338021, 221: 0.009587911774927793, 100: 0.02243017302167168, 110: 0.007928429795381916, 120: 0.011875669801396205, 130: 0.0073896601012629825, 140: 0.01205992633948699, 150: 0.010045605782606326, 200: 0.015289870550944322, 210: 0.017716629501785937, 220: 0.008666577509181518, 102: 0.014865431161046641, 112: 0.007931095811770324, 122: 0.008087439927630492, 132: 0.004659566123187912, 142: 0.006487446038191551, 202: 0.013579313206377282, 212: 0.01190888044566142, 222: 0.011376739416933006, 103: 0.013438110749144392, 113: 0.02458154500563397, 123: 0.01104978432213578, 133: 0.00743370900670294, 143: 0.008011123394996112, 203: 0.006883198710237787, 213: 0.020392557117890422, 223: 0.012345866520333572, 104: 0.024902235588979776, 114: 0.019369722463816744, 124: 0.017165705442951484, 134: 0.008284361176173354, 144: 0.019363506469972095, 204: 0.03507634139024834, 214: 0.015500649025348538, 224: 0.008320315540621754, 105: 0.01439975542831122, 115: 0.007592722237637133, 125: 0.010808523955754608, 135: 0.006883198710237788, 145: 0.014654713389044883, 205: 0.014660118545887803, 215: 0.01337467974572934, 225: 0.009909720748343093}

此外,也能够和上一篇中一样,接入 Gephi [5]来失去更好的图可视化成果。

本文的代码能够参见[6].

Reference

[1] https://nebula-graph.com.cn/posts/game-of-thrones-relationship-networkx-gephi-nebula-graph/

[2] https://networkx.github.io/

[3] https://github.com/vesoft-inc/nebula

[4] https://spark.apache.org/graphx/

[5] https://gephi.org/

[6] https://github.com/vesoft-inc/nebula-python/pull/31

喜爱这篇文章?来来来,给咱们的 GitHub 点个 star 表激励啦~~ ????‍♂️????‍♀️ [手动跪谢]

交换图数据库技术?交个敌人,Nebula Graph 官网小助手微信:NebulaGraphbot 拉你进交换群~~

作者有话说:Hi,我是王杰,是图数据 Nebula Graph 研发工程师,心愿本次的教训分享能给大家带来帮忙,如有不当之处也心愿能帮忙纠正,谢谢~

正文完
 0