原文作者 Jack Vanlightly,翻译 Sijia@StreamNative,如需转载,请后盾留言。
英文链接:https://jack-vanlightly.com/b…
对于 Apache Pulsar
Apache Pulsar 是 Apache 软件基金会顶级我的项目,是下一代云原生分布式音讯流平台,集音讯、存储、轻量化函数式计算为一体,采纳计算与存储拆散架构设计,反对多租户、长久化存储、多机房跨区域数据复制,具备强一致性、高吞吐、低延时及高可扩展性等流数据存储个性。
GitHub 地址:http://github.com/apache/pulsar/
浏览本文须要大概 15 分钟。
音讯零碎
Apache Pulsar 的工作原理介绍了 Pulsar 的协定和存储模型,次要强调了(1)Pulsar 计算与存储拆散;(2)Pulsar 将 topic 分成 ledger 与分片,无需重均衡,即可向 bookie 主动写入新数据的个性。本文将会应用 Blockade 工具敞开节点、升高网速,并失落网络数据包进行测试。在音讯如何在 RabbitMQ 集群中失落和音讯如何在 Kafka 集群中失落文中,我应用了雷同的自动化测试工具,本文的测试不仅蕴含数据失落,还包含音讯排序和音讯反复。
我曾经将测试应用的代码上传到 GitHub,你能够在 ChaosTesingCode 仓库中的 Pulsar 文件夹里查看。
在每个测试场景中,咱们都创立了新的 blockade 集群,并且进行了雷同的配置:
- Apache Pulsar broker 数量
- Apache BookKeeper 节点(Bookie)数量
- Ensemble size (E)
- Write quorum size (Qw)
- Ack quorum size (Qa)
客户端通过 proxy 与 broker 进行通信。
在每个测试中,咱们都发送音讯并增加烦扰操作,比方敞开 topic 所属的 Pulsar broker,敞开以后 ledger ensemble 中的 1 个 bookie,从 ZooKeeper 中隔离节点等。
音讯发送后,启动 reader 读取并计算音讯条数,确保接管到了所有 ack 的音讯,并且确保音讯是依照正确程序被读取。
如果你想看测试方法,请浏览全文;如果你只想看测试后果,倡议间接看结语局部。
测试分析
在测试中,producer 以最快速度发送音讯,我设置的发送音讯速率为 2 万 – 3 万条 / 秒。有些测试须要较长的工夫,所以我减少了发送音讯的总数,但不升高 producer 的发送速率。
禁用音讯去重。
通过调用 pulsar-test.py
python 脚本进行测试。测试须要以下参数:
- 测试类型(no-fail、kill-broker、kill-bookie、kill-bookies[n]、isolate-broker-from-zk、isolate-bookie-from-zk、custom-isolation[partition|partition])。测试类型会在测试局部进行解释。
- 测试名称
- 测试运行次数
- 发送音讯数
- 开始烦扰操作时音讯的地位。例如,50000 指在接管到第 5 万次 ack 时,烦扰操作开始。启用烦扰操作通常须要一段时间,个别会在启用后的第几十万条音讯处开始。
- Ledger 配置(E、Qw、Qa),格局为 E-Qw-Qa,如 2-2-1
- 节点计数,格局为 brokers-bookies,如 3-3
- 是否启用数据去重(是 / 否)
blockade.yml
文件都存储在 cluster/blockade-files
目录中。每个文件中的 broker 和 bookie 数量各不相同。在调用 blockade 前,需将相应文件复制到 cluster 目录中。
调用 pulsar-test.py
执行以下操作:
- 如果 blockade 集群正在运行,应用
blockade
` destroy ` 命令终止集群。 - 复制相应的 blockade.
yml
- blockade up
- 公布音讯。音讯内容为递增的整数,用于查看音讯程序。
- 开始烦扰操作
- 音讯公布一完结,立即开始读取数据
- 剖析并记录后果
- 反复“测试运行次数“
在终端能够查看所有输入。也能够在 automated/test-output
目录下的 [test-name]-output.txt
文件中查看输入。能够在 [test-name]-duplicates.txt
和 [test-name]-out-of-order.txt
文件中别离查看反复音讯和乱序音讯。
运行后果的格局如下:
Final send count: 2000000
Final ack count: 2000000
Final positive ack count: 2000000
Final negative ack count: 0
Messages received: 2000000
Acked messages missing: 0
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
追踪音讯:
- 音讯收回后,将音讯对应的整数作为键增加到
messages_sent
词典,并增加空列表作为值。 - Ack 音讯接管胜利后,增加音讯对应的整数值到
messages_pos_acked
汇合。 - Ack 音讯接管失败后,增加音讯所对应的整数值到
messages_neg_acked
汇合。
读取音讯时,在 messages_sent
词典中查找,并增加音讯 ID 到相应列表。通过这种形式即可检测到失落的和反复的音讯。
- 失落已 ack 的音讯 = messages_sent 词典与 messages_pos_acked 汇合中重合的所有值为空列表的键
- 接管到未 ack 的音讯 = messages_sent 词典中值为空列表,但不在 messages_pos_acked 汇合中的所有
- 反复音讯 = messages_sent 词典中,值列表蕴含多个音讯 ID 的键
- 乱序音讯 = 整数值小于上一条音讯,词典中值为非空的音讯 ID 列表(即未被生产过)
因为 producer 一次只容许解决 1 万条音讯,当接管的音讯比发送的音讯滞后 1 万时,producer 会暂停并期待 ack 追赶。在测试中,咱们也确实见到过 1 万。
Pulsar 客户端在音讯发送失败时主动重试,直到发送超时。因而,当音讯胜利写入 BookKeeper,但 Pulsar broker 在向客户端发送 ack 前呈现故障时,会呈现音讯反复。测试中默认敞开音讯去重。
默认 E、Qw、Qa
测试中呈现了很奇怪的景象,在敞开 bookie 后,reader 和 consumer 不能进行生产。起初咱们发现这和 E、Qw、Qa 的默认值无关。游标存储在由上述默认值创立的 ledger 中,如果 bookie 数量不够,则无奈创立订阅。
在 blockade.yml
文件中为所有 bookie 的 environment 进行设置:
environment: {
"clusterName": "cluster-1",
"zookeeperServers": "zk1:2181",
"managedLedgerDefaultEnsembleSize": 4,
"managedLedgerDefaultWriteQuorum": 3,
"managedLedgerDefaultAckQuorum": 2
}
要保障集群失常工作,以上配置至多须要 4 个 bookie。因而,在应用大量 bookie 测试时,须要保障 bookie 数量与应用场景起码须要 bookie 的数量雷同(还要思考到 bookie 可能会出故障)。
在所有的测试场景中,E、Qw、Qa 默认值为 2、2、2,bookie 数量不少于 3。这样的设置保障了在敞开 bookie 时,不会失落游标数据,并且能够创立新的 reader 和 consumer。
测试场景 0 – 比照实验组
在第一个测试场景中,发送 200 万条音讯,无烦扰,查看音讯是否全副读取,并且读取程序正确,音讯不反复。对于这一场景,咱们测试了 5 次。
最初一次测试的终端输入如下。
$ python pulsar-test.py no-fail no-fail-test1 5 2000000 50000 2-2-1 3-3 false
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #1 on topic no-fail-test1_5 ------------
... (omitted)
-----------------------------------------------
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #2 on topic no-fail-test1_5 ------------
... (omitted)
-----------------------------------------------
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #3 on topic no-fail-test1_5 ------------
... (omitted)
-----------------------------------------------
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #4 on topic no-fail-test1_5 ------------
... (omitted)
-----------------------------------------------
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #5 on topic no-fail-test1_5 ------------
2018-10-19 16:30:44.795 INFO ConnectionPool:63 | Created connection for pulsar://192.168.208.9:6650
2018-10-19 16:30:44.799 INFO ClientConnection:285 | [192.168.208.1:34476 -> 192.168.208.9:6650] Connected to broker
2018-10-19 16:30:45.965 INFO BatchMessageContainer:41 | {BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://vanlightly/cluster-1/ns1/no-fail-test1_5] [producerName_ =] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer constructed
2018-10-19 16:30:45.965 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/no-fail-test1_5,] Getting connection from pool
2018-10-19 16:30:47.313 INFO ConnectionPool:63 | Created connection for pulsar://pulsar3:6650
2018-10-19 16:30:47.313 INFO ClientConnection:287 | [192.168.208.1:34482 -> 192.168.208.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar3:6650
2018-10-19 16:30:47.894 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/no-fail-test1_5,] Created producer on broker [192.168.208.1:34482 -> 192.168.208.9:6650]
Send count: 56645 Ack count: 50000 Pos: 50000 Neg: 0
No chaos action to perform
Send count: 102275 Ack count: 100000 Pos: 100000 Neg: 0
Send count: 151578 Ack count: 150000 Pos: 150000 Neg: 0
Send count: 207610 Ack count: 200000 Pos: 200000 Neg: 0
Send count: 255241 Ack count: 250000 Pos: 250000 Neg: 0
Send count: 309300 Ack count: 300000 Pos: 300000 Neg: 0
Send count: 354595 Ack count: 350000 Pos: 350000 Neg: 0
Send count: 407730 Ack count: 400000 Pos: 400000 Neg: 0
Send count: 459060 Ack count: 450000 Pos: 450000 Neg: 0
Send count: 503590 Ack count: 500000 Pos: 500000 Neg: 0
Send count: 557272 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 601064 Ack count: 600000 Pos: 600000 Neg: 0
Send count: 653045 Ack count: 650000 Pos: 650000 Neg: 0
Send count: 701051 Ack count: 700000 Pos: 700000 Neg: 0
Send count: 752087 Ack count: 750000 Pos: 750000 Neg: 0
Send count: 800455 Ack count: 800000 Pos: 800000 Neg: 0
Send count: 851194 Ack count: 850000 Pos: 850000 Neg: 0
Send count: 904089 Ack count: 900000 Pos: 900000 Neg: 0
Send count: 958704 Ack count: 950000 Pos: 950000 Neg: 0
Send count: 1007863 Ack count: 1000000 Pos: 1000000 Neg: 0
Send count: 1053645 Ack count: 1050000 Pos: 1050000 Neg: 0
Send count: 1102725 Ack count: 1100000 Pos: 1100000 Neg: 0
Send count: 1150921 Ack count: 1150000 Pos: 1150000 Neg: 0
Send count: 1201317 Ack count: 1200000 Pos: 1200000 Neg: 0
Send count: 1254143 Ack count: 1250000 Pos: 1250000 Neg: 0
Send count: 1303459 Ack count: 1300000 Pos: 1300000 Neg: 0
Send count: 1358449 Ack count: 1350000 Pos: 1350000 Neg: 0
Send count: 1408474 Ack count: 1400000 Pos: 1400000 Neg: 0
Send count: 1456386 Ack count: 1450000 Pos: 1450000 Neg: 0
Send count: 1505650 Ack count: 1500000 Pos: 1500000 Neg: 0
Send count: 1550343 Ack count: 1550000 Pos: 1550000 Neg: 0
Send count: 1601060 Ack count: 1600000 Pos: 1600000 Neg: 0
Send count: 1652008 Ack count: 1650000 Pos: 1650000 Neg: 0
Send count: 1701559 Ack count: 1700000 Pos: 1700000 Neg: 0
Send count: 1751662 Ack count: 1750000 Pos: 1750000 Neg: 0
Send count: 1801550 Ack count: 1800000 Pos: 1800000 Neg: 0
Send count: 1852799 Ack count: 1850000 Pos: 1850000 Neg: 0
Send count: 1901281 Ack count: 1900000 Pos: 1900000 Neg: 0
Send count: 1951079 Ack count: 1950000 Pos: 1950000 Neg: 0
Send count: 2000000 Ack count: 2000000 Pos: 2000000 Neg: 0
2018-10-19 16:31:51.177 INFO ProducerImpl:467 | [persistent://vanlightly/cluster-1/ns1/no-fail-test1_5, cluster-1-2-0] Closed producer
2018-10-19 16:31:51.346 INFO ConnectionPool:63 | Created connection for pulsar://192.168.208.9:6650
2018-10-19 16:31:51.348 INFO ClientConnection:285 | [192.168.208.1:34574 -> 192.168.208.9:6650] Connected to broker
2018-10-19 16:31:51.357 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/no-fail-test1_5, reader-5a64ed, 0] Getting connection from pool
2018-10-19 16:31:51.645 INFO ConnectionPool:63 | Created connection for pulsar://pulsar3:6650
2018-10-19 16:31:51.645 INFO ClientConnection:287 | [192.168.208.1:34578 -> 192.168.208.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar3:6650
2018-10-19 16:31:51.716 INFO ConsumerImpl:168 | [persistent://vanlightly/cluster-1/ns1/no-fail-test1_5, reader-5a64ed, 0] Created consumer on broker [192.168.208.1:34578 -> 192.168.208.9:6650]
Last confirmed entry: [0, 2557]
Start reading from (0,0,0,-1)
Received: 50000 Curr Entry: [0, 58]
Received: 100000 Curr Entry: [0, 115]
Received: 150000 Curr Entry: [0, 177]
Received: 200000 Curr Entry: [0, 236]
Received: 250000 Curr Entry: [0, 300]
Received: 300000 Curr Entry: [0, 386]
Received: 350000 Curr Entry: [0, 446]
Received: 400000 Curr Entry: [0, 506]
Received: 450000 Curr Entry: [0, 569]
Received: 500000 Curr Entry: [0, 629]
Received: 550000 Curr Entry: [0, 695]
Received: 600000 Curr Entry: [0, 754]
Received: 650000 Curr Entry: [0, 816]
Received: 700000 Curr Entry: [0, 892]
Received: 750000 Curr Entry: [0, 955]
Received: 800000 Curr Entry: [0, 1021]
Received: 850000 Curr Entry: [0, 1086]
Received: 900000 Curr Entry: [0, 1152]
Received: 950000 Curr Entry: [0, 1211]
Received: 1000000 Curr Entry: [0, 1268]
Received: 1050000 Curr Entry: [0, 1339]
Received: 1100000 Curr Entry: [0, 1410]
Received: 1150000 Curr Entry: [0, 1472]
Received: 1200000 Curr Entry: [0, 1534]
Received: 1250000 Curr Entry: [0, 1595]
Received: 1300000 Curr Entry: [0, 1654]
Received: 1350000 Curr Entry: [0, 1713]
Received: 1400000 Curr Entry: [0, 1779]
Received: 1450000 Curr Entry: [0, 1836]
Received: 1500000 Curr Entry: [0, 1896]
Received: 1550000 Curr Entry: [0, 1971]
Received: 1600000 Curr Entry: [0, 2043]
Received: 1650000 Curr Entry: [0, 2101]
Received: 1700000 Curr Entry: [0, 2164]
Received: 1750000 Curr Entry: [0, 2226]
Received: 1800000 Curr Entry: [0, 2297]
Received: 1850000 Curr Entry: [0, 2367]
Received: 1900000 Curr Entry: [0, 2433]
Received: 1950000 Curr Entry: [0, 2496]
Received: 2000000 Curr Entry: [0, 2557]
Read phase complete with message (0,2557,378,-1)
2018-10-19 16:32:35.080 INFO ConsumerImpl:761 | [persistent://vanlightly/cluster-1/ns1/no-fail-test1_5, reader-5a64ed, 0] Closed consumer 0
Results --------------------------------------------
Final send count: 2000000
Final ack count: 2000000
Final positive ack count: 2000000
Final negative ack count: 0
Messages received: 2000000
Acked messages missing: 0
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
能够在 test-output/no-fail-test1-output.txt 文件中查看简要后果。
16:11:45 Start test
16:13:54:
16:13:54: Test Run #1 on topic no-fail-test1_1 ------------
16:15:41: Results --------------------------------------------
16:15:41: Final send count: 2000000
16:15:41: Final ack count: 2000000
16:15:41: Final positive ack count: 2000000
16:15:41: Final negative ack count: 0
16:15:41: Messages received: 2000000
16:15:41: Acked messages missing: 0
16:15:41: Non-acked messages received: 0
16:15:41: Out-of-order: 0
16:15:41: Duplicates: 0
16:15:41: ----------------------------------------------------
16:17:57:
16:17:57: Test Run #2 on topic no-fail-test1_2 ------------
16:19:44: Results --------------------------------------------
16:19:44: Final send count: 2000000
16:19:44: Final ack count: 2000000
16:19:44: Final positive ack count: 2000000
16:19:44: Final negative ack count: 0
16:19:44: Messages received: 2000000
16:19:44: Acked messages missing: 0
16:19:44: Non-acked messages received: 0
16:19:44: Out-of-order: 0
16:19:44: Duplicates: 0
16:19:44: ----------------------------------------------------
16:22:01:
16:22:01: Test Run #3 on topic no-fail-test1_3 ------------
16:23:52: Results --------------------------------------------
16:23:52: Final send count: 2000000
16:23:52: Final ack count: 2000000
16:23:52: Final positive ack count: 2000000
16:23:52: Final negative ack count: 0
16:23:52: Messages received: 2000000
16:23:52: Acked messages missing: 0
16:23:52: Non-acked messages received: 0
16:23:52: Out-of-order: 0
16:23:52: Duplicates: 0
16:23:52: ----------------------------------------------------
16:26:20:
16:26:20: Test Run #4 on topic no-fail-test1_4 ------------
16:28:18: Results --------------------------------------------
16:28:18: Final send count: 2000000
16:28:18: Final ack count: 2000000
16:28:18: Final positive ack count: 2000000
16:28:18: Final negative ack count: 0
16:28:18: Messages received: 2000000
16:28:18: Acked messages missing: 0
16:28:18: Non-acked messages received: 0
16:28:18: Out-of-order: 0
16:28:18: Duplicates: 0
16:28:18: ----------------------------------------------------
16:30:43:
16:30:43: Test Run #5 on topic no-fail-test1_5 ------------
16:32:36: Results --------------------------------------------
16:32:36: Final send count: 2000000
16:32:36: Final ack count: 2000000
16:32:36: Final positive ack count: 2000000
16:32:36: Final negative ack count: 0
16:32:36: Messages received: 2000000
16:32:36: Acked messages missing: 0
16:32:36: Non-acked messages received: 0
16:32:36: Out-of-order: 0
16:32:36: Duplicates: 0
16:32:36: ----------------------------------------------------
从终端输入能够看出速度并不是很快。五个测试共用了大概 20 分钟。次要问题在于创立 blockade 集群速度较慢。应用 RabbitMQ 和 Kafka,运行速度很快;但 Pulsar 集群占用了大量 CPU,启动迟缓。起因可能和咱们运行了 8 个节点(1 ZK、1 Proxy、3 Broker、3 Bookie)无关,最多季节点数达到 11 个。
测试中没有呈现音讯失落、反复或乱序的问题。
测试场景 1 – 敞开 topic broker owner
参数配置 2-2-1 3-3。
在测试中,pulsar-test.py
应用 shell 脚本检测 broker owner,并在音讯传输时将其敞开。
参数配置为 E=2、Qw=2、Qa=1,3 个 broker,3 个 bookie,这种配置下冗余最小。每条音讯有两个正本,但 broker 接管到任一 bookie 的一个 ack 后,就会将 ack 发送到客户端。
在音讯传输时敞开 broker 不会造成数据失落。只有在 Qa bookie 长久化音讯到磁盘后,音讯才被 ack,并且由 broker 发送 ack。如果敞开 broker,最差的状况是 reader 读取未被 ack 的音讯。Bookie 可能曾经长久化音讯,但 broker 在向客户端发送 ack 前呈现故障。在启用新 broker 后,复原期间会检测到这些音讯,并确保敞开 ledger 前生成音讯正本,使这些音讯对 reader 可用。
以下是某次测试的残缺输入。在每次测试中,发送 100 万条音讯,在发送到第 5 万条音讯时敞开 owner。咱们进行了 5 次该测试。
$ python pulsar-test.py kill-broker kill-broker-test1 5 1000000 50000 2-2-1 3-3 false
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #1 on topic kill-broker-test1_1 ------------
-------------------------------------------------
WRITE PHASE
-------------------------------------------------
2018-10-19 20:31:48.389 INFO ConnectionPool:63 | Created connection for pulsar://pulsar2:6650
2018-10-19 20:31:48.389 INFO ClientConnection:287 | [192.168.96.1:52202 -> 192.168.96.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar2:6650
Send count: 51706 Ack count: 50000 Pos: 50000 Neg: 0
Send count: 101199 Ack count: 100000 Pos: 100000 Neg: 0
Send count: 155831 Ack count: 150000 Pos: 150000 Neg: 0
Send count: 208015 Ack count: 200000 Pos: 200000 Neg: 0
Send count: 250923 Ack count: 250000 Pos: 250000 Neg: 0
Send count: 303241 Ack count: 300000 Pos: 300000 Neg: 0
Send count: 350970 Ack count: 350000 Pos: 350000 Neg: 0
Send count: 400041 Ack count: 400000 Pos: 400000 Neg: 0
Send count: 450612 Ack count: 450000 Pos: 450000 Neg: 0
-------------------------------------------------
pulsar2 is the topic owner, killing pulsar2!!!!!!
-------------------------------------------------
Send count: 501553 Ack count: 500000 Pos: 500000 Neg: 0
Send count: 551276 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 602319 Ack count: 600000 Pos: 600000 Neg: 0
Send count: 651554 Ack count: 650000 Pos: 650000 Neg: 0
2018-10-19 20:32:07.710 INFO ClientConnection:1237 | [192.168.96.1:52202 -> 192.168.96.9:6650] Connection closed
2018-10-19 20:32:07.710 INFO HandlerBase:129 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test1_1, cluster-1-1-0] Schedule reconnection in 0.1 s
2018-10-19 20:32:07.710 INFO ClientConnection:195 | [192.168.96.1:52202 -> 192.168.96.9:6650] Destroyed connection
2018-10-19 20:32:07.810 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test1_1, cluster-1-1-0] Getting connection from pool
2018-10-19 20:32:07.815 ERROR ClientConnection:792 | [192.168.96.1:52196 -> 192.168.96.9:6650] Failed lookup req_id: 3 error: 6
2018-10-19 20:32:07.815 INFO HandlerBase:129 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test1_1, cluster-1-1-0] Schedule reconnection in 0.192 s
2018-10-19 20:32:08.009 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test1_1, cluster-1-1-0] Getting connection from pool
2018-10-19 20:32:08.500 INFO ConnectionPool:53 | Deleting stale connection from pool for pulsar://pulsar2:6650 use_count: -1 @ 0
2018-10-19 20:32:08.500 INFO ConnectionPool:63 | Created connection for pulsar://pulsar2:6650
2018-10-19 20:32:08.501 INFO ClientConnection:287 | [192.168.96.1:52276 -> 192.168.96.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar2:6650
-------------------------------------------------
pulsar2 KILLED!
-------------------------------------------------
Send count: 702977 Ack count: 700000 Pos: 692977 Neg: 7023
2018-10-19 20:32:38.508 INFO ClientConnection:1237 | [192.168.96.1:52276 -> 192.168.96.9:6650] Connection closed
2018-10-19 20:32:38.509 INFO HandlerBase:129 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, cluster-1-1-0] Schedule reconnection in 0.1 s
2018-10-19 20:32:38.509 INFO ClientConnection:195 | [192.168.96.1:52276 -> 192.168.96.9:6650] Destroyed connection
2018-10-19 20:32:38.610 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, cluster-1-1-0] Getting connection from pool
2018-10-19 20:32:38.647 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-19 20:32:38.647 INFO ClientConnection:287 | [192.168.96.1:52306 -> 192.168.96.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-19 20:32:39.592 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, cluster-1-1-0] Created producer on broker [192.168.96.1:52306 -> 192.168.96.9:6650]
Send count: 750954 Ack count: 750000 Pos: 740000 Neg: 10000
Send count: 805568 Ack count: 800000 Pos: 790000 Neg: 10000
Send count: 853879 Ack count: 850000 Pos: 840000 Neg: 10000
Send count: 904020 Ack count: 900000 Pos: 890000 Neg: 10000
Send count: 953552 Ack count: 950000 Pos: 940000 Neg: 10000
Send count: 1000000 Ack count: 1000000 Pos: 990000 Neg: 10000
2018-10-19 20:32:47.320 INFO ProducerImpl:467 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, cluster-1-1-0] Closed producer
2018-10-19 20:32:47.466 INFO ConnectionPool:63 | Created connection for pulsar://192.168.96.9:6650
2018-10-19 20:32:47.468 INFO ClientConnection:285 | [192.168.96.1:52406 -> 192.168.96.9:6650] Connected to broker
2018-10-19 20:32:47.477 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, reader-63a5b2, 0] Getting connection from pool
2018-10-19 20:32:47.496 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-19 20:32:47.496 INFO ClientConnection:287 | [192.168.96.1:52410 -> 192.168.96.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-19 20:32:47.562 INFO ConsumerImpl:168 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, reader-63a5b2, 0] Created consumer on broker [192.168.96.1:52410 -> 192.168.96.9:6650]
-------------------------------------------------
READ PHASE
-------------------------------------------------
Last confirmed entry: [1, 387]
Start reading from (0,0,0,-1)
Received: 50000 Curr Entry: [0, 72]
Received: 100000 Curr Entry: [0, 148]
Received: 150000 Curr Entry: [0, 207]
Received: 200000 Curr Entry: [0, 272]
Received: 250000 Curr Entry: [0, 335]
Received: 300000 Curr Entry: [0, 400]
Received: 350000 Curr Entry: [0, 466]
Received: 400000 Curr Entry: [0, 534]
Received: 450000 Curr Entry: [0, 611]
Received: 500000 Curr Entry: [0, 688]
Received: 550000 Curr Entry: [0, 747]
Received: 600000 Curr Entry: [0, 815]
Received: 650000 Curr Entry: [0, 913]
Received: 700000 Curr Entry: [1, 7]
Received: 750000 Curr Entry: [1, 83]
Received: 800000 Curr Entry: [1, 157]
Received: 850000 Curr Entry: [1, 227]
Received: 900000 Curr Entry: [1, 287]
Received: 950000 Curr Entry: [1, 343]
Read phase complete with message (1,387,469,-1)
2018-10-19 20:33:11.943 INFO ConsumerImpl:761 | [persistent://vanlightly/cluster-1/ns1/kill-broker-test2_10, reader-63a5b2, 0] Closed consumer 0
Results --------------------------------------------
Final send count: 1000000
Final ack count: 1000000
Final positive ack count: 990000
Final negative ack count: 10000
Messages received: 990000
Acked messages missing: 0
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
能够看到,在音讯传输过程中,辨认 pulsar2 为 owner 后,很快将其敞开。客户端呈现了连贯失败的问题,并通过 proxy 从新连贯到新的 owner pulsar1。重连胜利后,持续发送音讯。
因为 producer 受到限制,通常只会传输 1 万条音讯,而这 1 万条音讯全副接管失败(超时)。这样接管胜利的音讯就有 99 万条,无反复音讯,均被有序读取。
咱们共进行了 5 次测试,测试后果如下:
19:59:08:
19:59:08: Test Run #1 on topic kill-broker-test_1 ------------
20:00:41: Results --------------------------------------------
20:00:41: Final send count: 1000000
20:00:41: Final ack count: 1000000
20:00:41: Final positive ack count: 990000
20:00:41: Final negative ack count: 10000
20:00:41: Messages received: 990000
20:00:41: Acked messages missing: 0
20:00:41: Non-acked messages received: 0
20:00:41: Out-of-order: 0
20:00:41: Duplicates: 0
20:00:41: ----------------------------------------------------
20:02:54:
20:02:54: Test Run #2 on topic kill-broker-test_2 ------------
20:03:54: Results --------------------------------------------
20:03:54: Final send count: 1000000
20:03:54: Final ack count: 1000000
20:03:54: Final positive ack count: 1000000
20:03:54: Final negative ack count: 0
20:03:54: Messages received: 1000000
20:03:54: Acked messages missing: 0
20:03:54: Non-acked messages received: 0
20:03:54: Out-of-order: 0
20:03:54: Duplicates: 0
20:03:54: ----------------------------------------------------
20:06:06:
20:06:06: Test Run #3 on topic kill-broker-test_3 ------------
20:07:31: Results --------------------------------------------
20:07:31: Final send count: 1000000
20:07:31: Final ack count: 1000000
20:07:31: Final positive ack count: 990000
20:07:31: Final negative ack count: 10000
20:07:31: Messages received: 991065
20:07:31: Acked messages missing: 0
20:07:31: Non-acked messages received: 1065
20:07:31: Out-of-order: 0
20:07:31: Duplicates: 0
20:07:31: ----------------------------------------------------
20:09:42:
20:09:42: Test Run #4 on topic kill-broker-test_4 ------------
20:11:18: Results --------------------------------------------
20:11:18: Final send count: 1000000
20:11:18: Final ack count: 1000000
20:11:18: Final positive ack count: 990000
20:11:18: Final negative ack count: 10000
20:11:18: Messages received: 990000
20:11:18: Acked messages missing: 0
20:11:18: Non-acked messages received: 0
20:11:18: Out-of-order: 0
20:11:18: Duplicates: 0
20:11:18: ----------------------------------------------------
20:13:36:
20:13:36: Test Run #5 on topic kill-broker-test_5 ------------
20:15:09: Results --------------------------------------------
20:15:09: Final send count: 1000000
20:15:09: Final ack count: 1000000
20:15:09: Final positive ack count: 990000
20:15:09: Final negative ack count: 10000
20:15:09: Messages received: 990000
20:15:09: Acked messages missing: 0
20:15:09: Non-acked messages received: 0
20:15:09: Out-of-order: 0
20:15:09: Duplicates: 0
20:15:09: ----------------------------------------------------
能够看出没有音讯失落。第 3 次运行时,读取了 1065 条未 ack 的音讯。如果重试发送音讯引起了音讯反复,这种状况就很失常。启用去重就能够解决这一问题,咱们会在前面的场景中进行测试。
正如预期,broker 故障不会导致音讯失落。
测试场景 2 – 敞开以后 Ledger Ensemble 中的 Bookie
参数配置 2-2-1 3-3。
咱们采纳冗余最小的配置来测试 Pulsar 的灵活性,E=2、Qw=2、Qa=1,3 个 broker,3 个 bookie。在发送 100 万条音讯期间,辨认以后 ledger ensemble 中的一个 bookie,并将其敞开。而后,查看 reader 读取 100 万条音讯的程序是否正确。
在这里我做了一个非凡的假如,把以后 ledger 看作第一个 ledger。在进行过无数次测试后,我发现至多前 100 万条音讯会落在第一个 ledger 的 entry 中。因而,我决定只关注第一个 ledger 中的 ensemble,它在 ZooKeeper 中的门路已知。
如果 Qw = 2,只敞开一个 bookie,不应该呈现数据失落。如果敞开了 ack 音讯的 bookie,则在 AutoRecovery 复制数据前,仍有一个 entry 正本,而呈现数据失落的状况是两个正本同时失落。
单次运行的残缺输入如下。
$ python pulsar-test.py kill-bookie kill-bookie-test 5 1000000 50000 2-2-1 3-3 false
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #1 on topic kill-bookie-test_1 ------------
-------------------------------------------------
WRITE PHASE
-------------------------------------------------
2018-10-19 23:46:10.939 INFO ConnectionPool:63 | Created connection for pulsar://192.168.224.9:6650
2018-10-19 23:46:12.877 INFO ConnectionPool:63 | Created connection for pulsar://pulsar3:6650
2018-10-19 23:46:12.877 INFO ClientConnection:287 | [192.168.224.1:50588 -> 192.168.224.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar3:6650
2018-10-19 23:46:13.397 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/kill-bookie-test_1,] Created producer on broker [192.168.224.1:50588 -> 192.168.224.9:6650]
Send count: 52575 Ack count: 50000 Pos: 50000 Neg: 0
Send count: 100560 Ack count: 100000 Pos: 100000 Neg: 0
Send count: 156455 Ack count: 150000 Pos: 150000 Neg: 0
Send count: 203545 Ack count: 200000 Pos: 200000 Neg: 0
Send count: 255199 Ack count: 250000 Pos: 250000 Neg: 0
Send count: 305732 Ack count: 300000 Pos: 300000 Neg: 0
Send count: 357709 Ack count: 350000 Pos: 350000 Neg: 0
Send count: 407634 Ack count: 400000 Pos: 400000 Neg: 0
-------------------------------------------------
bookie1 is in the current ledger ensemble, killing bookie1!!!!!!
-------------------------------------------------
Send count: 451809 Ack count: 450000 Pos: 450000 Neg: 0
Send count: 501234 Ack count: 500000 Pos: 500000 Neg: 0
Send count: 551348 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 601729 Ack count: 600000 Pos: 600000 Neg: 0
Send count: 650566 Ack count: 650000 Pos: 650000 Neg: 0
-------------------------------------------------
bookie1 KILLED!
-------------------------------------------------
Send count: 701961 Ack count: 700000 Pos: 700000 Neg: 0
Send count: 751294 Ack count: 750000 Pos: 750000 Neg: 0
Send count: 802939 Ack count: 800000 Pos: 800000 Neg: 0
Send count: 850390 Ack count: 850000 Pos: 850000 Neg: 0
Send count: 903739 Ack count: 900000 Pos: 900000 Neg: 0
Send count: 953286 Ack count: 950000 Pos: 950000 Neg: 0
Send count: 1000000 Ack count: 1000000 Pos: 1000000 Neg: 0
2018-10-19 23:46:47.281 INFO ProducerImpl:467 | [persistent://vanlightly/cluster-1/ns1/kill-bookie-test_1, cluster-1-2-0] Closed producer
-------------------------------------------------
READ PHASE
-------------------------------------------------
2018-10-19 23:46:57.499 INFO ConnectionPool:63 | Created connection for pulsar://192.168.224.9:6650
2018-10-19 23:46:57.500 INFO ClientConnection:285 | [192.168.224.1:50756 -> 192.168.224.9:6650] Connected to broker
2018-10-19 23:46:57.508 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-bookie-test_1, reader-4d4064, 0] Getting connection from pool
2018-10-19 23:46:57.566 INFO ConnectionPool:63 | Created connection for pulsar://pulsar2:6650
2018-10-19 23:46:57.566 INFO ClientConnection:287 | [192.168.224.1:50760 -> 192.168.224.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar2:6650
2018-10-19 23:46:57.750 INFO ConsumerImpl:168 | [persistent://vanlightly/cluster-1/ns1/kill-bookie-test_1, reader-4d4064, 0] Created consumer on broker [192.168.224.1:50760 -> 192.168.224.9:6650]
LCE. broker pulsar2 lac_line "1:475",
Last confirmed entry: [1, 475]
Start reading from (0,0,0,-1)
Received: 50000 Curr Entry: [0, 70]
Received: 100000 Curr Entry: [0, 165]
Received: 150000 Curr Entry: [0, 270]
Received: 200000 Curr Entry: [0, 339]
Received: 250000 Curr Entry: [0, 403]
Received: 300000 Curr Entry: [0, 462]
Received: 350000 Curr Entry: [0, 519]
Received: 400000 Curr Entry: [0, 579]
Received: 450000 Curr Entry: [0, 658]
Received: 500000 Curr Entry: [0, 717]
Received: 550000 Curr Entry: [0, 782]
Received: 600000 Curr Entry: [0, 848]
Received: 650000 Curr Entry: [0, 942]
Received: 700000 Curr Entry: [1, 57]
Received: 750000 Curr Entry: [1, 119]
Received: 800000 Curr Entry: [1, 195]
Received: 850000 Curr Entry: [1, 267]
Received: 900000 Curr Entry: [1, 357]
Received: 950000 Curr Entry: [1, 418]
Received: 1000000 Curr Entry: [1, 475]
Read phase complete with message (1,475,324,-1)
2018-10-19 23:47:51.236 INFO ConsumerImpl:761 | [persistent://vanlightly/cluster-1/ns1/kill-bookie-test_1, reader-4d4064, 0] Closed consumer 0
Results --------------------------------------------
Final send count: 1000000
Final ack count: 1000000
Final positive ack count: 1000000
Final negative ack count: 0
Messages received: 1000000
Acked messages missing: 0
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
----------------------------------------------------
在音讯传输期间敞开了 1 个 bookie,但音讯传输并未受到影响,reader 按程序胜利读取了 100 万条音讯,并且没有呈现音讯反复。
5 次测试的运行后果如下。
23:52:20 Start test
23:54:49:
23:54:49: Test Run #1 on topic kill-bookie-test_1 ------------
23:56:38: Results --------------------------------------------
23:56:38: Final send count: 1000000
23:56:38: Final ack count: 1000000
23:56:38: Final positive ack count: 1000000
23:56:38: Final negative ack count: 0
23:56:38: Messages received: 1000000
23:56:38: Acked messages missing: 0
23:56:38: Non-acked messages received: 0
23:56:38: Out-of-order: 0
23:56:38: Duplicates: 0
23:56:38: ----------------------------------------------------
23:58:54:
23:58:54: Test Run #2 on topic kill-bookie-test_2 ------------
00:00:50: Results --------------------------------------------
00:00:50: Final send count: 1000000
00:00:50: Final ack count: 1000000
00:00:50: Final positive ack count: 1000000
00:00:50: Final negative ack count: 0
00:00:50: Messages received: 1000000
00:00:50: Acked messages missing: 0
00:00:50: Non-acked messages received: 0
00:00:50: Out-of-order: 0
00:00:50: Duplicates: 0
00:00:50: ----------------------------------------------------
00:03:12:
00:03:12: Test Run #3 on topic kill-bookie-test_3 ------------
00:05:01: Results --------------------------------------------
00:05:01: Final send count: 1000000
00:05:01: Final ack count: 1000000
00:05:01: Final positive ack count: 1000000
00:05:01: Final negative ack count: 0
00:05:01: Messages received: 1000000
00:05:01: Acked messages missing: 0
00:05:01: Non-acked messages received: 0
00:05:01: Out-of-order: 0
00:05:01: Duplicates: 0
00:05:01: ----------------------------------------------------
00:07:15:
00:07:15: Test Run #4 on topic kill-bookie-test_4 ------------
00:09:04: Results --------------------------------------------
00:09:04: Final send count: 1000000
00:09:04: Final ack count: 1000000
00:09:04: Final positive ack count: 1000000
00:09:04: Final negative ack count: 0
00:09:04: Messages received: 1000000
00:09:04: Acked messages missing: 0
00:09:04: Non-acked messages received: 0
00:09:04: Out-of-order: 0
00:09:04: Duplicates: 0
00:09:04: ----------------------------------------------------
00:11:23:
00:11:23: Test Run #5 on topic kill-bookie-test_5 ------------
00:13:23: Results --------------------------------------------
00:13:23: Final send count: 1000000
00:13:23: Final ack count: 1000000
00:13:23: Final positive ack count: 1000000
00:13:23: Final negative ack count: 0
00:13:23: Messages received: 1000000
00:13:23: Acked messages missing: 0
00:13:23: Non-acked messages received: 0
00:13:23: Out-of-order: 0
00:13:23: Duplicates: 0
00:13:23: ----------------------------------------------------
没有呈现音讯失落、反复和乱序。
测试场景 3 – 隔离 Topic Owner Broker 与 ZooKeeper
参数配置 2-2-1 3-3。
在这次测试中,咱们只隔离 topic owner broker 和 ZooKeeper。咱们预测在隔离 ZooKeeper 前,broker owner 会继续接管写入;隔离失效后,broker owner 主动重启,另一个 broker 接管 broker owner 角色。
预计测试后果与敞开 bookie 的后果相似。
单次运行的残缺输入如下。
$ python pulsar-test.py isolate-broker-from-zk iso-broker 1 2000000 50000 2-2-1 3-3 false
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #1 on topic iso-broker_1 ------------
-------------------------------------------------
WRITE PHASE
-------------------------------------------------
2018-10-20 08:22:43.961 INFO ConnectionPool:63 | Created connection for pulsar://172.22.0.9:6650
2018-10-20 08:22:43.963 INFO ClientConnection:285 | [172.22.0.1:46968 -> 172.22.0.9:6650] Connected to broker
2018-10-20 08:22:44.664 INFO BatchMessageContainer:41 | {BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://vanlightly/cluster-1/ns1/iso-broker_1] [producerName_ =] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer constructed
2018-10-20 08:22:44.664 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1,] Getting connection from pool
2018-10-20 08:22:45.254 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-20 08:22:45.254 INFO ClientConnection:287 | [172.22.0.1:46972 -> 172.22.0.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-20 08:22:46.329 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1,] Created producer on broker [172.22.0.1:46972 -> 172.22.0.9:6650]
Send count: 57639 Ack count: 50000 Pos: 50000 Neg: 0
pulsar1 is the topic owner, isolating pulsar1 from zookeepr!!!!!!
Send count: 100918 Ack count: 100000 Pos: 100000 Neg: 0
Send count: 151902 Ack count: 150000 Pos: 150000 Neg: 0
Send count: 200658 Ack count: 200000 Pos: 200000 Neg: 0
Send count: 251287 Ack count: 250000 Pos: 250000 Neg: 0
Send count: 306076 Ack count: 300000 Pos: 300000 Neg: 0
Send count: 353796 Ack count: 350000 Pos: 350000 Neg: 0
Send count: 403469 Ack count: 400000 Pos: 400000 Neg: 0
Send count: 455195 Ack count: 450000 Pos: 450000 Neg: 0
-------------------------------------------------
pulsar1 ISOLATED!
-------------------------------------------------
Send count: 506817 Ack count: 500000 Pos: 500000 Neg: 0
Send count: 557051 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 601668 Ack count: 600000 Pos: 600000 Neg: 0
Send count: 652490 Ack count: 650000 Pos: 650000 Neg: 0
Send count: 709134 Ack count: 700000 Pos: 700000 Neg: 0
Send count: 757860 Ack count: 750000 Pos: 750000 Neg: 0
Send count: 805959 Ack count: 800000 Pos: 800000 Neg: 0
Send count: 858283 Ack count: 850000 Pos: 850000 Neg: 0
Send count: 907911 Ack count: 900000 Pos: 900000 Neg: 0
Send count: 951597 Ack count: 950000 Pos: 950000 Neg: 0
Send count: 1002732 Ack count: 1000000 Pos: 1000000 Neg: 0
Send count: 1050689 Ack count: 1050000 Pos: 1050000 Neg: 0
Send count: 1104743 Ack count: 1100000 Pos: 1100000 Neg: 0
Send count: 1157332 Ack count: 1150000 Pos: 1150000 Neg: 0
Send count: 1202361 Ack count: 1200000 Pos: 1200000 Neg: 0
Send count: 1256378 Ack count: 1250000 Pos: 1250000 Neg: 0
Send count: 1301586 Ack count: 1300000 Pos: 1300000 Neg: 0
Send count: 1352458 Ack count: 1350000 Pos: 1350000 Neg: 0
Send count: 1407455 Ack count: 1400000 Pos: 1400000 Neg: 0
2018-10-20 08:23:40.624 INFO ClientConnection:1237 | [172.22.0.1:46972 -> 172.22.0.9:6650] Connection closed
2018-10-20 08:23:40.624 INFO HandlerBase:129 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, cluster-1-0-0] Schedule reconnection in 0.1 s
2018-10-20 08:23:40.624 INFO ClientConnection:195 | [172.22.0.1:46972 -> 172.22.0.9:6650] Destroyed connection
2018-10-20 08:23:40.724 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, cluster-1-0-0] Getting connection from pool
2018-10-20 08:23:41.103 INFO ConnectionPool:53 | Deleting stale connection from pool for pulsar://pulsar1:6650 use_count: -1 @ 0
2018-10-20 08:23:41.103 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-20 08:23:41.104 INFO ClientConnection:287 | [172.22.0.1:47050 -> 172.22.0.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-20 08:23:42.139 INFO ClientConnection:1237 | [172.22.0.1:47050 -> 172.22.0.9:6650] Connection closed
2018-10-20 08:23:42.139 INFO HandlerBase:129 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, cluster-1-0-0] Schedule reconnection in 0.186 s
2018-10-20 08:23:42.139 INFO ClientConnection:195 | [172.22.0.1:47050 -> 172.22.0.9:6650] Destroyed connection
2018-10-20 08:23:42.325 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, cluster-1-0-0] Getting connection from pool
2018-10-20 08:23:42.750 INFO ConnectionPool:63 | Created connection for pulsar://pulsar3:6650
2018-10-20 08:23:42.751 INFO ClientConnection:287 | [172.22.0.1:47054 -> 172.22.0.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar3:6650
2018-10-20 08:23:43.632 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, cluster-1-0-0] Created producer on broker [172.22.0.1:47054 -> 172.22.0.9:6650]
Send count: 1459178 Ack count: 1450000 Pos: 1450000 Neg: 0
Send count: 1505703 Ack count: 1500000 Pos: 1500000 Neg: 0
Send count: 1551367 Ack count: 1550000 Pos: 1550000 Neg: 0
Send count: 1601042 Ack count: 1600000 Pos: 1600000 Neg: 0
Send count: 1651310 Ack count: 1650000 Pos: 1650000 Neg: 0
Send count: 1707836 Ack count: 1700000 Pos: 1700000 Neg: 0
Send count: 1751159 Ack count: 1750000 Pos: 1750000 Neg: 0
Send count: 1802089 Ack count: 1800000 Pos: 1800000 Neg: 0
Send count: 1851082 Ack count: 1850000 Pos: 1850000 Neg: 0
Send count: 1906930 Ack count: 1900000 Pos: 1900000 Neg: 0
Send count: 1951920 Ack count: 1950000 Pos: 1950000 Neg: 0
Send count: 2000000 Ack count: 2000000 Pos: 2000000 Neg: 0
2018-10-20 08:23:57.505 INFO ProducerImpl:467 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, cluster-1-0-0] Closed producer
-------------------------------------------------
READ PHASE
-------------------------------------------------
2018-10-20 08:24:07.620 INFO ConnectionPool:63 | Created connection for pulsar://172.22.0.9:6650
2018-10-20 08:24:07.621 INFO ClientConnection:285 | [172.22.0.1:47156 -> 172.22.0.9:6650] Connected to broker
2018-10-20 08:24:07.635 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, reader-c1ba0b, 0] Getting connection from pool
2018-10-20 08:24:07.665 INFO ConnectionPool:63 | Created connection for pulsar://pulsar3:6650
2018-10-20 08:24:07.665 INFO ClientConnection:287 | [172.22.0.1:47160 -> 172.22.0.9:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar3:6650
2018-10-20 08:24:07.749 INFO ConsumerImpl:168 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, reader-c1ba0b, 0] Created consumer on broker [172.22.0.1:47160 -> 172.22.0.9:6650]
Last confirmed entry: [1, 658]
Start reading from (0,0,0,-1)
Received: 50000 Curr Entry: [0, 61]
Received: 100000 Curr Entry: [0, 142]
Received: 150000 Curr Entry: [0, 212]
Received: 200000 Curr Entry: [0, 288]
Received: 250000 Curr Entry: [0, 349]
Received: 300000 Curr Entry: [0, 420]
Received: 350000 Curr Entry: [0, 487]
Received: 400000 Curr Entry: [0, 554]
Received: 450000 Curr Entry: [0, 612]
Received: 500000 Curr Entry: [0, 675]
Received: 550000 Curr Entry: [0, 731]
Received: 600000 Curr Entry: [0, 790]
Received: 650000 Curr Entry: [0, 847]
Received: 700000 Curr Entry: [0, 904]
Received: 750000 Curr Entry: [0, 961]
Received: 800000 Curr Entry: [0, 1022]
Received: 850000 Curr Entry: [0, 1080]
Received: 900000 Curr Entry: [0, 1136]
Received: 950000 Curr Entry: [0, 1194]
Received: 1000000 Curr Entry: [0, 1252]
Received: 1050000 Curr Entry: [0, 1310]
Received: 1100000 Curr Entry: [0, 1370]
Received: 1150000 Curr Entry: [0, 1428]
Received: 1200000 Curr Entry: [0, 1487]
Received: 1250000 Curr Entry: [0, 1545]
Received: 1300000 Curr Entry: [0, 1602]
Received: 1350000 Curr Entry: [0, 1660]
Received: 1400000 Curr Entry: [0, 1721]
Received: 1450000 Curr Entry: [1, 1]
Received: 1500000 Curr Entry: [1, 60]
Received: 1550000 Curr Entry: [1, 124]
Received: 1600000 Curr Entry: [1, 186]
Received: 1650000 Curr Entry: [1, 247]
Received: 1700000 Curr Entry: [1, 303]
Received: 1750000 Curr Entry: [1, 361]
Received: 1800000 Curr Entry: [1, 420]
Received: 1850000 Curr Entry: [1, 478]
Received: 1900000 Curr Entry: [1, 539]
Received: 1950000 Curr Entry: [1, 598]
Received: 2000000 Curr Entry: [1, 658]
Read phase complete with message (1,658,138,-1)
2018-10-20 08:24:44.361 INFO ConsumerImpl:761 | [persistent://vanlightly/cluster-1/ns1/iso-broker_1, reader-c1ba0b, 0] Closed consumer 0
Results --------------------------------------------
Final send count: 2000000
Final ack count: 2000000
Final positive ack count: 2000000
Final negative ack count: 0
Messages received: 2000000
Acked messages missing: 0
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
----------------------------------------------------
在本次测试中,咱们发送了 200 万条音讯,因为隔离 owner broker 再看到成果须要一段时间。大略在发送 140 万条音讯时呈现故障并迅速从新连贯。Reader 有序读取全副音讯,并且没有音讯反复。
5 次测试的运行后果如下。
08:33:38 Start test
08:35:55:
08:35:55: Test Run #1 on topic iso-broker_1 ------------
08:38:10: Results --------------------------------------------
08:38:10: Final send count: 2000000
08:38:10: Final ack count: 2000000
08:38:10: Final positive ack count: 2000000
08:38:10: Final negative ack count: 0
08:38:10: Messages received: 2000000
08:38:10: Acked messages missing: 0
08:38:10: Non-acked messages received: 0
08:38:10: Out-of-order: 0
08:38:10: Duplicates: 0
08:38:10: ----------------------------------------------------
08:40:29:
08:40:29: Test Run #2 on topic iso-broker_2 ------------
08:42:36: Results --------------------------------------------
08:42:36: Final send count: 2000000
08:42:36: Final ack count: 2000000
08:42:36: Final positive ack count: 2000000
08:42:36: Final negative ack count: 0
08:42:36: Messages received: 2000000
08:42:36: Acked messages missing: 0
08:42:36: Non-acked messages received: 0
08:42:36: Out-of-order: 0
08:42:36: Duplicates: 0
08:42:36: ----------------------------------------------------
08:44:55:
08:44:55: Test Run #3 on topic iso-broker_3 ------------
08:47:08: Results --------------------------------------------
08:47:08: Final send count: 2000000
08:47:08: Final ack count: 2000000
08:47:08: Final positive ack count: 2000000
08:47:08: Final negative ack count: 0
08:47:08: Messages received: 2001000
08:47:08: Acked messages missing: 0
08:47:08: Non-acked messages received: 0
08:47:08: Out-of-order: 0
08:47:08: Duplicates: 1000
08:47:08: ----------------------------------------------------
08:49:33:
08:49:33: Test Run #4 on topic iso-broker_4 ------------
08:51:52: Results --------------------------------------------
08:51:52: Final send count: 2000000
08:51:52: Final ack count: 2000000
08:51:52: Final positive ack count: 2000000
08:51:52: Final negative ack count: 0
08:51:52: Messages received: 2001090
08:51:52: Acked messages missing: 0
08:51:52: Non-acked messages received: 0
08:51:52: Out-of-order: 0
08:51:52: Duplicates: 1090
08:51:52: ----------------------------------------------------
08:54:18:
08:54:18: Test Run #5 on topic iso-broker_5 ------------
08:56:41: Results --------------------------------------------
08:56:41: Final send count: 2000000
08:56:41: Final ack count: 2000000
08:56:41: Final positive ack count: 2000000
08:56:41: Final negative ack count: 0
08:56:41: Messages received: 2000000
08:56:41: Acked messages missing: 0
08:56:41: Non-acked messages received: 0
08:56:41: Out-of-order: 0
08:56:41: Duplicates: 0
08:56:41: ----------------------------------------------------
没有呈现音讯失落和乱序问题,但呈现了音讯反复。在两次测试中,别离呈现了 1000 条和 1090 条反复音讯。
当已读取整数值小于等于读取过的音讯时(即这是一条反复音讯),reader 会在 [test-name]-duplicates.txt 中记录音讯 ID、之前生产音讯的整数值、以后音讯的整数值。
test-output]$ cat iso-broker_duplicates.txt
run|last_msg_id|last_value|curr_msg_id|curr_value
3|(0,1643,999,-1)|1356634|(1,0,0,-1)|1355635
4|(0,1597,792,-1)|1220380|(1,0,0,-1)|1219588
4|(1,712,296,-1)|1811705|(2,0,0,-1)|1811409
第 3 次运行后果中只有一个 entry,也就是说反复音讯位于间断的音讯块中。整数值 1,355,635 第二次呈现于 ledger 1,entry 0 的第一条音讯中,音讯 ID 为(1,0,0,-1)。在 ledger 0 中,共追加了 1000 条音讯,在下一个 ledger 的结尾同样写入了这 1000 条音讯。
在第 4 次测试中也呈现了相似的状况。
第一个 broker 重启后,客户端从新发送音讯可能是造成音讯反复的起因。Broker 向 pulsar1 的 BK ensemble 发送 1000 条音讯,然而在向客户端发送 ack 前,因为与 ZooKeeper 断开,broker 呈现故障。而后,客户端从新连贯到新的 owner pulsar2,再次发送 1000 条音讯。Pulsar2 复原了上一个 ledger(即 pulsar1 领有的 ledger),在敞开此 ledger 后创立新 ledger,筹备接管新音讯。Broker 接管到了客户端发来的 1000 条音讯,并将这些音讯写入新 ledger。因而呈现了音讯反复。
咱们能够启用音讯去重测试上述可能的起因。启用音讯去重时,broker 将每个 producer 最初的序列号存储在哈希表中。如果收到的序列号小于所有现有序列号,则这条音讯为反复的音讯,不对其做任何解决。Broker 将对于 topic 的数据(producer、序列号)存储在游标中,当 broker 进行故障转移时,新 broker 将从新创立哈希表。因为哈希表会定期刷新,如果 broker 呈现故障,哈希表中的最新序列号可能会失落。如果新 broker 仅依赖于哈希表更新快照,则在 broker 进行故障转移时,很可能再次出现音讯反复。为了防止这一状况的产生,Pulsar 中的新 broker owner 从 ledger 读取最新的 N 条 entry,并追加这些 entry 到哈希表,以确保不会因为故障转移导致哈希表中数据不残缺。
测试场景 4 – 隔离 Topic Owner Broker 与 ZooKeeper,启用音讯去重
参数配置为 2-2-1 3-3,启用音讯去重。
除了启用音讯去重外,测试场景 4 与测试场景 3 完全相同。本次测试用于检测音讯去重是否能够避免音讯反复。
在 blockade.yml
文件中,为 broker 的 environment 增加 brokerDeduplicationEnabled
参数,启用音讯去重。
5 次测试的运行后果如下。
09:24:34 Start test
09:26:56:
09:26:56: Test Run #1 on topic iso-broker_1 ------------
09:29:27: Results --------------------------------------------
09:29:27: Final send count: 2000000
09:29:27: Final ack count: 2000000
09:29:27: Final positive ack count: 2000000
09:29:27: Final negative ack count: 0
09:29:27: Messages received: 2000000
09:29:27: Acked messages missing: 0
09:29:27: Non-acked messages received: 0
09:29:27: Out-of-order: 0
09:29:27: Duplicates: 0
09:29:27: ----------------------------------------------------
09:32:48:
09:32:48: Test Run #2 on topic iso-broker_2 ------------
09:35:06: Results --------------------------------------------
09:35:06: Final send count: 2000000
09:35:06: Final ack count: 2000000
09:35:06: Final positive ack count: 2000000
09:35:06: Final negative ack count: 0
09:35:06: Messages received: 2000000
09:35:06: Acked messages missing: 0
09:35:06: Non-acked messages received: 0
09:35:06: Out-of-order: 0
09:35:06: Duplicates: 0
09:35:06: ----------------------------------------------------
09:37:21:
09:37:21: Test Run #3 on topic iso-broker_3 ------------
09:39:21: Results --------------------------------------------
09:39:21: Final send count: 2000000
09:39:21: Final ack count: 2000000
09:39:21: Final positive ack count: 2000000
09:39:21: Final negative ack count: 0
09:39:21: Messages received: 2000000
09:39:21: Acked messages missing: 0
09:39:21: Non-acked messages received: 0
09:39:21: Out-of-order: 0
09:39:21: Duplicates: 0
09:39:21: ----------------------------------------------------
09:41:39:
09:41:39: Test Run #4 on topic iso-broker_4 ------------
09:43:51: Results --------------------------------------------
09:43:51: Final send count: 2000000
09:43:51: Final ack count: 2000000
09:43:51: Final positive ack count: 2000000
09:43:51: Final negative ack count: 0
09:43:51: Messages received: 2000000
09:43:51: Acked messages missing: 0
09:43:51: Non-acked messages received: 0
09:43:51: Out-of-order: 0
09:43:51: Duplicates: 0
09:43:51: ----------------------------------------------------
09:46:17:
09:46:17: Test Run #5 on topic iso-broker_5 ------------
09:48:53: Results --------------------------------------------
09:48:53: Final send count: 2000000
09:48:53: Final ack count: 2000000
09:48:53: Final positive ack count: 2000000
09:48:53: Final negative ack count: 0
09:48:53: Messages received: 2000000
09:48:53: Acked messages missing: 0
09:48:53: Non-acked messages received: 0
09:48:53: Out-of-order: 0
09:48:53: Duplicates: 0
09:48:53: ----------------------------------------------------
从测试后果能够看出,启用音讯去重能够解决测试场景 3 中呈现的音讯反复问题。
测试场景 5 – 隔离 Bookie 与 ZooKeeper
参数配置 2-2-1 3-3。
强调一下,bookie 不须要 ZooKeeper 进行读写,只须要在 ZooKeeper 中进行注册、垃圾回收、AutoRecovery。所以,隔离 bookie 与 ZooKeeper 应该不会产生任何影响。
首先在以后 ledger 的 ensemble 中辨认出一个 bookie,在音讯传输期间将其与 ZooKeeper 隔离。5 次测试后果如下。
$ cat iso-bookie_output.txt
12:03:52 Start test
12:06:10:
12:06:10: Test Run #1 on topic iso-bookie_1 ------------
12:08:18: Results --------------------------------------------
12:08:18: Final send count: 2000000
12:08:18: Final ack count: 2000000
12:08:18: Final positive ack count: 2000000
12:08:18: Final negative ack count: 0
12:08:18: Messages received: 2000000
12:08:18: Acked messages missing: 0
12:08:18: Non-acked messages received: 0
12:08:18: Out-of-order: 0
12:08:18: Duplicates: 0
12:08:18: ----------------------------------------------------
12:10:35:
12:10:35: Test Run #2 on topic iso-bookie_2 ------------
12:12:40: Results --------------------------------------------
12:12:40: Final send count: 2000000
12:12:40: Final ack count: 2000000
12:12:40: Final positive ack count: 2000000
12:12:40: Final negative ack count: 0
12:12:40: Messages received: 2000000
12:12:40: Acked messages missing: 0
12:12:40: Non-acked messages received: 0
12:12:40: Out-of-order: 0
12:12:40: Duplicates: 0
12:12:40: ----------------------------------------------------
12:14:55:
12:14:55: Test Run #3 on topic iso-bookie_3 ------------
12:16:54: Results --------------------------------------------
12:16:54: Final send count: 2000000
12:16:54: Final ack count: 2000000
12:16:54: Final positive ack count: 2000000
12:16:54: Final negative ack count: 0
12:16:54: Messages received: 2000000
12:16:54: Acked messages missing: 0
12:16:54: Non-acked messages received: 0
12:16:54: Out-of-order: 0
12:16:54: Duplicates: 0
12:16:54: ----------------------------------------------------
12:19:10:
12:19:10: Test Run #4 on topic iso-bookie_4 ------------
12:21:16: Results --------------------------------------------
12:21:16: Final send count: 2000000
12:21:16: Final ack count: 2000000
12:21:16: Final positive ack count: 2000000
12:21:16: Final negative ack count: 0
12:21:16: Messages received: 2000000
12:21:16: Acked messages missing: 0
12:21:16: Non-acked messages received: 0
12:21:16: Out-of-order: 0
12:21:16: Duplicates: 0
12:21:16: ----------------------------------------------------
12:23:36:
12:23:36: Test Run #5 on topic iso-bookie_5 ------------
12:25:39: Results --------------------------------------------
12:25:39: Final send count: 2000000
12:25:39: Final ack count: 2000000
12:25:39: Final positive ack count: 2000000
12:25:39: Final negative ack count: 0
12:25:39: Messages received: 2000000
12:25:39: Acked messages missing: 0
12:25:39: Non-acked messages received: 0
12:25:39: Out-of-order: 0
12:25:39: Duplicates: 0
12:25:39: ----------------------------------------------------
没有呈现音讯失落、反复和乱序。
测试场景 6 – 敞开多个 Bookie
参数配置 3-3-1 3-5。
除了冗余数为 3,敞开 2 个 bookie 外,测试场景 6 与测试场景 2 雷同。应用 Pulsar 时,bookie 故障次数不超过 Qw-1,就不会呈现数据失落。
能够应用 kill-bookies[n]
命令敞开以后 ensemble 中任意数量的 bookie。
单次运行的残缺输入如下。
$ python pulsar-test.py kill-bookies[2] kill-m-bookies 1 2000000 50000 3-3-1 3-5 false
Creating blockade cluster
Running test with config: E 3 Qw 3 Qa 1 with nodes bookie1 bookie2 bookie3 bookie4 bookie5 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 3-3-1
Test Run #1 on topic kill-m-bookies_1 ------------
-------------------------------------------------
WRITE PHASE
-------------------------------------------------
2018-10-20 22:41:21.932 INFO ConnectionPool:63 | Created connection for pulsar://192.168.176.11:6650
2018-10-20 22:41:21.932 INFO ClientConnection:285 | [192.168.176.1:47178 -> 192.168.176.11:6650] Connected to broker
2018-10-20 22:41:22.599 INFO BatchMessageContainer:41 | {BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1] [producerName_ =] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 0] [averageBatchSize = 0]} BatchMessageContainer constructed
2018-10-20 22:41:22.599 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1,] Getting connection from pool
2018-10-20 22:41:23.140 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-20 22:41:23.140 INFO ClientConnection:287 | [192.168.176.1:47182 -> 192.168.176.11:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-20 22:41:23.631 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1,] Created producer on broker [192.168.176.1:47182 -> 192.168.176.11:6650]
Send count: 53933 Ack count: 50000 Pos: 50000 Neg: 0
-------------------------------------------------
Identifing first 2 bookies in ensemble
-------------------------------------------------
Send count: 105567 Ack count: 100000 Pos: 100000 Neg: 0
Send count: 150962 Ack count: 150000 Pos: 150000 Neg: 0
Send count: 202992 Ack count: 200000 Pos: 200000 Neg: 0
Send count: 251576 Ack count: 250000 Pos: 250000 Neg: 0
Send count: 304059 Ack count: 300000 Pos: 300000 Neg: 0
Send count: 354476 Ack count: 350000 Pos: 350000 Neg: 0
Send count: 401795 Ack count: 400000 Pos: 400000 Neg: 0
Send count: 451880 Ack count: 450000 Pos: 450000 Neg: 0
Send count: 500607 Ack count: 500000 Pos: 500000 Neg: 0
Send count: 550300 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 600474 Ack count: 600000 Pos: 600000 Neg: 0
Send count: 650469 Ack count: 650000 Pos: 650000 Neg: 0
Send count: 702927 Ack count: 700000 Pos: 700000 Neg: 0
Send count: 751328 Ack count: 750000 Pos: 750000 Neg: 0
Send count: 805205 Ack count: 800000 Pos: 800000 Neg: 0
Send count: 853940 Ack count: 850000 Pos: 850000 Neg: 0
-------------------------------------------------
2 BOOKIES KILLED!
-------------------------------------------------
Send count: 901111 Ack count: 900000 Pos: 900000 Neg: 0
Send count: 950963 Ack count: 950000 Pos: 950000 Neg: 0
Send count: 1000561 Ack count: 1000000 Pos: 1000000 Neg: 0
Send count: 1053648 Ack count: 1050000 Pos: 1050000 Neg: 0
Send count: 1103723 Ack count: 1100000 Pos: 1100000 Neg: 0
Send count: 1155762 Ack count: 1150000 Pos: 1150000 Neg: 0
Send count: 1202638 Ack count: 1200000 Pos: 1200000 Neg: 0
Send count: 1253326 Ack count: 1250000 Pos: 1250000 Neg: 0
Send count: 1300821 Ack count: 1300000 Pos: 1300000 Neg: 0
Send count: 1355902 Ack count: 1350000 Pos: 1350000 Neg: 0
Send count: 1409752 Ack count: 1400000 Pos: 1400000 Neg: 0
Send count: 1451236 Ack count: 1450000 Pos: 1450000 Neg: 0
Send count: 1500961 Ack count: 1500000 Pos: 1500000 Neg: 0
Send count: 1554802 Ack count: 1550000 Pos: 1550000 Neg: 0
Send count: 1600702 Ack count: 1600000 Pos: 1600000 Neg: 0
Send count: 1652755 Ack count: 1650000 Pos: 1650000 Neg: 0
Send count: 1705403 Ack count: 1700000 Pos: 1700000 Neg: 0
Send count: 1757240 Ack count: 1750000 Pos: 1750000 Neg: 0
Send count: 1804954 Ack count: 1800000 Pos: 1800000 Neg: 0
Send count: 1850631 Ack count: 1850000 Pos: 1850000 Neg: 0
Send count: 1907237 Ack count: 1900000 Pos: 1900000 Neg: 0
Send count: 1951117 Ack count: 1950000 Pos: 1950000 Neg: 0
Send count: 2000000 Ack count: 2000000 Pos: 2000000 Neg: 0
2018-10-20 22:42:20.572 INFO ProducerImpl:467 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1, cluster-1-0-0] Closed producer
-------------------------------------------------
READ PHASE
-------------------------------------------------
2018-10-20 22:42:30.708 INFO ConnectionPool:63 | Created connection for pulsar://192.168.176.11:6650
2018-10-20 22:42:30.710 INFO ClientConnection:285 | [192.168.176.1:47374 -> 192.168.176.11:6650] Connected to broker
2018-10-20 22:42:30.715 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1, reader-b23b69, 0] Getting connection from pool
2018-10-20 22:42:31.028 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-20 22:42:31.028 INFO ClientConnection:287 | [192.168.176.1:47378 -> 192.168.176.11:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-20 22:42:31.110 INFO ConsumerImpl:168 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1, reader-b23b69, 0] Created consumer on broker [192.168.176.1:47378 -> 192.168.176.11:6650]
Last confirmed entry: [0, 2546]
Start reading from (0,0,0,-1)
Received: 50000 Curr Entry: [0, 62]
Received: 100000 Curr Entry: [0, 135]
Received: 150000 Curr Entry: [0, 203]
Received: 200000 Curr Entry: [0, 276]
Received: 250000 Curr Entry: [0, 341]
Received: 300000 Curr Entry: [0, 409]
Received: 350000 Curr Entry: [0, 470]
Received: 400000 Curr Entry: [0, 532]
Received: 450000 Curr Entry: [0, 599]
Received: 500000 Curr Entry: [0, 661]
Received: 550000 Curr Entry: [0, 724]
Received: 600000 Curr Entry: [0, 799]
Received: 650000 Curr Entry: [0, 866]
Received: 700000 Curr Entry: [0, 943]
Received: 750000 Curr Entry: [0, 1005]
Received: 800000 Curr Entry: [0, 1073]
Received: 850000 Curr Entry: [0, 1133]
Received: 900000 Curr Entry: [0, 1205]
Received: 950000 Curr Entry: [0, 1268]
Received: 1000000 Curr Entry: [0, 1340]
Received: 1050000 Curr Entry: [0, 1402]
Received: 1100000 Curr Entry: [0, 1460]
Received: 1150000 Curr Entry: [0, 1523]
Received: 1200000 Curr Entry: [0, 1588]
Received: 1250000 Curr Entry: [0, 1647]
Received: 1300000 Curr Entry: [0, 1705]
Received: 1350000 Curr Entry: [0, 1764]
Received: 1400000 Curr Entry: [0, 1825]
Received: 1450000 Curr Entry: [0, 1884]
Received: 1500000 Curr Entry: [0, 1941]
Received: 1550000 Curr Entry: [0, 1998]
Received: 1600000 Curr Entry: [0, 2064]
Received: 1650000 Curr Entry: [0, 2122]
Received: 1700000 Curr Entry: [0, 2184]
Received: 1750000 Curr Entry: [0, 2241]
Received: 1800000 Curr Entry: [0, 2295]
Received: 1850000 Curr Entry: [0, 2364]
Received: 1900000 Curr Entry: [0, 2425]
Received: 1950000 Curr Entry: [0, 2481]
Received: 2000000 Curr Entry: [0, 2546]
Read phase complete with message (0,2546,561,-1)
2018-10-20 22:44:15.596 INFO ConsumerImpl:761 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_1, reader-b23b69, 0] Closed consumer 0
Results --------------------------------------------
Final send count: 2000000
Final ack count: 2000000
Final positive ack count: 2000000
Final negative ack count: 0
Messages received: 2000000
Acked messages missing: 0
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
----------------------------------------------------
5 次测试的运行后果如下。
$ cat kill-m-bookies_output.txt
22:59:01 Start test
23:01:37:
23:01:37: Test Run #1 on topic kill-m-bookies_1 ------------
23:04:49: Results --------------------------------------------
23:04:49: Final send count: 2000000
23:04:49: Final ack count: 2000000
23:04:49: Final positive ack count: 2000000
23:04:49: Final negative ack count: 0
23:04:49: Messages received: 2000000
23:04:49: Acked messages missing: 0
23:04:49: Non-acked messages received: 0
23:04:49: Out-of-order: 0
23:04:49: Duplicates: 0
23:04:49: ----------------------------------------------------
23:07:26:
23:07:26: Test Run #2 on topic kill-m-bookies_2 ------------
23:09:23: Results --------------------------------------------
23:09:23: Final send count: 2000000
23:09:23: Final ack count: 2000000
23:09:23: Final positive ack count: 2000000
23:09:23: Final negative ack count: 0
23:09:23: Messages received: 2005107
23:09:23: Acked messages missing: 0
23:09:23: Non-acked messages received: 0
23:09:23: Out-of-order: 0
23:09:23: Duplicates: 5107
23:09:23: ----------------------------------------------------
23:11:56:
23:11:56: Test Run #3 on topic kill-m-bookies_3 ------------
23:14:48: Results --------------------------------------------
23:14:48: Final send count: 2000000
23:14:48: Final ack count: 2000000
23:14:48: Final positive ack count: 2000000
23:14:48: Final negative ack count: 0
23:14:48: Messages received: 2000000
23:14:48: Acked messages missing: 0
23:14:48: Non-acked messages received: 0
23:14:48: Out-of-order: 0
23:14:48: Duplicates: 0
23:14:48: ----------------------------------------------------
23:17:25:
23:17:25: Test Run #4 on topic kill-m-bookies_4 ------------
23:20:24: Results --------------------------------------------
23:20:24: Final send count: 2000000
23:20:24: Final ack count: 2000000
23:20:24: Final positive ack count: 2000000
23:20:24: Final negative ack count: 0
23:20:24: Messages received: 2000000
23:20:24: Acked messages missing: 0
23:20:24: Non-acked messages received: 0
23:20:24: Out-of-order: 0
23:20:24: Duplicates: 0
23:20:24: ----------------------------------------------------
23:23:05:
23:23:05: Test Run #5 on topic kill-m-bookies_5 ------------
23:25:03: Results --------------------------------------------
23:25:03: Final send count: 2000000
23:25:03: Final ack count: 2000000
23:25:03: Final positive ack count: 2000000
23:25:03: Final negative ack count: 0
23:25:03: Messages received: 2000000
23:25:03: Acked messages missing: 0
23:25:03: Non-acked messages received: 0
23:25:03: Out-of-order: 0
23:25:03: Duplicates: 0
23:25:03: ----------------------------------------------------
在一次测试中呈现了 5000 多条反复音讯。通过查看运行的残缺输入,我发现了和 producer id
相干的连贯谬误。
Send count: 557331 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 601885 Ack count: 600000 Pos: 600000 Neg: 0
2018-10-20 23:07:49.283 INFO HandlerBase:129 | [persistent://vanlightly/cluster-1/ns1/kill-m-bookies_2, cluster-1-2-0] Schedule reconnection in 0.1 s
2018-10-20 23:07:49.319 ERROR ClientConnection:638 | [192.168.208.1:58276 -> 192.168.208.11:6650] Got invalid producer Id in SendReceipt: 0 -- msg: 648684
2018-10-20 23:07:49.319 ERROR ClientConnection:638 | [192.168.208.1:58276 -> 192.168.208.11:6650] Got invalid producer Id in SendReceipt: 0 -- msg: 649684
2018-10-20 23:07:49.320 ERROR ClientConnection:638 | [192.168.208.1:58276 -> 192.168.208.11:6650] Got invalid producer Id in SendReceipt: 0 -- msg: 650684
2018-10-20 23:07:49.320 ERROR ClientConnection:638 | [192.168.208.1:58276 -> 192.168.208.11:6650] Got invalid producer Id in SendReceipt: 0 -- msg: 651684
2018-10-20 23:07:49.321 ERROR ClientConnection:638 | [192.168.208.1:58276 -> 192.168.208.11:6650] Got invalid producer Id in SendReceipt: 0 -- msg: 652684
2018-10-20 23:07:49.321 ERROR ClientConnection:638 | [192.168.208.1:58276 -> 192.168.208.11:6650] Got invalid producer Id in SendReceipt: 0 -- msg: 653047
具体起因还不能确定,可能与 bookie 故障相干。无论是否和 bookie 故障相干,启用音讯去重都能够无效解决相似的音讯反复问题。
没有呈现音讯失落或乱序。
如果敞开 Qw 个 bookie,会呈现怎么的后果呢?尽管这样的测试设定会损坏所有音讯正本,咱们还是来理论测试一下。
测试场景 7 – 敞开 Qw 个 Bookie
参数配置 2-2-1 3-5。
本次测试中,咱们敞开了组成以后 ledger 中 ensemble 的两个 bookie。
单次运行的残缺输入如下。
$ python pulsar-test.py kill-bookies[2] kill-qw-bookies 1 2000000 50000 2-2-1 3-5 false
Destroying blockade cluster
Creating blockade cluster
Running test with config: E 2 Qw 2 Qa 1 with nodes bookie1 bookie2 bookie3 bookie4 bookie5 proxy pulsar1 pulsar2 pulsar3 zk1
Creating tenant and namespace with configuration 2-2-1
Test Run #1 on topic kill-qw-bookies_1 ------------
-------------------------------------------------
WRITE PHASE
-------------------------------------------------
2018-10-20 23:46:24.587 INFO ConnectionPool:63 | Created connection for pulsar://172.20.0.11:6650
2018-10-20 23:46:24.588 INFO ClientConnection:285 | [172.20.0.1:40032 -> 172.20.0.11:6650] Connected to broker
2018-10-20 23:46:25.620 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-20 23:46:25.620 INFO ClientConnection:287 | [172.20.0.1:40036 -> 172.20.0.11:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-20 23:46:26.098 INFO ProducerImpl:155 | [persistent://vanlightly/cluster-1/ns1/kill-qw-bookies_1,] Created producer on broker [172.20.0.1:40036 -> 172.20.0.11:6650]
Send count: 56001 Ack count: 50000 Pos: 50000 Neg: 0
-------------------------------------------------
Identifing first 2 bookies in ensemble
-------------------------------------------------
Send count: 101422 Ack count: 100000 Pos: 100000 Neg: 0
Send count: 150870 Ack count: 150000 Pos: 150000 Neg: 0
Send count: 204342 Ack count: 200000 Pos: 200000 Neg: 0
Send count: 252210 Ack count: 250000 Pos: 250000 Neg: 0
Send count: 304805 Ack count: 300000 Pos: 300000 Neg: 0
Send count: 357891 Ack count: 350000 Pos: 350000 Neg: 0
Send count: 400852 Ack count: 400000 Pos: 400000 Neg: 0
Send count: 450256 Ack count: 450000 Pos: 450000 Neg: 0
Send count: 502910 Ack count: 500000 Pos: 500000 Neg: 0
Send count: 551386 Ack count: 550000 Pos: 550000 Neg: 0
Send count: 604204 Ack count: 600000 Pos: 600000 Neg: 0
Send count: 653056 Ack count: 650000 Pos: 650000 Neg: 0
Send count: 709835 Ack count: 700000 Pos: 700000 Neg: 0
-------------------------------------------------
2 BOOKIES KILLED!
-------------------------------------------------
Send count: 753606 Ack count: 750000 Pos: 750000 Neg: 0
Send count: 803318 Ack count: 800000 Pos: 800000 Neg: 0
Send count: 853127 Ack count: 850000 Pos: 850000 Neg: 0
Send count: 905068 Ack count: 900000 Pos: 900000 Neg: 0
Send count: 951923 Ack count: 950000 Pos: 950000 Neg: 0
Send count: 1004825 Ack count: 1000000 Pos: 1000000 Neg: 0
Send count: 1051416 Ack count: 1050000 Pos: 1050000 Neg: 0
Send count: 1102301 Ack count: 1100000 Pos: 1100000 Neg: 0
Send count: 1151166 Ack count: 1150000 Pos: 1150000 Neg: 0
Send count: 1205605 Ack count: 1200000 Pos: 1200000 Neg: 0
Send count: 1251999 Ack count: 1250000 Pos: 1250000 Neg: 0
Send count: 1306006 Ack count: 1300000 Pos: 1300000 Neg: 0
Send count: 1357391 Ack count: 1350000 Pos: 1350000 Neg: 0
Send count: 1406744 Ack count: 1400000 Pos: 1400000 Neg: 0
Send count: 1458207 Ack count: 1450000 Pos: 1450000 Neg: 0
Send count: 1509467 Ack count: 1500000 Pos: 1500000 Neg: 0
Send count: 1558339 Ack count: 1550000 Pos: 1550000 Neg: 0
Send count: 1603861 Ack count: 1600000 Pos: 1600000 Neg: 0
Send count: 1654170 Ack count: 1650000 Pos: 1650000 Neg: 0
Send count: 1702673 Ack count: 1700000 Pos: 1700000 Neg: 0
Send count: 1751781 Ack count: 1750000 Pos: 1750000 Neg: 0
Send count: 1803040 Ack count: 1800000 Pos: 1800000 Neg: 0
Send count: 1850616 Ack count: 1850000 Pos: 1850000 Neg: 0
Send count: 1906137 Ack count: 1900000 Pos: 1900000 Neg: 0
Send count: 1955752 Ack count: 1950000 Pos: 1950000 Neg: 0
Send count: 2000000 Ack count: 2000000 Pos: 2000000 Neg: 0
2018-10-20 23:47:28.728 INFO ProducerImpl:467 | [persistent://vanlightly/cluster-1/ns1/kill-qw-bookies_1, cluster-1-0-0] Closed producer
-------------------------------------------------
READ PHASE
-------------------------------------------------
2018-10-20 23:47:38.923 INFO ConnectionPool:63 | Created connection for pulsar://172.20.0.11:6650
2018-10-20 23:47:38.924 INFO ClientConnection:285 | [172.20.0.1:40216 -> 172.20.0.11:6650] Connected to broker
2018-10-20 23:47:38.931 INFO HandlerBase:53 | [persistent://vanlightly/cluster-1/ns1/kill-qw-bookies_1, reader-1b446f, 0] Getting connection from pool
2018-10-20 23:47:39.176 INFO ConnectionPool:63 | Created connection for pulsar://pulsar1:6650
2018-10-20 23:47:39.176 INFO ClientConnection:287 | [172.20.0.1:40220 -> 172.20.0.11:6650] Connected to broker through proxy. Logical broker: pulsar://pulsar1:6650
2018-10-20 23:47:39.276 INFO ConsumerImpl:168 | [persistent://vanlightly/cluster-1/ns1/kill-qw-bookies_1, reader-1b446f, 0] Created consumer on broker [172.20.0.1:40220 -> 172.20.0.11:6650]
Last confirmed entry: [0, 2506]
能够持续发送音讯。同时敞开两个 bookie 后,broker 将会通过新的 ensemble 创立新分片。本来有 5 个 bookie,敞开 2 个后,剩下的 3 个 bookie 将会和 E、2 个 Qw 一起负责新分片上的操作。
然而,因为 reader 会尝试从没有数据的分片读取数据,reader 会阻塞在 read_next()
上(除非重启已敞开的 bookie,且数据残缺)。
咱们能够在 ZooKeeper 中查看复制有余的 ledger。上面的 bash 脚本能够检测这些 ledger 的门路。
$ bash monitor-underrep.sh
[0000]
也能够查看第一个 ledger 的详细信息。
$ bash show-ledger.sh 00/0000/L0000
...(omitted a bunch of uniteresting output)
quorumSize: 2
ensembleSize: 2
length: 0
lastEntryId: -1
state: OPEN
segment {
ensembleMember: "172.20.0.3:3181"
ensembleMember: "172.20.0.5:3181"
firstEntryId: 0
}
segment {
ensembleMember: "172.20.0.7:3181"
ensembleMember: "172.20.0.4:3181"
firstEntryId: 949
}
segment {
ensembleMember: "172.20.0.7:3181"
ensembleMember: "172.20.0.6:3181"
firstEntryId: 959
}
segment {
ensembleMember: "172.20.0.7:3181"
ensembleMember: "172.20.0.6:3181"
firstEntryId: 1340
}
digestType: CRC32C
password: ""
ackQuorumSize: 1
Reader 阻塞的第一个分片上有两个 IP 地址:172.20.0.3
和 172.20.0.5
,别离对应于敞开的两个 bookie。
因为第一个分片上的数据已失落,因而无奈复原 ledger。如果已敞开的两个 bookie 上的数据已永恒失落,咱们只能跳转至 ID 为 949 的 entry 持续读取数据。
然而如果能够找回敞开的两个 bookie 中任意一个 bookie 上的数据,就能够通过 AutoRecovery 齐全复制分片且数据可用。咱们来模仿一下,在集群目录中应用 blockade start bookie1
命令即可启动 bookie1。
咱们留神到,reader 复活,生产 65 万条音讯后产生阻塞,直到达到咱们设置的 60 秒超时,测试完结。
Start reading from (0,0,0,-1)
Received: 50000 Curr Entry: [0, 66]
Received: 100000 Curr Entry: [0, 158]
Received: 150000 Curr Entry: [0, 246]
Received: 200000 Curr Entry: [0, 312]
Received: 250000 Curr Entry: [0, 382]
Received: 300000 Curr Entry: [0, 444]
Received: 350000 Curr Entry: [0, 501]
Received: 400000 Curr Entry: [0, 566]
Received: 450000 Curr Entry: [0, 635]
Received: 500000 Curr Entry: [0, 698]
Received: 550000 Curr Entry: [0, 761]
Received: 600000 Curr Entry: [0, 826]
Received: 650000 Curr Entry: [0, 895]
Read phase complete with message (0,946,478,-1)
2018-10-21 00:04:15.760 INFO ConsumerImpl:761 | [persistent://vanlightly/cluster-1/ns1/kill-qw-bookies_1, reader-1b446f, 0] Closed consumer 0
Results --------------------------------------------
Final send count: 2000000
Final ack count: 2000000
Final positive ack count: 2000000
Final negative ack count: 0
Messages received: 691566
Acked messages missing: 1308433
Non-acked messages received: 0
Out-of-order: 0
Duplicates: 0
----------------------------------------------------
Reader 读取的最初一条 entry ID 为 946,持续数 3 条就到第二个分片了。为什么 reader 在第一个分片的结尾处产生了阻塞?
当 Qa=1 时,同时敞开 ensemble 中的两个 bookie,最初 3 个 entry 长久化到 1 个 bookie 上。而这个 bookie 依然处于敞开状态,因而最初 3 个 entry 不可读。
咱们应用 reader.py
从 topic 读取数据。在第 65 万条音讯处呈现了阻塞。当初启用 bookie3:blockade start bookie3
。
因为对读操作利用了指数退却算法,在读取失败并复原读取时,指数退却算法会像 TCP 一样迟缓启动,逐步减少从 bookie 读取的批大小。reader.py 在生产到阻塞处时,跳过这一条音讯后生产速度有所降落,而后迅速生产了 200 万条音讯。
在 ZooKeeper 中复制有余的门路下,没再呈现 ledger 0000。实践上,在 ledger 实现复制后能够再次敞开 bookie3,因为 bookie1 中有第一个分片中的所有 entry,所以应该立刻启用 AutoRecovery 并从新复制 ledger 到第二个 bookie。
$ blockade kill bookie3
$ bash monitor-underrep.sh
[]
[]
[]
[]
[]
[]
[]
[]
[]
[0000]
[0000]
[0000]
[0000]
[]
[]
[] 示意不存在 ledger 复制有余。查看此门路须要 2 秒钟(运行 Docker、zookeeper-shell 等),确认 ledger 0000 复制有余一共须要 16 秒,但复制操作只需 8 秒钟。
再次查看 ledger,第一个分片上的 ensemble 应该有所变动:
$bash show-ledger.sh 00/0000/L0000
...
quorumSize: 2
ensembleSize: 2
length: 0
lastEntryId: -1
state: OPEN
segment {
ensembleMember: "172.20.0.6:3181"
ensembleMember: "172.20.0.4:3181"
firstEntryId: 0
}
segment {
ensembleMember: "172.20.0.7:3181"
ensembleMember: "172.20.0.4:3181"
firstEntryId: 949
}
segment {
ensembleMember: "172.20.0.7:3181"
ensembleMember: "172.20.0.6:3181"
firstEntryId: 959
}
segment {
ensembleMember: "172.20.0.7:3181"
ensembleMember: "172.20.0.6:3181"
firstEntryId: 1340
}
digestType: CRC32C
password: ""
ackQuorumSize: 1
Bookie3(172.20.0.5)不再属于第一个分片,AutoRecovery 也实现了相应的复制。
通过以上测试,咱们发现:
如果不想丢音讯,最多能够敞开 Qw-1 个 bookie。如果敞开 Qw 个 bookie,要保障不呈现数据失落,则须要复原这些 bookie 中的所有数据。
如果敞开 bookie 的数量等于 Qa 的数量,AutoRecovery 不能修复 ledger 复制有余的问题。当 Qa=1,敞开 1 个 bookie 时,AutoRecovery 不能修复 ledger。
这种状况下,BookKeeper 不能复原 ledger,Ivan Kelly 提出,“敞开 bookie 的操作有问题。敞开操作须要确认分片的最初一个 entry。当 Qa=1 时,如果 write quorum 中有 1 个 bookie 呈现故障,则无奈确认分片的最初一个 entry,因为该 bookie 可能曾经 ack 了写操作(严格来说,可能会 ack 写操作)。”
因而 Qa=1 的配置十分冒险,因为一旦有 1 个 bookie 呈现故障,就无奈修复 ledger 复制有余的问题。
测试总结
- 确认失落音讯总数:0
- 乱序音讯总数:0
- 所有测试中均未呈现音讯失落和音讯乱序。
- 在敞开 Qw 个 bookie 的测试中,同时复原 2 个 bookie 不会呈现数据失落。但这一场景不在上述总结范畴内。
结语
通过测试能够发现,敞开 broker 或将 broker 与 ZooKeeper 隔离都不会造成音讯失落,但会读取未 ack 的音讯(很常见,不可避免),如果未启用音讯去重,还会呈现音讯反复。即便在故障转移场景中,也能够应用音讯去重。下次我会进一步测试音讯去重。
Qw 大于等于 2 时,敞开 1 个 bookie 或将 bookie 与 ZooKeeper 隔离也不会造成音讯失落。
敞开 ensemble 中的全副 bookie 会阻塞 reader 直到复原 bookie。如果不能胜利复原 bookie,则会失落所有数据。失落全副数据正本是咱们最不想看到的状况。
最小值:
- Write Quorum(Qw)。1 个 bookie 宕机时,要确保数据不失落,Qw 必须大于等于 2。
- Ack Quorum(Qa)。1 个 bookie 宕机时,要通过 AutoRecovery 从新复制 ledger,Qa 必须大于等于 2。
- 至多须要默认值 E 个 bookie 才能够创立 reader 和 consumer,因为须要应用这个值创立游标。
总之,Apache Pulsar 功能强大,在不同配置状况下也不容易呈现数据失落、反复、乱序的问题。在后续测试中,仍有一些须要留神的中央,如在 broker 故障转移后,如果正在运行的音讯数量靠近外部期待音讯队列大小,producer 的速度会迅速降落。但如果仅思考是否呈现音讯失落和乱序,Apache Pulsar 白璧无瑕。
能够进一步测试的方向有以下几个:
- 音讯去重和 broker 故障转移
- 应用速度较慢的 producer 运行较长时间(几个小时),运行中每 5 分钟随机敞开 1 个节点(broker 或 bookie)。运行完结后,查看是否呈现音讯失落和乱序的问题。
- 发送大量音讯。在几个小时之内,每 5-10 分钟随机敞开 1 个 bookie。运行完结后,查看是否通过 AutoRecovery 保障复制了全副音讯,没有呈现音讯失落。
- 分区 topic – 所有类型的测试
- 在网络不稳固或网速较慢的条件下进行测试
相干浏览
- 译文|Apache Pulsar 如何保障音讯不丢不重?
- Pulsar 和 Kafka 基准测试:Pulsar 性能精准解析(完整版)
点击链接,为 Apache Pulsar 点赞!