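Author: Yang Wen
DBA, responsible for the requirements and maintenance of customer projects; works with a range of databases, including but not limited to MySQL, Redis, Cassandra, GreenPlum, ClickHouse, Elastic, and TDSQL.
Source: original contribution
* Produced by the ActionSky open source community (爱可生开源社区). Original content may not be used without authorization; to reprint it, please contact the editor and credit the source.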
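1. Background:
We know that Cassandra offers partition tolerance and tunable consistency, which can be made strong with suitable consistency levels. But when the host that holds the data fails, what happens to the replica stored on that host? Does it become unavailable along with the host? Read on to find out.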
2. Test environment:
A cluster spanning two data centers:
Data center | Node IPs | Seed nodes |
---|---|---|
DC1 | 10.186.60.61, 10.186.60.7, 10.186.60.118, 10.186.60.67 | 10.186.60.61, 10.186.60.7 |
DC2 | 10.186.60.53, 10.186.60.65, 10.186.60.94, 10.186.60.68 | 10.186.60.53, 10.186.60.65 |
First, let's look at how Owns changes as nodes join the cluster:
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.7 88.29 KiB 16 46.0% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 69.07 KiB 16 37.7% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.25 KiB 16 34.2% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.65 69.04 KiB 16 41.4% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 83.18 KiB 16 41.7% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 74.01 KiB 16 24.7% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 88.29 KiB 16 27.5% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 83.16 KiB 16 28.9% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.25 KiB 16 30.3% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.65 83.17 KiB 16 27.7% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 83.18 KiB 16 29.8% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
UN 10.186.60.94 69.05 KiB 16 31.1% c8fa86e4-ee9a-4c62-b00b-d15edc967b9f rack2
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 74.01 KiB 16 21.4% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 88.29 KiB 16 25.2% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 83.16 KiB 16 27.1% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 83.19 KiB 16 28.9% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 88.55 KiB 16 21.6% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 83.17 KiB 16 24.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 83.18 KiB 16 25.4% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
UN 10.186.60.94 69.05 KiB 16 26.4% c8fa86e4-ee9a-4c62-b00b-d15edc967b9f rack2
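As you can see, in the newly built cluster the Owns values of all nodes add up to roughly 200% at every stage, but the total for a single data center is not exactly 100%.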
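The per-data-center totals can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the `nodetool status` text has been captured as a string; the helper name `owns_by_datacenter` is hypothetical, and the sample is the eight-node output shown above.

```python
import re
from collections import defaultdict

# The eight-node `nodetool status` output from above, pasted as text.
SAMPLE = """\
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 74.01 KiB 16 21.4% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 88.29 KiB 16 25.2% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 83.16 KiB 16 27.1% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 83.19 KiB 16 28.9% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 88.55 KiB 16 21.6% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 83.17 KiB 16 24.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 83.18 KiB 16 25.4% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
UN 10.186.60.94 69.05 KiB 16 26.4% c8fa86e4-ee9a-4c62-b00b-d15edc967b9f rack2
"""


def owns_by_datacenter(status_text):
    """Sum the Owns column per data center (hypothetical helper)."""
    totals = defaultdict(float)
    current_dc = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Datacenter:"):
            current_dc = line.split(":", 1)[1].strip()
            continue
        # Node rows start with a status/state code such as UN or DN.
        if current_dc and re.match(r"^[UD][NLJM]\s", line):
            match = re.search(r"(\d+(?:\.\d+)?)%", line)
            if match:
                totals[current_dc] += float(match.group(1))
    return dict(totals)


totals = owns_by_datacenter(SAMPLE)
for dc, total in totals.items():
    print(f"{dc}: {total:.1f}%")
print(f"cluster: {sum(totals.values()):.1f}%")
```

For this sample the sketch reports 102.6% for dc1 and 97.4% for dc2, 200.0% in total, which matches the observation that the cluster-wide sum stays near 200% while neither data center sits at exactly 100%.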
3. Experiments in detail:
3.1 Experiment 1:
[cassandra@data01 ~]$ cqlsh 10.186.60.61 -u cassandra -p cassandra
CREATE KEYSPACE "dcdatabase" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 4, 'dc2' : 4};
use dcdatabase;
create table test (id int, user_name varchar, primary key (id) );
insert into test (id,user_name) VALUES (1,'test1');
insert into test (id,user_name) VALUES (2,'test2');
insert into test (id,user_name) VALUES (3,'test3');
insert into test (id,user_name) VALUES (4,'test4');
insert into test (id,user_name) VALUES (5,'test5');
Check the cluster status:
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 96.55 KiB 16 100.0% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 88.29 KiB 16 100.0% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 88.33 KiB 16 100.0% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.37 KiB 16 100.0% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 74.23 KiB 16 100.0% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 83.17 KiB 16 100.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 88.36 KiB 16 100.0% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
UN 10.186.60.94 74.23 KiB 16 100.0% c8fa86e4-ee9a-4c62-b00b-d15edc967b9f rack2
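As you can see, the Owns values in each data center add up to 400%, which matches the four-replica setting.
Check how the data is distributed across the nodes: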
[cassandra@data01 ~]$ nodetool getendpoints dcdatabase test 1
10.186.60.7
10.186.60.94
10.186.60.65
10.186.60.118
10.186.60.67
10.186.60.61
10.186.60.53
10.186.60.68
[cassandra@data03 ~]$ nodetool getendpoints dcdatabase test 5
10.186.60.67
10.186.60.94
10.186.60.7
10.186.60.53
10.186.60.65
10.186.60.118
10.186.60.61
10.186.60.68
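As you can see, the rows are stored on every node in every data center, which is what replica placement predicts: with four replicas per data center and four nodes per data center, every node holds a copy.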
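To make the placement rule concrete, here is a simplified sketch of how a per-data-center replication strategy picks endpoints by walking the token ring. It is an illustration under stated assumptions, not Cassandra's implementation: racks are ignored, each node gets a single made-up token (the real cluster uses 16 vnode tokens per node), and `RING` and `replicas_for` are hypothetical names.

```python
from bisect import bisect_left

# Hypothetical ring: one invented token per node. Real tokens come from the
# partitioner's hash of the partition key, and each node owns 16 of them here.
RING = {
    0: ("10.186.60.61", "dc1"),
    100: ("10.186.60.53", "dc2"),
    200: ("10.186.60.7", "dc1"),
    300: ("10.186.60.65", "dc2"),
    400: ("10.186.60.118", "dc1"),
    500: ("10.186.60.94", "dc2"),
    600: ("10.186.60.67", "dc1"),
    700: ("10.186.60.68", "dc2"),
}


def replicas_for(token, ring, rf_per_dc):
    """Walk the ring clockwise from `token` and collect distinct nodes per
    data center until each one has its configured number of replicas."""
    tokens = sorted(ring)
    # The first token >= the key's token owns the key; wrap around at the end.
    start = bisect_left(tokens, token) % len(tokens)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(tokens)):
        node, dc = ring[tokens[(start + step) % len(tokens)]]
        replicas = chosen.get(dc)
        if replicas is None or node in replicas:
            continue
        if len(replicas) < rf_per_dc[dc]:
            replicas.append(node)
    return chosen


# Four replicas in each four-node data center: every node is an endpoint.
placement = replicas_for(250, RING, {"dc1": 4, "dc2": 4})
for dc, nodes in placement.items():
    print(dc, nodes)
```

With a replication factor of 4 in each four-node data center, the walk cannot stop until it has collected every node, which is why `nodetool getendpoints` lists all eight addresses for any key.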
Test and check the data distribution after a node in the cluster fails:
Stop the service on node 94 (10.186.60.94): systemctl stop cassandra
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 96.55 KiB 16 100.0% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 88.29 KiB 16 100.0% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 88.33 KiB 16 100.0% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.37 KiB 16 100.0% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 74.23 KiB 16 100.0% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 83.17 KiB 16 100.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 88.36 KiB 16 100.0% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
DN 10.186.60.94 74.23 KiB 16 100.0% c8fa86e4-ee9a-4c62-b00b-d15edc967b9f rack2
As you can see, node 94 is now down (DN), but the Owns distribution in dc2 has not changed.
Check which nodes the data is mapped to:
[cassandra@data01 ~]$ nodetool getendpoints dcdatabase test 5
10.186.60.67
10.186.60.94
10.186.60.7
10.186.60.53
10.186.60.65
10.186.60.118
10.186.60.61
10.186.60.68
As you can see, the row is still mapped to node 94, even though that node is down.
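Whether the row stays readable while node 94 is down depends on the consistency level of the request. The sketch below works through that arithmetic for one data center, using the standard quorum formula floor(RF / 2) + 1; the function names are hypothetical and only two consistency levels are modelled.

```python
def required_replicas(level, rf_local):
    """Replicas that must respond in the local data center for a request."""
    if level == "LOCAL_ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return rf_local // 2 + 1
    raise ValueError(f"level not modelled in this sketch: {level}")


def is_available(level, rf_local, live_local):
    """True if enough local replicas are up to satisfy the level."""
    return live_local >= required_replicas(level, rf_local)


# dc2 in experiment 1: replication factor 4, node 94 down, 3 replicas live.
for level in ("LOCAL_ONE", "LOCAL_QUORUM"):
    print(level, is_available(level, rf_local=4, live_local=3))
```

With four replicas in dc2, LOCAL_QUORUM needs three responses, so a single down node does not make the data unavailable; a second failure in the same data center would.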
Remove the failed node 94 from the cluster:
[cassandra@data02 ~]$ nodetool removenode c8fa86e4-ee9a-4c62-b00b-d15edc967b9f
Check the cluster status:
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 96.55 KiB 16 100.0% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 88.29 KiB 16 100.0% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 88.33 KiB 16 100.0% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.37 KiB 16 100.0% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 74.23 KiB 16 100.0% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 83.17 KiB 16 100.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 88.36 KiB 16 100.0% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
[cassandra@data02 ~]$ nodetool getendpoints dcdatabase test 5
10.186.60.67
10.186.60.7
10.186.60.53
10.186.60.65
10.186.60.118
10.186.60.61
10.186.60.68
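As you can see, the row is no longer mapped to node 94.
Note: when the Cassandra service on a node is stopped, or the node is removed from the cluster, the cluster remains usable. You can no longer log in to the Cassandra instance on that node, but you can still log in through the other Cassandra nodes.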
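One practical consequence is that a client or script should not depend on a single contact point. The following is a minimal sketch, assuming the nodes listen on Cassandra's default native transport port 9042; `first_reachable` is a hypothetical helper that only tests TCP reachability and does not speak the CQL protocol.

```python
import socket


def first_reachable(hosts, port=9042, timeout=1.0):
    """Return the first host that accepts a TCP connection on `port`,
    or None if none of them does (hypothetical helper)."""
    for host in hosts:
        try:
            # A successful connect is enough; the socket is closed on exit.
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Refused, unreachable, or timed out: try the next candidate.
            continue
    return None
```

Called with the dc2 addresses in order, for example `first_reachable(["10.186.60.94", "10.186.60.68", "10.186.60.65"])`, it would skip the stopped node and return the next address that answers, which can then be passed to `cqlsh`.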
3.2 Experiment 2:
CREATE KEYSPACE "dcdatabase" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};
Check the cluster status:
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 96.55 KiB 16 73.2% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 89.39 KiB 16 74.7% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 88.33 KiB 16 77.4% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.42 KiB 16 74.7% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 74.22 KiB 16 100.0% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 84.14 KiB 16 100.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 88.30 KiB 16 100.0% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
As you can see, the Owns values in each data center add up to 300%, which matches the three-replica setting.
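The individual percentages follow from simple arithmetic: each data center stores RF full copies of the data, spread over its nodes, so the average effective ownership per node is RF divided by the node count. A small sketch, with the hypothetical helper `expected_average_owns`:

```python
def expected_average_owns(rf, nodes):
    """Average effective ownership per node, in percent, for one data center.
    A data center cannot hold more replicas of a row than it has nodes."""
    return min(rf, nodes) / nodes * 100.0


# dc1: replication factor 3 over four nodes.
print(expected_average_owns(3, 4))
# dc2: replication factor 3 over three nodes.
print(expected_average_owns(3, 3))

# The observed dc1 values scatter around the expected average.
observed_dc1 = [73.2, 74.7, 77.4, 74.7]
print(sum(observed_dc1) / len(observed_dc1))
```

This gives 75% for dc1, which the observed 73.2% to 77.4% scatter around because token ranges are not perfectly even, and 100% for dc2, where every node must hold every row.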
Test and check the data distribution after a node in the cluster fails:
Stop the service on node 94 and remove it from the cluster:
[cassandra@data02 ~]$ nodetool removenode c8fa86e4-ee9a-4c62-b00b-d15edc967b9f
Check the cluster status:
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 96.55 KiB 16 73.2% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 89.39 KiB 16 74.7% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 88.33 KiB 16 77.4% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.42 KiB 16 74.7% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 74.22 KiB 16 100.0% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 84.14 KiB 16 100.0% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 88.30 KiB 16 100.0% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
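At this point the data is no longer on node 94; the data that lived on the failed node has been moved to other nodes. In dc1, each row is still stored on only three of the four nodes, while in dc2 the data is now spread across the only three remaining nodes, so a data transfer has taken place.
If another node in dc2 fails now, the data on that node can no longer be moved to other nodes. dc1 is unaffected and its Owns values still add up to 300%, but every node in dc2 already owns 100%, so there is no room left for failover and each remaining node can only keep the data it already has.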
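The limit described here can be stated as a simple check: a data center can only restore full replication after losing nodes if the surviving node count is still at least the replication factor. A minimal sketch, with hypothetical function names:

```python
def replicas_after_loss(rf, nodes, lost=1):
    """Replicas of each row the data center can still hold after `lost`
    nodes are removed: one per surviving node, capped at rf."""
    remaining = max(nodes - lost, 0)
    return min(rf, remaining)


def can_restore_full_replication(rf, nodes, lost=1):
    """True if the survivors can still hold all rf replicas."""
    return replicas_after_loss(rf, nodes, lost) == rf


# dc1: four nodes, replication factor 3: one node can be lost and absorbed.
print(can_restore_full_replication(rf=3, nodes=4))
# dc2 after node 94 was removed: three nodes, replication factor 3.
print(can_restore_full_replication(rf=3, nodes=3))
print(replicas_after_loss(rf=3, nodes=3))
```

For dc1 the check passes, because three nodes remain for three replicas. For dc2 it fails: a further loss leaves two nodes, so each row can have at most two replicas there until a node is added back.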
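If all hosts are now rebooted, the Cassandra service starts on every host, including the node that was used to simulate the failure. This leads to a different outcome: the previously failed node is added back to the cluster, and the data is automatically redistributed once more.
Check the cluster status: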
[cassandra@data01 ~]$ nodetool status
Datacenter: dc1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.67 96.55 KiB 16 73.2% 9d6d759b-c00c-488b-938d-3e1ef9b92b02 rack1
UN 10.186.60.7 89.39 KiB 16 74.7% 4702178e-9878-48dc-97e7-9211b7c9f2e7 rack1
UN 10.186.60.118 88.33 KiB 16 77.4% c920c611-2e8b-472d-93a4-34f1abd5b207 rack1
UN 10.186.60.61 88.42 KiB 16 74.7% af2e0c42-3a94-4647-9716-c484b690899i rack1
Datacenter: dc2
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.186.60.68 74.22 KiB 16 73.2% a7307228-62bb-4354-9853-990cac9614ab rack2
UN 10.186.60.65 84.14 KiB 16 74.7% 89683bf8-aff8-4fdc-9525-c14764cf2d4f rack2
UN 10.186.60.53 88.30 KiB 16 74.7% 7c91c707-abac-44f2-811O-b18f03f03d13 rack2
UN 10.186.60.94 90.12 KiB 16 77.4% c8fa86e4-ee9a-4c62-b00b-d15edc967b9f rack2