本文解说了 MySQL 8 在复制观测性上带来更丰盛的观测信息。

作者:Frederic Descamps MySQL 社区经理

本文起源:Oracle MySQL 官网博客

  • 爱可生开源社区出品。

许多经验丰富的 MySQL DBA 都应用过 SHOW REPLICA STATUS 输入中的 Seconds_Behind_Source 来判断(异步)复制的运行状态。

留神:这里应用新的术语 REPLICA。我置信所有人都应用过旧的术语。

然而,MySQL 复制机制曾经有很大提高,复制性能团队也做了很多工作,可能为 MySQL 中所有可用的复制模式提供的更丰盛观测信息。

例如,咱们减少了并行复制,组复制等,然而这些信息都无奈从原来的 SHOW REPLICA STATUS 输入中看到。

零碎库 Performance_Schema 提供了比 SHOW REPLICA STATUS 更好的监控和察看复制过程的形式。

以后,Performance_Schema 中有 15 张表用于记录复制信息量化:

+------------------------------------------------------+| Tables_in_performance_schema (replication%)          |+------------------------------------------------------+| replication_applier_configuration                    || replication_applier_filters                          || replication_applier_global_filters                   || replication_applier_status                           || replication_applier_status_by_coordinator            || replication_applier_status_by_worker                 || replication_asynchronous_connection_failover         || replication_asynchronous_connection_failover_managed || replication_connection_configuration                 || replication_connection_status                        || replication_group_communication_information          || replication_group_configuration_version              || replication_group_member_actions                     || replication_group_member_stats                       || replication_group_members                            |+------------------------------------------------------+15 rows in set (0.0038 sec)

但不容置疑,了解这些指标的含意并找出对咱们 MySQL DBA 真正有意义的信息也不总是那么容易:例如,正本与数据源是否有提早?

我筹备了一些视图,能够装置在 sys schema 中,利用这些指标为咱们 DBA 提供相干信息。

咱们来更认真地看一下这些视图。

复制提早

select * from sys.replication_lag;+---------------------------+-----------------------+------------------------+| channel_name              | max_lag_from_original | max_lag_from_immediate |+---------------------------+-----------------------+------------------------+| clusterset_replication    | 00:00:04.963223       | 00:00:04.940782        || group_replication_applier | 0                     | 0                      |+---------------------------+-----------------------+------------------------+

从下面的输入中,咱们能够看到此实例是一个异步复制的正本,但它也是组复制集群的一部分。

事实上,这是 InnoDB ClusterSet 中 DR 集群的次要成员。

咱们还能够看到这个正本的早退差不多有 5 秒。

而后,咱们会看到复制通道的名称,以及与原始提交者和间接源(在级联复制的状况下)的最大提早(因为在并行复制的状况下可能有多个工作线程)。

在组复制集群(InnoDB Cluster)的 Secondary 节点上,咱们能够看到以下输入:

select * from sys.replication_lag;+----------------------------+-----------------------+------------------------+| channel_name               | max_lag_from_original | max_lag_from_immediate |+----------------------------+-----------------------+------------------------+| group_replication_recovery | null                  | null                   || group_replication_applier  | 00:00:02.733008       | 00:00:02.733008        |+----------------------------+-----------------------+------------------------+

咱们能够看到,用于复原的通道(当节点退出组时读取失落的二进制日志事件、事务)没有被应用,而组复制的应用程序绝对源节点稍有滞后。

复制状态

这个视图更残缺,每个工作线程都有一行。

以咱们 DR 站点 InnoDB ClusterSet 中的 Primary 节点为例:

select * from replication_status;+-------------------------------+----------+----------+---------+-------------------+--------------------+| channel                       | io_state | co_state | w_state | lag_from_original | lag_from_immediate |+-------------------------------+----------+----------+---------+-------------------+--------------------+| group_replication_applier (1) | ON       | ON       | ON      | none              | none               || group_replication_applier (2) | ON       | ON       | ON      | none              | none               || group_replication_applier (3) | ON       | ON       | ON      | none              | none               || group_replication_applier (4) | ON       | ON       | ON      | none              | none               || clusterset_replication (1)    | ON       | ON       | ON      | 00:00:15.395870   | 00:00:15.380884    || clusterset_replication (2)    | ON       | ON       | ON      | 00:00:15.395686   | 00:00:15.380874    || clusterset_replication (3)    | ON       | ON       | ON      | 00:00:15.411204   | 00:00:15.388451    || clusterset_replication (4)    | ON       | ON       | ON      | 00:00:15.406154   | 00:00:15.388434    |+-------------------------------+----------+----------+---------+-------------------+--------------------+

能够看到,集群并行(异步)复制应用 4 个并行工作线程。

你可能留神到有 3 种状态(所有都为 ON)。但应用 SHOW REPLICA STATUS 咱们只能看到:

       Replica_IO_Running: Yes      Replica_SQL_Running: Yes

通过并行复制,咱们在利用 binlog 事件期间有另一个线程参加复制:coordinator 线程。

残缺复制状态信息

当然,咱们也能够获取更多对于复制的详细信息。

来看一个后果例子:

select * from sys.replication_status_full\G*************************** 1. row ***************************                 channel: group_replication_applier (1)                    host: <NULL>                    port: 0                    user:             source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e              group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8elast_heartbeat_timestamp: 0000-00-00 00:00:00.000000      heartbeat_interval: 30                io_state: ON         io_thread_state: NULL                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Replica has read all relay log; waiting for more updates                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: Waiting for an event from Coordinator                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 03:36:40.474223      applier_busy_state: IDLE       lag_from_original: none      lag_from_immediate: none          transport_time: 1.80 us       time_to_relay_log: 12.00 us              apply_time: 784.00 uslast_applied_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3 last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3queued_gtid_set_to_apply:*************************** 2. row ***************************                 channel: group_replication_applier (2)                    host: <NULL>                    port: 0                    user:             source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e              group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8elast_heartbeat_timestamp: 0000-00-00 00:00:00.000000      heartbeat_interval: 30                io_state: ON         io_thread_state: NULL                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Replica has read all relay log; waiting for more updates                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: Waiting for an event from Coordinator                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 03:36:40.474223      applier_busy_state: IDLE       lag_from_original: none      lag_from_immediate: none          transport_time: 1.80 us       time_to_relay_log: 12.00 us              apply_time:   0 pslast_applied_transaction: last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3queued_gtid_set_to_apply:*************************** 3. row ***************************                 channel: group_replication_applier (3)                    host: <NULL>                    port: 0                    user:             source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e              group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8elast_heartbeat_timestamp: 0000-00-00 00:00:00.000000      heartbeat_interval: 30                io_state: ON         io_thread_state: NULL                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Replica has read all relay log; waiting for more updates                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: Waiting for an event from Coordinator                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 03:36:40.474223      applier_busy_state: IDLE       lag_from_original: none      lag_from_immediate: none          transport_time: 1.80 us       time_to_relay_log: 12.00 us              apply_time:   0 pslast_applied_transaction: last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3queued_gtid_set_to_apply:*************************** 4. row ***************************                 channel: group_replication_applier (4)                    host: <NULL>                    port: 0                    user:             source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e              group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8elast_heartbeat_timestamp: 0000-00-00 00:00:00.000000      heartbeat_interval: 30                io_state: ON         io_thread_state: NULL                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Replica has read all relay log; waiting for more updates                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: Waiting for an event from Coordinator                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 03:36:40.474223      applier_busy_state: IDLE       lag_from_original: none      lag_from_immediate: none          transport_time: 1.80 us       time_to_relay_log: 12.00 us              apply_time:   0 pslast_applied_transaction: last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3queued_gtid_set_to_apply:*************************** 5. row ***************************                 channel: clusterset_replication (1)                    host: 127.0.0.1                    port: 3310                    user: mysql_innodb_cs_b0adbc6c             source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e              group_name:last_heartbeat_timestamp: 2023-08-22 18:48:41.037817      heartbeat_interval: 30                io_state: ON         io_thread_state: Waiting for source to send event                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Waiting for replica workers to process their queues                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: waiting for handler commit                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 00:00:00.001134      applier_busy_state: APPLYING       lag_from_original: 00:00:01.799071      lag_from_immediate: 00:00:01.783404          transport_time: 2.26 ms       time_to_relay_log: 19.00 us              apply_time: 14.63 mslast_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105180 last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547*************************** 6. row ***************************                 channel: clusterset_replication (2)                    host: 127.0.0.1                    port: 3310                    user: mysql_innodb_cs_b0adbc6c             source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e              group_name:last_heartbeat_timestamp: 2023-08-22 18:48:41.037817      heartbeat_interval: 30                io_state: ON         io_thread_state: Waiting for source to send event                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Waiting for replica workers to process their queues                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: waiting for handler commit                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 00:00:00.001134      applier_busy_state: APPLYING       lag_from_original: 00:00:01.797743      lag_from_immediate: 00:00:01.783390          transport_time: 2.26 ms       time_to_relay_log: 19.00 us              apply_time: 21.47 mslast_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105181 last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547*************************** 7. row ***************************                 channel: clusterset_replication (3)                    host: 127.0.0.1                    port: 3310                    user: mysql_innodb_cs_b0adbc6c             source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e              group_name:last_heartbeat_timestamp: 2023-08-22 18:48:41.037817      heartbeat_interval: 30                io_state: ON         io_thread_state: Waiting for source to send event                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Waiting for replica workers to process their queues                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: waiting for handler commit                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 00:00:00.001134      applier_busy_state: APPLYING       lag_from_original: 00:00:01.786087      lag_from_immediate: 00:00:01.767563          transport_time: 2.26 ms       time_to_relay_log: 19.00 us              apply_time: 21.58 mslast_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105182 last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547*************************** 8. row ***************************                 channel: clusterset_replication (4)                    host: 127.0.0.1                    port: 3310                    user: mysql_innodb_cs_b0adbc6c             source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e              group_name:last_heartbeat_timestamp: 2023-08-22 18:48:41.037817      heartbeat_interval: 30                io_state: ON         io_thread_state: Waiting for source to send event                io_errno: 0               io_errmsg:              io_errtime: 0000-00-00 00:00:00.000000                co_state: ON         co_thread_state: Waiting for replica workers to process their queues                co_errno: 0               co_errmsg:              co_errtime: 0000-00-00 00:00:00.000000                 w_state: ON          w_thread_state: waiting for handler commit                 w_errno: 0                w_errmsg:               w_errtime: 0000-00-00 00:00:00.000000 time_since_last_message: 00:00:00.001134      applier_busy_state: APPLYING       lag_from_original: 00:00:01.785881      lag_from_immediate: 00:00:01.767550          transport_time: 2.26 ms       time_to_relay_log: 19.00 us              apply_time: 29.59 mslast_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105183 last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547

通过这个视图,咱们能够取得更多详细信息,例如复制心跳。咱们还对 GTID(排队、利用等)进行了概述。

咱们也能够看到传输工夫(网络)、写入中继日志的工夫以及最终利用的工夫等信息。

当然,您能够依据须要来抉择视图的显示内容,例如:

InnoDB 集群,ClusterSet,只读正本

如果应用由 Admin API 和 MySQL Shell 治理的集成解决方案,那么所有这些信息曾经通过 status() 办法能够获知。

status() 办法能够扩大 3 个值:

  1. 返回集群元数据版本、组协定版本、组名称、集群成员 UUID、成员角色和状态(由组复制报告),及被隔离的零碎变量列表。
  2. 返回每个连贯和利用程序处理的事务信息。
  3. 返回每个集群成员复制机制更具体的统计信息。

上面来看一下扩大选项 3 的 ClusterSet 示例:

JS> cs.status({extended:3}){    "clusters": {        "cluster2": {            "clusterRole": "REPLICA",            "clusterSetReplication": {                "applierQueuedTransactionSet": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e:137385-138500",                "applierQueuedTransactionSetSize": 1116,                "applierState": "ON",                "applierStatus": "APPLYING",                "applierThreadState": "waiting for handler commit",                "applierWorkerThreads": 4,                "coordinatorState": "ON",                "coordinatorThreadState": "Waiting for replica workers to process their queues",                "options": {                    "connectRetry": 3,                    "delay": 0,                    "heartbeatPeriod": 30,                    "retryCount": 10                },                "receiver": "127.0.0.1:4420",                "receiverStatus": "ON",                "receiverThreadState": "Waiting for source to send event",                "receiverTimeSinceLastMessage": "00:00:00.002737",                "replicationSsl": null,                "source": "127.0.0.1:3310"            },            "clusterSetReplicationStatus": "OK",            "communicationStack": "MYSQL",            "globalStatus": "OK",            "groupName": "7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e",            "groupViewChangeUuid": "7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e",            "paxosSingleLeader": "OFF",            "receivedTransactionSet": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e:129-138500",            "ssl": "REQUIRED",            "status": "OK_NO_TOLERANCE",            "statusText": "Cluster is NOT tolerant to any failures.",            "topology": {                "127.0.0.1:4420": {                    "address": "127.0.0.1:4420",                    "applierWorkerThreads": 4,                    "fenceSysVars": [                        "read_only",                        "super_read_only"                    ],                    "memberId": "c3d726ac-40ec-11ee-ab38-c8cb9e32df8e",                    "memberRole": "PRIMARY",                    "memberState": "ONLINE",                    "mode": "R/O",                    "readReplicas": {},                    "replicationLagFromImmediateSource": "00:00:05.420247",                    "replicationLagFromOriginalSource": "00:00:05.433548",                    "role": "HA",                    "status": "ONLINE",                    "version": "8.1.0"                },                "127.0.0.1:4430": {                    "address": "127.0.0.1:4430",                    "applierWorkerThreads": 4,                    "fenceSysVars": [                        "read_only",                        "super_read_only"                    ],                    "memberId": "709b15ea-40ed-11ee-a9b3-c8cb9e32df8e",                    "memberRole": "SECONDARY",                    "memberState": "ONLINE",                    "mode": "R/O",                    "readReplicas": {},                    "replicationLagFromImmediateSource": "00:00:00.038075",                    "replicationLagFromOriginalSource": "00:00:05.432536",                    "role": "HA",                    "status": "ONLINE",                    "version": "8.1.0"                }            },            "transactionSet": "2cb77a02-40eb-11ee-83f4-c8cb9e32df8e:1-4,54d83026-40eb-11ee-a5d3-c8cb9e32df8e:1-137384,54d8329c-40eb-11ee-a5d3-c8cb9e32df8e:1-5,7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:1-3",            "transactionSetConsistencyStatus": "OK",            "transactionSetErrantGtidSet": "",            "transactionSetMissingGtidSet": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e:137385-138552"        },        "myCluster": {            "clusterRole": "PRIMARY",            "communicationStack": "MYSQL",            "globalStatus": "OK",            "groupName": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e",            "groupViewChangeUuid": "54d8329c-40eb-11ee-a5d3-c8cb9e32df8e",            "paxosSingleLeader": "OFF",            "primary": "127.0.0.1:3310",            "ssl": "REQUIRED",            "status": "OK",            "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",            "topology": {                "127.0.0.1:3310": {                    "address": "127.0.0.1:3310",                    "applierWorkerThreads": 4,                    "fenceSysVars": [],                    "memberId": "2cb77a02-40eb-11ee-83f4-c8cb9e32df8e",                    "memberRole": "PRIMARY",                    "memberState": "ONLINE",                    "mode": "R/W",                    "readReplicas": {                        "127.0.0.1:4410": {                            "address": "127.0.0.1:4410",                            "applierStatus": "APPLYING",                            "applierThreadState": "waiting for handler commit",                            "applierWorkerThreads": 4,                            "receiverStatus": "ON",                            "receiverThreadState": "Waiting for source to send event",                            "replicationSources": [                                "PRIMARY"                            ],                            "replicationSsl": "TLS_AES_256_GCM_SHA384 TLSv1.3",                            "role": "READ_REPLICA",                            "status": "ONLINE",                            "version": "8.1.0"                        }                    },                    "role": "HA",                    "status": "ONLINE",                    "version": "8.1.0"                },                "127.0.0.1:3320": {                    "address": "127.0.0.1:3320",                    "applierWorkerThreads": 4,                    "fenceSysVars": [                        "read_only",                        "super_read_only"                    ],                    "memberId": "327cb102-40eb-11ee-9904-c8cb9e32df8e",                    "memberRole": "SECONDARY",                    "memberState": "ONLINE",                    "mode": "R/O",                    "readReplicas": {},                    "replicationLagFromImmediateSource": "00:00:04.536190",                    "replicationLagFromOriginalSource": "00:00:04.536190",                    "role": "HA",                    "status": "ONLINE",                    "version": "8.1.0"                },                "127.0.0.1:3330": {                    "address": "127.0.0.1:3330",                    "applierWorkerThreads": 4,                    "fenceSysVars": [                        "read_only",                        "super_read_only"                    ],                    "memberId": "3d141d7e-40eb-11ee-933b-c8cb9e32df8e",                    "memberRole": "SECONDARY",                    "memberState": "ONLINE",                    "mode": "R/O",                    "readReplicas": {},                    "replicationLagFromImmediateSource": "00:00:04.652745",                    "replicationLagFromOriginalSource": "00:00:04.652745",                    "role": "HA",                    "status": "ONLINE",                    "version": "8.1.0"                }            },            "transactionSet": "2cb77a02-40eb-11ee-83f4-c8cb9e32df8e:1-4,54d83026-40eb-11ee-a5d3-c8cb9e32df8e:1-138552,54d8329c-40eb-11ee-a5d3-c8cb9e32df8e:1-5"        }    },    "domainName": "myClusterSet",    "globalPrimaryInstance": "127.0.0.1:3310",    "metadataServer": "127.0.0.1:3310",    "primaryCluster": "myCluster",    "status": "HEALTHY",    "statusText": "All Clusters available."}

论断

复制局部的可察看性十分具体,并通过 MySQL 8 提供了大量信息。兴许当初是扭转查看或监督复制形式的好时机。