How to measure master-slave replication lag correctly (Seconds_Behind_Master and pt-heartbeat)

Background

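On the master, worker threads write to the binlog concurrently, whereas the master's dump thread and the slave's IO thread each push and pull the binlog on a single thread. More importantly, by default the slave's SQL thread replays the events in the relay log one at a time on a single thread. (MySQL 5.6 supports parallel replication in certain cases once slave_parallel_workers is enabled, and 5.7 and later support parallel replication fully, which greatly reduces lag at the replication layer.) So even with network latency set aside, on mainstream MySQL versions under high concurrency, consumption can easily fall behind production, and a slave using asynchronous replication may well fail to keep up with the master.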

A recap of how replication works

(Figure taken from a slide deck by Zhang Shenbo of ActionSky.)

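1) Master: whenever the binlog changes, the Binlog Dump thread actively sends the newest binlog events to the slave.

2) Slave: the IO thread passively receives the binlog events sent by the master and writes them to the slave's relay log; when no data arrives, it waits. Meanwhile, the SQL thread replays the relay log.

3) If the slave receives no data from the master for longer than slave_net_timeout (default 3600 seconds; 60 seconds from MySQL 5.7.7 onward), the IO thread treats the connection as broken and Slave_IO_Running stops reporting Yes (it shows Connecting while the thread retries, and No once it gives up). From then on the slave tries to reconnect every master-connect-retry seconds (default 60) until the connection succeeds or the number of attempts exceeds master-retry-count (default 86400).

Note: slave_net_timeout can be changed in the configuration file or set online with SET GLOBAL, whereas --master-connect-retry and --master-retry-count have to be specified in advance, when the replication relationship is established with CHANGE MASTER TO (the MASTER_CONNECT_RETRY and MASTER_RETRY_COUNT options).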
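
To get a feel for what the default retry settings imply, the short Python sketch below multiplies the default retry interval (60 seconds) by the default retry count (86400). The function name is made up for illustration, and the result is only a rough upper bound: it ignores the time each connection attempt itself takes.

```python
# Rough upper bound on how long a slave keeps retrying with the defaults
# quoted above. The names here are illustrative, not MySQL identifiers.
MASTER_CONNECT_RETRY_S = 60   # default seconds between reconnect attempts
MASTER_RETRY_COUNT = 86400    # default maximum number of attempts


def max_retry_window_seconds(retry_interval_s: int, retry_count: int) -> int:
    """Total time spent waiting between attempts before the slave gives up."""
    return retry_interval_s * retry_count


window = max_retry_window_seconds(MASTER_CONNECT_RETRY_S, MASTER_RETRY_COUNT)
print(window, "seconds =", window // 86400, "days")
```

With the defaults, the slave would keep retrying for about 60 days before giving up.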

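This shows once more that, during replication, high load on either the master or the slave (especially heavy flush-to-disk pressure on the slave, which depends on the sync_binlog and innodb_flush_log_at_trx_commit settings) or slow network transfer (especially replication across data centers) will produce replication lag, and that this is unavoidable. If strong consistency is required, Semi-sync can be used, but even this plugin cannot guarantee strong consistency at all times: once rpl_semi_sync_master_timeout expires, replication degrades to asynchronous mode.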

Clearly, replication lag then becomes an important metric to monitor.

Measuring replication lag with Seconds_Behind_Master

First, the explanation in the official manual:

When the slave is actively processing updates, this field shows the difference between the current timestamp on the slave and the original timestamp logged on the master for the event currently being processed on the slave.

When no event is currently being processed on the slave, this value is 0.

In MySQL 5.6.9 and later, this field is NULL (undefined or unknown) if the slave SQL thread is not running, or if the SQL thread has consumed all of the relay log and the slave I/O thread is not running

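In short, Seconds_Behind_Master (SBM below) is computed as follows:

1. While the SQL thread is executing an event, SBM is the slave's timestamp at execution time minus the timestamp carried by the event (the master's timestamp when the event was logged).

2. When the SQL thread is not executing any event, SBM is 0.

3. If the SQL thread is not running, or if it has consumed the whole relay log and the IO thread is not running, SBM is NULL.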
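
The three rules above can be sketched as a small Python function. This is a simplified model written for this article, not the server's actual code: the function and parameter names are hypothetical, None stands in for NULL, and the handling of clock_diff_with_master (assumed here to be slave time minus master time at connection) and the clamping of negative values to 0 are assumptions about the server's behaviour.

```python
from typing import Optional


def seconds_behind_master(
    sql_thread_running: bool,
    io_thread_running: bool,
    event_master_ts: Optional[int],
    slave_now: int,
    clock_diff_with_master: int = 0,
) -> Optional[int]:
    """Simplified model of SBM; None plays the role of NULL.

    event_master_ts is the master-side timestamp of the event being
    replayed, or None when the SQL thread has nothing left to replay.
    """
    if not sql_thread_running:
        return None  # SQL thread stopped: NULL
    if event_master_ts is None:
        # Relay log fully consumed: 0 if the IO thread is alive, else NULL.
        return 0 if io_thread_running else None
    # Assumption: the clock offset measured at connect time is subtracted,
    # and a negative result is reported as 0.
    lag = slave_now - event_master_ts - clock_diff_with_master
    return max(lag, 0)


print(seconds_behind_master(True, True, 1000, 4584))  # event logged 3584 s ago
```

Because clock_diff_with_master is a fixed input in this model, any later clock change on either host shifts the result directly.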

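In the following four situations Seconds_Behind_Master still reports a non-NULL value, but that value is no longer accurate:

1. The clocks on the master and the slave differ. (clock_diff_with_master was introduced to compensate for the time difference as far as possible, but it is obtained only when the slave connects to the master and is never updated afterwards; if the clock on either machine is changed later, the value becomes unreliable.)

2. There is a network problem between master and slave, or the master's binlog dump thread has died and the slave's IO thread has not noticed and is still waiting for data. SBM then stays at 0 for a long time.

3. Around the execution of a large binlog event from the master, the SBM seen on the slave jumps sharply (monitoring graphs are likely to show spikes).

4. With parallel replication, SBM is based on Exec_Master_Log_Pos and is therefore imprecise.

Note: for how the parameter in item 1 appears in the source code, see reference 1; for experiments on items 2 and 3, see reference 2.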

In fact, the manual documents all of these caveats:

1、This time difference computation works even if the master and slave do not have identical clock times, provided that the difference, computed when the slave I/O thread starts, remains constant from then on. Any changes—including NTP updates—can lead to clock skews that can make calculation of Seconds_Behind_Master less reliable.

2、A value of 0 for Seconds_Behind_Master can usually be interpreted as meaning that the slave has caught up with the master, but there are some cases where this is not strictly true. For example, this can occur if the network connection between master and slave is broken but the slave I/O thread has not yet noticed this—that is, slave_net_timeout has not yet elapsed.

3、It is also possible that transient values for Seconds_Behind_Master may not reflect the situation accurately. When the slave SQL thread has caught up on I/O, Seconds_Behind_Master displays 0; but when the slave I/O thread is still queuing up a new event, Seconds_Behind_Master may show a large value until the SQL thread finishes executing the new event. This is especially likely when the events have old timestamps; in such cases, if you execute SHOW SLAVE STATUS several times in a relatively short period, you may see this value change back and forth repeatedly between 0 and a relatively large value.

4、When using a multithreaded slave, you should keep in mind that this value is based on Exec_Master_Log_Pos, and so may not reflect the position of the most recently committed transaction.

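Measuring replication lag in binlog events (by byte offset)

An example illustrates this:

Step 1: on the master, run show binary logs; to see the master's latest binlog position.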

+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000053 | 1002420123 |
| mysql-bin.000054 | 1212937421 |
+------------------+------------+

Step 2: at the same moment, run show slave status \G on the slave.

Master_Log_File: mysql-bin.000053

Read_Master_Log_Pos: 671469325

Relay_Master_Log_File: mysql-bin.000053

Exec_Master_Log_Pos: 519854012

Seconds_Behind_Master: 3584

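Step 3: work out the lag at that moment.

1) The master's current binlog file number (000054) differs from the one the slave is reading (000053), so the lag is already fairly large.

2) On the slave, there is also a large gap between the latest master binlog offset fetched by the IO thread (Read_Master_Log_Pos) and the position the SQL thread is replaying (Exec_Master_Log_Pos).

3) The total lag is therefore the unapplied remainder of the binlog file currently being replayed plus the newer binlog file that has not been received yet, that is: (1002420123 - 519854012) + 1212937421 = 1695503532 bytes.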
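
The same arithmetic can be written as a short Python sketch that generalises to any number of binlog files. The function name is hypothetical; it takes the rows of show binary logs from the master and the Relay_Master_Log_File / Exec_Master_Log_Pos pair from the slave, and it assumes both were sampled at roughly the same moment.

```python
from typing import List, Tuple


def binlog_bytes_behind(
    master_logs: List[Tuple[str, int]], exec_file: str, exec_pos: int
) -> int:
    """Bytes of master binlog that the slave's SQL thread has not applied yet.

    master_logs: (Log_name, File_size) rows from `show binary logs`, oldest first.
    exec_file / exec_pos: Relay_Master_Log_File and Exec_Master_Log_Pos.
    """
    names = [name for name, _ in master_logs]
    idx = names.index(exec_file)  # raises ValueError if the file was purged
    remaining_in_current = master_logs[idx][1] - exec_pos
    newer_files = sum(size for _, size in master_logs[idx + 1:])
    return remaining_in_current + newer_files


master_logs = [
    ("mysql-bin.000053", 1002420123),
    ("mysql-bin.000054", 1212937421),
]
print(binlog_bytes_behind(master_logs, "mysql-bin.000053", 519854012))
# prints 1695503532
```

Passing Master_Log_File and Read_Master_Log_Pos instead gives the part of the backlog that the IO thread has not fetched yet.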

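Measuring replication lag in units of time

The third-party tools mk-heartbeat and pt-heartbeat can help here. The following uses pt-heartbeat as an example.

Rough workflow:

On the master, pt-heartbeat runs as a background process that periodically writes the system timestamp into the heartbeat table.

Connect to the slave with --monitor or --check; pt-heartbeat compares the slave's current timestamp with the timestamp in the replicated heartbeat table and reports the difference.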
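
The principle behind this workflow can be sketched in a few lines of Python. This illustrates the idea only and is not pt-heartbeat's implementation: the function name is hypothetical, and the two timestamps are hard-coded sample values standing in for the ts column read on the slave and the slave's current time.

```python
from datetime import datetime, timezone


def heartbeat_lag_seconds(heartbeat_ts: datetime, slave_now: datetime) -> float:
    """Lag = slave's current time minus the replicated heartbeat timestamp.

    A negative result would point to clock skew between the two hosts.
    """
    return (slave_now - heartbeat_ts).total_seconds()


# Sample values: the last heartbeat row replicated to the slave was written
# on the master at 09:44:24, and the slave's clock now reads 09:45:24.5.
ts = datetime(2017, 8, 4, 9, 44, 24, tzinfo=timezone.utc)
now = datetime(2017, 8, 4, 9, 45, 24, 500000, tzinfo=timezone.utc)
print(heartbeat_lag_seconds(ts, now))  # prints 60.5
```

Since the heartbeat row travels through the same replication stream as ordinary writes, the difference keeps growing while replication is stalled, which is what Seconds_Behind_Master fails to show when it sits at 0.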

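Example:

Step 1: on the master, create a database to hold the heartbeat table; it is passed as the -D argument in the next step.

Step 2: on the master, start the daemon that updates the heartbeat table in the monitored database: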

pt-heartbeat -D heartbeat --update -S[master_server_socket] -u[username] -p[user_passwd] --create-table --daemonize

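In the monitored database you will find a new heartbeat table whose ts column is updated with the latest timestamp at a regular interval (every 1 second by default).

Step 3: run the following against the slave: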

pt-heartbeat -D heartbeat --monitor -h[slave ip] -u[username] -p[user_passwd]

pt-heartbeat -D heartbeat --check -h[slave ip] -u[username] -p[user_passwd]

--monitor: monitor continuously

--check: probe once and exit

--master-server-id=XXX: specify the master's server id manually if it cannot be determined automatically
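
For scripted use, the --check invocation can be wrapped in a few lines of Python. This is a minimal sketch: the function names are hypothetical, only the options shown above are used, and it assumes that --check prints a single number of seconds on standard output.

```python
import subprocess
from typing import List, Optional


def build_check_command(
    database: str,
    host: str,
    user: str,
    password: str,
    master_server_id: Optional[int] = None,
) -> List[str]:
    """Build the argv for a one-shot pt-heartbeat probe against a slave."""
    cmd = ["pt-heartbeat", "-D", database, "--check",
           "-h", host, "-u", user, "-p", password]
    if master_server_id is not None:
        cmd.append(f"--master-server-id={master_server_id}")
    return cmd


def parse_check_output(output: str) -> float:
    """Assumption: --check prints one number (seconds of lag), e.g. '0.00'."""
    return float(output.strip())


def check_lag(cmd: List[str]) -> float:
    """Run the probe and return the lag; raises if pt-heartbeat exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_check_output(result.stdout)


# Usage (requires pt-heartbeat on PATH and a reachable slave):
# lag = check_lag(build_check_command("heartbeat", "192.168.237.13", "root", "secret"))
```

Keeping command construction and output parsing separate from the subprocess call makes both easy to test without a database.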

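Step 4: a simple test

On the slave, run stop slave and then pt-heartbeat --monitor; repeat start slave and stop slave and watch the output.

Note: pt-heartbeat --stop shuts down the background process. The binlog then stops receiving the SET TIMESTAMP records that the process generated, but the heartbeat table is not dropped. The next time you start the background process, the --create-table option is no longer needed.

Note also that, as the message "Remove this file to permit pt-heartbeat to run" indicates, the sentinel file named in that message must be deleted before the background process can be started again.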

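Going further: adding the check to Zabbix

Notes on setting up the item:

1. The shell command can be: pt-heartbeat -D demo --check -h[slave ip] -u[username] -p[user_passwd]

2. Type of information: numeric (float).

3. Adjust the update interval (Update interval) as needed.

To make alerts fire quickly for the demonstration, the trigger threshold is set very low (alert when replication lag exceeds 2 seconds).

(1) Demo 1

Run stop slave, wait a while, then run start slave; repeat this twice and watch how the graph changes.

At the second start slave:

(2) Demo 2: use the pt-slave-delay tool to set a replication delay manually, and watch the resulting Zabbix graph: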

[root@237_13 ~]# pt-slave-delay --user=root --password=123456 --delay 1m --run-time 10m --host=192.168.237.13
2017-08-04T09:44:24 slave running 0 seconds behind
2017-08-04T09:44:24 STOP SLAVE until 2017-08-04T09:45:24 at master position mysql-bin.000031/10650583
2017-08-04T09:45:24 no new binlog events
2017-08-04T09:46:24 START SLAVE until master 2017-08-04T09:45:24 mysql-bin.000031/10668583
2017-08-04T09:47:24 START SLAVE until master 2017-08-04T09:46:24 mysql-bin.000031/10686583
2017-08-04T09:48:24 START SLAVE until master 2017-08-04T09:47:24 mysql-bin.000031/10704583
2017-08-04T09:49:24 START SLAVE until master 2017-08-04T09:48:24 mysql-bin.000031/10722583
2017-08-04T09:50:24 START SLAVE until master 2017-08-04T09:49:24 mysql-bin.000031/10740583
2017-08-04T09:51:24 START SLAVE until master 2017-08-04T09:50:24 mysql-bin.000031/10758583
2017-08-04T09:52:24 START SLAVE until master 2017-08-04T09:51:24 mysql-bin.000031/10776583
2017-08-04T09:53:24 START SLAVE until master 2017-08-04T09:52:24 mysql-bin.000031/10794583
2017-08-04T09:54:24 START SLAVE until master 2017-08-04T09:53:24 mysql-bin.000031/10812583
2017-08-04T09:54:24 Setting slave to run normally
[root@237_13 ~]#