MySQL：从库出现system lock的原因

水平有限有误请谅解。

本文为笔者2年前写一篇说明性文章，发现很多同学都在问这个问题，因此做一次分享。

本文基于5.7.17源码
本文只考虑row 格式binlog
主要考虑DML语句，DDL语句比较简单不做考虑
以单sql线程为例（非MTS）

如果要系统的学习主从原理可以参考我的《深入理解MySQL主从原理 32讲》。

一、延迟的计算方式

其实每次show slave status命令的时候后台会调用函数show_slave_status_send_data进行及时计算，这个延迟并不是保存在哪里的。栈帧如下：

#0  show_slave_status_send_data (thd=0x7fffd8000cd0, mi=0x38ce2e0, io_gtid_set_buffer=0x7fffd800eda0 "e859a28b-b66d-11e7-8371-000c291f347d:42-100173", 
    sql_gtid_set_buffer=0x7fffd8011ac0 "e859a28b-b66d-11e7-8371-000c291f347d:1-100173") at /MySQL/MySQL-5.7.17/sql/rpl_slave.cc:3602
#1  0x0000000001867749 in show_slave_status (thd=0x7fffd8000cd0) at /MySQL/MySQL-5.7.17/sql/rpl_slave.cc:3982
#2  0x0000000001867bfa in show_slave_status_cmd (thd=0x7fffd8000cd0) at /MySQL/MySQL-5.7.17/sql/rpl_slave.cc:4102

其计算方式基本就是这段代码

time_diff= ((long)(time(0) - mi->rli->last_master_timestamp) - mi->clock_diff_with_master);

稍微解释一下：

time(0) ：取当前slave服务器系统时间。
mi->rli->last_master_timestamp：是event common header中timestamp的时间+exetime，其中exetime只有query event才有，其他全部是0,这也导致了binlog row格式下的延迟最大基本是(2 乘以主库的执行的时间)，但是DDL的语句包含在query event中索引延迟最大基本就是(1 乘以主库执行时间)
mi->clock_diff_with_master：这是从库和主库时间的差值。

这里我们也看到event中common header中的timestamp和slave本地时间起了决定因素。因为每次发起命令time(0)都会增加，所以即便event中common header中的timestamp的时间不变延迟也是不断加大的。

另外还有一段有名的伪代码如下：

 /*
     The pseudo code to compute Seconds_Behind_Master:
     if (SQL thread is running)
     {
       if (SQL thread processed all the available relay log)
       {
         if (IO thread is running)
            print 0;
         else
            print NULL;
       }
        else
          compute Seconds_Behind_Master;
      }
      else
       print NULL;
  */

其实他也来自函数 show_slave_status_send_data，有兴趣的自己在看看，我就不过多解释了。

二、Binlog写入Binlog文件时间和event生成的时间

我发现有朋友这方面有疑问就做个简单的解释：

binlog真正从binglog cache/tmp file写入binlog文件是在commit的flush阶段然后sync阶段才落盘。
event生成是在语句执行期间，具体各个event生成时间如下：

如果没有显示开启事物，Gtid event/query event/map event/dml event/xid event均是命令发起时间。
如果显示开始事物 Gtid event/xid event是commit命令发起时间，其他event是dml语句发起时间。

所以binlog Event写入到binlog文件和Event的中的时间没有什么联系。下面是一个小事物典型的event生命周期，这是工具infobin生成的：

>Gtid Event:Pos:234(0Xea) N_pos:299(0X12b) Time:1513135186 Event_size:65(bytes) 
Gtid:31704d8a-da74-11e7-b6bf-52540a7d243:100009 last_committed=0  sequence_number=1
-->Query Event:Pos:299(0X12b) N_Pos:371(0X173) Time:1513135186 Event_size:72(bytes) 
Exe_time:0  Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:100009
---->Map Event:Pos371(0X173) N_pos:415(0X19f) Time:1513135186 Event_size:44(bytes) 
TABLE_ID:108 DB_NAME:test TABLE_NAME:a Gno:100009
------>Insert Event:Pos:415(0X19f) N_pos:455(0X1c7) Time:1513135186 Event_size:40(bytes) 
Dml on table: test.a  table_id:108 Gno:100009 
>Xid Event:Pos:455(0X1c7) N_Pos:486(0X1e6) Time:1513135186 Event_size:31(bytes) 
COMMIT; /*!Trx end*/ Gno:100009

三、造成延迟的可能原因

这部分是我总结现有的我知道的原因：

大事物延迟延迟略为2*执行时间状态为:reading event from the relay log
大表DDL延迟延迟略为1*执行时间状态为:altering table
长期未提交的事物延迟，会造成延迟的瞬时增加
表上没有主键或者唯一键状态为:system lock 或者 reading event from the relay log
Innodb层锁造成延迟状态为:system lock 或者 reading event from the relay log
从库参数设置如sync_binlog，sync_relay_log，Innodb_flush_log_at_trx_commit等参数

这些原因都是我遇到过的。接下来我想分析一下从库system lock形成的原因。

四、问题由来

问题主要是出现在我们的线上库的从库上，我们经常发现某些数据量大的数据库，sql thread经常处于system lock状态下，大概表现如下：

MySQL> show processlist;
+----+-------------+-----------+------+---------+------+----------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                            | Info             |
+----+-------------+-----------+------+---------+------+----------------------------------+------------------+
|  3 | root        | localhost | test | Sleep   |  426 |                                  | NULL             |
|  4 | system user |           | NULL | Connect | 5492 | Waiting for master to send event | NULL             |
|  5 | system user |           | NULL | Connect |  104 | System lock                      | NULL             |
|  6 | root        | localhost | test | Query   |    0 | starting                         | show processlist |
+----+-------------+-----------+------+---------+------+----------------------------------+------------------+

对于这个状态官方有如下描述:

The thread has called MySQL_lock_tables() and the thread state has not been updated since.
This is a very general state that can occur for many reasons.
For example, the thread is going to request or is waiting for an internal or external system lock for the
table. This can occur when Innodb waits for a table-level lock during execution of LOCK TABLES.
If this state is being caused by requests for external locks and you are not using multiple MySQLd
servers that are accessing the same MyISAM tables, you can disable external system locks with the
--skip-external-locking option. However, external locking is disabled by default, so it is likely
that this option will have no effect. For SHOW PROFILE, this state means the thread is requesting the
lock (not waiting for it).

显然不能解决我的问题，一时间也是无语。而我今天在测试从库手动加行锁并且sql thread冲突的时候发现了这个状态，因此结合gdb调试做了如下分析，希望对大家有用，也作为后期我学习的一个笔记。

五、system lock 延迟的原因

这里先直接给出原因供大家直接参考，简单的说从库出现system lock 应该视为正在干活，而不是名称看到的“lock”，这是由于slave端不存在语句（row格式）的执行，都是Event的直接apply，状态没有切换的机会，也可以认为是slave端状态划分不严谨，其实做一个pstack就能完全看出问题。下面是产生的必要条件：

由于大量的小事物，比如如UPDATE/DELETE table where处理一行数据，这会出现只包含一行数据库的DML event的语句，如果table是一张大表，则会加剧这种可能。
这个表上没有主键或者唯一键，问题加剧。
由于类似Innodb lock堵塞，也就是slave从库修改了数据同时和sql_thread也在修改同样的数据，问题加剧。
确实I/O扛不住了，可以尝试修改参数。

如果是大量的表没有主键或者唯一键可以考虑修改参数slave_rows_search_algorithms 试试。关于slave_rows_search_algorithms 在我的系列中有一章详细讨论，这里不做熬述。

六、system lock 延迟的问题分析

我们知道所有的state都是MySQL上层的一种状态，如果要发生状态的改变必须要调用THD::enter_stage来改变，而system lock则是调用mysql_lock_tables进入的状态，同时从库SQL_THREAD中还有另外一种状态重要的状态reading event from the relay log。

这里是rpl_slave.cc handle_slave_sql函数中的很小的一部分主要用来证明我的分析。

/* Read queries from the IO/THREAD until this thread is killed */

  while (!sql_slave_killed(thd,rli)) //大循环
  {
    THD_STAGE_INFO(thd, stage_reading_event_from_the_relay_log); //这里进入reading event from the relay log状态
    if (exec_relay_log_event(thd,rli)) //这里会先调入next_event函数读取一条event，然后调用lock_tables但是如果不是第一次调用lock_tables则不需要调用MySQL_lock_tables
                                       //逻辑在lock_tables调用mySQL_lock_tables则会将状态置为system lock，然后进入Innodb层进行数据的查找和修改
    
  }

这里还特地请教了阿里的印风兄验证了一下mysql_lock_tables是myisam实现表锁的函数Innodb会设置为共享锁。

这里我们先抛开query event/map event等。只考虑DML event，下面就是system lock出现的流程：

如果一个小事物只有一条DML event的场景下逻辑如下：

->进入reading event from the relay log状态 
 ->读取一条event(参考next_event函数)
  ->进入system lock状态
   ->Innodb层进行查找和修改数据

如果是一个大事物则包含了多条DML event的场景逻辑如下:

->进入reading event from the relay log状态 
 ->读取一条event(参考next_event函数)
  ->进入system lock状态
   ->Innodb层进行查找和修改数据
->进入reading event from the relay log状态 
 ->读取一条event(参考next_event函数)
  ->Innodb层进行查找和修改数据
->进入reading event from the relay log状态 
 ->读取一条event(参考next_event函数)
  ->Innodb层进行查找和修改数据
....直到本事物event执行完成

因此我们看到对于一个小事物我们的sql_thread会在加system lock的情况下进行对数据进行查找和修改，因此得出了我的结论，同时如果是Innodb 层锁造成的sql_thread堵塞也会在持有system lock的状态下。但是对于一个大事物则不一样，虽然出现同样的问题，但是其状态是reading event from the relay log。所以如果出现system lock一般就是考虑前文给出的结论。

七、分析中用到的断点

mysql_lock_tables 本函数更改状态为system lock
gdb打印:p tables[0]->s->table_name
THD::enter_stage 本函数改变状态
gdb打印:p new_stage->m_name
ha_innobase::index_read innodb查找数据接口
gdb打印:p index->table_name
ha_innobase::delete_row innodb删除数据接口
exec_relay_log_event 获取event并且应用
gdb 打印:ev->get_type_code()

八、两个栈帧

进入system lock状态

#0  THD::enter_stage (this=0x7fffec000970, new_stage=0x2ccd180, old_stage=0x0, calling_func=0x2216fd0 "mysql_lock_tables", 
    calling_file=0x22167d8 "/mysql/mysql-5.7.17/sql/lock.cc", calling_line=323) at /mysql/mysql-5.7.17/sql/sql_class.cc:731
#1  0x00000000017451a6 in mysql_lock_tables (thd=0x7fffec000970, tables=0x7fffec005e38, count=1, flags=0) at /mysql/mysql-5.7.17/sql/lock.cc:323
#2  0x00000000014fe8da in lock_tables (thd=0x7fffec000970, tables=0x7fffec012b70, count=1, flags=0) at /mysql/mysql-5.7.17/sql/sql_base.cc:6630
#3  0x00000000014fe321 in open_and_lock_tables (thd=0x7fffec000970, tables=0x7fffec012b70, flags=0, prelocking_strategy=0x7ffff14e2360)
    at /mysql/mysql-5.7.17/sql/sql_base.cc:6448
#4  0x0000000000eee1d2 in open_and_lock_tables (thd=0x7fffec000970, tables=0x7fffec012b70, flags=0) at /mysql/mysql-5.7.17/sql/sql_base.h:477
#5  0x000000000180e7c5 in Rows_log_event::do_apply_event (this=0x7fffec024790, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:10626
#6  0x00000000017f7b7b in Log_event::apply_event (this=0x7fffec024790, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:3324
#7  0x00000000018690ff in apply_event_and_update_pos (ptr_ev=0x7ffff14e2818, thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:4709
#8  0x000000000186a7f2 in exec_relay_log_event (thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:5224//这里可以看到不同event不同的处理
#9  0x0000000001870db6 in handle_slave_sql (arg=0x357fc50) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:7332//这里是整个主逻辑
#10 0x0000000001d5442c in pfs_spawn_thread (arg=0x7fffd88fb870) at /mysql/mysql-5.7.17/storage/perfschema/pfs.cc:2188
#11 0x00007ffff7bc7851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff672890d in clone () from /lib64/libc.so.6

在system lock状态下查找数据

#0  ha_innobase::index_read (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY)
    at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:8540
#1  0x000000000192126c in ha_innobase::index_first (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y")
    at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:9051
#2  0x00000000019214ba in ha_innobase::rnd_next (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y") at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:9149
#3  0x0000000000f4972c in handler::ha_rnd_next (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y") at /mysql/mysql-5.7.17/sql/handler.cc:2947
#4  0x000000000180e1a9 in Rows_log_event::do_table_scan_and_update (this=0x7fffec035c20, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:10475
#5  0x000000000180f453 in Rows_log_event::do_apply_event (this=0x7fffec035c20, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:10941
#6  0x00000000017f7b7b in Log_event::apply_event (this=0x7fffec035c20, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:3324
#7  0x00000000018690ff in apply_event_and_update_pos (ptr_ev=0x7ffff14e2818, thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:4709
#8  0x000000000186a7f2 in exec_relay_log_event (thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:5224
#9  0x0000000001870db6 in handle_slave_sql (arg=0x357fc50) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:7332
#10 0x0000000001d5442c in pfs_spawn_thread (arg=0x7fffd88fb870) at /mysql/mysql-5.7.17/storage/perfschema/pfs.cc:2188
#11 0x00007ffff7bc7851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff672890d in clone () from /lib64/libc.so.6

九、相关测试

下面来测试一下system lock这种状态，使用一张包含5条数据的表，表结构如下：

root@localhost:testrep:04:29:18>select count(*) from testrep;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)
root@localhost:testrep:04:30:12>show create table testrep
    -> ;
+---------+------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                       |
+---------+------------------------------------------------------------------------------------------------------------------------------------+
| testrep | CREATE TABLE `testrep` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------+------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

1 从库开启事务，同时锁住一行数据，如下：
begin;select *from testrep where id=5 for update;
2 主库执行删除语句，删除这5行数据，如下：
delete from testrep;
3 从库进行观察可以发现system lock状态

root@localhost:testrep:04:35:14>select state  from information_schema.processlist where id=9;
+-------------+
| state       |
+-------------+
| System lock |
+-------------+
1 row in set (0.00 sec)

出现的原因就是主库删除的5条数据会处于一个DELETE_EVENT中，我们现在知道了一个Event最大为8K左右，因此本事务只包含了一个Event。从库由于id=5的这条数据上了Innodb 行锁，因此SQL线程不能对这条数据进行处理，导致了system lock的出现。

最后编辑于：2020.09.25 22:17:20

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 206,839评论 6赞 482
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 88,543评论 2赞 382
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 153,116评论 0赞 344
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 55,371评论 1赞 279
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 64,384评论 5赞 374
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 49,111评论 1赞 285
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 38,416评论 3赞 400
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 37,053评论 0赞 259
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 43,558评论 1赞 300
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 36,007评论 2赞 325
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 38,117评论 1赞 334
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,756评论 4赞 324
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 39,324评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 30,315评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,539评论 1赞 262
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 45,578评论 2赞 355
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,877评论 2赞 345