最近有网友遇到了在RR隔离级别下insert A select B where B.COL=** 由于select表也就是B表引发的死锁的问题。分析死锁日志后,笔者进行模拟重现了这位网友遇到了2种场景并且在本文中进行详细的描述。
- 本文使用版本percona 5.7.14修改版,能够打印出事务所有的行锁信息结构链(不包含隐含锁)
- 本文中的测试是在RR隔离级别下完成的,RC不存在这样的问题
- 笔者对源码的理解有限,如有错误请指正
- 本文使用了自制工具innblock和bcview,前者用于扫描块结构后者用于更加方便的查看二进制文件信息获取地址
- innblock http://pan.baidu.com/s/1qYnyVWo
- bcview http://pan.baidu.com/s/1num76RJ
感谢叶金荣老师对本文的审核,笔者也曾是一名知数堂的学生
一、基本概念
在开始正文之前我打算介绍一下一些基本概念,特别是锁模型和兼容矩阵会对本文的阅读有相当大的帮助。
1、 innodb lock模型
- [LOCK_ORDINARY[next_key_lock]:]
源码定义:
#define LOCK_ORDINARY 0 /*!< this flag denotes an ordinary
next-key lock in contrast to LOCK_GAP
or LOCK_REC_NOT_GAP */
默认是LOCK_ORDINARY即普通的next_key_lock,锁住行及以前的间隙。
- [LOCK_GAP:]
源码定义:
#define LOCK_GAP 512 /*!< when this bit is set, it means that the
lock holds only on the gap before the record;
for instance, an x-lock on the gap does not
give permission to modify the record on which
the bit is set; locks of this type are created
when records are removed from the index chain
间隙锁,锁住行以前的间隙,不锁住本行。
- [LOCK_REC_NOT_GAP:]
源码定义:
#define LOCK_REC_NOT_GAP 1024 /*!< this bit means that the lock is only on
the index record and does NOT block inserts
to the gap before the index record; this is
used in the case when we retrieve a record
with a unique key, and is also used in
locking plain SELECTs (not part of UPDATE
or DELETE) when the user has set the READ
COMMITTED isolation level */
行锁,锁住行而不锁住任何间隙。
- [LOCK_INSERT_INTENTION:]
源码定义:
#define LOCK_INSERT_INTENTION 2048 /*!< this bit is set when we place a waiting
gap type record lock request in order to let
an insert of an index record to wait until
there are no conflicting locks by other
transactions on the gap; note that this flag
remains set when the waiting lock is granted,
or if the lock is inherited record */
插入意向锁,如果插入的记录在某个已经锁定的间隙内为这个锁。
2、 innodb lock兼容矩阵
/* LOCK COMPATIBILITY MATRIX
* IS IX S X AI
* IS + + + - +
* IX + + - - +
* S + - + - -
* X - - - - -
* AI + + - - -
3、infimum和supremum
一个page中包含这两个伪记录。页中所有的行未删除(或删除未purge)的行逻辑上都连接到这两个虚列之间,表现为一个逻辑链表数据结构,其中supremum伪记录的锁始终为next_key_lock。
4、heap no
heap no存储在fixed_extrasize 中。heap no 为物理存储填充的序号,页的空闲空间挂载在page free链表中(头插法)可以重用,但是重用此heap no不变,如果一直是insert 则heap no 不断增加,并不是按照ROWID(主键)排序的逻辑链表顺序,而是物理填充顺序。
5、n bits
和这个page相关的锁位图的大小,每一行记录都有1 bit的位图信息与其对应,用来表示是否加锁,并且始终预留64bit。例如我的表有9条数据,同时包含infimum和supremum虚拟记录即 64+9+2 bits,即75bits但是必须被8整除向上取整为一个字节,结果也就是就是80 bits。注意不管是否加锁每行都会对应一bit的位图。
6、lock struct
这是LOCK的内存结构体源码中用lock_t表示其可以包含
lock_table_t tab_lock;/*!< table lock */
lock_rec_t rec_lock;/*!< record lock */
一般来说innodb上锁都会对表级加上IX,这占用一个结构体。然后分别对二级索引和主键进行加锁,每一个BLOCK会占用这样一个结构体。
7、row lock
这个信息描述了当前事务加锁的行数,他是所有lock struct结构体中排除table lock以外所有加锁记录的总和,并且包含了infimum和supremum伪列。
8、逐步加锁
如果细心的朋友应该会发现在show engine 中事务信息中的row lock在对大量行进行加锁的时候会不断的增加,因为加行锁最终会调用lock_rec_lock逐行加锁,这也会增加了大数据量加锁的触发死锁的可能性。
二、Innodb层对insert...select 中select表的加锁模式
RR隔离级别下insert A select B where B.COL=**,innodb层会对B表满足条件的数据进行加锁,但是RC模式下B表记录不会加任何innodb层的锁,表现如下:
- 如果B.COL有二级(非唯一),并且执行计划使用到了(非using index)
- B表二级索引对选中记录加上LOCK_S|LOCK_ORDINARY[next_key_lock],并且对下一条记录加上LOCK_S|LOCK_GAP
- B表PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
- 如果B.COL有二级(唯一),并且执行计划使用到了(非using index)
- B表二级索引对选中记录加上LOCK_S|LOCK_REC_NOT_GAP
- B表PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
- 如果B.COL没有二级索引
- 对整个B表上的所有记录加上LOCK_S|LOCK_ORDINARY[next_key_lock]
三、Innodb层对insert...select中select表的加锁测试
下面我们分别对其进行测试和打印输出:
1. 如果B.COL有二级(唯一),并且执行计划使用到了(非using index)
使用语句:
drop table t1;
drop table t2;
create table t1(id int primary key,n1 varchar(20),n2 varchar(20),key(n1));
create table t2 like t1;
insert into t1 values(1,'gao1','gao'),(2,'gao1','gao'),(3,'gao1','gao'),(4,'gao2','gao'),(5,'gao2','gao'),(6,'gao2','gao'),(7,'gao3','gao'),(8,'gao4','gao');
查看执行计划:
mysql> desc insert into t2 select * from t1 force index(n1) where n1='gao2';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | INSERT | t2 | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | SIMPLE | t1 | NULL | ref | n1 | n1 | 23 | const | 3 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
执行语句:
begin;insert into t2 select * from t1 force index(n1) where n1='gao2';
观察结果:
- 二级索引对记录加上LOCK_S|LOCK_ORDINARY[next_key_lock]
-----TRX NO:28470 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 86 page no 4 n bits 80 index n1 of table `test`.`t1` trx id 28470 lock mode S(LOCK_S) locks gap and rec(LOCK_ORDINARY[next_key_lock])
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000004; asc ;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000005; asc ;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000006; asc ;;
- PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
-----TRX NO:28470 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 86 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 28470 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 000000006f20; asc o ;;
2: len 7; hex bc000001300134; asc 0 4;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 000000006f20; asc o ;;
2: len 7; hex bc000001300140; asc 0 @;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000006; asc ;;
1: len 6; hex 000000006f20; asc o ;;
2: len 7; hex bc00000130014c; asc 0 L;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
- 对二级索引下一条记录加上LOCK_S|LOCK_GAP
-----TRX NO:28470 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 86 page no 4 n bits 80 index n1 of table `test`.`t1` trx id 28470 lock mode S(LOCK_S) locks gap before rec(LOCK_GAP)
Record lock, heap no 8 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f33; asc gao3;;
1: len 4; hex 80000007; asc ;;
- 如图红色部分都是需要锁定的记录
2. 如果B.COL有二级(唯一),并且执行计划使用到了(非using index)
使用语句:
drop table t1;
drop table t2;
create table t1(id int primary key,n1 varchar(20),n2 varchar(20),unique key(n1));
create table t2 like t1;
insert into t1 values(1,'gao1','gao'),(2,'gao2','gao'),(3,'gao3','gao'),(4,'gao4','gao'),(5,'gao5','gao'),(6,'gao6','gao'),(7,'gao7','gao'),(8,'gao8','gao');
查看执行计划:
mysql> desc insert into t2 select * from t1 force index(n1) where n1 in ('gao2','gao3','gao4');
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
| 1 | INSERT | t2 | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | SIMPLE | t1 | NULL | range | n1 | n1 | 23 | NULL | 3 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+
执行语句:
begin;insert into t2 select * from t1 force index(n1) where n1 in ('gao2','gao3','gao4');
观察输出:
- B表二级索引对选中记录加上LOCK_S|LOCK_REC_NOT_GAP
-----TRX NO:30514 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 94 page no 4 n bits 80 index n1 of table `test`.`t1` trx id 30514 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f32; asc gao2;;
1: len 4; hex 80000002; asc ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f33; asc gao3;;
1: len 4; hex 80000003; asc ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 67616f34; asc gao4;;
1: len 4; hex 80000004; asc ;;
- B表PRIMARY加上LOCK_S|LOCK_REC_NOT_GAP
-----TRX NO:30514 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 94 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 30514 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000002; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a200000115011c; asc ;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000003; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150128; asc (;;
3: len 4; hex 67616f33; asc gao3;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150134; asc 4;;
3: len 4; hex 67616f34; asc gao4;;
4: len 3; hex 67616f; asc gao;;
- 如图红色部分都是需要锁定的记录
3.如果B.COL没有二级索引
使用语句:
drop table t1;
drop table t2;
create table t1(id int primary key,n1 varchar(20),n2 varchar(20));
create table t2 like t1;
insert into t1 values(1,'gao1','gao'),(2,'gao2','gao'),(3,'gao3','gao'),(4,'gao4','gao'),(5,'gao5','gao'),(6,'gao6','gao'),(7,'gao7','gao'),(8,'gao8','gao');
查看执行计划:
mysql> desc insert into t2 select * from t1 where n1 in ('gao2','gao3','gao4');
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | INSERT | t2 | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 8 | 37.50 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
执行语句:
begin;insert into t2 select * from t1 where n1 in ('gao2','gao3','gao4');
观察输出:
-----TRX NO:30535 LOCK STRUCT(1)(Add by gaopeng)
RECORD LOCKS space id 94 page no 3 n bits 80 index PRIMARY of table `test`.`t1` trx id 30535 lock mode S(LOCK_S) locks gap and rec(LOCK_ORDINARY[next_key_lock])
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000001; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150110; asc ;;
3: len 4; hex 67616f31; asc gao1;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000002; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a200000115011c; asc ;;
3: len 4; hex 67616f32; asc gao2;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000003; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150128; asc (;;
3: len 4; hex 67616f33; asc gao3;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000004; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150134; asc 4;;
3: len 4; hex 67616f34; asc gao4;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 6 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000005; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150140; asc @;;
3: len 4; hex 67616f35; asc gao5;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000006; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a200000115014c; asc L;;
3: len 4; hex 67616f36; asc gao6;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 8 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000007; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150158; asc X;;
3: len 4; hex 67616f37; asc gao7;;
4: len 3; hex 67616f; asc gao;;
Record lock, heap no 9 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000008; asc ;;
1: len 6; hex 000000007728; asc w(;;
2: len 7; hex a2000001150164; asc d;;
3: len 4; hex 67616f38; asc gao8;;
4: len 3; hex 67616f; asc gao;;
- 如图红色部分都是需要锁定的记录
四、insert...select由于select查询表引起的死锁
有了上面的理论,我们知道在RR隔离级别下insert...select会对select符合条件的数据加上LOCK_S锁,我曾经总结过出现死锁的条件:
- 至少2个独立的线程(会话)
- 单位操作中包含多个相对独立的加锁步骤,有一定的时间差
- 多个线程(会话)之间加锁对象必须有相互等待的情况发生,并且等待出现环状。
由于存在对select符合条件的数据加上LOCK_S锁的情况,RR模式下insert...select出现死锁的概率无疑更加高,我通过测试模拟出这种情况,严格意义上是相同的语句在高并发情况下表现为两种死锁情况。
测试脚本:
create table b(id int primary key,name1 varchar(20),name2 varchar(20));
alter table b add key(name1);
DELIMITER //
CREATE PROCEDURE test_i()
begin
declare num int;
set num = 1;
while num <= 3000 do
insert into b values(num,concat('gao',num),'gaopeng');
set num=num+1;
end while;
end//
call test_i()//
create table a like b//
DELIMITER ;
语句都是一样的:
TX1 | TX2 |
---|---|
begin; | - |
update b set name2='test' where id=2999; | - |
- | insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999); |
update b set name2='test' where id=999; | - |
但是在高并发下相同的语句却表现出不同的死锁情况,现在进行分析:
情况1:
- TX1:执行update将表b主键id=2999的记录加上LOCK_X
- TX2:执行insert...select语句b表上的记录(996,997,998,999,2995,2996,2997,2998,2999)会申请加上LOCK_S,
但是id=2999已经加上LOCK_X,显然不能获得只能等待. - TX1:执行update需要获得表b主键id=999的LOCK_X显然这个记录已经被TX2加锁LOCK_S,只能等待,触发死锁检测
如下图红色记录为不能获得锁的记录:
情况2:
这种情况比较极端只能在高并发上出现
- TX1:执行update将表b主键id=2999的记录加上LOCK_X
- TX2:执行insert...select语句b表上的记录(996,997,998,999,2995,2996,2997,2998,2999)会申请加上LOCK_S,因为上锁是有一个逐步加锁的过程,假设此时加锁到2997前那么TX2并不会等待
- TX1:执行update需要获得表b主键id=999的LOCK_X显然这个记录已经被TX2加锁LOCK_S,只能等待
- TX2:继续加锁LOCK_S 2997、2998、2999 发现2999已经被TX1加锁LOCK_X,只能等待,触发死锁检测
如下图红色记录为不能获得锁的记录:
五、源码修改和参数增加
情况2的测试需要在高并发下才会出现,因为insert...select语句是一条语句很难人为控制,也就是很让他在特定条件下停止。但是为了能够模拟出这种情况笔者对innodb增加了4个参数如下,为了方便识别我都加上了自己的名字的拼音:
mysql> show variables like '%gaopeng%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| innodb_gaopeng_sl_heap_no | 0 |
| innodb_gaopeng_sl_ind_id | 0 |
| innodb_gaopeng_sl_page_no | 0 |
| innodb_gaopeng_sl_time | 0 |
+---------------------------+-------+
默认情况都是0,即不启用。他们的意思如下:
- innodb_gaopeng_sl_heap_no:记录所在的heap no
- innodb_gaopeng_sl_ind_id:记录所在的index_id
- innodb_gaopeng_sl_page_no:记录所在的page_no
- innodb_gaopeng_sl_time:睡眠多少秒
有了index_id、page_no、heap no就能唯一限定一条数据了,并且睡眠时间也是可以人为指定的。
并且在源码lock_rec_lock 开头增加如下代码:
//add by gaopeng
/*if find index_id heap no page no to sleep srv_gaopeng_sl_time secs*/
if(srv_gaopeng_sl_ind_id && srv_gaopeng_sl_page_no && srv_gaopeng_sl_heap_no)
{
if(heap_no == (ulint)(srv_gaopeng_sl_heap_no) && (block->page.id).page_no() ==(ib_uint32_t)(srv_gaopeng_sl_page_no)
&& index->id ==(index_id_t)(srv_gaopeng_sl_ind_id) )
{
lock_mutex_exit();
sleep(srv_gaopeng_sl_time);
lock_mutex_enter();
}
}
//add end
这样一旦判定为符合条件的记录,本条记录的加锁钱便会休眠指定的秒如果我们拟定在LOCK_S:id=2997之前睡眠30秒,那么情况2能够必定发生如下图:
六、实际测试
情况1:
TX1 | TX2 |
---|---|
begin; | - |
update b set name2='test' where id=2999;对id:2999加LOCK_X锁 | - |
- | insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999);对id:996,997,998,999,2995,2996,2997,2998加LOCK_S锁,但是对id:2999加LOCK_S锁时发现已经加LOCK_X锁,需等待 |
update b set name2='test' where id=999;对id:999加LOCK_X锁,但是发现已经加LOCK_S锁,需等待,触发死锁检测 | - |
TX1触发死锁,TX1在权重判定下回滚 | - |
死锁报错语句:
mysql> update b set name2='test' where id=999;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
死锁日志:
*** (1) TRANSACTION:
TRANSACTION 48423, ACTIVE 7 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 5 lock struct(s), heap size 1160, 9 row lock(s), undo log entries 8
MySQL thread id 4, OS thread handle 140737223177984, query id 9110 localhost root Sending data
insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 119 page no 18 n bits 160 index PRIMARY of table `test`.`b` trx id 48423 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 86 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000bb7; asc ;;
1: len 6; hex 00000000bd26; asc &;;
2: len 7; hex 21000001511e7d; asc ! Q };;
3: len 7; hex 67616f32393939; asc gao2999;;
4: len 4; hex 74657374; asc test;;
*** (2) TRANSACTION:
TRANSACTION 48422, ACTIVE 24 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 3, OS thread handle 140737223444224, query id 9111 localhost root updating
update b set name2='test' where id=999
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 119 page no 18 n bits 160 index PRIMARY of table `test`.`b` trx id 48422 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 86 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000bb7; asc ;;
1: len 6; hex 00000000bd26; asc &;;
2: len 7; hex 21000001511e7d; asc ! Q };;
3: len 7; hex 67616f32393939; asc gao2999;;
4: len 4; hex 74657374; asc test;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 119 page no 10 n bits 456 index PRIMARY of table `test`.`b` trx id 48422 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 11 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e7; asc ;;
1: len 6; hex 00000000b534; asc 4;;
2: len 7; hex bd000001310110; asc 1 ;;
3: len 6; hex 67616f393939; asc gao999;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
*** WE ROLL BACK TRANSACTION (2)
信息提取如下:
TRX1:48423
LOCK HOLD:死锁信息不提供
LOCK WAIT:
表:b
索引:`PRIMARY`
锁模式:LOCK_S|LOCK_REC_NOT_GAP|LOCK_WAIT
记录:主键为0Xbb7(2999)
附加信息:space id 119 page no 18 heap no 86
CURRENT SQL:
insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
TRX2:48422(触发死锁、权重回滚)
LOCK HOLD:
表:b
索引:`PRIMARY`
锁模式:LOCK_X|LOCK_REC_NOT_GAP
记录:主键为0Xbb7(2999)
附加信息:pace id 119 page no 18 heap no 86
LOCK WAIT:
表:b
索引:`PRIMARY`
锁模式:LOCK_X|LOCK_REC_NOT_GAP|LOCK_WAIT
记录:主键为0X3e7(999)
附加信息:space id 119 page no 10 heap no 11
CURRENT SQL:
update b set name2='test' where id=999
情况2:
如上我们拟定在语句
insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
对b表记录加锁时在2997加锁前停顿30秒,那么我就需要找到b表主键2997的index_id、page_no、heap_no三个信息,这里使用到我的innblock工具
./innblock b.ibd scan 16
===INDEX_ID:121
level1 total block is (1)
block_no: 3,level: 1|*|
level0 total block is (9)
block_no: 5,level: 0|*|block_no: 6,level: 0|*|block_no: 7,level: 0|*|
block_no: 10,level: 0|*|block_no: 11,level: 0|*|block_no: 13,level: 0|*|
block_no: 15,level: 0|*|block_no: 17,level: 0|*|block_no: 18,level: 0|*|
因为为顺序插入那么2997必定到page 18中然后如下:
./innblock b.ibd 18 16
==== Block base info ====
block_no:18 space_id:121 index_id:121
....
(84) normal record offset:3287 heapno:83 n_owned 0,delflag:N minflag:0 rectype:0
(85) normal record offset:3326 heapno:84 n_owned 0,delflag:N minflag:0 rectype:0
(86) normal record offset:3365 heapno:85 n_owned 0,delflag:N minflag:0 rectype:0
(87) normal record offset:3404 heapno:86 n_owned 0,delflag:N minflag:0 rectype:0
(88) normal record offset:3443 heapno:87 n_owned 0,delflag:N minflag:0 rectype:0
因为为顺序插入heap_no 84就是id为2997的记录。我们使用另外一个工具bcview进行验证
./bcview b.ibd 16 3326 4
current block:00000018--Offset:03326--cnt bytes:04--data is:80000bb5
当然0Xbb5就是2997
因此设置参数为:
set global innodb_gaopeng_sl_heap_no=84;
set global innodb_gaopeng_sl_ind_id=121;
set global innodb_gaopeng_sl_page_no=18;
set global innodb_gaopeng_sl_time=30;
mysql> show variables like '%gaopeng%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| innodb_gaopeng_sl_heap_no | 84 |
| innodb_gaopeng_sl_ind_id | 121 |
| innodb_gaopeng_sl_page_no | 18 |
| innodb_gaopeng_sl_time | 30 |
+---------------------------+-------+
那么情况2执行顺序如下:
TX1 | TX2 |
---|---|
begin; | - |
update b set name2='test' where id=2999; 对id:2999加LOCK_X锁 | - |
- | insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999);对id:在加锁到996,997,998,999,2995,2996加LOCK_S锁,在对id:2997加锁前睡眠30秒,为下面的update语句腾出时间) |
update b set name2='test' where id=999;对id:999加LOCK_X锁等待但发现已经加LOCK_S锁,需等待 | - |
- | 醒来后继续对2997、2998、2999加LOCK_S锁,但是发现id:2999已经加LOCK_X锁,需等待,触发死锁检测 |
TX1权重回滚 | - |
死锁报错语句:
mysql> update b set name2='test' where id=999;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
死锁日志:
*** (1) TRANSACTION:
TRANSACTION 51545, ACTIVE 41 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1160, 2 row lock(s), undo log entries 1
MySQL thread id 9, OS thread handle 140737223444224, query id 18310 localhost root updating
update b set name2='test' where id=999
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 121 page no 10 n bits 456 index PRIMARY of table `test`.`b` trx id 51545 lock_mode X(LOCK_X) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 11 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e7; asc ;;
1: len 6; hex 00000000c167; asc g;;
2: len 7; hex bc000001300110; asc 0 ;;
3: len 6; hex 67616f393939; asc gao999;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
*** (2) TRANSACTION:
TRANSACTION 51546, ACTIVE 30 sec starting index read
mysql tables in use 2, locked 2
5 lock struct(s), heap size 1160, 9 row lock(s), undo log entries 8
MySQL thread id 8, OS thread handle 140737223177984, query id 18309 localhost root Sending data
insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 121 page no 10 n bits 456 index PRIMARY of table `test`.`b` trx id 51546 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP)
Record lock, heap no 8 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e4; asc ;;
1: len 6; hex 00000000c164; asc d;;
2: len 7; hex b90000012d0110; asc - ;;
3: len 6; hex 67616f393936; asc gao996;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
Record lock, heap no 9 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e5; asc ;;
1: len 6; hex 00000000c165; asc e;;
2: len 7; hex ba0000014f0110; asc O ;;
3: len 6; hex 67616f393937; asc gao997;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
Record lock, heap no 10 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e6; asc ;;
1: len 6; hex 00000000c166; asc f;;
2: len 7; hex bb0000012f0110; asc / ;;
3: len 6; hex 67616f393938; asc gao998;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
Record lock, heap no 11 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 800003e7; asc ;;
1: len 6; hex 00000000c167; asc g;;
2: len 7; hex bc000001300110; asc 0 ;;
3: len 6; hex 67616f393939; asc gao999;;
4: len 7; hex 67616f70656e67; asc gaopeng;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 121 page no 18 n bits 160 index PRIMARY of table `test`.`b` trx id 51546 lock mode S(LOCK_S) locks rec but not gap(LOCK_REC_NOT_GAP) waiting(LOCK_WAIT)
Record lock, heap no 86 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
0: len 4; hex 80000bb7; asc ;;
1: len 6; hex 00000000c959; asc Y;;
2: len 7; hex 00000002240110; asc $ ;;
3: len 7; hex 67616f32393939; asc gao2999;;
4: len 4; hex 74657374; asc test;;
*** WE ROLL BACK TRANSACTION (1)
信息提取如下:
TRX1:51545
LOCK HOLD:死锁信息不提供
LOCK WAIT:
表:b
索引:`PRIMARY`
锁模式:LOCK_MODE:LOCK_X|LOCK_REC_NOT_GAP|LOCK_WAIT
记录:主键为0X3e7
附加信息: space id 121 page no 10 heap no 11
CURRENT SQL:
update b set name2='test' where id=999
TRX2:51546
LOCK HOLD:
表:b
索引:`PRIMARY`
锁模式:LOCK_MODE:LOCK_S|LOCK_REC_NOT_GAP
记录:主键为0X3e4到0X3e7的多个行锁
附加信息:space id 121 page no 10
LOCK WAIT:
表:b
索引:`PRIMARY`
锁模式:LOCK_MODE:LOCK_S|LOCK_REC_NOT_GAP|LOCK_WAIT
记录:主键为0Xbb7
附加信息:space id 121 page no 10 heap no 86
CURRENT SQL:
insert into a select * from b where id in (996,997,998,999,2995,2996,2997,2998,2999)
我们通过死锁日志明显的看出同样的语句报出来的死锁信息却不一样,在高并发下相同语句,两种死锁场景都是可能发生的。
七、总结
分析死锁一般要从死锁日志中获取如下信息
- 1、加锁发生在主键还是辅助索引
- 2、加锁的模式是什么
- 3、是单行还是多行加锁
- 4、触发死锁事务最后的语句
- 5、死锁信息中事务顺序是怎么样的
在重现的时候,必须要做到和线上死锁信息完全匹配那么这个死锁场景才叫测试成功了,从这个例子我们就发现,同样的语句产生的死锁信息却不一样,我们当然就要按照不通的场景去考虑,本文中的情况2比较复杂一般只是在高并发先出现,测试也相对麻烦,本文通过修改源码的方式进行测试的,否则很难重现。找到原因后就需要采取必要的措施,比如本文中的例子需要考虑:
- 对insert...select中select表的修改是否及时提交。
- insert...select是否可以用其他方式代替。这种语句在自增锁上也存在一定风险。
- 是否考虑使用RC隔离级别,在RC隔离级别下不存在对select表记录加锁的情况。
强调一点对于出现LOCK_S这样的锁最好深入分析,因为这种锁并不多见。