最近在用pt-archiver归档一张1.4T的表,使用如下命令:
pt-archiver --no-check-charset --primary-key-only --bulk-delete --commit-each --limit 1000 --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx --where "collect_time <= \"1502072807\"" --purge
但发现pt-archiver迟迟不输出结果,MySQL的process里显示如下查询运行了几百秒:
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` FORCE INDEX(`PRIMARY`) WHERE (collect_time <= "1501725651") AND (`id` < '159695169') LIMIT 1000;
explain发现是个慢查询:
+----+-------------+---------------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | xxx | NULL | range | PRIMARY | PRIMARY | 8 | NULL | 77493125 | 33.33 | Using where |
+----+-------------+---------------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
先加上--dry-run
选项验证下是不是pt-archiver产生的慢查询:
> pt-archiver --no-check-charset --primary-key-only --bulk-delete --commit-each --limit 1000 --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx --where "collect_time <= \"1502072807\"" --purge --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` FORCE INDEX(`PRIMARY`) WHERE (collect_time <= "1502072807") AND (`id` < '160458402') LIMIT 1000
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` FORCE INDEX(`PRIMARY`) WHERE (collect_time <= "1502072807") AND (`id` < '160458402') AND ((`id` >= ?)) LIMIT 1000
DELETE FROM `xxx`.`xxx` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (collect_time <= "1502072807") LIMIT 1000
果然是的!
原来pt-archiver默认开启了ascending-index optimization
The default ascending-index optimization causes pt-archiver to optimize repeated SELECT queries so they seek into the index where the previous query ended, then scan along it, rather than scanning from the beginning of the table every time. This is enabled by default because it is generally a good strategy for repeated accesses.
那解决办法也就清晰明了了,有2个:
1.加上--no-ascend
选项禁用ascending-index optimization
> pt-archiver --no-check-charset --primary-key-only --bulk-delete --commit-each --limit 1000 --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx --where "collect_time <= \"1502072807\"" --purge --dry-run --no-ascend
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` WHERE (collect_time <= "1502072807") LIMIT 1000
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` WHERE (collect_time <= "1502072807") LIMIT 1000
DELETE FROM `xxx`.`xxx` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (collect_time <= "1502072807") LIMIT 1000
2.在配置source的DSN那加i=索引名
强制指定我们希望用的索引
> pt-archiver --no-check-charset --primary-key-only --bulk-delete --commit-each --limit 1000 --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx,i=ix_collect_time --where "collect_time <= \"1502072807\"" --purge --dry-run
SELECT /*!40001 SQL_NO_CACHE */ `id`,`collect_time` FROM `xxx`.`xxx` FORCE INDEX(`ix_collect_time`) WHERE (collect_time <= "1502072807") LIMIT 1000
SELECT /*!40001 SQL_NO_CACHE */ `id`,`collect_time` FROM `xxx`.`xxx` FORCE INDEX(`ix_collect_time`) WHERE (collect_time <= "1502072807") AND ((`collect_time` >= ?)) LIMIT 1000
DELETE FROM `xxx`.`xxx` WHERE (((`collect_time` >= ?))) AND (((`collect_time` <= ?))) AND (collect_time <= "1502072807") LIMIT 1000