The Log-Structured Merge-Tree (LSM-Tree)
ABSTRACT
High-performance transaction system applications typically insert rows in a History table to provide an activity trace;
高性能的事务系统通常记录一个事件到一个历史记录表用于追踪。
at the same time the transaction system generates log records for purposes of system recovery.
同时系统生成日志记录用于系统恢复。
Both types of generated information can benefit from efficient indexing.
两个类型的形象都可以受益于高效的索引。
An example in a well-known setting is the TPC-A benchmark application, modified to support efficient queries on the History for account activity for specific accounts.
一个众所周知的示例是TPC-A,修改支持高效查询特殊账户的历史。
This requires an index by account-id on the fast-growing History table.
这要求快速增长的记录有一个用户id的索引。
Unfortunately, standard disk-based index structures such as the B-tree will effectively double the I/O cost of the transaction to maintain an index such as this in real time, increasing the total system cost up to fifty percent.
不幸的是,标准磁盘架构,例如b-tree会提高一倍io使用率维护索引,这增加了整体50%性能消耗。
Clearly a method for maintaining a real-time index at low cost is desirable.
需要一个低消耗的维护实时索引的方案。
The Log-Structured Merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period.
LSM-tree是一个基于硬盘通过扩展段实现这样功能的一个方案。
The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort.
LSM-tree使用一个推迟和批量操作的算法,把基于内存的组件和基于硬盘的组件级联再一起类似合并排序。
During this process all index values are continuously accessible to retrievals (aside from very short locking periods), either through the memory component or one of the disk components.
这样索引可以连续检索,不管是内存部分还是硬盘部分。
The algorithm has greatly reduced disk arm movements compared to a traditional access methods such as B-trees, and will improve cost- performance in domains where disk arm costs for inserts with traditional access methods overwhelm storage media costs.
这个实现和之前的方案例如B-tree对比极大的减少了硬盘悬臂的移动操作,这个在传统上插入操作硬盘悬臂消耗主要时间的场景性能会提高很多。
The LSM-tree approach also generalizes to operations other than insert and delete.
LSM-tree也适用其他操作,不仅是插入和删除。
However, indexed finds requiring immediate response will lose I/O ef- ficiency in some cases, so the LSM-tree is most useful in applications where index inserts are more common than finds that retrieve the entries.
然而,索引检索操作要求立即返回会丢失I/O效率,所以LSM-tree在插入操作远大于检索的场景非常有用。
This seems to be a common property for History tables and log files, for example.
这似乎是历史表和日志文件的属性。
The conclusions of Section 6 compare the hybrid use of memory and disk components in the LSM-tree access method with the commonly understood advantage of the hybrid method to buffer disk pages in memory.
第6节比较内存和硬盘组件混合组件通过缓存磁盘页提高性能。