Using Hudi: Metadata Index

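This post introduces Hudi's metadata index: what it is, how to configure it, and how to use it. It gathers the relevant parts of the official Hudi documentation in one place, so they are easier to read and look up.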

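The metadata index is built on the metadata table. It can locate where data is physically stored without reading the underlying data files.
The metadata table is designed to be serverless and is not tied to any particular compute engine.
The metadata table is itself a Hudi table of type MOR (Merge-On-Read) that stores index information, and it lives under the .hoodie directory. Its data files use the HFile format, which greatly speeds up lookups by key.
Compared with traditional indexes, the metadata index can substantially improve both write and read performance.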
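Why a key-sorted file format such as HFile helps point lookups can be sketched in a few lines: once keys are sorted and indexed, finding one key takes a binary search instead of a full scan. The `SortedKeyValueFile` class and the partition-path keys below are hypothetical; a real HFile uses a multi-level block index over compressed blocks, which this sketch does not model.

```python
import bisect
from typing import List, Optional, Tuple


class SortedKeyValueFile:
    """Toy stand-in for a key-sorted file: entries sorted once, keys indexed."""

    def __init__(self, entries: List[Tuple[str, str]]) -> None:
        self._entries = sorted(entries)               # sort by key once, at "write" time
        self._keys = [k for k, _ in self._entries]   # precomputed key index

    def get(self, key: str) -> Optional[str]:
        # Binary search: O(log n) comparisons instead of scanning every entry.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._entries[i][1]
        return None


if __name__ == "__main__":
    # Hypothetical files-index-like content: partition path -> data file names.
    f = SortedKeyValueFile([("2023/01/02", "f2,f3"), ("2023/01/01", "f1")])
    print(f.get("2023/01/01"))
```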

Advantages:

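Eliminates the need for file listing operations. On object storage holding large volumes of data, listing files means traversing them, which is resource-intensive and can become a performance bottleneck.
Exposes column statistics through indexes for better query planning and faster lookups by readers. Column statistics help the compute engine optimize the execution plan: using the per-column statistics stored in the index, irrelevant data can be pruned. For Hudi table services such as compaction and clustering, column statistics help the engine quickly locate the physical files holding the data to be processed, which speeds up the run.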
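The pruning idea behind column statistics can be illustrated with a minimal sketch of min/max file pruning. The `FileStats` type, `prune_files` helper and file names are hypothetical; Hudi reads the real statistics from the column_stats partition of the metadata table and evaluates engine predicates, which this sketch does not do.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for one column."""
    file_name: str
    min_value: int
    max_value: int


def prune_files(stats: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range can overlap lo <= col <= hi.

    A file whose range lies entirely outside the predicate cannot contain a
    matching row, so it is skipped without being opened.
    """
    return [s.file_name for s in stats if s.max_value >= lo and s.min_value <= hi]


if __name__ == "__main__":
    files = [
        FileStats("f1.parquet", 0, 99),
        FileStats("f2.parquet", 100, 199),
        FileStats("f3.parquet", 200, 299),
    ]
    # Predicate 150 <= col <= 220: f1 is skipped, f2 and f3 must be read.
    print(prune_files(files, 150, 220))  # ['f2.parquet', 'f3.parquet']
```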

Supported index types:

files index: Stored as the files partition in the metadata table. Contains file information such as file name, size, and active state for each partition in the data table. Improves file listing performance by avoiding direct file system calls such as exists, listStatus and listFiles on the data table.
column_stats index: Stored as the column_stats partition in the metadata table. Contains statistics of the columns of interest, such as min and max values, total values, null counts, and size, for all data files; these statistics are used when serving queries whose predicates match those columns. Used together with data skipping, this index can speed up queries by orders of magnitude.
bloom_filter index: Stored as the bloom_filter partition in the metadata table. This index employs range-based pruning on the minimum and maximum values of the record keys and bloom-filter-based lookups to tag incoming records. For large tables, this involves reading the footers of all matching data files for bloom filters, which can be expensive in the case of random updates across the entire dataset. This index stores the bloom filters of all data files centrally to avoid scanning the footers of every data file directly.
record_index: Stored as the record_index partition in the metadata table. Contains the mapping from record key to location. The record index is a global index, enforcing key uniqueness across all partitions in the table. Added most recently, in the Hudi 0.14.0 release, this index locates records faster than the other existing indexes and can speed up lookups by orders of magnitude in large deployments where index lookup dominates write latency.
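The benefit of storing Bloom filters centrally can be sketched as follows: instead of opening every data file's footer, a writer consults one map from file name to Bloom filter and only checks the files that might contain the key. The `BloomFilter` class and `candidate_files` helper are a hypothetical, simplified illustration (SHA-256-derived positions, one byte per bit), not Hudi's Bloom filter implementation or its storage layout.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    """Tiny Bloom filter: each key sets num_hashes positions in an m-slot array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def candidate_files(index: Dict[str, BloomFilter], key: str) -> List[str]:
    """Return the files whose centrally stored filter may contain `key`.

    Only these candidates would need their actual keys checked.
    """
    return [name for name, bf in index.items() if bf.might_contain(key)]
```

A file that really holds the key is always returned (no false negatives); other files may occasionally show up as false positives, which is why the candidates still need an exact check.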


Enabling the Hudi metadata table and multi-modal indexes on the write side

Configuration options available for Spark:

Config Name	Default	Description
hoodie.metadata.enable	true (Optional; enabled on the write side)	Enable the internal metadata table, which serves table metadata such as file listings. For 0.10.1 and prior releases, the metadata table is disabled by default and needs to be explicitly enabled. `Config Param: ENABLE` `Since Version: 0.7.0`
hoodie.metadata.index.bloom.filter.enable	false (Optional)	Enable indexing bloom filters of user data files under metadata table. When enabled, metadata table will have a partition to store the bloom filter index and will be used during the index lookups. `Config Param: ENABLE_METADATA_INDEX_BLOOM_FILTER` `Since Version: 0.11.0`
hoodie.metadata.index.column.stats.enable	false (Optional)	Enable indexing column ranges of user data files under metadata table key lookups. When enabled, metadata table will have a partition to store the column ranges and will be used for pruning files during the index lookups. `Config Param: ENABLE_METADATA_INDEX_COLUMN_STATS` `Since Version: 0.11.0`
hoodie.metadata.record.index.enable	false (Optional)	Create the HUDI Record Index within the Metadata Table `Config Param: RECORD_INDEX_ENABLE_PROP` `Since Version: 0.14.0`
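For a Spark DataSource writer, the options from the table above are usually passed as one map. The helper below is a hypothetical sketch that assembles them; the table name, record key and precombine field are placeholders, and the record index switch only takes effect on Hudi 0.14.0 or later.

```python
from typing import Dict


def hudi_metadata_write_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    column_stats: bool = True,
    bloom_filter: bool = False,
    record_index: bool = False,
) -> Dict[str, str]:
    """Build writer options that enable the metadata table and selected
    metadata indexes (index keys taken from the Spark config table)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": str(column_stats).lower(),
        "hoodie.metadata.index.bloom.filter.enable": str(bloom_filter).lower(),
        # Record index partition; requires Hudi 0.14.0+.
        "hoodie.metadata.record.index.enable": str(record_index).lower(),
    }
```

With PySpark the map can be applied as `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df`, `opts` and `base_path` are placeholders.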

Configuration options available for Flink:

Config Name	Default	Description
metadata.enabled	false (Optional)	Enable the internal metadata table, which serves table metadata such as file listings; disabled by default. `Config Param: METADATA_ENABLED`
hoodie.metadata.index.column.stats.enable	false (Optional)	Enable indexing column ranges of user data files under metadata table key lookups. When enabled, metadata table will have a partition to store the column ranges and will be used for pruning files during the index lookups.
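In Flink SQL these options go into the WITH clause of the table DDL. The hypothetical helper below renders such a DDL with the metadata table and column stats index switched on, plus `read.data.skipping.enabled` (the Flink read-side switch listed in the data skipping section). Table name, path and columns are placeholders.

```python
from typing import Dict


def flink_hudi_ddl(table: str, path: str, columns: Dict[str, str], record_key: str) -> str:
    """Render a Flink SQL CREATE TABLE for a Hudi table with the metadata
    table, the column stats index and data skipping enabled."""
    col_lines = [f"`{name}` {sql_type}" for name, sql_type in columns.items()]
    col_lines.append(f"PRIMARY KEY (`{record_key}`) NOT ENFORCED")
    options = {
        "connector": "hudi",
        "path": path,
        "metadata.enabled": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        "read.data.skipping.enabled": "true",
    }
    cols = ",\n  ".join(col_lines)
    with_clause = ",\n  ".join(f"'{k}' = '{v}'" for k, v in options.items())
    return f"CREATE TABLE {table} (\n  {cols}\n) WITH (\n  {with_clause}\n)"


if __name__ == "__main__":
    print(flink_hudi_ddl("t1", "/tmp/hudi/t1", {"uuid": "STRING", "fare": "DOUBLE"}, "uuid"))
```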

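Judging by the configuration options listed in the official documentation, apart from the metadata table itself (which provides the files index), Flink currently supports only the column stats index and data skipping.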

Using the files index

The file-level index only requires the metadata table to be enabled.
The options for enabling the metadata table in each engine are:

Readers	Config	Description
- Spark DataSource - Spark SQL - Structured Streaming	hoodie.metadata.enable	When set to `true` enables use of the spark file index implementation for Hudi, that speeds up listing of large tables.
Presto	hudi.metadata-table-enabled	When set to `true` fetches the list of file names and sizes from Hudi’s metadata table rather than storage.
Trino	hudi.metadata-enabled	When set to `true` fetches the list of file names and sizes from metadata rather than storage.
Athena	hudi.metadata-listing-enabled	When this table property is set to `TRUE` enables the Hudi metadata table and the related file listing functionality
- Flink DataStream - Flink SQL	metadata.enabled	When set to `true` from DDL uses the internal metadata table to serve table metadata such as file listings

Using the column_stats index and data skipping

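Data skipping in Spark or Flink requires that the Hudi table has the metadata table enabled and the column stats index turned on.
To enable data skipping when reading from Spark or Flink, set the following options respectively: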

Readers	Config	Description
- Spark DataSource - Spark SQL - Structured Streaming	- `hoodie.metadata.enable` - `hoodie.enable.data.skipping`	- When set to `true` enables use of the spark file index implementation for Hudi, that speeds up listing of large tables. - When set to `true` enables data-skipping allowing queries to leverage indices to reduce the search space by skipping over files `Config Param: ENABLE_DATA_SKIPPING` `Since Version: 0.10.0`
- Flink DataStream - Flink SQL	- `metadata.enabled` - `read.data.skipping.enabled`	- When set to `true` from DDL uses the internal metadata table to serve table metadata such as file listings - When set to `true` enables data-skipping allowing queries to leverage indices to reduce the search space by skipping over files
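Because data skipping needs both a writer-side prerequisite (metadata table plus column stats index) and reader-side switches, a small checklist helps catch a half-configured setup. The `missing_for_data_skipping` helper below is a hypothetical sketch for Spark; it conservatively requires every key to be set explicitly to `true` and does not model Hudi's version-dependent defaults.

```python
from typing import Dict, List

# Writer-side prerequisites: metadata table + column stats index.
WRITE_PREREQS = ("hoodie.metadata.enable", "hoodie.metadata.index.column.stats.enable")
# Spark reader-side switches from the table above.
SPARK_READ_SWITCHES = ("hoodie.metadata.enable", "hoodie.enable.data.skipping")


def missing_for_data_skipping(write_opts: Dict[str, str], read_opts: Dict[str, str]) -> List[str]:
    """List the keys that are not explicitly 'true' (case-insensitive),
    prefixed with the side (write/read) they belong to."""
    missing = [f"write:{k}" for k in WRITE_PREREQS
               if write_opts.get(k, "false").lower() != "true"]
    missing += [f"read:{k}" for k in SPARK_READ_SWITCHES
                if read_opts.get(k, "false").lower() != "true"]
    return missing
```

Once the list is empty, the read options can be applied with `spark.read.format("hudi").options(**read_opts).load(base_path)`, where `read_opts` and `base_path` are placeholders.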

Record-level index vs. traditional indexes

The record-level index has significant advantages over traditional indexes:

Feature	Record Level Index	Global Simple Index	Global Bloom Index	Bucket Index
Performant look-up in general	Yes	No	No
Boost both writes and reads	Yes	No, write-only	No, write-only
Easy to enable	Yes	Yes	Yes
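The core difference behind the "performant look-up" row can be shown with a hypothetical in-memory model: a scan-based global lookup visits every file's keys, while a record-level index maps each key directly to its location. This is not how Hudi stores the record index (it lives in the record_index partition of the metadata table); the partition paths and file group IDs below are placeholders.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, str]  # (partition path, file group id)


def lookup_by_scan(files: Dict[Location, List[str]], key: str) -> Optional[Location]:
    """Scan-style global lookup: cost grows with the total number of records."""
    for location, keys in files.items():
        if key in keys:
            return location
    return None


def build_record_index(files: Dict[Location, List[str]]) -> Dict[str, Location]:
    """Record-index-style map from record key to location, built once.

    Keys are globally unique, so each key maps to exactly one location and a
    lookup afterwards is a single dictionary access.
    """
    index: Dict[str, Location] = {}
    for location, keys in files.items():
        for key in keys:
            index[key] = location
    return index
```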

Online asynchronous metadata indexing with Spark

Example Hudi configuration on the Spark writer side:

# ensure that both the metadata table and async indexing are enabled via the two configs below
hoodie.metadata.enable=true
hoodie.metadata.index.async=true
# enable the column_stats index
hoodie.metadata.index.column.stats.enable=true
# set the concurrency mode and lock configs, since this is a multi-writer scenario
# see https://hudi.apache.org/docs/concurrency_control/ for the different lock provider configs
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=<zk_url>
hoodie.write.lock.zookeeper.port=<zk_port>
hoodie.write.lock.zookeeper.lock_key=<zk_key>
hoodie.write.lock.zookeeper.base_path=<zk_base_path>

Offline metadata indexing

Schedule

Use the schedule mode of HoodieIndexer to schedule an index-building operation, for example:

spark-submit \
--class org.apache.hudi.utilities.HoodieIndexer \
/Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0.jar \
--props /Users/home/path/to/indexer.properties \
--mode schedule \
--base-path /tmp/hudi-ny-taxi \
--table-name ny_hudi_tbl \
--index-types COLUMN_STATS \
--parallelism 1 \
--spark-memory 1g

This writes an indexing.requested instant to the timeline.
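To check whether a scheduled index build is still waiting to be executed, one option is to look for indexing instants on the timeline. The sketch below assumes the Hudi 0.x layout, where instant files sit directly under `<base_path>/.hoodie` and are named `<instant>.<action>[.<state>]` (for example `20230101000000.indexing.requested`); the layout may differ in other versions, and `pending_indexing_instants` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def pending_indexing_instants(base_path: str) -> List[str]:
    """Return instant times that have an `.indexing.requested` file but no
    completed `.indexing` file (assumed 0.x timeline naming)."""
    timeline = Path(base_path) / ".hoodie"
    requested = {p.name.split(".")[0] for p in timeline.glob("*.indexing.requested")}
    # A completed instant has no state suffix: `<instant>.indexing`.
    completed = {p.name.split(".")[0] for p in timeline.glob("*.indexing")}
    return sorted(requested - completed)


if __name__ == "__main__":
    print(pending_indexing_instants("/tmp/hudi-ny-taxi"))
```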

Execute

Use the execute mode of HoodieIndexer to run the indexing scheduled in the previous step:

spark-submit \
--class org.apache.hudi.utilities.HoodieIndexer \
/Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0.jar \
--props /Users/home/path/to/indexer.properties \
--mode execute \
--base-path /tmp/hudi-ny-taxi \
--table-name ny_hudi_tbl \
--index-types COLUMN_STATS \
--parallelism 1 \
--spark-memory 1g

The scheduleAndExecute mode combines scheduling and execution in a single step, although keeping the two separate gives more flexibility.
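Since the commands differ only in the mode (and sometimes the memory setting), the argument list can be generated instead of retyped. `hoodie_indexer_cmd` is a hypothetical helper that mirrors the flags used in the commands above; the jar and properties paths are placeholders, and the accepted mode names are the four mentioned in this article.

```python
import shlex
from typing import List

VALID_MODES = ("schedule", "execute", "scheduleAndExecute", "dropindex")


def hoodie_indexer_cmd(
    bundle_jar: str,
    props: str,
    mode: str,
    base_path: str,
    table_name: str,
    index_types: str = "COLUMN_STATS",
    parallelism: int = 1,
    spark_memory: str = "1g",
) -> List[str]:
    """Assemble the spark-submit argument list for HoodieIndexer."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return [
        "spark-submit",
        "--class", "org.apache.hudi.utilities.HoodieIndexer",
        bundle_jar,
        "--props", props,
        "--mode", mode,
        "--base-path", base_path,
        "--table-name", table_name,
        "--index-types", index_types,
        "--parallelism", str(parallelism),
        "--spark-memory", spark_memory,
    ]


if __name__ == "__main__":
    cmd = hoodie_indexer_cmd(
        "hudi-utilities-bundle.jar", "indexer.properties",
        "scheduleAndExecute", "/tmp/hudi-ny-taxi", "ny_hudi_tbl",
    )
    # Inspect the command first; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```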

Drop

To drop an index, use the dropindex mode of HoodieIndexer:

spark-submit \
--class org.apache.hudi.utilities.HoodieIndexer \
/Users/home/path/to/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.13.0.jar \
--props /Users/home/path/to/indexer.properties \
--mode dropindex \
--base-path /tmp/hudi-ny-taxi \
--table-name ny_hudi_tbl \
--index-types COLUMN_STATS \
--parallelism 1 \
--spark-memory 2g

Concurrency control for writes

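Single writer with inline table services: simply enabling hoodie.metadata.enable is enough to guarantee concurrency safety.
Single writer with async table services: optimistic concurrency control must also be configured: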

hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider

Multiple writers: to prevent data loss, every writer must be configured with a distributed lock:

hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=<distributed-lock-provider-classname>
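The three scenarios above can be encoded as one decision function, which makes it harder to forget the lock settings when a deployment grows from one writer to several. `concurrency_configs` is a hypothetical sketch; the distributed lock provider class name is supplied by the caller (for example the ZooKeeper-based provider shown earlier).

```python
from typing import Dict

OCC = "optimistic_concurrency_control"
IN_PROCESS_LOCK = "org.apache.hudi.client.transaction.lock.InProcessLockProvider"


def concurrency_configs(writers: int, async_table_services: bool,
                        distributed_lock_provider: str = "") -> Dict[str, str]:
    """Return write configs for the three scenarios described above."""
    base = {"hoodie.metadata.enable": "true"}
    if writers <= 1 and not async_table_services:
        # Single writer, inline table services: the metadata table alone suffices.
        return base
    if writers <= 1:
        # Single writer, async table services: OCC with an in-process lock.
        return {**base,
                "hoodie.write.concurrency.mode": OCC,
                "hoodie.write.lock.provider": IN_PROCESS_LOCK}
    # Multiple writers: every writer needs a distributed lock provider.
    if not distributed_lock_provider:
        raise ValueError("multi-writer setups need a distributed lock provider class")
    return {**base,
            "hoodie.write.concurrency.mode": OCC,
            "hoodie.write.lock.provider": distributed_lock_provider}
```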

Notes

Before enabling the metadata table, stop all writes to the table, or enable concurrency control for writes.

At the time of writing, data skipping is not supported for MOR tables. See: [HUDI-3866] Support Data Skipping for MOR - ASF JIRA (apache.org)

References

Metadata Table | Apache Hudi
Metadata Indexing | Apache Hudi
Record Level Index: Apache Hudi's blazing-fast indexing for large datasets - Zhihu (zhihu.com)
