The previous post covered Hive's ANALYZE TABLE command. Impala provides a similar command called COMPUTE STATS, and this post walks through it.
What Impala's COMPUTE STATS does
Gathers information about volume and distribution of data in a table and all associated columns and partitions. The information is stored in the metastore database, and used by Impala to help optimize queries. For example, if Impala can determine that a table is large or small, or has many or few distinct values, it can organize and parallelize the work appropriately for a join query or insert operation. For details about the kinds of information gathered by this statement, see Table and Column Statistics.
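Like Hive's ANALYZE TABLE, this command exists mainly to optimize queries and make them run faster. Impala originally relied on Hive's ANALYZE TABLE, but that command was awkward to use and unstable, so Impala implemented its own command for the same purpose.
Syntax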
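One way to see whether the planner has statistics to work with is to look at the query plan. The sketch below uses the table and partition column from the example later in this post; the query itself is only an illustration, not something from the original workflow. When statistics are missing, the EXPLAIN output generally includes a warning that the table lacks relevant table and/or column statistics.

```sql
-- Illustrative query (an assumption, chosen only to produce a plan).
-- If the table has no statistics, the plan output typically carries
-- a warning about missing table and/or column statistics.
EXPLAIN
SELECT COUNT(*)
FROM dw_wy_video_kqi_cell_hourly
WHERE date_time = '2019022817';
```

Running the same EXPLAIN again after COMPUTE STATS is a quick way to check that the warning has gone away.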
-- Full: compute statistics for the whole table
COMPUTE STATS [db_name.]table_name
-- Incremental: compute statistics partition by partition
COMPUTE INCREMENTAL STATS [db_name.]table_name [PARTITION (partition_spec)]
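As a concrete reading of the syntax above, here is a minimal sketch. The database name my_db is a placeholder, not a name from this post; the table and partition value are the ones used in the example that follows.

```sql
-- Full statistics for the whole table, qualified with a database name
-- (my_db is a hypothetical placeholder).
COMPUTE STATS my_db.dw_wy_video_kqi_cell_hourly;

-- Incremental statistics for a single partition.
COMPUTE INCREMENTAL STATS my_db.dw_wy_video_kqi_cell_hourly
  PARTITION (date_time='2019022817');

-- Without a PARTITION clause, the incremental form covers the
-- partitions that do not yet have incremental statistics.
COMPUTE INCREMENTAL STATS my_db.dw_wy_video_kqi_cell_hourly;
```

The two forms are meant as alternatives for a given table: pick either full or incremental statistics rather than mixing them.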
Example
SHOW PARTITIONS dw_wy_video_kqi_cell_hourly;
COMPUTE INCREMENTAL STATS dw_wy_video_kqi_cell_hourly PARTITION (date_time='2019022817');
SHOW PARTITIONS dw_wy_video_kqi_cell_hourly;
The result is shown below: partitions on which COMPUTE INCREMENTAL STATS has not been run show -1 in the #Rows column.
Before running COMPUTE STATS dw_wy_video_kqi_cell_hourly: many partitions have no statistics yet.
After running COMPUTE STATS dw_wy_video_kqi_cell_hourly: every partition now has statistics.
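Besides SHOW PARTITIONS, the before/after comparison can also be checked with the statements below. This is only a sketch of how one might verify the result; the exact columns in the output can vary by Impala version.

```sql
-- Per-partition row counts; -1 in #Rows means no statistics yet.
SHOW TABLE STATS dw_wy_video_kqi_cell_hourly;

-- Per-column statistics such as the number of distinct values;
-- -1 here likewise means the value has not been computed.
SHOW COLUMN STATS dw_wy_video_kqi_cell_hourly;
```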