ES提供了强大的聚合分析功能,按照操作上细化,可以主要分为四种,如下表所示:
聚合方式 | 解释 |
---|---|
Bucket Aggregation | 一些满足特定条件的文档的集合 |
Metric Aggregation | 一些数学计算,可以对文档字段统计分析 |
Pipeline Aggregation | 对其他的聚合结果进行二次聚合 |
Metrix Aggregation | 支持对多个字段的操作并提供一个结果矩阵 |
在我个人看来这些只是理论意义上的细化,在实际的应用过程中,我们并没有说针对那种场景使用那种聚合分析。都是为了满足我们的业务,在实现的过程中同时会使用到多种聚合的方式。
一. 四种聚合方式
1.1 Bucket(分桶)
分桶就是将具有某一类共同特征的数据归为一类,然后求其总数,例如: 男女、公司同一工作岗位的员工、商品高中低档等。在对数据分桶后还可以进一步分桶,例如:0~ 20岁男性、21~50岁男性、50岁以上男性;同一工作岗位男性、女性;高档商品好评、中评、差评的商品。
1.2 Metric(计算)
计算具有一类特征的数据的统计值,例如平均值、最大值、最小值等。
1.3 Pipeline(管道)
pipeline与Linux操作系统中的管道操作(将上一步操作的结果作为下一步操作的数据源)类似。即将上一次聚合操作的结果作为下一次聚合操作的数据源。
1.4 Metrix(矩阵)
矩阵就是同时可以支持多值的输出,例如对分桶的数据同时求平均、最大、最小值;
二. 具体的案例
在说具体的案例的时候笔者并不会严格的去按照四种聚合方式去讲解。首先在ES中插入一批的测试数据,在插入测试数据之前先定义mapping.
2.1 mapping的定义
PUT employee
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "keyword"
},
"job": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"gender": {
"type": "keyword"
}
}
}
}
2.2 插入数据
PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
数据说明:插入的数据为员工信息,name是员工的姓名,job是员工的工种,age为员工的年龄,sal为员工的薪水,gender为员工的性别
2.3 聚合查询
查询工种的数量
GET employee/_search
{
"size": 0,
"aggs": {
"job_category_count": {
"cardinality": {
"field": "job"
}
}
}
}
查询每个工种的分桶信息
GET employee/_search
{
"size": 0,
"aggs": {
"job_category_num": {
"cardinality": {
"field": "job"
}
}
}
}
查询不同工种的员工的数量,并查询每个工种最大年龄的员工信息。
GET employee/_search
{
"size": 0,
"aggs": {
"job_analysis": {
"terms": {
"field": "job"
},
"aggs": {
"age_top_1": {
"top_hits": {
"size": 1,
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
}
}
}
}
}
查询工资范围在 0~5000, 5001~8000, 8001~12000, 12001~18000, 18001+ 员工的人数
GET employee/_search
{
"size": 0,
"aggs": {
"sal_range_info": {
"range": {
"field": "sal",
"ranges": [
{
"to": 5000
},
{
"from": 5001,
"to": 8000
},
{
"from": 8001,
"to": 12000
},
{
"from": 12001,
"to": 18000
},
{
"from": 18001
}
]
}
}
}
}
以每5000为一个区间,查询工资在对应范围内的员工的数量
GET employee/_search
{
"size": 0,
"aggs": {
"sal_histogram": {
"histogram": {
"field": "sal",
"interval": 5000,
"extended_bounds": {
"min": 0,
"max": 25000
}
}
}
}
}
查询每个工种的数量,以及不同工种的工资统计信息
GET employee/_search
{
"size": 0,
"aggs": {
"job_and_salary_info": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
不同工种下男女员工的数量,以及男女员工的薪资信息
GET employee/_search
{
"size": 0,
"aggs": {
"job_gender_sal_info": {
"terms": {
"field": "job"
},
"aggs": {
"gender_info": {
"terms": {
"field": "gender"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
}
}
查询平均工资最低的部门的平均工资,以及最低工资。
GET employee/_search
{
"size": 0,
"aggs": {
"jobs": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"avg": {
"field": "sal"
}
}
}
},
"min_avg_sal": {
"max_bucket": {
"buckets_path": "jobs>sal_info"
}
}
}
}
三. ES自带航空数据案例
查询到达各目的地的航班的数量
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
}
}
}
}
查询到达各航班的的数量,以及票价的最大值,平均值。
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"max_ticket_price": {
"max": {
"field": "AvgTicketPrice"
}
},
"avg_ticket_price": {
"avg": {
"field": "AvgTicketPrice"
}
}
}
}
}
}
查询到达各航班的的数量,以及票价的聚合信息以及天气的基本信息。
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"ticket_info": {
"stats": {
"field": "AvgTicketPrice"
}
},
"weather_info": {
"terms": {
"field": "DestWeather"
}
}
}
}
}
}
鸣谢
全文几乎没有自己原创的内容,只是对极客时间中阮一鸣
老师的 Elasticsearch核心技术与实战 自己稍稍的做了下总结。