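1. References
Geek Time (极客时间): "Elasticsearch Core Technology and Practice" by Ruan Yiming
Elasticsearch Reference Guide (mapping parameter enabled)
[Translation] Important Elasticsearch Articles, Part 5: Preloading fielddata
Learning Elasticsearch: _source, _all, store, and index in Elasticsearch, illustrated
store and the inverted index in elasticsearch
How Elasticsearch handles the store field
An analysis of the elasticsearch search process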
2. Getting started
Dynamic
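- dynamic controls whether indexed documents may introduce fields that are not in the mapping; the default is true.
Behavior | true | false | strict |
---|---|---|---|
Document can be indexed | Yes | Yes | No |
New fields are indexed | Yes | No | No |
_mapping is updated | Yes | No | No |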
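The table above can be mimicked with a few lines of plain Python. This is only a toy model written for this post, not how Elasticsearch is implemented: the function name index_document, the exception class, and the "type" it records for new fields are all made up for illustration.

```python
class StrictDynamicMappingError(Exception):
    """Stands in for Elasticsearch's strict_dynamic_mapping_exception."""


def index_document(mapping, doc, dynamic=True):
    """Toy model of the dynamic setting (hypothetical helper).

    mapping: dict of field name -> type, mutated in place when dynamic is True.
    Returns (source, indexed_fields): what is kept in _source and which
    fields end up searchable.
    """
    unknown = [field for field in doc if field not in mapping]
    if unknown and dynamic == "strict":
        # strict: the whole document is rejected
        raise StrictDynamicMappingError(
            f"dynamic introduction of {unknown} is not allowed"
        )
    if dynamic is True:
        for field in unknown:
            # Real Elasticsearch infers a field type from the JSON value;
            # the toy model just records the Python type name.
            mapping[field] = type(doc[field]).__name__
    # Only mapped fields are indexed; _source always keeps the full document.
    indexed_fields = {field: doc[field] for field in doc if field in mapping}
    return dict(doc), indexed_fields


doc = {"name": "caiser", "content": "Hello Hello", "age": 99}
mapping = {"name": "keyword", "content": "text"}
source, indexed = index_document(mapping, doc, dynamic=False)
print(sorted(indexed))  # ['content', 'name']
print("age" in source)  # True
```

With dynamic=False the unknown field is kept in the stored document but is neither mapped nor searchable; passing dynamic="strict" raises instead.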
False
- Let's try it: set dynamic to false
PUT /my_movies
{
"mappings": {
"dynamic": false,
"properties": {
"name": {
"type": "keyword"
},
"content": {
"type": "text"
}
}
}
}
- Add a document that carries a field, age, that is not defined in the mapping
PUT /my_movies/_doc/1
{
"name": "caiser",
"content": "Hello Hello",
"age": 99
}
- Once the document has been added, look at the mapping again (GET /my_movies/_mapping)
{
"my_movies" : {
"mappings" : {
"dynamic" : "false",
"properties" : {
"content" : {
"type" : "text"
},
"name" : {
"type" : "keyword"
}
}
}
}
}
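- The result shows that with dynamic set to false the document was indexed, but the mapping was not updated
- Next, query by age to see whether that field was indexed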
GET /my_movies/_search?q=age:99
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
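- The result is empty, which shows that with dynamic set to false the new field is not indexed (its value is still kept in _source)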
Strict
- Next, let's set dynamic to strict
DELETE /my_movies
PUT /my_movies
{
"mappings": {
"dynamic": "strict",
"properties": {
"name": {
"type": "keyword"
},
"content": {
"type": "text"
}
}
}
}
- Try to add a document that contains a field not defined in the mapping
PUT /my_movies/_doc/1
{
"name": "caiser",
"content": "Hello Hello",
"age": 99
}
- The request fails immediately with an error
{
"error": {
"root_cause": [
{
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
}
],
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
},
"status": 400
}
- So with dynamic set to strict, indexing a document that contains a field missing from the mapping is rejected with an error
null_value
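- Use null_value when you need to search for null values
- text fields do not accept null_value; keyword does, as do several other types listed in the official docs (numeric, date, boolean, ip). The null_value must have the same data type as the field
Example
- Let's verify this by setting null_value on a text field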
DELETE /my_movies
PUT /my_movies
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"content": {
"type": "text",
"null_value": "null"
}
}
}
}
- Setting null_value on a text field produces the following error
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "Mapping definition for [content] has unsupported parameters: [null_value : null]"
}
],
"type": "mapper_parsing_exception",
"reason": "Failed to parse mapping [_doc]: Mapping definition for [content] has unsupported parameters: [null_value : null]",
"caused_by": {
"type": "mapper_parsing_exception",
"reason": "Mapping definition for [content] has unsupported parameters: [null_value : null]"
}
},
"status": 400
}
- Setting it on a keyword field succeeds
DELETE /my_movies
PUT /my_movies
{
"mappings": {
"properties": {
"name": {
"type": "keyword",
"null_value": "null"
},
"content": {
"type": "text"
}
}
}
}
- Now add some data and query it:
# Add two documents
PUT /my_movies/_doc/1
{
"content": "123",
"name": null
}
PUT /my_movies/_doc/2
{
"content": "123456"
}
# Query
GET /my_movies/_search
{
"query": {
"term": {
"name": {
"value": "null"
}
}
}
}
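- Query result: only document 1, whose name is explicitly null, matches; document 2 has no name field at all and is not returned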
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "my_movies",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"content" : "123",
"name" : null
}
}
]
}
}
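Why does only document 1 come back, and why does _source still show null? A rough way to picture it is sketched below. It is a toy inverted index in plain Python, with a made-up helper build_keyword_index, and it assumes a single keyword field; it is not Elasticsearch's real code path.

```python
from collections import defaultdict

MISSING = object()  # marks "field absent from the document"


def build_keyword_index(docs, field, null_value=None):
    """Toy inverted index for one keyword field: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        value = source.get(field, MISSING)
        if value is MISSING:
            continue            # absent field: nothing is indexed
        if value is None:
            if null_value is None:
                continue        # explicit null without null_value: skipped
            value = null_value  # explicit null: index the placeholder instead
        index[value].add(doc_id)
    return index


docs = {
    "1": {"content": "123", "name": None},
    "2": {"content": "123456"},
}
index = build_keyword_index(docs, "name", null_value="null")
print(sorted(index["null"]))  # ['1']
print(docs["1"]["name"])      # None: the stored document is left untouched
```

The placeholder only exists in the index; the stored document keeps its original null, which matches the _source shown in the response above.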
Copy To
- copy_to copies a field's value into a target field
- The target field of copy_to does not appear in _source
DELETE /my_users
# Create the index
PUT /my_users
{
"mappings": {
"properties": {
"fristName": {
"type": "text",
"copy_to": "fullName"
},
"lastName": {
"type": "text",
"copy_to": "fullName"
}
}
}
}
# Index a document
PUT /my_users/_doc/1
{
"fristName": "sun",
"lastName": "ruikai"
}
# Query
GET /my_users/_search
{
"query": {
"match": {
"fullName": {
"query": "sun ruikai",
"operator": "and"
}
}
}
}
- Query result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "my_users",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.2876821,
"_source" : {
"fristName" : "sun",
"lastName" : "ruikai"
}
}
]
}
}
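The copy_to behavior seen above (fullName is searchable but absent from _source) can be sketched as follows. This is a simplified model: analyze is a crude whitespace tokenizer standing in for a real analyzer, and index_with_copy_to and match_and are hypothetical helpers, not Elasticsearch APIs.

```python
def analyze(text):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def index_with_copy_to(source, copy_to):
    """Return field -> set of terms, applying copy_to at index time.

    copy_to maps a source field name to its target field name.
    The source document itself is never modified.
    """
    terms = {}
    for field, value in source.items():
        tokens = analyze(value)
        terms.setdefault(field, set()).update(tokens)
        target = copy_to.get(field)
        if target is not None:
            # The copied tokens land only in the index, under the target name.
            terms.setdefault(target, set()).update(tokens)
    return terms


def match_and(terms, field, query):
    """Toy match query with operator=and: every query term must be present."""
    return all(token in terms.get(field, set()) for token in analyze(query))


source = {"fristName": "sun", "lastName": "ruikai"}
terms = index_with_copy_to(
    source, {"fristName": "fullName", "lastName": "fullName"}
)
print(match_and(terms, "fullName", "sun ruikai"))  # True
print("fullName" in source)                        # False
```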
doc_values & fielddata
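Aspect | doc_values | fielddata |
---|---|---|
When it is built | At index time, together with the inverted index | Dynamically, at search time |
Where it lives | Files on disk | JVM heap |
Pros | Avoids heavy memory usage | Faster indexing, no extra disk space |
Cons | Slows indexing, uses extra disk space | With many documents, building it on the fly is expensive and consumes a lot of JVM heap |
Default | true | false |
- If a keyword field needs neither sorting nor aggregation, you can set doc_values: false to speed up indexing and reduce disk usage; turning it back on later requires reindexing
- If a text field needs sorting or aggregation, set fielddata: true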
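To make the comparison above concrete, here is a small sketch of the two structures. It assumes one value per document and uses plain dicts; build_fielddata and terms_aggregation are illustrative names, and the real on-disk and in-heap formats are far more compact than this.

```python
from collections import defaultdict

docs = {1: "red", 2: "blue", 3: "red"}

# Inverted index: term -> doc ids. Good for "which docs contain this term?".
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[value].append(doc_id)

# doc_values-style column, written once at index time: doc id -> value.
doc_values = dict(docs)


def build_fielddata(inverted_index):
    """fielddata-style structure: un-invert the inverted index on demand."""
    column = {}
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


def terms_aggregation(column, hits):
    """Count values for the matching docs, as a terms aggregation would."""
    counts = defaultdict(int)
    for doc_id in hits:
        counts[column[doc_id]] += 1
    return dict(counts)


hits = [1, 2, 3]
print(terms_aggregation(doc_values, hits))                 # {'red': 2, 'blue': 1}
print(terms_aggregation(build_fielddata(inverted), hits))  # {'red': 2, 'blue': 1}
```

Both paths give the same answer; the difference is when the doc-to-value column is built (index time versus first use at search time) and where it is held.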
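enabled
If a field does not need to be searched, sorted, or aggregated, set enabled to false.
Note that enabled can only be set on the top-level mapping and on fields of type object.
The following two settings are both valid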
DELETE my_movies
PUT /my_movies
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"content": {
"type": "text"
},
"url": {
"enabled": false,
"type": "object"
}
}
}
}
DELETE my_movies
PUT /my_movies
{
"mappings": {
"enabled": false,
"properties": {
"name": {
"type": "keyword"
},
"content": {
"type": "text"
},
"url": {
"type": "object"
}
}
}
}
eager_global_ordinals
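Eager loading of global ordinals
For keyword fields in an index that is updated frequently and aggregated on frequently, setting this option to true is recommended. The global ordinals are then rebuilt at refresh time instead of by the first aggregation that needs them, which moves the cost from search to indexing.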
DELETE my_movies
PUT /my_movies
{
"mappings": {
"properties": {
"name": {
"type": "keyword",
"eager_global_ordinals": true
},
"content": {
"type": "text"
},
"url": {
"type": "object"
}
}
}
}
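What "eager" changes can be sketched with a toy shard. Everything here is an illustrative assumption: ToyShard, build_global_ordinals, and the idea of modeling a segment as a sorted list of terms are simplifications made up for this post, not Elasticsearch internals.

```python
def build_global_ordinals(segments):
    """Merge per-segment term dictionaries into shard-wide ordinals.

    segments: list of sorted term lists, one per segment.
    Returns (global_terms, mappings) with mappings[s][segment_ord] == global_ord.
    """
    global_terms = sorted(set().union(*segments))
    position = {term: i for i, term in enumerate(global_terms)}
    return global_terms, [[position[t] for t in seg] for seg in segments]


class ToyShard:
    def __init__(self, eager=False):
        self.eager = eager
        self.segments = []
        self.cached = None  # global ordinals, or None when stale
        self.builds = 0     # how many times the mapping was rebuilt

    def _build(self):
        self.builds += 1
        self.cached = build_global_ordinals(self.segments)

    def refresh(self, terms):
        """A refresh exposes a new segment and invalidates the old mapping."""
        self.segments.append(sorted(set(terms)))
        self.cached = None
        if self.eager:
            self._build()  # eager: pay the cost at refresh time

    def aggregate(self):
        if self.cached is None:
            self._build()  # lazy default: the first search pays the cost
        return self.cached


shard = ToyShard(eager=True)
shard.refresh(["b", "a"])
shard.refresh(["c", "a"])
print(shard.aggregate())  # (['a', 'b', 'c'], [[0, 1], [0, 2]])
```

In the eager shard the call to aggregate finds the mapping already built; with eager=False the same call would have to build it first.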
_source && index && store (illustrated)
_source
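Translated from the official documentation:
The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing fetch requests such as get or search.
Disabling _source saves disk space and suits metrics-style data. It is usually better to try a higher compression ratio first (index.codec), because once _source is disabled the following are no longer supported: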
- update, update_by_query, reindex
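- Highlighting
- Retrieving the original document from the _source field
We can disable _source entirely, or specify which fields to include and which to exclude
For example
# Disable _source entirely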
PUT /song_of_ice_and_fire
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"title": {
"type": "keyword"
},
"content": {
"type": "text"
}
}
}
}
# Include title, exclude content (run DELETE /song_of_ice_and_fire first if the index already exists)
PUT /song_of_ice_and_fire
{
"mappings": {
"_source": {
"includes": ["title"],
"excludes": ["content"]
},
"properties": {
"title": {
"type": "keyword"
},
"content": {
"type": "text"
}
}
}
}
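The effect of includes and excludes on what gets stored can be approximated like this. It is a sketch for top-level fields only; filter_source is a made-up helper, and Python's fnmatchcase is used as a stand-in for the wildcard patterns that _source filtering accepts (it understands a few extra pattern forms that Elasticsearch does not).

```python
from fnmatch import fnmatchcase


def filter_source(doc, includes=None, excludes=None):
    """Keep fields that match includes (if given) and match no excludes."""
    kept = {}
    for field, value in doc.items():
        if includes and not any(fnmatchcase(field, p) for p in includes):
            continue  # an includes list is present and this field is not on it
        if excludes and any(fnmatchcase(field, p) for p in excludes):
            continue  # explicitly excluded
        kept[field] = value
    return kept


doc = {"title": "A Game of Thrones", "content": "a very long text"}
print(filter_source(doc, includes=["title"], excludes=["content"]))
# {'title': 'A Game of Thrones'}
```

Fields dropped this way are still indexed and searchable; they are just not part of the stored _source.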
store
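Translated from the official documentation:
By default, field values are indexed to make them searchable, but they are not stored. This means that the field can be queried, but the original field value cannot be retrieved.
Usually this doesn't matter. The field value is already part of the _source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of the whole _source, this can be achieved with source filtering.
In certain situations it makes sense to store a field. For instance, if you have a document with a title, a date, and a very large content field, you may want to retrieve just the title and the date without having to extract those fields from a large _source field.
The store attribute specifies whether the original field value is stored separately; stored fields generally should not overlap with the fields kept in _source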
PUT /song_of_ice_and_fire
{
"mappings": {
"_source": {
"includes": ["title"],
"excludes": ["content"]
},
"properties": {
"title": {
"type": "keyword"
},
"content": {
"type": "text",
"store": true
}
}
}
}
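In the mapping above, content is excluded from _source but stored on its own. A rough picture of what ends up where, using a hypothetical helper index_with_store and plain dicts rather than anything Elasticsearch-specific:

```python
def index_with_store(doc, source_excludes=(), stored=()):
    """Return (source, stored_fields) as a toy index would keep them."""
    # _source keeps everything that is not excluded.
    source = {k: v for k, v in doc.items() if k not in source_excludes}
    # Stored fields are kept separately; Elasticsearch returns stored
    # field values as arrays, hence the one-element list.
    stored_fields = {k: [doc[k]] for k in stored if k in doc}
    return source, stored_fields


doc = {"title": "A Clash of Kings", "content": "a very long text"}
source, stored_fields = index_with_store(
    doc, source_excludes=["content"], stored=["content"]
)
print(source)         # {'title': 'A Clash of Kings'}
print(stored_fields)  # {'content': ['a very long text']}
```

The value of content can still be fetched, but only through the stored field, not through _source.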
Index
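- The index setting controls whether a field is indexed; the default is true
Behavior | true | false |
---|---|---|
Inverted index is built | Yes | No |
Field can be searched | Yes | No |
- As an example, set the index attribute of name to false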
DELETE /my_movies
PUT /my_movies
{
"mappings": {
"properties": {
"name": {
"type": "keyword",
"index": false
},
"content": {
"type": "text"
}
}
}
}
- Index a document
PUT /my_movies/_doc/1
{
"name": "caiser",
"content": "Hello Hello",
"age": 99
}
- Query it
GET /my_movies/_search
{
"query": {
"term": {
"name": {
"value": "caiser"
}
}
}
}
- The query fails outright, and the response from ES explains why: "Cannot search on field [name] since it is not indexed."
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: {\n \"term\" : {\n \"name\" : {\n \"value\" : \"caiser\",\n \"boost\" : 1.0\n }\n }\n}",
"index_uuid": "uLkZEGRuRCKVWyik8Z8VCQ",
"index": "my_movies"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my_movies",
"node": "M4LyTpueT--40-oJaXKvfA",
"reason": {
"type": "query_shard_exception",
"reason": "failed to create query: {\n \"term\" : {\n \"name\" : {\n \"value\" : \"caiser\",\n \"boost\" : 1.0\n }\n }\n}",
"index_uuid": "uLkZEGRuRCKVWyik8Z8VCQ",
"index": "my_movies",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Cannot search on field [name] since it is not indexed."
}
}
}
]
},
"status": 400
}
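index_options
- index_options controls how much detail is recorded in the inverted index
No. | Level | Description |
---|---|---|
1 | docs | Records the doc id |
2 | freqs | Records the doc id and term frequency |
3 | positions | Records the doc id, term frequency, and term position |
4 | offsets | Records the doc id, term frequency, term position, and character offsets |
- text fields default to positions; other field types default to docs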
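The four levels in the table are cumulative, which a toy postings builder shows quite directly. This sketch tokenizes with a simple regular expression and stores postings in nested dicts; build_postings and its layout are assumptions for illustration and bear no resemblance to Lucene's actual postings format.

```python
import re

LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs, level="positions"):
    """Build term -> {doc_id: details}, recording only what `level` asks for."""
    depth = LEVELS.index(level)  # raises ValueError for an unknown level
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings.setdefault(match.group(), {}).setdefault(doc_id, {})
            if depth >= 1:
                entry["freq"] = entry.get("freq", 0) + 1
            if depth >= 2:
                entry.setdefault("positions", []).append(position)
            if depth >= 3:
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


docs = {1: "Hello Hello"}
print(build_postings(docs, "docs"))
# {'hello': {1: {}}}
print(build_postings(docs, "offsets"))
# {'hello': {1: {'freq': 2, 'positions': [0, 1], 'offsets': [(0, 5), (6, 11)]}}}
```

Positions are what phrase-style queries rely on, and offsets help with highlighting, so each extra level buys more features at the cost of a larger index.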