Most fields are indexed by default, which makes them searchable. Sorting, aggregations, and accessing field values in scripts, however, require a different access pattern from search.
Search needs to answer the question "Which documents contain this term?", while sorting and aggregations need to answer a different question: "What is the value of this field for this document?".
Most fields can use index-time, on-disk doc_values for this data access pattern, but text fields do not support doc_values.
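Note: doc_values are an on-disk data structure, built at document index time, which makes this data access pattern possible. Almost all field types support doc values, with the notable exception of analyzed string fields. All fields that support doc values have them enabled by default. If you are sure that you do not need to sort or aggregate on a field, or access its value from a script, you can disable doc values to save disk space.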
Instead, text fields use a query-time in-memory data structure called fielddata. This data structure is built on demand the first time that a field is used for aggregations, sorting, or in a script. It is built by reading the entire inverted index for each segment from disk, inverting the term ↔︎ document relationship, and storing the result in memory, in the JVM heap.
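The uninverting step described above can be illustrated with a toy model. This is only a minimal sketch in plain Python and does not reflect how Lucene actually stores segments; the contents of `inverted_index` and the `uninvert` helper are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Toy inverted index for one segment: term -> list of document IDs.
# It answers the search question: "Which documents contain this term?"
inverted_index = {
    "brown": [1, 2],
    "fox": [1],
    "quick": [1, 3],
}


def uninvert(index):
    """Build a document -> terms view by reading the whole inverted index."""
    doc_to_terms = defaultdict(list)
    for term in sorted(index):
        for doc_id in index[term]:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


# The uninverted view answers the sorting/aggregation question:
# "What are the values of this field for this document?"
fielddata = uninvert(inverted_index)
print(fielddata)
```

Note that `uninvert` has to walk every term before it can answer a question about a single document, which hints at why building fielddata is expensive and why the result is kept in memory afterwards.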
Fielddata is disabled on text fields by default
Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.
If you try to sort, aggregate, or access values from a script on a text field, you will see this exception:
Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
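The exception message names the mapping parameter to set. As a hedged sketch, the snippet below builds such a mapping body as a Python dictionary and prints it as JSON. The field name `my_field` and the `fielddata_mapping` helper are placeholders, and the way the body is sent to the cluster (endpoint, index name, client library) is deliberately left out because it varies by Elasticsearch version and setup.

```python
import json


def fielddata_mapping(field_name):
    """Return a mapping body that enables fielddata on one text field."""
    return {
        "properties": {
            field_name: {
                "type": "text",
                # Allows the uninverted terms to be loaded into the JVM heap.
                "fielddata": True,
            }
        }
    }


# "my_field" is a hypothetical field name used only for illustration.
body = json.dumps(fielddata_mapping("my_field"), indent=2)
print(body)
```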
Before enabling fielddata
Before you enable fielddata, consider why you are using a text field for aggregations, sorting, or in a script. It usually doesn’t make sense to do so.
A text field is analyzed before indexing so that a value like "New York" can be found by searching for "new" or for "york". A terms aggregation on this field will return a "new" bucket and a "york" bucket, when you probably want a single bucket called "New York".
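The bucket problem can be reproduced with a small simulation. This is a sketch under stated assumptions: the `analyze` function is a rough stand-in that only lowercases and splits on whitespace (a real analyzer does more), and the sample `values` are invented for illustration.

```python
from collections import Counter

# One field value per document (hypothetical sample data).
values = ["New York", "New York", "San Francisco"]


def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# Buckets over analyzed terms, as an aggregation on the text field would see them.
# set() counts each term once per document, since buckets count documents.
analyzed_buckets = Counter(term for v in values for term in set(analyze(v)))

# Buckets over the original, unanalyzed values.
exact_buckets = Counter(values)

print(analyzed_buckets)
print(exact_buckets)
```

The analyzed buckets split each city name into separate terms, while the unanalyzed values keep "New York" as a single bucket, which is usually what an aggregation is meant to report.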