1. Prefer term over match_phrase whenever you can
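The Lucene nightly benchmarks show that a simple term query is about 10 times as fast as a phrase query, and about 20 times as fast as a proximity query (a phrase query with slop).
Note that term does not analyze its input, so the replacement is only safe when the search text is a single token that matches the indexed term exactly, as in the example below.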
GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "title": "quick"
    }
  }
}
becomes
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "title": "quick"
    }
  }
}
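2. If a condition has no bearing on how documents are ranked, put it in a filter. A filter skips score calculation, and its results can be cached, which speeds up later queries.
For example, to find cars whose type is Ford, whose color is yellow, and whose name contains dev, a typical query looks like this: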
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type": "ford"
          }
        },
        {
          "term": {
            "color": "yellow"
          }
        },
        {
          "term": {
            "name": "dev"
          }
        }
      ]
    }
  }
}
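In the query above, type and color also take part in the relevance score, even though they serve only as filter conditions; the score should depend only on how well name matches. The query is therefore both poorly formed and inefficient. The two conditions belong in a filter clause instead: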
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "name": "dev"
        }
      },
      "filter": [
        {
          "term": {
            "type": "ford"
          }
        },
        {
          "term": {
            "color": "yellow"
          }
        }
      ]
    }
  }
}
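When queries are assembled in application code, the same split can be made explicit. The following is only a minimal sketch in Python: the helper name build_bool_query is hypothetical, and it merely builds the request body as a dict without contacting a cluster.

```python
def build_bool_query(scoring_clauses, filter_clauses):
    """Build a search body that keeps scoring and filtering clauses apart.

    scoring_clauses go under "must" and contribute to the score;
    filter_clauses go under "filter" and do not.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    return {
        "query": {
            "bool": {
                "must": list(scoring_clauses),
                "filter": list(filter_clauses),
            }
        }
    }


# Same conditions as the example above: name is scored, type and color only filter.
body = build_bool_query(
    scoring_clauses=[{"term": {"name": "dev"}}],
    filter_clauses=[{"term": {"type": "ford"}}, {"term": {"color": "yellow"}}],
)
```

The resulting dict can be serialized to JSON and sent as the body of the _search request.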
3. If you do not care about the order of the results, sort by _doc; documents are then returned in index order.
_doc has no real use-case besides being the most efficient sort order. So if you don't care about the order in which documents are returned, then you should sort by _doc. This especially helps when scrolling.
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name": "dev"
    }
  },
  "sort": [
    "_doc"
  ]
}
4. Fetching n random documents (n >= 10000)
1) Use Elasticsearch's built-in random_score function. Drawback: it is slow and uses a lot of memory.
GET /my_index/my_type/_search
{
  "size": 10000,
  "query": {
    "function_score": {
      "query": {
        "term": {
          "name": "dev"
        }
      },
      "random_score": {}
    }
  }
}
2) Use a script-based sort. It uses somewhat less memory than random_score, but it is slower.
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name": "dev"
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "Math.random()"
      },
      "order": "asc"
    }
  }
}
3) At index time, add an extra field, mark, whose value is randomly generated. At query time, simply sort on that field.
GET /my_index/my_type/_search
{
  "query": {
    "term": {
      "name": "dev"
    }
  },
  "sort": [
    "mark"
  ]
}
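The index-time half of option 3 might look like the sketch below. It is an illustration only: the helper name with_random_mark is hypothetical, the choice of a float in [0, 1) for mark is an assumption (any randomly distributed sortable value would do), and the actual indexing call is left out.

```python
import random


def with_random_mark(doc, rng=random):
    """Return a copy of doc with a random float in [0, 1) stored under "mark".

    Hypothetical helper: the float range is an assumption, not something
    Elasticsearch requires.
    """
    marked = dict(doc)            # leave the caller's document untouched
    marked["mark"] = rng.random()
    return marked


# A seeded generator is used here only to make the example repeatable.
doc = with_random_mark({"name": "dev", "type": "ford"}, random.Random(42))
```

Because mark is fixed once a document is indexed, sorting on it returns the same documents in the same order on every query; the randomness comes only from how the values were assigned.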
5. Range aggregations take too long
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 10, "to": 50 },
          { "from": 50, "to": 70 },
          { "from": 70, "to": 100 }
        ]
      }
    }
  }
}
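As the example shows, we aggregate over the three intervals [10,50), [50,70), and [70,100). Because this involves comparing values against the interval bounds, it can be slow on large data sets.
Solution: at index time, write the interval each document falls into to the index as a keyword field; at query time, run a terms aggregation on that field.
Assume every price is below 100 and the added field is called mark, with the values 10-50, 50-70, and 70-100.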
{
  "aggs": {
    "genres": {
      "terms": { "field": "mark" }
    }
  }
}
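Computing mark before indexing could be done along these lines. This is a sketch under assumptions: the function name price_bucket is hypothetical, the bounds are copied from the range example above, and intervals are treated as half-open [from, to) to match the range aggregation.

```python
import bisect

# Interval bounds taken from the range aggregation example: [10,50), [50,70), [70,100).
BOUNDS = [10, 50, 70, 100]


def price_bucket(price):
    """Return the "from-to" label for price, or None if it lies outside [10, 100).

    Hypothetical helper for filling the mark field at index time.
    """
    i = bisect.bisect_right(BOUNDS, price)   # number of bounds <= price
    if i == 0 or i == len(BOUNDS):           # below 10, or 100 and above
        return None
    return f"{BOUNDS[i - 1]}-{BOUNDS[i]}"


# Attach the label to a document before it is sent to the index.
doc = {"price": 50}
doc["mark"] = price_bucket(doc["price"])     # "50-70": the lower bound is inclusive
```

Prices outside every interval get None here; whether to skip the field or use a catch-all label in that case is left to the application.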
6. Querying for empty strings
If you only need to know whether a field exists or is missing, the Exists Query is enough (exists, or must_not exists).
GET /_search
{
  "query": {
    "exists": { "field": "user" }
  }
}
GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "user"
        }
      }
    }
  }
}
The case of interest here is different: the field exists, but its value is the empty string "".
curl localhost:9200/customer/_search?pretty -d'{
  "size": 5,
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "inline": "doc['\''strnickname'\''].length()<1",
            "lang": "painless"
          }
        }
      }
    }
  }
}'