DSL Search
Data Preparation
Custom dictionary
- 马可波罗
- 马可
- 波罗
- 马
- 可
- 波
- 罗
Create the index demeter_index
Define the mappings manually
POST /demeter_index/_mapping { "properties": { "id": { "type": "long" }, "age": { "type": "integer" }, "username": { "type": "keyword" }, "nickname": { "type": "text", "analyzer": "ik_max_word", "fields": { "keyword": { "type": "keyword" } } }, "money": { "type": "float" }, "desc": { "type": "text", "analyzer": "ik_max_word" }, "sex": { "type": "byte" }, "birthday": { "type": "date" }, "face": { "type": "text", "index": false } } }
Add data
POST /demeter_index/_doc/1001 { "id": 1001, "age": 18, "username": "demeter", "nickname": "马可", "money": 88.8, "desc": "我叫马可波罗,很马兴认识大家", "sex": 0, "birthday": "1992-12-24", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1002 { "id": 1002, "age": 19, "username": "sulliven", "nickname": "波罗", "money": 77.8, "desc": "今天太阳很大,马路上没有行人", "sex": 1, "birthday": "1993-01-24", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1003 { "id": 1003, "age": 20, "username": "paul", "nickname": "马可波罗", "money": 66.8, "desc": "马可波罗来中国历险", "sex": 1, "birthday": "1996-01-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1004 { "id": 1004, "age": 22, "username": "sky", "nickname": "云中君", "money": 55.8, "desc": "羊吃草,马儿跑", "sex": 0, "birthday": "1988-02-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1005 { "id": 1005, "age": 25, "username": "tiger", "nickname": "裴擒虎", "money": 155.8, "desc": "我今天玩了一局王者荣耀", "sex": 1, "birthday": "1989-03-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1006 { "id": 1006, "age": 19, "username": "misscodedemeter", "nickname": "小罗", "money": 156.8, "desc": "我叫罗某某,今年20岁,是一名学生", "sex": 1, "birthday": "1993-04-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1007 { "id": 1007, "age": 19, "username": "cat", "nickname": "小小", "money": 1056.8, "desc": "这是我第一天学习elasticsearch", "sex": 1, "birthday": "1985-05-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1008 { "id": 1008, "age": 19, "username": "mark", "nickname": "小天", "money": 1056.8, "desc": "大学毕业后,来到一家开发公司工作", "sex": 1, "birthday": "1995-06-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1009 { "id": 1009, "age": 22, "username": "tim", "nickname": "大菠萝", "money": 96.8, "desc": "阿罗在大学毕业后,考研究生去了", "sex": 1, "birthday": "1998-07-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1010 { "id": 1010, "age": 30, "username": "gaga", "nickname": "可心", "money": 100.8, "desc": "我在学习kibana", "sex": 1, "birthday": "1988-07-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1011 { "id": 1011, "age": 31, "username": "sprder", "nickname": "知事", "money": 180.8, "desc": "能让我尊重的新闻媒体不多了", "sex": 1, "birthday": "1989-08-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
POST /demeter_index/_doc/1012 { "id": 1012, "age": 31, "username": "super hero", "nickname": "super hero", "money": 188.8, "desc": "BatMan, GreenArrow, SpiderMan, IronMan... are all Super Hero", "sex": 1, "birthday": "1980-08-14", "face": "https://www.codedemeter.com/static/img/index/logo.png" }
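Getting Started
Query by request parameter (QueryString)
Find documents whose [field] contains [content]
Comparing text and keyword search: a keyword field is not analyzed (tokenized). Its whole value is indexed as a single term, so it only matches exactly. Here username is a keyword field and nickname is a text field.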
GET /demeter_index/_doc/_search?q=nickname:马克
GET /demeter_index/_doc/_search?q=username:meter
GET /demeter_index/_doc/_search?q=username:demeter
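DSL Basics
QueryString is rarely used because it becomes hard to build once the parameters get complex, so most queries are written in the DSL.
- Domain Specific Language
- Queries are expressed as JSON
- More flexible, well suited to complex queries
DSL Syntax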
# Search
POST /demeter_index/_doc/_search
{
"query":{
"match":{
"desc":"学习"
}
}
}
# Check whether a field exists
POST /demeter_index/_doc/_search
{
"query": {
"exists": {
"field": "desc"
}
}
}
- A query is a JSON object made of key-value pairs, which can be nested
- A key can be an Elasticsearch keyword or the name of a field
Querying and Pagination
Match all documents
match_all
POST /demeter_index/_doc/_search
{
"query": {
"match_all": {}
}
}
To return only some fields, set _source
POST /demeter_index/_doc/_search
{
"query": {
"match_all": {}
},
"_source": [
"id",
"nickname",
"age",
"desc"
]
}
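Pagination: a search returns only 10 hits by default. Use from (offset of the first hit, starting at 0) and size (number of hits to return) to page through the results.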
POST /demeter_index/_doc/_search
{
"query": {
"match_all": {}
},
"_source": [
"id",
"nickname",
"age",
"desc"
],
"from": 0,
"size": 5
}
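As a small illustration of from and size, the sketch below converts a 1-based page number into the request body used above. The helper name build_page_query is hypothetical and is not part of any Elasticsearch client.

```python
def build_page_query(page: int, size: int = 10) -> dict:
    """Build a match_all request body for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,  # offset of the first hit on this page
        "size": size,               # number of hits per page
    }


# Page 3 with 5 hits per page starts at offset 10.
print(build_page_query(3, 5))
```

The body can be sent as-is to the _search endpoint shown above.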
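term vs. match
term is an exact search; match is an analyzed search
term means an exact match: the search text is not analyzed before searching, so it must be equal to one of the terms produced when the document was indexed.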
POST /demeter_index/_doc/_search
{
"query": {
"term": {
"nickname":"马可"
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
Returns 2 hits
A match query analyzes the search text, and a document is returned if one or more of the resulting terms appear in it.
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"nickname":"马可"
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
Returns 3 hits
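The difference between the two queries can be sketched with a toy inverted index. The token lists below are assumptions made for illustration; the real ik_max_word output depends on the installed dictionary and may differ, and analyze is a stand-in, not the real analyzer.

```python
# Assumed token lists for the nickname field (illustrative only).
DOC_TOKENS = {
    1001: ["马可", "马", "可"],
    1003: ["马可波罗", "马可", "波罗", "马", "可", "波", "罗"],
    1004: ["云中君"],
    1010: ["可心", "可", "心"],
}


def analyze(text):
    """Stand-in analyzer: looks up an assumed tokenization of the search text."""
    assumed = {"马可": ["马可", "马", "可"]}
    return assumed.get(text, [text])


def build_inverted_index(doc_tokens):
    """Map every term to the set of document ids that contain it."""
    index = {}
    for doc_id, tokens in doc_tokens.items():
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index


INDEX = build_inverted_index(DOC_TOKENS)


def term_search(term):
    """term: look up the search text as a single, un-analyzed term."""
    return sorted(INDEX.get(term, set()))


def match_search(text):
    """match: analyze the search text, then OR the resulting terms."""
    hits = set()
    for token in analyze(text):
        hits |= INDEX.get(token, set())
    return sorted(hits)


print(term_search("马可"))   # [1001, 1003]
print(match_search("马可"))  # [1001, 1003, 1010]
```

Under these assumed tokens the sketch gives 2 hits for term and 3 hits for match, in line with the counts reported above.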
terms: matching any of several terms
Find documents whose field contains any of the given terms
POST /demeter_index/_doc/_search
{
"query": {
"terms": {
"nickname":["马可","波罗"]
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
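match_phrase
match_phrase is phrase matching. A match query returns a document as soon as any term matches; match_phrase requires every term of the analyzed query to appear in the text field, in the same order and adjacent to each other.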
POST /demeter_index/_doc/_search
{
"query": {
"match_phrase": {
"desc":{
"query":"第一天 学习"
}
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
slop: the number of positions that may be skipped between terms
POST /demeter_index/_doc/_search
{
"query": {
"match_phrase": {
"desc":{
"query":"我 学习",
"slop": 1
}
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
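The effect of slop can be sketched with a simplified, in-order phrase matcher. This is not Lucene's actual algorithm (which also allows reordering at a higher slop cost), and the token list is an assumed tokenization of one desc value.

```python
def phrase_match(doc_tokens, query_tokens, slop=0):
    """Return True if query_tokens occur in order with at most `slop` skipped positions."""

    def search(query_index, prev_pos, used):
        if query_index == len(query_tokens):
            return True
        for pos, token in enumerate(doc_tokens):
            if token != query_tokens[query_index]:
                continue
            if prev_pos is None:
                # First query term: any occurrence may start the phrase.
                if search(query_index + 1, pos, 0):
                    return True
            elif pos > prev_pos:
                gap = pos - prev_pos - 1  # positions skipped between the two terms
                if used + gap <= slop and search(query_index + 1, pos, used + gap):
                    return True
        return False

    return search(0, None, 0)


# Assumed tokenization of "我在学习kibana" (illustrative only).
tokens = ["我", "在", "学习", "kibana"]
print(phrase_match(tokens, ["我", "学习"], slop=0))  # False
print(phrase_match(tokens, ["我", "学习"], slop=1))  # True
```

With slop 0 the two terms must be adjacent; with slop 1 the single token between them may be skipped.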
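match (operator) / ids
Extending match with operator
- or: after the search text is analyzed, a document is returned if any one term matches (the default)
- and: after the search text is analyzed, every term must match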
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc":"我 学习"
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
# Equivalent to
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc":{
"query":"我 学习",
"operator":"or"
}
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc":{
"query":"我 学习",
"operator":"and"
}
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
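- minimum_should_match: the minimum match threshold. As a percentage, the required number of matching terms is [number of analyzed terms] x percentage, rounded down. For example, with a value of 70%: if the analyzed search text has 10 terms, 10 x 70% = 7, so desc must match at least 7 terms; with 8 terms, 8 x 70% = 5.6, so at least 5 terms must match.
- minimum_should_match can also be a plain number, meaning a count of terms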
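The rounding rule can be checked with a short calculation. The helper below is a hypothetical sketch that only covers the two forms described here (a non-negative percentage and a plain count); Elasticsearch accepts further forms that are not modelled.

```python
def required_matches(term_count: int, minimum_should_match) -> int:
    """Return how many analyzed terms must match for the given setting."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer arithmetic rounds the result down, e.g. 8 * 70% = 5.6 -> 5.
        return term_count * percent // 100
    return int(minimum_should_match)


print(required_matches(10, "70%"))  # 7
print(required_matches(8, "70%"))   # 5
print(required_matches(6, "40%"))   # 2
print(required_matches(6, 2))       # 2
```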
# Inspect how the search text is analyzed
POST /_analyze
{
"analyzer": "ik_max_word",
"text": "我学习了redis和docker"
}
# The text is analyzed into 6 terms
# 我 学习 了 redis 和 docker
# 6 * 40% = 2.4, so at least 2 terms must match
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc":{
"query":"我学习了redis和docker",
"minimum_should_match":"40%"
}
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
# At least 2 terms must match; same result as the previous query
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc":{
"query":"我学习了redis和docker",
"minimum_should_match":2
}
}
},
"_source": [
"id",
"nickname",
"desc"
]
}
ids: search by document IDs
GET /demeter_index/_doc/1001
Query several IDs
POST /demeter_index/_doc/_search
{
"query": {
"ids": {
"type": "_doc",
"values": ["1001", "1005", "1011"]
}
}
}
multi_match / boost
multi_match: search across several fields
POST /demeter_index/_doc/_search
{
"query": {
"multi_match": {
"query": "小小明爱学习",
"fields": ["desc", "nickname"]
}
}
}
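boost: a weight for a field. The higher the weight, the more a match on that field contributes to the document's relevance score.
# nickname^10 multiplies the score of matches on nickname by 10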
POST /demeter_index/_doc/_search
{
"query": {
"multi_match": {
"query": "小小明爱学习",
"fields": ["desc", "nickname^10"]
}
}
}
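Bool Query
must: returned documents must satisfy the must clauses, which contribute to the score
should: returned documents may satisfy the should clauses. In a bool query with no must or filter clause and one or more should clauses, a document is returned if it satisfies at least one of them.
minimum_should_match: sets how many should clauses must be satisfied.
must_not: returned documents must not satisfy the must_not clauses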
POST /demeter_index/_doc/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "马可波罗",
"fields": [
"desc",
"nickname"
]
}
},
{
"term": {
"sex": 1
}
},
{
"term": {
"age": 19
}
}
]
}
}
}
Change to should
Change to must_not
Combined usage
POST /demeter_index/_doc/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "马",
"fields": [
"desc",
"nickname"
]
}
}
],
"should": [
{
"match": {
"sex": 1
}
}
],
"must_not": [
{
"term": {
"age": 18
}
}
]
}
}
}
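The clause semantics of this section can be sketched over a few in-memory documents. This is a toy model: clauses are plain Python predicates, a substring test stands in for analyzed matching on desc only, and the "score" is just the number of satisfied should clauses, not the real relevance score.

```python
DOCS = [
    {"id": 1001, "age": 18, "sex": 0, "desc": "我叫马可波罗,很马兴认识大家"},
    {"id": 1002, "age": 19, "sex": 1, "desc": "今天太阳很大,马路上没有行人"},
    {"id": 1003, "age": 20, "sex": 1, "desc": "马可波罗来中国历险"},
    {"id": 1004, "age": 22, "sex": 0, "desc": "羊吃草,马儿跑"},
    {"id": 1005, "age": 25, "sex": 1, "desc": "我今天玩了一局王者荣耀"},
]


def bool_search(docs, must=(), should=(), must_not=(), minimum_should_match=None):
    """Toy bool query: return ids ordered by how many should clauses matched."""
    if minimum_should_match is None:
        # With no must clause, at least one should clause has to match.
        minimum_should_match = 1 if should and not must else 0
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        should_hits = sum(1 for clause in should if clause(doc))
        if should_hits < minimum_should_match:
            continue
        hits.append((should_hits, doc["id"]))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


result = bool_search(
    DOCS,
    must=[lambda d: "马" in d["desc"]],
    should=[lambda d: d["sex"] == 1],
    must_not=[lambda d: d["age"] == 18],
)
print(result)  # [1002, 1003, 1004]
```

Document 1004 does not satisfy the should clause but is still returned, because should is optional once a must clause is present; it only ranks lower.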
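Filters
Filtering narrows down the hits that a search has already produced. It does not compute relevance scores, so it is fast, and it can be combined with full-text search.
post_filter is a top-level element. It only filters the search hits; it does not compute relevance scores or sort by score. query does the opposite: it computes scores and sorts by them.
query: retrieves the documents that match the user's search
post_filter: filters the hits after the query has run
- gte: greater than or equal to
- lte: less than or equal to
- gt: greater than
- lt: less than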
POST /demeter_index/_doc/_search
{
"query": {
"multi_match": {
"query": "马",
"fields": [
"desc"
]
}
},
"post_filter":{
"range":{
"money":{
"gt":60,
"lt":80
}
}
}
}
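The four range operators can be sketched as a plain predicate applied to hits that a query has already returned. The function names are hypothetical, and the hit list is a hand-picked subset of the sample data rather than real query output.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True if value satisfies every bound that is set."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


def post_filter(hits, field, **bounds):
    """Keep only the hits whose field falls inside the bounds; order is unchanged."""
    return [hit for hit in hits if in_range(hit[field], **bounds)]


hits = [
    {"id": 1001, "money": 88.8},
    {"id": 1002, "money": 77.8},
    {"id": 1003, "money": 66.8},
    {"id": 1004, "money": 55.8},
]
filtered = post_filter(hits, "money", gt=60, lt=80)
print([hit["id"] for hit in filtered])  # [1002, 1003]
```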
Sorting
desc is descending, asc is ascending
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc": "马克"
}
},
"post_filter":{
"range":{
"money":{
"gt":60,
"lt":80
}
}
},
"sort": [
{
"age": "desc"
},
{
"money": "desc"
}
]
}
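How a multi-field sort orders hits can be sketched in memory: the first sort entry is the primary key, and later entries only break ties. The helper sort_hits is hypothetical and accepts the same list-of-objects shape as the sort element above.

```python
def sort_hits(hits, sort_spec):
    """Sort hits by a list like [{"age": "desc"}, {"money": "desc"}]."""
    ordered = list(hits)
    # Apply the keys from last to first; Python's stable sort keeps earlier
    # orderings intact for hits that tie on the current key.
    for clause in reversed(sort_spec):
        (field, order), = clause.items()
        ordered.sort(key=lambda hit: hit[field], reverse=(order == "desc"))
    return ordered


hits = [
    {"id": 1002, "age": 19, "money": 77.8},
    {"id": 1003, "age": 20, "money": 66.8},
    {"id": 1007, "age": 19, "money": 1056.8},
]
ordered = sort_hits(hits, [{"age": "desc"}, {"money": "desc"}])
print([hit["id"] for hit in ordered])  # [1003, 1007, 1002]
```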
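Sorting on text
A text field is analyzed, so sorting on it directly usually fails. Add a sub-field of type keyword to the field and sort on that instead.
# Set this when creating the mappings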
"nickname": {
"type": "text",
"analyzer": "ik_max_word",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc": "马克"
}
},
"post_filter":{
"range":{
"money":{
"gt":60,
"lt":80
}
}
},
"sort": [
{
"nickname.keyword": "desc"
}
]
}
Highlighting (highlight)
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc": "马可"
}
},
"highlight": {
"fields": {
"desc": {}
}
}
}
Custom highlight tags
POST /demeter_index/_doc/_search
{
"query": {
"match": {
"desc": "马可"
}
},
"highlight": {
"pre_tags": [
"<tag>"
],
"post_tags": [
"</tag>"
],
"fields": {
"desc": {}
}
}
}
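prefix / fuzzy / wildcard
prefix: prefix query.
A prefix query does not compute relevance: it returns every matching document and gives each one a score of 1. It behaves more like a filter than a query. The practical difference between a prefix query and a prefix filter is that the filter can be cached and the query cannot.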
POST /demeter_index/_doc/_search
{
"query": {
"prefix": {
"desc": "elas"
}
}
}
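fuzzy: fuzzy search. This is not SQL's LIKE-style matching. It handles typos made while searching: the engine tolerates small spelling errors and still tries to match the indexed data.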
POST /demeter_index/_doc/_search
{
"query": {
"fuzzy": {
"desc": "elasticsearhc"
}
}
}
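fuzziness: the maximum number of character edits allowed when matching the search text against the data. The fuzzy query defaults to AUTO, which allows 0, 1, or 2 edits depending on the term length; 2 is the maximum.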
POST /demeter_index/_doc/_search
{
"query": {
"multi_match": {
"fields": [ "desc", "nickname"],
"query": "elasticsearchs",
"fuzziness": "auto"
}
}
}
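To make "number of edits" concrete, the sketch below computes an edit distance in which an insertion, deletion, substitution, or swap of two adjacent characters each counts as one edit, and mirrors the length thresholds of AUTO. Both function names are hypothetical; this illustrates the idea and is not Elasticsearch's internal implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition counts as a single edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by AUTO: 0 for 1-2 chars, 1 for 3-5 chars, 2 for longer terms."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


print(edit_distance("elasticsearhc", "elasticsearch"))   # 1 (h and c swapped)
print(edit_distance("elasticsearchs", "elasticsearch"))  # 1 (extra s)
print(auto_fuzziness("elasticsearhc"))                   # 2
```

Both misspelled search texts used above are one edit away from elasticsearch, so they fall within the allowed fuzziness.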
wildcard: wildcard query
- ?: exactly one character
- *: zero or more characters
POST /demeter_index/_doc/_search
{
"query": {
"wildcard": {
"desc": "elastic*"
}
}
}
POST /demeter_index/_doc/_search
{
"query": {
"wildcard": {
"desc": "马?"
}
}
}
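The two wildcard characters can be sketched by translating a pattern into a regular expression and testing it against a single indexed term. The helper names are hypothetical, and matching against one term at a time is a simplification of how the query runs against the terms of a field.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate ? (one character) and * (zero or more characters) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, term: str) -> bool:
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("elastic*", "elasticsearch"))  # True
print(wildcard_match("elastic*", "elastic"))        # True, * may match nothing
print(wildcard_match("马?", "马可"))                 # True
print(wildcard_match("马?", "马"))                   # False, ? needs one character
```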