1. Basic concepts
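An Elasticsearch cluster can contain multiple indices, and each index can contain multiple types. Each type holds multiple documents, and each document has multiple fields.
- An index is roughly equivalent to a database in a relational database system.
- A type is roughly equivalent to a table.
- A document is roughly equivalent to a row (record).
- A field is roughly equivalent to a column.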
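The analogy above can be pictured as nested dictionaries. The following is only a mental model, not how Elasticsearch stores data internally; the `cluster` structure and the `get_field` helper are hypothetical names made up for this sketch.

```python
# Hypothetical in-memory picture of the hierarchy: index -> type -> id -> document.
cluster = {
    "megacorp": {                      # index    (~ database)
        "employee": {                  # type     (~ table)
            "1": {                     # document (~ row), keyed by its ID
                "first_name": "John",  # field    (~ column)
                "last_name": "Smith",
            }
        }
    }
}


def get_field(cluster, index, doc_type, doc_id, field):
    """Walk index -> type -> document -> field, like database.table.row.column."""
    return cluster[index][doc_type][doc_id][field]


print(get_field(cluster, "megacorp", "employee", "1", "last_name"))
```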
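2. A worked example
For an employee directory, we will do the following:
- Index one document per employee, containing all the information about that employee.
- Give every document the type employee.
- Store that type in the index megacorp.
- Keep that index in our Elasticsearch cluster.
Note: the commands in the examples below are shorthand for curl. For instance, PUT /megacorp/employee/1 stands for curl -XPUT 'http://192.168.0.103:9200/megacorp/employee/1', with the JSON body passed via -d.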
PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
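To make the shorthand concrete, here is a small sketch that expands a shorthand request back into a full curl command line. The `to_curl` helper is a hypothetical name, the default host is the one quoted in the note above, and the `Content-Type` header is an assumption (newer Elasticsearch versions require it for requests with a body).

```python
import json
import shlex


def to_curl(method, path, body=None, host="http://192.168.0.103:9200"):
    """Expand a shorthand request such as `PUT /megacorp/employee/1` into a curl command."""
    parts = ["curl", "-X" + method, shlex.quote(host + path)]
    if body is not None:
        # Assumption: send the JSON body with an explicit content type.
        parts += ["-H", shlex.quote("Content-Type: application/json"),
                  "-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


print(to_curl("GET", "/megacorp/employee/1"))
print(to_curl("PUT", "/megacorp/employee/1", {"first_name": "John", "age": 25}))
```

`shlex.quote` only adds quotes where the shell needs them, so the plain URL is left unquoted while the JSON body is wrapped in single quotes.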
Note that the path /megacorp/employee/1 carries three pieces of information:
- megacorp: the index name
- employee: the type name
- 1: the ID of this particular employee
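As a quick illustration of that three-part structure, the path can be split mechanically. `parse_doc_path` is a hypothetical helper written for this sketch and is not part of any Elasticsearch client.

```python
def parse_doc_path(path):
    """Split '/<index>/<type>/<id>' into its three named parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /index/type/id, got {path!r}")
    index, doc_type, doc_id = parts
    return {"index": index, "type": doc_type, "id": doc_id}


print(parse_doc_path("/megacorp/employee/1"))
```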
Let's add more employees to the directory:
PUT /megacorp/employee/2
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
}
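3. Retrieving a document
A GET request with a document ID retrieves that document. The response carries metadata (_index, _type, _id, _version, found), and the original JSON document sits in the _source field: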
GET /megacorp/employee/1

{
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests": [ "sports", "music" ]
    }
}
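When handling such a response in code, the usual pattern is to check found before reading _source. The sketch below works on plain dictionaries shaped like the response above; `source_or_none` is a hypothetical helper, and the `missing` sample is an assumed shape for a document that does not exist.

```python
def source_or_none(response):
    """Return the stored document if the GET found it, otherwise None."""
    if response.get("found"):
        return response["_source"]
    return None


found = {"_index": "megacorp", "_type": "employee", "_id": "1", "_version": 1,
         "found": True, "_source": {"first_name": "John", "last_name": "Smith"}}
# Assumed shape of a response for an ID that is not in the index.
missing = {"_index": "megacorp", "_type": "employee", "_id": "99", "found": False}

print(source_or_none(found))
print(source_or_none(missing))
```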
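4. Lightweight search
Append _search to the path instead of a document ID. The response contains all three documents in the hits array (the third, Douglas Fir with ID 3, is indexed the same way as the two above). A search returns ten results by default. The response not only says which documents matched but also includes the whole documents themselves: everything needed to display the search results to the end user.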
GET /megacorp/employee/_search

{
    "took": 6,
    "timed_out": false,
    "_shards": { ... },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [ "forestry" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
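Since every hit carries the full document under _source, pulling the documents out of a search response is a short loop. This is a minimal sketch over a plain dictionary; `extract_sources` is a hypothetical helper and `sample` is a trimmed copy of the response above.

```python
def extract_sources(response):
    """Return (id, document) pairs from a search response's hits array."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


sample = {"hits": {"total": 2, "max_score": 1, "hits": [
    {"_id": "1", "_score": 1, "_source": {"first_name": "John", "last_name": "Smith"}},
    {"_id": "2", "_score": 1, "_source": {"first_name": "Jane", "last_name": "Smith"}},
]}}

for doc_id, source in extract_sources(sample):
    print(doc_id, source["first_name"], source["last_name"])
```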
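Query-string search: append the search parameter q= to _search. For example, to find every employee whose last name is Smith: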
GET /megacorp/employee/_search?q=last_name:Smith

{
    ...
    "hits": {
        "total": 2,
        "max_score": 0.30685282,
        "hits": [
            {
                ...
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                ...
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
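When the query string comes from user input, it is safer to let a library URL-encode it than to paste it into the path by hand. A minimal sketch using only the standard library follows; `search_path` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def search_path(index, doc_type, query_string=None):
    """Build the _search path, URL-encoding the optional q= parameter."""
    path = f"/{index}/{doc_type}/_search"
    if query_string:
        path += "?" + urlencode({"q": query_string})
    return path


print(search_path("megacorp", "employee"))
print(search_path("megacorp", "employee", "last_name:Smith"))
```

The colon is percent-encoded as %3A, which the server decodes back to last_name:Smith.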
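5. Searching with the query DSL
The query DSL (domain-specific language) is expressed as a JSON request body. We can rewrite the earlier search for every Smith like this: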
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
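The correspondence between the two forms can be sketched in a few lines. The helper below only handles the simple field:value shape used in this tutorial; the real query-string syntax is much richer, and `query_string_to_match` is a hypothetical name.

```python
import json


def query_string_to_match(q):
    """Turn a simple 'field:value' query string into an equivalent match query body."""
    field, sep, value = q.partition(":")
    if not sep or not field or not value:
        raise ValueError(f"expected 'field:value', got {q!r}")
    return {"query": {"match": {field: value}}}


print(json.dumps(query_string_to_match("last_name:Smith"), indent=4))
```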
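A slightly more complex query, which still matches the last name Smith but keeps only employees older than 30: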
GET /megacorp/employee/_search
{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "last_name" : "smith"
                }
            },
            "filter": {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            }
        }
    }
}
The result now contains a single employee, Jane Smith, aged 32. John Smith is 25, so the range filter excludes him.
{
    ...
    "hits": {
        "total": 1,
        "max_score": 0.30685282,
        "hits": [
            {
                ...
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
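The bool query combines a scoring clause (must) with a non-scoring filter. The sketch below builds the same request body and then emulates it locally on the two Smith documents to show why only Jane survives. Both helpers are hypothetical, and the local emulation is a rough approximation (case-insensitive equality instead of analyzed matching, no relevance scoring).

```python
def bool_query(match_field, match_text, range_field, gt):
    """Build a bool query: `must` match on one field, `filter` by range on another."""
    return {"query": {"bool": {
        "must": {"match": {match_field: match_text}},
        "filter": {"range": {range_field: {"gt": gt}}},
    }}}


def emulate_locally(docs, match_field, match_text, range_field, gt):
    """Rough local stand-in: case-insensitive equality plus a greater-than check."""
    return [d for d in docs
            if d[match_field].lower() == match_text.lower() and d[range_field] > gt]


employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
]

print(bool_query("last_name", "smith", "age", 30))
print([d["first_name"] for d in emulate_locally(employees, "last_name", "smith", "age", 30)])
```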
7. Full-text search
The searches so far have been simple. Now let's try something slightly more advanced, a full-text search:
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
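For a more concrete example, here is the same kind of query in Python:
from pprint import pprint
from elasticsearch import Elasticsearch

# es_hosts, es_auth, index_name and doc_type are assumed to be defined elsewhere.
def main():
    # es = Elasticsearch(es_hosts)  # alternative: connect without authentication
    es = Elasticsearch(es_hosts, http_auth=es_auth)
    # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
    # res = es.search(index=index_name, body={"query": {"match_all": {}}})
    # Full-text match on the consignee field.
    res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
    pprint(res)

if __name__ == '__main__':
    main()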
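With the default analyzer, which splits Chinese text into single characters, the example above returns every document whose consignee field matches one or more of the individual characters 张, 前 and 程.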
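The "one or more characters" behaviour can be imitated locally. The sketch below is a simplified simulation, not Elasticsearch's actual analysis or scoring: it assumes one token per character and OR semantics between tokens. The names in `names` are made-up sample data, and both helpers are hypothetical.

```python
def char_tokens(text):
    """One token per non-space character (a rough stand-in for per-character analysis)."""
    return [ch for ch in text if not ch.isspace()]


def simple_match(field_value, query):
    """True if the field shares at least one token with the query (OR semantics)."""
    return bool(set(char_tokens(field_value)) & set(char_tokens(query)))


names = ["张前程", "张三", "李前", "王五"]  # made-up sample data
print([n for n in names if simple_match(n, "张前程")])
```

Only 王五 is left out, because it shares no character with the query.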
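8. Phrase search
Finding individual words in a field is fine, but sometimes you want to match an exact sequence of words, that is, a phrase. For example, we may want a query that matches only employee records containing both "rock" and "climbing", with the two words next to each other as the phrase "rock climbing".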
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
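Returning to the Python example above, now with match_phrase:
from pprint import pprint
from elasticsearch import Elasticsearch

# es_hosts, es_auth, index_name and doc_type are assumed to be defined elsewhere.
def main():
    # es = Elasticsearch(es_hosts)  # alternative: connect without authentication
    es = Elasticsearch(es_hosts, http_auth=es_auth)
    # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
    # res = es.search(index=index_name, body={"query": {"match_all": {}}})
    # res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
    # Phrase match: the characters must appear together and in order.
    res = es.search(index=index_name, body={"query": {"match_phrase": {"consignee": "张前程"}}})
    pprint(res)

if __name__ == '__main__':
    main()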
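The example above matches only documents whose consignee field contains 张前程 as a phrase, with the three characters adjacent and in that order.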
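The difference from a plain match can be imitated locally as well. This is a simplified simulation under the same one-token-per-character assumption, not the real implementation; `phrase_match` is a hypothetical helper and the names are made-up sample data.

```python
def phrase_match(field_value, phrase):
    """True if the phrase's tokens occur consecutively and in order within the field."""
    tokens = [ch for ch in field_value if not ch.isspace()]
    wanted = [ch for ch in phrase if not ch.isspace()]
    n = len(wanted)
    return n > 0 and any(tokens[i:i + n] == wanted
                         for i in range(len(tokens) - n + 1))


names = ["张前程", "张前程先生", "程前张", "张三前程"]  # made-up sample data
print([name for name in names if phrase_match(name, "张前程")])
```

A field that contains extra text around the phrase still matches, while reordered or interrupted characters do not.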
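9. Highlighting search results
Many applications like to highlight snippets of text in each search result so that users can see why the document matched the query. Retrieving highlighted fragments is also easy in Elasticsearch: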
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
            "about" : {}
        }
    }
}
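The Python version of the example above:
from pprint import pprint
from elasticsearch import Elasticsearch

# es_hosts, es_auth, index_name and doc_type are assumed to be defined elsewhere.
def main():
    # es = Elasticsearch(es_hosts)  # alternative: connect without authentication
    es = Elasticsearch(es_hosts, http_auth=es_auth)
    # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
    # res = es.search(index=index_name, body={"query": {"match_all": {}}})
    # res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
    # Phrase match plus highlighting on the same field.
    res = es.search(index=index_name, body={
        "query": {"match_phrase": {"consignee": "张前程"}},
        "highlight": {"fields": {"consignee": {}}},
    })
    pprint(res)

if __name__ == '__main__':
    main()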
The query results now also contain a section called highlight.
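Reading that section back out of a response might look like the sketch below. It assumes each hit carries a highlight dictionary mapping field names to lists of fragments wrapped in the default `<em>` tags; the `sample` response is illustrative data written for this example, and `highlight_fragments` is a hypothetical helper.

```python
def highlight_fragments(response, field):
    """Collect the highlighted fragments for one field across all hits."""
    fragments = []
    for hit in response["hits"]["hits"]:
        # Hits without a highlight section simply contribute nothing.
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments


# Illustrative response shape, not captured from a real cluster.
sample = {"hits": {"hits": [
    {"_id": "1",
     "_source": {"about": "I love to go rock climbing"},
     "highlight": {"about": ["I love to go <em>rock</em> <em>climbing</em>"]}},
    {"_id": "2", "_source": {"about": "I like to collect rock albums"}},
]}}

print(highlight_fragments(sample, "about"))
```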
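We can also specify the tags that wrap the highlighted text:
from pprint import pprint
from elasticsearch import Elasticsearch

# es_hosts, es_auth, index_name and doc_type are assumed to be defined elsewhere.
def main():
    # es = Elasticsearch(es_hosts)  # alternative: connect without authentication
    es = Elasticsearch(es_hosts, http_auth=es_auth)
    # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
    # res = es.search(index=index_name, body={"query": {"match_all": {}}})
    # res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
    # Wrap each highlighted term in <span>...</span> instead of the default tags.
    res = es.search(index=index_name, body={
        "query": {"match_phrase": {"consignee": "张前程"}},
        "highlight": {
            "pre_tags": ["<span>"],
            "post_tags": ["</span>"],
            "fields": {"consignee": {}},
        },
    })
    pprint(res)

if __name__ == '__main__':
    main()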
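If several queries need highlighting, the request body can be assembled by a small helper rather than written out each time. This is a sketch only: `highlight_query` is a hypothetical name, and the `<em>` defaults are chosen to mirror Elasticsearch's default highlight tags.

```python
def highlight_query(field, phrase, pre_tag="<em>", post_tag="</em>"):
    """Build a match_phrase query that highlights the same field with custom tags."""
    return {
        "query": {"match_phrase": {field: phrase}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


body = highlight_query("consignee", "张前程", "<span>", "</span>")
print(body)
```

The resulting dictionary can be passed as the `body` argument of `es.search` in the examples above.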
More advanced usage is covered in the Elasticsearch highlighting documentation.