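Setting up the search data
What makes Elasticsearch most appealing is the convenient, fast search it provides. First, let's create some test documents with the following commands:
curl -XPUT "http://localhost:9200/movies/movie/1" -H 'Content-Type: application/json' -d'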
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": ["Crime", "Drama"]
}'
curl -XPUT "http://localhost:9200/movies/movie/2" -H 'Content-Type: application/json' -d'
{
"title": "Lawrence of Arabia",
"director": "David Lean",
"year": 1962,
"genres": ["Adventure", "Biography", "Drama"]
}'
curl -XPUT "http://localhost:9200/movies/movie/3" -H 'Content-Type: application/json' -d'
{
"title": "To Kill a Mockingbird",
"director": "Robert Mulligan",
"year": 1962,
"genres": ["Crime", "Drama", "Mystery"]
}'
curl -XPUT "http://localhost:9200/movies/movie/4" -H 'Content-Type: application/json' -d'
{
"title": "Apocalypse Now",
"director": "Francis Ford Coppola",
"year": 1979,
"genres": ["Drama", "War"]
}'
curl -XPUT "http://localhost:9200/movies/movie/5" -H 'Content-Type: application/json' -d'
{
"title": "Kill Bill: Vol. 1",
"director": "Quentin Tarantino",
"year": 2003,
"genres": ["Action", "Crime", "Thriller"]
}'
curl -XPUT "http://localhost:9200/movies/movie/6" -H 'Content-Type: application/json' -d'
{
"title": "The Assassination of Jesse James by the Coward Robert Ford",
"director": "Andrew Dominik",
"year": 2007,
"genres": ["Biography", "Crime", "Drama"]
}'
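Note that Elasticsearch provides a general-purpose _bulk endpoint for creating multiple documents in a single request. For simplicity, the documents here are created with separate requests.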
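For comparison, here is a minimal Python sketch of what a _bulk request body looks like. It assumes the older typed API used throughout this article, and build_bulk_body is a hypothetical helper name. The body is newline-delimited JSON: each document gets one action/metadata line followed by one source line, and the whole body must end with a newline.

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Hypothetical helper: build an NDJSON body for the _bulk endpoint.

    `docs` maps document IDs to their source dicts. Each document becomes
    two lines: an "index" action carrying the metadata, then the document.
    """
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"

movies = {
    1: {"title": "The Godfather", "director": "Francis Ford Coppola",
        "year": 1972, "genres": ["Crime", "Drama"]},
    2: {"title": "Lawrence of Arabia", "director": "David Lean",
        "year": 1962, "genres": ["Adventure", "Biography", "Drama"]},
}

body = build_bulk_body("movies", "movie", movies)
print(body)
```

The resulting string would be POSTed to http://localhost:9200/_bulk with the header Content-Type: application/x-ndjson. On Elasticsearch 7 and later, where mapping types are deprecated, the _type key should be left out.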
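Search in Elasticsearch goes mainly through the _search endpoint. Its standard request format is <index>/<type>/_search, where both index and type are optional.
In other words, we can send requests in any of the following forms:
- http://localhost:9200/_search... - searches all indices and types
- http://localhost:9200/movies/... - searches all types in the movies index
- http://localhost:9200/movies/movie... - searches only documents of type movie in the movies index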
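To make the optional path segments concrete, here is a small Python sketch that assembles the three URL forms from the <index>/<type>/_search pattern. search_url is a hypothetical helper, not part of any Elasticsearch client.

```python
def search_url(base="http://localhost:9200", index=None, doc_type=None):
    """Hypothetical helper: build a _search URL with optional index and type.

    A type only makes sense inside an index, so this sketch rejects a type
    given without an index (a design choice, not an Elasticsearch rule).
    """
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    parts = [base.rstrip("/")]
    if index is not None:
        parts.append(index)
    if doc_type is not None:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)

print(search_url())                                 # all indices and types
print(search_url(index="movies"))                   # every type in "movies"
print(search_url(index="movies", doc_type="movie")) # only movies/movie
```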
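The response contains each document's metadata, and the document's original data is stored in the _source field.
Retrieving a single document
We can also retrieve just a document's _source field directly: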
curl -XGET 'http://localhost:9200/movies/movie/1/_source'
Result:
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": ["Crime", "Drama"]
}
Retrieving all documents
The _search API retrieves all documents:
curl -XGET 'http://localhost:9200/movies/movie/_search'
Result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "movies",
"_type": "movie",
"_id": "5",
"_score": 1,
"_source": {
"title": "Kill Bill: Vol. 1",
"director": "Quentin Tarantino",
"year": 2003,
"genres": [
"Action",
"Crime",
"Thriller"
]
}
},
{
"_index": "movies",
"_type": "movie",
"_id": "2",
"_score": 1,
"_source": {
"title": "Lawrence of Arabia",
"director": "David Lean",
"year": 1962,
"genres": [
"Adventure",
"Biography",
"Drama"
]
}
},
{
"_index": "movies",
"_type": "movie",
"_id": "4",
"_score": 1,
"_source": {
"title": "Apocalypse Now",
"director": "Francis Ford Coppola",
"year": 1979,
"genres": [
"Drama",
"War"
]
}
},
{
"_index": "movies",
"_type": "movie",
"_id": "6",
"_score": 1,
"_source": {
"title": "The Assassination of Jesse James by the Coward Robert Ford",
"director": "Andrew Dominik",
"year": 2007,
"genres": [
"Biography",
"Crime",
"Drama"
]
}
},
{
"_index": "movies",
"_type": "movie",
"_id": "1",
"_score": 1,
"_source": {
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": [
"Crime",
"Drama"
]
}
},
{
"_index": "movies",
"_type": "movie",
"_id": "3",
"_score": 1,
"_source": {
"title": "To Kill a Mockingbird",
"director": "Robert Mulligan",
"year": 1962,
"genres": [
"Crime",
"Drama",
"Mystery"
]
}
}
]
}
}
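As you can see, the hits object contains fields such as total and the hits array. The hits array holds the matching documents (all six here), and total gives the number of matching documents. By default, only the first 10 results are returned. We can also set the from and size parameters to fetch a particular range of documents, for example: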
curl -XGET 'http://localhost:9200/movies/movie/_search?from=1&size=2'
Result:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 1,
"hits": [
{
"_index": "movies",
"_type": "movie",
"_id": "2",
"_score": 1,
"_source": {
"title": "Lawrence of Arabia",
"director": "David Lean",
"year": 1962,
"genres": [
"Adventure",
"Biography",
"Drama"
]
}
},
{
"_index": "movies",
"_type": "movie",
"_id": "4",
"_score": 1,
"_source": {
"title": "Apocalypse Now",
"director": "Francis Ford Coppola",
"year": 1979,
"genres": [
"Drama",
"War"
]
}
}
]
}
}
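Note that from is a 0-based offset, not a page number. The request above skipped the first hit and returned the next two. Here is a short Python sketch of the usual page-to-offset conversion, assuming 1-based page numbers. page_query is a hypothetical helper name.

```python
from urllib.parse import urlencode

def page_query(page, size=10):
    """Hypothetical helper: from/size query string for a 1-based page.

    `from` is a 0-based offset into the ranked hits, so page N starts at
    offset (N - 1) * size.
    """
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive integers")
    return urlencode({"from": (page - 1) * size, "size": size})

base = "http://localhost:9200/movies/movie/_search"
for page in (1, 2, 3):
    # With 6 movies and size=2, these three pages cover every document.
    print(base + "?" + page_query(page, size=2))
```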
Retrieving specific fields
Sometimes we only need a few fields of a document. In that case, use the _source parameter and separate multiple fields with commas:
curl -XGET 'http://localhost:9200/movies/movie/1?_source=title,director'
Result:
{
"_index": "movies",
"_type": "movie",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"director": "Francis Ford Coppola",
"title": "The Godfather"
}
}
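When the field list comes from code, the _source parameter can be built with Python's standard library. This is a minimal sketch. doc_url is a hypothetical helper, and it assumes the same old-style <index>/<type>/<id> document path used above.

```python
from urllib.parse import urlencode

def doc_url(index, doc_type, doc_id, fields=None, base="http://localhost:9200"):
    """Hypothetical helper: GET-document URL, optionally limited to some fields.

    Commas are kept unescaped (safe=",") so the URL matches the curl example.
    """
    url = "%s/%s/%s/%s" % (base, index, doc_type, doc_id)
    if fields:
        url += "?" + urlencode({"_source": ",".join(fields)}, safe=",")
    return url

print(doc_url("movies", "movie", 1, ["title", "director"]))
```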
Query string search
A query string search takes the form q=field:value. For example, to find movies whose title field contains godfather:
curl -XGET 'http://localhost:9200/movies/movie/_search?q=title:godfather'
Result:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.25811607,
"hits": [
{
"_index": "movies",
"_type": "movie",
"_id": "1",
"_score": 0.25811607,
"_source": {
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": [
"Crime",
"Drama"
]
}
}
]
}
}
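Values passed in q often contain spaces, quotes or colons, and these must be URL-encoded when the URL is built programmatically. Here is a minimal Python sketch using the standard library. query_string_url is a hypothetical helper, and the phrase query title:"kill bill" is only an illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def query_string_url(query, base="http://localhost:9200/movies/movie/_search"):
    """Hypothetical helper: append a URL-encoded q= parameter to a search URL."""
    return base + "?" + urlencode({"q": query})

url = query_string_url('title:"kill bill"')
print(url)
# Decoding the query string gives back the original q value unchanged.
print(parse_qs(urlsplit(url).query)["q"][0])
```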
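DSL search
The query string search above is lightweight and only suitable for simple cases. Elasticsearch offers a more powerful query language, the DSL (Domain Specific Language), which suits complex search scenarios such as full-text search. We can rewrite the previous query as a DSL search: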
GET /movies/movie/_search
{
"query" : {
"match" : {
"title" : "godfather"
}
}
}
Using curl:
curl -X GET "127.0.0.1:9200/movies/movie/_search" -H 'Content-Type: application/json' -d '{"query": {"match": {"title": "godfather"}}}'
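When a program builds the request instead of a person, it is safer to construct the DSL as a data structure and serialize it. That way the body is always valid JSON, with no stray trailing commas. Here is a minimal Python sketch, where match_query is a hypothetical helper, that produces exactly the body sent by the curl command above:

```python
import json

def match_query(field, text):
    """Hypothetical helper: DSL body for a match query on a single field."""
    return {"query": {"match": {field: text}}}

# json.dumps handles quoting and commas, which are easy to get wrong by hand.
body = json.dumps(match_query("title", "godfather"))
print(body)  # {"query": {"match": {"title": "godfather"}}}
```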
The simplest query is a full-text search. For example, to search all indices and fields for the keyword godfather:
curl -XPOST "http://localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string": {
"query": "godfather"
}
}
}'
To search for godfather only in the title field:
curl -XPOST "http://localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string": {
"query": "godfather",
"fields": ["title"]
}
}
}'
Result:
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 25,
"successful": 25,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.25811607,
"hits": [
{
"_index": "movies",
"_type": "movie",
"_id": "1",
"_score": 0.25811607,
"_source": {
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": [
"Crime",
"Drama"
]
}
}
]
}
}
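Whichever search style is used, the response has the same hits structure. Here is a minimal Python sketch that extracts the total and each hit's ID, score and title from a response like the one above, using a trimmed copy of it as input. summarize_hits is a hypothetical helper.

```python
import json

def summarize_hits(response):
    """Hypothetical helper: return (total, [(id, score, title), ...]).

    The older response shape in this article has hits.total as a plain
    number; Elasticsearch 7+ reports an object like {"value": 1, "relation": "eq"}.
    Both shapes are accepted here.
    """
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        total = total["value"]
    rows = [(h["_id"], h["_score"], h["_source"]["title"]) for h in hits["hits"]]
    return total, rows

raw = """{"took": 24, "timed_out": false,
 "hits": {"total": 1, "max_score": 0.25811607,
  "hits": [{"_index": "movies", "_type": "movie", "_id": "1",
            "_score": 0.25811607,
            "_source": {"title": "The Godfather", "director": "Francis Ford Coppola",
                        "year": 1972, "genres": ["Crime", "Drama"]}}]}}"""

total, rows = summarize_hits(json.loads(raw))
print(total)  # 1
print(rows)   # [('1', 0.25811607, 'The Godfather')]
```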
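Checking whether a document exists
If all you want to know is whether a document exists, and you have no interest in its content, use the HEAD method instead of GET. A HEAD request returns no response body, only the HTTP headers:
curl -I "http://localhost:9200/movies/movie/3"
If the document exists, Elasticsearch returns a 200 OK status: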
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 255
If it does not exist, Elasticsearch returns 404 Not Found:
curl -I "http://localhost:9200/movies/movie/36"
HTTP/1.1 404 Not Found
content-type: application/json; charset=UTF-8
content-length: 60
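Of course, this only means the document did not exist at the moment you checked. It may well exist a few milliseconds later, because another process could create it in the meantime.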
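The same existence check can be made from Python's standard library. This is a minimal sketch assuming the old-style document URLs used in this article. doc_exists is a hypothetical helper, and, as noted above, its answer is only valid at the moment of the request.

```python
import urllib.error
import urllib.request

def doc_exists(url, timeout=5.0):
    """Hypothetical helper: True on HTTP 200, False on 404 for a HEAD request.

    Other HTTP errors (e.g. 500) are re-raised, because they say nothing
    about whether the document exists.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

# Requires a running cluster, e.g.:
# doc_exists("http://localhost:9200/movies/movie/3")
```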