Elasticsearch Basics

Some concepts

Index:

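An index is a collection of documents that share somewhat similar characteristics. For example, you might have one index for customer data, another for a product catalog, and another for order data. An index is identified by a name, which must be all lowercase. You use this name whenever you index, search, update, or delete the documents it contains.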

Type:

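A type is a logical category or partition within an index, and its semantics are entirely up to you. Typically you define one type for documents that share a common set of fields. For example, suppose you run a blogging platform and store all of its data in a single index. Within that index you could define one type for user data and another type for blog posts. (Mapping types were deprecated in Elasticsearch 6.0 and removed in 7.0. The examples in this article use the older syntax, which puts the type in the URL.)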

Document:

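A document is the basic unit of information that can be indexed. For example, you can have a document for a single customer, another for a single product, and another for a single order. Documents are expressed in JSON (JavaScript Object Notation).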

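Shards and replicas (shard & replicas):

An index can be split into several shards, each holding part of the index's documents, and the shards can be spread across the nodes of the cluster. Each of these primary shards can have replicas. A replica is a copy kept on another node, both for failover and to serve search requests.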

CURL operations

Health check: curl 'localhost:9200/_cat/health?v'

[root@localhost bin]#  curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1487839064 16:37:44  elk     yellow          1         1     16  16    0    0       16             0                  -                 50.0%

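Note on status:
- green: everything is fine and the cluster is fully functional.
- yellow: all data is available, but some replicas have not been allocated. The cluster is still fully functional.
- red: some data is unavailable for some reason.

The single-node cluster above is yellow because a replica is never allocated to the same node as its primary. All 16 replica shards therefore remain unassigned, which gives an active_shards_percent of 50.0%.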
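You can check the health numbers above with a small Python sketch. It parses the whitespace-aligned table that _cat/health?v prints (copied from the output above) and recomputes active_shards_percent. The formula is an assumption: active shard copies divided by all copies (active + initializing + unassigned). The helper names are hypothetical, and nothing here talks to a live cluster.

```python
# Output of `curl 'localhost:9200/_cat/health?v'`, copied from above.
HEALTH = """
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1487839064 16:37:44  elk     yellow          1         1     16  16    0    0       16             0                  -                 50.0%
"""

# What each status colour means.
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries are allocated, some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def parse_cat_table(text):
    """Turn a _cat table (header line + rows) into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def active_shards_percent(row):
    """Assumed formula: active copies / (active + initializing + unassigned)."""
    active = int(row["shards"])
    total = active + int(row["init"]) + int(row["unassign"])
    return 100.0 if total == 0 else 100.0 * active / total


row = parse_cat_table(HEALTH)[0]
print(row["status"], "->", STATUS_MEANING[row["status"]])
print(f"{active_shards_percent(row):.1f}%")  # 50.0%, matching the output
```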

List nodes: curl 'localhost:9200/_cat/nodes?v'

[root@localhost bin]# curl 'localhost:9200/_cat/nodes?v'
ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.16.24.105           45          59   1    0.00    0.01     0.05 mdi       *      node-jimmy

List indices: curl 'localhost:9200/_cat/indices?v'

[root@localhost bin]#  curl 'localhost:9200/_cat/indices?v'
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   megacorp                    nh2ECFHoSJe5mcJinMT16A   5   1          5            0     27.8kb         27.8kb

Note: pri: 5 primary shards; rep: 1 replica per primary shard; docs.count: 5 documents.
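The pri and rep columns together determine how many shard copies an index needs: each primary plus rep replicas of it. The Python sketch below works through that arithmetic for the megacorp row above. It assumes, as Elasticsearch does, that a node never holds two copies of the same shard. On a one-node cluster, every replica copy therefore stays unassigned. The helper names are hypothetical, and other allocation rules are ignored.

```python
# The megacorp row from `curl 'localhost:9200/_cat/indices?v'` above.
INDICES = """
health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   megacorp nh2ECFHoSJe5mcJinMT16A   5   1          5            0     27.8kb         27.8kb
"""

lines = INDICES.strip().splitlines()
row = dict(zip(lines[0].split(), lines[1].split()))
pri, rep = int(row["pri"]), int(row["rep"])


def shard_copies(pri, rep):
    """Every primary shard plus `rep` replicas of each one."""
    return pri * (1 + rep)


def assignable_copies(pri, rep, nodes):
    """Copies that can be allocated when a node holds at most one copy per shard."""
    return pri * min(1 + rep, nodes)


total = shard_copies(pri, rep)
assigned = assignable_copies(pri, rep, nodes=1)
print(total, assigned, total - assigned)  # 10 5 5 -> the index is yellow
```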

Create an index: curl -XPUT 'localhost:9200/customer?pretty'

[root@localhost bin]#   curl -XPUT 'localhost:9200/customer?pretty'
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

Delete an index: curl -XDELETE 'localhost:9200/customer?pretty'

[root@localhost ~]#   curl -XDELETE 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}

Create a document: curl -XPUT 'localhost:9200/customer/external/1?pretty' -d ' {"name": "John Doe"}'

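Note: to index a document, we must tell Elasticsearch which type within the index the document belongs to.
Example: index a simple customer document into the customer index under the "external" type, with document ID 1.
Elasticsearch 6.0 and later reject request bodies that lack a suitable Content-Type header. On newer versions, add -H 'Content-Type: application/json' to every curl command that uses -d or --data-binary.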

[root@localhost bin]# curl -XPUT 'localhost:9200/customer/external/1?pretty' -d ' {"name": "John Doe"}'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

Note: when you don't specify an ID, use POST, and Elasticsearch generates an ID automatically:
curl -XPOST 'localhost:9200/customer/external?pretty' -d ' {"name": "Jane Doe" }'
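You can also build these two requests with Python's standard library. This sketch only constructs urllib.request.Request objects and does not send them. BASE_URL (a local node on the default port 9200) and the helper name index_request are assumptions. The sketch also sets a Content-Type: application/json header, which newer Elasticsearch versions require.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def index_request(index, doc_type, doc, doc_id=None):
    """Build (but do not send) the request that indexes `doc`.

    With an ID this is PUT /index/type/id; without one it is
    POST /index/type, and Elasticsearch generates the ID.
    """
    if doc_id is None:
        url, method = f"{BASE_URL}/{index}/{doc_type}?pretty", "POST"
    else:
        url, method = f"{BASE_URL}/{index}/{doc_type}/{doc_id}?pretty", "PUT"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


put = index_request("customer", "external", {"name": "John Doe"}, doc_id=1)
post = index_request("customer", "external", {"name": "Jane Doe"})
print(put.get_method(), put.full_url)
print(post.get_method(), post.full_url)
# Sending would be: urllib.request.urlopen(put).read()
```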

Get a document: curl -XGET 'localhost:9200/customer/external/1?pretty'

[root@localhost ~]#   curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "John Doe"
  }
}

Update a document: curl -XPUT 'localhost:9200/customer/external/1?pretty' -d ' { "name": "Jane Doe" }'

[root@localhost ~]#  curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '  {  "name": "Jane Doe"  }'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : false
}
[root@localhost ~]# curl 'localhost:9200/_cat/indices?v'
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer                    AYGh32xWQSC3D6OWIwiVyQ   5   1          1            0        7kb            7kb
[root@localhost ~]#

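Note: _version became 2 and docs.count did not change. Indexing with an existing ID replaces the stored document instead of adding a new one.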

Delete a single document: curl -XDELETE 'localhost:9200/customer/external/2?pretty'

Note: specify the ID of the document to delete.

[root@localhost ~]#  curl -XDELETE 'localhost:9200/customer/external/2?pretty'
{
  "found" : true,
  "_index" : "customer",
  "_type" : "external",
  "_id" : "2",
  "_version" : 3,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  }
}

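Delete multiple documents (delete by query):
curl -XPOST 'localhost:9200/customer/_delete_by_query?pretty' -d '
{
"query": { "match": { "name": "John Doe" } }
}'

Note: this first finds every document that matches the query, then deletes them all. Since Elasticsearch 5.0 this is done with the built-in _delete_by_query API. The DELETE .../_query form shown in some older guides only works in pre-5.0 versions. Also note that match on a text field matches any document containing "john" or "doe", so "Jane Doe" would be deleted too. Use match_phrase to target the exact name.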

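Bulk create: create documents with IDs 21 and 22. The _bulk body is newline-delimited JSON: each action line is followed by the document's source line. The body must end with a newline.
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"21"}}
{"name": "John Doe" }
{"index":{"_id":"22"}}
{"name": "Jane Doe" }
'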

[root@localhost ~]#  curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
>         {"index":{"_id":"21"}}
>         {"name": "John Doe" }
>         {"index":{"_id":"22"}}
>         {"name": "Jane Doe" }
>         '
{
  "took" : 35,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "21",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "created" : true,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "22",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "created" : true,
        "status" : 201
      }
    }
  ]
}
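Because the _bulk body is newline-delimited JSON, it is easy to get the line structure wrong by hand. This minimal Python sketch (the helper name bulk_body is hypothetical) serializes action/source pairs into a body like the ones in this section. Each JSON object goes on its own line, and the body ends with a newline. Delete actions take no source line.

```python
import json


def bulk_body(actions):
    """Serialize (action, source_or_None) pairs into a _bulk request body."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))      # e.g. {"index": {"_id": "21"}}
        if source is not None:                # delete actions have no source
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"            # trailing newline is required


create = bulk_body([
    ({"index": {"_id": "21"}}, {"name": "John Doe"}),
    ({"index": {"_id": "22"}}, {"name": "Jane Doe"}),
])
update_and_delete = bulk_body([
    ({"update": {"_id": "21"}}, {"doc": {"name": "Jimmy"}}),
    ({"delete": {"_id": "22"}}, None),
])
print(create, end="")
print(update_and_delete, end="")
```

Send the resulting string with --data-binary, or as the request body from your HTTP client, so that the newlines survive.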

Bulk: one update and one delete
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"21"}}
{"doc": { "name": "Jimmy" } }
{"delete":{"_id":"22"}}
'

[root@localhost ~]#  curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
>         {"update":{"_id":"21"}}
>         {"doc": { "name": "Jimmy" } }
>         {"delete":{"_id":"22"}}
>         '
{
  "took" : 29,
  "errors" : false,
  "items" : [
    {
      "update" : {
        "_index" : "customer",
        "_type" : "external",
        "_id" : "21",
        "_version" : 2,
        "result" : "updated",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    },
    {
      "delete" : {
        "found" : true,
        "_index" : "customer",
        "_type" : "external",
        "_id" : "22",
        "_version" : 2,
        "result" : "deleted",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    }
  ]
}

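Note: the bulk API executes the actions in order. If one action fails for some reason, processing continues with the actions after it. When the bulk API returns, it reports the status of each action in the same order, so you can check whether each one succeeded.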
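A failed action does not stop the rest of the batch, so you have to check the response item by item. This Python sketch assumes the response has already been decoded into a dict. The top-level errors flag says whether anything failed, and an item carries an "error" object when its action failed. failed_items is a hypothetical helper, and the sample is a trimmed copy of the update/delete response above.

```python
# Trimmed copy of the update/delete bulk response shown above.
RESPONSE = {
    "took": 29,
    "errors": False,
    "items": [
        {"update": {"_index": "customer", "_type": "external", "_id": "21",
                    "_version": 2, "result": "updated", "status": 200}},
        {"delete": {"found": True, "_index": "customer", "_type": "external",
                    "_id": "22", "_version": 2, "result": "deleted", "status": 200}},
    ],
}


def failed_items(response):
    """Return (position, action, item) for each bulk item that failed."""
    if not response.get("errors"):
        return []  # fast path: the server reports no failures
    failures = []
    for position, entry in enumerate(response["items"]):
        # Each entry is a one-key dict such as {"update": {...}}.
        action, item = next(iter(entry.items()))
        if "error" in item:
            failures.append((position, action, item))
    return failures


print(failed_items(RESPONSE))  # []
```

Items come back in request order, so the position tells you which action in the request failed.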

Example

Download the sample data from https://github.com/bly2k/files/blob/master/accounts.zip?raw=true, unzip it, upload it to the server, and import it into ES:

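curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @/usr/share/elasticsearch/accounts.json

Note: use --data-binary rather than -d for bulk files. With -d @file, curl strips the newlines that the _bulk format depends on.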

Search API: _search

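Note: there are two basic ways to run a search. You can send the search parameters in the REST request URI, or send them in the REST request body. The request-body method is more expressive and lets you define the search in more readable JSON.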

① URI parameters: curl 'localhost:9200/bank/_search?q=*&pretty' returns all documents in the bank index

Note: _search runs the search against the bank index, and the parameter q=* tells Elasticsearch to match every document in that index.

[root@localhost ~]#     curl 'localhost:9200/bank/_search?q=*&pretty'
{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "25",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 25,
          "balance" : 40540,
          "firstname" : "Virginia",
          "lastname" : "Ayala",
          "age" : 39,
          "gender" : "F",
          "address" : "171 Putnam Avenue",
          "employer" : "Filodyne",
          "email" : "virginiaayala@filodyne.com",
          "city" : "Nicholson",
          "state" : "PA"
        }
      },
      ...

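Response fields:

   - took: how long Elasticsearch took to execute the search, in milliseconds
   - timed_out: whether the search timed out
   - _shards: how many shards were searched, and how many of them succeeded or failed
   - hits: the search results
   - hits.total: the total number of documents matching the query
   - hits.hits: the actual result documents (only the first 10 by default)
   - _score and max_score: ignore these fields for now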
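Once the JSON is decoded, reading these fields from code is straightforward. The Python sketch below works on a trimmed copy of the response above, and summarize_hits is a hypothetical name. One version caveat applies. hits.total is a plain number in the output shown here, while Elasticsearch 7.0 and later return an object such as {"value": 1000, "relation": "eq"}, so the sketch accepts both.

```python
# Trimmed copy of the `q=*` search response above (one hit kept).
RESPONSE = {
    "took": 40,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": 1.0,
        "hits": [
            {"_index": "bank", "_type": "account", "_id": "25", "_score": 1.0,
             "_source": {"account_number": 25, "balance": 40540,
                         "firstname": "Virginia", "lastname": "Ayala"}},
        ],
    },
}


def summarize_hits(response):
    """Return (total matches, [(id, source), ...] for the returned page)."""
    if response["timed_out"] or response["_shards"]["failed"]:
        print("warning: results may be partial")
    total = response["hits"]["total"]
    if isinstance(total, dict):  # 7.x+ format: {"value": ..., "relation": ...}
        total = total["value"]
    return total, [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


total, page = summarize_hits(RESPONSE)
print(total, len(page), page[0][0])  # 1000 1 25
```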

② Request body: curl -XPOST 'localhost:9200/bank/_search?pretty' -d '{"query": { "match_all": {} } }'

Note: the query part holds the query definition, and match_all is the type of query we want to run. A match_all query simply matches every document in the specified index.

Besides query, you can pass other parameters:
1.size: how many hits to return (defaults to 10)
[root@localhost ~]#  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match_all": {} },
          "size": 1
        }'

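2.from: the zero-based offset of the first hit to return (defaults to 0). Combined with size, it gives you pagination. This example returns hits 11 through 20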
[root@localhost ~]#  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
          "query": { "match_all": {} },
          "from": 10,
          "size": 10
        }'
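Because from is a zero-based offset and size is the page length, page n (counting from 1) starts at from = (n - 1) * size. The small Python sketch below builds the request body for a given page, using a hypothetical helper named page_body. By default, Elasticsearch rejects requests where from + size exceeds 10000 (the index.max_result_window setting), so the sketch checks that up front.

```python
import json

MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_body(page, page_size=10, query=None):
    """Search body returning the 1-based `page` of `page_size` hits."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return {
        "query": query or {"match_all": {}},
        "from": offset,
        "size": page_size,
    }


body = page_body(2)
print(json.dumps(body))  # {"query": {"match_all": {}}, "from": 10, "size": 10}
first, last = body["from"] + 1, body["from"] + body["size"]
print(first, last)  # 11 20
```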

3.sort: sorting. This example sorts by account balance in descending order and returns the top 10
[root@localhost ~]#    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match_all": {} },
          "sort": { "balance": { "order": "desc" } }
        }'

4._source: the fields to return. This example returns only account_number and balance
[root@localhost ~]#   curl -XGET 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": { "match_all": {} },
           "_source": ["account_number", "balance"]
         }'
{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "account",
        "_id" : "25",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 25,
          "balance" : 40540
        }
   }

5.match: query a specific field. This example returns the account whose account number is 20
[root@localhost ~]# curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
          "query": { "match": { "account_number": 20 } }
         }'
match: this example returns accounts whose address contains "mill" or "lane"
[root@localhost ~]# curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match": { "address": "mill lane" } }
        }' 

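6.match_phrase: this example matches the phrase "mill lane". It returns only documents whose address contains "mill" and "lane" next to each other, in that order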
[root@localhost ~]#   curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
        {
          "query": { "match_phrase": { "address": "mill lane" } }
        }'
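You can approximate the difference between match and match_phrase in a few lines of Python. This is a toy model, not Elasticsearch's implementation. The analyze function lowercases the text and splits it into words, as a rough stand-in for the standard analyzer. match succeeds if any query term is present (the default OR operator). match_phrase requires the terms to be adjacent and in order. The addresses are made-up examples, apart from 171 Putnam Avenue from the sample data.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def match(field_value, query_text):
    """match with the default OR operator: any query term is present."""
    terms = set(analyze(field_value))
    return any(term in terms for term in analyze(query_text))


def match_phrase(field_value, query_text):
    """match_phrase: the query terms appear consecutively, in order."""
    tokens, phrase = analyze(field_value), analyze(query_text)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


# Hypothetical addresses, except the last one, taken from the sample hit.
addresses = ["198 Mill Lane", "990 Mill Road", "789 Madison Lane", "171 Putnam Avenue"]
print([a for a in addresses if match(a, "mill lane")])
# ['198 Mill Lane', '990 Mill Road', '789 Madison Lane']
print([a for a in addresses if match_phrase(a, "mill lane")])
# ['198 Mill Lane']
```

The same model explains why a match query for "John Doe" also hits "Jane Doe": the shared term "doe" is enough.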
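7. bool: Boolean queries
A bool must clause requires every query in its list to be true for a document to match.
This example returns all accounts whose address contains both "mill" and "lane"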
[root@localhost ~]#    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "must": [
                 { "match": { "address": "mill" } },
                 { "match": { "address": "lane" } }
               ]
             }
           }
        }'
A bool should clause treats a document as matching if at least one query in its list matches.
This example returns all accounts whose address contains "mill" or "lane"
[root@localhost ~]#    curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "should": [
                 { "match": { "address": "mill" } },
                 { "match": { "address": "lane" } }
               ]
             }
           }
         }'
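A bool must_not clause treats a document as matching only if none of the queries in its list are true. This example returns all accounts whose address contains neither "mill" nor "lane"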
[root@localhost ~]#  curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "must_not": [
                 { "match": { "address": "mill" } },
                 { "match": { "address": "lane" } }
               ]
             }
           }
         }'
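You can combine must, should, and must_not in a single bool query.
This example returns the accounts of people who are exactly 40 years old and do not live in ID (Idaho)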
[root@localhost ~]#   curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
         {
           "query": {
             "bool": {
               "must": [
                 { "match": { "age": "40" } }
               ],
               "must_not": [
                 { "match": { "state": "ID" } }
               ]
             }
           }
         }'
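You can write down the matching rules of bool as plain predicates. This simplified Python sketch ignores scoring entirely and does not model filter clauses; bool_matches is a hypothetical helper. Every must clause has to hold, and no must_not clause may hold. should clauses are required (at least one of them) only when the query has no must clause. Next to a must clause, they only raise the score. The accounts are made-up records, not rows from the sample data.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Return True if `doc` satisfies a bool query built from predicates."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must:
        # Without must clauses, at least one should clause has to match.
        return any(clause(doc) for clause in should)
    return True  # any should clauses would only affect scoring here


def has_word(word):
    """Predicate: the lowercased address contains `word` as a whole word."""
    return lambda doc: word in doc["address"].lower().split()


# Hypothetical accounts for illustration.
accounts = [
    {"id": 1, "age": 40, "state": "ID", "address": "198 Mill Lane"},
    {"id": 2, "age": 40, "state": "PA", "address": "990 Mill Road"},
    {"id": 3, "age": 41, "state": "PA", "address": "789 Madison Lane"},
]

both = [a["id"] for a in accounts
        if bool_matches(a, must=[has_word("mill"), has_word("lane")])]
either = [a["id"] for a in accounts
          if bool_matches(a, should=[has_word("mill"), has_word("lane")])]
neither = [a["id"] for a in accounts
           if bool_matches(a, must_not=[has_word("mill"), has_word("lane")])]
combined = [a["id"] for a in accounts
            if bool_matches(a, must=[lambda d: d["age"] == 40],
                            must_not=[lambda d: d["state"] == "ID"])]
print(both, either, neither, combined)  # [1] [1, 2, 3] [] [2]
```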

Filters

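The _score field in the earlier search results is a relative measure of how well a document matches the query. The higher the score, the more relevant the document; the lower the score, the less relevant it is.
Every query in Elasticsearch triggers relevance scoring. For cases where relevance scores aren't needed, Elasticsearch offers another kind of query capability in the form of filters.
Filters are conceptually similar to queries, but they execute much faster, mainly for two reasons:

Filters don't compute relevance scores, so they are computationally cheaper
Filters can be cached in memory, which makes repeated searches much faster than the equivalent queries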

This example returns accounts with a balance between 20000 and 30000, inclusive ([20000, 30000])
curl -XGET "http://localhost:9200/bank/_search" -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'
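In a range clause, gte and lte are inclusive bounds, while gt and lt would be exclusive. The range [20000, 30000] therefore keeps both endpoints. Below is a toy Python sketch of that bound check; in_range is a hypothetical helper, and the balances are made-up values. The range sits in the filter clause, so it only decides which documents match. With match_all in must, every returned hit should get the same score.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Bound check mirroring a range query: gte/lte inclusive, gt/lt exclusive."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


balances = [19999, 20000, 25000, 30000, 30001]  # hypothetical values
print([b for b in balances if in_range(b, gte=20000, lte=30000)])  # [20000, 25000, 30000]
print([b for b in balances if in_range(b, gt=20000, lt=30000)])    # [25000]
```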

Last edited: 2017.12.06 00:35:15
