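The standard analyzer splits English text and numbers at word boundaries (mostly whitespace) and splits Chinese text into single characters. A dot between digits is not treated as a boundary, so an IP address comes out as a single term. For example, "the 192.168.0.1" is analyzed into [the, 192.168.0.1], and a search for "192" finds nothing. If you want IP addresses split on "." so that partial matching works, with a search for "192" finding 192.168.0.1, you have to define your own analyzer. Here is how.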
Defining the analyzer in the index settings through the REST API:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["word_delimiter"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_analyzer",
          "search_analyzer": "my_analyzer"
        }
      }
    }
  }
}
Creating the index with these settings through the Java API:
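// admin is assumed to be an IndicesAdminClient, e.g. client.admin().indices();
// mapping is assumed to hold the JSON body from the REST example.
CreateIndexRequest createIndexRequest = new CreateIndexRequest(fullIndexName);
createIndexRequest.source(mapping);
CreateIndexResponse res = admin.create(createIndexRequest).actionGet();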
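The Java snippet does not show what `mapping` contains. One plausible reading, and it is only an assumption, is that it holds the same JSON body as the REST request. The sketch below builds that string with the standard library only; `IndexBodyDemo` and `indexBody` are hypothetical names introduced for illustration.

```java
public class IndexBodyDemo {
    // Hypothetical helper: returns the REST example's JSON body as a string.
    static String indexBody() {
        return String.join("\n",
            "{",
            "  \"settings\": {",
            "    \"analysis\": {",
            "      \"analyzer\": {",
            "        \"my_analyzer\": {",
            "          \"type\": \"custom\",",
            "          \"tokenizer\": \"standard\",",
            "          \"filter\": [\"word_delimiter\"]",
            "        }",
            "      }",
            "    }",
            "  },",
            "  \"mappings\": {",
            "    \"my_type\": {",
            "      \"properties\": {",
            "        \"title\": {",
            "          \"type\": \"string\",",
            "          \"analyzer\": \"my_analyzer\",",
            "          \"search_analyzer\": \"my_analyzer\"",
            "        }",
            "      }",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Assumption: this string would play the role of `mapping`
        // in createIndexRequest.source(mapping).
        System.out.println(indexBody());
    }
}
```

Keeping the body in one place makes it easier to keep the REST and Java paths in sync, at the cost of hand-written JSON that is easy to unbalance.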
Testing the analyzer:
curl -XGET 'http://localhost:9200/my_index/_analyze?pretty=1&analyzer=my_analyzer' -d '192.168.10.10'
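The same check can be sketched from Java with the JDK's built-in HTTP client (Java 11 or later). This is a minimal sketch, not the post's own method: it assumes the text can be passed as a `text` query parameter, which older Elasticsearch releases accept but newer ones may not, and `AnalyzeRequestDemo`, `analyzeUri`, and `send` are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class AnalyzeRequestDemo {
    // Builds the _analyze URL. Passing the text as a `text` query parameter
    // is an assumption that depends on the Elasticsearch version in use.
    static URI analyzeUri(String baseUrl, String index, String analyzer, String text) {
        String encodedText = URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + index + "/_analyze?pretty=1&analyzer=" + analyzer
                + "&text=" + encodedText);
    }

    // Sends the request and returns the raw JSON response body.
    static String send(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        URI uri = analyzeUri("http://localhost:9200", "my_index", "my_analyzer", "192.168.10.10");
        System.out.println(uri);
        // With a node running locally, send(uri) would return the token list as JSON.
    }
}
```

The `main` method only prints the URL so that the example runs without a cluster; call `send(uri)` once a node is available.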
For example, take the text "我是huawei is 192.168.10.10".
The standard analyzer produces:
{
  "tokens" : [ {
    "token" : "我",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 0
  }, {
    "token" : "是",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "huawei",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "is",
    "start_offset" : 9,
    "end_offset" : 11,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "192.168.10.10",
    "start_offset" : 12,
    "end_offset" : 25,
    "type" : "<NUM>",
    "position" : 4
  } ]
}
my_analyzer produces:
{
  "tokens" : [ {
    "token" : "我",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 0
  }, {
    "token" : "是",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "huawei",
    "start_offset" : 2,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "is",
    "start_offset" : 9,
    "end_offset" : 11,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "192",
    "start_offset" : 12,
    "end_offset" : 15,
    "type" : "<NUM>",
    "position" : 4
  }, {
    "token" : "168",
    "start_offset" : 16,
    "end_offset" : 19,
    "type" : "<NUM>",
    "position" : 5
  }, {
    "token" : "10",
    "start_offset" : 20,
    "end_offset" : 22,
    "type" : "<NUM>",
    "position" : 6
  }, {
    "token" : "10",
    "start_offset" : 23,
    "end_offset" : 25,
    "type" : "<NUM>",
    "position" : 7
  } ]
}
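As the output shows, my_analyzer does what we set out to do: the IP address is split into separate terms, so a search for "192" can now match it.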
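To see why the split matters for search, here is a deliberately simplified sketch in plain Java. It is not how Lucene implements either component: `wholeTokens` only splits on whitespace (a rough stand-in for the standard analyzer on ASCII input), and `splitTokens` additionally splits on non-alphanumeric characters, ignoring the word_delimiter filter's other rules such as splitting on case changes. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IpTokenDemo {
    // Simplified stand-in for the standard analyzer on ASCII text:
    // split on whitespace only, so an IP address stays in one piece.
    static List<String> wholeTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Simplified stand-in for adding word_delimiter: split each token
    // further on any run of characters that are not ASCII letters or digits.
    static List<String> splitTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String token : wholeTokens(text)) {
            for (String part : token.split("[^\\p{Alnum}]+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "the 192.168.0.1";
        // A term lookup only succeeds if the exact term was produced at index time.
        System.out.println(wholeTokens(text).contains("192")); // prints false
        System.out.println(splitTokens(text).contains("192")); // prints true
    }
}
```

A term query is an exact lookup against the indexed terms, which is why "192" only matches once it exists as a term of its own.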