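1. Install the plugins

1.1 Install the plugins

Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Chinese (IK) analyzer: https://github.com/medcl/elasticsearch-analysis-ik

Install the plugin release that matches your own Elasticsearch version. The versions used in this post are: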

  • Elasticsearch 7.5.1
  • elasticsearch-analysis-ik 7.5.1
  • elasticsearch-analysis-pinyin 7.5.1

From the Elasticsearch installation directory, install the two plugins online, one after the other:

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.5.1/elasticsearch-analysis-ik-7.5.1.zip

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.5.1/elasticsearch-analysis-pinyin-7.5.1.zip

After installation, restart Elasticsearch, then test whether the analyzers work. If everything is fine, the response contains a list of tokens.
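Before testing the analyzers, it can help to confirm that both plugins were actually loaded after the restart. The following is a minimal Python sketch that reads the `_cat/plugins` API. The host `http://data:9200` follows the examples in this post, the helper names `missing_plugins` and `fetch_plugins` are made up for illustration, and the component names `analysis-ik` and `analysis-pinyin` are what I expect the plugins to register as, so treat them as an assumption.

```python
import json
import urllib.request

# Assumed component names as reported by _cat/plugins; adjust if yours differ.
EXPECTED_PLUGINS = {"analysis-ik", "analysis-pinyin"}


def missing_plugins(cat_rows, expected=EXPECTED_PLUGINS):
    """Return the expected plugin names absent from a _cat/plugins JSON listing."""
    installed = {row.get("component") for row in cat_rows}
    return sorted(set(expected) - installed)


def fetch_plugins(base_url="http://data:9200"):
    """Fetch the plugin listing as JSON (one row per node and plugin)."""
    with urllib.request.urlopen(base_url + "/_cat/plugins?format=json") as resp:
        return json.load(resp)
```

Calling `missing_plugins(fetch_plugins())` against a running node should return an empty list when both plugins were found. Note that this simple check does not distinguish between nodes in a multi-node cluster.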

1.2 Test the Chinese analyzer

POST http://data:9200/_analyze
{
    "analyzer":"ik_smart",
    "text":"新型冠状病毒"
}

Analysis result:

{
    "tokens": [
        {
            "token": "新型",
            "start_offset": 0,
            "end_offset": 2,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "冠状病毒",
            "start_offset": 2,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 1
        }
    ]
}
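If you would rather run this check from a script than from an HTTP client, the request and the token extraction could look roughly like this. It is a sketch using only the Python standard library; the function names `build_analyze_request` and `token_texts` are hypothetical, and the default host is simply the one used in this post.

```python
import json
import urllib.request


def build_analyze_request(analyzer, text, base_url="http://data:9200", index=None):
    """Build a POST request for the _analyze API without sending it."""
    path = "/_analyze" if index is None else "/" + index + "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(analyze_response):
    """Extract just the token strings from a parsed _analyze response."""
    return [t["token"] for t in analyze_response["tokens"]]
```

To send the request, pass it to `urllib.request.urlopen` and parse the body with `json.load`. For the response shown above, `token_texts` gives the two tokens 新型 and 冠状病毒.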

1.3 Test the pinyin analyzer

POST http://data:9200/_analyze
{
    "analyzer":"pinyin",
    "text":"新型冠状病毒"
}

Analysis result:

{
    "tokens": [
        {
            "token": "xin",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "xxgzbd",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 0
        },
        {
            "token": "xing",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 1
        },
        {
            "token": "guan",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 2
        },
        {
            "token": "zhuang",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 3
        },
        {
            "token": "bing",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 4
        },
        {
            "token": "du",
            "start_offset": 0,
            "end_offset": 0,
            "type": "word",
            "position": 5
        }
    ]
}
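One detail in this output is easy to miss: the first-letter token xxgzbd shares position 0 with xin, while the full-pinyin tokens occupy positions 0 through 5. Grouping the tokens by position makes this visible. The helper below is a small sketch of my own (the name `tokens_by_position` is hypothetical) that works on a parsed `_analyze` response.

```python
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group token strings by their position in the token stream."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    # Return a plain dict ordered by position for easy printing.
    return dict(sorted(grouped.items()))
```

Applied to the response above, position 0 holds both xin and xxgzbd, and every other position holds a single syllable.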

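2. Modify the analyzer

All operations below modify the analyzer settings of the song index.

2.1 Close the index

Close the index first. Analysis settings cannot be changed on an open index, so the update request would otherwise return an error.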

POST http://data:9200/song/_close
{

}

2.2 Configure IK + pinyin analysis

Next, define a custom analyzer. Here I use the ik_smart tokenizer followed by a pinyin token filter:

PUT http://data:9200/song/_settings
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": "pinyin_filter"
                }
            },
            "filter": {
                "pinyin_filter": {
                    "type": "pinyin",
                    "keep_first_letter": false
                }
            }
        }
    }
}

Alternatively, you can use the ik_max_word tokenizer with the same pinyin filter:

PUT http://data:9200/song/_settings
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": "pinyin_filter"
                }
            },
            "filter": {
                "pinyin_filter": {
                    "type": "pinyin",
                    "keep_first_letter": false
                }
            }
        }
    }
}
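The two request bodies differ only in the tokenizer name. If you generate the settings from a script, that single difference can be a parameter. Here is a minimal sketch; the function name `ik_pinyin_settings` is made up for illustration, and the analyzer and filter names are the ones used in this post.

```python
def ik_pinyin_settings(tokenizer="ik_smart"):
    """Build the _settings body for an IK tokenizer followed by a pinyin filter."""
    if tokenizer not in ("ik_smart", "ik_max_word"):
        raise ValueError("unsupported IK tokenizer: " + tokenizer)
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": tokenizer,
                        "filter": "pinyin_filter",
                    }
                },
                "filter": {
                    "pinyin_filter": {
                        "type": "pinyin",
                        "keep_first_letter": False,
                    }
                },
            }
        }
    }
```

Serializing the result with `json.dumps` yields a body equivalent to the ones shown above, with `False` written as JSON `false`.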

2.3 Open the index

POST http://data:9200/song/_open
{

}
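The whole of section 2 is three requests in a fixed order: close, update settings, open. For repeated use, this could be scripted roughly as follows. It is a sketch with the Python standard library only; the names `analyzer_update_steps`, `send`, and `apply_steps` are hypothetical, and `settings_body` stands for the JSON body from section 2.2 as a Python dict.

```python
import json
import urllib.request


def analyzer_update_steps(index, settings_body):
    """Return the (method, path, body) steps: close, update settings, reopen."""
    return [
        ("POST", "/" + index + "/_close", {}),
        ("PUT", "/" + index + "/_settings", settings_body),
        ("POST", "/" + index + "/_open", {}),
    ]


def send(base_url, method, path, body):
    """Send one step as JSON and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def apply_steps(base_url, steps):
    """Run the steps in order; urlopen raises HTTPError on a failed step."""
    return [send(base_url, method, path, body) for method, path, body in steps]
```

A call such as `apply_steps("http://data:9200", analyzer_update_steps("song", settings_body))` would run all three requests. Because an error stops the sequence, a failed settings update leaves the index closed, so in that case you need to reopen it by hand.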

3. Configure the mapping (fields)

Work in progress.

Last modified: February 3, 2020