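Jieba Chinese word-segmentation plugin for Elasticsearch, modified from sing1ee/elasticsearch-jieba-plugin. Changed to Maven packaging for the maintainer's own needs, with MySQL-backed hot updating of dictionaries added.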

liusanp/elasticsearch-jieba-plugin


elasticsearch-jieba-plugin

Forked from: sing1ee/elasticsearch-jieba-plugin

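Features

  • Dictionaries can be added dynamically without restarting Elasticsearch.
  • Adds custom configuration for hot updates.

Dynamic dictionary loading (no ES restart required)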

# Dictionary load type: mysql | local
loadType=local
# Dictionary reload interval in minutes. Default: 30
gapTime=30
# MySQL connection
mysql.driver=com.mysql.cj.jdbc.Driver
mysql.url=jdbc:mysql://121.0.0.1:3306/test?serverTimezone=Asia/Shanghai&characterEncoding=utf-8&useSSL=true
mysql.username=root
mysql.password=root
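
As a rough illustration of how the settings above could be read and sanity-checked, here is a minimal Python sketch. It is not the plugin's own loader (the plugin is written in Java); the function names `parse_properties` and `validate` are hypothetical, and the only rules it applies are the ones stated in the comments of the config itself (`loadType` is `mysql` or `local`, `gapTime` is in minutes and defaults to 30).

```python
from typing import Dict, Tuple

# Values named in the config comment: "mysql | local".
ALLOWED_LOAD_TYPES = {"mysql", "local"}
DEFAULT_GAP_MINUTES = 30  # default stated in the config comment


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as the JDBC URL
        # keep their own "=" characters.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def validate(props: Dict[str, str]) -> Tuple[str, int]:
    """Return (loadType, gapTime in minutes) or raise ValueError."""
    load_type = props.get("loadType")
    if load_type not in ALLOWED_LOAD_TYPES:
        raise ValueError(f"loadType must be one of {sorted(ALLOWED_LOAD_TYPES)}")
    gap = int(props.get("gapTime", DEFAULT_GAP_MINUTES))
    if gap <= 0:
        raise ValueError("gapTime must be a positive number of minutes")
    return load_type, gap
```

Splitting on the first `=` matters here because the `mysql.url` value carries query parameters of its own.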
  • When hot-updating local dictionaries, each newly added dictionary file must use a different file name.
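
The file-name requirement is easier to picture with a small sketch. The plugin's actual reload logic is not shown in this README, so the following is only an assumption: a loader that remembers which `.dict` file names it has already loaded and picks up only unseen names on each pass. Under that assumption, editing an existing file in place would go unnoticed, while adding a file under a new name would be loaded. The function name `find_new_dict_files` is hypothetical.

```python
import os
from typing import List, Set


def find_new_dict_files(dict_dir: str, loaded: Set[str]) -> List[str]:
    """Return .dict file names not seen before and record them in `loaded`.

    ASSUMPTION: files are tracked by name only, which is one plausible
    reason a new dictionary needs a new file name.
    """
    new_files = sorted(
        name
        for name in os.listdir(dict_dir)
        if name.endswith(".dict") and name not in loaded
    )
    loaded.update(new_files)
    return new_files
```

A scheduler would call this once per `gapTime` interval and load only the returned files.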

More details

  • Choose the right version of the source code.
  • Clone and build:
git clone https://github.com/liusanp/elasticsearch-jieba-plugin.git
cd elasticsearch-jieba-plugin
mvn clean package
  • Copy the zip file to the plugins directory:
cp target/releases/elasticsearch-jieba-plugin-6.8.8-bin.zip ${path.home}/plugins
  • In the plugins directory, unzip the archive and remove the zip file:
unzip elasticsearch-jieba-plugin-6.8.8-bin.zip
rm elasticsearch-jieba-plugin-6.8.8-bin.zip
  • Start Elasticsearch from ${path.home}:
./bin/elasticsearch

Custom User Dict

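Put your dictionary file, with the suffix .dict, into ${path.home}/plugins/jieba/dic. Each line holds a word and its frequency separated by a space, as in the entries below (the last line shows the general form):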

小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq
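
Before dropping a dictionary into the plugin directory, it can help to check that every line follows the `your_word word_freq` shape. The sketch below is a hypothetical pre-flight check in Python, not part of the plugin. It assumes a single whitespace-separated word followed by an integer frequency, which is what the sample entries show.

```python
from typing import Dict


def parse_user_dict(text: str) -> Dict[str, int]:
    """Parse 'word freq' lines into a dict, raising on malformed lines."""
    entries: Dict[str, int] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'word freq', got {raw!r}")
        word, freq_text = parts
        try:
            entries[word] = int(freq_text)
        except ValueError:
            raise ValueError(f"line {line_no}: frequency is not an integer: {freq_text!r}")
    return entries
```

Note that the literal template line `your_word word_freq` would be rejected by this check, since `word_freq` is a placeholder rather than a number.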

Using stopwords

  • Find stopwords.txt in ${path.home}/plugins/jieba/dic.
  • Create a folder named stopwords under ${path.home}/config:
mkdir -p ${path.home}/config/stopwords
  • Copy stopwords.txt into the folder just created:
cp ${path.home}/plugins/jieba/dic/stopwords.txt ${path.home}/config/stopwords
  • Create the index. The jieba_synonym filter below also expects synonyms/synonyms.txt to exist under ${path.home}/config:
PUT http://localhost:9200/jieba_index
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
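
The same index-creation call can be scripted. The following Python sketch builds the request with the standard library only; the helper name `build_create_index_request` is hypothetical, and the URL and index name are the ones used in the example above. It only constructs the request, so nothing is sent until `urllib.request.urlopen` is called on it.

```python
import json
import urllib.request

# Same settings as the JSON body above.
SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "jieba_stop": {
                    "type": "stop",
                    "stopwords_path": "stopwords/stopwords.txt",
                },
                "jieba_synonym": {
                    "type": "synonym",
                    "synonyms_path": "synonyms/synonyms.txt",
                },
            },
            "analyzer": {
                "my_ana": {
                    "tokenizer": "jieba_index",
                    "filter": ["lowercase", "jieba_stop", "jieba_synonym"],
                }
            },
        }
    }
}


def build_create_index_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates the index."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=data,
        method="PUT",
        # Elasticsearch 6.x rejects request bodies without a content type.
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request("http://localhost:9200", "jieba_index", SETTINGS)
# To actually send it against a running node:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```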
  • Test the analyzer:
POST http://localhost:9200/jieba_index/_analyze
{
  "analyzer" : "my_ana",
  "text" : "黄河之水天上来"
}

The response looks like this:

{
    "tokens": [
        {
            "token": "黄河",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "黄河之水天上来",
            "start_offset": 0,
            "end_offset": 7,
            "type": "word",
            "position": 0
        },
        {
            "token": "之水",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "天上",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "上来",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 2
        }
    ]
}
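
The offsets in the response can be read as slice positions into the input text, and the tokens may overlap (for example 天上 at 4-6 and 上来 at 5-7). The Python sketch below, with the hypothetical helper `mismatched_tokens`, checks that each token equals the slice its offsets describe. This assumes the text contains only characters in the Basic Multilingual Plane, where Python string indices and Elasticsearch offsets coincide, and that no filter (such as a synonym filter) has rewritten the token text.

```python
from typing import Dict, List

TEXT = "黄河之水天上来"

# Tokens copied from the response above (type and position omitted).
TOKENS = [
    {"token": "黄河", "start_offset": 0, "end_offset": 2},
    {"token": "黄河之水天上来", "start_offset": 0, "end_offset": 7},
    {"token": "之水", "start_offset": 2, "end_offset": 4},
    {"token": "天上", "start_offset": 4, "end_offset": 6},
    {"token": "上来", "start_offset": 5, "end_offset": 7},
]


def mismatched_tokens(text: str, tokens: List[Dict]) -> List[str]:
    """Return tokens whose offsets do not slice back to the token text."""
    return [
        t["token"]
        for t in tokens
        if text[t["start_offset"]:t["end_offset"]] != t["token"]
    ]


# An empty list means every token lines up with its offsets.
problems = mismatched_tokens(TEXT, TOKENS)
```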
