elasticsearch-jieba-plugin

Forked from：sing1ee/elasticsearch-jieba-plugin

特点

支持动态添加字典，不重启ES。
添加热更新自定义配置

支持动态添加字典，ES不需要重启

# 字典加载方式：mysql | local
loadType=local
# 字典加载时间：分钟。默认30分钟
gapTime=30
# Mysql连接
mysql.driver=com.mysql.cj.jdbc.Driver
mysql.url=jdbc:mysql://121.0.0.1:3306/test?serverTimezone=Asia/Shanghai&characterEncoding=utf-8&useSSL=true
mysql.username=root
mysql.password=root

选择热更新本地字典时，新添加的字典文件需要不同文件名

more details

choose right version source code.
run

git clone https://github.com/liusanp/elasticsearch-jieba-plugin.git
mvn clean package

copy the zip file to plugin directory

cp target/releases/elasticsearch-jieba-plugin-6.8.8-bin.zip ${path.home}/plugins

unzip and rm zip file

unzip elasticsearch-jieba-plugin-6.8.8-bin.zip
rm elasticsearch-jieba-plugin-6.8.8-bin.zip

start elasticsearch

./bin/elasticsearch

Custom User Dict

Just put you dict file with suffix .dict into ${path.home}/plugins/jieba/dic. Your dict file should like this:

小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq

Using stopwords

find stopwords.txt in ${path.home}/plugins/jieba/dic.
create folder named stopwords under ${path.home}/config

mkdir -p {path.home}/config/stopwords

copy stopwords.txt into the folder just created

cp ${path.home}/plugins/jieba/dic/stopwords.txt {path.home}/config/stopwords

create index:

PUT http://localhost:9200/jieba_index

{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type":        "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type":        "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}

test analyzer:

PUT http://localhost:9200/jieba_index/_analyze
{
  "analyzer" : "my_ana",
  "text" : "黄河之水天上来"
}

Response as follow:

{
    "tokens": [
        {
            "token": "黄河",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "黄河之水天上来",
            "start_offset": 0,
            "end_offset": 7,
            "type": "word",
            "position": 0
        },
        {
            "token": "之水",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "天上",
            "start_offset": 4,
            "end_offset": 6,
            "type": "word",
            "position": 2
        },
        {
            "token": "上来",
            "start_offset": 5,
            "end_offset": 7,
            "type": "word",
            "position": 2
        }
    ]
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
src/main		src/main
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

elasticsearch-jieba-plugin

特点

支持动态添加字典，ES不需要重启

more details

Custom User Dict

Using stopwords

About

Releases 1

Packages

Languages

License

liusanp/elasticsearch-jieba-plugin

Folders and files

Latest commit

History

Repository files navigation

elasticsearch-jieba-plugin

特点

支持动态添加字典，ES不需要重启

more details

Custom User Dict

Using stopwords

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages