Commit 110bcda

update 070_Index_Mgmt/05_Create_Delete.md 070_Index_Mgmt/10_Settings.md 070_Index_Mgmt/15_Configure_Analyzer.md

sailxjx committed Apr 2, 2015
1 parent 52ae342 commit 110bcda
Showing 7 changed files with 72 additions and 126 deletions.
2 changes: 1 addition & 1 deletion 040_Distributed_CRUD/35_Bulk_format.md
@@ -6,7 +6,7 @@

Each document referenced in a bulk request may belong to a different primary shard, and each shard may be allocated to any node in the cluster. This means that every **action** in a bulk request needs to be forwarded to the correct shard on the correct node.

If each individual request were wrapped in a JSON array, that would mean we would need to:

* Parse the JSON into an array (including the document data, which can be very large)
* Look at each request to determine which shard it should go to
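For context, the bulk API sidesteps these costs by using a newline-delimited format, in which each action and its source document sit on their own lines and can be routed without re-serializing the whole request. A minimal sketch (the index, type, and field names here are hypothetical):

```
POST /_bulk
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My first bulk-indexed post" }
```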
2 changes: 1 addition & 1 deletion 050_Search/15_Pagination.md
@@ -28,4 +28,4 @@ GET /_search?size=5&from=10
> ### TIP
>In the "Reindexing" chapter we will explain how to efficiently retrieve large numbers of documents.
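The technique alluded to there is scan-and-scroll. A minimal sketch, assuming the 1.x-era API this guide covers (`search_type=scan` plus a `scroll` keep-alive):

```
GET /old_index/_search?search_type=scan&scroll=1m
{
    "query": { "match_all": {} },
    "size":  1000
}
```

Each subsequent `GET /_search/scroll` request made with the returned `_scroll_id` fetches the next batch, avoiding the cost of deep `from` offsets.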
20 changes: 8 additions & 12 deletions 052_Mapping_Analysis/50_Complex_datatypes.md
@@ -57,11 +57,9 @@

### Mapping for inner objects

Elasticsearch will detect new object fields dynamically and map them as type `object`, with each inner field listed under `properties`:

```javascript
{
"gb": {
"tweet": { <1>
@@ -87,7 +85,8 @@
}
}
}
```

<1> Root object.
<2> Inner object.
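For reference, here is a sketch of the kind of document that would produce such a mapping (reconstructed to match the flattened fields shown below; the exact values are assumptions):

```javascript
PUT /gb/tweet/1
{
    "tweet":      "Elasticsearch is very flexible",
    "user": {
        "id":     "@johnsmith",
        "gender": "male",
        "age":    26,
        "name": {
            "full":  "John Smith",
            "first": "John",
            "last":  "Smith"
        }
    }
}
```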

@@ -98,16 +97,15 @@

The mapping for the `user` and `name` fields is much the same as the mapping for the `tweet` type itself. In fact, the `type` mapping is just a special kind of `object` mapping, which we refer to as the _root object_. It is just the same as any other object, except that it has some special top-level fields for document metadata, such as `_source` and the `_all` field.

### How inner objects are indexed

Lucene doesn't understand inner objects. A Lucene document consists of a flat
list of key-value pairs. In order for Elasticsearch to index inner objects
usefully, it converts our document into something like this:

```javascript
{
"tweet": [elasticsearch, flexible, very],
"user.id": [@johnsmith],
@@ -117,8 +115,7 @@
"user.name.first": [john],
"user.name.last": [smith]
}
```

_Inner fields_ can be referred to by name, e.g. `"first"`. To distinguish
between two fields that have the same name we can use the full _path_,
@@ -172,4 +169,3 @@
Correlated inner objects, which are able to answer queries like these,
are called _nested_ objects, and we will discuss them later on in
<<nested-objects>>.
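As a preview (a hypothetical sketch; the details belong to <<nested-objects>>), marking a field as `nested` rather than `object` in the mapping is what preserves those correlations:

```javascript
PUT /my_index
{
    "mappings": {
        "blogpost": {
            "properties": {
                "comments": {
                    "type": "nested"
                }
            }
        }
    }
}
```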

63 changes: 22 additions & 41 deletions 070_Index_Mgmt/05_Create_Delete.md
@@ -1,12 +1,10 @@
### Creating an Index

Until now, we have created a new index by simply indexing a document into it. The index was created with the default settings, and new fields were added to the type mapping by using dynamic mapping. Now we need more control over the process: we want to ensure that the index is created with the appropriate number of primary shards, and that analyzers and mappings are set up _before_ we index any data.

To do this, we have to create the index manually, passing in any settings or type mappings in the request body, as follows:

```
PUT /my_index
{
"settings": { ... any settings ... },
@@ -15,53 +13,36 @@
"type_two": { ... any mappings ... },
...
}
}
```

In fact, if you want to, you can prevent the automatic creation of indices by adding the following setting to the `config/elasticsearch.yml` file on each node:

```yml
action.auto_create_index: false
```
> **NOTE**
> Later, we will discuss how you can use <<index-templates>> to preconfigure automatically created indices. This is particularly useful when indexing log data: you log into an index whose name includes the date and, as midnight rolls over, a new, properly configured index automatically springs into existence.
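As a taste of what is to come, a minimal sketch of such a template (the template name and pattern are hypothetical; the syntax follows the 1.x template API):

```
PUT /_template/my_logs
{
    "template": "logs-*",
    "settings": {
        "number_of_shards": 1
    }
}
```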
### Deleting an Index

To delete an index, use the following request:

```
DELETE /my_index
```

You can delete multiple indices with this:

```
DELETE /index_one,index_two
DELETE /index_*
```

You can even delete _all_ indices with this:

```
DELETE /_all
```
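If the ease of deleting everything in one request makes you nervous, note that (in the 1.x line, stated here as an assumption) a node setting can require explicit index names for destructive operations, disabling `_all` and wildcards:

```yml
action.destructive_requires_name: true
```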
45 changes: 17 additions & 28 deletions 070_Index_Mgmt/10_Settings.md
@@ -1,51 +1,40 @@
### Index Settings

There are many knobs that you can twiddle to customize index behavior, which you can read about in the [Index Modules reference documentation](http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_index_settings.html#_index_settings), but:

TIP: Elasticsearch comes with good defaults. Don't twiddle these knobs until you understand what they do and why you should change them.

Two of the most important settings are as follows:

`number_of_shards`

The number of primary shards that an index should have, which defaults to `5`. This setting cannot be changed after index creation.

`number_of_replicas`

The number of replica shards (copies) that each primary shard should have, which defaults to `1`. This setting can be changed at any time on a live index.

For instance, we could create a small index, with just one primary shard and no replica shards, with the following request:

```
PUT /my_temp_index
{
"settings": {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
```

<!-- SENSE: 070_Index_Mgmt/10_Settings.json -->

Later, we can change the number of replica shards dynamically using the `update-index-settings` API as follows:

```
PUT /my_temp_index/_settings
{
"number_of_replicas": 1
}
```

<!-- SENSE: 070_Index_Mgmt/10_Settings.json -->
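To confirm that a change took effect, you can read the live settings back (a usage sketch):

```
GET /my_temp_index/_settings
```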
63 changes: 22 additions & 41 deletions 070_Index_Mgmt/15_Configure_Analyzer.md
@@ -1,36 +1,21 @@
### Configuring Analyzers

The third important index setting is the `analysis` section, which is used to configure existing analyzers or to create new custom analyzers specific to your index.

In <<analysis-intro>>, we introduced some of the built-in analyzers, which are used to convert full-text strings into an inverted index suitable for searching.

The `standard` analyzer, which is the default analyzer used for full-text fields, is a good choice for most Western languages. It consists of the following:

* The `standard` tokenizer, which splits the input text on word boundaries
* The `standard` token filter, which is intended to tidy up the tokens emitted by the tokenizer (but currently does nothing)
* The `lowercase` token filter, which converts all tokens into lowercase
* The `stop` token filter, which removes stopwords: common words that have little impact on search relevance, such as `a`, `the`, `and`, `is`

By default, the `stop` token filter is disabled. You can enable it by creating a custom analyzer based on the `standard` analyzer and setting the `stopwords` parameter. Either provide a list of stopwords or tell it to use a predefined stopwords list for a particular language.
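For instance, here is a sketch of the explicit-list form (`my_std` and the index name are invented for illustration); the example that follows uses a predefined language list instead:

```
PUT /my_index
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_std": {
                    "type":      "standard",
                    "stopwords": [ "and", "the" ]
                }
            }
        }
    }
}
```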

In the following example, we create a new analyzer called the `es_std` analyzer, which uses the predefined list of Spanish stopwords:

```
PUT /spanish_docs
{
"settings": {
@@ -44,31 +29,27 @@
}
}
}
```

<!-- SENSE: 070_Index_Mgmt/15_Configure_Analyzer.json -->

The `es_std` analyzer is not global: it exists only in the `spanish_docs` index where we have defined it. To test it with the `analyze` API, we must specify the index name:

```
GET /spanish_docs/_analyze?analyzer=es_std
El veloz zorro marrón
```

<!-- SENSE: 070_Index_Mgmt/15_Configure_Analyzer.json -->

The abbreviated results show that the Spanish stopword `El` has been removed correctly:

```
{
"tokens" : [
{ "token" : "veloz", "position" : 2 },
{ "token" : "zorro", "position" : 3 },
{ "token" : "marrón", "position" : 4 }
]
}
```
3 changes: 1 addition & 2 deletions 070_Index_Mgmt/20_Custom_Analyzers.md
@@ -1,5 +1,4 @@
### Custom Analyzers

While Elasticsearch comes with a number of analyzers available out of the box,
the real power comes from the ability to create your own custom analyzers
