The simplest multi-field query to deal with is the one where we can map
search terms to specific fields. If we know that War and Peace'' is the
title, and
Leo Tolstoy'' is the author, it is easy to write each of these
conditions as a match
clause and to combine them with a bool
query:
GET /_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "War and Peace" }},
{ "match": { "author": "Leo Tolstoy" }}
]
}
}
}
The bool
query takes a more-matches-is-better approach, so the score from
each match
clause will be added together to provide the final _score
for
each document. Documents that match both clauses will score higher than
documents which match just one clause.
Of course, you’re not restricted to using just match
clauses: the bool
query can wrap any other query type, including other bool
queries. We could
add a clause to specify that we prefer to see versions of the book that have
been translated by specific translators:
GET /_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "War and Peace" }},
{ "match": { "author": "Leo Tolstoy" }},
{ "bool": {
"should": [
{ "match": { "translator": "Constance Garnett" }},
{ "match": { "translator": "Louise Maude" }}
]
}}
]
}
}
}
Why did we put the translator clauses inside a separate bool
query? All four
match
queries are should
clauses, so why didn’t we just put the translator
clauses at the same level as the title and author clauses?
The answer lies in how the score is calculated. The bool
query runs each
match
query, adds their scores together, then multiplies by the number of
matching clauses, and divides by the total number of clauses. Each clause at
the same level has the same weight. In the above query, the bool
query
containing the translator clauses counts for ⅓ of the total score. If we had
put the translator clauses at the same level as title and author, then they
would have reduced the contribution of the title and author clauses to ¼ each.
It is likely that an even ⅓ split between clauses is not what we need for the above query. Probably we’re more interested in the title and author clauses then we are in the translator clauses. We need to tune the query to make the title and author clauses relatively more important.
The simplest weapon in our tuning arsenal is the boost
parameter. To
increase the weight of the title
and author
fields, give them a boost
value higher than 1
:
GET /_search
{
"query": {
"bool": {
"should": [
{ "match": { (1)
"title": {
"query": "War and Peace",
"boost": 2
}}},
{ "match": { (1)
"author": {
"query": "Leo Tolstoy",
"boost": 2
}}},
{ "bool": { (2)
"should": [
{ "match": { "translator": "Constance Garnett" }},
{ "match": { "translator": "Louise Maude" }}
]
}}
]
}
}
}
-
The
title
andauthor
clauses have aboost
value of 2. -
The nested
bool
clauses has the defaultboost
of 1.
The `best'' value for the `boost
parameter is most easily determined by
trial and error: set a boost
value, run test queries, repeat. A reasonable
range for boost
lies between 1
and 10
, maybe 15
. Boosts higher than
that have little more impact because scores are
normalized.