If we could search for only one word at a time, full-text search would be
pretty inflexible. Fortunately, the match query makes multi-word queries
just as simple:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "BROWN DOG!"
        }
    }
}
The above query returns all four documents in the results list:
{
    "hits": [
        {
            "_id": "4",
            "_score": 0.73185337, (1)
            "_source": {
                "title": "Brown fox brown dog"
            }
        },
        {
            "_id": "2",
            "_score": 0.47486103, (2)
            "_source": {
                "title": "The quick brown fox jumps over the lazy dog"
            }
        },
        {
            "_id": "3",
            "_score": 0.47486103, (2)
            "_source": {
                "title": "The quick brown fox jumps over the quick dog"
            }
        },
        {
            "_id": "1",
            "_score": 0.11914785, (3)
            "_source": {
                "title": "The quick brown fox"
            }
        }
    ]
}
(1) Doc 4 is the most relevant because it contains "brown" twice and "dog"
once.
(2) Docs 2 and 3 both contain "brown" and "dog" once each, and the title
field is the same length in both docs, so they have the same score.
(3) Doc 1 matches even though it contains only "brown", not "dog".
Because the match query has to look for two terms (["brown","dog"]),
internally it has to execute two term queries and combine their individual
results into the overall result. To do this, it wraps the two term queries
in a bool query, which we will examine in detail in [bool-query] below.
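The rewrite described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: analyze is a hypothetical,
much-simplified stand-in for the real analyzer (it lowercases and splits on
anything that is not an ASCII letter or digit), and rewrite_match is a made-up
helper name. The actual rewrite happens inside Elasticsearch.

```python
import re

def analyze(text):
    # Assumption: a crude approximation of analysis, not the real analyzer.
    # Lowercase, then split on every run of non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def rewrite_match(field, text):
    # Hypothetical helper: one term query per analyzed term, wrapped in a
    # bool query whose "should" clauses mean that any one of them may match.
    return {"bool": {"should": [{"term": {field: t}} for t in analyze(text)]}}

query = rewrite_match("title", "BROWN DOG!")
print(query)
```

Note how the uppercase letters and the exclamation mark disappear before any
term query is built, which is why "BROWN DOG!" finds the lowercase terms
"brown" and "dog".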
The important thing to take away from the above is that any document whose
title field contains at least one of the specified terms will match the
query. The more terms that match, the more relevant the document.
Matching any document that contains any of the query terms may result in a
long tail of seemingly irrelevant results. It’s a shotgun approach to search.
Perhaps we only want to show documents that contain all of the query terms.
In other words, instead of "brown OR dog" we only want to return documents
that match "brown AND dog".
The match query accepts an operator parameter, which defaults to "or".
You can change it to "and" to require that all specified terms must match:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": { (1)
                "query": "BROWN DOG!",
                "operator": "and"
            }
        }
    }
}
(1) The structure of the match query has to change slightly in order to
accommodate the operator parameter.
This query would exclude document 1, which contains only one of the two terms.
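To make the difference between the two operators concrete, here is a small
Python simulation over the four example titles. It is a sketch rather than a
description of how Elasticsearch evaluates the query: tokens and matching_ids
are hypothetical helpers, the tokenization is a rough approximation of
analysis, and scoring is ignored entirely.

```python
import re

# The four example titles from the results above, keyed by document ID.
DOCS = {
    "1": "The quick brown fox",
    "2": "The quick brown fox jumps over the lazy dog",
    "3": "The quick brown fox jumps over the quick dog",
    "4": "Brown fox brown dog",
}

def tokens(text):
    # Assumption: lowercasing and keeping alphanumeric runs approximates
    # what the analyzer does to both the titles and the query string.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matching_ids(query, operator="or"):
    wanted = tokens(query)
    # "and" requires every query term; "or" is satisfied by any one of them.
    combine = all if operator == "and" else any
    return sorted(
        doc_id for doc_id, title in DOCS.items()
        if combine(term in tokens(title) for term in wanted)
    )

any_term = matching_ids("BROWN DOG!")          # operator defaults to "or"
all_terms = matching_ids("BROWN DOG!", "and")
print(any_term, all_terms)
```

With "or" all four IDs come back; with "and" document 1 drops out because its
title lacks "dog".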
The choice between all and any is a bit too black-or-white. What if the
user specified five query terms and a document contains only four of them?
Setting "operator" to "and" would exclude this document.
Sometimes that is exactly what you want, but for most full-text search use cases, you want to include documents that may be relevant but exclude those that are unlikely to be relevant. In other words, we need something in between.
The match query supports the minimum_should_match parameter, which allows
you to specify how many terms must match for a document to be considered
relevant. While you can specify an absolute number of terms, it usually makes
sense to specify a percentage instead, as you have no control over how many
words the user may enter:
GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": {
                "query": "quick brown dog",
                "minimum_should_match": "75%"
            }
        }
    }
}
When specified as a percentage, minimum_should_match does the right thing:
in the example above with three terms, 75% would be rounded down to 66.6%,
or two out of the three terms. No matter what you set it to, at least one term
must match for a document to be considered a match.
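The rounding described above amounts to simple integer arithmetic. The Python
sketch below covers only positive, whole-number percentages; required_matches
is a hypothetical helper, and the other forms that minimum_should_match
accepts are deliberately not modeled.

```python
def required_matches(num_terms, percent):
    # Integer division rounds down: 3 terms at 75% is 2.25, which becomes 2.
    required = (num_terms * percent) // 100
    # Per the text, at least one term must always match.
    return max(1, required)

two_of_three = required_matches(3, 75)
three_of_five = required_matches(5, 75)
one_of_one = required_matches(1, 50)
print(two_of_three, three_of_five, one_of_one)
```

Rounding down keeps the threshold lenient for short queries: a five-term
query at 75% needs three matching terms rather than four.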
The minimum_should_match parameter is very flexible, and different rules can
be applied depending on the number of terms the user enters. For the full
documentation see the
{ref}/query-dsl-minimum-should-match.html[minimum_should_match reference documentation].
To fully understand how the match query handles multi-word queries, we need
to look at how to combine multiple queries with the bool query.