Multi-word queries

If we could search for only one word at a time, full-text search would be pretty inflexible. Fortunately, the match query makes multi-word queries just as simple:

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "BROWN DOG!"
        }
    }
}

The above query returns all four documents in the results list:

{
  "hits": [
     {
        "_id":      "4",
        "_score":   0.73185337, (1)
        "_source": {
           "title": "Brown fox brown dog"
        }
     },
     {
        "_id":      "2",
        "_score":   0.47486103, (2)
        "_source": {
           "title": "The quick brown fox jumps over the lazy dog"
        }
     },
     {
        "_id":      "3",
        "_score":   0.47486103, (2)
        "_source": {
           "title": "The quick brown fox jumps over the quick dog"
        }
     },
     {
        "_id":      "1",
        "_score":   0.11914785, (3)
        "_source": {
           "title": "The quick brown fox"
        }
     }
  ]
}
  1. Doc 4 is the most relevant because it contains "brown" twice and "dog" once.

  2. Docs 2 and 3 both contain "brown" and "dog" once each and the title field is the same length in both docs, so they have the same score.

  3. Doc 1 matches even though it only contains "brown", not "dog".

Because the match query has to look for two terms, ["brown","dog"], it internally executes two term queries and combines their individual results into the overall result. To do this, it wraps the two term queries in a bool query, which we examine in detail in [bool-query] below.

The important thing to take away from the above is that any document whose title field contains at least one of the specified terms will match the query. The more terms that match, the more relevant the document.
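The rewrite described above can be sketched in a few lines of Python. This is only an illustration, not Elasticsearch's actual implementation: `analyze` is a toy stand-in for the real analyzer, `match_as_bool` is a hypothetical helper name, and the sketch assumes the term queries are combined as optional `should` clauses of the bool query.

```python
import json
import re


def analyze(text):
    # Toy analyzer (assumption): lowercase the text and split on anything
    # that is not a letter or digit, so "BROWN DOG!" becomes two terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_as_bool(field, query_text):
    # Build one term query per analyzed term and wrap them in a bool query.
    # Assumption: the clauses are optional ("should"), so any one may match.
    terms = analyze(query_text)
    return {"bool": {"should": [{"term": {field: term}} for term in terms]}}


print(json.dumps(match_as_bool("title", "BROWN DOG!"), indent=2))
```

Because every clause is optional, a document that satisfies only one of them, such as document 1, still matches.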

Improving precision

Matching any document that contains any of the query terms may result in a long tail of seemingly irrelevant results. It’s a shotgun approach to search. Perhaps we only want to show documents that contain all of the query terms. In other words, instead of "brown OR dog" we want to return only documents that match "brown AND dog".

The match query accepts an operator parameter, which defaults to "or". You can change it to "and" to require that all specified terms match:

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": {      (1)
                "query":    "BROWN DOG!",
                "operator": "and"
            }
        }
    }
}
  1. The structure of the match query has to change slightly in order to accommodate the operator parameter.

This query excludes document 1, which contains only one of the two terms.
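To see the difference between the two operators on the four example titles, here is a small Python sketch. It only models which documents match, not how they are scored, and it assumes a toy analyzer; the `match` function and its `operator` argument are illustrative names that mirror the request above, not a client API.

```python
import re

# The four example titles from this section, keyed by document ID.
DOCS = {
    "1": "The quick brown fox",
    "2": "The quick brown fox jumps over the lazy dog",
    "3": "The quick brown fox jumps over the quick dog",
    "4": "Brown fox brown dog",
}


def analyze(text):
    # Toy analyzer (assumption): lowercase and split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query_text, docs, operator="or"):
    # "or" keeps a document if any query term is present; "and" needs all.
    query_terms = set(analyze(query_text))
    combine = all if operator == "and" else any
    hits = []
    for doc_id, title in docs.items():
        title_terms = set(analyze(title))
        if combine(term in title_terms for term in query_terms):
            hits.append(doc_id)
    return sorted(hits)


print(match("BROWN DOG!", DOCS, operator="or"))   # all four documents
print(match("BROWN DOG!", DOCS, operator="and"))  # document 1 drops out
```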

Controlling precision

The choice between all and any is a bit too black-or-white. What if the user specifies five query terms and a document contains only four of them? Setting "operator" to "and" would exclude this document.

Sometimes that is exactly what you want, but for most full-text search use cases, you want to include documents that may be relevant and exclude those that are unlikely to be relevant. In other words, we need something in between.

The match query supports the minimum_should_match parameter, which allows you to specify how many terms must match for a document to be considered relevant. While you can specify an absolute number of terms, it usually makes sense to specify a percentage instead, because you have no control over how many words the user may enter:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "title": {
        "query":                "quick brown dog",
        "minimum_should_match": "75%"
      }
    }
  }
}

When specified as a percentage, minimum_should_match does the right thing: in the example above with three terms, 75% of three is 2.25 terms, which is rounded down to two out of the three terms (66.6%). No matter what you set it to, at least one term must match for a document to be considered a match.
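The rounding can be checked with a short Python sketch. It is a simplified model based only on the description above: it handles plain positive percentages, and `required_matches` is a hypothetical helper, not part of any Elasticsearch API.

```python
def required_matches(num_terms, percent):
    # Integer division rounds down: 75% of 3 terms is 2.25, which becomes 2.
    needed = (num_terms * percent) // 100
    # Per the text above, at least one term must always match.
    return max(1, needed)


# How many terms a 75% setting requires as the query grows.
for num_terms in range(1, 6):
    print(num_terms, "terms ->", required_matches(num_terms, 75), "required")
```

With a single-term query the percentage works out to zero terms, which is where the at-least-one rule applies.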

The minimum_should_match parameter is very flexible, and different rules can be applied depending on the number of terms the user enters. For the full documentation, see the {ref}/query-dsl-minimum-should-match.html[minimum_should_match reference documentation].

To fully understand how the match query handles multi-word queries, we need to look at how to combine multiple queries with the bool query.