Multiple query strings

The simplest multi-field query to deal with is the one where we can map search terms to specific fields. If we know that "War and Peace" is the title, and "Leo Tolstoy" is the author, it is easy to write each of these conditions as a match clause and to combine them with a bool query:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":  "War and Peace" }},
        { "match": { "author": "Leo Tolstoy"   }}
      ]
    }
  }
}

The bool query takes a more-matches-is-better approach, so the scores from the match clauses are added together to provide the final _score for each document. Documents that match both clauses will score higher than documents that match just one clause.

Of course, you’re not restricted to using just match clauses: the bool query can wrap any other query type, including other bool queries. We could add a clause to specify that we prefer to see versions of the book that have been translated by specific translators:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":  "War and Peace" }},
        { "match": { "author": "Leo Tolstoy"   }},
        { "bool":  {
          "should": [
            { "match": { "translator": "Constance Garnett" }},
            { "match": { "translator": "Louise Maude"      }}
          ]
        }}
      ]
    }
  }
}

Why did we put the translator clauses inside a separate bool query? All four match queries are should clauses, so why didn’t we just put the translator clauses at the same level as the title and author clauses?

The answer lies in how the score is calculated. The bool query runs each match query, adds their scores together, then multiplies by the number of matching clauses, and divides by the total number of clauses. Each clause at the same level has the same weight. In the above query, the bool query containing the translator clauses counts for ⅓ of the total score. If we had put the translator clauses at the same level as title and author, then they would have reduced the contribution of the title and author clauses to ¼ each.
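
The arithmetic described above can be sketched in a few lines of Python. This is a simplified model of the description in this section, not the actual scoring code; the function names bool_should_score and clause_weights and the sample clause scores are made up for illustration.

```python
def bool_should_score(clause_scores):
    """Simplified model of a bool query that has only should clauses.

    clause_scores holds one entry per clause: the clause's score if it
    matched, or None if it did not match.
    """
    matching = [s for s in clause_scores if s is not None]
    if not matching:
        return 0.0
    # Add the matching scores together, multiply by the number of matching
    # clauses, then divide by the total number of clauses.
    return sum(matching) * len(matching) / len(clause_scores)


def clause_weights(total_clauses):
    """Each clause at the same level gets the same share of the score."""
    return [1 / total_clauses] * total_clauses


# Hypothetical per-clause scores, for illustration only.
# Here the document matches one translator but not the other.
title, author, garnett, maude = 1.0, 1.0, 1.0, None

# Nested: the translator clauses sit in their own bool query, which
# counts as a single clause of the outer bool query.
translators = bool_should_score([garnett, maude])
nested = bool_should_score([title, author, translators])

# Flat: all four clauses sit at the same level.
flat = bool_should_score([title, author, garnett, maude])

print(clause_weights(3))  # nested layout: three top-level clauses
print(clause_weights(4))  # flat layout: four top-level clauses
print(nested, flat)
```

With three top-level clauses each one carries a third of the weight; with four, the title and author clauses drop to a quarter each, which is the effect the nesting avoids.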

Prioritising clauses

It is likely that an even ⅓ split between clauses is not what we need for the above query. We are probably more interested in the title and author clauses than we are in the translator clauses. We need to tune the query to make the title and author clauses relatively more important.

The simplest weapon in our tuning arsenal is the boost parameter. To increase the weight of the title and author fields, give them a boost value higher than 1:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { (1)
            "title":  {
              "query": "War and Peace",
              "boost": 2
        }}},
        { "match": { (1)
            "author":  {
              "query": "Leo Tolstoy",
              "boost": 2
        }}},
        { "bool":  { (2)
            "should": [
              { "match": { "translator": "Constance Garnett" }},
              { "match": { "translator": "Louise Maude"      }}
            ]
        }}
      ]
    }
  }
}

(1) The title and author clauses have a boost value of 2.
(2) The nested bool clause has the default boost of 1.

The "best" value for the boost parameter is most easily determined by trial and error: set a boost value, run test queries, repeat. A reasonable range for boost lies between 1 and 10, maybe 15. Boosts higher than that have little additional impact because scores are normalized.
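
One way to organise the trial-and-error loop is to generate the request body for each candidate boost value. The sketch below only builds the JSON; the helper name build_query and the list of candidate values are assumptions for illustration, and sending the request and judging the results is left to whatever client and test queries you use.

```python
import json


def build_query(boost):
    """Return the search body from this section, with the given boost
    applied to the title and author clauses (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": "War and Peace", "boost": boost}}},
                    {"match": {"author": {"query": "Leo Tolstoy", "boost": boost}}},
                    # The nested bool clause keeps the default boost of 1.
                    {"bool": {"should": [
                        {"match": {"translator": "Constance Garnett"}},
                        {"match": {"translator": "Louise Maude"}},
                    ]}},
                ]
            }
        }
    }


# Candidate boost values to try; the choice of values is arbitrary.
for boost in (1, 2, 5, 10):
    body = json.dumps(build_query(boost))
    # Send `body` to GET /_search, inspect the ranking of the results,
    # then move on to the next value.
    print(boost, body)
```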
