Query DSL

Join Query

The join filter enables the filtering of one set of documents (the target) with another one (the source) based on shared field values. It accepts the following parameters:

type

The type of the join algorithm to use. Valid values are either BROADCAST_JOIN or HASH_JOIN. If this parameter is not specified, the query planner will try to automatically select the optimal one.

indices

The index names for the child collection. Multiple indices can be specified using the Elasticsearch syntax. Defaults to all indices.

on

An array of two elements that specifies the field paths for the join keys in the parent and the child collection, respectively. collection. Both fields must have the same datatype and must have the parameter doc_values set to true.

request

The search request that is used to compute the set of documents of the child collection before performing the join.

Example

In this example, we will join all the documents from target_index with the documents of source_index using the HASH_JOIN algorithm. The query first filters documents from source_index and of type type with the query { "terms" : { "tag" : [ "aaa" ] } }. It then retrieves the ids of the documents from the field id specified by the parameter on. The list of ids is then used as filter and applied on the field foreign_key of the documents from target_index.

GET /siren/target_index/_search
{
  "query" : {
    "join" : {
      "type": "HASH_JOIN",
      "indices" : ["source_index"],
      "on" : ["foreign_key", "id"],
      "request" : { (1)
        "query" : {
          "terms" : {
            "tag" : [ "aaa" ]
          }
        }
      }
    }
  }
}
1 The search request that will be used to filter out the source set (i.e. source_index)

Scoring Capabilities

The join filter returns a constant score. Therefore, the scores of the matching documents from the child collection do not affect the scores of the matching documents from the parent collection. However, one can project the document’s score from the child collection and customize the scoring of the documents from the parent collection with a script score function.

Compatibility with Nested Query

The join filter within a nested query is supported. The join key must specify the field path within the scope of the nested object. For example, as shown below, the join key must be foreign_key and not nested_obj.foreign_key.

GET /siren/target_index/_search
{
  "query" : {
    "nested" : {
      "path" : "nested_obj",
      "query" : {
        "join" : {
          "indices" : ["source_index"],
          "on" : ["foreign_key", "id"],
          "request" : {
            "query" : {
              "match_all" : {}
            }
          }
        }
      }
    }
  }
}

A nested query within a join filter is also supported if and only if the join key does not refer to a field of the nested object.

Project

When joining a child collection with a parent collection, the fields from the child collection may be projected to the parent collection. The projected fields and associated values are mapped to the matching documents of the parent collection.

A projection is defined in the request body search of the join clause using the parameter project. The project parameter accepts an array of field’s objects, each one defining a field to project. There are two different types of field objects: a standard field or a script field.

The projected fields from a child collection are accessible in the scope of the parent’s request. One can refer to a projected field in a project context or in a script context such as in a script field, a script-based sort, and so on.

Example

Consider the following documents from two indices, companies and people:

$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_bulk?pretty' -d '
{ "index" : { "_index" : "companies", "_type" : "company", "_id" : "1" } }
{ "id": 1, "name" : "Acme" }
{ "index" : { "_index" : "companies", "_type" : "company", "_id" : "2" } }
{ "id": 2, "name" : "Bueno" }

{ "index" : { "_index" : "people", "_type" : "person", "_id" : "1" } }
{ "id" : 1, "name" : "Alice", "age" : 31, "employed_by": 1 }
{ "index" : { "_index" : "people", "_type" : "person", "_id" : "2" } }
{ "id" : 2, "name" : "Bob", "age" : 42, "employed_by": 2 }
{ "index" : { "_index" : "people", "_type" : "person", "_id" : "3" } }
{ "id" : 3, "name" : "Carol", "age" : 26, "employed_by": 1 }
'

Suppose that the two indices are joined in order to retrieve a list of companies with the ages of all their respective employees using the following request:

$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/companies/_search?pretty' -d '{
   "query" : {
      "join" : {
        "indices" : ["people"],
        "on" : ["id", "employed_by"],
        "request" : {
          "project" : [
            { "field" : { "name" : "age", "alias" : "employee_age" } }  (1)
          ],
          "query" : {
            "match_all" : {}
          }
        }
      }
    },
    "script_fields" : {
      "employees_age" : {
        "script" : "doc.employee_age"                                    (2)
      }
    }
}'
1 Project the field age from index people as employee_age
2 Return a script field employees_age for each hit with the associated projected values

The response should contain two hits, one for each company, with the script field employees_age as follows:

{
  "hits" : {
    "total" : 2,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "companies",
        "_type" : "company",
        "_id" : "2",
        "_score" : 0.0,
        "fields" : {
          "employees_age" : [
            42
          ]
        }
      },
      {
        "_index" : "companies",
        "_type" : "company",
        "_id" : "1",
        "_score" : 0.0,
        "fields" : {
          "employees_age" : [
            26,
            31
          ]
        }
      }
    ]
  }
}

Field

A standard field object specifies the projection of a field from a collection. It is composed of the following parameters:

name

The name of a field from a child collection to project.

alias

An alias name to give to the projected field. It is not possible to have multiple fields with identical names in the same collection scope as this leads to ambiguity. It is therefore important to carefully select alias names to avoid such ambiguity.

{
  "field" : {
    "name" : "age",              (1)
    "alias" : "employee_age"     (2)
  }
}
1 The name of the field to project
2 An alias for the field name

Script Field

A script field object specifies the projection of the result of a script. It is composed of the following parameters:

name

The name given to the projected script field.

type

The datatype of the projected script field. Supported datatypes are boolean datatype, all numeric datatypes, and keyword datatype.

script

The definition of a script supported by the Elasticsearch API. Projected fields from a child collection are accessible in the script context using the doc values API.

{
  "script_field" : {
    "name" : "employee_age",      (1)
    "type" : "integer",           (2)
    "script": {                   (3)
      "lang": "painless",
      "source": "doc.age"
    }
  }
}
1 The name of the script field
2 The datatype of the script field
3 The script producing values

Document Score

The score of a matching document from a collection may be projected using a standard field object using the special field name _score.

{
  "field" : {
    "name" : "_score",
    "alias" : "employee_score"
  }
}

Retrieving a projected field

A script field may be used to retrieve the values of a projected field for each hit, as shown in the previous example. The projected field is accessed using the doc values API. In the example, the projected field employee_age is accessed using the syntax doc.employee_age.

Sorting based on a projected field

A script-based sort may be used to sort the hits based on the values of a projected field, for example:

$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/companies/_search?pretty' -d '{
   "query" : {
     "join" : {
       "indices" : ["people"],
       "on" : ["id", "employed_by"],
       "request" : {
         "project" : [
           {
             "script_field" : {
               "name" : "employee_age",
               "type" : "integer",
               "script" : {
                 "source" : "doc.age",
                 "lang" : "painless"
               }
             }
           }
         ],
         "query" : {
           "match_all" : {}
         }
       }
     }
   },
   "sort": [
    {
      "_script": {
        "script": {
          "lang": "painless",
          "source": "int sum = 0; for (value in doc.employee_age) { sum += value } return sum;"
        },
        "type": "number",
        "order": "desc"
      }
    }
  ]
}'

The response should contain two hits, one for each company, sorted by the sum of their employees age as follows:

{
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "companies",
        "_type" : "company",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "id" : 1,
          "name" : "Acme"
        },
        "sort" : [
          57.0
        ]
      },
      {
        "_index" : "companies",
        "_type" : "company",
        "_id" : "2",
        "_score" : null,
        "_source" : {
          "id" : 2,
          "name" : "Bueno"
        },
        "sort" : [
          42.0
        ]
      }
    ]
  }
}

Scoring based on a projected field

A script-based scoring may be used to customize the scoring based on the values of a projected field. For example, we can project the score of the matching documents from the child collection and aggregate them into the parent document as follows:

$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/companies/_search?pretty' -d '{
  "query": {
    "function_score": {
      "query": {
        "join": {
          "indices": [ "people" ],
          "on": [ "id", "employed_by" ],
          "request": {
            "project" : [
              { "field" : { "name" : "_score", "alias" : "child_score" } }
            ],
            "query": {
              "match_all": {}
            }
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "float sum = 0; for (value in doc.child_score) { sum += value } return sum;"
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}'

The response should contain two hits, one for each company, sorted by the sum of their child scores as follows:

{
  "hits" : {
    "total" : 2,
    "max_score" : 2.0,
    "hits" : [
      {
        "_index" : "companies",
        "_type" : "company",
        "_id" : "1",
        "_score" : 2.0,
        "_source" : {
          "id" : 1,
          "name" : "Acme"
        }
      },
      {
        "_index" : "companies",
        "_type" : "company",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 2,
          "name" : "Bueno"
        }
      }
    ]
  }
}