Query domain-specific language (DSL)
Join query
The join
filter enables the filtering of one set of documents (the target) with another one (the source) based on
shared field values. It accepts the following parameters:
type
-
The type of join algorithm to use. Valid values are
BROADCAST_JOIN
,HASH_JOIN
, orINDEX_JOIN
. If this parameter is not specified, the query planner will automatically select the optimal one. For more information, see Configuring joins by type. indices
-
The index names for the child set. Multiple indices can be specified using the Elasticsearch syntax. Defaults to all indices.
on
-
An array of two elements that specifies the field paths for the join keys in the parent and the child set, respectively. The metadata of the fields is validated prior to the query execution, see field metadata.
request
-
The search request that is used to compute the set of documents of the child set before performing the join.
Example
In this example, we will join all the documents from target_index
with the documents of source_index
using the HASH_JOIN
algorithm.
The query first filters documents from source_index
and of type type
with the query
{ "terms" : { "tag" : [ "aaa" ] } }
. It then retrieves the ids of the documents from the field id
specified by the parameter on
. The list of ids is then used as filter and applied on the field
foreign_key
of the documents from target_index
.
curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/siren/target_index/_search' 'd '
{
"query" : {
"join" : {
"type": "HASH_JOIN",
"indices" : ["source_index"],
"on" : ["foreign_key", "id"],
"request" : { (1)
"query" : {
"terms" : {
"tag" : [ "aaa" ]
}
}
}
}
}
}
'
1 | The search request that will be used to filter out the source set (i.e. source_index ) |
Field Metadata
The fields metadata are accessed to check for their capabilities. The field capabilities contain the following information:
type
-
The data type of the field.
searchable
-
A property indicating whether or not the field is indexed for search.
aggregatable
-
A property indicating whether or not the field can be aggregated on.
Both the parent and child fields must be of the same type. For hash and broadcast joins, all join fields must be aggregatable. For index join, parent field must be searchable, and the child field must be aggregatable.
Suppose we have index1
with two fields, one keyword
(indexed as searchable and aggregatable) and the other long
(not indexed as searchable and aggregatable),
then we will have the example result below:
GET /siren/index1/_field_caps
{
"indices": [
"index1"
],
"fields": {
"comments.author.keyword": {
"keyword": {
"type": "keyword",
"searchable": true,
"aggregatable": true
}
},
"comments.number": {
"long": {
"type": "long",
"searchable": false,
"aggregatable": false
}
}
}
}
Scoring Capabilities
The join
filter returns a constant score. Therefore, the scores of the matching documents from the child set
do not affect the scores of the matching documents from the parent set. However, one can
project the document’s score from the child set
and customize the scoring of the documents from the parent set with a
script score function.
Project
When joining a child set with a parent set, the fields from the child set may be projected to the parent set. The projected fields and associated values are mapped to the matching documents of the parent set.
A projection is defined in the request body search of the join clause using the parameter project
.
The project
parameter accepts an array of field’s objects, each one defining a field to project.
There are two different types of field objects: a standard field
or a script field.
The projected fields from a child set are accessible in the scope of the parent’s request. One can refer to a projected field in a project context or in a script context such as in a script field, a script-based sort, and so on.
Example
Consider the following documents from two indices, company
and people
:
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_bulk?pretty' -d '
{ "index" : { "_index" : "company", "_type" : "company", "_id" : "1" } }
{ "id": 1, "name" : "Acme" }
{ "index" : { "_index" : "company", "_type" : "company", "_id" : "2" } }
{ "id": 2, "name" : "Bueno" }
{ "index" : { "_index" : "people", "_type" : "person", "_id" : "1" } }
{ "id" : 1, "name" : "Alice", "age" : 31, "gender" : "Female", "employed_by" : 1 }
{ "index" : { "_index" : "people", "_type" : "person", "_id" : "2" } }
{ "id" : 2, "name" : "Bob", "age" : 42, "gender" : "Male", "employed_by" : 2 }
{ "index" : { "_index" : "people", "_type" : "person", "_id" : "3" } }
{ "id" : 3, "name" : "Carol", "age" : 26, "gender" : "Female", "employed_by" : 1 }
'
Suppose that the two indices are joined in order to retrieve a list of companies with the ages of all their respective employees using the following request:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/company/_search?pretty' -d '{
"query" : {
"join" : {
"indices" : ["people"],
"on" : ["id", "employed_by"],
"request" : {
"project" : [
{ "field" : { "name" : "age", "alias" : "employee_age" } } (1)
],
"query" : {
"match_all" : {}
}
}
}
},
"script_fields" : {
"employees_age" : {
"script" : "doc.employee_age" (2)
}
}
}'
1 | Project the field age from index people as employee_age |
2 | Return a script field employees_age for each hit with the associated projected values |
The response should contain two hits, one for each company, with the script field employees_age
as follows:
{
"hits" : {
"total" : 2,
"max_score" : 0.0,
"hits" : [
{
"_index" : "company",
"_type" : "company",
"_id" : "2",
"_score" : 0.0,
"fields" : {
"employees_age" : [
42
]
}
},
{
"_index" : "company",
"_type" : "company",
"_id" : "1",
"_score" : 0.0,
"fields" : {
"employees_age" : [
26,
31
]
}
}
]
}
}
Field
A standard field object specifies the projection of a field from a set. It is composed of the following parameters:
name
-
The name of a field from a child set to project.
alias
-
An alias name to give to the projected field. It is not possible to have multiple fields with identical names in the same set scope as this leads to ambiguity. It is therefore important to carefully select alias names to avoid such ambiguity.
{
"field" : {
"name" : "age", (1)
"alias" : "employee_age" (2)
}
}
1 | The name of the field to project |
2 | An alias for the field name |
Script Field
A script field object specifies the projection of the result of a script. It is composed of the following parameters:
name
-
The name given to the projected script field.
type
-
The datatype of the projected script field. Supported datatypes are all numeric datatypes, and keyword datatype.
script
-
The definition of a script supported by the Elasticsearch API. Projected fields from a child set are accessible in the script context using the doc values API.
{
"script_field" : {
"name" : "employee_age", (1)
"type" : "integer", (2)
"script": { (3)
"lang": "painless",
"source": "doc.age"
}
}
}
1 | The name of the script field |
2 | The datatype of the script field |
3 | The script producing values |
Document Score
The score of a matching document from a set may be projected using a standard field object using
the special field name _score
.
{
"field" : {
"name" : "_score",
"alias" : "employee_score"
}
}
Retrieving a projected field
A script field may be used to retrieve the values of a projected field for each hit, as shown in the
previous example. The projected field is accessed using the
doc values API. In the example,
the projected field employee_age
is accessed using the syntax doc.employee_age
.
Sorting based on a projected field
A script-based sorting method can be used to sort the hits based on the values of a projected field, for example:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/company/_search?pretty' -d '{
"query" : {
"join" : {
"indices" : ["people"],
"on" : ["id", "employed_by"],
"request" : {
"project" : [
{
"script_field" : {
"name" : "employee_age",
"type" : "integer",
"script" : {
"source" : "doc.age",
"lang" : "painless"
}
}
}
],
"query" : {
"match_all" : {}
}
}
}
},
"sort": [
{
"_script": {
"script": {
"lang": "painless",
"source": "int sum = 0; for (value in doc.employee_age) { sum += value } return sum;"
},
"type": "number",
"order": "desc"
}
}
]
}'
The response should contain two hits, one for each company, sorted by the sum of their employees age as follows:
{
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [
{
"_index" : "company",
"_type" : "company",
"_id" : "1",
"_score" : null,
"_source" : {
"id" : 1,
"name" : "Acme"
},
"sort" : [
57.0
]
},
{
"_index" : "company",
"_type" : "company",
"_id" : "2",
"_score" : null,
"_source" : {
"id" : 2,
"name" : "Bueno"
},
"sort" : [
42.0
]
}
]
}
}
Scoring based on a projected field
A script-based scoring method can be used to customize the scoring based on the values of a projected field. For example, we can project the score of the matching documents from the child set and aggregate them into the parent document as follows:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/company/_search?pretty' -d '{
"query": {
"function_score": {
"query": {
"join": {
"indices": [ "people" ],
"on": [ "id", "employed_by" ],
"request": {
"project" : [
{ "field" : { "name" : "_score", "alias" : "child_score" } }
],
"query": {
"match_all": {}
}
}
}
},
"functions": [
{
"script_score": {
"script": {
"lang": "painless",
"source": "float sum = 0; for (value in doc.child_score) { sum += value } return sum;"
}
}
}
],
"score_mode": "multiply",
"boost_mode": "replace"
}
}
}'
The response should contain two hits, one for each company, sorted by the sum of their child scores as follows:
{
"hits" : {
"total" : 2,
"max_score" : 2.0,
"hits" : [
{
"_index" : "company",
"_type" : "company",
"_id" : "1",
"_score" : 2.0,
"_source" : {
"id" : 1,
"name" : "Acme"
}
},
{
"_index" : "company",
"_type" : "company",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : 2,
"name" : "Bueno"
}
}
]
}
}
Aggregating based on a projected field
A script can be used to access and
aggregate the values of a projected field. For example, we can project the values of
the field gender
of the matching documents from the people
index and aggregate the documents from the company
index by using these values as follows:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/company/_search?pretty' -d '{
"query" : {
"join" : {
"indices" : ["people"],
"on" : ["id", "employed_by"],
"request" : {
"project" : [
{ "field" : { "name" : "gender.keyword", "alias" : "employee_gender" } }
],
"query" : {
"match_all" : {}
}
}
}
},
"aggs": {
"count_by_gender": {
"terms": {
"script": {
"lang": "painless",
"source": "doc.employee_gender"
}
}
}
}
}'
The response should contain an aggregation result count_by_gender
with two buckets Female
and Male
as follows:
{
"aggregations": {
"count_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Female",
"doc_count": 1
},
{
"key": "Male",
"doc_count": 1
}
]
}
}
}
Compatibility with Nested Query
The join
filter within a nested
query is supported. The join key must specify
the field path within the scope of the nested object. For example, as shown below, the join key must be foreign_key
and not nested_obj.foreign_key
.
curl -H 'Content-Type: application/json' -XGET 'http://localhost:9200/siren/target_index/_search' -d '
{
"query" : {
"nested" : {
"path" : "nested_obj",
"query" : {
"join" : {
"indices" : ["source_index"],
"on" : ["foreign_key", "id"],
"request" : {
"query" : {
"match_all" : {}
}
}
}
}
}
}
}
'
A nested
query within a join
filter is also supported if and only if the join key does not refer to
a field of the nested object.
Known limitations
-
A join query involving one or more runtime fields is currently supported using
HASH_JOIN
orBROADCAST_JOIN
algorithms. TheINDEX_JOIN
algorithm only supports joins with the runtime field on the child set. Finally, dynamic runtime fields defined in a search request cannot be used in a join query. Runtime fields must be defined in advance in an index mapping.