Paginating a Search Request
A search request can be paginated in Federate using the search_after parameter. The process starts by opening a Point-In-Time (PIT) on the parent indices at the root of the request. This operation returns an identifier that is then passed to the search request to be paginated. The PIT effectively caches the results of the request and ensures that the hits remain consistent across pages. Subsequent pages are retrieved by re-executing the request with an updated search_after parameter. Finally, the PIT must be closed in order to free memory.
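The whole cycle (open a PIT, page through the hits, close the PIT) can be sketched in Python as follows. This is a minimal sketch, not a reference client: `send` is a hypothetical transport callable that you supply, the hits are assumed to be under `hits.hits` as in a standard Elasticsearch search response, and an empty page is assumed to mark the end of the results.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: send(method, path, body) -> decoded JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def paginate(send: Send, index_pattern: str, query: Dict[str, Any],
             page_size: int = 2) -> Iterator[Dict[str, Any]]:
    """Yield every hit of `query`, one page at a time, under a single PIT."""
    # 1. Open a PIT on the parent indices at the root of the request.
    pit_id = send("POST", f"/siren/{index_pattern}/_pit", None)["id"]
    try:
        body: Dict[str, Any] = {
            "sort": {"_shard_doc": "asc"},  # tiebreaker sort
            "pit": {"id": pit_id},
            "query": query,
            "size": page_size,
        }
        while True:
            # 2. No index pattern in the path: it is resolved from the PIT.
            response = send("GET", "/siren/_search", body)
            # Assumption: hits are under hits.hits, as in Elasticsearch.
            hits = response["hits"]["hits"]
            if not hits:
                break  # Assumption: an empty page means no more results.
            yield from hits
            # 3. Resume after the sort value of the last returned hit.
            body = {**body, "search_after": hits[-1]["sort"]}
    finally:
        # 4. Always close the PIT to free memory, even if a request failed.
        send("DELETE", "/siren/_pit", {"id": pit_id})
```

Closing the PIT in a `finally` block means it is released even when a page request raises an error partway through.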
Open and Close Point-In-Times
Federate exposes two REST endpoints for opening and closing Point-In-Times on indices. The state of the indices in the PIT remains unchanged for the duration of the PIT, even if they are updated in the meantime. This allows search requests to be executed against a consistent view of the indices over a long period of time, regardless of changes made to them.
POST /siren/<index>/_pit
DELETE /siren/_pit
{
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
The POST method opens a PIT on the given index pattern and returns an identifier. The DELETE method closes the PIT referenced by the identifier in its body.
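As an illustration, the two calls could be built with Python's standard library as shown below. This is a sketch under assumptions: `BASE_URL` is a hypothetical cluster address, and the open request is sent without a body because none is shown above.

```python
import json
from urllib.request import Request

# Assumption: hypothetical cluster address; replace it with your own.
BASE_URL = "http://localhost:9200"


def open_pit_request(index_pattern: str) -> Request:
    """Build POST /siren/<index>/_pit for the given index pattern."""
    return Request(f"{BASE_URL}/siren/{index_pattern}/_pit", method="POST")


def close_pit_request(pit_id: str) -> Request:
    """Build DELETE /siren/_pit with the PIT identifier in the body."""
    payload = json.dumps({"id": pit_id}).encode("utf-8")
    return Request(
        f"{BASE_URL}/siren/_pit",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )
```

Either request can then be sent with `urllib.request.urlopen`; the identifier is read from the `id` field of the JSON response to the open call.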
Pagination
Paginating a search request requires the PIT identifier returned by the _pit REST API and a tiebreaker sort parameter. The sort parameter is needed to paginate hits: it adds a sort field to each hit in the search response, and that value is then passed to the search_after parameter. The next page is retrieved by taking the sort value of the last returned hit and setting it as the search_after value.
The tiebreaker |
Below is a search request that contains a join, where the parent set is machine-* and the child set is beat-*.
GET /siren/machine-*/_search
{
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"match_all": {}
}
}
}
}
}
A PIT is created over the parent set at the root, i.e., over the index pattern machine-*. The call returns the PIT identifier:
POST /siren/machine-*/_pit
{
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}
To retrieve the first page, we issue the search request with the PIT identifier and a sort parameter. The index pattern that is normally passed as part of the _search endpoint is omitted: the indices resolved during the PIT creation are retrieved from the given PIT identifier.
GET /siren/_search
{
"sort": { (1)
"_shard_doc": "asc"
},
"pit": { (2)
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"match_all": {}
}
}
}
},
"size": 2 (3)
}
(1) A sort explicitly set with the tiebreaker field _shard_doc.
(2) The PIT identifier returned by the call to the _pit REST API.
(3) The number of hits returned in a page.
To retrieve the next pages, the same request is re-executed; the only change is the added search_after parameter, which is set to the sort value of the last returned hit.
GET /siren/_search
{
"sort": {
"_shard_doc": "asc"
},
"pit": {
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"match_all": {}
}
}
}
},
"size": 2,
"search_after": [ (1)
1
]
}
(1) The search_after parameter is given the value of the last returned hit’s sort field.
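Deriving the next request from the previous one can be sketched as a small Python helper. The function name `next_page_body` is hypothetical, and the response is assumed to list its hits under `hits.hits`, as in a standard Elasticsearch search response.

```python
import copy
from typing import Any, Dict, Optional


def next_page_body(previous_body: Dict[str, Any],
                   response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the body for the next page, or None when no hits came back."""
    # Assumption: hits are under hits.hits, as in Elasticsearch.
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    # Copy so the original request is left untouched.
    body = copy.deepcopy(previous_body)
    # Resume after the sort value of the last returned hit.
    body["search_after"] = hits[-1]["sort"]
    return body
```

Everything except search_after is carried over as-is, so the sort, the PIT identifier, the query, and the page size stay identical from page to page.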
Limitations
The pagination of a search request in Federate currently has the following limitations.
- The PIT identifier returned by the /siren/_pit REST API can only be used by a single search request.
- A join performed against virtual indices located on a remote Elasticsearch cluster is not supported if that remote cluster doesn’t have the Federate plugin installed.
- Paginating a search request with a project clause is only possible if the project clause occurs in a nested join. If it occurs in the root join instead, an error is thrown.
Example
Paginating a search request with a project clause in a nested join.
GET /siren/_search
{
"sort": {
"_shard_doc": "asc"
},
"pit": {
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"id"
],
"request": {
"project": [
{
"field": {
"name": "date"
}
}
],
"query": {
"match_all": {}
}
}
}
}
}
}
},
"size": 2
}
Paginating a search request with a project clause in the root join. This request is rejected with an error.
GET /siren/_search
{
"sort": {
"_shard_doc": "asc"
},
"pit": {
"id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
},
"query": {
"join": {
"indices": [
"beat-*"
],
"on": [
"id",
"machine"
],
"request": {
"project": [
{
"field": {
"name": "date"
}
}
],
"query": {
"match_all": {}
}
}
}
},
"size": 2
}
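A client could catch the unsupported case before sending the request. The Python check below is a sketch only: `has_root_project` is a hypothetical helper, not part of Federate, and it only inspects a join placed directly under query, as in the two requests above.

```python
from typing import Any, Dict


def has_root_project(body: Dict[str, Any]) -> bool:
    """Return True if the root join's request carries a project clause."""
    root_join = body.get("query", {}).get("join")
    if not isinstance(root_join, dict):
        return False  # No join directly under query: nothing to check.
    # A project clause in a nested join sits deeper and is not flagged.
    return "project" in root_join.get("request", {})
```

Applied to the two requests above, the check is false for the nested-join request and true for the root-join one.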