Paginating a Search Request

A search request can be paginated in Federate using the search_after parameter. The process starts by opening a Point-In-Time (PIT) on the parent indices at the root of the request. This operation returns an identifier, which is then passed to the search request to be paginated. The PIT effectively caches the results of the request and keeps the hits consistent from one page to the next. Subsequent pages are retrieved by re-executing the request with an updated search_after parameter. Finally, the PIT must be closed to free memory.
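The overall flow (open a PIT, fetch pages with search_after, close the PIT) can be sketched in Python as follows. This is a minimal sketch, not part of Federate: the `send` callable is a hypothetical stand-in for whatever HTTP client is used, assumed to take a method, a path and an optional JSON body and to return the parsed JSON response. The paths and bodies mirror the requests shown in this section.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport (an assumption): send(method, path, body) returns the parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def paginate(send: Send, index: str, query: Dict[str, Any], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield pages of hits for `query` against `index`, using a PIT and search_after."""
    # 1. Open a PIT on the parent indices at the root of the request.
    pit_id = send("POST", f"/siren/{index}/_pit", None)["id"]
    try:
        body: Dict[str, Any] = {
            "sort": {"_shard_doc": "asc"},  # explicit tiebreaker sort
            "pit": {"id": pit_id},          # indices come from the PIT, not the path
            "query": query,
            "size": size,
        }
        while True:
            # 2. Re-execute the same request for every page.
            hits = send("GET", "/siren/_search", body)["hits"]["hits"]
            if not hits:
                break  # an empty page means there are no more results
            yield hits
            # 3. Resume after the sort value of the last returned hit.
            body["search_after"] = hits[-1]["sort"]
    finally:
        # 4. Close the PIT to free memory, even if an error occurred.
        send("DELETE", "/siren/_pit", {"id": pit_id})
```

Because `paginate` is a generator, its finally block closes the PIT once iteration completes or when the generator's `close()` method is called, so a consumer that stops early should close it explicitly.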

Open and Close Point-In-Times

Federate exposes two REST endpoints to open and close Point-In-Times on indices. The state of the indices in a PIT remains unchanged for the lifetime of the PIT, even if the indices are updated in the meantime. This allows search requests to run against a consistent view of the indices over a long period of time, even while the indices are being modified.

POST /siren/<index>/_pit

DELETE /siren/_pit
{
   "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}

The POST method opens a PIT on the given index pattern and returns an identifier. The DELETE method closes the PIT referenced by the identifier in its body.
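As an illustration, the two calls can be built with Python's standard urllib.request module. This is a sketch: `BASE_URL` is an assumed address for a node with Federate installed, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: address of a node running Federate


def open_pit_request(index: str) -> urllib.request.Request:
    """POST /siren/<index>/_pit: opens a PIT on the given index pattern."""
    return urllib.request.Request(f"{BASE_URL}/siren/{index}/_pit", method="POST")


def close_pit_request(pit_id: str) -> urllib.request.Request:
    """DELETE /siren/_pit: closes the PIT whose identifier is given in the body."""
    return urllib.request.Request(
        f"{BASE_URL}/siren/_pit",
        data=json.dumps({"id": pit_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )
```

Each request can then be sent with `urllib.request.urlopen`. Note that Elasticsearch's own point-in-time API requires a `keep_alive` query parameter when opening a PIT; whether the Federate endpoint expects one is not covered by this section.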

Pagination

Paginating a search request requires the PIT identifier returned by the _pit REST API and a tiebreaker sort parameter. The sort parameter is what makes pagination possible: each hit in the search response then carries a sort value, which is passed to search_after. To get the next page, take the sort value of the last returned hit and set it as the search_after value.

If the request already contains a sort, the tiebreaker sort field is added to it automatically.
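On the client side, this rule means the tiebreaker only needs to be set explicitly when the request has no sort at all. The hypothetical helper `ensure_sort` below sketches that check; it assumes the request body is a Python dict mirroring the JSON shown in this section.

```python
import copy
from typing import Any, Dict

TIEBREAKER = {"_shard_doc": "asc"}


def ensure_sort(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a search body that can be paginated with search_after (hypothetical helper)."""
    result = copy.deepcopy(body)
    if "sort" not in result:
        # No sort in the request: set the tiebreaker explicitly.
        result["sort"] = dict(TIEBREAKER)
    # An existing sort is left untouched: Federate appends the tiebreaker itself.
    return result
```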

Below is a search request that contains a join, where the parent set is machine-*, and the child set is beat-*.

GET /siren/machine-*/_search
{
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  }
}

First, a PIT is created over the parent set at the root, i.e., over the index pattern machine-*:

POST /siren/machine-*/_pit

The call returns the PIT identifier:

{
   "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
}

To retrieve the first page, we issue the search request with the PIT identifier and a sort parameter. The index pattern normally passed as part of the _search endpoint is omitted: the indices resolved when the PIT was created are retrieved from the PIT identifier.

GET /siren/_search
{
  "sort": { (1)
    "_shard_doc": "asc"
  },
  "pit": { (2)
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2 (3)
}
1 A sort explicitly set with the tiebreaker field _shard_doc.
2 The PIT identifier returned by the call to the _pit REST API.
3 The number of hits returned in a page.
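The transformation from the original join request to this first-page request can be sketched as a small Python function. The name `to_first_page` is hypothetical; the sketch keeps any sort already present (relying on the automatic tiebreaker described above) and otherwise sets `_shard_doc` explicitly.

```python
import copy
from typing import Any, Dict, Tuple


def to_first_page(body: Dict[str, Any], pit_id: str, size: int) -> Tuple[str, Dict[str, Any]]:
    """Turn a `GET /siren/<index>/_search` body into the first-page PIT request (hypothetical)."""
    page = copy.deepcopy(body)
    page.setdefault("sort", {"_shard_doc": "asc"})  # tiebreaker sort if none was given
    page["pit"] = {"id": pit_id}                    # indices are resolved from the PIT
    page["size"] = size                             # number of hits per page
    # The index pattern is dropped from the path: only /siren/_search remains.
    return "/siren/_search", page
```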

To retrieve the next pages, the same request is re-executed with a single addition: the search_after parameter, set to the sort value of the last returned hit.

GET /siren/_search
{
  "sort": {
    "_shard_doc": "asc"
  },
  "pit": {
    "id": "AQ92aV82NzVhOTQ5YV80MWU=#15izAwEWdGVzdHNjcm9sbG5vam9pbi1pbmRleBZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3ABZoR2FERnR6VlNmbUtPR2RaaXVUVjZnAAAAAAAAAAADFjZJN0t0YTdsUVdtMG95a3pvYjd4NkEAARZtWjNtUVROdlRsT1NkTk9ZYlVGS1d3AAA="
  },
  "query": {
    "join": {
      "indices": [
        "beat-*"
      ],
      "on": [
        "id",
        "machine"
      ],
      "request": {
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "size": 2,
  "search_after": [ (1)
    1
  ]
}
1 The search_after is given the value of the last returned hit’s sort field.
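Building the next-page request from the previous request and its response can be sketched as follows. The helper `next_page` is hypothetical; it assumes the response follows the usual Elasticsearch layout, with hits under `hits.hits` and each hit carrying its sort values in a `sort` array, and it treats an empty page as the end of the results.

```python
import copy
from typing import Any, Dict, Optional


def next_page(body: Dict[str, Any], response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the request for the next page, or None when the last page was empty (hypothetical)."""
    hits = response["hits"]["hits"]  # assumed Elasticsearch response layout
    if not hits:
        return None
    following = copy.deepcopy(body)                # the same request, unchanged...
    following["search_after"] = hits[-1]["sort"]  # ...except for search_after
    return following
```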

Limitations

The pagination of a search request in Federate currently has the following limitations.

  1. The PIT identifier returned by the /siren/_pit REST API can only be used by a single search request.

  2. A join performed against a virtual index located on a remote Elasticsearch cluster is not supported if that remote cluster does not have the Federate plugin installed.