Configuring a remote Elasticsearch connector

Siren Federate can query data in a remote Elasticsearch cluster through the Elasticsearch remote clusters module and the Siren Federate connector APIs.

Because the remote Elasticsearch cluster does not have the Siren Federate plugin installed, Siren Federate cannot push a join down to the remote cluster. Instead, the join is computed on the local cluster by using the broadcast_join implementation.

Compatibility with security systems

To execute joins that span several clusters, grant the following cluster-level and index-level permissions on each cluster.

On the local Federate cluster:

  • cluster:internal/federate/*

  • indices:data/read/mget

  • indices:data/read/msearch

  • indices:data/read/mtv

  • indices:data/read/open_point_in_time

  • indices:data/read/close_point_in_time

  • indices:data/read*

  • indices:admin/template/get

  • indices:admin/aliases/get

  • indices:admin/aliases/exists

  • indices:admin/get

  • indices:admin/exists

  • indices:admin/mappings/fields/get*

  • indices:admin/mappings/get*

  • indices:admin/mappings/federate/connector/get*

  • indices:admin/mappings/federate/connector/fields/get*

  • indices:admin/types/exists

  • indices:admin/validate/query

  • indices:monitor/settings/get

On the remote Elasticsearch cluster:

  • indices:data/read/mget

  • indices:data/read/msearch

  • indices:data/read/mtv

  • indices:data/read/open_point_in_time

  • indices:data/read/close_point_in_time

  • indices:admin/template/get

  • indices:data/read*

  • indices:data/read/search
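If you manage permissions as JSON roles, the permission lists above can be assembled into role bodies. The following is a minimal sketch that assumes Elastic Stack security and assumes it accepts these action-name patterns as privileges; other security plugins use different role formats. The helper names json_array and role_body and the index names machines, logsvi, and logs-* are illustrative assumptions taken from the example below, not part of Siren Federate.

```shell
#!/usr/bin/env bash
# Sketch only: builds JSON role bodies from the permission lists above.
# Assumption: Elastic Stack security accepts these action-name patterns as privileges.
# Helper names and index names are hypothetical.

# Print the arguments as a JSON array of strings.
# Values are not escaped, so they must not contain double quotes or backslashes.
json_array() {
  local out="" item
  for item in "$@"; do
    out+="${out:+,}\"${item}\""
  done
  printf '[%s]' "$out"
}

# role_body <names-json-array> <cluster-json-array> <index-privilege>...
role_body() {
  local names_json=$1 cluster_json=$2
  shift 2
  printf '{"cluster":%s,"indices":[{"names":%s,"privileges":%s}]}\n' \
    "$cluster_json" "$names_json" "$(json_array "$@")"
}

# Index-level permissions for the local Federate cluster (same order as the list above).
LOCAL_INDEX_PRIVS=(
  "indices:data/read/mget"
  "indices:data/read/msearch"
  "indices:data/read/mtv"
  "indices:data/read/open_point_in_time"
  "indices:data/read/close_point_in_time"
  "indices:data/read*"
  "indices:admin/template/get"
  "indices:admin/aliases/get"
  "indices:admin/aliases/exists"
  "indices:admin/get"
  "indices:admin/exists"
  "indices:admin/mappings/fields/get*"
  "indices:admin/mappings/get*"
  "indices:admin/mappings/federate/connector/get*"
  "indices:admin/mappings/federate/connector/fields/get*"
  "indices:admin/types/exists"
  "indices:admin/validate/query"
  "indices:monitor/settings/get"
)

# Index-level permissions for the remote Elasticsearch cluster.
REMOTE_INDEX_PRIVS=(
  "indices:data/read/mget"
  "indices:data/read/msearch"
  "indices:data/read/mtv"
  "indices:data/read/open_point_in_time"
  "indices:data/read/close_point_in_time"
  "indices:admin/template/get"
  "indices:data/read*"
  "indices:data/read/search"
)

local_role=$(role_body "$(json_array machines logsvi)" \
  "$(json_array "cluster:internal/federate/*")" "${LOCAL_INDEX_PRIVS[@]}")
remote_role=$(role_body "$(json_array "logs-*")" "[]" "${REMOTE_INDEX_PRIVS[@]}")

# Example (not executed here), with a hypothetical role name:
#   curl -X PUT http://localhost:9200/_security/role/federate_local \
#     -H 'Content-Type: application/json' -d "$local_role"
printf '%s\n%s\n' "$local_role" "$remote_role"
```

The remote body would be sent to the remote cluster's own endpoint. Check the resulting roles against the documentation of your security system before relying on them.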

The remote Elasticsearch connector is compatible with the following security systems:

Before you begin

  1. Ensure that the remote clusters are configured as described in the Configuring remote clusters section of the Elasticsearch documentation.

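  2. Register the remote Elasticsearch cluster on the local cluster by using the cluster settings API. Each seed is the transport address (host and transport port) of a node in the remote cluster. For example, the following request registers a remote cluster with the alias remoteelasticsearch, which is the alias used in the procedure below:

    curl -X PUT http://localhost:9200/_cluster/settings -H 'Content-type: application/json' -d '
    {
        "persistent": {
            "cluster": {
                "remote": {
                    "remoteelasticsearch": {
                        "seeds": [
                            "127.0.0.1:9330"
                        ]
                    }
                }
            }
        }
    }
    '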
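When you register several remote clusters or seed nodes, a small helper can generate the cluster settings body instead of editing JSON by hand. This is a sketch; remote_cluster_settings is a hypothetical helper, and it does not JSON-escape its arguments. The alias remoteelasticsearch and the seed 127.0.0.1:9330 are the example values used in this section.

```shell
#!/usr/bin/env bash
# Sketch: print a persistent cluster-settings body that registers one remote
# cluster alias with one or more seed nodes (host:transport_port).
# remote_cluster_settings is hypothetical; values are not JSON-escaped.
remote_cluster_settings() {
  local alias=$1 seeds="" seed
  shift
  for seed in "$@"; do
    # Prepend a comma before every seed except the first.
    seeds+="${seeds:+,}\"${seed}\""
  done
  printf '{"persistent":{"cluster":{"remote":{"%s":{"seeds":[%s]}}}}}\n' \
    "$alias" "$seeds"
}

body=$(remote_cluster_settings remoteelasticsearch 127.0.0.1:9330)

# Example (not executed here): apply the settings, then check that the
# remote cluster reports "connected": true.
#   curl -X PUT http://localhost:9200/_cluster/settings \
#     -H 'Content-Type: application/json' -d "$body"
#   curl -s http://localhost:9200/_remote/info
printf '%s\n' "$body"
```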

Procedure

In this procedure, the example remote Elasticsearch cluster has the alias remoteelasticsearch and contains indices named logs-2019.01, logs-2019.02, …, logs-2019.12, and so on.

  1. Use the Siren Federate datasource API to define a datasource that points to the remote Elasticsearch cluster alias:

    curl -X PUT http://localhost:9200/_siren/connector/datasource/remoteelasticsearchds -H 'Content-type: application/json' -d '
    {
      "elastic": {
        "alias": "remoteelasticsearch"
      }
    }
    '

  2. Use the Siren Federate virtual index API to define a virtual index on the coordinator cluster that matches the wildcard index pattern logs-*:

    curl -X PUT http://localhost:9200/_siren/connector/index/logsvi -H 'Content-type: application/json' -d '
    {
      "datasource": "remoteelasticsearchds",
      "resource": "logs-*",
      "key": "_id"
    }
    '

  3. Execute a join query. For example, suppose that the coordinator cluster contains an index called machines, which holds the IP addresses of machines of interest. To find the logs that are associated with these machines, execute the following Federate join query:

    curl -X GET http://localhost:9200/siren/logsvi/_search -H 'Content-Type: application/json' -d '
    {
        "query": {
            "join": {
                "indices": [
                    "machines"
                ],
                "on": [
                    "logs_ip_hash",
                    "machines_ip_hash"
                ],
                "request": {
                    "query": {
                        "match_all": {}
                    }
                }
            }
        }
    }
    '

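    In this query, logs_ip_hash is the IP field of the virtual index logsvi, and machines_ip_hash is the IP field of the index machines. The join returns the logsvi documents whose logs_ip_hash value matches the machines_ip_hash value of at least one machines document selected by the inner request (here, match_all).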

    The API returns a response similar to the following:

    {
      "took": 150,
      "timed_out": false,
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "logs-2019-11-12",
            "_id": "0",
            "_score": 1,
            "_source": {
              "date": "2019-11-12T12:12:12",
              "message": "trying out Siren"
            }
          }
        ]
      }
    }
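The three procedure steps can be collected into reusable shell functions so that the datasource, virtual index, and field names become parameters. This is a convenience sketch, not part of Siren Federate: the function names send, create_datasource, create_virtual_index, and join_search, the ES_URL variable, and the DRY_RUN switch (which prints each request instead of sending it) are all hypothetical, and JSON values are not escaped. The endpoints and request bodies are the ones shown in steps 1 to 3.

```shell
#!/usr/bin/env bash
# Sketch: the procedure steps as reusable functions.
# ES_URL, DRY_RUN, and all function names are hypothetical; values are not JSON-escaped.
ES_URL=${ES_URL:-http://localhost:9200}

# send <method> <path> <json-body>: print the request when DRY_RUN=1,
# otherwise send it with curl.
send() {
  local method=$1 path=$2 body=$3
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s %s %s\n' "$method" "$path" "$body"
  else
    curl -s -X "$method" "${ES_URL}${path}" -H 'Content-Type: application/json' -d "$body"
  fi
}

# Step 1: datasource that points to a remote cluster alias.
create_datasource() { # <datasource-id> <remote-cluster-alias>
  send PUT "/_siren/connector/datasource/$1" "{\"elastic\":{\"alias\":\"$2\"}}"
}

# Step 2: virtual index over an index pattern of that datasource, keyed on _id.
create_virtual_index() { # <virtual-index> <datasource-id> <resource-pattern>
  send PUT "/_siren/connector/index/$1" \
    "{\"datasource\":\"$2\",\"resource\":\"$3\",\"key\":\"_id\"}"
}

# Step 3: join query that keeps documents of <virtual-index> whose <left-field>
# matches <right-field> in any document of <joined-index>.
join_search() { # <virtual-index> <joined-index> <left-field> <right-field>
  local body
  body="{\"query\":{\"join\":{\"indices\":[\"$2\"],\"on\":[\"$3\",\"$4\"],"
  body+="\"request\":{\"query\":{\"match_all\":{}}}}}}"
  send GET "/siren/$1/_search" "$body"
}

# Example (prints the requests instead of sending them):
#   DRY_RUN=1 create_datasource remoteelasticsearchds remoteelasticsearch
#   DRY_RUN=1 create_virtual_index logsvi remoteelasticsearchds 'logs-*'
#   DRY_RUN=1 join_search logsvi machines logs_ip_hash machines_ip_hash
```

Quote logs-* when calling create_virtual_index so that the shell does not expand it as a file glob. Running the calls with DRY_RUN=1 first lets you inspect each request before it reaches the cluster.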

Known limitations for the Federate connector with a remote Elasticsearch cluster

To use Siren Federate with a remote Elasticsearch cluster, the coordinator cluster must run Siren Federate 7.11.0-23.0 or later.
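To check this requirement from a script, you could compare the installed version with the minimum. The following is a minimal sketch; version_ge is a hypothetical helper that assumes purely numeric version components separated by "." or "-", as in 7.11.0-23.0. The plugin component name siren-federate used in the commented example is also an assumption; check it against the output of the cat plugins API on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit status 0) if version $1 >= version $2.
# Assumes purely numeric components separated by "." or "-", e.g. 7.11.0-23.0.
version_ge() {
  local -a have want
  local i x y
  IFS='.-' read -ra have <<< "$1"
  IFS='.-' read -ra want <<< "$2"
  for ((i = 0; i < ${#have[@]} || i < ${#want[@]}; i++)); do
    # Missing components count as 0.
    x=${have[i]:-0}
    y=${want[i]:-0}
    # 10# forces base 10 so that components such as 08 are not read as octal.
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

MIN_FEDERATE_VERSION=7.11.0-23.0

# Example (not executed here; "siren-federate" is an assumed component name):
#   installed=$(curl -s 'http://localhost:9200/_cat/plugins?h=component,version' |
#     awk '$1 == "siren-federate" {print $2; exit}')
#   version_ge "$installed" "$MIN_FEDERATE_VERSION" || echo "Siren Federate is too old"
```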