Siren Platform User Guide

Connecting to JDBC datasources

The Siren Federate plugin provides the capability to query data in remote datasources through the Elasticsearch API by mapping tables to virtual indices.

The plugin stores its configuration in two Elasticsearch indices:

  • .siren-federate-datasources: used to store the JDBC configuration parameters of remote datasources.

  • .siren-federate-indices: used to store the configuration parameters of virtual indices.

You should restrict access to these indices to the Federate user.

Datasources and virtual indices can be managed using the REST API or the user interface available in Siren Investigate.

These indices are created automatically when required.

Settings

To send queries to virtual indices the Elasticsearch cluster must contain at least one node enabled to issue queries over JDBC. You should use a coordinating only node for this role, although this is not a requirement for testing purposes.

JDBC node settings

To enable JDBC on a node where the Siren Federate plugin is installed, add the following setting to elasticsearch.yml:

node.attr.connector.jdbc: true

Then, create a folder named jdbc-drivers inside the configuration folder of the node (for example elasticsearch/config or /etc/elasticsearch).

Finally, copy the JDBC driver for your remote datasource and its dependencies to the jdbc-drivers directory created above and restart the node; see the JDBC driver installation and compatibility section for a list of compatible drivers and dependencies.

Common configuration settings
Encryption

JDBC passwords are encrypted by default using a predefined 128 bit AES key; before creating datasources, it is advised to generate a custom key by running the keygen.sh script included in the siren-federate plugin directory as follows:

bash plugins/siren-federate/tools/keygen.sh -s 128

The command will output a random base64 key; it is also possible to generate keys longer than 128 bit if your JVM supports it.

To use the custom key, the following parameters must be set in`elasticsearch.yml` on master nodes and on all the JDBC nodes:

  • siren.connector.encryption.enabled: true by default, can be set to false to switch off JDBC password encryption.

  • siren.connector.encryption.secret_key: A base64 encoded AES key used to encrypt JDBC passwords.

Example elasticsearch.yml settings for a master node with a custom encryption key:

siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="

Example elasticsearch.yml settings for a JDBC node with a custom encryption key:

siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="
node.attr.connector.jdbc: true

Restart the nodes after changing the configuration to apply the settings.

Cluster wide settings

The following parameters can be set in elasticsearch.yml on JDBC nodes or by using the Elasticsearch cluster update settings API:

  • siren.connector.siren.timeout.connection: The maximum amount of seconds to wait when establishing or acquiring a JDBC connection (30 by default).

  • siren.connector.timeout.query: The maximum execution time for JDBC queries, in seconds (30 by default).

  • siren.connector.enable_union_aggregations: true by default. Set to false to switch off the use of unions in nested aggregations.

  • siren.connector.query.max_result_rows: The maximum number of rows that will be retrieved from a resultset when performing a join across datasources. Defaults to 50000.

  • siren.connector.query.max_bucket_queries: The maximum number of JDBC queries that will be generated to compute aggregation buckets. Defaults to 500.

Additional node settings

The following settings can be used to tune query processing on JDBC enabled nodes:

  • siren.connector.pool.size: the number of threads that will be allocated to process the execution of queries to remote datasources; by default it is set to int((number of available_processors * 3) / 2) + 1.

  • siren.connector.pool.queue: the maximum number of requests that should be queued if all the threads are busy. Defaults to 40.

Authentication
The Federate server role

If your cluster is protected by Search Guard or Elastic X-Pack, it is required to define a role with access to the Federate indices and internal operations and to create a user with this role.

For interoperability with these plugins, whenever a virtual index is created, the Federate plugin creates a concrete Elasticsearch index with the same name as the virtual index; when starting up, the Federate plugin will check for missing concrete indices and will attempt to create them automatically.

Sample Search Guard role definition:

federateserver:
  cluster:
    - "indices:admin/aliases"
  indices:
    ?siren-federate-datasources:
      '*':
        - ALL
    ?siren-federate-indices:
      '*':
        - ALL
    ?siren-federate-target:
      '*':
        - ALL

Sample X-Pack role definition:

{
  "cluster": [
    "monitor",
    "cluster:admin/siren/connector"
  ],
  "indices" : [
    {
      "names" : [ "*" ],
      "privileges" : [ "create_index", "indices:data/read/get", "indices:admin/siren/connector" ]
    },
    {
      "names" : [ ".siren-federate-*" ],
      "privileges" : [ "all", "indices:admin/siren/connector" ]
    }
  ]
}

Then create a user with that role for example, a user called federateserver.

Example elasticsearch.yml settings for a master node in a cluster with authentication and federateserver user:

siren.connector.username: federateserver
siren.connector.password: password
siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="

Example elasticsearch.yml settings for a JDBC node in a cluster with authentication and federateserver user:

siren.connector.username: federateserver
siren.connector.password: password
siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="
node.attr.connector.jdbc: true

Restart the nodes after setting the appropriate configuration parameters.

Administrative role

To manage datasources and virtual indices, it is required to grant the cluster:admin/siren/connector/* permissions at the cluster level.

In addition, the user must have the indices:admin/siren/connector/* and indices:data/siren/connector/* permissions on all the index names that he’s allowed to define, in addition to create, write, read and search permissions.

Write permissions are required because when a virtual index is defined the plugin will create a concrete Elasticsearch index with the same name for interoperability with authentication plugins, unless such an index already exists.

Example Search Guard role allowed to manage virtual indices starting with db-:

sirenadmin:
  cluster:
    - SIREN_CLUSTER
    - cluster:admin/plugin/siren/license/put
    - cluster:admin/plugin/siren/license/get
    - cluster:admin/siren/connector/*
  indices:
    'db-*':
      '*':
        - SIREN_READWRITE
        - indices:admin/create
        - indices:admin/siren/connector/*;
    '*':
      '*':
        - SIREN_COMPOSITE

Example X-Pack role allowed to manage virtual indices starting with db-:

{
  "cluster": [
    "cluster:admin/siren/connector"
    "cluster:admin/plugin/siren/license",
    "cluster:siren/internal",
    "manage"
  ],
  "indices" : [
    {
      "names" : [ "*" ],
      "privileges" : [ "indices:siren/mplan" ]
    },
    {
      "names" : [ "db-*" ],
      "privileges" : [
        "read",
        "create_index",
        "view_index_metadata",
        "indices:data/siren",
        "indices:siren",
        "indices:admin/version/get",
        "indices:admin/get",
        "indices:admin/siren/connector"
      ]
    }
  ]
}
Search role

In order to search virtual indices, users must have the indices:data/siren/connector/* permissions on these indices in addition to standard read and search permissions.

Example Search Guard role allowed to search virtual indices starting with db-:

sirenuser:
  cluster:
    - SIREN_CLUSTER
  indices:
    '*':
      '*':
        SIREN_COMPOSITE
    'db-*':
      '*':
        - SIREN_READONLY
        - indices:data/siren/connector*;

Example X-Pack role allowed to search virtual indices starting with db-:

{
  "cluster": [
    "cluster:admin/plugin/siren/license/get",
    "cluster:siren/internal"
  ],
  "indices" : [
    {
      "names" : [ "*" ],
      "privileges" : [ "indices:siren/mplan" ]
    },
    {
      "names" : [ "db-*" ],
      "privileges" : [
        "read",
        "view_index_metadata",
        "indices:data/siren",
        "indices:siren",
        "indices:admin/version/get",
        "indices:admin/get"
      ]
    }
  ]
}

JDBC driver installation and compatibility

The JDBC driver for your remote datasource and its dependencies must be copied to the jdbc-drivers subdirectory inside the configuration directory of JDBC nodes (e.g. elasticsearch/config/jdbc-drivers).

It is neither required nor recommended to copy these drivers to nodes which are not enabled to execute queries.

Table 16. List of supported JDBC drivers

Name

JDBC class

Notes

PostgreSQL

org.postgresql.Driver

Download the latest JDBC 4.2 driver from https://jdbc.postgresql.org/download.html and copy the postgresql-<version>.jar file to the jdbc-drivers directory.

MySQL

com.mysql.jdbc.Driver

Download the latest GA release from https://dev.mysql.com/downloads/connector/j/, extract it, then copy mysql-connector-java-<version>.jar to the jdbc-drivers plugin directory.

When writing the JDBC connection string, set the useLegacyDatetimeCode parameter to false to avoid issues when converting timestamps.

Microsoft SQL Server 2014 or greater

com.microsoft.sqlserver.jdbc.SQLServerDriver

Download sqljdbc_<version>_enu.tar.gz from https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-2017#available-downloads-of-jdbc-driver-for-sql-server, extract it, then copy mssql-jdbc-<version>.jre8.jar to the jdbc-drivers directory.

Sybase ASE 15.7+

com.sybase.jdbc4.jdbc.SybDriver

OR

net.sourceforge.jtds.jdbc.Driver

To use the FreeTDS driver, download the latest version from https://sourceforge.net/projects/jtds/files/, extract it, then copy jtds-<version>.jar to the jdbc-drivers directory.

To use the jConnect driver, copy jConnect-<version>.jar from your ASE directory to the jdbc-drivers directory.

Oracle 12c+

oracle.jdbc.OracleDriver

Download the latest ojdbc8.jar from http://www.oracle.com/technetwork/database/features/jdbc/jdbc-ucp-122-3110062.html and copy it to the jdbc-drivers plugin directory.

Presto

com.facebook.presto.jdbc.PrestoDriver

Download the latest JDBC driver from https://prestodb.io/docs/current/installation/jdbc.html and copy it to the jdbc-drivers plugin directory.

Spark SQL 2.2+

com.simba.spark.jdbc41.Driver

The Magnitude JDBC driver for Spark can be purchased at https://www.simba.com/product/spark-drivers-with-sql-connector/; once downloaded, extract the bundle, then extract the JDBC 4.1 archive and copy the following jars to the jdbc-drivers plugin directory:

SparkJDBC41.jar

commons-codec-<version>.jar

hive_metastore.jar

hive_service.jar

libfb303-<version>.jar

libthrift-<version>.jar

ql.jar

TCLIServiceClient.jar

zookeeper-<version>.jar

In addition, copy your license file to the jdbc-drivers plugin directory.

Dremio

com.dremio.jdbc.Driver

Download the jar at https://download.siren.io/dremio-jdbc-driver-1.4.4-201801230630490666-6d69d32.jar and copy it to the jdbc-drivers plugin directory.

Impala

com.cloudera.impala.jdbc41.Driver

Download the latest JDBC bundle from https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-4.html, extract the bundle, then extract the JDBC 4.1 archive and copy the following jars to the jdbc-drivers plugin directory:

ImpalaJDBC41.jar

commons-codec-<version>.jar

hive_metastore.jar

hive_service.jar

libfb303-<version>.jar

libthrift-<version>.jar

ql.jar

TCLIServiceClient.jar

zookeeper-<version>.jar



Restart the JDBC node after copying the drivers.

Search results

    No results found