Siren Platform User Guide

Connecting to JDBC datasources

Siren Federate provides the capability to query data from a remote JDBC databases.

Settings

In order to send queries to virtual indices the Elasticsearch cluster must contain at least one node enabled to issue queries over JDBC; it is advised to use a coordinating only node for this role, although this is not a requirement for testing purposes.

JDBC node settings

In order to enable JDBC on a node where the Siren Federate plugin is installed, add the following setting to elasticsearch.yml:

node.attr.connector.jdbc: true

Then, create a directory named jdbc-drivers inside the configuration directory of the node (e.g. elasticsearch/config or /etc/elasticsearch).

Finally, copy the JDBC driver for your remote datasource and its dependencies to the jdbc-drivers directory created above and restart the node; see the JDBC driver installation and compatibility section for a list of compatible drivers and dependencies.

Common configuration settings
Encryption

JDBC passwords are encrypted by default using a predefined 128 bit AES key; before creating datasources, it is advised to generate a custom key by running the keygen.sh script included in the siren-federate plugin directory as follows:

bash plugins/siren-federate/tools/keygen.sh -s 128

The command will output a random base64 key; it is also possible to generate keys longer than 128 bit if your JVM supports it.

To use the custom key, the following parameters must be set in elasticsearch.yml on master nodes and on all the JDBC nodes:

  • siren.connector.encryption.enabled: true by default, can be set to false to disable JDBC password encryption.
  • siren.connector.encryption.secret_key: a base64 encoded AES key used to encrypt JDBC passwords.

Example elasticsearch.yml settings for a master node with a custom encryption key:

siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="

Example elasticsearch.yml settings for a JDBC node with a custom encryption key:

siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="
node.attr.connector.jdbc: true

Restart the nodes after changing the configuration to apply the settings.

JDBC driver installation and compatibility

The JDBC driver for your remote datasource and its dependencies must be copied to the jdbc-drivers subdirectory inside the configuration directory of JDBC nodes (e.g. elasticsearch/config/jdbc-drivers).

It is not required nor recommended to copy these drivers to nodes which are not enabled to execute queries.

Restart the JDBC node after copying the drivers.

You may create a sub-directory within jdbc-drivers to store a driver and to provide a custom security policy file for this driver. It is recommended to use this approach for drivers that come in multiple jars. A custom security policy file enables the definition of driver-specific permissions. The custom security policy file must be named security.policy and must be located inside the driver sub-directory. The following variables can be used within the policy file:

  • codebase.federate.common: Path to the directory storing the security.policy (defaults to jdbc-drivers if the default drivers-security.policy file is used)
  • codebase.federate.${jar_name): Path to a driver jar. Here, ${jar_name} refers to the filename of the jar stored in the directory where the security.policy file is located (defaults to jars found in jdbc-drivers if the default drivers-security.policy file is used). For Example:
grant codeBase "${codebase.federate.postgresql-42.2.5.jar}" {
  // Permissions for postgresql-42.2.5.jar
}

If a security.policy is placed in the main jdbc-drivers directory, then it overrides the default drivers-security.policy

Table 16. List of supported JDBC drivers

Name

JDBC class

Notes

PostgreSQL

org.postgresql.Driver

Download the latest JDBC 4.2 driver from https://jdbc.postgresql.org/download.html and copy the postgresql-<version>.jar file to the jdbc-drivers directory.

MySQL

com.mysql.jdbc.Driver

Download the latest GA release from https://dev.mysql.com/downloads/connector/j/, extract it, then copy mysql-connector-java-<version>.jar to the jdbc-drivers plugin directory.

When writing the JDBC connection string, set the useLegacyDatetimeCode parameter to false to avoid issues when converting timestamps.

Microsoft SQL Server 2014 or greater

com.microsoft.sqlserver.jdbc.SQLServerDriver

Download sqljdbc_<version>_enu.tar.gz from https://www.microsoft.com/en-us/download/details.aspx?id=55539, extract it, then copy mssql-jdbc-<version>.jre8.jar to the jdbc-drivers directory.

Sybase ASE 15.7+

com.sybase.jdbc4.jdbc.SybDriver

OR

net.sourceforge.jtds.jdbc.Driver

To use the FreeTDS driver, download the latest version from https://sourceforge.net/projects/jtds/files/, extract it, then copy jtds-<version>.jar to the jdbc-drivers directory.

To use the jConnect driver, copy jConnect-<version>.jar from your ASE directory to the jdbc-drivers directory.

Oracle 12c+

oracle.jdbc.OracleDriver

Download the latest ojdbc8.jar from http://www.oracle.com/technetwork/database/features/jdbc/jdbc-ucp-122-3110062.html and copy it to the jdbc-drivers plugin directory.

Presto

com.facebook.presto.jdbc.PrestoDriver

Download the latest JDBC driver from https://prestodb.io/docs/current/installation/jdbc.html and copy it to the jdbc-drivers plugin directory.

Spark SQL 2.2+

com.simba.spark.jdbc41.Driver

The Magnitude JDBC driver for Spark can be purchased at https://www.simba.com/product/spark-drivers-with-sql-connector/; once downloaded, extract the bundle, then extract the JDBC 4.1 archive and copy the following jars to the jdbc-drivers plugin directory:

SparkJDBC41.jar

commons-codec-<version>.jar

hive_metastore.jar

hive_service.jar

libfb303-<version>.jar

libthrift-<version>.jar

ql.jar

TCLIServiceClient.jar

zookeeper-<version>.jar

In addition, copy your license file to the jdbc-drivers plugin directory.

Dremio

com.dremio.jdbc.Driver

Download the jar at https://download.siren.io/dremio-jdbc-driver-1.4.4-201801230630490666-6d69d32.jar and copy it to the jdbc-drivers plugin directory.

Impala

com.cloudera.impala.jdbc41.Driver

Download the latest JDBC bundle from https://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-42.html, extract the bundle, then extract the JDBC 4.1 archive and copy the following jars to the jdbc-drivers plugin directory:

ImpalaJDBC41.jar

commons-codec-<version>.jar

hive_metastore.jar

hive_service.jar

libfb303-<version>.jar

libthrift-<version>.jar

ql.jar

TCLIServiceClient.jar

zookeeper-<version>.jar



Operations on virtual indices

The plugin supports the following operations on virtual indices:

  • get mapping
  • get field capabilities
  • search
  • msearch
  • get
  • mget

Search requests involving a mixture of virtual and normal Elasticsearch indices (e.g. when using a wildcard) are not supported and will be rejected; it is however possible to issue msearch requests containing requests on normal Elasticsearch indices and virtual indices.

When creating a virtual index, the plugin will create an empty Elasticsearch index for interoperability with Search Guard and Elastic X-Pack; if an Elasticsearch index with the same name as the virtual index already exists and it is not empty, the virtual index creation will fail.

When deleting a virtual index, the corresponding Elasticsearch index will not be removed.

Type conversion

The plugin converts JDBC types to their closest Elasticsearch equivalent:

  • String types are handled as keyword fields.
  • Boolean types are handled as boolean fields.
  • Date and timestamp are handled as date fields.
  • Integer types are handled as long fields.
  • Floating point types are handled as double fields.

Complex JDBC types which are not recognized by the plugin are skipped during query processing and resultset fetching.

Supported search queries

The plugin supports the following queries:

  • match_all
  • term
  • terms
  • range
  • exists
  • prefix
  • wildcard
  • ids
  • bool

At this time the plugin provides no support for datasource specific full text search functions, so all these queries will work as if they were issued against keyword fields.

Supported aggregations

Currently the plugin provides support for the following aggregations:

Metric:

  • Average
  • Cardinality
  • Max
  • Min
  • Sum

Bucket:

  • Date histogram
  • Histogram
  • Date range
  • Range
  • Terms
  • Filters

Only terms aggregations can be nested inside a parent bucket aggregation.

Troubleshooting
Cannot reconnect to datasource by hostname after DNS update

When the Java security manager is enabled, the JVM will cache name resolutions indefinitely; if the system you’re connecting to uses round-robin DNS or the IP address of the system changes frequently, you will need to modify the following Java Security Policy properties:

  • networkaddress.cache.ttl: the number of seconds to cache a successful DNS lookup. Defaults to -1 (forever).
  • networkaddress.cache.negative.ttl: the number of seconds to cache an unsuccessful DNS lookup. Defaults to 10, set to 0 to avoid caching.