Configuring a JDBC connector

You can query data from a JDBC datasource on a node where the Siren Federate plugin is installed.

Before you begin

The Elasticsearch cluster must contain at least one node that is enabled to issue queries over JDBC. It is recommended that you use a coordinating-only node for this role, although this is not a requirement for testing purposes.

To configure the JDBC datasource, you need a compatible driver. For more information, see Supported JDBC drivers.

If your system needs additional encryption, generate a custom key by running the keygen.sh script.

Procedure

To configure a JDBC datasource, complete the following steps:

Open the elasticsearch.yml file and add the following setting:
```
node.attr.connector.jdbc: true
```
Create a directory called jdbc-drivers inside the configuration directory of the node, for example, create the directory in elasticsearch/config/ or /etc/elasticsearch/.
Copy the JDBC driver for your remote datasource and its dependencies into the jdbc-drivers directory.
Restart the Elasticsearch service.

It is not required nor recommended to copy JDBC drivers to nodes that are not enabled to execute queries.

Define a datasource called mymysqldatabase, based on the assumption, for instance that, one has a MySQL server running on port 3306 and the server has a table called table1, by using the Siren Federate datasource API as follows:

curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_siren/connector/datasource/mymysqldatabase' -d '
{
  "jdbc": {
    "driver": "com.db.Driver",
    "url": "jdbc:db://localhost:3306/default",
    "username": "username",
    "password": "password",
    "properties": {
      "ssl": true
    }
  }
}
'

Define a virtual index on the coordinator cluster that matches a table name called table1 from the database of choice, by using the Siren Federate virtual index API as follows:

curl -X PUT http://localhost:9200/_siren/connector/index/table1 -H 'Content-type: application/json' -d '
{
  "datasource": "mymysqldatabase",
  "resource": "table1",
  "key": "_id"
}
'

Execute a join query. For example, the coordinator cluster contains an index called machines, which contains information about IP addresses on machines of interest. To find out about the logs that are associated to these machines, execute the following Federate join query:

curl -X GET http://localhost:9200/siren/table1/_search -H 'Content-Type: application/json' -d '
{
    "query": {
        "join": {
            "indices": [
                "machines"
            ],
            "on": [
                "logs_ip_hash",
                "machines_ip_hash"
            ],
            "request": {
                "query": {
                    "match_all": {}
                }
            }
        }
    }
}
'

logs_ip_hash is the IP field in the index table1 and machines_ip_hash is the IP field in the index machines.

The API returns the following response:

{
  "took": 150,
  "timed_out": false,
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "table1",
        "_id": "0",
        "_score": 2,
        "_source": {
          "date": "2019-11-12T12:12:12",
          "message": "trying out Siren"
        }
      }
    ]
  }
}

Common configuration settings

Configuring encryption for JDBC datasources

JDBC passwords are encrypted by default by using a predefined 128-bit AES key. However, additional encryption is advisable in a production environment.

Before you create datasources, it is recommended that you generate a custom key by running the keygen.sh script that is included in the siren-federate plugin directory.

Procedure

From the siren-federate plugin directory, run the following command:
```
bash plugins/siren-federate/tools/keygen.sh -s 128
```
The command outputs a random base64 key. It is also possible to generate keys longer than 128 bit if your JVM supports it.
To use the custom key, set the following parameters in the elasticsearch.yml file on master nodes and on all of the JDBC nodes:
- siren.connector.encryption.enabled: true by default, but can be set to false to disable JDBC password encryption.
- siren.connector.encryption.secret_key: a base64 encoded AES key that is used to encrypt JDBC passwords.

Examples

The following are elasticsearch.yml settings that can be used for a master node with a custom encryption key:

siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="

The following are elasticsearch.yml settings for a JDBC node with a custom encryption key:

siren.connector.encryption.secret_key: "1zxtIE6/EkAKap+5OsPWRw=="
node.attr.connector.jdbc: true

After you save the configuration, restart the nodes to apply the settings.

Configuring a custom security policy

You may create a sub-directory within the jdbc-drivers directory to store a driver and to provide a custom security policy file for this driver.

It is recommended to use this approach for drivers that come in multiple jars. A custom security policy file enables the definition of driver-specific permissions.

The custom security policy file must be named security.policy and must be located inside the driver sub-directory. The following variables can be used within the policy file:

codebase.federate.common: Path to the directory that stores the security.policy. The path defaults to jdbc-drivers if the default drivers-security.policy file is used.
codebase.federate.${jar_name}: Path to a driver JAR file. Here, ${jar_name} refers to the name of the JAR file that is stored in the directory where the security.policy file is located. If the default drivers-security.policy file is used, then the system uses JAR files that are stored in the jdbc-drivers directory. For example:

grant codeBase "${codebase.federate.postgresql-42.2.5.jar}" {
  // Permissions for postgresql-42.2.5.jar
}

If a security.policy is placed in the main jdbc-drivers directory, then it overrides the default drivers-security.policy.

After copying the drivers, restart the JDBC node to apply the settings.

Type conversion

The plugin converts JDBC types to their closest Elasticsearch equivalent:

String types are handled as keyword fields.
Boolean types are handled as boolean fields.
Date and timestamp are handled as date fields.
Integer types are handled as long fields.
Floating point types are handled as double fields.

Complex JDBC types that are not recognized by the plugin are skipped during query processing and resultset fetching.

Complex types are supported when their property types are scalar (text, numbers, boolean) or collections (list, map). Arrays of complex type are supported if the complex type meets the previous requirement.

Supported search queries

The plugin supports the following queries:

match_all
term
terms
range
exists
prefix
wildcard
ids
bool

The Siren Federate plugin does not currently provide support for datasource-specific full text search functions, so the listed queries work as if they were issued against keyword fields.

Supported aggregations

The Siren Federate plugin provides support for the following aggregations.

Metric:

Average
Cardinality
Max
Min
Sum

Bucket:

Date histogram
Histogram
Date range
Range
Terms
Filters

Only terms aggregations can be nested inside a parent bucket aggregation.

Known limitations for the JDBC connector

Wildcards on virtual index names are not supported by any API; a wildcard search will silently ignore virtual indices.
Comma-separated lists of index patterns which target virtual indices are not supported.
Document-level security and field-level security are currently not supported.
Only terms aggregations can be nested inside a parent bucket aggregation.
The missing parameter in bucket aggregations is not supported.
Scripted fields are not supported.
When issuing queries containing string comparisons, the plugin does not force a specific collation, if a table behind a virtual indices uses a case insensitive collation, string comparisons will be case insensitive.
Complex types are supported when their property types are scalar (text, numbers, boolean) or collections (list, map).
Arrays of complex type are supported if the complex type meets the previous requirement.