Getting Started
This short guide shows how to install the Siren Federate plugin in Elasticsearch, load two sets of documents connected by a common attribute, and execute a relational query across the two sets from within Elasticsearch.
Installing the Siren Federate Plugin
From the Elasticsearch installation directory, run the following command:
$ ./bin/elasticsearch-plugin install https://download.support.siren.io/federate/8.14.3-36.3.zip
-> Downloading https://download.support.siren.io/federate/8.14.3-36.3-proguard-plugin.zip
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.io.FilePermission cloudera.properties read
* java.io.FilePermission simba.properties read
* java.lang.RuntimePermission accessClassInPackage.sun.misc
* java.lang.RuntimePermission accessClassInPackage.sun.misc.*
* java.lang.RuntimePermission accessClassInPackage.sun.security.provider
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission createClassLoader
* java.lang.RuntimePermission getClassLoader
...
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
-> Installed siren-federate
To remove the plugin, run the following command:
$ ./bin/elasticsearch-plugin remove siren-federate
-> Removing siren-federate...
Removed siren-federate
Each Federate version is tightly coupled with a specific version of Elasticsearch. We strongly recommend that you use only the Java runtime bundled with the Elasticsearch distribution.
Starting Elasticsearch
To launch Elasticsearch, run the following command:
$ ./bin/elasticsearch
In the output, you should see a line like the following, which indicates that the Siren Federate plugin is installed and running:
[2017-04-11T10:42:02,209][INFO ][o.e.p.PluginsService ] [etZuTTn] loaded plugin [siren-federate]
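If you capture the startup output in a file or a string, the check above can be automated. The following is a minimal sketch only: the helper name loaded_plugins is hypothetical, and the pattern simply assumes the "loaded plugin [name]" wording shown in the sample line above.

```python
import re

# Hypothetical helper, not part of Elasticsearch or Siren Federate.
# The pattern assumes the "loaded plugin [name]" wording from the sample log line.
LOADED_PLUGIN = re.compile(r"loaded plugin \[([^\]]+)\]")


def loaded_plugins(log_text):
    """Return the set of plugin names reported as loaded in the given log text."""
    return set(LOADED_PLUGIN.findall(log_text))


sample = (
    "[2017-04-11T10:42:02,209][INFO ][o.e.p.PluginsService ] "
    "[etZuTTn] loaded plugin [siren-federate]"
)
print("siren-federate" in loaded_plugins(sample))  # prints True
```

On a running node, the Elasticsearch _cat/plugins API is another way to list the installed plugins.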
Loading Some Relational Data
We will use a simple synthetic dataset for this demo. The dataset consists of two sets of documents: Article and Company. An article is connected to a company through the attribute mentions. Articles will be loaded into the article index and companies into the company index. To load the dataset, run the following commands:
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/article'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/article/_mapping' -d '
{
"properties": {
"mentions": {
"type": "keyword"
}
}
}
'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/company'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/company/_mapping' -d '
{
"properties": {
"id": {
"type": "keyword"
}
}
}
'
$ curl -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/_bulk?pretty&refresh=true' -d '
{ "index" : { "_index" : "article", "_id" : "1" } }
{ "title" : "The NoSQL database glut", "mentions" : ["1", "2"] }
{ "index" : { "_index" : "article", "_id" : "2" } }
{ "title" : "Graph Databases Seen Connecting the Dots", "mentions" : [] }
{ "index" : { "_index" : "article", "_id" : "3" } }
{ "title" : "How to determine which NoSQL DBMS best fits your needs", "mentions" : ["2", "4"] }
{ "index" : { "_index" : "article", "_id" : "4" } }
{ "title" : "MapR ships Apache Drill", "mentions" : ["4"] }
{ "index" : { "_index" : "company", "_id" : "1" } }
{ "id": "1", "name" : "Elastic" }
{ "index" : { "_index" : "company", "_id" : "2" } }
{ "id": "2", "name" : "Orient Technologies" }
{ "index" : { "_index" : "company", "_id" : "3" } }
{ "id": "3", "name" : "Cloudera" }
{ "index" : { "_index" : "company", "_id" : "4" } }
{ "id": "4", "name" : "MapR" }
'
{
"took" : 8,
"errors" : false,
"items" : [ {
"index" : {
"_index" : "article",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
...
}
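The body sent to _bulk above alternates one action line with one document line, and the whole body ends with a newline. If you prefer to generate that body rather than write it by hand, the following sketch shows one way to do it. The function name bulk_index_body is hypothetical and is used here for illustration only; the sketch builds the text but does not send it.

```python
import json


def bulk_index_body(index, docs):
    """Build a newline-delimited JSON body for the _bulk API.

    `docs` maps each document _id to its source. Hypothetical helper,
    for illustration only.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: tells the bulk API which index and _id to write to.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the final line to end with a newline character.
    return "\n".join(lines) + "\n"


companies = {
    "1": {"id": "1", "name": "Elastic"},
    "2": {"id": "2", "name": "Orient Technologies"},
}
body = bulk_index_body("company", companies)
print(body, end="")
```

The resulting string can be passed to curl or to any HTTP client as the request body.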
Relational Querying of the Data
We will now show you how to execute a relational query across the two indices. For example, we would like to retrieve all the articles that mention companies whose name matches orient. This relational query can be decomposed into two search queries: the first finds all the companies whose name matches orient, and the second filters out all articles that do not mention a company from the first result set. The Siren Federate plugin introduces a new Elasticsearch filter, named join, that lets you define such a query plan, and a new search API, siren/<index>/_search, that lets you execute it.
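To make the two-step decomposition concrete, here is a small in-memory sketch that works on a copy of the sample dataset. It illustrates the logic only and does not describe how Siren Federate executes a join internally. The matches function is a crude, assumed stand-in for a term match on an analyzed text field.

```python
# In-memory copy of the sample dataset loaded earlier in this guide.
articles = {
    "1": {"title": "The NoSQL database glut", "mentions": ["1", "2"]},
    "2": {"title": "Graph Databases Seen Connecting the Dots", "mentions": []},
    "3": {"title": "How to determine which NoSQL DBMS best fits your needs",
          "mentions": ["2", "4"]},
    "4": {"title": "MapR ships Apache Drill", "mentions": ["4"]},
}
companies = {
    "1": {"id": "1", "name": "Elastic"},
    "2": {"id": "2", "name": "Orient Technologies"},
    "3": {"id": "3", "name": "Cloudera"},
    "4": {"id": "4", "name": "MapR"},
}


def matches(text, term):
    """Assumed stand-in for a term match: lowercase the text, split on whitespace."""
    return term in text.lower().split()


# Step 1: collect the join keys (id) of all companies whose name matches "orient".
company_ids = {c["id"] for c in companies.values() if matches(c["name"], "orient")}

# Step 2: keep only the articles that mention at least one of those companies.
hits = sorted(
    article_id
    for article_id, article in articles.items()
    if company_ids & set(article["mentions"])
)
print(hits)  # prints ['1', '3']
```

The join filter lets you express both steps in a single search request, so the intermediate set of company identifiers never has to be handled by the client.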
Below is the command to run the relational query:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/article/_search?pretty' -d '{ (1)
"query" : {
"join" : { (2)
"indices" : ["company"], (3)
"on" : ["mentions", "id"], (4)
"request" : { (5)
"query" : {
"term" : {
"name" : "orient"
}
}
}
}
}
}'
(1) The parent indices (that is, article)
(2) The join query clause
(3) The child indices (that is, company)
(4) The clause specifying the paths of the join keys in the parent and child indices, respectively
(5) The search request used to filter the company index (the child set)
The command should return the following response, with two search hits:
{
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "article",
"_id" : "1",
"_score" : 1.0,
"_source":{ "title" : "The NoSQL database glut", "mentions" : ["1", "2"] }
}, {
"_index" : "article",
"_id" : "3",
"_score" : 1.0,
"_source":{ "title" : "How to determine which NoSQL DBMS best fits your needs", "mentions" : ["2", "4"] }
} ]
}
}
You can also reverse the order of the join, and query for all the companies that are mentioned in articles whose title matches nosql:
$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/company/_search?pretty' -d '{
"query" : {
"join" : {
"indices" : ["article"],
"on": ["id", "mentions"],
"request" : {
"query" : {
"term" : {
"title" : "nosql"
}
}
}
}
}
}'
The command should return the following response, with three search hits:
{
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [ {
"_index" : "company",
"_id" : "4",
"_score" : 1.0,
"_source":{ "id": "4", "name" : "MapR" }
}, {
"_index" : "company",
"_id" : "1",
"_score" : 1.0,
"_source":{ "id": "1", "name" : "Elastic" }
}, {
"_index" : "company",
"_id" : "2",
"_score" : 1.0,
"_source":{ "id": "2", "name" : "Orient Technologies" }
} ]
}
}
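The two requests in this section differ only in which index plays the parent role, the order of the join keys, and the nested query. The following sketch builds both request bodies from one function. The name join_request and its parameters are hypothetical; only the indices, on, and request fields come from the requests shown in this guide. The sketch builds the JSON but does not send it.

```python
import json


def join_request(child_index, parent_field, child_field, child_query):
    """Build the body of a siren/<parent index>/_search request with one join clause.

    Hypothetical helper. The parent index is not part of the body: it is
    named in the request URL.
    """
    return {
        "query": {
            "join": {
                "indices": [child_index],
                # Join keys: the parent field first, then the child field.
                "on": [parent_field, child_field],
                # Search request that filters the child set.
                "request": {"query": child_query},
            }
        }
    }


# Articles that mention companies whose name matches "orient".
articles_by_company = join_request(
    "company", "mentions", "id", {"term": {"name": "orient"}}
)

# Companies mentioned in articles whose title matches "nosql".
companies_by_article = join_request(
    "article", "id", "mentions", {"term": {"title": "nosql"}}
)

print(json.dumps(companies_by_article, indent=2))
```

The first body would be sent to siren/article/_search and the second to siren/company/_search, as in the curl commands above.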