Glossary

Many of the terms that are used in the Siren Federate documentation are also used in Elasticsearch. For more information, see the Elasticsearch glossary.

action: The type of request that can be executed on a cluster or an index. Actions are controlled and limited by user role permissions. For more information, see Configuring security for Siren Federate.

API: The acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other.

broadcast join: A distributed join execution strategy, which copies the child set to every node of the cluster.

child set: During a join of indices A and B, a search is performed against index A as it is filtered by its relation to index B. In this example, the child set is index B (the filtering set) and the parent set is index A (the filtered set). Note: A set of documents can come from multiple indices.

cluster: One or more nodes that share the same cluster name.

datasource: An external source of data, such as a remote Elasticsearch cluster or a MySQL database behind an Avatica server. For more information, see Connecting to remote datasources.

document: A JSON document that is stored in Elasticsearch. A document is like a row in a table in a relational database. Each document is stored in an index and has a unique identifier associated with it.

Federate cluster: An Elasticsearch cluster that has the Siren Federate plugin installed.

federation: The process that maps different external database systems into a unified API so that it can be used for business intelligence (BI) or other analysis.

hash join: A distributed join execution strategy, where the two data sets are partitioned using a hash function across every node of the cluster, and where a hash table is used to find matching rows between the two inputs.
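
The two phases described in the hash join entry can be illustrated with a minimal single-process sketch. It is not Siren Federate's implementation: the function names, the use of `zlib.crc32` as the hash function, and the `(document id, join key)` tuple layout are assumptions made for illustration only, and each "node" is simulated by a list in memory.

```python
import zlib
from collections import defaultdict


def partition(tuples, num_nodes):
    """Assign each (doc_id, key) tuple to a simulated node by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for doc_id, key in tuples:
        # crc32 is a stand-in hash function (an assumption of this sketch).
        node = zlib.crc32(str(key).encode("utf-8")) % num_nodes
        parts[node].append((doc_id, key))
    return parts


def hash_join(parent, child, num_nodes=3):
    """Return the sorted (parent_id, child_id) pairs whose join keys are equal."""
    matches = []
    parent_parts = partition(parent, num_nodes)
    child_parts = partition(child, num_nodes)
    # Equal keys hash to the same node, so each node can join its partitions alone.
    for parent_part, child_part in zip(parent_parts, child_parts):
        # Build phase: hash table over the child tuples held by this node.
        table = defaultdict(list)
        for child_id, key in child_part:
            table[key].append(child_id)
        # Probe phase: look up each parent tuple's key in the hash table.
        for parent_id, key in parent_part:
            for child_id in table.get(key, []):
                matches.append((parent_id, child_id))
    return sorted(matches)
```

Because both sets are partitioned with the same hash function, the result does not depend on the number of simulated nodes; only the distribution of work changes.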

index: An optimized collection of JSON documents. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards.

inner join: Enables the projection of arbitrary fields (including script fields and document scores) from the child set, B, and combines them with the parent set, A. The projected fields and associated values of a document from set B are mapped to all of the documents from set A that satisfy the join condition. The result of the join is the parent set, A, augmented by the projected fields from the child set, B. See also, parent set and child set.
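
As a rough illustration of the inner join entry, the following sketch augments parent documents with fields projected from matching child documents. The function name, its parameters, and the choice to collect projected values into a list are assumptions of this sketch; it does not reflect how Siren Federate represents projected fields in a response.

```python
def inner_join(parent_docs, child_docs, parent_key, child_key, projected):
    """Return the parent docs that have a match, augmented with projected child fields."""
    # Group the child documents (set B) by their join key.
    children_by_key = {}
    for doc in child_docs:
        if child_key in doc:
            children_by_key.setdefault(doc[child_key], []).append(doc)

    result = []
    for doc in parent_docs:
        matches = children_by_key.get(doc.get(parent_key), [])
        if not matches:
            continue  # parent documents without a match are not returned
        augmented = dict(doc)  # copy, so the input document is left untouched
        for field in projected:
            # Collect the projected values from every matching child document.
            augmented[field] = [m[field] for m in matches if field in m]
        result.append(augmented)
    return result
```

For example, joining companies (set A) with investments (set B) on the company name and projecting an `amount` field would return each matching company with a list of the amounts from its investments.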

I/O: Disk I/O and caching occur when the database engine reads and writes blocks containing records to and from a disk into memory. The next time the engine needs that block, it can access it from memory, rather than reading it from the disk.

join: A binary operator that is used to combine data from two sets of documents. The result of a join is the set of all combinations of documents in the two sets of documents that are equal on their common attribute names. For information about the different join strategies that are available, see Configuring joins by type.

join query: The type of query syntax to use when you want to perform a join. See also, query. For more information, see Query DSL.

left-side set: See parent set. Also known as the 'left index'.

node: An instance of Elasticsearch that belongs to a cluster. A node can combine different roles, such as a master-eligible node, a data node, an ingestion node, a transformation node, or a machine-learning (ML) node.

parallelization: A method of processing, whereby many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially. Parallelization improves system performance through the simultaneous processing of various operations, such as loading data, building indexes, and evaluating queries.

parent set: During a join of indices A and B, a search is performed against index A as it is filtered by its relation to index B. In this example, the parent set is index A (the filtered set) and the child set is index B (the filtering set). Note: A set of documents may come from multiple indices.

partitioning: The process of breaking the data in a database down into partitions. Each piece of data resides in exactly one partition. Partitioning is performed to ensure scalability, because the entire data set might not fit on a single node. Different partitions can reside on different nodes, and each node can serve queries against its own partitions. See also, shard.
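
The property that each piece of data resides in exactly one partition can be sketched with a simple hash-based assignment. The function names and the use of `zlib.crc32` are assumptions for illustration; Elasticsearch applies its own routing formula to assign documents to shards.

```python
import zlib


def partition_for(doc_id, num_partitions):
    """Map a document identifier to exactly one partition number."""
    # crc32 is deterministic, so the same identifier always maps to the same partition.
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


def partition_documents(doc_ids, num_partitions):
    """Group document identifiers by the partition they are assigned to."""
    partitions = {p: [] for p in range(num_partitions)}
    for doc_id in doc_ids:
        partitions[partition_for(doc_id, num_partitions)].append(doc_id)
    return partitions
```

Because the assignment is a pure function of the identifier, any node can work out which partition holds a document without consulting the others.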

primary shard: Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard.

query: A request for information from Elasticsearch. A query represents a question, which is written in a way that Elasticsearch understands. A search consists of one or more queries combined.

reflection: The import and mapping of data to Elasticsearch from external datasources. A reflection is a recurrent and fully-managed ingestion that replicates the data from a datasource into an Elasticsearch index. See also, datasource.

replica shard: Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:

Increased failover: A replica shard can be promoted to a primary shard if the existing primary shard fails.
Improved performance: The get and search requests can be handled by primary or replica shards.

right-side set: See child set. Also known as the 'right index'.

routing join: A distributed join execution strategy, which uploads the child set's tuples to specific nodes of the cluster. Those nodes are the ones hosting the parent set's shards that may contain a join match for the tuples given Elasticsearch document routing.

semi-join: Filters the parent set (A), based on the child set (B). A semi-join returns the documents of A that satisfy the join condition with the documents of B. This is equivalent to the EXISTS() operator in SQL.
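
A minimal sketch of the semi-join semantics, assuming documents are plain dictionaries and the join condition is equality on one field from each set (the function and parameter names are hypothetical):

```python
def semi_join(parent_docs, child_docs, parent_key, child_key):
    """Return the documents of A whose join key appears in at least one document of B."""
    # Only the existence of a matching key matters, so a set of keys is enough.
    child_keys = {doc[child_key] for doc in child_docs if child_key in doc}
    return [doc for doc in parent_docs if doc.get(parent_key) in child_keys]
```

Unlike an inner join, the result contains only unmodified documents from the parent set, and each parent document appears at most once, however many child documents it matches.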

shard: A partition of an index in Elasticsearch. Shards are distributed across the nodes of the cluster to spread load. See also, partitioning and primary shard.

tuple: A single row that is composed of one or more columns, where one column is mapped to one field of a document. For example, a tuple can be a row that is composed of two elements, such as the document identifier and the key value of the join condition. If a document has a multi-valued field, one tuple is generated for each value.

virtual index: An Elasticsearch index that is created by the Federate plugin when mapping remote Elasticsearch clusters. The virtual index that is created does not contain the data itself. Instead, it contains information about the datasource and its metadata. It is then used in search and get queries as any other index would be. For more information, see Connecting to remote datasources.