Getting started with your own data

The recommended way to get started with your own data is to follow the Siren Tutorial (https://siren.io/getting-started/).

This section describes an advanced, production-grade scenario. Use it to learn how to connect to data by using JDBC database connections, or how to import data by using Logstash (typically to load live logs).

After you have connected your data, you can use Siren Investigate to create an initial data model, create dashboards, and explore your data in the Graph Browser.

This section includes an example of how to use complex normalized databases with Siren Platform.

Installing Siren Platform

Prerequisites

The minimum hardware requirements are:

  • x64 CPU with four processing units (cores)

  • 16GB RAM

  • 10GB free SSD disk space

We support the following operating systems:

  • Microsoft Windows (64-bit)

  • Linux 2.6.32 or later (x86-64)

We support the following browsers:

  • Google Chrome (recommended)

  • Mozilla Firefox

  • Microsoft IE 11

  • Microsoft Edge

You must install one of these Java versions:

  • Oracle JDK 8

  • OpenJDK 8

Ensure that the JAVA_HOME environment variable is set to the path of your Java installation. To set the JAVA_HOME environment variable, follow the instructions for your operating system.

If you want to connect an external datasource by using a JDBC connector, see JDBC Driver Installation and Compatibility.
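You may want to check the installed Java version from a script. The following Python sketch parses the output of java -version, which Java prints to standard error, and reports whether it is Java 8. It assumes the two version formats used by Oracle JDK and OpenJDK: "1.8.0_xxx" for Java 8 and "11.0.x" for later releases. The function names are illustrative only.

```python
import re

def java_major_version(version_output: str) -> int:
    """Return the major version from `java -version` output.

    Java 8 reports "1.8.0_xxx" (legacy scheme); Java 9+ reports e.g. "11.0.2".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    # In the legacy scheme the real major version is the second number.
    if first == "1" and second is not None:
        return int(second)
    return int(first)

def is_supported(version_output: str) -> bool:
    """True if the output describes Java 8, the version listed above."""
    return java_major_version(version_output) == 8
```

For example, run "%JAVA_HOME%\bin\java" -version (Windows) or "$JAVA_HOME/bin/java" -version (Linux), then pass the text it prints to is_supported.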

For information about compatibility between versions of Siren Investigate, Siren Federate, and Elasticsearch, see the version compatibility sections.

Download the Siren platform

  1. Download Siren Platform from https://siren.io/downloads/.

  2. Complete the validation form, accept the license, and click Proceed.

Install Elasticsearch as a Windows service

Installing Elasticsearch and Siren Investigate as services is required only if you want them to start automatically when the computer starts.

  1. Copy the elasticsearch folder and its contents from the ZIP archive you downloaded to your Program Files folder.

  2. Edit the elasticsearch.yml file in the %ProgramFiles%\elasticsearch\config folder.

  3. In the Path section, enter the data and log paths, for example:

    path.data: C:\Program Files\elasticsearch\data
    path.logs: C:\Program Files\elasticsearch\logs
  4. In the Network section, change the network.host to 127.0.0.1 and save the file.

  5. From the command prompt, enter:

    cd %ProgramFiles%\elasticsearch
    bin\elasticsearch-service install
  6. Open the Services management console (you can enter services.msc at the command prompt).

  7. Locate the Elasticsearch service and change Startup Type to Automatic.

  8. Right-click the service and select Start.
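You can double-check your edits before you start the service. The following Python sketch reads the flat key: value lines of elasticsearch.yml and reports missing path settings, or a network.host other than 127.0.0.1. It is not a YAML parser. It assumes that each setting is on a single line, as in the examples in the steps, and the function names are hypothetical.

```python
REQUIRED_PATHS = ("path.data", "path.logs")

def read_flat_settings(text: str) -> dict:
    """Collect single-line `key: value` settings, skipping blanks and comments.

    Splits on the first colon only, so Windows paths such as C:\\... survive.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip().strip("'\"")
    return settings

def check_settings(settings: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"{key} is not set" for key in REQUIRED_PATHS if not settings.get(key)]
    if settings.get("network.host") != "127.0.0.1":
        problems.append("network.host is not 127.0.0.1")
    return problems
```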

Install Elasticsearch as a Linux service

  1. Create a system user for the service, for example adduser --system elasticsearch.

  2. Copy the elasticsearch folder and its contents from the ZIP archive you downloaded to the /opt folder and then set the permissions for the system user, for example sudo chown -R elasticsearch /opt/elasticsearch.

  3. Edit the elasticsearch.yml file in the /opt/elasticsearch/config folder.

  4. In the Path section, enter the data and log paths, for example:

    path.data: /opt/elasticsearch/data
    path.logs: /opt/elasticsearch/logs
  5. In the Network section, change the network.host to 127.0.0.1 and save the file.

  6. As root, enter the following at the command prompt:

    cat <<EOF >/opt/elasticsearch.environment
    ES_JAVA_OPTS="-Xms4g -Xmx4g"
    EOF
    
    cat <<EOF >/etc/systemd/system/elasticsearch.service
    [Unit]
    Description=Elasticsearch (Siren)
    After=network.target auditd.service
    
    [Service]
    WorkingDirectory=/opt/elasticsearch
    EnvironmentFile=-/opt/elasticsearch.environment
    ExecStart=/opt/elasticsearch/bin/elasticsearch
    KillMode=process
    Restart=on-failure
    RestartPreventExitStatus=255
    Type=simple
    User=elasticsearch
    LimitMEMLOCK=infinity
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    Alias=elasticsearch.service
    EOF
    
    echo "vm.max_map_count = 262144" > /etc/sysctl.d/99-elasticsearch.conf
    sysctl -p /etc/sysctl.d/99-elasticsearch.conf
    ln -s ../elasticsearch.service /etc/systemd/system/multi-user.target.wants/
    systemctl daemon-reload
    systemctl start elasticsearch
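Elasticsearch recommends setting -Xms and -Xmx to the same value and keeping the heap at no more than half of the physical RAM. The following Python sketch checks an ES_JAVA_OPTS value, such as the one written to /opt/elasticsearch.environment in the commands above, against those two rules. The parsing covers only the -Xms and -Xmx forms with an optional k, m, or g suffix, and the function names are illustrative.

```python
import re

_HEAP_FLAG = re.compile(r"-Xm([sx])(\d+)([kKmMgG]?)")
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_heap(opts: str) -> dict:
    """Return {'s': bytes, 'x': bytes} for the -Xms/-Xmx flags found in opts."""
    sizes = {}
    for kind, number, unit in _HEAP_FLAG.findall(opts):
        sizes[kind] = int(number) * _UNIT_BYTES[unit.lower()]
    return sizes

def check_heap(opts: str, ram_bytes: int) -> list:
    """Return the rule violations for an ES_JAVA_OPTS string."""
    sizes = parse_heap(opts)
    if "s" not in sizes or "x" not in sizes:
        return ["set both -Xms and -Xmx"]
    problems = []
    if sizes["s"] != sizes["x"]:
        problems.append("-Xms and -Xmx differ")
    if sizes["x"] > ram_bytes // 2:
        problems.append("heap is larger than half of physical RAM")
    return problems
```

With the 16GB minimum listed in the prerequisites, check_heap("-Xms4g -Xmx4g", 16 * 1024**3) reports no problems.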

Install Siren Investigate as a Windows service

Installing Siren Investigate as a service with Windows requires use of the third-party tool NSSM (https://nssm.cc/download). Because it configures services, anti-virus software may identify it as "riskware". However, an SHA checksum and source code are provided. You can verify the checksum using the Microsoft File Checksum Integrity Verifier (https://www.microsoft.com/en-us/download/details.aspx?id=11533).
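As an alternative to the File Checksum Integrity Verifier, the following Python sketch computes a file checksum with hashlib. It assumes that you copy the expected value from the NSSM download page and pass the algorithm that the page uses, for example "sha1" or "sha256". The function names are hypothetical.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected_hex, algorithm="sha256"):
    """Compare against a published checksum, ignoring case and surrounding spaces."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, verify_download(r"C:\Downloads\nssm.zip", "<published checksum>", "sha1") returns True only if the file matches. The file name is a placeholder.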

  1. Copy the siren-investigate folder and its contents from the Siren platform ZIP archive you downloaded to your %ProgramFiles% folder.

  2. Copy the nssm.exe program from the win64 folder in the NSSM ZIP archive you downloaded to the %ProgramFiles%\siren-investigate\bin folder.

  3. Set the INVESTIGATE_HOME environment variable to %ProgramFiles%\siren-investigate.

  4. From the command prompt, enter "%ProgramFiles%\siren-investigate\bin\nssm" install "Siren Investigate". The quotes are required because the Program Files path contains a space.

  5. In the Application Path box, enter %ProgramFiles%\siren-investigate\bin\investigate.bat.

  6. In the Startup directory box, enter %ProgramFiles%\siren-investigate.

  7. On the Details tab, in the Display name box, enter Siren Investigate.

  8. On the Dependencies tab, enter elasticsearch-service-x64 in the box.

  9. Click Install service.

  10. Open the Services management console (you can enter services.msc at the command prompt).

  11. Locate the Siren Investigate service, right-click it and select Start.

Install Siren Investigate as a Linux service

  1. Create a system user for the service, for example adduser --system siren.

  2. Copy the siren-investigate folder and its contents from the ZIP archive you downloaded to the /opt folder and then set the permissions for the system user, for example sudo chown -R siren /opt/siren-investigate.

  3. As root, enter the following at the command prompt:

    cat <<EOF >/etc/systemd/system/siren.service
    [Unit]
    Description=Siren Investigate
    After=network.target auditd.service
    
    [Service]
    WorkingDirectory=/opt/siren-investigate
    EnvironmentFile=-/opt/siren.environment
    ExecStart=/opt/siren-investigate/bin/investigate
    KillMode=process
    Restart=on-failure
    RestartPreventExitStatus=255
    Type=simple
    User=siren
    
    [Install]
    WantedBy=multi-user.target
    Alias=siren.service
    EOF
    
    ln -s ../siren.service /etc/systemd/system/multi-user.target.wants/
    systemctl daemon-reload
    systemctl start siren

Test your connection

In your browser, navigate to http://localhost:5606/status. If the Elasticsearch and Siren Investigate services are running, the sign-in screen is displayed.
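If you script the installation, you may want to wait until Siren Investigate answers before you open the browser. The following Python sketch polls the status URL with urllib. It treats any response below 500 as "up", because a 401 from an authentication layer still means that the service is running. The retry count, the delay, and that threshold are assumptions. The fetch and sleep parameters exist only so that the loop can be tested without a server.

```python
import time
import urllib.error
import urllib.request

def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code
    except (urllib.error.URLError, OSError):  # refused, DNS failure, timeout
        return None

def wait_until_up(url, attempts=30, delay=2.0, fetch=http_status, sleep=time.sleep):
    """Poll url until it answers with a status below 500; None if it never does."""
    for attempt in range(attempts):
        status = fetch(url)
        if status is not None and status < 500:
            return status
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

For example, wait_until_up("http://localhost:5606/status") returns the status code. It returns None if no usable response arrives within the configured attempts.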

Next steps

Import data either by using Logstash, by connecting to JDBC datasources, or by uploading Excel or CSV files.

Connecting to data from an external JDBC datasource

There are two ways to use data that is in a remote JDBC datasource:

  • Without importing it, by using virtual indexes, which map directly to the remote backend.

  • By reflecting it, that is, by having Siren copy the data locally.

Prerequisites

Ensure that you have completed the installation as described in Installing Siren Platform.

Check the list of supported databases in JDBC Driver Installation and Compatibility.

The schema that you want to connect to must be the default schema of the connection user.

Configuring the datasource

  1. To enable JDBC on a node where the Siren Federate plugin is installed, add the following setting to the elasticsearch.yml file:

    node.attr.connector.jdbc: true
  2. Create a directory named jdbc-drivers inside the configuration directory of the node, for example elasticsearch/config or /etc/elasticsearch.

  3. Copy the JDBC driver to the jdbc-drivers directory.

  4. Restart the Elasticsearch service.
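You can script the file-system part of these steps. The following Python sketch appends node.attr.connector.jdbc: true to the contents of elasticsearch.yml if the setting is missing. It also copies a driver JAR into a jdbc-drivers directory under the node's configuration directory. The paths and function names are hypothetical, and you still need to restart the Elasticsearch service afterwards.

```python
import shutil
from pathlib import Path

JDBC_SETTING = "node.attr.connector.jdbc"

def ensure_jdbc_setting(yml_text: str) -> str:
    """Return yml_text with the JDBC attribute appended unless a line already sets it."""
    for line in yml_text.splitlines():
        key, _, _ = line.partition(":")
        if key.strip() == JDBC_SETTING:  # commented-out lines start with '#' and never match
            return yml_text
    separator = "" if not yml_text or yml_text.endswith("\n") else "\n"
    return yml_text + separator + JDBC_SETTING + ": true\n"

def install_jdbc_driver(config_dir, driver_jar) -> Path:
    """Create <config_dir>/jdbc-drivers if needed and copy the driver JAR into it."""
    target_dir = Path(config_dir) / "jdbc-drivers"
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(driver_jar).name
    shutil.copy2(driver_jar, destination)
    return destination
```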

Connect your database to the Siren platform

  1. In Siren Investigate, navigate to Management > Datasources.

  2. Select JDBC from the Type box.

  3. Select the Database Type.

  4. Enter a display Name for the datasource in Siren Investigate.

  5. Enter the database Username and Password.

  6. Click Test connection. If the connection is successful, a dialog is displayed.

  7. Click No, will do later, then click SAVE.

Create a virtual index

  1. In Siren Investigate, navigate to Management > Virtual Indices.

  2. Select the Datasource name.

  3. Select the Resource name from the Datasource browser.

  4. Enter a Virtual index name. The name must be a valid Elasticsearch index name and must be lowercase.

  5. (Optional) Enter a Primary key. A primary key is required if you want to see individual records, for example on the graph or in the record table.

  6. Click SAVE. A dialog is displayed.

  7. Click Yes, take me there (see Creating an initial data model). Alternatively, click No, will do later, then click SAVE.
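Virtual index names follow the same rules as Elasticsearch index names. The following Python sketch checks a candidate name against the rules listed in the Elasticsearch create index documentation for recent versions:

  • Lowercase only.

  • None of the characters \ / * ? " < > | , # : or a space.

  • No leading -, _ or +.

  • Not . or ..

  • At most 255 bytes.

Treat it as a pre-check only, because the exact rules can vary between Elasticsearch versions.

```python
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')

def index_name_problems(name: str) -> list:
    """Return the reasons name is not a valid index name; empty if it looks valid."""
    if not name:
        return ["name is empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(repr(c) for c in bad))
    if name[0] in "-_+":
        problems.append("cannot start with '-', '_' or '+'")
    if name in (".", ".."):
        problems.append("cannot be '.' or '..'")
    if len(name.encode("utf-8")) > 255:  # the limit is in bytes, not characters
        problems.append("longer than 255 bytes")
    return problems
```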

Importing data by using Logstash

The following section provides an example of how to load data sets into Siren Platform by using Logstash. Logstash is typically used to stream live data, such as logs, to the Elasticsearch cluster.

In this example, however, we use Logstash for a one-off load of a CSV file and a JSON file. Adapt this example for use with your own data set.

The data sets used in the example contain millions of records. If you use these data sets, loading takes some time to complete.

While this example uses Logstash to load the CSV file, you can also load CSV files by using the CSV upload UI (the Data Reflection application in the sidebar).

Prerequisites

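The example uses publicly available data from Companies House: the basic company data snapshot (BasicCompanyDataAsOneFile-date.csv) and the persons with significant control snapshot (persons-with-significant-control-snapshot-date.txt). If you want to try it yourself, download both snapshots from the Companies House website.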

Extract the CSV and TXT files. Edit the example scripts to match the path and file names.

Create a configuration file for the company data

Create a plain text file with the following content:

input {
  file {
    path => "<location of BasicCompanyDataAsOneFile-date.csv>"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9220"]
    index => "company"
  }
}

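Edit the path to match the location of the CSV file, and save the file as logstash_csv.conf in the same folder as the data set. The file input requires an absolute path. On Windows, use forward slashes in the path, for example C:/data/BasicCompanyDataAsOneFile-date.csv.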
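To preview which field names the company index receives, the following Python sketch mimics the csv filter settings with Python's csv module. The first row supplies the column names (autodetect_column_names). Values beyond the header get generated names such as column3 (autogenerate_column_names). This approximates the Logstash behavior but is not its implementation.

```python
import csv
import io

def csv_events(text: str):
    """Yield one dict per data row, using the first row as column names."""
    reader = csv.reader(io.StringIO(text), delimiter=",")
    header = next(reader)
    for row in reader:
        event = {}
        for position, value in enumerate(row):
            # Columns without a header get a generated, 1-based name.
            name = header[position] if position < len(header) else f"column{position + 1}"
            event[name] = value
        yield event
```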

Create a configuration file for the person with significant control data

Create a plain text file with the following content:

input {
  file {
    type => "json"
    path => "<location of persons-with-significant-control-snapshot-date.txt>"
    start_position => "beginning"
  }
}
filter {
  json {
    source => "message"
  }
  mutate {
    uppercase => [ "[data][name]" ]
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9220"]
    index => "persons-control"
  }
}

Edit the path to match the location of the TXT file, and save the file as logstash_json.conf in the same folder as the data set.
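The mutate filter upper-cases data.name. This is presumably so that names can be matched exactly against the upper-case company names in the Companies House CSV when you define relations on .keyword fields. The following Python sketch applies the same transformation to a file with one JSON document per line. It assumes that this is the layout of the snapshot file.

```python
import json

def psc_events(lines):
    """Parse one JSON document per line and upper-case data.name when present."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        event = json.loads(line)
        data = event.get("data")
        if isinstance(data, dict) and isinstance(data.get("name"), str):
            data["name"] = data["name"].upper()
        yield event
```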

Load the data

From the command prompt, navigate to the logstash/bin folder and run Logstash with the configuration files you created earlier. For example:

logstash -f C:\data\logstash_csv.conf
logstash -f C:\data\logstash_json.conf

You can speed up the import process by installing a second instance of Logstash and running the imports concurrently.
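Instead of installing a second copy, you can usually start a second Logstash process from the same installation. Each process must use its own data directory (set with --path.data), because only one Logstash instance at a time can use a data directory. The following Python sketch builds one command per configuration file and starts the processes side by side.

By default, the file input watches files in tail mode, so the processes keep running after they reach the end of the files. For this reason, the sketch returns the process handles instead of waiting for them. Stop the processes (for example with terminate()) once the document counts in Elasticsearch stop growing. The paths and function names are hypothetical.

```python
import subprocess
from pathlib import Path

def build_commands(logstash_bin, configs, data_root):
    """One command per config file, each with a separate --path.data directory."""
    commands = []
    for config in configs:
        data_dir = Path(data_root) / Path(config).stem  # e.g. <data_root>/logstash_csv
        commands.append([str(logstash_bin), "-f", str(config), "--path.data", str(data_dir)])
    return commands

def run_concurrently(commands, popen=subprocess.Popen):
    """Start every command without waiting; the caller decides when to stop them."""
    return [popen(command) for command in commands]
```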

Next steps

  1. (Optional) Connect an external datasource with Siren Federate.

  2. Create a data model (ontology).

Creating an initial data model

You can create a data model, also known as an ontology, by defining relations between indexes. This effectively treats indexes as classes and records as entities.

  1. In Siren Investigate, navigate to Management > Data Model.

  2. Click Create Index Pattern Search.

  3. Enter the index name in the Index pattern id box. This is either the name of an existing index on Elasticsearch or the name that you have defined for the virtual index that connects an external table.

  4. Click Save.

Create a relationship

Relationships are defined from a class to other classes. However, it is not possible to define a relationship between two entity identifiers.

A relationship is defined as a join operation between two indexes with:

  • The field of the local index to join on.

  • The class (index pattern or entity) to connect to.

  • (If the class is an index pattern) the field of the index to join with.

  • The label of the relation.
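Conceptually, a relation pairs each record of the local index with the records of the target index whose join field holds exactly the same value. This is why the examples use .keyword fields. The following in-memory Python sketch illustrates those semantics with a hash join on dotted field paths. Siren Federate executes joins inside the Elasticsearch cluster, so this is not how it is implemented.

```python
from collections import defaultdict

def get_path(record, dotted):
    """Follow a dotted path such as "data.name" through nested dicts."""
    value = record
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def join(left, left_field, right, right_field):
    """Return (left_record, right_record) pairs whose join values are equal.

    Comparison is exact and case-sensitive, like a keyword field.
    Records without a value on either side never match.
    """
    by_value = defaultdict(list)
    for record in right:
        value = get_path(record, right_field)
        if value is not None:
            by_value[value].append(record)
    pairs = []
    for record in left:
        value = get_path(record, left_field)
        if value is not None:
            pairs.extend((record, match) for match in by_value.get(value, []))
    return pairs
```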

The examples given here use the data sets loaded in Importing data by using Logstash.

  1. Click Management.

  2. Click Data Model.

  3. Click an Index Pattern, for example company.

  4. In the Relations tab, click Add relation.

  5. Select a Field in the Source Entity, for example CompanyName.keyword.

  6. Select a Target Entity. This can be an index pattern or an entity identifier, for example persons-control.

  7. If you selected an index pattern as the Target Entity, select a Field, for example data.name.keyword.

  8. Enter a short description of the relationship in the Labels boxes. For example, CompanyName.keyword in the company index pattern "is owned by" data.name.keyword in the persons-control index pattern and data.name.keyword "owns" CompanyName.keyword.

  9. Click Save.

By default, the join type is automatic. You can click Edit to manually set the Join type and Relation join task timeout.

You can click the Graph View tab to show a graphical representation of the relationship with the currently selected class highlighted.

Create an entity identifier

Entity identifiers enable you to navigate between two or more indexes without requiring a direct relationship between them. They also act as a central node element when doing graph analysis.

For example, you may have many indexes with IPs in multiple roles (source, destination) and want to join them with other roles and indexes.
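You can think of an entity identifier as a shared value space. Records from different indexes, and from different fields within one index, are linked whenever they hold the same value. The following Python sketch groups records by such a value. The firewall and dns indexes and their IP fields are hypothetical examples of the source and destination scenario.

```python
from collections import defaultdict

def entity_links(indexes, entity_fields):
    """Group records from several indexes by the value of an entity identifier.

    indexes:       {index_name: [record, ...]}
    entity_fields: {index_name: [field, ...]} fields that hold the entity value
    Returns {value: {(index_name, record_position, field), ...}}.
    """
    links = defaultdict(set)
    for index_name, fields in entity_fields.items():
        for position, record in enumerate(indexes.get(index_name, [])):
            for field in fields:
                value = record.get(field)
                if value is not None:
                    links[value].add((index_name, position, field))
    return dict(links)
```

Both records that mention 10.0.0.9 end up under the same key, even though no relation was defined between the two indexes.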

  1. Click Management.

  2. Click Data Model.

  3. Click Create Entity Identifier.

  4. Enter an Entity identifier name.

  5. Enter a Short Description.

  6. Enter a Long Description.

  7. Select an Icon.

  8. Select a Color.

  9. Click Save.

For more information about entity identifiers, see Creating an index pattern search.

Connect an entity identifier to the data model

This example uses the Companies House data set.

  1. Create an entity identifier with the ID PostCode as described in the previous section.

  2. From the Relations tab, click Add relation.

  3. Using the boxes, set the relationship so that the source entity is owned by the target entity and the target entity owns the source entity.

  4. Select company from the index box.

  5. Select RegAddress.PostCode from the Field box.

Next steps

  1. Create dashboards.

  2. Add a Graph Browser visualization to a dashboard.

Creating dashboards

A dashboard displays a set of saved visualizations in a grid layout that you can customize. It requires at least one visualization (for more information, see Visualizations). You can save a dashboard to share or view at a later time.

Click Dashboard to view the first dashboard in the list. You can drag and drop dashboards to change the order of the list.

Generate a dashboard

  1. Click Discover.

  2. Click New.

  3. Click Autoselect Most Relevant.

  4. Click Generate Dashboard.

  5. Click Create.

  6. Click OK.

Create a dashboard

  1. In Siren Investigate, click Dashboard.

  2. Click Create new dashboard.

  3. Enter a unique name for the dashboard in the box.

  4. (Optional) Select a saved search. Typically, dashboards without a saved search are used only for cross-index summary pages.

  5. Click Create.

Add visualizations to a dashboard

  1. Click Dashboard to display the dashboards list.

  2. Click the dashboard in the list and then click Edit.

  3. Click Add to display the available visualizations. You can filter the list of visualizations by typing a filter string into the Visualization Filter box.

  4. In the list, click a visualization to add it to your dashboard. The visualization you select appears in a container on your dashboard.

  5. (Optional) Click Options, then select or clear the Use dark theme and Hide borders checkboxes to configure how the dashboard is displayed.

Configure a container

The visualizations in your dashboard are stored in containers that you can resize and arrange on the dashboard.

To move a container around the dashboard, drag and drop the container’s header. Other containers will move as required to make room.

To resize a container, move the mouse pointer over the lower right corner of the container until the cursor changes to the resize pointer, then click and drag to the required size.

To remove a container, click Remove. Removing a container from a dashboard does not remove the saved visualization in that container.

Save a dashboard

  1. Click Save.

  2. (Optional) Select Store time with dashboard to change the time filter to the currently selected time each time the dashboard is loaded.

  3. (Optional) If you did not add a saved search when you created the dashboard, you can do so now.

  4. Click Save.

Dashboards can be saved with specific filters, custom queries and specific time ranges. You can click Reset to reset these properties to their saved state for all dashboards.

Share a dashboard

You can share dashboards with other users by sending a link or by embedding them into HTML pages.

Ensure that your Siren Investigate installation is properly secured when sharing a dashboard on a public-facing server. To view shared dashboards, users must be able to access Siren Investigate; keep this in mind if your Siren Investigate instance is protected by an authentication proxy.

  1. Click Share to display the Sharing panel.

  2. Click Copy to copy the native URL or embedded HTML to the clipboard. The Share Snapshot section contains shortened versions of the URLs in the Share saved dashboard section.

Using the Graph Browser

Graph Browser is a tile that you can add to dashboards. The Graph Browser displays Elasticsearch documents as nodes, and Siren Investigate relations as links of a graph.

Before you begin, we recommend that you watch our Graph Browser training video.

Create a graph dashboard

  1. In the Dashboards sidebar, click the Create new dashboard icon.

  2. Click Add, then click Graph Browser and drag the lower right corner of the tile to fill the view.

  3. Click Add all available lens and contextual scripts.

  4. Click the Play icon at the top left of the screen.

  5. Click Save, and name it General Graph Browser. Then click Save and Add to Dashboard.

Save your changes

When you have finished with the Graph Browser you can click Save to save the dashboard for future use.

You must save the dashboard before you use it.

Filter a datasource

Before you enter data into the Graph Browser, you should filter the datasource that you will use to produce a manageable number of results.

  1. Open an existing dashboard.

  2. Click Add a filter.

  3. Select a field to Filter by and then select an option from the list:

    • is

    • is not

    • is one of

    • is not one of

    • exists

    • does not exist

  4. Enter a Value to match, or click Edit Query DSL to use an Elasticsearch query, then click Save.
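Each filter option corresponds to an Elasticsearch query. The following Python sketch shows approximately the Query DSL that Kibana-style filter bars generate for each option. You can use it as a starting point in Edit Query DSL. Siren Investigate is based on Kibana, but the exact queries it sends may differ, so treat this mapping as an assumption.

```python
def _phrase(field, value):
    return {"match_phrase": {field: value}}

def _any_of(field, values):
    # At least one of the phrases must match.
    return {"bool": {"should": [_phrase(field, v) for v in values],
                     "minimum_should_match": 1}}

def _negate(query):
    return {"bool": {"must_not": [query]}}

def filter_to_dsl(field, operator, value=None):
    """Translate a filter-bar option into an approximate Query DSL clause."""
    if operator == "is":
        return _phrase(field, value)
    if operator == "is not":
        return _negate(_phrase(field, value))
    if operator == "is one of":
        return _any_of(field, value)
    if operator == "is not one of":
        return _negate(_any_of(field, value))
    if operator == "exists":
        return {"exists": {"field": field}}
    if operator == "does not exist":
        return _negate({"exists": {"field": field}})
    raise ValueError(f"unknown operator: {operator}")
```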

Add data from another dashboard

  1. Open the Graph Dashboard.

  2. In the Graph Browser tile, click +Add and select a dashboard from the Add from another dashboard list. You can repeat this step to add data from other dashboards.

Navigate the graph

The number of connections to each node is shown. You can double-click a node to drill down into the data.

To move in or out of the graph, use the mouse scroll wheel or the slider at the top left of the Graph Browser window.

Click the icon above the slider to toggle between select and panning mode. In select mode you can select nodes by dragging. In panning mode, clicking and dragging enables you to move the nodes around in the window. You can also pan by using the direction icons above the slider.

If you open a large node, you are prompted to confirm whether you want to open all the child nodes or only a selection of them.

You can click standard or hierarchy to arrange the nodes.

You can apply filters from existing dashboards by clicking the Expand box and selecting the required dashboards.

To expand a node or set of nodes, select the required nodes and click Expand. You can also select one or more nodes, right-click and select Expand by relation from the context menu.

You can click Toggle map mode or Toggle timeline mode to change how the data is displayed.

You can click Toggle relation direction to change which relationships are displayed.

You can click Toggle node highlight to toggle dimming of nodes that are not selected.

Select nodes

Right-click anywhere on the graph to display the context menu. From here you can choose:

  • Select - By Edge Count

  • Replace Investment with edge (works only with Siren Investigate Demo data).

  • Shortest Path

  • Select - All

  • Expand by top comention

  • Select - Invert

  • Select - Extend

  • Select - By Type

  • Show nodes count by type

  • Select - By Entity

  • Expand by relation
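Two of these actions have well-known graph equivalents. The following Python sketch finds an unweighted shortest path by breadth-first search and selects nodes by their number of edges. It only illustrates the ideas: the Graph Browser's own algorithms and options are not described in this section, and the node names are invented.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph; returns a node list or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

def select_by_edge_count(edges, minimum):
    """Return the nodes that have at least `minimum` edges."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return {node for node, count in degree.items() if count >= minimum}
```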

You can press Del to remove selected nodes from the graph.

You can click Crop to remove all but the selected nodes from the graph.

You can click the Undo or Redo icons to step backward or forward through your changes.

Use lenses and selection

Click Toggle Sidebar to display the Lenses and Selection tabs.

The Lenses tab enables you to apply visual filters to the data displayed in the graph.

  1. From the Lenses tab, select Add a lens > Advanced > Advanced lens.

  2. Enter a unique Lens name.

  3. Select the Active checkbox to enable the lens. When the Live update checkbox is selected, changes you make to the lens are shown immediately in the Graph Browser. If the checkbox is cleared, you can click the Apply lens parameters icon to update the Graph Browser.

  4. In the Parameters section, select an Entity Type.

  5. Select a match condition:

    • Always.

    • Only for the selected elements.

    • Only if the condition is true.

      If you selected Only if the condition is true, enter a condition in the box.

  6. Select the property to set from the list:

    • Color (string)

    • Node font icon (string)

    • Node glyphs (array of glyphs)

    • Hidden (Boolean)

    • Label (string)

    • Location (string)

    • Node image (string)

      Node icons that link to web images are not always shown properly due to security restrictions. You may need to configure the Image Proxy feature to display them.
    • Size (number)

    • Time (string)

    • Tooltip (string)

  7. Enter the property in the box, then click OK. For example, using the Companies House data set, select Color, then select SICCode.SicText_1.
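The following Python sketch is a mental model of the Always and Only if the condition is true match conditions combined with the Color property. It assigns one color per distinct value of SICCode.SicText_1 to the nodes that pass a condition. Lens conditions and scripts in the Graph Browser use their own syntax, so this is an illustration, not lens code. The palette and the flat node structure are assumptions.

```python
from itertools import cycle

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

def color_lens(nodes, field, condition=lambda record: True, palette=PALETTE):
    """Return {node_id: color} for nodes whose record passes condition.

    The default condition behaves like "Always"; nodes that fail the
    condition are left out, so they keep their existing color.
    """
    color_for_value = {}
    colors = cycle(palette)  # reuse colors if there are more values than entries
    result = {}
    for node_id, record in nodes.items():
        if not condition(record):
            continue
        value = record.get(field)
        if value not in color_for_value:
            color_for_value[value] = next(colors)
        result[node_id] = color_for_value[value]
    return result
```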

The Selection tab displays a list of the currently selected nodes.

Enter a string in the search box to show results from all the matching records in the current selection.

The first column on the left enables you to select or deselect individual nodes. You can click the column header to select or deselect all nodes.

For each field, you can enter a string to match from the selection in the box under the column heading.

You can click Reset column and global filters to reset all filters.

What is the most useful form of data to load in Siren? (Using complex normalized databases)

Data is often highly normalized, especially in relational databases. In these situations, creating semi-denormalized views usually makes the data much more valuable to the user. Typically, the right level of abstraction is the entity level, that is, views that reflect useful representations of the entities in the domain. Our demo distribution was created in this way. The user sees only four indexes, which represent the entities that make sense in the investment domain (articles, companies, investments, and investors). The original data, however, is much more normalized, as shown in the following structure.
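The following Python sketch uses SQLite to show the idea on a small, hypothetical schema. A company table is normalized into separate category and office tables, and a companies view presents one row per company. You might then expose a view like this through a JDBC datasource as a virtual index, assuming that your database and driver list views as resources.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       category_id INTEGER REFERENCES category(id));
CREATE TABLE office   (company_id INTEGER REFERENCES company(id), city TEXT NOT NULL);

-- Entity-level, semi-denormalized representation: one row per company.
CREATE VIEW companies AS
SELECT c.id                      AS id,
       c.name                    AS name,
       cat.label                 AS category,
       GROUP_CONCAT(o.city, ';') AS cities
FROM company c
LEFT JOIN category cat ON cat.id = c.category_id
LEFT JOIN office o ON o.company_id = c.id
GROUP BY c.id, c.name, cat.label;
"""

def build_demo(conn):
    """Create the hypothetical schema and a few sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "software"), (2, "biotech")])
    conn.executemany("INSERT INTO company VALUES (?, ?, ?)", [(1, "Acme", 1), (2, "Helix", 2)])
    conn.executemany("INSERT INTO office VALUES (?, ?)", [(1, "London"), (1, "Paris"), (2, "Boston")])
```

For example, after build_demo(sqlite3.connect(":memory:")), SELECT * FROM companies returns two rows, and Acme's cities column lists both of its offices.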