Siren Platform User Guide

Importing data by using Logstash

The following section provides an example of how to load data sets into Siren Platform by using Logstash. You should adapt this example for use with your own data set.

Note

The data sets used in the example contain millions of records. If you use these data sets, loading may take a long time to complete.

Prerequisites

The example uses publicly available data from Companies House. To try it for yourself, download the basic company data file and the persons with significant control snapshot.

Extract the CSV and TXT files, then edit the example configuration files to match their paths and file names.

Create a configuration file for the company data

Create a plain text file with the following content:

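input {
  file {
    path => "<location of BasicCompanyDataAsOneFile-date.csv>"
    # Read the file from the start the first time it is seen
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    # Take column names from the first line (requires a single pipeline worker)
    autodetect_column_names => true
    # Name any columns without a header column1, column2, and so on
    autogenerate_column_names => true
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9220"]
    index => "company"
  }
}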

Edit the path to match the location of the CSV file, and then save the file as logstash_csv.conf in the same folder as the data set.
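
Before starting a long import, it can help to preview how the header row maps to field names. The following Python sketch is not part of Siren Platform or Logstash. It only approximates what the csv filter does with autodetect_column_names by treating the first row as the column names. The function name preview_csv and the sample columns are illustrative assumptions.

```python
import csv
import io
from itertools import islice


def preview_csv(lines, limit=3):
    """Return up to `limit` records as dicts keyed by the header row.

    Approximates autodetect_column_names: the first row supplies the
    field names. Header names are kept exactly as written, so stray
    spaces in the header show up in the keys.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        # Empty input: there is no header and therefore no records.
        return []
    return [dict(zip(header, row)) for row in islice(reader, limit)]


# Hypothetical two-column sample; the real file has many more columns.
sample = io.StringIO("CompanyName,CompanyNumber\nEXAMPLE LTD,01234567\n")
print(preview_csv(sample))
```

To preview a real file, pass an open file object instead of the sample, for example open(path, newline="", encoding="utf-8"), where path is the location of your CSV file.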

Create a configuration file for the person with significant control data

Create a plain text file with the following content:

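input {
  file {
    type => "json"
    path => "<location of persons-with-significant-control-snapshot-date.txt>"
    # Read the file from the start the first time it is seen
    start_position => "beginning"
  }
}
filter {
  # Parse each line of the file as a JSON document
  json {
    source => "message"
  }
  # Convert the nested name field to upper case
  mutate {
    uppercase => [ "[data][name]" ]
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9220"]
    index => "persons-control"
  }
}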

Edit the path to match the location of the TXT file, and then save the file as logstash_json.conf in the same folder as the data set.
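
To see what the filter section is intended to do to a single record, the following Python sketch mimics the two steps: parsing a line as JSON, then converting the name field nested under data to upper case. It is an approximation for illustration only, not how Logstash is implemented. The function name transform_line and the sample record, including its company_number field, are assumptions.

```python
import json


def transform_line(line):
    """Parse one JSON line and uppercase data.name when it is a string."""
    event = json.loads(line)
    data = event.get("data")
    # Leave the record unchanged if data.name is missing or not a string.
    if isinstance(data, dict) and isinstance(data.get("name"), str):
        data["name"] = data["name"].upper()
    return event


# Hypothetical record; real snapshot lines carry many more fields.
line = '{"company_number": "01234567", "data": {"name": "Jane Example"}}'
print(transform_line(line))
```

Records without a data object or a name pass through unchanged, so the sketch can be run over a few lines of the snapshot file to check that the field is where the configuration expects it.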

Load the data

From the command prompt, navigate to the logstash/bin folder and run Logstash with the configuration files you created earlier. For example:

logstash -f C:\data\logstash_csv.conf
logstash -f C:\data\logstash_json.conf
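
While an import is running, one way to follow its progress is to ask Elasticsearch how many documents each index holds. The following Python sketch uses the Elasticsearch _count endpoint. It assumes that the cluster is reachable over plain HTTP at the address used in the configuration files and that no authentication is required; adjust it if your installation differs. The function names count_url and count_documents are illustrative.

```python
import json
import urllib.request


def count_url(index, host="127.0.0.1:9220"):
    """Build the URL of the Elasticsearch _count endpoint for an index."""
    return f"http://{host}/{index}/_count"


def count_documents(index, host="127.0.0.1:9220"):
    """Return the number of documents currently in the index.

    Assumes plain HTTP and no authentication.
    """
    with urllib.request.urlopen(count_url(index, host), timeout=10) as resp:
        return json.load(resp)["count"]
```

For example, calling count_documents("company") and count_documents("persons-control") a few minutes apart shows whether the counts are still increasing.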

Tip

You can speed up the import process by installing a second instance of Logstash and running the imports concurrently.
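
If you have two Logstash installations, the two imports could be started together from a small script. The following Python sketch is one possible approach, not a Siren-supplied tool. The function names are illustrative, and the executable paths shown in the comment are hypothetical and must be replaced with your own.

```python
import subprocess


def logstash_command(executable, config_path):
    """Build the argument list for one Logstash run."""
    return [executable, "-f", config_path]


def run_concurrently(jobs):
    """Start every (executable, config) pair, then wait for all of them.

    Returns the exit codes in the same order as `jobs`.
    """
    processes = [
        subprocess.Popen(logstash_command(exe, conf)) for exe, conf in jobs
    ]
    return [p.wait() for p in processes]


# Hypothetical installation paths, one per Logstash instance:
# run_concurrently([
#     (r"C:\logstash-1\bin\logstash.bat", r"C:\data\logstash_csv.conf"),
#     (r"C:\logstash-2\bin\logstash.bat", r"C:\data\logstash_json.conf"),
# ])
```

Because the file input keeps watching the file for new lines, each Logstash process continues to run after the data is loaded, and the script waits until you stop both processes.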

Next steps

  1. (Optional) Connect an external datasource with Siren Federate.
  2. Create a data model (ontology).