Importing data by using Logstash
This process is for advanced users. For a simpler approach, try importing data from a spreadsheet.
If you want to stream live data, such as logs, into the Elasticsearch cluster, then you can import data by using Logstash. You can also import CSV or JSON files in this way.
The following section contains an example of how to configure Logstash for importing data. You can adapt this example for use with your own data set.
The data sets used in the example contain millions of records. If you use these data sets, loading takes some time to complete.
Before you begin
- (Optional) To walk through this example before using your own data, download the following publicly available files:
  - Download company data as one CSV file from this Web page.
  - Download 'person of significant control' data as one JSON file from this Web page.
- Extract the .csv and .txt files.
- Open the example scripts and edit them to match the path and file names.
Creating configuration files
- Create a plain text file and enter the following content:
  input {
    file {
      path => "<location of BasicCompanyDataAsOneFile-date.csv>"
      start_position => "beginning"
    }
  }
  filter {
    csv {
      separator => ","
      autodetect_column_names => true
      autogenerate_column_names => true
    }
  }
  output {
    elasticsearch {
      hosts => ["127.0.0.1:9220"]
      index => "company"
    }
  }
- Edit the path to match the location of the .csv file, and save the file as logstash_csv.conf in the same path as the data set.
- Create another plain text file and enter the following content:
  input {
    file {
      type => "json"
      path => "<location of persons-with-significant-control-snapshot-date.txt>"
      start_position => "beginning"
    }
  }
  filter {
    # Parse each line of the file as one JSON document
    json {
      source => "message"
    }
    # Nested fields are referenced as [parent][child]
    mutate {
      uppercase => [ "[data][name]" ]
    }
  }
  output {
    elasticsearch {
      hosts => ["127.0.0.1:9220"]
      index => "persons-control"
    }
  }
- Edit the path to match the location of the .txt file, and save the file as logstash_json.conf in the same path as the data set.
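Before loading millions of records, it can help to check that the CSV filter parses your data as expected. The following is a minimal sketch of a trial configuration, not part of the original example: it assumes that you have copied a small number of rows into a separate sample file (the path placeholder is hypothetical) and that you are running on Windows. It prints each parsed event to the console instead of sending it to Elasticsearch. Because `start_position` only applies to files that Logstash has not read before, the sketch also sets `sincedb_path` so that the sample file is read again on each run.

```
input {
  file {
    # Hypothetical sample file: a small number of rows copied from the full data set
    path => "<location of a small sample .csv>"
    start_position => "beginning"
    # Assumption: Windows. "NUL" discards the stored read position so that the
    # file is read again on every run; on Linux or macOS, use "/dev/null".
    sincedb_path => "NUL"
  }
}
filter {
  csv {
    separator => ","
    autodetect_column_names => true
    autogenerate_column_names => true
  }
}
output {
  # Print each parsed event to the console instead of indexing it
  stdout { codec => rubydebug }
}
```

If the printed events show the column names and values that you expect, switch back to the `elasticsearch` output and the full data set.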
Loading the data
From a command prompt, navigate to the logstash/bin folder and run Logstash with the configuration files that you created.
For example, run the following commands:
logstash -f C:\data\logstash_csv.conf
logstash -f C:\data\logstash_json.conf
To speed up the import process, you can install a second instance of Logstash and run the imports concurrently.
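If a second Logstash installation is not practical, one possible alternative is to describe both imports in a single configuration file so that one Logstash process reads both files. The following is a sketch under stated assumptions rather than a tested replacement for the two files above: the `type => "csv"` label is introduced here for illustration, and the conditionals assume that neither data set contains its own top-level `type` field. The csv filter documentation also notes that `autodetect_column_names` expects Logstash to run with a single pipeline worker, so this approach might not be faster than two separate instances.

```
input {
  file {
    type => "csv"    # label added for this sketch
    path => "<location of BasicCompanyDataAsOneFile-date.csv>"
    start_position => "beginning"
  }
  file {
    type => "json"
    path => "<location of persons-with-significant-control-snapshot-date.txt>"
    start_position => "beginning"
  }
}
filter {
  if [type] == "csv" {
    csv {
      separator => ","
      autodetect_column_names => true
      autogenerate_column_names => true
    }
  } else if [type] == "json" {
    json { source => "message" }
    # Add the mutate step from logstash_json.conf here if you need it
  }
}
output {
  # Route each event to the index that matches its source file
  if [type] == "csv" {
    elasticsearch { hosts => ["127.0.0.1:9220"] index => "company" }
  } else if [type] == "json" {
    elasticsearch { hosts => ["127.0.0.1:9220"] index => "persons-control" }
  }
}
```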
Next steps
Add relations between the entities in your data.