Sample transform pipelines

During the import process, you can specify an additional transform pipeline.

A pipeline defines a series of processors that are executed in the order in which they are declared.

A pipeline consists of two main fields: a description and a list of processors. The pipeline is structured as follows:

{
  "description": "...",
  "processors": []
}

description: Contains a helpful description of what the pipeline does.

processors: Specifies a list of processors to be executed in order.
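To illustrate the "executed in order" behavior, the sketch below models a pipeline locally in Python. It is a conceptual approximation, not how Elasticsearch runs pipelines: documents are plain dicts, each processor is a plain function, and the processor names add_greeting and shout_greeting are hypothetical.

```python
from typing import Callable, Dict, List

# A document is modeled as a plain dict, similar to an ingest document's _source.
Document = Dict[str, object]
# A processor is modeled as a function that modifies a document in place.
Processor = Callable[[Document], None]


def run_pipeline(processors: List[Processor], doc: Document) -> Document:
    """Apply each processor to the document, strictly in declaration order."""
    for processor in processors:
        processor(doc)
    return doc


def add_greeting(doc: Document) -> None:
    # Hypothetical processor: creates a new field from an existing one.
    doc["greeting"] = "hello " + str(doc["name"])


def shout_greeting(doc: Document) -> None:
    # Hypothetical processor: reads the field created by add_greeting.
    doc["greeting"] = str(doc["greeting"]).upper()


# Mirrors the {"description": ..., "processors": [...]} structure.
pipeline = {
    "description": "Hypothetical two-step pipeline",
    "processors": [add_greeting, shout_greeting],
}

result = run_pipeline(pipeline["processors"], {"name": "widget"})
print(result)  # {'name': 'widget', 'greeting': 'HELLO WIDGET'}
```

Reversing the two processors would fail, because shout_greeting would try to read a field that does not exist yet. Order matters in the same way in a real pipeline.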

The following sections contain sample transform pipelines to help you get started.

Split fields

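Split the string in the parents field on the delimiter | into a list of substrings. If the parents field does not exist, the split fails and the on_failure handler sets the field to an empty string instead. Because the separator is a regular expression, the pipe character is escaped (\\| in JSON).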

{
  "description": "Split parents on | into a list of strings",
  "processors": [
    {
      "split": {
        "on_failure": [
          {
            "set": {
              "field": "parents",
              "value": ""
            }
          }
        ],
        "field": "parents",
        "separator": "\\|"
      }
    }
  ]
}
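The following Python function is a rough local approximation of what this pipeline does to a single document, which can help when reasoning about expected results. It assumes the document is a plain dict; split_parents is a hypothetical name, and the sketch does not reproduce every edge case of the real split processor (for example, how trailing empty strings are handled).

```python
import re
from typing import Any, Dict


def split_parents(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Approximate the split processor plus its on_failure handler."""
    try:
        value = doc["parents"]  # a missing field makes the split fail
        if not isinstance(value, str):
            raise TypeError("parents is not a string")
        # The separator is a regular expression, so the pipe is escaped.
        doc["parents"] = re.split(r"\|", value)
    except (KeyError, TypeError):
        # on_failure: set the field to an empty string instead.
        doc["parents"] = ""
    return doc


print(split_parents({"parents": "a|b|c"}))  # {'parents': ['a', 'b', 'c']}
print(split_parents({}))                    # {'parents': ''}
```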

Split fields and convert the values to long

Split the parents field in the same way, but this time convert each substring to a long. If the parents field does not exist, the split fails and the on_failure handler sets the field to -1.

{
  "description": "Split parents on | and convert each value to a long",
  "processors": [
    {
      "split": {
        "on_failure": [
          {
            "set": {
              "field": "parents",
              "value": -1
            }
          }
        ],
        "field": "parents",
        "separator": "\\|"
      }
    },
    {
      "convert": {
        "field": "parents",
        "type": "long"
      }
    }
  ]
}
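As a rough local model of the split-then-convert sequence, the sketch below applies the two steps to a plain dict. split_and_convert is a hypothetical name, and Python's int() stands in for the long conversion (Python integers are unbounded, unlike a 64-bit long), so treat it as an approximation rather than the real processors.

```python
import re
from typing import Any, Dict


def split_and_convert(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Approximate split (with on_failure) followed by convert to long."""
    # Step 1: split processor; if it fails, on_failure sets the field to -1.
    try:
        doc["parents"] = re.split(r"\|", doc["parents"])
    except (KeyError, TypeError):
        doc["parents"] = -1

    # Step 2: convert processor; an array is converted element by element.
    parents = doc["parents"]
    if isinstance(parents, list):
        doc["parents"] = [int(p) for p in parents]
    else:
        doc["parents"] = int(parents)
    return doc


print(split_and_convert({"parents": "12|7|3"}))  # {'parents': [12, 7, 3]}
print(split_and_convert({}))                     # {'parents': -1}
```

A non-numeric substring such as "12|x" raises ValueError here. In the pipeline, only the split processor has an on_failure handler, so a failed conversion likewise causes the document to fail.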

Extract text into a new field (using regex)

Extract the text between the first set of parentheses in the Title field and create a new field for it called Patent_ID.

You must first enable regular expressions in Painless scripts by adding the following setting to the elasticsearch.yml file:

script.painless.regex.enabled: true

{
  "description": "extract the text between the first set of parentheses",
  "processors": [
    {
      "script": {
        "source": "def f = ctx['Title']; if (f != null) { def m = /\\((.*?)\\)/.matcher(f); if (m.find()) { ctx.Patent_ID = m.group(1); } }"
      }
    }
  ]
}
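To check what the regular expression captures before running it in Elasticsearch, the same pattern can be tried locally with Python's re module. This is a sketch under the assumption that Python's regex semantics match Painless (Java) for this simple pattern; extract_patent_id and the sample title are hypothetical.

```python
import re
from typing import Any, Dict

# \( and \) match literal parentheses; (.*?) lazily captures up to the
# nearest closing parenthesis, and search() returns only the first match.
PARENS = re.compile(r"\((.*?)\)")


def extract_patent_id(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Approximate the script: copy the first (...) contents into Patent_ID."""
    title = doc.get("Title")
    if title is not None:
        match = PARENS.search(title)  # like Matcher.find(): first match anywhere
        if match:
            doc["Patent_ID"] = match.group(1)
    return doc


print(extract_patent_id({"Title": "Widget (US1234567) revision (B2)"}))
# {'Title': 'Widget (US1234567) revision (B2)', 'Patent_ID': 'US1234567'}
```

Documents without a Title, or whose Title contains no parentheses, are returned unchanged.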

Merge two fields to create a geo_point

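Merge two fields that contain 'latitude' and 'longitude' values into a single Elasticsearch geo_point field. The drop processor discards any document in which either value is missing. Because Elasticsearch does not detect geo_point fields automatically, the geo_location field must be mapped as geo_point in the target index.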

{
  "description": "Create geo point field",
  "processors": [
    {
      "drop": {
        "if": "ctx.latitude_field == null || ctx.longitude_field == null"
      }
    },
    {
      "set": {
        "field": "geo_location",
        "value": {
          "lat": "{{latitude_field}}",
          "lon": "{{longitude_field}}"
        }
      }
    }
  ]
}
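The sketch below approximates the drop and set steps in Python to show the shape of the resulting geo_location object. It is a local model only: build_geo_location is a hypothetical name, returning None stands in for a dropped document, and str() stands in for the string output of the {{...}} templates.

```python
from typing import Any, Dict, Optional


def build_geo_location(doc: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Approximate the drop + set processors; None means the document is dropped."""
    # drop processor: discard documents that are missing either coordinate.
    if doc.get("latitude_field") is None or doc.get("longitude_field") is None:
        return None
    # set processor: the {{...}} templates render the values as strings.
    doc["geo_location"] = {
        "lat": str(doc["latitude_field"]),
        "lon": str(doc["longitude_field"]),
    }
    return doc


print(build_geo_location({"latitude_field": "40.7128", "longitude_field": "-74.0060"}))
print(build_geo_location({"latitude_field": "40.7128"}))  # None
```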