Siren Platform User Guide

Datasource reflection pipelines

Pipelines can be used to enrich documents before they are indexed into Elasticsearch.

Examples:

To split a string separated by the delimiter "|" into a list of sub-strings and, if the source field is missing, fill the target field with an empty string:

{
  "description": "_description",
  "processors": [
    {
      "split": {
        "on_failure": [
          {
            "set": {
              "field": "parents",
              "value": ""
            }
          }
        ],
        "field": "parents",
        "separator": "\\|"
      }
    }
  ]
}
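
As a rough illustration of what this pipeline does to a document, the following Python sketch mimics the split processor and its on_failure handler outside Elasticsearch. The function name split_with_fallback and the sample documents are hypothetical, and Python's re.split only approximates the processor (for example, it keeps trailing empty strings).

```python
import re


def split_with_fallback(doc, field="parents", separator=r"\|", fallback=""):
    """Approximate the split processor plus its on_failure 'set' handler."""
    value = doc.get(field)
    if not isinstance(value, str):
        # Missing or null field: the split fails, so the fallback value is set.
        doc[field] = fallback
    else:
        # The separator is a regular expression, hence the escaped pipe.
        doc[field] = re.split(separator, value)
    return doc


print(split_with_fallback({"parents": "a|b|c"}))
# {'parents': ['a', 'b', 'c']}
print(split_with_fallback({"title": "no parents"}))
# {'title': 'no parents', 'parents': ''}
```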

To achieve a similar result, but also convert each sub-string to a long and, if the source field is missing, set the target field to -1:

{
  "description": "_description",
  "processors": [
    {
      "split": {
        "on_failure": [
          {
            "set": {
              "field": "parents",
              "value": -1
            }
          }
        ],
        "field": "parents",
        "separator": "\\|"
      }
    },
    {
      "convert": {
        "field": "parents",
        "type": "long"
      }
    }
  ]
}

To enrich documents from a web service: the processor takes the value of the Abstract field (addressed with path syntax, $.Abstract) and posts a request {"text": abstract_value} to http://35.189.96.185/bio. If the Abstract field is null, an empty string is sent instead (input_default). The JSON response object is used as the value of a new field, Abstract_text_mined_entities, at the top level ($) of the document:

{
  "json-ws": {
    "resource_name": "siren-nlp",
    "method": "post",
    "url": "http://35.189.96.185/bio",
    "input_map": {
      "$.Abstract": "text"
    },
    "output_map": {
      "Abstract_text_mined_entities": "$"
    },
    "input_default": {
      "text": "''"
    }
  }
}
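
The mapping described above can be sketched in Python as follows. This is only an illustration of how input_map, input_default and output_map relate to each other, not the processor's actual implementation: the helper names build_request_body and apply_response are hypothetical, only simple "$.Field" paths are handled, the default is written as a plain empty string following the description above, and the response used below is a placeholder rather than real output of the service. No network request is made.

```python
def build_request_body(doc, input_map, input_default):
    """Map document fields to request parameters, using defaults for null values."""
    body = {}
    for path, param in input_map.items():
        # Only top-level paths of the form "$.Field" are handled in this sketch.
        value = doc.get(path[2:]) if path.startswith("$.") else None
        body[param] = input_default.get(param) if value is None else value
    return body


def apply_response(doc, output_map, response):
    """Store the whole response object ("$") under each target field."""
    for field, path in output_map.items():
        if path == "$":
            doc[field] = response
    return doc


input_map = {"$.Abstract": "text"}
input_default = {"text": ""}  # assumption: a plain empty string
output_map = {"Abstract_text_mined_entities": "$"}

print(build_request_body({"Abstract": "Some abstract"}, input_map, input_default))
# {'text': 'Some abstract'}
print(build_request_body({"Abstract": None}, input_map, input_default))
# {'text': ''}

placeholder_response = {"entities": []}  # hypothetical response shape
print(apply_response({"Abstract": "Some abstract"}, output_map, placeholder_response))
# {'Abstract': 'Some abstract', 'Abstract_text_mined_entities': {'entities': []}}
```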

To extract the text between the first set of parentheses in the Title field and store it in a new field, Patent_ID:

{
  "script": {
    "source": "def f = ctx['Title']; if (f != null) { def m = /\\((.*?)\\)/.matcher(f); if (m.find()) { ctx.Patent_ID = m.group(1); } }"
  }
}

Note

You need to enable regex in the elasticsearch.yml file: script.painless.regex.enabled: true
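
To check what the regular expression in the script example captures before deploying it, the same pattern can be tried in Python. This is a sketch only: the function name extract_patent_id and the sample title are hypothetical, and Python's re module stands in for the Painless regex engine.

```python
import re


def extract_patent_id(doc):
    """Copy the text inside the first pair of parentheses in Title to Patent_ID."""
    title = doc.get("Title")
    if title is not None:
        # Non-greedy group: stops at the first closing parenthesis.
        match = re.search(r"\((.*?)\)", title)
        if match:
            doc["Patent_ID"] = match.group(1)
    return doc


print(extract_patent_id({"Title": "Widget assembly (US1234567) and method (B2)"}))
# {'Title': 'Widget assembly (US1234567) and method (B2)', 'Patent_ID': 'US1234567'}
print(extract_patent_id({"Title": "No identifier here"}))
# {'Title': 'No identifier here'}
```

A title without parentheses leaves the document unchanged, which is why the match is checked before the group is read.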
