High Availability (HA) for node clusters

Siren Alert supports High Availability (HA) for reporting on node clusters. HA keeps the alerting system running when a cluster's master node fails by handing the master's responsibilities to another node. Reporting is muted on every node except the master, which prevents duplicate reports from being sent for each alert.

Functional overview

In a cluster, there is only one master node; all other nodes are slaves. If the master goes down, the first slave that detects this is elected the new master.

Timestamps, expressed as the number of seconds since the Unix epoch, are used to determine master and slave status and to identify dead nodes. The master is identified by its priority (priority_for_master), and every slave polls the current master at a fixed interval (loop_delay). If the master does not update its timestamp within the specified period (absent_time), it is considered offline and a new master is elected. Any node that is absent for longer than a specified period (absent_time_for_delete) is considered dead and is deleted from memory.
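The timing rules above can be sketched in a few lines of Python. This is an illustration only, not Siren Alert's actual implementation: the Node class and the functions prune_dead and check_master are hypothetical names, and the priority swap on election is an assumption, since the text only says that the first slave to notice the failure becomes the master.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    priority: int      # equals priority_for_master on the master
    last_seen: float   # last update, in seconds since the Unix epoch


def prune_dead(nodes, now, absent_time_for_delete=86400):
    """Forget nodes that have been absent longer than absent_time_for_delete."""
    return {
        node_id: node
        for node_id, node in nodes.items()
        if now - node.last_seen <= absent_time_for_delete
    }


def check_master(self_id, nodes, now, priority_for_master=0, absent_time=15):
    """One polling pass on a slave; meant to be called every loop_delay seconds.

    Returns the id of the master after the pass.
    """
    master = next(
        (n for n in nodes.values() if n.priority == priority_for_master), None
    )
    if master is not None and now - master.last_seen <= absent_time:
        return master.id  # the master updated its time recently enough

    # Assumption: the detecting slave swaps priorities with the stale master.
    me = nodes[self_id]
    if master is not None:
        master.priority = me.priority
    me.priority = priority_for_master
    return me.id
```

In this sketch a slave whose master was last seen 12 seconds ago leaves it in place, while a gap of 20 seconds (longer than the default absent_time of 15) makes the calling slave take over.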

Siren Alert cluster setup

Cluster configuration with High Availability

sentinl:
  settings:
    cluster: # configuration for the cluster
      enabled: boolean               # (optional, default: false) enable / disable the cluster configuration
      debug: boolean                 # (optional, default: false) debug output in the Investigate console
      name: string                   # (optional, default: 'sentinl') name of the cluster configuration
      priority_for_master: number    # (optional, default: 0) master node's priority, see host.priority below
      absent_time_for_delete: number # (optional, default: 86400) seconds a node may be absent before it is removed from the cluster
      absent_time: number            # (optional, default: 15) seconds the slaves wait for a response from the master before electing a new master
      loop_delay: number             # (optional, default: 15) seconds between polls from slave to master
      cert: # configuration for security's certificate
        selfsigned: boolean          # (optional, default: true) whether the certificate is self-signed
        valid: number                # (optional, default: 10) validation of certificate
        key: string | null           # (optional, default: undefined) path to key
        cert: string | null          # (optional, default: undefined) path to certificate
      gun: # configuration for each gun db host
        peers: string[]     # (required) URLs of all gun db instances, including this one
        host: string        # (optional, default: localhost) gun db host
        port: number        # (optional, default: 9000) unique for each gun db host
                            # Note: Must be set to a unique value when you run more than one Investigate process on the same machine
        cache: string       # (optional, default: 'optimize/gun-server-data.json') path to gun server db cache file
                            # Note: Must be set to a unique value when you run more than one Investigate process from the same folder
      host: # host's configuration
        id: string          # (required) must be a unique ID
        priority: number    # (required) priority 0 = master, priority 1+ = slave
        name: string        # (optional, default: 'investigate-gun-host') name of the node
        node: string        # (optional, default: 'investigate-gun-hosts') name of node within gun DB
                            # Note: all gun db instance configurations in an HA cluster must share the same node name
        cache: string       # (optional, default: 'optimize/gun-host-data.json') path to gun host db cache file
                            # Note: Must be set to a unique value when you run more than one Investigate process from the same folder
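The uniqueness notes in the listing above are easy to get wrong when several instances are configured by hand, so a small pre-flight check may help. The following Python sketch is not part of Siren Alert: check_cluster_settings, duplicates, and the colocated flag are hypothetical names, and the input is assumed to be the sentinl.settings.cluster section of each instance, already parsed into a dictionary.

```python
from collections import Counter


def duplicates(values):
    """Return the values that occur more than once, sorted."""
    return sorted(v for v, count in Counter(values).items() if count > 1)


def check_cluster_settings(clusters, colocated=False):
    """Return a list of problems found across several cluster sections.

    Set colocated=True when the Investigate processes run on the same
    machine and from the same folder, so ports and cache paths must differ.
    """
    problems = []

    dup_ids = duplicates(c["host"]["id"] for c in clusters)
    if dup_ids:
        problems.append(f"host.id is not unique: {dup_ids}")

    # All instances must share the same host.node name.
    node_names = {c["host"].get("node", "investigate-gun-hosts") for c in clusters}
    if len(node_names) > 1:
        problems.append(f"host.node differs: {sorted(node_names)}")

    if colocated:
        checks = [
            ("gun.port", [c["gun"].get("port", 9000) for c in clusters]),
            ("gun.cache", [c["gun"].get("cache", "optimize/gun-server-data.json")
                           for c in clusters]),
            ("host.cache", [c["host"].get("cache", "optimize/gun-host-data.json")
                            for c in clusters]),
        ]
        for name, values in checks:
            dups = duplicates(values)
            if dups:
                problems.append(f"{name} is not unique: {dups}")

    return problems
```

An empty list means none of the rules checked here were violated; the function does not validate anything else in the configuration.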

Example of configuration

The following cluster topology is an example of HA configuration:

[Image: example HA cluster topology with the hosts Trex, Velociraptor, and Spinosaurus]

In the following configuration examples, an ellipsis (...) indicates that the omitted options are identical to those specified in the preceding example.

elasticsearch.yml

Host Trex

cluster.name: kibi-distribution
network.host: [_local_, _enp2s0_]
discovery.zen.minimum_master_nodes: 2
node.name: trex
discovery.zen.ping.unicast.hosts: ["172.16.0.5", "192.168.0.12"]

Host Velociraptor

...
node.name: velociraptor
discovery.zen.ping.unicast.hosts: ["10.42.0.2", "192.168.0.12"]

Host Spinosaurus

...
node.name: spinosaurus
discovery.zen.ping.unicast.hosts: ["10.42.0.2", "172.16.0.5"]

investigate.yml

Host Trex

sentinl:
  settings:
    cluster:
      enabled: true
      name: 'sentinl'
      priority_for_master: 0
      absent_time_for_delete: 86400
      absent_time: 15
      loop_delay: 5
      cert:
        selfsigned: true
        valid: 10
      gun:
        port: 9000
        host: '0.0.0.0'
        cache: 'data.json'
        peers: ['https://localhost:9000/gun', 'https://172.16.0.5:9000/gun', 'https://192.168.0.12:9000/gun']
      host:
        id: '123'
        name: 'trex'
        priority: 0
        node: 'hosts'

Host Velociraptor

...
    cluster:
      ...
      gun:
        ...
        peers: ['https://localhost:9000/gun', 'https://10.42.0.2:9000/gun', 'https://192.168.0.12:9000/gun']
      host:
        id: '456'
        name: 'Velociraptor'
        priority: 1
        node: 'hosts'

Host Spinosaurus

...
    cluster:
      ...
      gun:
        ...
        peers: ['https://localhost:9000/gun', 'https://10.42.0.2:9000/gun', 'https://172.16.0.5:9000/gun']
      host:
        id: '789'
        name: 'Spinosaurus'
        priority: 2
        node: 'hosts'
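The three gun.peers lists above follow one pattern: each host lists itself as localhost, followed by the addresses of the other hosts. A short Python sketch of that pattern is given here for illustration. The function name gun_peers is hypothetical, and the mapping of addresses to hosts (10.42.0.2 for Trex, 172.16.0.5 for Velociraptor, 192.168.0.12 for Spinosaurus) is inferred from the peers lists rather than stated explicitly.

```python
def gun_peers(own_address, all_addresses, port=9000):
    """Build the gun.peers list for one host: itself as localhost, then the others."""
    urls = [f"https://localhost:{port}/gun"]
    urls += [
        f"https://{address}:{port}/gun"
        for address in all_addresses
        if address != own_address
    ]
    return urls


# Addresses as used in the investigate.yml examples above.
addresses = ["10.42.0.2", "172.16.0.5", "192.168.0.12"]
for own in addresses:
    print(own, gun_peers(own, addresses))
```

Generating the lists from a single address inventory keeps the per-host files consistent when a node is added or its address changes.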