Using ELK to Monitor SOE Infrastructure – Part I: Architecture and Configuration

The Problem

A typical Red Hat Standard Operating Environment deployment consists of Red Hat Satellite and Ansible Tower, along with supporting technologies such as Gitlab or Github. These software products produce copious logging information detailing events such as:

  • Login attempts
  • API calls
  • Playbook runs
  • Host deployments
  • Host updates

to name but a few. The logs are, by default, local: on the filesystem in the case of Red Hat Satellite, and stored in a database in the case of Ansible Tower. While this may be acceptable for small-scale deployments, in a large-scale corporate deployment we need to be able to do the following:

  • Visualise time-series data, such as deployment of servers and patching frequency
  • Highlight and identify operational failures, such as playbook run failures
  • Enable incident tracing across multiple infrastructure systems, for example tracing multiple events related to a deployment failure in both Ansible Tower and Red Hat Satellite by correlating timestamps
  • Generate metrics to track the performance and business benefits of the SOE infrastructure

The Solution

The Open Source ELK stack addresses the use cases described above. It consists of three components:

  • Logstash is a log collector that can receive logs from client systems in different formats and over a variety of transport protocols. It ingests logs from the client systems, optionally performs transformations on them, and forwards them to other systems such as Elasticsearch.
  • Elasticsearch is a database that is optimised for the storage and indexing of log data.
  • Kibana is a WebUI that can be used to build sophisticated visualisations of log data stored in an Elasticsearch database.

A typical enterprise deployment of the ELK stack is shown in the diagram below:

We will not go into detail on how the ELK stack is deployed and configured, as this is likely to already be in place in a corporate environment and in any case is already well covered in articles such as this.

We should note the following:

  1. Two (or more) Logstash servers are typically deployed in order to provide high availability. This is also the case with the Elasticsearch and Kibana servers; however, for simplicity these multiples are not shown.
  2. Both the Red Hat Satellite and Gitlab servers send their logs to Logstash using the Filebeat protocol. Filebeat is typically used to read logfiles from a filesystem and send them to Logstash. It requires the installation of a small client program on each server that will be shipping logfiles. Note that the Filebeat client can automatically load balance across multiple Logstash servers.
  3. Ansible Tower does not log to the filesystem but rather to its own database, so Filebeat is not applicable here. Ansible Tower can instead be configured to send logs over HTTPS to a remote logserver, and it is this facility that we will use. Note that it is not possible to configure multiple log servers in Ansible Tower, so in order to maintain high availability we need to configure a load balancer between Ansible Tower and the Logstash servers. Configuration of this load balancer is not covered here; however, any HTTPS load balancer in passthrough mode with persistent connections will work (a sketch is given below).
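
As an illustration, a minimal HAProxy configuration for such a passthrough load balancer might look like the sketch below. This is illustrative only: it assumes the Logstash http listener runs on port 8085 (as configured later in this article) and uses source-address persistence so that a given Tower connection stays pinned to one Logstash server.

# /etc/haproxy/haproxy.cfg (fragment) - illustrative TLS-passthrough load balancer
frontend tower_logging
    bind *:8085
    mode tcp                    # passthrough: TLS is terminated on the Logstash servers
    option tcplog
    default_backend logstash_http

backend logstash_http
    mode tcp
    balance source              # persistence: a given client always hits the same backend
    server logstash1 logstash1.libvirt.oldstables:8085 check
    server logstash2 logstash2.libvirt.oldstables:8085 check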

Configuration

In order to effect the architecture described above, we need to configure Ansible Tower, Filebeat on Red Hat Satellite and Gitlab, and finally Logstash itself.

Ansible Tower Configuration

The following shows the configuration within Ansible Tower. Note that the entry for Logging Aggregator should point to the load balancer if one is being used.
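
The same values can also be set programmatically through the Tower settings API. The sketch below is illustrative only: the load balancer hostname and the admin credentials are placeholders, while the logging username and password should match those configured on the Logstash http listener later in this article.

# Illustrative only - hostnames and admin credentials are placeholders
curl -k -u admin:ADMIN_PASSWORD -X PATCH \
     -H "Content-Type: application/json" \
     https://ansible.oldstables/api/v2/settings/logging/ \
     -d '{
           "LOG_AGGREGATOR_ENABLED": true,
           "LOG_AGGREGATOR_TYPE": "logstash",
           "LOG_AGGREGATOR_HOST": "loadbalancer.libvirt.oldstables",
           "LOG_AGGREGATOR_PORT": 8085,
           "LOG_AGGREGATOR_PROTOCOL": "https",
           "LOG_AGGREGATOR_USERNAME": "tower_logger",
           "LOG_AGGREGATOR_PASSWORD": "l0gst4sh_p4ssw0rd"
         }'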

We also add an audit user, so that Logstash can interrogate the Tower metrics API endpoint:
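
Once the audit user exists, its access can be checked with a quick curl; the username and password shown here are the same ones reused in the Logstash http_poller configuration later in this article:

# Verify that the audit user can read the Tower metrics endpoint
curl -k -u logstash_user:aUd1t_p4ssw0rd \
     -H "Accept: application/json" \
     https://ansible.oldstables/api/v2/metrics/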

Red Hat Satellite Configuration

After installing Filebeat on the Satellite server, the following configuration file will ship the Satellite production.log to the Logstash server:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/foreman/production.log  
output.logstash:
  hosts: ["logstash1.libvirt.oldstables:5044","logstash2.libvirt.oldstables:5044"]

Note that all the Logstash servers are specified in the hosts line; Filebeat will choose one at random to use and, if that Logstash server stops responding, will fail over to the next. This configuration only ships production.log; however, it can easily be extended to ship other log files, as sketched below.
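
For example, an extended prospector list that also ships the Candlepin and Apache logs from the Satellite server might look like the following; the additional paths are illustrative, so check which log files exist on your Satellite version:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/foreman/production.log
    - /var/log/candlepin/candlepin.log
    - /var/log/httpd/foreman-ssl_access_ssl.log
output.logstash:
  hosts: ["logstash1.libvirt.oldstables:5044","logstash2.libvirt.oldstables:5044"]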

Gitlab Configuration

The Gitlab configuration is virtually identical, save for the path to the file being shipped:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/gitlab/gitlab-rails/production.log  
output.logstash:
  hosts: ["logstash1.libvirt.oldstables:5045","logstash2.libvirt.oldstables:045"]

Other Client Configurations

If the SOE deployment uses additional applications, for example Jenkins, the Filebeat configurations above can be used as a template for configuring log shipping.
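
As an illustration, a Filebeat configuration for a hypothetical Jenkins server might look like the sketch below. The log path assumes a standard RPM installation of Jenkins, and port 5046 assumes that a corresponding additional Logstash pipeline has been created for it:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/jenkins/jenkins.log
output.logstash:
  hosts: ["logstash1.libvirt.oldstables:5046","logstash2.libvirt.oldstables:5046"]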

Logstash Configuration

In a typical corporate environment, a request will usually have to be made to the ELK administrators to set up the relevant Logstash configuration. An example Logstash configuration is given below; where multiple Logstash servers are used, this configuration would generally be identical on all of them:

/etc/logstash/conf.d/satellite.conf:

input {
  beats {
      port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch1.libvirt.oldstables:443","https://elasticsearch2.libvirt.oldstables:443"]
    user => "logstash_internal"
    password => "myPassw0rd"
    ilm_enabled => false
    manage_template => false
    index => "redhat-satellite-%{+YYYY.MM.dd}"
  }
  file {
    path => "/var/log/logstash/satellite.log"
  }
}

This is a simple Filebeat pipeline configuration that listens on port 5044 and forwards the log messages to the Elasticsearch servers specified in the hosts line, connecting with the specified credentials. Additionally, the log messages are written to a local log file for testing purposes.

/etc/logstash/conf.d/gitlab.conf:

input {
  beats {
      port => 5045
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch1.libvirt.oldstables:443","https://elasticsearch2.libvirt.oldstables:443"]
    user => "logstash_internal"
    password => "myPassw0rd"
    ilm_enabled => false
    manage_template => false
    index => "gitlab-%{+YYYY.MM.dd}"
  }
  file {
    path => "/var/log/logstash/gitlab.log"
  }
}

The configuration for the gitlab pipeline is virtually identical to that for the satellite pipeline. Note that a different port is specified as pipelines cannot share ports.

/etc/logstash/conf.d/tower.conf:

input {
  http {
    port => 8085
    user => "tower_logger"
    password => "l0gst4sh_p4ssw0rd"
    ssl => true
    ssl_certificate => "/etc/logstash/tower.cert"
    ssl_key => "/etc/logstash/tower.key"
    response_headers => {
      "Access-Control-Allow-Origin" => "*"
      "Content-Type" => "application/json"
      "Access-Control-Allow-Headers" => "Origin, X-Requested-With, Content-Type, Accept"
    }   
  }
  http_poller {
    urls => {
      tower => {
        url => "https://ansible.oldstables/api/v2/metrics"
        method => get
        user => "logstash_user"
        password => "aUd1t_p4ssw0rd"
        headers => {
          Accept => "application/json"
        }
      }  
    }
    request_timeout => 60
    schedule => {cron => "* * * * * UTC"}
    codec => "json"
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch1.libvirt.oldstables:443","https://elasticsearch2.libvirt.oldstables:443"]
    user => "logstash_internal"
    password => "myPassw0rd"
    ilm_enabled => false
    manage_template => false
    index => "tower-%{+YYYY.MM.dd}"
  }
  file {
    path => "/var/log/logstash/tower.log"
  }
}

There are two input plugins configured here. The first is the http listener used to allow Tower to stream events to Logstash; the user and password set here should be entered into the Tower logging configuration screen. Since Tower ships its logs over HTTPS rather than via Filebeat (plaintext HTTP is not an option), we need to configure a TLS certificate on the listener. This might be provided by a CA, or a self-signed one can be quickly generated using:

openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout /etc/logstash/tower.key -out /etc/logstash/tower.cert -subj /CN=localhost

The second input plugin is an http_poller, which allows Logstash to poll the Tower metrics API endpoint. Authentication here is configured using the audit user created in Tower earlier. Note that if you are using a self-signed certificate on your Tower, you will need to either switch off strict certificate checking or trust the certificate; how to do this depends on the version of Logstash that you are using, so consult the http_poller documentation for your version here.
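
As a rough, version-dependent sketch, trusting a self-signed Tower certificate usually means pointing the http_poller at a copy of the certificate. The lines below would be added inside the http_poller block; the certificate path is a placeholder, and the exact option name should be verified against the plugin documentation for your installed version:

    # added inside the http_poller { } block; /etc/logstash/tower-ca.cert is a placeholder
    # older plugin releases:
    cacert => "/etc/logstash/tower-ca.cert"
    # newer plugin releases use the standardised option instead:
    # ssl_certificate_authorities => ["/etc/logstash/tower-ca.cert"]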

/etc/logstash/pipelines.yml:

- pipeline.id: tower
  path.config: "/etc/logstash/conf.d/tower.conf"
- pipeline.id: satellite
  path.config: "/etc/logstash/conf.d/satellite.conf"
- pipeline.id: gitlab
  path.config: "/etc/logstash/conf.d/gitlab.conf"

This file configures 3 pipelines using the 3 configuration files.
NB It is possible to roll the two Filebeat pipelines into one, listening on a single port; however, for the purposes of simplicity we keep them separate here. A rough sketch of the merged approach follows.
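
In the merged approach, each Filebeat configuration would add a distinguishing field (for example fields: {service: redhat-satellite} on the Satellite server and fields: {service: gitlab} on the Gitlab server), and a single pipeline would use that field to pick the index. The following is a sketch only:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch1.libvirt.oldstables:443","https://elasticsearch2.libvirt.oldstables:443"]
    user => "logstash_internal"
    password => "myPassw0rd"
    ilm_enabled => false
    manage_template => false
    # index name taken from the service field set by each Filebeat client
    index => "%{[fields][service]}-%{+YYYY.MM.dd}"
  }
}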

/etc/logstash/logstash.yml:
In this file we will usually want to set:

queue.type: persisted

This ensures that log messages are persisted to disk on the Logstash servers, which in turn ensures that messages are not lost to out-of-memory errors in the event of a prolonged network outage between Logstash and Elasticsearch.
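
The persistent queue is capped by default (1024mb per pipeline on recent Logstash versions); if longer outages need to be absorbed, the cap can be raised. The value below is illustrative:

queue.type: persisted
queue.max_bytes: 4gb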

Securing the Configuration

The following configuration files should be owned by logstash:logstash and have 0600 permissions:

/etc/logstash/conf.d/*
/etc/logstash/tower.key
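
For example:

# Restrict the pipeline configs and the TLS private key to the logstash user
chown logstash:logstash /etc/logstash/conf.d/* /etc/logstash/tower.key
chmod 0600 /etc/logstash/conf.d/* /etc/logstash/tower.key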

Testing the Configuration

Log messages should be visible streaming into the local /var/log/logstash/tower.log files on the Logstash servers.
NB For each pipeline, only one Logstash server will be receiving log messages, so if you have multiple Logstash servers make sure to check all of them. Typical Tower log messages are shown below:

"host":"0:0:0:0:0:0:0:1","cluster_host_id":"localhost","@timestamp":"2020-11-17T16:16:29.200Z","level":"WARNING","@version":"1","headers":{"request_path":"/","content_type":"application/json; charset=utf-8","request_method":"POST","http_host":"localhost:8085","content_length":"271","http_accept":"*/*","http_version":"HTTP/1.1","http_user_agent":null},"message":"scaling up worker pid:1411","stack_info":null,"tower_uuid":null,"logger_name":"awx.main.commands.run_callback_receiver"}
{"host":"0:0:0:0:0:0:0:1","cluster_host_id":"localhost","@timestamp":"2020-11-17T16:16:29.504Z","level":"WARNING","@version":"1","headers":{"request_path":"/","content_type":"application/json; charset=utf-8","request_method":"POST","http_host":"localhost:8085","content_length":"249","http_accept":"*/*","http_version":"HTTP/1.1","http_user_agent":null},"message":"scaling up worker pid:1415","stack_info":null,"tower_uuid":null,"logger_name":"awx.main.dispatch"}
{"host":"0:0:0:0:0:0:0:1","cluster_host_id":"localhost","@timestamp":"2020-11-17T16:16:29.573Z","level":"WARNING","@version":"1","headers":{"request_path":"/","content_type":"application/json; charset=utf-8","request_method":"POST","http_host":"localhost:8085","content_length":"249","http_accept":"*/*","http_version":"HTTP/1.1","http_user_agent":null},"message":"scaling up worker pid:1417","stack_info":null,"tower_uuid":null,"logger_name":"awx.main.dispatch"}
...
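
If nothing appears, the http listener can be exercised directly with a test message. The sketch below uses the credentials configured above and posts to one Logstash server directly, bypassing any load balancer:

# Post a test message straight to the Logstash http input, then watch the local file output
curl -k -u tower_logger:l0gst4sh_p4ssw0rd \
     -H "Content-Type: application/json" \
     -d '{"message": "logstash connectivity test"}' \
     https://logstash1.libvirt.oldstables:8085/
tail -f /var/log/logstash/tower.log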

COMING SOON: Configuring Kibana Visualisations
