Use Prometheus, Grafana, and Consul to monitor job service metrics
This tutorial explains how to configure Prometheus and Grafana to integrate with a Consul service mesh deployed with Nomad. While this tutorial introduces the basics of enabling mesh telemetry, you can also use this data to customize dashboards, configure alerting, and set autoscaling rules.
When deploying a service mesh using Nomad and Consul, one of the benefits is the ability to collect service-to-service traffic telemetry emitted by Envoy sidecar proxies. This includes data such as request count, traffic rate, connections, response codes, and more.
In this tutorial, you will deploy Grafana and Prometheus within the mesh, set up intentions, and configure an ingress to enable access. You will configure Consul service discovery for targets in Prometheus so that services are automatically scraped as they are deployed. A Consul ingress gateway will load-balance the Prometheus deployment and provide access to the web interfaces of Prometheus and Grafana on ports 8081 and 3000 respectively.
Prometheus telemetry on Envoy sidecars
The Prometheus configuration can either be done globally in Consul using a proxy-defaults configuration entry or per service within the Nomad job specification. This tutorial covers configuration within the Nomad jobspec.
For a point of comparison and reference, enabling proxy metrics globally in a Consul datacenter can be done with a proxy-defaults configuration entry and the Consul CLI command consul config write ./<path_to_configuration_file>.
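A minimal sketch of such a proxy-defaults entry, assuming the same metrics port (9102) used by the jobspecs later in this tutorial, might look like this:

Kind = "proxy-defaults"
Name = "global"

Config {
  # Instruct every Envoy sidecar to expose Prometheus metrics on port 9102
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
}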
Prerequisites
For this tutorial, you will need:
- A Nomad environment with Consul installed. The Nomad project provides Terraform configuration to deploy a cluster on AWS.
Ensure that the NOMAD_ADDR and CONSUL_HTTP_ADDR environment variables are set appropriately.
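For example, with the default Nomad and Consul HTTP ports, and placeholder addresses standing in for your own servers:

$ export NOMAD_ADDR=http://<nomad_server_ip>:4646
$ export CONSUL_HTTP_ADDR=http://<consul_server_ip>:8500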
Create the Nomad jobs
Use the jobspec files below to create jobs for:
- two web applications to simulate traffic flows between Envoy proxies
- an ingress controller to monitor traffic coming into the mesh
- Prometheus to collect the Envoy metrics
- Grafana to act as a visualization frontend for Prometheus
Create the foo web application job
The first web application job configures a "foo" service. Take note of these three specific configurations:
1. A dynamic port, envoy_metrics, that maps to port 9102, where Envoy exposes its Prometheus metrics.
2. A meta attribute set in the service block that captures the dynamic host port. This port will be present in the Consul service registration that Prometheus will use to discover the proxy.
3. A sidecar_service proxy configuration that binds the Envoy Prometheus metrics endpoint to the port that the dynamic port maps to.
Create a file with the name foo.nomad.hcl, add the following contents to it, and save the file.
foo.nomad.hcl
job "foo" { datacenters = ["dc1"] type = "service" group "foo" { count = 1 network { mode = "bridge" port "expose" {}## 1. This opens up a dynamic port to the envoy metrics port "envoy_metrics" { to = 9102 } } service { name = "foo" port = 9090 ## 2. This is used by prometheus to interpolate the dynamic port meta { envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}" } check { expose = true type = "http" path = "/health" interval = "30s" timeout = "5s" } connect { sidecar_service { proxy { config { ## 3. Instruct envoy to enable prometheus metrics on /metrics envoy_prometheus_bind_addr = "0.0.0.0:9102" } upstreams { destination_name = "bar" local_bind_port = 9091 } } } } } task "foo" { driver = "docker" config { image = "nicholasjackson/fake-service:v0.26.0" } env { UPSTREAM_URIS = "http://127.0.0.1:9091" NAME = "foo" MESSAGE = "foo service" ERROR_RATE = "0.2" ERROR_DELAY = "0.3s" TIMING_VARIANCE = "10" } } }}
Submit the job to Nomad.
$ nomad job run foo.nomad.hcl
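Optionally, confirm the allocation is running and healthy before continuing. The standard nomad job status command is enough for a quick check:

$ nomad job status foo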
Create the bar web application job
The bar service jobspec is similar to the foo service jobspec.
Create a file with the name bar.nomad.hcl, add the following contents to it, and save the file.
bar.nomad.hcl
job "bar" { datacenters = ["dc1"] type = "service" group "bar" { count = 1 network { mode = "bridge" port "expose" {} port "envoy_metrics" { to = 9102 } } service { name = "bar" port = 9090 meta { envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}" } check { expose = true type = "http" path = "/health" interval = "30s" timeout = "5s" } connect { sidecar_service { proxy { config { envoy_prometheus_bind_addr = "0.0.0.0:9102" } } } } } task "bar" { driver = "docker" config { image = "nicholasjackson/fake-service:v0.26.0" } env { NAME = "bar" MESSAGE = "bar service" ERROR_RATE = "0.2" ERROR_DELAY = "0.3s" RATE_LIMIT = "10" RATE_LIMIT_CODE = "429" TIMING_VARIANCE = "20" } } }}
Submit the job to Nomad.
$ nomad job run bar.nomad.hcl
Create the ingress controller job
The ingress controller is a system job, so it deploys on all client nodes.
Create a file with the name ingress-controller.nomad.hcl, add the following contents to it, and save the file.
ingress-controller.nomad.hcl
job "ingress-controller" { type = "system" group "consul-ingress-controller" { network { mode = "bridge" port "app" { static = 8080 to = 8080 } port "prometheus" { static = 8081 to = 8081 } port "grafana" { static = 3000 to = 3000 } port "envoy_metrics" { to = 9102 } } service { name = "consul-ingress-controller" port = "8080" meta { envoy_metrics_port = "${NOMAD_HOST_PORT_envoy_metrics}" } connect { gateway { proxy { config { envoy_prometheus_bind_addr = "0.0.0.0:9102" } } ingress { listener { port = 8080 protocol = "http" service { hosts = ["*"] name = "foo" } } listener { port = 8081 protocol = "http" service { hosts = ["*"] name = "prometheus" } } listener { port = 3000 protocol = "http" service { hosts = ["*"] name = "grafana" } } } } } } }}
Submit the job to Nomad.
$ nomad job run ingress-controller.nomad.hcl
Create the Prometheus job
The Prometheus job uses the template stanza to create the Prometheus configuration file. It has the attr.unique.network.ip-address attribute in the consul_sd_config section that allows Prometheus to use Consul to detect and scrape targets automatically. It works in this example because the Consul client is running on the same virtual machine as Nomad.
The relabel_configs section lets you replace the default application port with the dynamic Envoy metrics port so that Prometheus scrapes data from the proxy's metrics endpoint.
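As a worked illustration of the last relabel rule in the template below (the address and port values here are hypothetical, not output from this tutorial): if a proxy registers with the address 192.168.50.210:9090 and an envoy_metrics_port metadata value of 25341, Prometheus joins the two source labels with a semicolon and rewrites the scrape address to the dynamic metrics port.

# Rule from the job's Prometheus template, annotated with hypothetical values:
#   __address__                                        -> "192.168.50.210:9090"
#   __meta_consul_service_metadata_envoy_metrics_port  -> "25341"
#   joined source labels                               -> "192.168.50.210:9090;25341"
#   rewritten __address__                               -> "192.168.50.210:25341"
- source_labels: [__address__, __meta_consul_service_metadata_envoy_metrics_port]
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
  target_label: __address__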
The volumes attribute of the Nomad task block takes the configuration file that the template stanza dynamically creates and places it in the Prometheus container.
Create a file with the name prometheus.nomad.hcl, add the following contents to it, and save the file.
prometheus.nomad.hcl
job "prometheus" { type = "service" group "prometheus" { count = 1 network { mode = "bridge" port "expose" {} port "envoy_metrics" { to = 9102 } } restart { attempts = 2 interval = "30m" delay = "15s" mode = "fail" } ephemeral_disk { size = 300 migrate = true sticky = true } task "prometheus" { template { change_mode = "noop" destination = "local/prometheus.yml" data = <<EOH---global: scrape_interval: 5s evaluation_interval: 5s scrape_configs: - job_name: 'Consul Connect Metrics' metrics_path: "/metrics" consul_sd_configs: - server: "{{ env "attr.unique.network.ip-address" }}:8500" relabel_configs: - source_labels: [__meta_consul_service] action: drop regex: (.+)-sidecar-proxy - source_labels: [__meta_consul_service_metadata_envoy_metrics_port] action: keep regex: (.+) - source_labels: [__address__, __meta_consul_service_metadata_envoy_metrics_port] regex: ([^:]+)(?::\d+)?;(\d+) replacement: $1:$2 target_label: __address__EOH } driver = "docker" config { image = "prom/prometheus:latest" args = [ "--config.file=/local/prometheus.yml", "--storage.tsdb.path=/alloc/data", "--web.listen-address=0.0.0.0:9090", "--web.external-url=/", "--web.console.libraries=/usr/share/prometheus/console_libraries", "--web.console.templates=/usr/share/prometheus/consoles" ] volumes = [ "local/prometheus.yml:/etc/prometheus/prometheus.yml", ] } } service { name = "prometheus" port = "9090" check { name = "prometheus_ui port alive" expose = true type = "http" path = "/-/healthy" interval = "10s" timeout = "2s" } connect { sidecar_service {} } } }}
Submit the job to Nomad.
$ nomad job run prometheus.nomad.hcl
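Once the Prometheus allocation is running, you can verify that Consul service discovery is producing scrape targets. One way, assuming the ingress listener on port 8081 created earlier and a placeholder client node address, is to query the Prometheus targets API through the gateway:

$ curl http://<client_node_ip>:8081/api/v1/targets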
Create the Grafana job
Create a file with the name grafana.nomad.hcl, add the following contents to it, and save the file.
grafana.nomad.hcl
job "grafana" { group "grafana" { count = 1 network { mode = "bridge" port "expose" {} } service { name = "grafana" port = "3000" meta { metrics_port = "${NOMAD_HOST_PORT_expose}" } check { expose = true type = "http" name = "grafana" path = "/api/health" interval = "30s" timeout = "10s" } connect { sidecar_service { proxy { expose { path { path = "/metrics" protocol = "http" local_path_port = 9102 listener_port = "expose" } } upstreams { destination_name = "prometheus" local_bind_port = 9090 } } } } } task "grafana" { driver = "docker" config { image = "grafana/grafana:latest" volumes = [ "local/provisioning/prom.yml:/etc/grafana/provisioning/datasources/prometheus.yml" ] } env { GF_PATHS_CONFIG = "/local/config.ini" GF_PATHS_PROVISIONING = "/local/provisioning" } template { destination = "local/config.ini" data = <<EOF[database]type = sqlite3[server]EOF } template { destination = "local/provisioning/datasources/prom.yml" data = <<EOFapiVersion: 1 datasources:- name: Prometheus type: prometheus access: proxy url: http://localhost:9090 isDefault: true editable: falseEOF perms = "777" } } }}
Submit the job to Nomad.
$ nomad job run grafana.nomad.hcl
Access and configure Grafana
Grafana is available via the ingress gateway on port 3000. Use the nomad service info command to get the IP address of the client running Grafana.
$ nomad service info grafana
Job ID   Address              Tags  Node ID   Alloc ID
grafana  192.168.50.210:3000  []    94dabfe7  e797357e
The default username and password for Grafana are both admin. Grafana requires a password change on initial login. Choose and set a new password for the admin user and make a note of it.
Deploy an Envoy dashboard
An Envoy clusters dashboard is available from the Grafana dashboard marketplace.
Navigate to the dashboards page, click on the New button, then click on Import. Enter 11021 in the field with the placeholder text Grafana.com dashboard URL or ID, click Load, then click Import to finish the process.
The dashboard displays aggregated Envoy health information and traffic flows.
Simulate traffic
Simulate traffic to the cluster by making requests to either
of the client nodes on port 8080
.
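For example, a simple loop keeps a steady stream of requests flowing through the ingress gateway to the foo and bar services; the address below is a placeholder for one of your own client node IPs.

$ while true; do curl -s http://<client_node_ip>:8080 > /dev/null; sleep 1; done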
Open the dashboard in Grafana to see requests, connections, and traffic volume on the time series panels.
Next steps
In this tutorial, you deployed Grafana and Prometheus within the Consul service mesh, set up intentions, configured an ingress to enable access, and configured Consul service discovery to allow automatic scraping of targets in Prometheus.
For more information, check out the additional resources below.