Visualize metrics with Prometheus
This tutorial shows how to enable metrics collection for self-managed Boundary deployments, such as Boundary Enterprise and Boundary Community Edition, using Prometheus, and how to configure Grafana to visualize the metrics data.
Background
Visibility into Boundary workers and controllers plays an important role in ensuring the health of production deployments. Boundary 0.8 adds monitoring capabilities using the OpenMetrics exposition format, enabling administrators to collect the data using tools like Prometheus and export it into a data visualizer.
This tutorial reviews metrics collection in dev mode, and demonstrates how to enable metrics in a more realistic deployment scenario. Prometheus is then used to gather metrics, and Grafana for visualization.
Tutorial Contents
- Get set up: dev mode
- Install Prometheus
- Examine metrics
- Get set up: Docker Compose
- Configure Prometheus
- Configure Grafana
- Examine graph metrics
Prerequisites
- Docker is installed
- Docker Compose is installed

Tip

Docker Desktop 20.10 and above includes the Docker Compose binary and does not require a separate installation.

- A Boundary binary version 0.8.0 or greater in your PATH. This tutorial uses Boundary 0.8.0.
- Terraform 0.13.0 or greater in your PATH
- Access to prometheus.io/download
Get set up: dev mode
Metrics are available for controllers and workers in a Boundary deployment. To quickly view metrics data, you will first use Boundary's dev mode and deploy Prometheus locally. Later in the tutorial a Docker Compose environment will be used to visualize metrics with Grafana.
In production deployments, metrics are enabled in Boundary's config
file by declaring an "ops"
listener. This will be explored later on.
Start boundary dev, which uses a configuration with a pre-defined ops listener.
$ boundary dev
==> Boundary server configuration:
  [Controller] AEAD Key Bytes: cXte2+fkVq/mnQ/VKO3cOL0bYQZKqJsQhWgPLvX9VsY=
  [Recovery] AEAD Key Bytes: XGcczs8FJ7lIwd8PQJaP34go/ILiPIeMs+7anHkK+vE=
  [Worker-Auth] AEAD Key Bytes: Y9A1Gw4Ja+IJbFtuGTSXLIw3L+aEPcwEpN+/lRqvWIQ=
  [Recovery] AEAD Type: aes-gcm
  [Root] AEAD Type: aes-gcm
  [Worker-Auth] AEAD Type: aes-gcm
  Cgo: disabled
  Controller Public Cluster Addr: 127.0.0.1:9201
  Dev Database Container: bold_heisenberg
  Dev Database Url: postgres://postgres:password@localhost:55001/boundary?sslmode=disable
  Generated Admin Login Name: admin
  Generated Admin Password: password
  Generated Host Catalog Id: hcst_1234567890
  Generated Host Id: hst_1234567890
  Generated Host Set Id: hsst_1234567890
  Generated Oidc Auth Method Id: amoidc_1234567890
  Generated Org Scope Id: o_1234567890
  Generated Password Auth Method Id: ampw_1234567890
  Generated Project Scope Id: p_1234567890
  Generated Target Id: ttcp_1234567890
  Generated Unprivileged Login Name: user
  Generated Unprivileged Password: password
  Listener 1: tcp (addr: "127.0.0.1:9200", cors_allowed_headers: "[]", cors_allowed_origins: "[*]", cors_enabled: "true", max_request_duration: "1m30s", purpose: "api")
  Listener 2: tcp (addr: "127.0.0.1:9201", max_request_duration: "1m30s", purpose: "cluster")
  Listener 3: tcp (addr: "127.0.0.1:9203", max_request_duration: "1m30s", purpose: "ops")
  Listener 4: tcp (addr: "127.0.0.1:9202", max_request_duration: "1m30s", purpose: "proxy")
  Log Level: info
  Mlock: supported: false, enabled: false
  Version: Boundary v0.8.0
  Version Sha: 9b48dbc2fd4f9a9f0bda4ca68488590f681dbd9e+CHANGES
  Worker Public Proxy Addr: 127.0.0.1:9202

==> Boundary server started! Log data will stream in below:

{ "id": "QH3NNVS84T", "source": "https://hashicorp.com/boundary/dev-controller/boundary-dev", "specversion": "1.0", "type": "system", "data": { "version": "v0.1", "op": "github.com/hashicorp/boundary/internal/observability/event.(*HclogLoggerAdapter).writeEvent", "data": { "@original-log-level": "none", "@original-log-name": "aws", "msg": "configuring client automatic mTLS" } }, "datacontentype": "text/plain", "time": "2022-04-19T13:38:37.377958-06:00"}

...... More output ......
Open a web browser and navigate to http://localhost:9203/metrics. A list of
metrics is available at this endpoint, reported in the OpenMetrics format.
Monitoring solutions like Prometheus enable metrics reporting by scraping the
values from this endpoint.
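You can also inspect the endpoint from the command line. As a quick check (assuming curl and grep are available), fetch the page and filter for Boundary-specific metric names:

$ curl -s http://localhost:9203/metrics | grep "^boundary_" | head

Each matching line pairs a metric name and its labels with the current sample value.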
Leave dev mode running in the current terminal session, and open a new terminal window or tab to continue the tutorial.
Install Prometheus
Prometheus is an open-source monitoring and alerting toolkit. It gathers and stores metrics reported in the OpenMetrics exposition format.
From Prometheus's docs:
Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
Download the latest Prometheus monitoring binary for your system and extract the archive to your local machine. This tutorial uses version 2.35.0-rc0.
For example, on macOS:
$ curl -OL https://github.com/prometheus/prometheus/releases/download/v2.35.0-rc0/prometheus-2.35.0-rc0.darwin-amd64.tar.gz
Next, extract the archive:
$ tar -zxvf prometheus-2.35.0-rc0.darwin-amd64.tar.gz
Change into the extracted folder and view its contents:
$ cd prometheus-2.35.0-rc0.darwin-amd64/
$ ls -R1
LICENSE
NOTICE
console_libraries
consoles
prometheus
prometheus.yml
promtool

./console_libraries:
menu.lib
prom.lib

./consoles:
index.html.example
node-cpu.html
node-disk.html
node-overview.html
node.html
prometheus-overview.html
prometheus.html
The prometheus.yml
in this directory is Prometheus's configuration file, used
by the prometheus
binary.
Open prometheus.yml
in your text editor, and locate static_configs
under the
scrape_configs
block.
The targets
allow you to define the endpoints Prometheus should attempt to
scrape metrics from. Notice that localhost:9090
is already defined for
job_name: "prometheus"
. This is Prometheus's metrics endpoint, where it
collects data about itself.
For a simple look at metrics, add an additional target for localhost:9203
,
the standard metrics endpoint for Boundary.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090", "localhost:9203"]
Save this file, and return to your terminal.
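Before starting the server, you can optionally validate the edited file with promtool, which ships in the same extracted archive:

$ ./promtool check config prometheus.yml

If the file parses cleanly, promtool reports SUCCESS; otherwise it points to the offending line.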
Examine metrics
With the configuration file targeting Boundary's /metrics
endpoint, Prometheus
can be started to begin monitoring.
Ensure you are located in the extracted directory. Execute Prometheus,
supplying the prometheus.yml
config file.
$ ./prometheus --config.file="prometheus.yml"
ts=2022-04-19T20:52:52.324Z caller=main.go:488 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2022-04-19T20:52:52.324Z caller=main.go:525 level=info msg="Starting Prometheus" version="(version=2.35.0-rc0, branch=HEAD, revision=5b73e518260d8bab36ebb1c0d0a5826eba8fc0a0)"
ts=2022-04-19T20:52:52.324Z caller=main.go:530 level=info build_context="(go=go1.18, user=root@56c744483836, date=20220408-13:08:41)"
ts=2022-04-19T20:52:52.325Z caller=main.go:531 level=info host_details=(darwin)
ts=2022-04-19T20:52:52.325Z caller=main.go:532 level=info fd_limits="(soft=256, hard=unlimited)"
ts=2022-04-19T20:52:52.325Z caller=main.go:533 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-04-19T20:52:52.327Z caller=web.go:541 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-04-19T20:52:52.327Z caller=main.go:957 level=info msg="Starting TSDB ..."
ts=2022-04-19T20:52:52.328Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-04-19T20:52:52.330Z caller=head.go:493 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2022-04-19T20:52:52.330Z caller=head.go:536 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=5.635µs
ts=2022-04-19T20:52:52.330Z caller=head.go:542 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2022-04-19T20:52:52.337Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=1
ts=2022-04-19T20:52:52.338Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=1
ts=2022-04-19T20:52:52.338Z caller=head.go:619 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=132.121µs wal_replay_duration=8.100601ms total_replay_duration=8.257285ms
ts=2022-04-19T20:52:52.339Z caller=main.go:978 level=info fs_type=1a
ts=2022-04-19T20:52:52.339Z caller=main.go:981 level=info msg="TSDB started"
ts=2022-04-19T20:52:52.339Z caller=main.go:1162 level=info msg="Loading configuration file" filename=prometheus.yml
ts=2022-04-19T20:52:52.357Z caller=main.go:1199 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=17.705208ms db_storage=537ns remote_storage=1.164µs web_handler=817ns query_engine=673ns scrape=17.309393ms scrape_sd=32.689µs notify=33.186µs notify_sd=7.118µs rules=4.899µs tracing=16.002µs
ts=2022-04-19T20:52:52.357Z caller=main.go:930 level=info msg="Server is ready to receive web requests."
Open a web browser, and navigate to http://localhost:9090
. You will be
redirected to the /graph
page, where queries can be constructed and executed.
Type boundary
into the search box. Notice that autocomplete shows a scrollable
list of metrics that can be queried for the controller and worker.
The metric names are automatically populated from the values available at the
/metrics
endpoint.
As of Boundary 0.8, the available metrics include:
Controller metrics:
- HTTP request latency
- HTTP request size
- HTTP response size
- gRPC service latency
Worker metrics:
- Open proxy connections
- Sent bytes for proxying, handled by worker
- Received bytes for proxying, handled by worker
- Time elapsed before a header is written back to the end user
Other metrics:
- Build info for version details
This is an initial set of operational metrics and more will be added in the future. To learn more about specific metrics and how to access them, refer to the metrics documentation.
Execute a simple query by clicking on a metric value. For example, select
boundary_cluster_client_grpc_request_duration_seconds_sum
, and then click
Execute.
The
boundary_cluster_client_grpc_request_duration_seconds
metric reports latencies for requests made to the gRPC
service running on the cluster listener.
The returned results show the matching queries. Scrolling through the
Evaluation time allows for quick navigation through metrics reported at a
particular timestamp. Most metrics in this example will share the same timestamp
because they were generated when boundary dev
was executed.
Select the Graph view to show these metrics over time.
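Beyond browsing individual series, histogram metrics like this one are often most useful as rates. As a sketch (assuming the metric has received samples in the chosen window), the average gRPC request duration over the last five minutes divides the rate of the _sum series by the rate of the _count series. The expression can be pasted into the /graph search box, or sent to Prometheus's HTTP query API:

$ curl -s 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=rate(boundary_cluster_client_grpc_request_duration_seconds_sum[5m]) / rate(boundary_cluster_client_grpc_request_duration_seconds_count[5m])'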
Additional queries to Boundary would produce more metrics. For example,
authenticating to Boundary as the admin user would produce more events reported
by the boundary_controller_api_http_request_size_bytes
metric.
$ boundary authenticate password -auth-method-id ampw_1234567890 -login-name admin
Please enter the password (it will be hidden): <password>

Authentication information:
  Account ID:      acctpw_1234567890
  Auth Method ID:  ampw_1234567890
  Expiration Time: Tue, 26 Apr 2022 16:12:20 MDT
  User ID:         u_1234567890

The token was successfully stored in the chosen keyring and is not displayed here.
Return to Prometheus, and then execute a query for
boundary_controller_api_http_request_size_bytes_sum
. This would report a
number of method="get"
requests to the API at a single point in time.
Also notice that several metrics are available for boundary_worker
. Boundary's
dev mode includes a controller and worker instance for testing. Next, metrics
will be examined using a more realistic Boundary deployment with Docker Compose.
When finished examining dev mode metrics, locate the correct shell session and
stop Prometheus using ctrl+c
.
$ ./prometheus --config.file="prometheus.yml"
ts=2022-04-19T21:02:33.144Z caller=main.go:488 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2022-04-19T21:02:33.144Z caller=main.go:525 level=info msg="Starting Prometheus" version="(version=2.35.0-rc0, branch=HEAD, revision=5b73e518260d8bab36ebb1c0d0a5826eba8fc0a0)"
...... More output ......
ts=2022-04-19T21:02:33.495Z caller=main.go:930 level=info msg="Server is ready to receive web requests."
^Cts=2022-04-19T22:30:17.168Z caller=main.go:796 level=warn msg="Received SIGTERM, exiting gracefully..."
ts=2022-04-19T22:30:17.168Z caller=main.go:819 level=info msg="Stopping scrape discovery manager..."
ts=2022-04-19T22:30:17.168Z caller=main.go:833 level=info msg="Stopping notify discovery manager..."
ts=2022-04-19T22:30:17.168Z caller=main.go:815 level=info msg="Scrape discovery manager stopped"
ts=2022-04-19T22:30:17.168Z caller=main.go:829 level=info msg="Notify discovery manager stopped"
ts=2022-04-19T22:30:17.168Z caller=main.go:855 level=info msg="Stopping scrape manager..."
ts=2022-04-19T22:30:17.169Z caller=main.go:849 level=info msg="Scrape manager stopped"
ts=2022-04-19T22:30:17.169Z caller=manager.go:950 level=info component="rule manager" msg="Stopping rule manager..."
ts=2022-04-19T22:30:17.170Z caller=manager.go:960 level=info component="rule manager" msg="Rule manager stopped"
ts=2022-04-19T22:30:17.270Z caller=notifier.go:600 level=info component=notifier msg="Stopping notification manager..."
ts=2022-04-19T22:30:17.271Z caller=main.go:1088 level=info msg="Notifier manager stopped"
ts=2022-04-19T22:30:17.271Z caller=main.go:1100 level=info msg="See you next time!"
Lastly, locate the terminal session where boundary dev
was executed, and
stop the dev server using ctrl+c
.
$ boundary dev
...... More output ......
{ "id": "p25tDGirn0", "source": "https://hashicorp.com/boundary/dev-controller/boundary-dev", "specversion": "1.0", "type": "observation", "data": { "latency-ms": 192.433153, "request_info": { "id": "gtraceid_MUGDhzXB7BNaGYcnzCTp", "method": "POST", "path": "/v1/auth-methods/ampw_1234567890:authenticate", "client_ip": "127.0.0.1" }, "start": "2022-04-19T16:12:19.947235-06:00", "status": 200, "stop": "2022-04-19T16:12:20.139673-06:00", "version": "v0.1" }, "datacontentype": "text/plain", "time": "2022-04-19T16:12:20.139694-06:00"}
^C==> Boundary dev environment shutdown triggered, interrupt again to force
==> Health is enabled, waiting 0s before shutdown
{ "id": "EEe0SywUPE", "source": "https://hashicorp.com/boundary/dev-controller/boundary-dev", "specversion": "1.0", "type": "system", "data": { "version": "v0.1", "op": "worker.(Worker).startStatusTicking", "data": { "msg": "status ticking shutting down" } }, "datacontentype": "text/plain", "time": "2022-04-19T16:32:13.94708-06:00"}
...... More output ......
Get set up: Docker Compose
The demo environment provided for this tutorial includes a Docker Compose cluster that deploys these containers:
- A Boundary 0.8.0 controller server
- A Boundary database
- 1 worker instance
- 1 postgres database target
- Prometheus
- Grafana
The Terraform Boundary Provider is also used in this tutorial to easily
provision resources in the Docker deployment, so Terraform must be available
in your PATH when deploying the demo environment.
To learn more about the various Boundary components, refer back to the Start a Development Environment tutorial.
Deploy the lab environment
The lab environment can be downloaded or cloned from the learn-boundary-prometheus-metrics GitHub repository.

In your terminal, clone the repository to get the example files locally:
$ git clone git@github.com:hashicorp-education/learn-boundary-prometheus-metrics.git
Move into the learn-boundary-prometheus-metrics folder.

$ cd learn-boundary-prometheus-metrics
Ensure that you are in the correct directory by listing its contents.
$ ls -R1
README.md
compose
deploy
terraform

./compose:
controller.hcl
datasource.yml
docker-compose.yml
prometheus.yml
worker.hcl

./terraform:
main.tf
outputs.tf
versions.tf
The repository contains the following files:
- deploy: A script used to deploy and tear down the Docker Compose configuration.
- compose/docker-compose.yml: The Docker Compose configuration file describing how to provision and network the Boundary cluster.
- compose/controller.hcl: The controller configuration file.
- compose/worker.hcl: The worker configuration file.
- compose/prometheus.yml: The Prometheus configuration file.
- terraform/main.tf: The Terraform provisioning instructions using the Boundary provider.
- terraform/outputs.tf: The Terraform outputs file for printing user connection details.
This tutorial makes it easy to launch the test environment with the deploy script.

$ ./deploy all
~/learn-boundary-prometheus-metrics-dev/compose ~/learn-boundary-prometheus-metrics-dev
Creating boundary_postgres_1   ... done
Creating boundary_prometheus_1 ... done
Creating boundary_db_1         ... done
Creating boundary_grafana_1    ... done
Creating boundary_db-init_1    ... done
Creating boundary_controller_1 ... done
Creating boundary_worker_1     ... done
~/learn-boundary-prometheus-metrics-dev
~/learn-boundary-prometheus-metrics-dev/terraform ~/learn-boundary-prometheus-metrics-dev

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/boundary from the dependency lock file
- Using previously-installed hashicorp/boundary v1.0.5

Terraform has been successfully initialized!

...... truncated output ......

Plan: 18 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + username = {
      + user1 = {
          + auth_method_id = (known after apply)
          + description    = "User account for user1"
          + id             = (known after apply)
          + login_name     = "user1"
          + name           = "user1"
          + password       = "password"
          + type           = "password"
        }
    }
boundary_scope.global: Creating...
boundary_scope.global: Creation complete after 1s [id=global]
boundary_scope.org: Creating...
boundary_role.global_anon_listing: Creating...
boundary_scope.org: Creation complete after 0s [id=o_fEgxPvWKif]
boundary_auth_method.password: Creating...
boundary_scope.project: Creating...
boundary_role.org_anon_listing: Creating...
boundary_auth_method.password: Creation complete after 0s [id=ampw_7tKwNDUNse]
boundary_account.user["user1"]: Creating...
boundary_scope.project: Creation complete after 0s [id=p_RzPqkBmIkV]
boundary_host_catalog.databases: Creating...
boundary_account.user["user1"]: Creation complete after 1s [id=acctpw_bKVv9ty2n3]
boundary_user.user["user1"]: Creating...
boundary_host_catalog.databases: Creation complete after 1s [id=hcst_lweuQ5p4ts]
boundary_host.postgres: Creating...
boundary_host.localhost: Creating...
boundary_role.global_anon_listing: Creation complete after 2s [id=r_9oRueUGB2k]
boundary_host.localhost: Creation complete after 1s [id=hst_DRS2Xk6qn2]
boundary_host_set.local: Creating...
boundary_host.postgres: Creation complete after 1s [id=hst_I3f2FL6zSH]
boundary_host_set.postgres: Creating...
boundary_user.user["user1"]: Creation complete after 2s [id=u_191M9UTSVT]
boundary_role.proj_admin: Creating...
boundary_role.org_admin: Creating...
boundary_host_set.local: Creation complete after 2s [id=hsst_rFHhQkPlU0]
boundary_target.db: Creating...
boundary_target.ssh: Creating...
boundary_host_set.postgres: Creation complete after 2s [id=hsst_6Avhk65GDr]
boundary_target.postgres: Creating...
boundary_role.org_anon_listing: Creation complete after 4s [id=r_2tccCCSFRc]
boundary_target.ssh: Creation complete after 2s [id=ttcp_BTpkLLLAFq]
boundary_target.db: Creation complete after 2s [id=ttcp_WN873v1rD2]
boundary_target.postgres: Creation complete after 2s [id=ttcp_wcuAUCNNRs]
boundary_role.proj_admin: Creation complete after 4s [id=r_QjVLNWtd6p]
boundary_role.org_admin: Creation complete after 4s [id=r_Lg6irpogWJ]
╷
│ Warning: Argument is deprecated
│
│   with boundary_account.user,
│   on main.tf line 69, in resource "boundary_account" "user":
│   69: login_name = lower(each.key)
│
│ Will be removed in favor of using attributes parameter
│
│ (and 14 more similar warnings elsewhere)
╵

Apply complete! Resources: 18 added, 0 changed, 0 destroyed.

Outputs:

username = {
  "user1" = {
    "auth_method_id" = "ampw_7tKwNDUNse"
    "description" = "User account for user1"
    "id" = "acctpw_bKVv9ty2n3"
    "login_name" = "user1"
    "name" = "user1"
    "password" = "password"
    "type" = "password"
  }
}
Any resource deprecation warnings in the output can safely be ignored.
The Boundary user login details are printed in the shell output, and can also
be viewed by inspecting the terraform/terraform.tfstate file. You will need
the user1 auth_method_id to authenticate via the CLI and establish sessions
later on.

You can tear down the environment at any time by executing ./deploy cleanup.

To verify that the environment deployed correctly, print the running Docker
containers and notice the ones named with the prefix "boundary".
$ docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Image}}"
CONTAINER ID   NAMES                   IMAGE
ce337fdfcaeb   boundary_worker_1       hashicorp/boundary:latest-0c8dd578e
36448e4b32db   boundary_controller_1   hashicorp/boundary:latest-0c8dd578e
7819c107cfbf   boundary_grafana_1      grafana/grafana
fa44393130fc   boundary_prometheus_1   prom/prometheus
782561cf257d   boundary_postgres_1     postgres
da9187b717ec   boundary_db_1           postgres
The next part of this tutorial focuses on the relationship between the controller, worker, the Prometheus metrics server, and Grafana.
Enable metrics
Both the controller and worker instances can be configured to report metrics at
their /metrics
endpoint.
To enable metrics, a tcp
listener with the "ops"
purpose must be defined in
the server configuration file:
listener "tcp" { purpose = "ops" tls_disable = true}
This example is the minimum needed to enable metrics. To expose the /metrics
endpoint to Prometheus, a port should be specified in the listener configuration
as well.
Open the controller configuration file, compose/controller.hcl
. Uncomment
lines 26 - 30.
compose/controller.hcl
disable_mlock = true

controller {
  name        = "docker-controller"
  description = "A controller for a docker demo!"
  address     = "boundary"
  database {
    url = "env://BOUNDARY_PG_URL"
  }
}

listener "tcp" {
  address              = "0.0.0.0:9200"
  purpose              = "api"
  tls_disable          = true
  cors_enabled         = true
  cors_allowed_origins = ["*"]
}

listener "tcp" {
  address     = "boundary:9201"
  purpose     = "cluster"
  tls_disable = true
}

listener "tcp" {
  address     = "0.0.0.0:9203"
  purpose     = "ops"
  tls_disable = true
}

kms "aead" {
  purpose   = "root"
  aead_type = "aes-gcm"
  key       = "sP1fnF5Xz85RrXyELHFeZg9Ad2qt4Z4bgNHVGtD6ung="
  key_id    = "global_root"
}

kms "aead" {
  purpose   = "worker-auth"
  aead_type = "aes-gcm"
  key       = "8fZBjCUfN0TzjEGLQldGY4+iE9AkOvCfjh7+p0GtRBQ="
  key_id    = "global_worker-auth"
}

kms "aead" {
  purpose   = "recovery"
  aead_type = "aes-gcm"
  key       = "8fZBjCUfN0TzjEGLQldGY4+iE9AkOvCfjh7+p0GtRBQ="
  key_id    = "global_recovery"
}
Adding this listener block to a Boundary server will enable metrics collection
at the address 0.0.0.0:9203
. Save this file.
In the compose/docker-compose.yml
file, notice that the controller
container
maps port 9203
to the localhost's 9203
.
compose/docker-compose.yml
controller:
  image: hashicorp/boundary:latest-0c8dd578e
  entrypoint: sh -c "sleep 3 && exec boundary server -config /boundary/controller.hcl -log-level debug"
  volumes:
    - "${PWD}/:/boundary/"
  hostname: boundary
  ports:
    - "9200:9200"
    - "9201:9201"
    - "9203:9203"
  environment:
    - HOSTNAME=boundary
    - BOUNDARY_PG_URL=postgresql://postgres:postgres@db/boundary?sslmode=disable
  depends_on:
    - db-init
    - prometheus
  networks:
    - default
    - worker
With this configuration in place, restart the controller.
$ docker restart boundary_controller_1
boundary_controller_1
Note
Depending on the OS and Docker installation method, Compose may
name the containers differently. If the controller name does not match, list the
running containers with docker ps
and restart the controller using the listed
name (such as boundary-controller_1
).
Open your web browser, and visit http://localhost:9203/metrics
. You will find
a list of controller metrics, similar to what was displayed at this endpoint
when running boundary dev
earlier.
Unlike in dev mode, only the controller-related metrics are shown, including:
- boundary_controller_api_http_request_duration_seconds
- boundary_controller_api_http_request_size_bytes
- boundary_controller_api_http_response_size_bytes
- boundary_controller_cluster_grpc_request_duration_seconds
and other Go-related and generic Prometheus metrics.
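To double-check the same endpoint from the terminal, you can list the unique controller metric names (a quick sketch, assuming curl, grep, and sort are available):

$ curl -s http://localhost:9203/metrics | grep -o '^boundary_controller_[a-z_]*' | sort -u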
Next, follow the same procedure to enable metrics on the worker.
Open the worker configuration file, compose/worker.hcl
. Uncomment lines 11 -
15.
compose/worker.hcl
// compose/worker.hcl
disable_mlock = true

listener "tcp" {
  address     = "worker"
  purpose     = "proxy"
  tls_disable = true
}

listener "tcp" {
  address     = "0.0.0.0:9203"
  purpose     = "ops"
  tls_disable = true
}

worker {
  name        = "worker"
  description = "A worker for a docker demo"
  address     = "worker"
  public_addr = "127.0.0.1:9202"
  controllers = ["boundary:9201"]
}

kms "aead" {
  purpose   = "worker-auth"
  aead_type = "aes-gcm"
  key       = "8fZBjCUfN0TzjEGLQldGY4+iE9AkOvCfjh7+p0GtRBQ="
  key_id    = "global_worker-auth"
}
Adding this listener block to a Boundary server will enable metrics collection
at the address 0.0.0.0:9203
. Save this file.
In the compose/docker-compose.yml
file, notice that the worker
container
maps port 9203
to the localhost's 9204
. This is to prevent a port collision
on 9203
, where the controller metrics are being reported.
compose/docker-compose.yml
worker:
  image: hashicorp/boundary:latest-0c8dd578e
  command: ["server", "-config", "/boundary/worker.hcl", "-log-level", "debug"]
  volumes:
    - "${PWD}/:/boundary/"
  hostname: worker
  ports:
    - "9202:9202"
    - "9204:9203"
  environment:
    - HOSTNAME=worker
  depends_on:
    - controller
  networks:
    - default
    - worker
With this configuration in place, restart the worker.
$ docker restart boundary_worker_1
boundary_worker_1
Note
Depending on the OS and Docker installation method, Compose may
name the containers differently. If the worker name does not match, list the
running containers with docker ps
and restart the worker using the listed
name (such as boundary-worker_1
).
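You can also confirm the mapping from the host side with docker port (again assuming the default container name):

$ docker port boundary_worker_1
9202/tcp -> 0.0.0.0:9202
9203/tcp -> 0.0.0.0:9204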
Open your web browser, and visit http://localhost:9204/metrics
. You will find
a list of worker metrics. Unlike the controller, the metrics reported for the
worker mostly contain:
- boundary_cluster_client_grpc_request_duration_seconds
- boundary_worker_proxy_http_write_header_duration_seconds
- boundary_worker_proxy_websocket_active_connections
- boundary_worker_proxy_websocket_sent_bytes
and other Go-related and generic Prometheus metrics.
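The same terminal check used for the controller works here against the forwarded port (assuming curl and grep are available):

$ curl -s http://localhost:9204/metrics | grep "^boundary_worker" | head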
Configure Prometheus
With metrics enabled on the controller and worker servers, Prometheus is ready to be configured.
Open the compose/prometheus.yml
configuration file. This tutorial pre-defines
the Prometheus jobs under the scrape_configs
section:
compose/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "controller"
    static_configs:
      - targets: ["boundary:9203"]
  - job_name: "worker"
    static_configs:
      - targets: ["worker:9203"]
Prometheus collects metrics about itself on localhost:9090
.
For the controller job, Prometheus scrapes the boundary host at port 9203.
Similarly, the worker job scrapes the worker host at port 9203.
Remember that the Prometheus container runs within the Docker Compose
deployment, so it scrapes each server's exposed ops port directly, not the
forwarded port on your machine's localhost.
This configuration is already correct. Open your web browser and navigate to
http://localhost:9090
to view the Prometheus dashboard.
This dashboard is the same one you used earlier, when running Prometheus
locally against boundary dev. Check that various boundary_controller_ and
boundary_worker_ metrics are available within the Expression search box.

Execute a query for the controller and worker to ensure metrics are properly
enabled.
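You can also confirm that Prometheus considers all three scrape targets healthy by querying its targets API (a sketch, assuming jq is installed for readability):

$ curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

Each of the prometheus, controller, and worker jobs should report "health": "up".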
With Prometheus successfully reporting metrics, a more robust visualization tool can be used to explore Boundary metrics.
Configure Grafana
Grafana is an observability tool commonly integrated with Prometheus. The tool enables a more robust view of metrics than can be easily created using Prometheus alone.
This tutorial uses the Grafana Docker
image, but Grafana
Cloud can also be integrated with minimal
additional configuration using a remote_write
block added to the
compose/prometheus.yml
file.
Examine the Grafana configuration on lines 85 - 95 in the
compose/docker-compose.yml
file.
compose/docker-compose.yml
grafana:
  image: grafana/grafana
  volumes:
    - "${PWD}/datasource.yml:/etc/grafana/provisioning/datasources/prometheus_datasource.yml"
  hostname: grafana
  ports:
    - "3000:3000"
  depends_on:
    - prometheus
  networks:
    - default
The compose/datasource.yml file is mounted into the grafana container at
/etc/grafana/provisioning/datasources/, the default config directory for
Grafana. When Grafana starts, it automatically loads this file.
The grafana
container is mapped to port 3000
on your localhost.
Open compose/datasource.yml
.
compose/datasource.yml
# config file version
apiVersion: 1

datasources:
- name: boundary
  type: prometheus
  access: server
  orgId: 1
  url: http://prometheus:9090
  password:
  user:
  database:
  basicAuth:
  basicAuthUser:
  basicAuthPassword:
  withCredentials:
  isDefault:
  jsonData:
    graphiteVersion: "1.1"
    tlsAuth: false
    tlsAuthWithCACert: false
  secureJsonData:
    tlsCACert: ""
    tlsClientCert: ""
    tlsClientKey: ""
  version: 1
  editable: true
This basic Grafana configuration specifies a datasource named boundary
, of
type prometheus
. The url: http://prometheus:9090
specifies the default
location where Prometheus is running, and serves as the source of metrics for
Grafana.
This configuration is already correct. Open your web browser and navigate to
http://localhost:3000
to view the Grafana dashboard.
Log in using the default Grafana credentials:

- Email or username: admin
- Password: admin
Skip the prompt to create a new password to continue to the dashboard.
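Optionally, the provisioned data source can also be verified from the terminal using Grafana's HTTP API with the same default credentials (assuming jq is installed):

$ curl -s -u admin:admin http://localhost:3000/api/datasources | jq '.[].name'
"boundary"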
Examine graph metrics
From the Grafana Home, open the settings menu and select Data Sources.
Within the Configuration menu, boundary should be listed under Data
sources. If there are no data sources, check that no changes were made to the
compose/datasource.yml file, where this config is sourced from.
Click on the boundary data source to view its settings.
Open the Dashboards page.
Click Import for the Prometheus 2.0 Stats and Grafana metrics dashboards.
After importing, select Search dashboards from the left-hand navigation.
Open the Prometheus 2.0 Stats dashboard.
This standard Prometheus dashboard shows generic metrics, such as a Memory Profile. Explore these metrics in more detail by clicking on any panel and selecting View.
A simple dashboard compiling the controller and worker metrics has been provided
for this tutorial in the
learn-boundary-prometheus-metrics-dev/compose/Boundary-Overview-Dashboard.json
file.
To load this dashboard, open the Create menu from the sidebar and select Import.
Next, click Upload JSON file and select the
learn-boundary-prometheus-metrics-dev/compose/Boundary-Overview-Dashboard.json
file.
Click on Select a Prometheus data source and select boundary. When finished, click Import.
You will be redirected to the imported Boundary Overview Dashboard. The relevant Boundary metrics have been organized into simple panels that overlay common renderings of the same metric value, such as the panel that combines:

- boundary_controller_api_http_request_duration_seconds
- boundary_controller_api_http_request_duration_seconds_bucket
- boundary_controller_api_http_request_duration_seconds_count
- boundary_controller_api_http_request_duration_seconds_sum
Note
There are many ways to organize these metrics values. The provided sample dashboard is not intended to recommend any best practices on metrics visualization or monitoring in general.
Return to a terminal session and proceed to make various queries and requests to
Boundary. You will need the user1
auth_method_id
from deploying the lab
environment to authenticate. The user1
password is
password
.
$ boundary authenticate password -auth-method-id ampw_3c0u8u2wsv -login-name user1
Please enter the password (it will be hidden): <password>

Authentication information:
  Account ID:      acctpw_vuC8ttaT7q
  Auth Method ID:  ampw_3c0u8u2wsv
  Expiration Time: Thu, 28 Apr 2022 19:03:48 MDT
  User ID:         u_lLN5ajmjjR

The token was successfully stored in the chosen keyring and is not displayed here.
Next, examine the Boundary deployment by querying for various resources, such as listing targets and reading target details.
$ boundary targets list -recursive

Target information:
  ID:                  ttcp_OVdplA2NOD
    Scope ID:          p_4zconn633N
    Version:           2
    Type:              tcp
    Name:              postgres
    Description:       postgres server
    Authorized Actions: no-op read update delete add-host-sources set-host-sources remove-host-sources add-credential-libraries set-credential-libraries remove-credential-libraries add-credential-sources set-credential-sources remove-credential-sources authorize-session

  ID:                  ttcp_4AvJDpwe5C
    Scope ID:          p_4zconn633N
    Version:           2
    Type:              tcp
    Name:              ssh
    Description:       SSH server
    Authorized Actions: no-op read update delete add-host-sources set-host-sources remove-host-sources add-credential-libraries set-credential-libraries remove-credential-libraries add-credential-sources set-credential-sources remove-credential-sources authorize-session

  ID:                  ttcp_LBkMY9JVbI
    Scope ID:          p_4zconn633N
    Version:           2
    Type:              tcp
    Name:              boundary-db
    Description:       Boundary Postgres server
    Authorized Actions: no-op read update delete add-host-sources set-host-sources remove-host-sources add-credential-libraries set-credential-libraries remove-credential-libraries add-credential-sources set-credential-sources remove-credential-sources authorize-session
Additionally, you can log into the postgres
target to establish an active
session. The postgres user's password is password
.
$ boundary connect postgres -target-name postgres -target-scope-name databases -username postgres
Password for user postgres:
psql (14.2, server 13.2 (Debian 13.2-1.pgdg100+1))
Type "help" for help.

postgres=#
Note that the postgres target sessions are automatically cancelled after 5
minutes, as defined in terraform/main.tf
. To exit the postgres session, enter
\q
into the postgres=#
prompt.
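To give the dashboard a steadier stream of datapoints, you can loop a read-only request against the API (a sketch, assuming you are still authenticated as user1; stop the loop with ctrl+c):

$ while true; do boundary targets list -recursive > /dev/null; sleep 5; done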
After interacting with Boundary, return to the Grafana dashboard. You will notice various metrics being reported and graphed over time as more data becomes available.
Examine health check endpoint
Boundary 0.8 also introduces a health check endpoint for the controller.
Like metrics, the health endpoint is enabled when a listener with the "ops"
purpose is defined, by default on port 9203
. This configuration enables the
/health
endpoint where the controller's overall status can be monitored.
Health checks are critical for load-balanced Boundary deployments, and situations where a shutdown grace period is needed.
The new controller health service introduces a single read-only endpoint:
| Status | Description |
| --- | --- |
| 200 | GET /health returns HTTP status 200 OK if the controller's API gRPC server is up |
| 5xx | GET /health returns HTTP status 5XX or request timeout if unhealthy |
| 503 | GET /health returns HTTP status 503 Service Unavailable if the controller is shutting down |
All responses return empty bodies. GET /health
does not support any input.
Querying this in a terminal session using curl
, Invoke-WebRequest
or wget
returns a 200
response when the controller is healthy.
$ curl -i http://localhost:9203/health
HTTP/1.1 200 OK
Cache-Control: no-store
Content-Type: application/json
Grpc-Metadata-Content-Type: application/grpc
Date: Fri, 22 Apr 2022 02:10:02 GMT
Content-Length: 2
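The body carries no information, so the HTTP status code is the signal. A minimal readiness-style poll, such as might gate a load balancer registration or a deploy script, could look like the following sketch (assuming a POSIX shell and curl; curl -f exits non-zero on 5xx responses, including 503 during shutdown):

#!/bin/sh
# Wait until the controller health endpoint returns 200 OK.
until curl -fsS -o /dev/null http://localhost:9203/health; do
  echo "controller not healthy yet, retrying..."
  sleep 2
done
echo "controller is healthy"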
Cleanup and teardown
The Boundary cluster containers and network resources can be cleaned up
using the provided deploy
script.
$ ./deploy cleanup
~/learn-boundary-prometheus-metrics-dev/compose ~/learn-boundary-prometheus-metrics-dev
Stopping boundary_worker_1     ... done
Stopping boundary_controller_1 ... done
Stopping boundary_grafana_1    ... done
Stopping boundary_prometheus_1 ... done
Stopping boundary_db_1         ... done
Stopping boundary_postgres_1   ... done
Going to remove boundary_worker_1, boundary_controller_1, boundary_db-init_1, boundary_grafana_1, boundary_prometheus_1, boundary_db_1, boundary_postgres_1
Removing boundary_worker_1     ... done
Removing boundary_controller_1 ... done
Removing boundary_db-init_1    ... done
Removing boundary_grafana_1    ... done
Removing boundary_prometheus_1 ... done
Removing boundary_db_1         ... done
Removing boundary_postgres_1   ... done
~/learn-boundary-prometheus-metrics-dev
~/learn-boundary-prometheus-metrics-dev/terraform ~/learn-boundary-prometheus-metrics-dev
~/learn-boundary-prometheus-metrics-dev
Check your work with a quick docker ps
and ensure there are no more containers
with the boundary_
prefix leftover. If unexpected containers still exist,
execute docker rm -f CONTAINER_NAME
against each to remove them.