Nomad APM Plugin
The Nomad APM plugin allows querying the Nomad API for metric data. This provides an immediate starting point without addition applications but comes at the price of efficiency and flexibility. When using this APM, it is advised to monitor Nomad carefully ensuring it is not put under excessive load pressure.
Since Nomad does not store historical metric data and only reports instant
metrics values, the Nomad APM plugin does not support the query_window
policy option and is not able to aggregate metrics over time, such as computing
averages or percentiles. An external APM is required to support uses cases like
these.
The Nomad APM plugin should only be used when scaling based on CPU and memory usage. For more advanced scenarios, such as scaling a cluster to zero clients, you should use a different APM plugin.
Agent Configuration Options
apm "nomad-apm" { driver = "nomad-apm"}
When using a Nomad cluster with ACLs enabled, following ACL policy will provide the appropriate permissions for obtaining task group metrics:
namespace "default" { policy = "read" capabilities = ["read-job"]}
In order to obtain cluster level metrics, the following ACL policy will be required:
node { policy = "read"} namespace "default" { policy = "read" capabilities = ["read-job"]}
Policy Configuration Options - Task Groups
The Nomad APM allows querying Nomad to understand the current resource usage of a task group.
check { source = "nomad-apm" query = "avg_cpu" # ...}
Querying Nomad task group metrics is be done using the <operation>_<metric>
syntax, where valid operations are:
avg
- returns the average of the metric value across allocations in the task group.min
- returns the lowest metric value among the allocations in the task group.max
- returns the highest metric value among the allocations in the task group.sum
- returns the sum of all the metric values for the allocations in the task group.
The metric value can be:
cpu
- CPU usage as reported by thenomad.client.allocs.cpu.total_percent
metric.cpu-allocated
- the percentage of CPU used out of the total CPU allocated for the allocation.memory
- Memory usage as reported by thenomad.client.allocs.memory.usage
metric.memory-allocated
- the percentage of memory used out of the total memory allocated for the allocation.
Policy Configuration Options - Client Nodes
The Nomad APM allows querying Nomad to understand the current allocated resource as a percentage of the total available.
Note: When using the Nomad APM plugin for cluster scaling, your policy target
and
all Nomad clients intended to be targeted by the policy must have a
node_class
defined. Nodes without node_class
are evaluated using the
default class value autoscaler-default-pool
.
policy { # ... check { source = "nomad-apm" query = "percentage-allocated_cpu" # ... } target "..." { # ... node_class = "autoscale" # .. }}
Querying Nomad client node metrics is be done using the <operation>_<metric>
syntax, where valid operations are:
percentage-allocated
- returns the allocated percentage of the desired resource.
The metric value can be: