Secure Nomad jobs with Consul service mesh
Nomad's first-class integration with Consul allows operators to design jobs that natively leverage Consul service mesh and transparent proxy. In this tutorial, you will learn how to configure Consul to allow access between workloads and run a sample Consul service mesh workload.
Prerequisites
- Nomad v1.8.0 or greater. All recent versions of Nomad support Consul service mesh, but only Nomad 1.8.0 or greater supports the transparent proxy block demonstrated here.
Create the Consul and Nomad clusters
This section uses development agents for Consul and Nomad as a quick way to get started. Development agents have ephemeral state and should not be used in production environments. They run in the foreground of your terminal, so do not close the terminal window or you will need to repeat the agent configuration steps.
This setup uses hard-coded tokens in the Consul configuration. We do not recommend this method for production clusters. Follow the Secure Consul with ACLs tutorial to configure production Consul clusters.
Create a directory for the tutorial on your local machine, change into that directory, and create a file named consul.hcl to store the Consul agent configuration. Add the following contents to it and replace the two instances of the placeholder "<< IP ADDRESS >>" with the IP address of your local machine. Save the file.
consul.hcl
datacenter = "dc1"bind_addr = "<< IP ADDRESS >>"client_addr = "0.0.0.0"recursors = ["1.1.1.1", "1.1.0.0"] addresses { dns = "<< IP ADDRESS >>"} acl { enabled = true tokens { initial_management = "consul-root-token" agent = "consul-root-token" }} ports { grpc = 8502 dns = 8600}
Start a Consul dev agent using consul.hcl as the configuration file.
$ consul agent -dev -config-file 'consul.hcl'
==> Starting Consul agent...
           Version: '1.17.0'
        Build Date: '2023-11-03 14:56:56 +0000 UTC'
           Node ID: 'f9625116-3884-cd0b-01fa-2bc2e0d9a69d'
...
Open another terminal window in the same directory and set the Consul management token as an environment variable.
$ export CONSUL_HTTP_TOKEN=consul-root-token
Create a file named consul-policy-nomad-agents.hcl to store the Consul ACL rules that grant the necessary permissions to Nomad agents. Add the following contents to it and save the file.
consul-policy-nomad-agents.hcl
agent_prefix "" { policy = "read"} node_prefix "" { policy = "read"} service_prefix "" { policy = "write"}
Create a Consul ACL policy named nomad-agents with the rules defined in the consul-policy-nomad-agents.hcl file.
$ consul acl policy create -name 'nomad-agents' -description 'Policy for Nomad agents' -rules '@consul-policy-nomad-agents.hcl'
ID:           7a0fe00b-f7e6-809c-2227-bb0638b873bd
Name:         nomad-agents
Description:  Policy for Nomad agents
Datacenters:
Rules:
agent_prefix "" {
  policy = "read"
}

node_prefix "" {
  policy = "read"
}

service_prefix "" {
  policy = "write"
}
Create a Consul ACL token for the Nomad agent using the nomad-agents ACL policy.
$ consul acl token create -policy-name 'nomad-agents'
AccessorID:       3f436657-823a-95e3-4755-79f3e1e43c8e
SecretID:         df179fd2-3211-3641-5901-a57331c14611
Description:
Local:            false
Create Time:      2023-11-15 18:23:39.572365 -0500 EST
Policies:
   a5ee20ed-7158-89be-9a19-be213d106d24 - nomad-agents
Save the value of SecretID for the Consul ACL token.
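You will paste this value into the Nomad agent configuration in a later step. If you want to keep it handy in your shell in the meantime, an approach like the following works; the variable name CONSUL_AGENT_TOKEN is only an illustration, not something the tutorial requires.

$ export CONSUL_AGENT_TOKEN=df179fd2-3211-3641-5901-a57331c14611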
Download the consul-cni CNI plugin. Unzip the archive and move the consul-cni binary to wherever you install the CNI reference plugins as described in Nomad's Post Installation Steps. A commonly used path is /opt/cni/bin.
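As a sketch, a download on a Linux amd64 host might look like the following. The release version and URL are assumptions; check the HashiCorp releases site for the current version and the build that matches your platform.

# Version, URL, and architecture are examples; adjust for your environment.
$ curl -L -o consul-cni.zip \
    https://releases.hashicorp.com/consul-cni/1.4.2/consul-cni_1.4.2_linux_amd64.zip
$ unzip consul-cni.zip
$ sudo mv consul-cni /opt/cni/bin/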
Create a file named nomad.hcl. Add the following contents to it and replace the placeholder <Consul token SecretID> text with the value of SecretID. Save the file.
nomad.hcl
acl {
  enabled = true
}

consul {
  address = "127.0.0.1:8500"
  token   = "<Consul token SecretID>"

  service_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }

  task_identity {
    aud = ["consul.io"]
    ttl = "1h"
  }
}
Open another terminal window in the same directory and start the Nomad dev agent using nomad.hcl as the configuration file.
$ sudo nomad agent -dev -dev-connect -config 'nomad.hcl'
==> Loaded configuration from nomad.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 192.168.1.170:4646; RPC: 192.168.1.170:4647; Serf: 192.168.1.170:4648
            Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: true
             Log Level: DEBUG
               Node Id: a59f4059-4453-dd16-be81-6934962435f1
                Region: global (DC: dc1)
                Server: true
               Version: 1.8.0

==> Nomad agent started! Log data will stream in below:
...
Bootstrap the Nomad ACL system.
$ nomad acl bootstrap
Accessor ID  = d1de8625-8556-0932-a25c-3aa71bfc0134
Secret ID    = 7f10099a-936c-3f3a-8783-f0980493e54b
Name         = Bootstrap Token
Type         = management
Global       = true
Create Time  = 2023-11-16 01:09:26.565422 +0000 UTC
Expiry Time  = <none>
Create Index = 23
Modify Index = 23
Policies     = n/a
Roles        = n/a
Copy the value of Secret ID and set it as the environment variable NOMAD_TOKEN.
$ export NOMAD_TOKEN=...
Use the nomad setup command to configure Consul to use Nomad workload identity. The nomad.hcl file contains the recommended configuration set by the command.
$ nomad setup consul -y
...
Verify Nomad client Consul configuration
Verify that the Nomad client nodes detect Consul and the correct configuration for Consul service mesh using nomad node status -verbose. Specifically, verify that consul.connect is true, consul.dns.addr and consul.dns.port are enabled (not set to -1), and plugins.cni.version.consul-cni shows version 1.4.2 or above.
$ nomad node status -verbose fdb32e59 | grep consul
consul.connect                 = true
consul.datacenter              = dc1
consul.dns.addr                = 192.168.1.170
consul.dns.port                = 8600
consul.ft.namespaces           = true
consul.grpc                    = 8502
consul.partition               = default
consul.revision                = d6969061
consul.sku                     = ent
consul.version                 = 1.16.0+ent
plugins.cni.version.consul-cni = v1.4.2
The Nomad client agents fingerprint the Consul agent every few minutes, so configuration changes may take time to appear. Restarting the Nomad agent forces Nomad to detect changes immediately.
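For example, if your Nomad client runs under systemd rather than as the foreground dev agent used in this tutorial, a restart might look like the following; for the dev agent, stop it with Ctrl-C and start it again with the same command. The unit name is an assumption and may differ in your environment.

$ sudo systemctl restart nomad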
Alternative architectures (non-x86/amd64)
If you are using an ARM or another non-x86/amd64 architecture, go to the Alternative Architectures section of this tutorial for additional setup details.
Run a connect-enabled job
Create the job specification
Create the "countdash" job by copying this job specification into a file named
countdash.nomad.hcl
. Save the file.
countdash.nomad.hcl
job "countdash" { group "api" { network { mode = "bridge" } service { name = "count-api" port = "9001" check { type = "http" path = "/health" expose = true interval = "3s" timeout = "1s" check_restart { limit = 0 } } connect { sidecar_service { proxy { transparent_proxy {} } } } } task "web" { driver = "docker" config { image = "hashicorpdev/counter-api:v3" auth_soft_fail = true } } } group "dashboard" { network { mode = "bridge" port "http" { static = 9002 to = 9002 } } service { name = "count-dashboard" port = "9002" check { type = "http" path = "/health" expose = true interval = "3s" timeout = "1s" check_restart { limit = 0 } } connect { sidecar_service { proxy { transparent_proxy {} } } } } task "dashboard" { driver = "docker" env { COUNTING_SERVICE_URL = "http://count-api.virtual.consul" } config { image = "hashicorpdev/counter-dashboard:v3" auth_soft_fail = true } } }}
Create a service intention
In Consul, the default intention behavior is defined by the default ACL policy. If the default ACL policy is "allow all", then all service mesh connections are allowed by default. If the default ACL policy is "deny all", then all service mesh connections are denied by default.
To avoid unexpected behavior, create an explicit intention that allows traffic from the count-dashboard service to the count-api service.
First, create a file for a config entry definition named intention-config.hcl.
intention-config.hcl
Kind = "service-intentions"Name = "count-api"Sources = [ { Name = "count-dashboard" Action = "allow" }]
Initialize the intention rules. For more information, check out the Service Intentions documentation.
$ consul config write intention-config.hcl
Config entry written: service-intentions/count-api
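Optionally, you can read the config entry back to confirm it was stored. This is an extra check, not a required tutorial step.

$ consul config read -kind service-intentions -name count-api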
Run the job
Run the job by calling nomad run countdash.nomad.hcl. The command outputs the result of running the job and shows the IDs of the two new allocations it creates.
$ nomad run countdash.nomad.hcl
==> Monitoring evaluation "3e7ebb57"
    Evaluation triggered by job "countdash"
    Evaluation within deployment: "9eaf6878"
    Allocation "012eb94f" created: node "c0e8c600", group "api"
    Allocation "02c3a696" created: node "c0e8c600", group "dashboard"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "3e7ebb57" finished with status "complete"
Get the address of the dashboard application from the allocation status output.
$ nomad alloc status 02c3a696 | grep 9002
*http  yes  10.37.105.17:9002 -> 9002
Navigating to the dashboard on port 9002 of the Nomad client host shows a green "Connected" badge.
Stop the job. The stop command stops the allocations in the background and outputs evaluation information about the stop request.
$ nomad stop countdash
==> Monitoring evaluation "d4796df1"
    Evaluation triggered by job "countdash"
    Evaluation within deployment: "18b25bb6"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "d4796df1" finished with status "complete"
Advanced considerations
Alternative architectures
Nomad provides a default link to a pause image. This image, however, is architecture specific and is only provided for the amd64 architecture. To use Consul service mesh on non-x86/amd64 hardware, you will need to configure Nomad to use a different pause container. If Nomad is trying to use a version of Envoy earlier than 1.16, you will need to specify a different version as well. Read through the section on airgapped networks below; it explains the same configuration elements that you will need to set to use alternative containers for service mesh.
Special thanks to @GusPS, who reported this working configuration.
Envoy 1.16 now has ARM64 support. Configure it as your sidecar image by setting the connect.sidecar_image meta variable on each of your ARM64 clients.
meta { "connect.sidecar_image" = "envoyproxy/envoy:v1.16.0"}
The rancher/pause container has versions for several different architectures as well. Override the default pause container and use it instead. In your client configuration, add an infra_image to your Docker plugin configuration that overrides the default with the Rancher version.
plugin "docker" { config { infra_image = "rancher/pause:3.2" }}
If you came here from the "Alternative Architectures" note above, return there now.
Airgapped networks or proxied environments
If you are in an airgapped network or need to access Docker Hub via a proxy, you will have to perform some additional configuration on your Nomad clients to enable Nomad's Consul service mesh integration.
Set the "infra_image" path
Set the infra_image configuration option for the Docker driver plugin on your Nomad clients to a path that is accessible in your environment. For example,
plugin "docker" { config { infra_image = "dockerhub.myproxy.com:8080/google_containers/pause-amd64:3.0" }}
Changing this value will require a restart of the Nomad client agent.
Set the "sidecar_image" path
You will also need the Envoy proxy image used for Consul service mesh networking. Configure the sidecar image on your Nomad clients to override the default container path by adding a "connect.sidecar_image" value to the client.meta block of your Nomad client configuration. If you do not have a meta block inside of your top-level client block, add one as follows.
client {
  # ...

  meta {
    # Set this value to a proxy or internal registry that can provide an
    # appropriate envoy image.
    "connect.sidecar_image" = "dockerhub.myproxy.com:8080/envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09"
  }

  # ...
}
Changing this value will require a restart of the Nomad client agent. Alternatively, you can set the value with the nomad node meta apply command.
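A sketch of applying the value dynamically with that command, assuming the node ID from the earlier status output and the same example registry path; adjust both for your environment.

$ nomad node meta apply -node-id fdb32e59 \
    connect.sidecar_image=dockerhub.myproxy.com:8080/envoyproxy/envoy:v1.11.2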
Next steps
Learn more about Nomad's Consul service mesh integration by checking out these resources.
- Nomad Consul Service Mesh documentation
- consul block in job specification
- connect block in job specification
- consul block in agent configuration
Learn more about Consul service mesh with these tutorials.
Learn more about Consul ACLs.