Express job placement preferences with affinities
The affinity stanza allows operators to express placement preferences for their jobs on particular types of nodes. Note that there is a key difference between the constraint stanza and the affinity stanza. The constraint stanza strictly filters where jobs are run based on attributes and client metadata: if no nodes are found to match, the placement does not succeed. The affinity stanza acts like a "soft constraint." Nomad will attempt to match the desired affinity, but placement will succeed even if no nodes match the desired criteria. This preference is combined with scoring from the Nomad scheduler's bin packing algorithm; see the Scheduling with Nomad reference at the end of this guide for more detail.
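To make the distinction concrete, here is a minimal sketch of the two stanzas side by side; the attribute and values are illustrative only. The constraint filters out every node that is not in dc2, while the affinity merely biases scoring toward dc2.

```hcl
# Hard requirement: nodes outside dc2 are filtered out entirely.
constraint {
  attribute = "${node.datacenter}"
  value     = "dc2"
}

# Soft preference: nodes in dc2 score higher, but other nodes stay eligible.
affinity {
  attribute = "${node.datacenter}"
  value     = "dc2"
  weight    = 100
}
```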
In this guide, you will encounter a sample application. Your application can run in datacenters dc1 and dc2; however, you have a strong preference to run it in dc2. You will learn how to configure the job to inform the scheduler of your preference, while still allowing it to place your workload in dc1 if the desired resources aren't available in dc2.
By specifying an affinity with the proper weight, the Nomad scheduler can find the best nodes on which to place your job. The affinity weight is included when scoring nodes for placement, along with other factors such as bin packing.
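The affinity stanza can be specified at the job, group, or task level, so different parts of a job can express different preferences. As a sketch only (the meta.rack metadata key is hypothetical and would have to be configured on your clients), a group-level affinity looks like this:

```hcl
group "cache1" {
  # Group-level affinity: only tasks in this group prefer nodes whose
  # (hypothetical) client metadata sets meta.rack = "r1".
  affinity {
    attribute = "${meta.rack}"
    value     = "r1"
    weight    = 50
  }

  # ...
}
```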
Prerequisites
To perform the tasks described in this guide, you need to have a Nomad environment with Consul installed. You can use this repository to provision a sandbox environment. This tutorial will assume a cluster with one server node and three client nodes.
Tip
This tutorial is for demo purposes and is only using a single server node. In a production cluster, 3 or 5 server nodes are recommended.
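Before making any changes, you can confirm that your cluster matches this layout. Assuming your shell can reach the cluster (for example, via the NOMAD_ADDR environment variable), the following commands list the server members and the client nodes:

```shell-session
$ nomad server members
$ nomad node status
```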
Place one of the client nodes in a different datacenter
You are going to express your job placement preference based on the datacenter your nodes are located in. Choose one of your client nodes and edit /etc/nomad.d/nomad.hcl to change its location to dc2. A snippet of an example configuration file with the required change is shown below.
data_dir = "/opt/nomad/data"bind_addr = "0.0.0.0"datacenter = "dc2" # Enable the clientclient { enabled = true# ...}
After making the change on your chosen client node, restart the Nomad service:
$ sudo systemctl restart nomad
If everything worked correctly, one of your nodes will now show datacenter dc2 when you run the nomad node status command.
```shell-session
$ nomad node status
ID        DC   Name              Class   Drain  Eligibility  Status
3592943e  dc1  ip-172-31-27-159  <none>  false  eligible     ready
3dea0188  dc1  ip-172-31-16-175  <none>  false  eligible     ready
6b6e9518  dc2  ip-172-31-27-25   <none>  false  eligible     ready
```
Create a job with an affinity
Create a file with the name redis.nomad.hcl and place the following content in it:
job "redis" { datacenters = ["dc1", "dc2"] type = "service" affinity { attribute = "${node.datacenter}" value = "dc2" weight = 100 } group "cache1" { count = 4 network { port "db" { to = 6379 } } service { name = "redis-cache" port = "db" check { name = "alive" type = "tcp" interval = "10s" timeout = "2s" } } task "redis" { driver = "docker" config { image = "redis:latest" ports = ["db"] } } }}
Observe that the job uses the affinity stanza and specifies dc2 as the value for the ${node.datacenter} attribute. It also uses the value 100 for the weight, which will cause the Nomad scheduler to rank nodes in datacenter dc2 with a higher score. Keep in mind that weights can range from -100 to 100, inclusive. Negative weights serve as anti-affinities, which cause Nomad to avoid placing allocations on nodes that match the criteria.
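For example, if you instead wanted to steer this workload away from dc2 whenever other nodes could satisfy it, a negative weight would express that anti-affinity. This is a sketch only and is not part of the job used in this guide:

```hcl
# Anti-affinity sketch: avoid dc2 when other nodes can run the job.
affinity {
  attribute = "${node.datacenter}"
  value     = "dc2"
  weight    = -100
}
```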
Register the redis Nomad job
Run the Nomad job with the following command:
```shell-session
$ nomad run redis.nomad.hcl
==> Monitoring evaluation "11388ef2"
    Evaluation triggered by job "redis"
    Allocation "0dfcf0ba" created: node "6b6e9518", group "cache1"
    Allocation "89a9aae9" created: node "3592943e", group "cache1"
    Allocation "9a00f742" created: node "6b6e9518", group "cache1"
    Allocation "fc0f21bc" created: node "3dea0188", group "cache1"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "11388ef2" finished with status "complete"
```
Note that two of the allocations in this example have been placed on node 6b6e9518. This is the node configured to be in datacenter dc2, and the Nomad scheduler selected it because of the affinity you specified. Not all of the allocations have been placed on this node, because the scheduler also considers other factors in the scoring, such as bin packing. This helps avoid placing too many instances of the same job on one node, which would reduce capacity during a node-level failure. You will take a detailed look at the scoring in the next few steps.
Check the status of the job
At this point, check the status of the job and verify where the allocations have been placed. Run the following command:
$ nomad status redis
There should be four instances of the job running in the Summary section of the output, as shown below:
```shell-session
...
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache1      0       0         4        0       0         0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
0dfcf0ba  6b6e9518  cache1      0        run      running  1h44m ago  1h44m ago
89a9aae9  3592943e  cache1      0        run      running  1h44m ago  1h44m ago
9a00f742  6b6e9518  cache1      0        run      running  1h44m ago  1h44m ago
fc0f21bc  3dea0188  cache1      0        run      running  1h44m ago  1h44m ago
```
You can cross-check this output with the results of the nomad node status command to verify that the majority of your workload has been placed on the node in dc2. In the case of the above output, that node is 6b6e9518.
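You can also view this from the node's side. Passing a node ID (or a unique ID prefix) to nomad node status prints that node's details along with the allocations currently placed on it; substitute the node ID from your own output.

```shell-session
$ nomad node status 6b6e9518
```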
Obtain detailed scoring information on job placement
The Nomad scheduler will not always place all of your workload on nodes you have specified in the affinity stanza, even if the resources are available. This is because affinity scoring is combined with other metrics before making a scheduling decision. In this step, you will take a look at some of those other factors.
Using the output from the previous step, find an allocation that has been placed on a node in dc2 and use the nomad alloc status command with the -verbose option to obtain detailed scoring information on it. In this example, the allocation ID to be inspected is 0dfcf0ba (your allocation IDs will be different).
$ nomad alloc status -verbose 0dfcf0ba
The resulting output will show the Placement Metrics section at the bottom.
```shell-session
...
Placement Metrics
Node                                  binpack  job-anti-affinity  node-reschedule-penalty  node-affinity  final score
6b6e9518-d2a4-82c8-af3b-6805c8cdc29c  0.33     0                  0                        1              0.665
3dea0188-ae06-ad98-64dd-a761ab2b1bf3  0.33     0                  0                        0              0.33
3592943e-67e4-461f-d888-d5842372a4d4  0.33     0                  0                        0              0.33
```
Note that the results from the binpack, job-anti-affinity, node-reschedule-penalty, and node-affinity columns are combined to produce the numbers listed in the final score column for each node. The Nomad scheduler uses the final score for each node in deciding where to make placements.
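As a rough check against the numbers above, the final score appears consistent with normalizing over the metrics that actually scored each node: for node 6b6e9518, (0.33 + 1) / 2 ≈ 0.665, while the other two nodes received only a binpack score and end up with a final score of 0.33.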
Next steps
Experiment with the weight provided in the affinity stanza (the value can be from -100 through 100) and observe how the final score given to each node changes (use the nomad alloc status command as shown in the previous step).
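As a sketch of that experiment, you could lower the weight in redis.nomad.hcl to express a weaker preference (the value 25 below is illustrative), preview the scheduler's decision with nomad job plan, which performs a dry run without changing the running job, and then re-register the job:

```hcl
# In redis.nomad.hcl: a weaker preference for dc2.
affinity {
  attribute = "${node.datacenter}"
  value     = "dc2"
  weight    = 25
}
```

```shell-session
$ nomad job plan redis.nomad.hcl
$ nomad run redis.nomad.hcl
```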
Reference material
- The affinity stanza documentation
- Scheduling with Nomad