Nomad clusters on the cloud
The Get Started guide describes how to deploy a Nomad environment with minimal infrastructure configuration. It also allows you to quickly develop, test, deploy, and iterate on your application.
When you are ready to move from your local machine, these tutorials guide you through deploying a Nomad cluster with access control lists (ACLs) enabled on the three major cloud platforms: AWS, GCP, and Azure. This gives you the flexibility to leverage all of the features available in Nomad such as CSI volumes, service discovery integration, and job constraints.
The code and configuration files for each cloud provider are in their own directory in the example repository. This tutorial covers the contents of the repository at a high level, focusing on how the Nomad cluster is configured. The tutorials that follow then guide you through deploying and provisioning a Nomad cluster on the cloud platform of your choice.
Cluster overview
The cluster design follows the best practices outlined in the reference architecture: a three-server setup for high availability, Consul for automatic clustering and service discovery, and low network latency between the nodes.
Nomad's ACL system is enabled to control access to data and APIs. The default client token receives only minimal permissions and no administrative rights. This client token is generated during cluster setup and given to the user for interacting with Nomad, instead of the management token.
Finally, the security group setup allows unrestricted communication between the nodes of the cluster and limits external ingress to the UI and SSH ports, as described in the extensibility notes.
Review repository contents
The root level of the repository contains a directory for each cloud and a shared directory that contains configuration files common to all of the clouds.
Explore the shared/config directory
The shared/config directory contains configuration files for starting the Nomad and Consul agents as well as the policy files for configuring ACLs.
Nomad files
nomad-acl-user.hcl is the Nomad ACL policy file that gives the user token the permissions to read and submit jobs.
nomad.hcl and nomad_client.hcl are the Nomad agent startup files for the server and client nodes, respectively. They are used to configure the Nomad agent started by the nomad.service file via systemd. The agent files contain capitalized placeholder strings that are replaced with actual values during the provisioning process.
shared/config/nomad.hcl
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

# Enable the server
server {
  enabled = true
  bootstrap_expect = SERVER_COUNT
}

consul {
  address = "127.0.0.1:8500"
  token = "CONSUL_TOKEN"
}

acl {
  enabled = true
}

## ...
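The nomad-acl-user.hcl policy only takes effect once it is registered with Nomad and attached to a token. As a minimal sketch, assuming NOMAD_ADDR points at a server and NOMAD_TOKEN holds a management token, the Nomad CLI could register the policy and create a client token bound to it. The function name and the policy and token names (nomad-user, nomad-user-token) are hypothetical and not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: register the policy in nomad-acl-user.hcl and create a
# client token that carries only that policy. Assumes NOMAD_ADDR points at a
# Nomad server and NOMAD_TOKEN holds a management token. The policy and token
# names are illustrative, not taken from the repository.

create_user_token() {
  local policy_file="$1"          # path to nomad-acl-user.hcl
  local policy_name="nomad-user"  # hypothetical policy name

  # Upload (or update) the policy under a name.
  nomad acl policy apply \
    -description "Read and submit jobs" \
    "$policy_name" "$policy_file" || return 1

  # A "client" token is limited to the policies attached to it,
  # unlike a "management" token, which has full access.
  nomad acl token create \
    -name "nomad-user-token" \
    -type client \
    -policy "$policy_name"
}
```

The token create command prints the new token's accessor and secret IDs; the secret ID is the value a user would set as NOMAD_TOKEN for day-to-day work.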
Consul files
consul-acl-nomad-auto-join.hcl is the Consul ACL policy file that gives the Nomad agent token the necessary permissions to automatically join the Consul cluster during startup.
consul-template.hcl and consul-template.service are used to configure and start the Consul Template service.
consul.hcl and consul_client.hcl are the Consul agent startup files for the server and client nodes, respectively. They are used to configure the Consul agent started by the consul_aws.service, consul_gce.service, or consul_azure.service files via systemd, depending on the cloud platform. Like the Nomad agent files, these also contain capitalized placeholder strings that are replaced with actual values during the provisioning process.
shared/config/consul.hcl
data_dir = "/opt/consul/data"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
advertise_addr = "IP_ADDRESS"

bootstrap_expect = SERVER_COUNT

acl {
  enabled = true
  default_policy = "deny"
  down_policy = "extend-cache"
}

log_level = "INFO"

server = true
ui = true
retry_join = ["RETRY_JOIN"]

## ...
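Because both the Nomad and Consul agent files rely on placeholders, a quick check for leftovers can help when an agent refuses to start after provisioning. The sketch below is hypothetical: the function name is illustrative, and the placeholder list is only the set visible in the nomad.hcl and consul.hcl excerpts.

```shell
#!/usr/bin/env bash
# Sketch: list the capitalized placeholders that are still present in an
# agent file after provisioning. The list below is collected from the
# nomad.hcl and consul.hcl excerpts; add any others your templates use.
PLACEHOLDERS="SERVER_COUNT CONSUL_TOKEN IP_ADDRESS RETRY_JOIN"

unreplaced_placeholders() {
  local file="$1" found="" p
  for p in $PLACEHOLDERS; do
    # -w matches whole words only, so NOMAD_CONSUL_TOKEN would not count
    # as an occurrence of CONSUL_TOKEN.
    if grep -qw "$p" "$file"; then
      found="$found $p"
    fi
  done
  echo "${found# }"   # space-separated leftovers; empty when fully rendered
  [ -z "$found" ]     # exit status 0 only if nothing is left
}
```

For example, running unreplaced_placeholders /etc/nomad.d/nomad.hcl on a node prints nothing and succeeds once every placeholder in that file has been filled in.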
Explore the shared/scripts directory
The shared/scripts directory contains scripts for installing, configuring, and starting Nomad and Consul on the deployed infrastructure.
setup.sh downloads and installs Nomad, Consul, Consul Template, and their dependencies.
server.sh and client.sh replace the capitalized placeholder strings in the server and client agent startup files with actual values, copy the systemd service files to the correct location and start them, and configure Docker networking.
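The placeholder replacement step can be sketched with sed. This is a hypothetical illustration in the spirit of what server.sh and client.sh do, not their actual code; the function names are made up. The one subtlety worth showing is escaping: values such as CIDR ranges or paths contain characters that sed would otherwise interpret.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of placeholder substitution. Function names are
# illustrative only. Values containing newlines are not handled.

# Escape a value for the replacement side of sed's s/// command:
# backslashes first, then the "/" delimiter and "&" (which means "the match").
sed_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's|[/&]|\\&|g'
}

# render_placeholder FILE PLACEHOLDER VALUE
# Replace every occurrence of PLACEHOLDER in FILE with VALUE, editing in place.
# Uses GNU sed's -i flag (no backup suffix).
render_placeholder() {
  local file="$1" placeholder="$2" value
  value=$(sed_escape "$3")
  sed -i "s/${placeholder}/${value}/g" "$file"
}
```

For example, render_placeholder consul.hcl RETRY_JOIN "provider=aws tag_key=ConsulAutoJoin tag_value=auto-join" fills in the auto-join string. Escaping the value, rather than switching to another sed delimiter, keeps the function safe for any delimiter character the value might contain.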
Explore the shared/data-scripts directory
The shared/data-scripts directory contains user-data-server.sh, which bootstraps the Consul ACLs and the Nomad ACLs and then temporarily saves the Nomad bootstrap user token in the Consul KV store. It also contains user-data-client.sh, which runs the shared/scripts/client.sh script described above and restarts Nomad.
Tip
Terraform injects the nomad_consul_token_secret value into the script during the provisioning process so that it is available to replace the CONSUL_TOKEN placeholder at runtime.
shared/data-scripts/user-data-client.sh
#!/bin/bash

set -e

exec > >(sudo tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
sudo bash /ops/shared/scripts/client.sh "${cloud_env}" '${retry_join}' "${nomad_binary}"

NOMAD_HCL_PATH="/etc/nomad.d/nomad.hcl"
CLOUD_ENV="${cloud_env}"

sed -i "s/CONSUL_TOKEN/${nomad_consul_token_secret}/g" $NOMAD_HCL_PATH

# ...
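The server-side step described for user-data-server.sh, bootstrapping Nomad's ACL system and parking the bootstrap token in Consul KV, could look roughly like the following sketch. It is not the repository's script: the KV key name is hypothetical, and it assumes a reachable Nomad server plus a CONSUL_HTTP_TOKEN that is allowed to write the key. It also assumes the human-readable output of nomad acl bootstrap contains a "Secret ID = ..." line.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: bootstrap Nomad ACLs and store the bootstrap token
# in Consul KV. Assumptions: the key name below, a reachable Nomad server,
# and CONSUL_HTTP_TOKEN set to a Consul token that may write the key.

KV_KEY="temp/nomad_bootstrap_token"   # hypothetical key name

bootstrap_and_store_token() {
  local secret
  # The human-readable output of `nomad acl bootstrap` includes a line like
  #   Secret ID    = <uuid>
  # Keep only the value after the "=".
  secret=$(nomad acl bootstrap | awk -F'= *' '/^Secret ID/ {print $2}')
  # Stop if bootstrapping failed or the output format was not recognized.
  [ -n "$secret" ] || return 1
  # "-" makes `consul kv put` read the value from stdin, which keeps the
  # secret out of the process argument list.
  printf '%s' "$secret" | consul kv put "$KV_KEY" -
}
```

Keeping the token in KV only briefly is the point of this design: a later step is expected to move it off the cluster and delete the key.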
Explore the cloud directories
The root level aws, gcp, and azure directories contain several common components that have been configured to work with a specific cloud platform.
variables.hcl.example is an example of the variables file used by both Packer and Terraform via the -var-file flag. Copy it to variables.hcl and set your own values before running either tool.
Example Packer command using -var-file
$ packer build -var-file=variables.hcl image.pkr.hcl
image.pkr.hcl is the Packer build file used to create the machine image for the cluster nodes. It also runs the shared/scripts/setup.sh script.
main.tf, outputs.tf, variables.tf, and versions.tf contain the Terraform configurations to provision the cluster.
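Because the same variables file feeds both tools, the overall workflow can be sketched as two steps that pass the same -var-file. This is a hypothetical helper, not part of the repository; setting DRY_RUN=1 prints the commands instead of running them. Between the steps you would typically record the image Packer produced in the variables file (in the AWS scenario this is presumably the ami variable, judging by var.ami in aws/main.tf; treat that as an assumption).

```shell
#!/usr/bin/env bash
# Sketch: one variables file shared by Packer and Terraform. Run from a cloud
# directory (aws, gcp, or azure). Set DRY_RUN=1 to print commands only.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: build the machine image (image.pkr.hcl also runs shared/scripts/setup.sh).
build_image() {
  run packer build -var-file="${1:-variables.hcl}" image.pkr.hcl
}

# Step 2: provision the cluster with the same variables file, after updating
# it with the image that step 1 produced (an assumption, see lead-in).
provision_cluster() {
  run terraform init
  run terraform apply -var-file="${1:-variables.hcl}"
}
```

Keeping the two steps as separate functions leaves room for the manual edit of the variables file between them.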
By default, the cluster consists of 3 server nodes and 3 client nodes and uses the Consul auto-join functionality to automatically add nodes as they start up and become available. The value for retry_join found in the consul.hcl and consul_client.hcl agent template files comes from Terraform during provisioning and differs somewhat between the three cloud platforms.
shared/config/consul_client.hcl
ui = true
log_level = "INFO"
data_dir = "/opt/consul/data"
bind_addr = "0.0.0.0"
client_addr = "0.0.0.0"
advertise_addr = "IP_ADDRESS"
retry_join = ["RETRY_JOIN"]
In each scenario, Terraform substitutes the retry_join value into the user-data-server.sh or user-data-client.sh script with the templatefile() function in main.tf.
Cloud Auto-join for AWS EC2 does not require any project-specific information, so the value is set as a default in the variables file. Consul reads the tag_key and tag_value settings as the key-value pair "ConsulAutoJoin" = "auto-join" and joins the instances that carry that tag.
aws/variables.tf
# ...

variable "retry_join" {
  description = "Used by Consul to automatically form a cluster."
  type        = string
  default     = "provider=aws tag_key=ConsulAutoJoin tag_value=auto-join"
}

# ...
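To see how the auto-join string maps to an instance tag, it helps to split it into its key=value settings. The sketch below is hypothetical (the function names are made up) and only handles the simple space-separated form used in aws/variables.tf; quoted values containing spaces are not supported.

```shell
#!/usr/bin/env bash
# Sketch: split a Cloud Auto-join string into key=value settings and show
# which instance tag Consul will search for. Only the simple
# "key=value key=value" form is handled.

retry_join_field() {
  local config="$1" wanted="$2" pair
  for pair in $config; do            # word-split on spaces
    case "$pair" in
      "$wanted"=*) echo "${pair#*=}"; return 0 ;;
    esac
  done
  return 1                           # setting not present
}

# Print the tag a node needs so that AWS auto-join can find it.
retry_join_tag() {
  local key value
  key=$(retry_join_field "$1" tag_key) || return 1
  value=$(retry_join_field "$1" tag_value) || return 1
  echo "\"$key\" = \"$value\""
}
```

Applied to the default value "provider=aws tag_key=ConsulAutoJoin tag_value=auto-join", retry_join_tag prints "ConsulAutoJoin" = "auto-join", which is exactly the tag the aws_instance resources must carry.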
A tag is set in the aws_instance resource for each server and client that matches the key-value pair in the retry_join variable.
aws/main.tf
resource "aws_instance" "server" {
  # ...

  # instance tags
  # ConsulAutoJoin is necessary for nodes to automatically join the cluster
  tags = merge(
    {
      "Name" = "${var.name}-server-${count.index}"
    },
    {
      "ConsulAutoJoin" = "auto-join"
    },
    {
      "NomadType" = "server"
    }
  )

  # ...
}
Terraform then passes the retry_join value to the startup scripts of both the server and client nodes during provisioning.
aws/main.tf
resource "aws_instance" "server" {
  # ...

  user_data = templatefile("../shared/data-scripts/user-data-server.sh", {
    server_count              = var.server_count
    region                    = var.region
    cloud_env                 = "aws"
    retry_join                = var.retry_join
    nomad_binary              = var.nomad_binary
    nomad_consul_token_id     = random_uuid.nomad_id.result
    nomad_consul_token_secret = random_uuid.nomad_token.result
  })

  # ...
}
main.tf also adds the startup scripts from shared/data-scripts to the server and client nodes during provisioning and substitutes the actual values specified in variables.hcl into those startup scripts.
post-script.sh gets the temporary Nomad bootstrap user token from the Consul KV store, saves it locally, and then deletes it from the Consul KV store.
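The retrieve, save, and delete sequence described for post-script.sh can be sketched with the Consul CLI. This is a hypothetical illustration rather than the script itself: the function name, key name, and output path are assumptions, and CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN must point at the cluster with a token that can read and delete the key.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: move the bootstrap token from Consul KV to a local
# file. Key name and output path are assumptions; CONSUL_HTTP_ADDR and
# CONSUL_HTTP_TOKEN must allow reading and deleting the key.

fetch_bootstrap_token() {
  local key="$1" out="$2" token
  # `consul kv get` prints the stored value, or fails if the key is missing.
  token=$(consul kv get "$key") || return 1
  [ -n "$token" ] || return 1
  # Create the file with owner-only permissions (umask applies to new files).
  ( umask 077 && printf '%s\n' "$token" > "$out" ) || return 1
  # Remove the secret from the shared KV store now that it is saved locally.
  consul kv delete "$key"
}
```

Deleting the key only after the local write succeeds means a failed write leaves the token recoverable from KV instead of losing it.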
Extensibility notes
The cluster setup in the following tutorials includes only the minimum configuration required for the cluster to operate.
Once setup is complete, the Consul UI is accessible on port 8500, the Nomad UI on port 4646, and SSH to each node on port 22. The security groups implementing this configuration are defined in the main.tf file at the root of each cloud's directory. They allow access only from the IP addresses in the CIDR range specified by the allowlist_ip variable of the variables.hcl file in the same directory.
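Because ingress is limited to the allowlist_ip range, a common reason the UIs on ports 8500 and 4646 time out is that your current address falls outside that range. The following sketch checks whether an IPv4 address lies inside a CIDR range. The function names are hypothetical, it handles IPv4 only, and it does no input validation.

```shell
#!/usr/bin/env bash
# Sketch: test whether an IPv4 address is inside a CIDR range such as the
# one set in allowlist_ip. IPv4 only; inputs are not validated.

ip_to_int() {
  local ip="$1" a b c d
  a=${ip%%.*}; ip=${ip#*.}
  b=${ip%%.*}; ip=${ip#*.}
  c=${ip%%.*}; d=${ip#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  local ip="$1" cidr="$2" net bits mask
  net=${cidr%/*}
  bits=${cidr#*/}
  if [ "$bits" = "$cidr" ]; then bits=32; fi   # bare address means one host
  # Netmask with the top $bits bits set; /0 matches every address.
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, ip_in_cidr 203.0.113.5 203.0.113.0/24 succeeds, while ip_in_cidr 203.0.114.5 203.0.113.0/24 fails, which would mean that address is blocked by the security groups.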
To test your applications running in the cluster, you need to create additional security group rules that allow access to the ports your applications use. Each scenario's main.tf file contains an example of how to configure these rules.
The AWS scenario contains a security group named clients_ingress where you can place your application rules.
aws/main.tf
resource "aws_security_group" "clients_ingress" {
  name   = "${var.name}-clients-ingress"
  vpc_id = data.aws_vpc.default.id

  # ...

  # Add application ingress rules here
  # These rules are applied only to the client nodes

  # nginx example
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
The aws_instance resource for the clients includes the clients_ingress security group, which attaches your application rules to the client instances.
aws/main.tf
resource "aws_instance" "client" {
  ami                    = var.ami
  instance_type          = var.client_instance_type
  key_name               = var.key_name
  vpc_security_group_ids = [
    aws_security_group.consul_nomad_ui_ingress.id,
    aws_security_group.ssh_ingress.id,
    aws_security_group.clients_ingress.id,
    aws_security_group.allow_all_internal.id
  ]
  count = var.client_count

  # ...
}
Next steps
Now that you have reviewed the cluster setup repository and learned how the cluster is configured, continue to the cluster setup tutorials for each of the major cloud platforms to provision and configure your Nomad cluster.