Provide fault tolerance with redundancy zones
Enterprise Only
The redundancy zone functionality demonstrated here requires HashiCorp Cloud Platform (HCP) or self-managed Consul Enterprise. If you've purchased or wish to try out Consul Enterprise, refer to how to access Consul Enterprise.
This tutorial demonstrates how you can improve your Consul datacenter's fault resiliency by using redundancy zones.
These instructions demonstrate Consul's autopilot features, which make it possible to run one voter alongside any number of non-voters in each defined redundancy zone.
During this tutorial, you will deploy two servers, one voter and one non-voter, in each of the three cloud regions, for a total of six servers. To simplify this tutorial, we refer to these groups of servers with the following names:
- group 1 contains the servers that are voters in the initial deployment state.
- group 2 contains the servers that are non-voters in the initial deployment state.
After a server in group 1 fails, autopilot promotes a non-voter from the same zone to voter status automatically. As a result, Consul servers can continue operating without an effect on server quorum. For more information about Consul server redundancy and quorum, refer to the Consul reference architecture.
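As a rough illustration of why three voters can absorb one failure, the standard Raft majority rule can be sketched with shell arithmetic. The function names `quorum` and `failure_tolerance` are hypothetical helpers for this sketch, not Consul commands.

```shell
# Majority needed in a Raft cluster with N voters: floor(N / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voters that can fail while a majority is still reachable.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

quorum 3             # prints 2
failure_tolerance 3  # prints 1
```

With one voter per zone and three zones, the datacenter keeps quorum after losing a single voter. Promoting a standby restores the voter count to three.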
The following diagrams show the Consul architecture and its changes across the course of this tutorial:
Prerequisites
The tutorial assumes that you are familiar with Consul and its core functionality. If you are new to Consul, refer to the Consul Getting Started tutorials collection.
To complete this tutorial, you need the following software:
- Consul Enterprise with a license
- An AWS account configured for use with Terraform
- git >= 2.0
- aws-cli >= 2.0
- terraform >= 1.4
- jq >= 1.6
Clone GitHub repository
Clone the GitHub repository containing the configuration files and resources.
$ git clone https://github.com/hashicorp-education/learn-consul-redundancy-zones.git
Change into the directory that contains the complete configuration files for this tutorial.
$ cd learn-consul-redundancy-zones
This repository contains Terraform configurations to spin up the initial infrastructure, as well as files to automatically configure and deploy Consul.
This tutorial's repository contains the following items:
- instance-scripts/ directory contains the bash scripts used to bootstrap and join the Consul servers running on EC2 instances
- provisioning/ directory contains Consul agent configuration file templates
- consul-instances.tf defines the EC2 instances the Consul servers run on
- outputs.tf defines Terraform outputs you use to authenticate and connect to your EC2 instances
- providers.tf contains provider definitions for Terraform
- variables.tf defines variables you can use to customize the tutorial
- vpc.tf defines the AWS VPC resources
The Terraform files provision the following billable AWS resources:
- An AWS VPC
- An AWS key pair
- An AWS EC2 instance group running Consul server agent
Set up the Consul license
Redundancy zones are a Consul Enterprise feature, meaning that servers require an Enterprise license key. If you do not have a Consul Enterprise license, you can register for a 30 day trial license.
To start the tutorial, place your Consul Enterprise license file in the repository directory before you deploy the infrastructure. The Terraform file consul-instances.tf is configured to upload the license on your behalf. Ensure the filename is consul.hclic.
$ touch consul.hclic
Deploy your infrastructure
Initialize your Terraform configuration to download the necessary providers and modules.
$ terraform init
Initializing the backend...
Initializing provider plugins...
##...
Terraform has been successfully initialized!
##...
Then create the infrastructure. When prompted, enter yes to confirm the run.
Note
This tutorial targets AWS region `us-east-2` as its default. If you want to deploy to another region, modify the `terraform.tfvars` file accordingly.
$ terraform apply
##...
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.
  Enter a value: yes
It takes a few minutes to deploy your infrastructure. After the deploy completes, it returns a list of outputs you need to complete the tutorial.
Apply complete! Resources: 25 added, 0 changed, 0 destroyed.

Outputs:

consul_group1_ips = [
  "3.76.213.176",
  "18.153.69.68",
  "3.72.40.212",
]
consul_group2_ips = [
  "3.79.240.36",
  "18.199.93.74",
  "3.121.185.195",
]
consul_token = <sensitive>
next_steps = [
  "You can now add the TLS certificate for accessing your EC2 instances by running:",
  "ssh-add ./tls-key.pem",
]
After Terraform deploys the infrastructure for this tutorial, you need to set up SSH access to the EC2 instances.
In order to log on to the instances, configure your SSH key manager agent to use the correct SSH key identity file.
$ ssh-add tls-key.pem
Identity added: tls-key.pem (tls-key.pem)
To make it easier to run remote commands on the instances, save the IP addresses of the Consul server nodes into a set of environment variables with the following command.
$ export GROUP1_SERVER0=ubuntu@$(terraform output -json 'consul_group1_ips' | jq -r '.[0]') && \
  export GROUP1_SERVER1=ubuntu@$(terraform output -json 'consul_group1_ips' | jq -r '.[1]') && \
  export GROUP1_SERVER2=ubuntu@$(terraform output -json 'consul_group1_ips' | jq -r '.[2]') && \
  export GROUP2_SERVER0=ubuntu@$(terraform output -json 'consul_group2_ips' | jq -r '.[0]') && \
  export GROUP2_SERVER1=ubuntu@$(terraform output -json 'consul_group2_ips' | jq -r '.[1]') && \
  export GROUP2_SERVER2=ubuntu@$(terraform output -json 'consul_group2_ips' | jq -r '.[2]')
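Because each variable is built by prefixing `ubuntu@` to a command substitution, a failed `terraform output` call can leave a value such as `ubuntu@` behind without an obvious error. The following is a minimal sketch of a sanity check. The function name `check_server_vars` is a hypothetical helper, and the pattern only loosely checks for an IPv4-like value.

```shell
# Hypothetical helper: confirm each SSH target looks like ubuntu@<IPv4 address>.
check_server_vars() {
  status=0
  for name in GROUP1_SERVER0 GROUP1_SERVER1 GROUP1_SERVER2 \
              GROUP2_SERVER0 GROUP2_SERVER1 GROUP2_SERVER2; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    case "$value" in
      ubuntu@[0-9]*.[0-9]*.[0-9]*.[0-9]*)
        echo "ok: $name=$value" ;;
      *)
        echo "not set correctly: $name='$value'"
        status=1 ;;
    esac
  done
  return "$status"
}
```

Run `check_server_vars` in the same shell session where you exported the variables. A non-zero exit status means at least one variable needs to be set again.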
Review Terraform configuration for server instances
Open consul-instances.tf. This Terraform configuration creates the following:
- a TLS key pair that you can use to log in to the server instances
- a couple of AWS IAM policies for the instances so they can use Consul cloud join
- two groups of EC2 instances that run Consul as servers
Each EC2 instance uses the provisioning script instance-scripts/setup.sh, which the cloud-init subsystem executes to automate the Consul agent configuration and provisioning. This script installs the Consul agent package on the instance and sets up its Consul configuration file. Terraform automatically generates the configuration file for each Consul server instance.
Inspect the consul-server-group1 resource in the consul-instances.tf file. The following excerpt is trimmed for brevity.
consul-instances.tf
resource "aws_instance" "consul-server-group1" {
  count                = 3
  ami                  = data.aws_ami.ubuntu.id
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.profile_manage_instances.name

  ##...

    setup = base64gzip(templatefile("${path.module}/instance-scripts/setup.sh", {
      hostname = "consul-group1-server${count.index}",
      consul_license = base64encode(file("${path.module}/consul.hclic")),
      consul_ca = base64encode(tls_self_signed_cert.consul_ca_cert.cert_pem),
      consul_config = base64encode(templatefile("${path.module}/provisioning/templates/consul-server.json", {
        count = count.index,
        datacenter = var.datacenter,
        token = random_uuid.consul_bootstrap_token.result,
        retry_join = "provider=aws tag_key=learn-consul-redundancy-zones tag_value=join",
      })),
      consul_acl_token = random_uuid.consul_bootstrap_token.result,
      consul_version = var.consul_version,
      vpc_cidr = module.vpc.vpc_cidr_block,
    })),
  })

  tags = {
    Name = "consul-group1-server${count.index}"
    learn-consul-redundancy-zones = "join"
  }
}
On line 2, the count directive causes Terraform to deploy three instances of this resource. Each instance's hostname is dynamically generated on line 10 by appending the instance number to the end of the consul-group1-server string.
Line 13 generates the Consul configuration file. It uses the template in provisioning/templates/consul-server.json and passes a few variables to it to enable the Consul cloud autojoin feature. Here is an example of the generated configuration for the consul-group1-server0 node. You will review this configuration in the next section of this tutorial.
{
  "acl": {
    "enabled": true,
    "down_policy": "async-cache",
    "default_policy": "deny",
    "tokens": {
      "agent": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "default": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "initial_management": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
  },
  "datacenter": "dc1",
  "retry_join": [
    "provider=aws tag_key=learn-consul-redundancy-zones tag_value=join"
  ],
  "node_meta": {
    "zone": "zone0"
  },
  "autopilot": {
    "redundancy_zone_tag": "zone"
  },
  "license_path": "/etc/consul.d/consul.hclic",
  "encrypt": "",
  "encrypt_verify_incoming": false,
  "encrypt_verify_outgoing": false,
  "server": true,
  "bootstrap_expect": 3,
  "log_level": "INFO",
  "ui_config": {
    "enabled": true
  },
  "tls": {
    "defaults": {
      "ca_file": "/etc/consul.d/ca.pem",
      "verify_outgoing": false
    }
  },
  "ports": {
    "grpc": 8502
  },
  "bind_addr": "{{ GetPrivateInterfaces | include \"network\" \"10.0.0.0/16\" | attr \"address\" }}"
}
Finally, line 27 includes the learn-consul-redundancy-zones key and its value join. This tag enables instances to identify and join other servers in a cluster.
Review configuration for redundancy zones
Inspect the configuration file on the first Consul instance in the first server group. If you get an error that there is no such file or directory, this means that the provisioning is still working on the instance. Wait for a few minutes more before you continue this tutorial.
$ ssh $GROUP1_SERVER0 "cat /etc/consul.d/client.json"
{
  "acl": {
    "enabled": true,
    "down_policy": "async-cache",
    "default_policy": "deny",
    "tokens": {
      "agent": "xxxxxxxxxxxxxxxxxx",
      "default": "xxxxxxxxxxxxxxxxxx",
      "initial_management": "xxxxxxxxxxxxxxxxxx"
    }
  },
  "datacenter": "dc1",
  "retry_join": [
    "provider=aws tag_key=learn-consul-redundancy-zones tag_value=join"
  ],
  "node_meta": {
    "zone": "zone0"
  },
  "autopilot": {
    "redundancy_zone_tag": "zone"
  },
  "license_path": "/etc/consul.d/consul.hclic",
  "encrypt": "",
  "encrypt_verify_incoming": false,
  "encrypt_verify_outgoing": false,
  "server": true,
  "bootstrap_expect": 3,
  "log_level": "INFO",
  "ui_config": {
    "enabled": true
  },
  "tls": {
    "defaults": {
      "ca_file": "/etc/consul.d/ca.pem",
      "verify_outgoing": false
    }
  },
  "ports": {
    "grpc": 8502
  },
  "bind_addr": "{{ GetPrivateInterfaces | include \"network\" \"10.0.0.0/16\" | attr \"address\" }}"
}
When you use Consul's redundancy zones functionality, every Consul server must be assigned to a zone. A zone can have only one Consul server participate as a voter, but it can include multiple non-voter Consul servers. You define the zone with tags that designate the zone name.
In the provisioning template for the Consul servers, these zones are defined and configured according to the following code blocks:
"node_meta": {
  "zone": "zone${count}"
},
"autopilot": {
  "redundancy_zone_tag": "zone"
},
The name zone is arbitrary and could be anything. If you change the name, we recommend that you use the same tag name on all Consul servers.
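To make the effect of the `zone${count}` template expression concrete, the following sketch mimics the mapping in shell. It assumes that both server groups render the same template with `count.index`, which matches the zone assignments shown later in this tutorial. The helper `zone_for_server` is hypothetical and is not part of the repository.

```shell
# Hypothetical helper: derive the zone name from a server hostname.
zone_for_server() {
  # Strip everything up to and including "server", leaving the instance index.
  echo "zone${1##*server}"
}

# Print the expected zone for each of the six servers.
for group in 1 2; do
  for index in 0 1 2; do
    name="consul-group${group}-server${index}"
    echo "$name $(zone_for_server "$name")"
  done
done
```

The loop prints one line per server. Servers with the same index in group 1 and group 2 share a zone, which is what pairs each voter with a standby.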
You can inspect the configured zone tag with a direct query to the Consul server agent on the deployed instance.
$ ssh $GROUP1_SERVER0 "consul operator autopilot get-config"
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
MinQuorum = 0
ServerStabilizationTime = 10s
RedundancyZoneTag = "zone"
DisableUpgradeMigration = false
UpgradeVersionTag = ""
You can inspect the node's tag configuration with a query to the /agent/self API endpoint of the Consul server agent on the deployed instance.
$ ssh $GROUP1_SERVER0 "curl --silent localhost:8500/v1/agent/self" | jq .Meta
{
  "consul-network-segment": "",
  "consul-version": "1.17.3",
  "zone": "zone0"
}
Tip
To change a zone tag without reloading the Consul configuration file, use the consul operator autopilot set-config -redundancy-zone-tag=<tag-name> command or the related API endpoint.
Review voting status for Consul servers
Run the consul operator command on the first Consul server from the first server group and review which nodes are voters and which ones are non-voters. Your results may be different based on which node was provisioned first. Refer to the Voter column in the output.
$ ssh $GROUP1_SERVER0 "consul operator raft list-peers"
Node                   ID                                    Address          State     Voter  RaftProtocol  Commit Index  Trails Leader By
consul-group1-server0  27f94c2a-9f12-1cfb-9357-a574919a7aa1  10.0.4.237:8300  leader    true   3             322           -
consul-group1-server1  ed563e54-3a26-aebd-9565-23d21609d22d  10.0.4.246:8300  follower  true   3             322           0 commits
consul-group1-server2  a1091fef-d90b-72d3-da61-d6bc60f2ed04  10.0.4.97:8300   follower  true   3             322           0 commits
consul-group2-server0  36747824-080a-a693-b419-ae3309de3389  10.0.4.242:8300  follower  false  3             322           0 commits
consul-group2-server1  cda2c288-c01b-0704-fcf8-336aba213b98  10.0.4.93:8300   follower  false  3             322           0 commits
consul-group2-server2  ca684698-68e7-e759-501d-d575c8cd41ec  10.0.4.186:8300  follower  false  3             322           0 commits
In this case, the voting servers are consul-group1-server0, consul-group1-server1, and consul-group1-server2.
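If you prefer a quick numeric check over reading the table, the Voter column can be counted with awk. This is a minimal sketch that assumes the column layout shown in the output above, where the fifth whitespace-separated field of each data row is the voter flag. The function name `count_voters` is a hypothetical helper.

```shell
# Hypothetical helper: count the voters in `consul operator raft list-peers` output.
count_voters() {
  # Skip the header row, then count rows whose fifth field (Voter) is "true".
  awk '$1 != "Node" && $5 == "true" { n++ } END { print n + 0 }'
}

# Usage, assuming the SSH variables defined earlier in this tutorial:
#   ssh $GROUP1_SERVER0 "consul operator raft list-peers" | count_voters
```

With redundancy zones working as intended in this deployment, the expected count is 3.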
If all six servers are voters, make sure your Consul license includes the Redundancy Zone feature set. Run the following command and inspect your license.
$ ssh $GROUP1_SERVER0 "consul license get"
License is valid
License ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Customer ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Expires At: 2030-04-27 00:00:00 +0000 UTC
Terminates At: 2035-04-27 00:00:00 +0000 UTC
Non-terminating: false
Datacenter: *
Modules:
        Global Visibility, Routing and Scale
        Governance and Policy
Licensed Features:
        Automated Backups
        Automated Upgrades
        Enhanced Read Scalability
        Network Segments
        Redundancy Zone
        Advanced Network Federation
        Namespaces
        SSO
        Audit Logging
        Admin Partitions
Test fault tolerance
In the next part of this tutorial, you will test the fault tolerance of the Consul cluster by simulating the failure of one server in a redundancy zone, then all servers in a redundancy zone. Finally, you will restart these servers and observe the results.
Stop one server in a zone
To verify that redundancy zones are configured correctly, stop one of the voters and check that the non-voter in its redundancy zone becomes a voter. Because consul-group1-server0 is currently a voter, you can terminate the Consul server agent without notice to simulate a failure.
$ ssh $GROUP1_SERVER0 "sudo systemctl --signal=SIGKILL stop consul"
Select another instance and inspect the status of the cluster. The following command runs on the second server in group 1.
$ ssh $GROUP1_SERVER1 "consul operator raft list-peers"
Node                   ID                                    Address          State     Voter  RaftProtocol  Commit Index  Trails Leader By
consul-group1-server1  ed563e54-3a26-aebd-9565-23d21609d22d  10.0.4.246:8300  follower  true   3             466           0 commits
consul-group1-server2  a1091fef-d90b-72d3-da61-d6bc60f2ed04  10.0.4.97:8300   leader    true   3             466           -
consul-group2-server0  36747824-080a-a693-b419-ae3309de3389  10.0.4.242:8300  follower  true   3             466           0 commits
consul-group2-server1  cda2c288-c01b-0704-fcf8-336aba213b98  10.0.4.93:8300   follower  false  3             466           0 commits
consul-group2-server2  ca684698-68e7-e759-501d-d575c8cd41ec  10.0.4.186:8300  follower  false  3             466           0 commits
After the leader node consul-group1-server0 failed, three events took place:
- consul-group2-server0, the non-voter server in zone0, was promoted to a voter.
- consul-group1-server2 was elected leader.
- consul-group1-server0 was removed from the list of peers because Consul autopilot executed dead server cleanup.
To check on the status of consul-group1-server0, run the consul members command.
$ ssh $GROUP1_SERVER1 "consul members"
Node                   Address          Status  Type    Build       Protocol  DC   Partition  Segment
consul-group1-server0  10.0.4.254:8301  left    server  1.17.3+ent  2         dc1  default    <all>
consul-group1-server1  10.0.4.68:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group1-server2  10.0.4.165:8301  alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server0  10.0.4.184:8301  alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server1  10.0.4.53:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server2  10.0.4.42:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
By default, all Consul server nodes in a zone have the potential to become that zone's voter. To explicitly forbid one or more Consul servers from ever becoming a voter, use enhanced read scalability. When you set the agent's non_voting_server flag (deprecated in favor of read_replica in recent Consul versions) to true, the Consul server helps ease read load from the voting servers but never becomes a voter, even if all of the other servers in its zone fail.
Stop all servers in a zone
After you shut down consul-group1-server0, there is only one server left in zone0. Run the following command to stop the remaining Consul server in zone0, which simulates a total zone failure.
$ ssh $GROUP2_SERVER0 "sudo systemctl --signal=SIGKILL stop consul"
Inspect the status of the cluster. Run the command against the second server from the first server group, or any other instance that still runs.
$ ssh $GROUP1_SERVER1 "consul operator raft list-peers"
Node                   ID                                    Address          State     Voter  RaftProtocol  Commit Index  Trails Leader By
consul-group1-server1  7813ce06-16dd-d06a-7343-46dd4ca51a11  10.0.4.68:8300   leader    true   3             648           -
consul-group1-server2  dd1374e1-1eca-e75c-d9f2-19fedc102824  10.0.4.165:8300  follower  true   3             648           0 commits
consul-group2-server0  b13d5391-688e-8eaf-a749-88e1ed85a104  10.0.4.184:8300  follower  false  3             580           68 commits
consul-group2-server1  d3054163-3822-0fb0-560f-bc6548ed40a9  10.0.4.53:8300   follower  true   3             648           0 commits
consul-group2-server2  9d4d2fc0-80e5-df80-1ab8-c19874b0a8c0  10.0.4.42:8300   follower  false  3             648           0 commits
After the node consul-group2-server0 failed, two events took place:
- consul-group2-server1 was promoted to a voter.
- consul-group2-server0 began to trail the leader's index.
In order to preserve quorum of 3 voting nodes, Consul Autopilot promotes an available server from a different zone, even if that zone already has a voter.
Inspect the Consul Autopilot state and verify that the extra voter in zone1 was promoted because of the failure of all nodes in zone0. The output is trimmed for brevity.
$ ssh $GROUP1_SERVER1 "consul operator autopilot state -format=json | jq -r \" (.Servers | .[] | [.Name, .RedundancyZone, .NodeType]) | @tsv \" | sort"
consul-group1-server1   zone1   zone-voter
consul-group1-server2   zone2   zone-voter
consul-group2-server0   zone0   zone-voter
consul-group2-server1   zone1   zone-extra-voter
consul-group2-server2   zone2   zone-standby
The Node Type describes the voter status.
- zone-voter indicates that autopilot designates this server to be the voter for the specific zone.
- zone-standby indicates that autopilot designates this server to become the voter if a voter from the zone fails.
- zone-extra-voter indicates that autopilot designates this server as available to become a voter due to a failure of all servers in another zone. When one of the servers in the failed zone is restored, this server is automatically demoted.
Explore the command's full output. It includes the Consul server node's name, its zone, and its role.
$ ssh $GROUP1_SERVER1 "consul operator autopilot state"
Healthy: false
Failure Tolerance: 1
Optimistic Failure Tolerance: 2
Leader: 7813ce06-16dd-d06a-7343-46dd4ca51a11
Voters:
   7813ce06-16dd-d06a-7343-46dd4ca51a11
   dd1374e1-1eca-e75c-d9f2-19fedc102824
   d3054163-3822-0fb0-560f-bc6548ed40a9
Redundancy Zones:
   zone0:
      Failure Tolerance: 0
      Voters:
      Servers:
         b13d5391-688e-8eaf-a749-88e1ed85a104
   zone1:
      Failure Tolerance: 1
      Voters:
         7813ce06-16dd-d06a-7343-46dd4ca51a11
         d3054163-3822-0fb0-560f-bc6548ed40a9
      Servers:
         7813ce06-16dd-d06a-7343-46dd4ca51a11
         d3054163-3822-0fb0-560f-bc6548ed40a9
   zone2:
      Failure Tolerance: 1
      Voters:
         dd1374e1-1eca-e75c-d9f2-19fedc102824
      Servers:
         dd1374e1-1eca-e75c-d9f2-19fedc102824
         9d4d2fc0-80e5-df80-1ab8-c19874b0a8c0
Upgrade:
   Status: idle
   Target Version: 1.17.3
   Target Version Voters:
      7813ce06-16dd-d06a-7343-46dd4ca51a11
      dd1374e1-1eca-e75c-d9f2-19fedc102824
      d3054163-3822-0fb0-560f-bc6548ed40a9
   Target Version Non-Voters:
      9d4d2fc0-80e5-df80-1ab8-c19874b0a8c0
      b13d5391-688e-8eaf-a749-88e1ed85a104
Servers:
   7813ce06-16dd-d06a-7343-46dd4ca51a11
      Name: consul-group1-server1
      Address: 10.0.4.68:8300
      Version: 1.17.3
      Status: leader
      Node Type: zone-voter
      Node Status: alive
      Healthy: true
      Last Contact: 0s
      Last Term: 4
      Last Index: 1464
      Redundancy Zone: zone1
      Upgrade Version: 1.17.3
      Meta
         "consul-network-segment": ""
         "consul-version": "1.17.3"
         "zone": "zone1"
   9d4d2fc0-80e5-df80-1ab8-c19874b0a8c0
      Name: consul-group2-server2
      Address: 10.0.4.42:8300
      Version: 1.17.3
      Status: non-voter
      Node Type: zone-standby
      Node Status: alive
      Healthy: true
      Last Contact: 75.292238ms
      Last Term: 4
      Last Index: 1464
      Redundancy Zone: zone2
      Upgrade Version: 1.17.3
      Meta
         "consul-network-segment": ""
         "consul-version": "1.17.3"
         "zone": "zone2"
   b13d5391-688e-8eaf-a749-88e1ed85a104
      Name: consul-group2-server0
      Address: 10.0.4.184:8300
      Version: 1.17.3
      Status: non-voter
      Node Type: zone-voter
      Node Status: failed
      Healthy: false
      Last Contact: 19.749669ms
      Last Term: 4
      Last Index: 580
      Redundancy Zone: zone0
      Upgrade Version: 1.17.3
      Meta
         "consul-network-segment": ""
         "consul-version": "1.17.3"
         "zone": "zone0"
   d3054163-3822-0fb0-560f-bc6548ed40a9
      Name: consul-group2-server1
      Address: 10.0.4.53:8300
      Version: 1.17.3
      Status: voter
      Node Type: zone-extra-voter
      Node Status: alive
      Healthy: true
      Last Contact: 31.604337ms
      Last Term: 4
      Last Index: 1464
      Redundancy Zone: zone1
      Upgrade Version: 1.17.3
      Meta
         "consul-network-segment": ""
         "consul-version": "1.17.3"
         "zone": "zone1"
   dd1374e1-1eca-e75c-d9f2-19fedc102824
      Name: consul-group1-server2
      Address: 10.0.4.165:8300
      Version: 1.17.3
      Status: voter
      Node Type: zone-voter
      Node Status: alive
      Healthy: true
      Last Contact: 44.459057ms
      Last Term: 4
      Last Index: 1464
      Redundancy Zone: zone2
      Upgrade Version: 1.17.3
      Meta
         "consul-network-segment": ""
         "consul-version": "1.17.3"
         "zone": "zone2"
Another effect of shutting down the second node in zone0 is visible in the output of the consul operator raft list-peers command displayed earlier: consul-group2-server0 is still in the Raft peers list, but as a non-voter with a trailing Raft index. This node remains in the list because no other node was available in its zone, so Consul Autopilot did not execute its dead server cleanup.
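A trailing Raft index is easy to miss in a wide table, so the following sketch filters the list-peers output down to servers that lag behind the leader. It assumes the column layout shown earlier, where the eighth whitespace-separated field of a data row holds the number of commits the server trails by. The function name `trailing_servers` is a hypothetical helper.

```shell
# Hypothetical helper: list servers whose Raft index trails the leader.
trailing_servers() {
  # Skip the header and the leader row ("-"), then keep rows with a lag above zero.
  awk '$1 != "Node" && $8 != "-" && $8 + 0 > 0 { print $1, $8 }'
}

# Usage, assuming the SSH variables defined earlier in this tutorial:
#   ssh $GROUP1_SERVER1 "consul operator raft list-peers" | trailing_servers
```

For the list-peers output shown earlier in this section, the filter reports consul-group2-server0 with a lag of 68 commits.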
Run the following command to inspect the Consul cluster members and their status.
$ ssh $GROUP1_SERVER1 "consul members"
Node                   Address          Status  Type    Build       Protocol  DC   Partition  Segment
consul-group1-server0  10.0.4.254:8301  left    server  1.17.3+ent  2         dc1  default    <all>
consul-group1-server1  10.0.4.68:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group1-server2  10.0.4.165:8301  alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server0  10.0.4.184:8301  failed  server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server1  10.0.4.53:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server2  10.0.4.42:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
The status of consul-group2-server0 is failed. Compare it to the status of consul-group1-server0, which was also marked as failed at first. After consul-group2-server0 took over as the voter for zone0, Consul Autopilot removed consul-group1-server0 from the cluster and marked it as left.
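To pick out only the members that are not healthy, the Status column of the consul members output can be filtered with awk. This is a minimal sketch that assumes the column layout shown above, where the third whitespace-separated field is the status. The function name `unhealthy_members` is a hypothetical helper.

```shell
# Hypothetical helper: print members whose status is anything other than "alive".
unhealthy_members() {
  # Skip the header row, then print the node name and status of non-alive members.
  awk '$1 != "Node" && $3 != "alive" { print $1, $3 }'
}

# Usage, assuming the SSH variables defined earlier in this tutorial:
#   ssh $GROUP1_SERVER1 "consul members" | unhealthy_members
```

For the output above, the filter reports consul-group1-server0 as left and consul-group2-server0 as failed.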
Recover all servers in a zone
Next, observe what happens when you recover the servers in zone0. Execute the following command to restart the Consul server agents on both server instances.
$ ssh $GROUP1_SERVER0 "sudo systemctl start consul" && \
  ssh $GROUP2_SERVER0 "sudo systemctl start consul"
Wait a few minutes for the Consul servers to start. Afterwards, inspect the state of the Consul cluster members.
$ ssh $GROUP1_SERVER1 "consul members"
Node                   Address          Status  Type    Build       Protocol  DC   Partition  Segment
consul-group1-server0  10.0.4.126:8301  alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group1-server1  10.0.4.68:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group1-server2  10.0.4.165:8301  alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server0  10.0.4.111:8301  alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server1  10.0.4.53:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
consul-group2-server2  10.0.4.42:8301   alive   server  1.17.3+ent  2         dc1  default    <all>
All Consul server nodes are back in the cluster and their state is alive. The next command shows the Raft peer set of the cluster and their voting status.
$ ssh $GROUP1_SERVER1 "consul operator raft list-peers"
Node                   ID                                    Address          State     Voter  RaftProtocol  Commit Index  Trails Leader By
consul-group1-server1  7813ce06-16dd-d06a-7343-46dd4ca51a11  10.0.4.68:8300   leader    true   3             1870          -
consul-group1-server2  dd1374e1-1eca-e75c-d9f2-19fedc102824  10.0.4.165:8300  follower  true   3             1870          0 commits
consul-group2-server1  d3054163-3822-0fb0-560f-bc6548ed40a9  10.0.4.53:8300   follower  false  3             1870          0 commits
consul-group2-server2  9d4d2fc0-80e5-df80-1ab8-c19874b0a8c0  10.0.4.42:8300   follower  false  3             1870          0 commits
consul-group2-server0  4fe23f52-dc33-7ba0-ff5a-648a842a978d  10.0.4.111:8300  follower  true   3             1870          0 commits
consul-group1-server0  59b33708-1874-e62a-261d-cff58a69b3f8  10.0.4.126:8300  follower  false  3             1870          0 commits
Notice that in this case consul-group2-server0 has become the voter for zone0, and consul-group1-server0 has returned to the list. Finally, inspect the Consul Autopilot node roles for the cluster.
$ ssh $GROUP1_SERVER1 "consul operator autopilot state -format=json | jq -r \" (.Servers | .[] | [.Name, .RedundancyZone, .NodeType]) | @tsv \" | sort"
consul-group1-server0   zone0   zone-standby
consul-group1-server1   zone1   zone-voter
consul-group1-server2   zone2   zone-voter
consul-group2-server0   zone0   zone-voter
consul-group2-server1   zone1   zone-standby
consul-group2-server2   zone2   zone-standby
The cluster state was recovered. There are three voters and three non-voters in total. There is no priority for previous voters to return to their voting state. The first node to join the cluster in an empty zone becomes a voter, and any other nodes that join after it are treated as non-voters.
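The recovered state can also be checked mechanically. The following sketch reads the three-column name, zone, and node type listing produced by the jq command above and confirms that every zone has exactly one server with the zone-voter node type. The function name `check_one_voter_per_zone` is a hypothetical helper, and the check relies only on the node type labels, not on live Raft state.

```shell
# Hypothetical helper: verify that each zone has exactly one zone-voter.
check_one_voter_per_zone() {
  awk '
    NF < 3 { next }                        # ignore blank or malformed lines
    { voters[$2] += 0 }                    # register every zone that appears
    $3 == "zone-voter" { voters[$2]++ }    # count designated voters per zone
    END {
      for (zone in voters)
        if (voters[zone] != 1) {
          print "unexpected voter count in " zone ": " voters[zone]
          bad = 1
        }
      exit (bad ? 1 : 0)
    }'
}

# Usage: pipe the name, zone, and node type listing shown above into the function.
#   <autopilot state command from above> | check_one_voter_per_zone
```

The function exits with status 0 when each zone has a single designated voter, and prints the offending zone otherwise.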
Clean up environment
Destroy the Terraform resources to clean up your environment. Enter yes to confirm the destroy operation.
$ terraform destroy
##...
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.
  Enter a value: yes
##...
Destroy complete! Resources: 25 destroyed.
Due to race conditions with the various cloud resources created in this tutorial, you may need to run the destroy operation twice to ensure all resources have been properly removed.
Next steps
In this tutorial you learned how to configure Consul Redundancy Zones in a pool of Consul server nodes and use them as hot standby instances in case one of the server voters fails. You observed how once a Consul server voter fails, another one from its zone is elected for the voter role.
Consul Redundancy Zones is a part of the Autopilot functionality set. To learn more about Autopilot, go to the Day 2 Operations: Autopilot tutorial next.