Automate upgrades with Vault Enterprise
Enterprise Only
The functionality described in this tutorial is available only in Vault Enterprise.
Challenge
Upgrading Vault is a delicate operation in any production environment, so it is important to have practices in place that simplify the process where possible.
Solution
Vault Enterprise provides automated version upgrades through the autopilot feature when using Integrated Storage. The feature lets you start new Vault nodes alongside the older version nodes. Once the new nodes are at least as numerous as the older ones, autopilot automatically shifts voting rights and leadership to them.
Because autopilot transfers leadership to one of the new nodes, removing the older version nodes from the cluster afterward does not trigger a leader election.
Prerequisites
To test the automated upgrades feature explained in this tutorial you will need:
- A Vault Enterprise cluster with three nodes running Vault Enterprise 1.11.0 or later.
- Three extra nodes with Vault Enterprise 1.11.0 or later to use as the new servers after the upgrade.
You will also need a text editor, the curl executable to test the API endpoints, and optionally the jq command to format the output of curl.
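Before starting, you can check that the vault binary on each node meets these prerequisites. This sketch makes two assumptions: that vault version prints a line such as "Vault v1.12.0+ent (...)" for Enterprise builds, and that GNU sort -V is available. The helper names version_ge and check_vault_binary are hypothetical and are not part of the tutorial repository.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2, using GNU `sort -V` for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: check that the local vault binary is an Enterprise
# build at version 1.11.0 or later. Assumes `vault version` output looks like
# "Vault v1.12.0+ent (...)"; adjust the parsing if yours differs.
check_vault_binary() {
  ver_line="$(vault version)" || return 1
  case "$ver_line" in
    *+ent*) ;;
    *) echo "not an Enterprise build: $ver_line" >&2; return 1 ;;
  esac
  ver="${ver_line#Vault v}"   # drop the leading "Vault v"
  ver="${ver%%+*}"            # drop "+ent" and everything after it
  version_ge "$ver" "1.11.0"
}
```

Run check_vault_binary && echo ok on each node. A non-zero exit status means the binary does not meet the prerequisites.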
Scenario introduction
To learn about the new autopilot behavior, start an initial three-node cluster (see the Step 1 diagram). Then start three additional nodes with an automatic upgrade version specified, and add them to the cluster (see the Step 2 diagram).
You will run a script that starts the cluster:
- Initialize and unseal vault_1 (http://127.0.0.1:8100). Its root token is used to create a transit key that lets the other Vault servers auto-unseal. This Vault server is not part of the cluster.
- Initialize and unseal vault_2 (http://127.0.0.1:8200). This Vault server starts as the cluster leader.
- Start vault_3 (http://127.0.0.1:8300). It automatically joins the cluster via retry_join.
- Start vault_4 (http://127.0.0.1:8400). It automatically joins the cluster via retry_join.
If this is your first time setting up a Vault cluster with integrated storage, go through the Vault HA Cluster with Integrated Storage tutorial.
Set up an initial cluster
Retrieve the configuration by cloning the hashicorp/learn-vault-raft repository from GitHub.
$ git clone https://github.com/hashicorp/learn-vault-raft.git
This repository holds supporting content for all the Vault learn tutorials. The content specific to this tutorial is in a sub-directory.
Change the working directory to learn-vault-raft/raft-auto-upgrade/local.
$ cd learn-vault-raft/raft-auto-upgrade/local
Make the setup_1.sh file executable.
$ chmod +x setup_1.sh
Execute the setup_1.sh script to spin up a Vault cluster.
$ ./setup_1.sh
[vault_1] Creating configuration
  - creating /git/learn-vault-raft/raft-autopilot/local/config-vault_1.hcl
[vault_2] Creating configuration
  - creating /git/learn-vault-raft/raft-autopilot/local/config-vault_2.hcl
  - creating /git/learn-vault-raft/raft-autopilot/local/raft-vault_2
...snip...
[vault_3] starting Vault server @ http://127.0.0.1:8300
Using [vault_1] root token (hvs.tqKc9An04pQY5H1uysw02Xn6) to retrieve transit key for auto-unseal
[vault_4] starting Vault server @ http://127.0.0.1:8400
Using [vault_1] root token (hvs.tqKc9An04pQY5H1uysw02Xn6) to retrieve transit key for auto-unseal
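After the script finishes, you can confirm that each cluster node is up and unsealed. Query Vault's unauthenticated /v1/sys/health endpoint, whose HTTP status code encodes the node's state. This is a minimal sketch with the following assumptions:
- describe_health_code and check_cluster_health are hypothetical helpers.
- The code mapping covers the endpoint's default status codes.
- The address pattern follows this tutorial's layout, where vault_N listens on port 8N00.

```shell
# Hypothetical helper: translate a /v1/sys/health HTTP status code into a
# short description (default codes of Vault's sys/health endpoint).
describe_health_code() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: query each node number given as an argument and print
# one line per node. Assumes vault_N serves its API on 127.0.0.1:8N00.
# curl prints 000 as the code when it cannot connect at all.
check_cluster_health() {
  for n in "$@"; do
    addr="http://127.0.0.1:8${n}00"
    code="$(curl -s -o /dev/null -w '%{http_code}' "$addr/v1/sys/health")"
    echo "vault_$n ($addr): $(describe_health_code "$code")"
  done
}
```

For example, check_cluster_health 2 3 4 should report vault_2 as active and the other two nodes as some form of standby.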
You can find the server configuration files and the log files in the working directory.
Use your preferred text editor to open the config-vault_2.hcl file and examine the generated server configuration for vault_2.
config-vault_2.hcl
storage "raft" {
  path    = "/learn-vault-raft/raft-auto-upgrade/local/raft-vault_2/"
  node_id = "vault_2"
}

listener "tcp" {
  address         = "127.0.0.1:8200"
  cluster_address = "127.0.0.1:8201"
  tls_disable     = true
}

seal "transit" {
  address = "http://127.0.0.1:8100"
  # token is read from VAULT_TOKEN env
  # token = ""
  disable_renewal = "false"
  key_name        = "unseal_key"
  mount_path      = "transit/"
}

disable_mlock = true
cluster_addr  = "http://127.0.0.1:8201"
Review the generated server configuration for vault_3.
config-vault_3.hcl
storage "raft" {
  path    = "/learn-vault-raft/raft-auto-upgrade/local/raft-vault_3/"
  node_id = "vault_3"

  retry_join {
    leader_api_addr = "http://127.0.0.1:8200"
  }
}

...snip...
The retry_join configuration block makes the vault_3 and vault_4 nodes join the cluster automatically by contacting the leader at http://127.0.0.1:8200.
Export an environment variable for the vault CLI to address the vault_2 server.
$ export VAULT_ADDR=http://127.0.0.1:8200
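Every node in this tutorial follows the same port layout: vault_N serves its API on port 8N00 and cluster traffic on port 8N01. A pair of hypothetical shell helpers, which are not part of the tutorial scripts, can derive those addresses and retarget the vault CLI. This is convenient whenever you need to point the CLI at a different node.

```shell
# Hypothetical helpers: derive node addresses from the node number, assuming
# vault_N listens on 8N00 (API) and 8N01 (cluster), as in this tutorial.
# The cluster address omits the scheme, matching list-peers output.
vault_api_addr()     { echo "http://127.0.0.1:8${1}00"; }
vault_cluster_addr() { echo "127.0.0.1:8${1}01"; }

# Point the vault CLI at node N, for example `use_node 5` for vault_5.
use_node() {
  VAULT_ADDR="$(vault_api_addr "$1")"
  export VAULT_ADDR
}
```

With these helpers, use_node 2 is equivalent to the export command above.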
Verify the cluster members.
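$ vault operator raft list-peers
Node       Address           State       Voter
----       -------           -----       -----
vault_2    127.0.0.1:8201    leader      true
vault_3    127.0.0.1:8301    follower    false
vault_4    127.0.0.1:8401    follower    false
Newly joined nodes can appear as non-voters until autopilot promotes them after the server stabilization time has passed.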
View the autopilot's upgrade state information.
$ curl -s --header "X-Vault-Token: $(cat root_token-vault_2)" \
    $VAULT_ADDR/v1/sys/storage/raft/autopilot/state | jq -r ".data.upgrade_info"
Output:
{
  "status": "idle",
  "target_version": "1.11.0",
  "target_version_voters": [
    "vault_2",
    "vault_3",
    "vault_4"
  ]
}
Notice that the status field of the upgrade info is idle.
If you have the watch command (or similar), you can follow the upgrade status as you add more nodes.
$ watch -n 0.5 'curl -H "X-Vault-Token: $(cat root_token-vault_2)" $VAULT_ADDR/v1/sys/storage/raft/autopilot/state | jq -r ".data.upgrade_info"'
This checks the autopilot state every half a second.
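The watch command is interactive, so a script may prefer to poll until a specific status appears. The following sketch makes these assumptions:
- curl and jq are installed.
- VAULT_ADDR is set.
- The root_token-vault_2 file from the setup script is in the working directory.
- The helper names get_upgrade_status and wait_for_upgrade_status are hypothetical.

Fractional sleep intervals such as 0.5 work with GNU and BSD sleep but are not required by POSIX.

```shell
# Hypothetical helper: print only the upgrade status field from the
# autopilot state endpoint. Assumes VAULT_ADDR and root_token-vault_2 exist.
get_upgrade_status() {
  curl -s --header "X-Vault-Token: $(cat root_token-vault_2)" \
    "$VAULT_ADDR/v1/sys/storage/raft/autopilot/state" |
    jq -r '.data.upgrade_info.status'
}

# Hypothetical helper: poll until the upgrade reaches the wanted status.
# Usage: wait_for_upgrade_status STATUS [MAX_TRIES] [INTERVAL_SECONDS]
wait_for_upgrade_status() {
  want="$1"; tries="${2:-240}"; interval="${3:-0.5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    current="$(get_upgrade_status)"
    if [ "$current" = "$want" ]; then
      echo "upgrade status: $current"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out; last status: ${current:-unknown}" >&2
  return 1
}
```

For example, wait_for_upgrade_status await-server-removal waits until the older version nodes are ready to be removed. With the default settings, it fails after roughly two minutes.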
Add new nodes
When autopilot detects that the number of nodes on the new version equals or exceeds the number of older version nodes, it begins promoting the new nodes to voters and demoting the older version nodes to non-voters.
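The promotion threshold can be expressed as a simple comparison. The hypothetical expected_upgrade_phase function below only illustrates the rule stated above. It assumes at least one node with a newer target version has joined. It also ignores other conditions autopilot may check, such as node health and the server stabilization time.

```shell
# Hypothetical illustration of the promotion threshold: given the number of
# nodes on the target version and on the older version, print the status
# the rule predicts. Assumes at least one target-version node has joined;
# node health and stabilization time are not modeled.
expected_upgrade_phase() {
  new_count="$1"
  old_count="$2"
  if [ "$new_count" -ge "$old_count" ]; then
    echo "promoting"
  else
    echo "await-new-voters"
  fi
}
```

With vault_5 and vault_6 joined (two new nodes against three older ones), expected_upgrade_phase 2 3 prints await-new-voters. Once vault_7 joins, expected_upgrade_phase 3 3 prints promoting.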
Use your preferred text editor to open the config-vault_5.hcl file and examine the generated server configuration for vault_5.
config-vault_5.hcl
storage "raft" {
  path                      = "/learn-vault-raft/raft-auto-upgrade/local/raft-vault_5/"
  node_id                   = "vault_5"
  autopilot_upgrade_version = "1.12.0.1"

  retry_join {
    leader_api_addr = "http://127.0.0.1:8200"
  }
}

...snip...
To specify an automatic upgrade target version, add the autopilot_upgrade_version parameter to the storage stanza, with a SemVer-compatible version string of your choosing as its value.
Vault Configuration
The vault_5, vault_6, and vault_7 nodes have the autopilot_upgrade_version parameter configured. This means those nodes have a specific target Vault version.
Make the setup_2.sh file executable.
$ chmod +x setup_2.sh
Execute the setup_2.sh script to add three additional nodes to the cluster.
$ ./setup_2.sh
[vault_5] starting Vault server @ http://127.0.0.1:8500
Using [vault_1] root token (hvs.tqKc9An04pQY5H1uysw02Xn6) to retrieve transit key for auto-unseal
[vault_6] starting Vault server @ http://127.0.0.1:8600
Using [vault_1] root token (hvs.tqKc9An04pQY5H1uysw02Xn6) to retrieve transit key for auto-unseal
[vault_7] starting Vault server @ http://127.0.0.1:8700
Using [vault_1] root token (hvs.tqKc9An04pQY5H1uysw02Xn6) to retrieve transit key for auto-unseal
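Once the new nodes are running, you can cross-check the target version that each node's configuration sets against the target_version field autopilot reports. The show_upgrade_versions helper is a hypothetical sketch that extracts the autopilot_upgrade_version value with sed. It assumes the value is written as a double-quoted string, as in config-vault_5.hcl.

```shell
# Hypothetical helper: print the autopilot_upgrade_version value (if any)
# set in each HCL file passed as an argument. This is a quick text match,
# not an HCL parser, so a commented-out setting would also match.
show_upgrade_versions() {
  for f in "$@"; do
    v="$(sed -n 's/.*autopilot_upgrade_version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$f")"
    echo "$f: ${v:-<not set>}"
  done
}
```

Running show_upgrade_versions config-vault_*.hcl in the working directory should show 1.12.0.1 for vault_5, vault_6, and vault_7, and no value for the other nodes.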
Follow the autopilot's upgrade status as it progresses.
$ curl -s --header "X-Vault-Token: $(cat root_token-vault_2)" \
    $VAULT_ADDR/v1/sys/storage/raft/autopilot/state | jq -r ".data.upgrade_info"
Or,
$ watch -n 0.5 'curl -H "X-Vault-Token: $(cat root_token-vault_2)" $VAULT_ADDR/v1/sys/storage/raft/autopilot/state | jq -r ".data.upgrade_info"'
The status changes from idle to await-new-voters.
{
  "other_version_voters": [
    "vault_2",
    "vault_3",
    "vault_4"
  ],
  "status": "await-new-voters",
  "target_version": "1.12.0.1",
  "target_version_non_voters": [
    "vault_5",
    "vault_6"
  ]
}
The status changes to promoting as autopilot promotes the three new nodes to voters. It then changes to demoting as autopilot demotes two of the three older version nodes to non-voters. Next, during the leader-transfer status, leadership moves from vault_2 to vault_5.
{
  "other_version_non_voters": [
    "vault_3",
    "vault_4"
  ],
  "other_version_voters": [
    "vault_2"
  ],
  "status": "leader-transfer",
  "target_version": "1.12.0.1",
  "target_version_voters": [
    "vault_5",
    "vault_6",
    "vault_7"
  ]
}
The status changes to await-server-removal.
{
  "other_version_non_voters": [
    "vault_2",
    "vault_3",
    "vault_4"
  ],
  "status": "await-server-removal",
  "target_version": "1.12.0.1",
  "target_version_voters": [
    "vault_5",
    "vault_6",
    "vault_7"
  ]
}
Autopilot Status
The progression of autopilot statuses during an upgrade looks like: idle → await-new-voters → promoting → demoting → leader-transfer → await-server-removal → idle.
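When scripting against the autopilot state endpoint, it can help to log what each status means. The hypothetical describe_upgrade_status function maps the statuses shown in this tutorial to short descriptions. Any other value the API returns falls through to the default branch.

```shell
# Hypothetical helper: one-line meaning of each upgrade status seen in this
# tutorial; unknown values are reported as unrecognized.
describe_upgrade_status() {
  case "$1" in
    idle)                 echo "no upgrade in progress" ;;
    await-new-voters)     echo "waiting for enough target-version nodes to join" ;;
    promoting)            echo "promoting target-version nodes to voters" ;;
    demoting)             echo "demoting older-version nodes to non-voters" ;;
    leader-transfer)      echo "moving leadership to a target-version node" ;;
    await-server-removal) echo "older-version nodes can now be removed" ;;
    *)                    echo "unrecognized status: $1" ;;
  esac
}
```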
Remove non-voter nodes
Once the autopilot upgrade status changes to await-server-removal, you can remove the older version non-voting nodes from the cluster.
List the current peers before removing any nodes.
$ vault operator raft list-peers
Node       Address           State       Voter
----       -------           -----       -----
vault_2    127.0.0.1:8201    follower    false
vault_3    127.0.0.1:8301    follower    false
vault_4    127.0.0.1:8401    follower    false
vault_5    127.0.0.1:8501    leader      true
vault_6    127.0.0.1:8601    follower    true
vault_7    127.0.0.1:8701    follower    true
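To script the removal step, you can extract the non-voting nodes from the list-peers table. The hypothetical list_non_voters helper assumes the table layout shown above: two header lines, with Voter in the fourth column. Review its output before acting on it, because a newly joined node can also appear as a non-voter.

```shell
# Hypothetical helper: print the node IDs whose Voter column is "false" in
# `vault operator raft list-peers` output (the two header lines are skipped).
list_non_voters() {
  vault operator raft list-peers | awk 'NR > 2 && $4 == "false" { print $1 }'
}
```

At this point in the tutorial, list_non_voters should print vault_2, vault_3, and vault_4.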
Because vault_2 is about to be removed, export an environment variable for the vault CLI to address the new leader, vault_5.
$ export VAULT_ADDR=http://127.0.0.1:8500
Remove vault_2 from the cluster.
$ vault operator raft remove-peer vault_2
Peer removed successfully!
Remove vault_3 from the cluster.
$ vault operator raft remove-peer vault_3
Peer removed successfully!
Remove vault_4 from the cluster.
$ vault operator raft remove-peer vault_4
Peer removed successfully!
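If you have several older version nodes, a loop avoids repeating the command. The remove_old_nodes function is a hypothetical sketch that takes node IDs as arguments. It stops at the first failure so you can investigate before continuing.

```shell
# Hypothetical helper: remove each node ID passed as an argument from the
# Raft cluster, stopping at the first remove-peer failure.
remove_old_nodes() {
  for node in "$@"; do
    vault operator raft remove-peer "$node" || return 1
  done
}
```

For this tutorial, remove_old_nodes vault_2 vault_3 vault_4 performs the three removals above.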
Verify non-voter node removal from the cluster.
$ vault operator raft list-peers
Node       Address           State       Voter
----       -------           -----       -----
vault_5    127.0.0.1:8501    leader      true
vault_6    127.0.0.1:8601    follower    true
vault_7    127.0.0.1:8701    follower    true
Autopilot configuration
Vault Enterprise enables automated upgrade migrations by default.
$ vault operator raft autopilot get-config
Output:
Key                                   Value
---                                   -----
Cleanup Dead Servers                  false
Last Contact Threshold                10s
Dead Server Last Contact Threshold    24h0m0s
Server Stabilization Time             10s
Min Quorum                            0
Max Trailing Logs                     1000
Disable Upgrade Migration             false
To disable automated upgrade migrations, set the -disable-upgrade-migration parameter to true.
$ vault operator raft autopilot set-config -disable-upgrade-migration=true
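To confirm the change took effect, read the Disable Upgrade Migration value back from get-config. The hypothetical autopilot_config_value helper assumes the plain-text table layout shown above. In that layout, the value is the last whitespace-separated field on the line that starts with the key.

```shell
# Hypothetical helper: print the value for one key from the plain-text
# output of `vault operator raft autopilot get-config`.
autopilot_config_value() {
  vault operator raft autopilot get-config |
    awk -v key="$1" 'index($0, key) == 1 { print $NF }'
}
```

After the set-config command, autopilot_config_value 'Disable Upgrade Migration' should print true.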
Clean up
The cluster.sh script provides a clean operation that removes all services, configuration, and modifications to your local system.
Clean up your local workstation.
$ ./cluster.sh clean
Found 1 Vault service(s) matching that name
[vault_1] stopping
...snip...
Removing log file /git/learn-vault-raft/raft-autopilot/local/vault_5.log
Removing log file /git/learn-vault-raft/raft-autopilot/local/vault_6.log
Clean complete
Next steps
In this tutorial you upgraded your Vault cluster by using autopilot's automated upgrades functionality. Automated upgrades let you upgrade a cluster of Vault nodes to a new version automatically as updated server nodes join the cluster. Once the number of nodes on the new version is equal to or greater than the number of nodes on the older version, autopilot promotes the newer version nodes to voters and demotes the older version nodes to non-voters. It then transfers leadership from the older version leader to one of the newer version nodes. After the leadership transfer completes, you can remove the older version non-voting nodes from the cluster.