Use hcdiag with Vault
HashiCorp Diagnostics (hcdiag) is a troubleshooting data-gathering tool that you can use to collect and archive important data from Consul, Nomad, Vault, and TFE server environments. The information gathered by hcdiag is well-suited for sharing with teams during incident response and troubleshooting.
In this tutorial, you will:
- Run a Vault server in "dev" mode, inside an Ubuntu Docker container.
- Install hcdiag from the official HashiCorp Ubuntu package repository.
- Execute basic hcdiag commands against this Vault service.
- Explore the contents of files created by the hcdiag tool.
- Learn about additional hcdiag features and how to use a custom configuration file with hcdiag.
Prerequisites
You will need a local install of Docker running on your machine for this tutorial. You can find installation instructions in the Docker documentation.
Scenario introduction
In this tutorial, you will run a Docker container, and access that container through a shell. Inside the container environment, you will start the Vault service in dev-mode, install the hcdiag tool, and use it to gather data from Vault.
You will then unpack the file archive created by hcdiag and examine its contents to learn about what hcdiag gathers by default.
You can explore some example production outputs along with a deep dive explanation of the output.
You will also learn about some useful hcdiag options, including how to use a custom configuration file.
Set up the environment
Run an ubuntu Docker container in detached mode with the -d flag. The --rm flag instructs Docker to remove the container upon stopping it, and the -t flag allocates a pseudo-tty which keeps the container running until you stop it.
$ docker run -d --rm -t --name vault ubuntu:22.04
Open an interactive shell session in the container with the -it flags.
$ docker exec -it vault /bin/bash
Tip: Your terminal prompt will now look different to show that you are in a shell in the Ubuntu container. For example, it could resemble root@a931b3c8ca00:/#. You will run the rest of the tutorial commands in this Ubuntu container shell.
Install dependencies
Update the apt-get package index and install the necessary dependencies.
$ apt-get update && apt-get install -y wget gpg
Create a working directory and change into it.
$ mkdir /tmp/vault-hcdiag && cd /tmp/vault-hcdiag
Install and start Vault
Add the HashiCorp repository:
$ wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor > /usr/share/keyrings/hashicorp-archive-keyring.gpg && echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com jammy main" | tee /etc/apt/sources.list.d/hashicorp.list
Install the vault package.
$ apt-get update && apt-get install -y vault
Run the vault server in dev mode as a background process.
$ vault server -dev &
Vault writes its operational log to standard output. When the log stops scrolling, you'll be able to type commands at the prompt again.
Root token
At the end of Vault's start-up output, you'll see the Unseal Key and Root Token displayed, like this:
Unseal Key: gNszUX/QW/JkPaI//BoTxXo7HWjVvmruOLnnTdMeljY=
Root Token: hvs.N1SlPSRrTzpq68NzczrcOa1Z
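If you would rather not copy the token by hand, one option is to save the server output to a file and pull the token out of it. The following is a minimal sketch, not part of the tutorial flow: it assumes the startup output contains a line of the form "Root Token: <token>" as in the example above, and the function name extract_root_token and the file name vault.log are hypothetical.

```shell
# Print the first root token found in Vault dev-mode startup output on stdin.
# Assumes a line of the form "Root Token: <token>" (optionally indented).
extract_root_token() {
  sed -n 's/^[[:space:]]*Root Token:[[:space:]]*//p' | head -n 1
}

# Demonstration with sample text shaped like the startup output above.
printf 'Unseal Key: example-unseal-key\nRoot Token: hvs.N1SlPSRrTzpq68NzczrcOa1Z\n' | extract_root_token
# prints hvs.N1SlPSRrTzpq68NzczrcOa1Z

# Hypothetical usage, assuming the server output was saved to vault.log:
#   export VAULT_TOKEN="$(extract_root_token < vault.log)"
```

This only works if the server output was redirected to a file when Vault was started, which the command in this tutorial does not do.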
Set up the environment
Before you run hcdiag, you must first perform some initial environment configuration so that the tool knows how to communicate with and authenticate to Vault.
Set the VAULT_ADDR environment variable for use by hcdiag.
$ export VAULT_ADDR=http://127.0.0.1:8200
Set the VAULT_TOKEN environment variable. Use the Root Token that Vault generated during startup (review the Install and start Vault section).
Example:
$ export VAULT_TOKEN=hvs.N1SlPSRrTzpq68NzczrcOa1Z
You should now be able to authenticate to Vault and get some information:
$ vault token lookup | grep policies
policies [root]
Getting a "permission denied" or 403 error? If so, your VAULT_TOKEN is probably not set correctly. Review Vault's startup logs to find the root token to use as VAULT_TOKEN. If you can't find the startup logs, stop the vault process you started and repeat the earlier section of this tutorial.
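Before digging through logs, it can help to confirm that both variables are actually present in the current shell. The following is a small sketch; the function name check_vault_env is hypothetical, and it only checks that the variables are non-empty, not that the token is valid.

```shell
# Report whether the variables hcdiag relies on are set in this shell.
# Returns 0 when both are non-empty, 1 otherwise.
check_vault_env() {
  rc=0
  if [ -z "${VAULT_ADDR:-}" ]; then
    echo "VAULT_ADDR is not set" >&2
    rc=1
  fi
  if [ -z "${VAULT_TOKEN:-}" ]; then
    echo "VAULT_TOKEN is not set" >&2
    rc=1
  fi
  return "$rc"
}

if check_vault_env; then
  echo "Vault environment variables are set"
fi
```

If both variables are set and you still get a 403, the token value itself is the more likely culprit.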
Insecure operation
In this scenario, you use a dev mode Vault server and its initial root token. For production hcdiag use, you must use a token with enough capabilities to execute the vault CLI commands used by hcdiag. You can examine the output of Results.json from an hcdiag archive to discover the commands used, and create a suitable production policy that limits a token to the required commands.
Install and run the hcdiag tool
Install the latest hcdiag release from the HashiCorp repository.
$ apt-get install -y hcdiag
This is a minimal environment, so make sure to set the SHELL environment variable:
$ export SHELL=/bin/sh
Run hcdiag to collect all available environment information for Vault.
$ hcdiag -vault
2022-08-26T18:22:26.484Z [INFO] hcdiag: Ensuring destination directory exists: directory=.
2022-08-26T18:22:26.484Z [INFO] hcdiag: Checking product availability
2022-08-26T18:22:26.637Z [INFO] hcdiag: Gathering diagnostics
2022-08-26T18:22:26.637Z [INFO] hcdiag.product: Running operations for: product=host
2022-08-26T18:22:26.637Z [INFO] hcdiag.product: running operation: product=host runner="uname -v"
2022-08-26T18:22:26.637Z [INFO] hcdiag.product: Running operations for: product=vault
2022-08-26T18:22:26.637Z [INFO] hcdiag.product: running operation: product=vault runner="vault version"
2022-08-26T18:22:26.639Z [INFO] hcdiag.product: running operation: product=host runner=disks
2022-08-26T18:22:26.640Z [INFO] hcdiag.product: running operation: product=host runner=info
2022-08-26T18:22:26.641Z [INFO] hcdiag.product: running operation: product=host runner=memory
2022-08-26T18:22:26.641Z [INFO] hcdiag.product: running operation: product=host runner=process
2022-08-26T18:22:26.642Z [INFO] hcdiag.product: running operation: product=host runner=network
2022-08-26T18:22:26.642Z [INFO] hcdiag.product: running operation: product=host runner=/etc/hosts
2022-08-26T18:22:26.648Z [INFO] hcdiag.product: running operation: product=host runner=iptables
2022-08-26T18:22:26.648Z [WARN] hcdiag.product: result: runner=iptables status=fail result="map[iptables -L -n -v:]" error="exec error, command=iptables -L -n -v, format=string, error=exec: "iptables": executable file not found in $PATH"
2022-08-26T18:22:26.648Z [INFO] hcdiag.product: running operation: product=host runner="/proc/ files"
2022-08-26T18:22:26.665Z [INFO] hcdiag.product: running operation: product=host runner=/etc/fstab
2022-08-26T18:22:26.671Z [INFO] hcdiag: Product done: product=host statuses="map[fail:1 success:9]"
2022-08-26T18:22:26.714Z [INFO] hcdiag.product: running operation: product=vault runner="vault status -format=json"
2022-08-26T18:22:26.786Z [INFO] hcdiag.product: running operation: product=vault runner="vault read sys/health -format=json"
2022-08-26T18:22:26.885Z [INFO] hcdiag.product: running operation: product=vault runner="vault read sys/seal-status -format=json"
2022-08-26T18:22:26.996Z [INFO] hcdiag.product: running operation: product=vault runner="vault read sys/host-info -format=json"
2022-08-26T18:22:27.086Z [INFO] hcdiag.product: running operation: product=vault runner="vault debug -output=/tmp/vault-hcdiag/hcdiag-2022-08-26T182226Z501102067/VaultDebug.tar.gz -duration=10s -interval=5s"
2022-08-26T18:22:38.181Z [INFO] hcdiag.product: running operation: product=vault runner="log/docker vault"
2022-08-26T18:22:38.184Z [INFO] hcdiag.product: result: runner="log/docker vault" status=skip result= | /bin/sh: 1: docker: not found error="docker not found, container=vault, error=exec error, command=docker version, error=exit status 127"
2022-08-26T18:22:38.184Z [INFO] hcdiag.product: running operation: product=vault runner=journald
2022-08-26T18:22:38.186Z [INFO] hcdiag.product: result: runner=journald status=skip result= | /bin/sh: 1: journalctl: not found error="journald not found on this system, service=vault, error=exec error, command=journalctl --version, error=exit status 127"
2022-08-26T18:22:38.186Z [INFO] hcdiag: Product done: product=vault statuses="map[skip:2 success:6]"
2022-08-26T18:22:38.186Z [INFO] hcdiag: Recording manifest
2022-08-26T18:22:38.189Z [INFO] hcdiag: Created Results.json file: dest=/tmp/vault-hcdiag/hcdiag-2022-08-26T182226Z501102067/Results.json
2022-08-26T18:22:38.190Z [INFO] hcdiag: Created Manifest.json file: dest=/tmp/vault-hcdiag/hcdiag-2022-08-26T182226Z501102067/Manifest.json
2022-08-26T18:22:38.219Z [INFO] hcdiag: Compressed and archived output file: dest=hcdiag-2022-08-26T182226Z.tar.gz
2022-08-26T18:22:38.220Z [INFO] hcdiag: Writing summary of products and ops to standard output
product  success  fail  unknown  total
host     9        1     0        10
vault    6        0     2        8
Tip
This is a minimal environment which doesn't use some system services that hcdiag uses to gather information; you can expect to observe errors related to those services.
Tip
You can also invoke hcdiag without options to gather all available environment and product information. To learn about all executable options, run hcdiag -h.
Examine results
What did hcdiag produce in the brief moments while running?
List the directory for tar+gzip archive files to discover the file that hcdiag created.
$ ls -l *.gz
-rw-r--r-- 1 vault vault 139143 Aug 10 21:18 hcdiag-2022-08-18T170538Z.tar.gz
You can unpack the archive and further examine its contents:
$ tar zxvf hcdiag-2022-08-18T170538Z.tar.gz
hcdiag-2022-08-18T170538Z/Manifest.json
hcdiag-2022-08-18T170538Z/Results.json
hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz
hcdiag-2022-08-18T170538Z/docker-vault.log
hcdiag-2022-08-18T170538Z/journald-vault.log
Use the hcdiag redaction feature to ensure that this bundle holds information that is appropriate to share based on your specific use cases.
Vault Enterprise users
You can share the output from hcdiag runs with HashiCorp Customer Support to greatly reduce the amount of information gathering needed in a support request.
The tool works locally, and does not export or share the diagnostic bundle with anyone. You must use other tools to transfer it to a secure location so you can share it with specific support staff who need to view it.
After you unpack the archive, the directory hcdiag-2022-08-18T170538Z has three main files (Manifest.json, Results.json, and VaultDebug.tar.gz), which the following section further describes.
Example production output
Here is a deeper dive into the output files and their contents for further clarification.
Manifest.json
The manifest has JSON data representing details about the hcdiag run.
Here is an example.
{
  "started_at": "2022-08-26T18:22:26.484412828Z",
  "ended_at": "2022-08-26T18:22:38.187139789Z",
  "duration": "11.702726956 seconds",
  "num_ops": 18,
  "configuration": {
    "hcl": {},
    "operating_system": "auto",
    "serial": false,
    "dry_run": false,
    "consul_enabled": false,
    "nomad_enabled": false,
    "terraform_ent_enabled": false,
    "vault_enabled": true,
    "since": "2022-08-23T18:22:26.484337403Z",
    "until": "0001-01-01T00:00:00Z",
    "includes": null,
    "destination": ".",
    "debug_duration": 10000000000,
    "debug_interval": 5000000000
  },
  ...
}
From this output, you can learn things like products queried, the duration of the run, and the presence of errors.
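The debug_duration and debug_interval values in this example appear to be expressed in nanoseconds, which is consistent with the -duration=10s and -interval=5s arguments passed to vault debug in the earlier log output. As a quick worked check, the following sketch converts them to seconds; the helper name ns_to_seconds is hypothetical.

```shell
# Convert a duration in nanoseconds to whole seconds using shell arithmetic.
ns_to_seconds() {
  echo $(( $1 / 1000000000 ))
}

ns_to_seconds 10000000000   # debug_duration, prints 10
ns_to_seconds 5000000000    # debug_interval, prints 5
```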
Results.json
The results file has detailed information about the host and Vault environment. The large amount of output from the file is best parsed and queried with a tool like jq for specific answers.
{
  "host": {
    "stats": {
      "runner": {
        "os": "linux"
      },
      "result": {
        "disk": [
          {"note":"content omitted"},
        ],
        "host": {
          {"note":"content omitted"},
        },
        "memory": {
          {"note":"content omitted"},
        },
        "uname": "#1 SMP Mon Nov 8 10:21:19 UTC 2021"
      },
      "error": ""
    }
  },
  "vault": {
    "vault debug -output=hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz -duration=10s": {
      "runner": {
        "command": "vault debug -output=hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz -duration=10s"
      },
      {"note":"content omitted"},
    },
    "vault read sys/seal-status -format=json": {
      "runner": {
        "command": "vault read sys/seal-status -format=json"
      },
      "result": {
        {"note":"content omitted"},
      },
      "error": ""
    },
    "vault status -format=json": {
      "runner": {
        "command": "vault status -format=json"
      },
      {"note":"content omitted"},
    },
    "vault version": {
      "runner": {
        "command": "vault version"
      },
      "result": "Vault v1.9.3 (7dbdd57243a0d8d9d9e07cd01eb657369f8e1b8a)",
      "error": ""
    }
  }
}
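As a sketch of the kind of jq query this enables, the two helpers below list the Vault commands recorded in a Results.json file and print any non-empty error fields. They assume the layout of the abbreviated example above, where each key under "vault" is a command with an "error" field; other hcdiag versions may structure the file differently. The function names are hypothetical, and jq is not part of this minimal container, so it would need to be installed first (for example with apt-get install -y jq).

```shell
# List the Vault commands recorded in an hcdiag Results.json file.
# Assumes each key under "vault" is the command that was run.
list_vault_commands() {
  jq -r '.vault | keys[]' "$1"
}

# Print "<command>: <error>" for each Vault command whose error is non-empty.
list_vault_errors() {
  jq -r '.vault | to_entries[] | select((.value.error // "") != "") | "\(.key): \(.value.error)"' "$1"
}

# Hypothetical usage after unpacking an archive:
#   list_vault_commands hcdiag-2022-08-18T170538Z/Results.json
#   list_vault_errors hcdiag-2022-08-18T170538Z/Results.json
```

The command list is also a reasonable starting point when working out which capabilities a production token needs.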
{ "host": { "stats": { "runner": { "os": "linux" }, "result": { "disk": [ { "device": "overlay", "mountpoint": "/", "fstype": "overlay", "opts": [ "rw", "relatime" ] }, { "device": "proc", "mountpoint": "/proc", "fstype": "proc", "opts": [ "rw", "nosuid", "nodev", "noexec", "relatime" ] }, { "device": "tmpfs", "mountpoint": "/dev", "fstype": "tmpfs", "opts": [ "rw", "nosuid" ] }, { "device": "devpts", "mountpoint": "/dev/pts", "fstype": "devpts", "opts": [ "rw", "nosuid", "noexec", "relatime" ] }, { "device": "sysfs", "mountpoint": "/sys", "fstype": "sysfs", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime" ] }, { "device": "cgroup", "mountpoint": "/sys/fs/cgroup", "fstype": "cgroup2", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime" ] }, { "device": "mqueue", "mountpoint": "/dev/mqueue", "fstype": "mqueue", "opts": [ "rw", "nosuid", "nodev", "noexec", "relatime" ] }, { "device": "shm", "mountpoint": "/dev/shm", "fstype": "tmpfs", "opts": [ "rw", "nosuid", "nodev", "noexec", "relatime" ] }, { "device": "/dev/vda1", "mountpoint": "/vault/logs", "fstype": "ext4", "opts": [ "rw", "relatime", "bind" ] }, { "device": "/dev/vda1", "mountpoint": "/vault/file", "fstype": "ext4", "opts": [ "rw", "relatime", "bind" ] }, { "device": "/dev/vda1", "mountpoint": "/etc/resolv.conf", "fstype": "ext4", "opts": [ "rw", "relatime", "bind" ] }, { "device": "/dev/vda1", "mountpoint": "/etc/hostname", "fstype": "ext4", "opts": [ "rw", "relatime", "bind" ] }, { "device": "/dev/vda1", "mountpoint": "/etc/hosts", "fstype": "ext4", "opts": [ "rw", "relatime", "bind" ] }, { "device": "proc", "mountpoint": "/proc/bus", "fstype": "proc", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime", "bind" ] }, { "device": "proc", "mountpoint": "/proc/fs", "fstype": "proc", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime", "bind" ] }, { "device": "proc", "mountpoint": "/proc/irq", "fstype": "proc", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime", "bind" ] }, { "device": 
"proc", "mountpoint": "/proc/sys", "fstype": "proc", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime", "bind" ] }, { "device": "proc", "mountpoint": "/proc/sysrq-trigger", "fstype": "proc", "opts": [ "ro", "nosuid", "nodev", "noexec", "relatime", "bind" ] }, { "device": "tmpfs", "mountpoint": "/proc/acpi", "fstype": "tmpfs", "opts": [ "ro", "relatime" ] }, { "device": "tmpfs", "mountpoint": "/proc/kcore", "fstype": "tmpfs", "opts": [ "rw", "nosuid", "bind" ] }, { "device": "tmpfs", "mountpoint": "/proc/keys", "fstype": "tmpfs", "opts": [ "rw", "nosuid", "bind" ] }, { "device": "tmpfs", "mountpoint": "/proc/timer_list", "fstype": "tmpfs", "opts": [ "rw", "nosuid", "bind" ] }, { "device": "tmpfs", "mountpoint": "/proc/sched_debug", "fstype": "tmpfs", "opts": [ "rw", "nosuid", "bind" ] }, { "device": "tmpfs", "mountpoint": "/sys/firmware", "fstype": "tmpfs", "opts": [ "ro", "relatime" ] } ], "host": { "hostname": "fac9458dceba", "uptime": 181, "bootTime": 1645714066, "procs": 5, "os": "linux", "platform": "alpine", "platformFamily": "alpine", "platformVersion": "3.14.3", "kernelVersion": "5.10.76-linuxkit", "kernelArch": "x86_64", "virtualizationSystem": "", "virtualizationRole": "guest", "hostId": "a4cf4b9a-0000-0000-802b-63c077dfd558" }, "memory": { "total": 5704634368, "available": 4859256832, "used": 269352960, "usedPercent": 4.7216516015632575, "free": 4411375616, "active": 240119808, "inactive": 937185280, "wired": 0, "laundry": 0, "buffers": 7458816, "cached": 1016446976, "writeBack": 0, "dirty": 12288, "writeBackTmp": 0, "shared": 343969792, "slab": 59740160, "sreclaimable": 31637504, "sunreclaim": 28102656, "pageTables": 4677632, "swapCached": 0, "commitLimit": 3926052864, "committedAS": 3108286464, "highTotal": 0, "highFree": 0, "lowTotal": 0, "lowFree": 0, "swapTotal": 1073737728, "swapFree": 1073737728, "mapped": 304005120, "vmallocTotal": 35184372087808, "vmallocUsed": 11104256, "vmallocChunk": 0, "hugePagesTotal": 0, "hugePagesFree": 0, 
"hugePageSize": 2097152 }, "uname": "#1 SMP Mon Nov 8 10:21:19 UTC 2021" }, "error": "" } }, "vault": { "vault debug -output=hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz -duration=10s": { "runner": { "command": "vault debug -output=hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz -duration=10s" }, "result": "Overwriting interval value \"30s\" to the duration value \"10s\"\n==\u003e Starting debug capture...\n Vault Address: http://localhost:8200\n Client Version: 1.9.3\n Duration: 10s\n Interval: 10s\n Metrics Interval: 10s\n Targets: config, host, metrics, pprof, replication-status, server-status, log\n Output: hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz\n\n==\u003e Capturing static information...\n2022-02-24T14:50:47.235Z [INFO] capturing configuration state\n\n==\u003e Capturing dynamic information...\n2022-02-24T14:50:47.236Z [INFO] capturing metrics: count=0\n2022-02-24T14:50:47.236Z [INFO] capturing replication status: count=0\n2022-02-24T14:50:47.236Z [INFO] capturing host information: count=0\n2022-02-24T14:50:47.237Z [INFO] capturing pprof data: count=0\n2022-02-24T14:50:47.237Z [INFO] capturing server status: count=0\n2022-02-24T14:50:57.242Z [INFO] capturing host information: count=1\n2022-02-24T14:50:57.242Z [INFO] capturing server status: count=1\n2022-02-24T14:50:57.242Z [INFO] capturing metrics: count=1\n2022-02-24T14:50:57.242Z [INFO] capturing replication status: count=1\n2022-02-24T14:50:57.351Z [INFO] capturing pprof data: count=1\nFinished capturing information, bundling files...\nSuccess! 
Bundle written to: hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz", "error": "" }, "vault read sys/health -format=json": { "runner": { "command": "vault read sys/health -format=json" }, "result": { "data": null, "lease_duration": 0, "lease_id": "", "renewable": false, "request_id": "", "warnings": null }, "error": "" }, "vault read sys/host-info -format=json": { "runner": { "command": "vault read sys/host-info -format=json" }, "result": { "data": { "cpu": [ { "cacheSize": 8192, "coreId": "0", "cores": 1, "cpu": 0, "family": "6", "flags": [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8", "apic", "sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "clflush", "mmx", "fxsr", "sse", "sse2", "ss", "ht", "pbe", "syscall", "nx", "pdpe1gb", "lm", "constant_tsc", "rep_good", "nopl", "xtopology", "nonstop_tsc", "cpuid", "pni", "pclmulqdq", "dtes64", "ds_cpl", "ssse3", "sdbg", "fma", "cx16", "xtpr", "pcid", "sse4_1", "sse4_2", "movbe", "popcnt", "aes", "xsave", "avx", "f16c", "rdrand", "hypervisor", "lahf_lm", "abm", "3dnowprefetch", "fsgsbase", "bmi1", "avx2", "bmi2", "erms", "avx512f", "avx512cd", "xsaveopt", "arat" ], "mhz": 2300, "microcode": "", "model": "126", "modelName": "Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz", "physicalId": "0", "stepping": 5, "vendorId": "GenuineIntel" }, { "cacheSize": 8192, "coreId": "0", "cores": 1, "cpu": 1, "family": "6", "flags": [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8", "apic", "sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "clflush", "mmx", "fxsr", "sse", "sse2", "ss", "ht", "pbe", "syscall", "nx", "pdpe1gb", "lm", "constant_tsc", "rep_good", "nopl", "xtopology", "nonstop_tsc", "cpuid", "pni", "pclmulqdq", "dtes64", "ds_cpl", "ssse3", "sdbg", "fma", "cx16", "xtpr", "pcid", "sse4_1", "sse4_2", "movbe", "popcnt", "aes", "xsave", "avx", "f16c", "rdrand", "hypervisor", "lahf_lm", "abm", "3dnowprefetch", "fsgsbase", "bmi1", "avx2", "bmi2", "erms", "avx512f", "avx512cd", "xsaveopt", "arat" ], "mhz": 
2300, "microcode": "", "model": "126", "modelName": "Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz", "physicalId": "1", "stepping": 5, "vendorId": "GenuineIntel" }, { "cacheSize": 8192, "coreId": "0", "cores": 1, "cpu": 2, "family": "6", "flags": [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8", "apic", "sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "clflush", "mmx", "fxsr", "sse", "sse2", "ss", "ht", "pbe", "syscall", "nx", "pdpe1gb", "lm", "constant_tsc", "rep_good", "nopl", "xtopology", "nonstop_tsc", "cpuid", "pni", "pclmulqdq", "dtes64", "ds_cpl", "ssse3", "sdbg", "fma", "cx16", "xtpr", "pcid", "sse4_1", "sse4_2", "movbe", "popcnt", "aes", "xsave", "avx", "f16c", "rdrand", "hypervisor", "lahf_lm", "abm", "3dnowprefetch", "fsgsbase", "bmi1", "avx2", "bmi2", "erms", "avx512f", "avx512cd", "xsaveopt", "arat" ], "mhz": 2300, "microcode": "", "model": "126", "modelName": "Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz", "physicalId": "2", "stepping": 5, "vendorId": "GenuineIntel" }, { "cacheSize": 8192, "coreId": "0", "cores": 1, "cpu": 3, "family": "6", "flags": [ "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", "cx8", "apic", "sep", "mtrr", "pge", "mca", "cmov", "pat", "pse36", "clflush", "mmx", "fxsr", "sse", "sse2", "ss", "ht", "pbe", "syscall", "nx", "pdpe1gb", "lm", "constant_tsc", "rep_good", "nopl", "xtopology", "nonstop_tsc", "cpuid", "pni", "pclmulqdq", "dtes64", "ds_cpl", "ssse3", "sdbg", "fma", "cx16", "xtpr", "pcid", "sse4_1", "sse4_2", "movbe", "popcnt", "aes", "xsave", "avx", "f16c", "rdrand", "hypervisor", "lahf_lm", "abm", "3dnowprefetch", "fsgsbase", "bmi1", "avx2", "bmi2", "erms", "avx512f", "avx512cd", "xsaveopt", "arat" ], "mhz": 2300, "microcode": "", "model": "126", "modelName": "Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz", "physicalId": "3", "stepping": 5, "vendorId": "GenuineIntel" } ], "cpu_times": [ { "cpu": "cpu0", "guest": 0, "guestNice": 0, "idle": 162.41, "iowait": 0.6, "irq": 0, "nice": 0, "softirq": 0.03, "steal": 
0, "system": 3.43, "user": 0.77 }, { "cpu": "cpu1", "guest": 0, "guestNice": 0, "idle": 163.69, "iowait": 0.28, "irq": 0, "nice": 0, "softirq": 0.02, "steal": 0, "system": 1.17, "user": 1.12 }, { "cpu": "cpu2", "guest": 0, "guestNice": 0, "idle": 162.56, "iowait": 0.32, "irq": 0, "nice": 0, "softirq": 0.05, "steal": 0, "system": 1.79, "user": 1.07 }, { "cpu": "cpu3", "guest": 0, "guestNice": 0, "idle": 161.98, "iowait": 1.1, "irq": 0, "nice": 0, "softirq": 0.08, "steal": 0, "system": 1.68, "user": 0.99 } ], "disk": [ { "free": 57431232512, "fstype": "ext2/ext3", "inodesFree": 3894488, "inodesTotal": 3907584, "inodesUsed": 13096, "inodesUsedPercent": 0.3351431472746331, "path": "/vault/logs", "total": 62725623808, "used": 2077675520, "usedPercent": 3.4913689205702814 }, { "free": 57431232512, "fstype": "ext2/ext3", "inodesFree": 3894488, "inodesTotal": 3907584, "inodesUsed": 13096, "inodesUsedPercent": 0.3351431472746331, "path": "/vault/file", "total": 62725623808, "used": 2077675520, "usedPercent": 3.4913689205702814 }, { "free": 57431232512, "fstype": "ext2/ext3", "inodesFree": 3894488, "inodesTotal": 3907584, "inodesUsed": 13096, "inodesUsedPercent": 0.3351431472746331, "path": "/etc/resolv.conf", "total": 62725623808, "used": 2077675520, "usedPercent": 3.4913689205702814 }, { "free": 57431232512, "fstype": "ext2/ext3", "inodesFree": 3894488, "inodesTotal": 3907584, "inodesUsed": 13096, "inodesUsedPercent": 0.3351431472746331, "path": "/etc/hostname", "total": 62725623808, "used": 2077675520, "usedPercent": 3.4913689205702814 }, { "free": 57431232512, "fstype": "ext2/ext3", "inodesFree": 3894488, "inodesTotal": 3907584, "inodesUsed": 13096, "inodesUsedPercent": 0.3351431472746331, "path": "/etc/hosts", "total": 62725623808, "used": 2077675520, "usedPercent": 3.4913689205702814 } ], "host": { "bootTime": 1645714066, "hostid": "6171926a-e406-4516-93eb-6352a38169cb", "hostname": "fac9458dceba", "kernelArch": "x86_64", "kernelVersion": "5.10.76-linuxkit", "os": 
"linux", "platform": "alpine", "platformFamily": "alpine", "platformVersion": "3.14.3", "procs": 5, "uptime": 181, "virtualizationRole": "guest", "virtualizationSystem": "" }, "memory": { "active": 241491968, "available": 4844466176, "buffers": 7458816, "cached": 1016471552, "commitlimit": 3926052864, "committedas": 3125587968, "dirty": 12288, "free": 4396568576, "highfree": 0, "hightotal": 0, "hugepagesfree": 0, "hugepagesize": 2097152, "hugepagestotal": 0, "inactive": 951267328, "laundry": 0, "lowfree": 0, "lowtotal": 0, "mapped": 305569792, "pagetables": 5148672, "shared": 343969792, "slab": 59760640, "sreclaimable": 31653888, "sunreclaim": 28106752, "swapcached": 0, "swapfree": 1073737728, "swaptotal": 1073737728, "total": 5704634368, "used": 284135424, "usedPercent": 4.980782389732993, "vmallocchunk": 0, "vmalloctotal": 35184372087808, "vmallocused": 11153408, "wired": 0, "writeback": 0, "writebacktmp": 0 }, "timestamp": "2022-02-24T14:50:47.182059169Z" }, "lease_duration": 0, "lease_id": "", "renewable": false, "request_id": "4c50da3b-eb37-9892-46a9-929c7ff087c7", "warnings": null }, "error": "" }, "vault read sys/seal-status -format=json": { "runner": { "command": "vault read sys/seal-status -format=json" }, "result": { "data": null, "lease_duration": 0, "lease_id": "", "renewable": false, "request_id": "", "warnings": null }, "error": "" }, "vault status -format=json": { "runner": { "command": "vault status -format=json" }, "result": { "active_time": "0001-01-01T00:00:00Z", "cluster_id": "f09dd249-6efa-09c2-637d-861694314024", "cluster_name": "vault-cluster-600609bf", "ha_enabled": false, "initialized": true, "migration": false, "n": 1, "nonce": "", "progress": 0, "recovery_seal": false, "sealed": false, "storage_type": "inmem", "t": 1, "type": "shamir", "version": "1.9.3" }, "error": "" }, "vault version": { "runner": { "command": "vault version" }, "result": "Vault v1.9.3 (7dbdd57243a0d8d9d9e07cd01eb657369f8e1b8a)", "error": "" } }}
The debug file
The debug tarball holds the results of invoking the vault debug command.
The following is a tree of output files produced by unpacking the hcdiag-2022-08-18T170538Z/VaultDebug.tar.gz file.
VaultDebug
├── 2022-02-24T14-50-47Z
│   ├── allocs.prof
│   ├── block.prof
│   ├── goroutine.prof
│   ├── goroutines.txt
│   ├── heap.prof
│   ├── mutex.prof
│   ├── profile.prof
│   ├── threadcreate.prof
│   └── trace.out
├── 2022-02-24T14-50-57Z
│   ├── allocs.prof
│   ├── block.prof
│   ├── goroutine.prof
│   ├── goroutines.txt
│   ├── heap.prof
│   ├── mutex.prof
│   └── threadcreate.prof
├── config.json
├── host_info.json
├── index.json
├── metrics.json
├── replication_status.json
├── server_status.json
└── vault.log

2 directories, 23 files
The first entry, 2022-02-24T14-50-47Z, is a directory containing runtime profiling information and goroutine data as gathered from the running Vault processes with the Go pprof utility.
These profiles are essentially collections of stack traces and their associated metadata. They are most useful to engineers who are debugging issues and are familiar with the related Vault source code.
Here is a breakdown of the contents of each file.
- allocs.prof: All past memory allocations.
- block.prof: Stack traces which led to blocking on synchronization primitives.
- goroutine.prof: Traces on all current goroutines.
- goroutines.txt: Listing of all goroutines.
- heap.prof: Memory allocation of live objects.
- mutex.prof: Stack traces for holders of contended mutexes.
- profile.prof: CPU profile information.
- threadcreate.prof: Stack traces that led to creation of new OS threads.
- trace.out: CPU trace information.
Visualizing profile information is typically performed with the pprof command by passing in the filename of a .prof file. If you have an established Go environment, you can use it to examine these files.
You can use the pprof tool in both interactive and non-interactive modes. Here are some example non-interactive invocations of the tool against the example data to familiarize you with some of its outputs.
The first example lists the top 10 entries from the CPU profile (profile.prof) in the first capture directory:
$ go tool pprof -top 2022-02-24T14-50-47Z/profile.prof | head -n 16
File: vault
Type: cpu
Time: Feb 10, 2022 at 11:41am (EST)
Duration: 10.10s, Total samples = 260ms ( 2.57%)
Showing nodes accounting for 260ms, 100% of 260ms total
      flat  flat%   sum%        cum   cum%
      80ms 30.77% 30.77%       80ms 30.77%  runtime.futex
      40ms 15.38% 46.15%       40ms 15.38%  runtime.epollwait
      20ms  7.69% 53.85%       30ms 11.54%  runtime.gentraceback
      10ms  3.85% 57.69%       10ms  3.85%  compress/flate.(*huffmanBitWriter).writeTokens
      10ms  3.85% 61.54%       10ms  3.85%  context.WithValue
      10ms  3.85% 65.38%       10ms  3.85%  runtime.(*spanSet).pop
      10ms  3.85% 69.23%       10ms  3.85%  runtime.(*traceBuf).byte (inline)
      10ms  3.85% 73.08%       10ms  3.85%  runtime.casgstatus
      10ms  3.85% 76.92%       10ms  3.85%  runtime.heapBitsForAddr (inline)
      10ms  3.85% 80.77%       10ms  3.85%  runtime.makechan
This shows CPU time and usage for functions in use by Vault.
Another example for examining memory usage would be to use the same command against the heap file instead.
$ go tool pprof -top heap.prof | head -n 15
File: vault
Type: inuse_space
Time: Feb 10, 2022 at 11:41am (EST)
Showing nodes accounting for 40848.57kB, 100% of 40848.57kB total
      flat  flat%   sum%        cum   cum%
28161.83kB 68.94% 68.94% 28161.83kB 68.94%  github.com/hashicorp/vault/enthelpers/merkle.NewMerkleSubPage (inline)
 6192.12kB 15.16% 84.10% 34353.95kB 84.10%  github.com/hashicorp/vault/enthelpers/merkle.(*Tree).getPages
    1322kB  3.24% 87.34%  2346.03kB  5.74%  github.com/hashicorp/vault/enthelpers/merkle.(*Tree).recoverPages
 1024.03kB  2.51% 89.84%  1024.03kB  2.51%  github.com/hashicorp/vault/vendor/google.golang.org/protobuf/internal/impl.consumeBytesNoZero
  544.67kB  1.33% 91.18%   544.67kB  1.33%  github.com/hashicorp/vault/vendor/google.golang.org/protobuf/internal/strs.(*Builder).AppendFullName
  521.05kB  1.28% 92.45%   521.05kB  1.28%  github.com/hashicorp/vault/vendor/go.etcd.io/bbolt.(*node).put
  519.03kB  1.27% 93.72%   519.03kB  1.27%  github.com/hashicorp/vault/vendor/github.com/jackc/pgx/pgtype.NewConnInfo (inline)
  514.63kB  1.26% 94.98%   514.63kB  1.26%  regexp.makeOnePass.func1
  512.88kB  1.26% 96.24%   512.88kB  1.26%  github.com/hashicorp/vault/vault.(*SystemBackend).raftAutoSnapshotPaths
  512.19kB  1.25% 97.49%   512.19kB  1.25%  runtime.malg
You can also generate SVG based call graphs. For example, to generate a graph of goroutines, you would use a command like this.
$ go tool pprof -web goroutine.prof
This will generate an SVG image and open it in the default handler for such images on your system. You can learn more about interpreting call graphs in the pprof documentation.
Tip
If you are new to pprof, there is an excellent article on pprof that explains it thoroughly.
The second directory, 2022-02-24T14-50-57Z, has the same kind of information gathered 10 seconds later.
These are the remaining files which form the debug archive:
- config.json: A JSON representation of the current Vault server configuration.
- host_info.json: Detailed host resource information about CPU, filesystems, memory, etc.
- index.json: A summary of the hcdiag run and files gathered.
- metrics.json: Vault Telemetry metrics data.
- replication_status.json: The current Vault server's Enterprise Replication status.
- server_status.json: The output from the seal-status API.
- vault.log: Entries from the Vault server operational log captured during the hcdiag run.
Configuration file
You can configure hcdiag's behavior with a HashiCorp Configuration Language (HCL) formatted file. Using this file, you can configure behavior by adding your own custom runners, redacting sensitive content using regular expressions, excluding commands, and more.
To run hcdiag with a custom configuration file, just create the file and point hcdiag at it with the -config flag:
$ hcdiag -config /path/to/your/configfile
Tip
This minimal environment doesn't ship with most common command-line text editors, so you'll want to install one with apt-get install nano or apt-get install vim, depending on which one you prefer.
Here is a minimal configuration file, which does two things:
- It adds an agent-level (global) redaction which instructs hcdiag to redact all sensitive content relating to Vault Tokens, when they occur in the format you saw earlier in this tutorial while starting the Vault service. This is a slightly contrived example; please refer to the official hcdiag Documentation for more detailed information about how to redact sensitive content.
- It instructs hcdiag to exclude the vault debug command shown as an example.
diag.hcl
agent {
  redact "regex" {
    match   = "hvs\\.[A-Za-z0-9]{24}"
    replace = "<VAULT TOKEN REDACTED>"
  }
}

product "vault" {
  excludes = ["vault debug"]
}
If you created this file as diag.hcl
and executed hcdiag as follows, then you could expect output like this:
$ hcdiag -vault -config diag.hcl
2022-08-26T18:42:31.391Z [INFO] hcdiag: Ensuring destination directory exists: directory=.
2022-08-26T18:42:31.391Z [INFO] hcdiag: Checking product availability
2022-08-26T18:42:31.509Z [INFO] hcdiag: Gathering diagnostics
2022-08-26T18:42:31.509Z [INFO] hcdiag.product: Running operations for: product=host
2022-08-26T18:42:31.509Z [INFO] hcdiag.product: running operation: product=host runner="uname -v"
2022-08-26T18:42:31.509Z [INFO] hcdiag.product: Running operations for: product=vault
2022-08-26T18:42:31.509Z [INFO] hcdiag.product: running operation: product=vault runner="vault version"
2022-08-26T18:42:31.511Z [INFO] hcdiag.product: running operation: product=host runner=disks
2022-08-26T18:42:31.512Z [INFO] hcdiag.product: running operation: product=host runner=info
2022-08-26T18:42:31.512Z [INFO] hcdiag.product: running operation: product=host runner=memory
2022-08-26T18:42:31.512Z [INFO] hcdiag.product: running operation: product=host runner=process
2022-08-26T18:42:31.513Z [INFO] hcdiag.product: running operation: product=host runner=network
2022-08-26T18:42:31.513Z [INFO] hcdiag.product: running operation: product=host runner=/etc/hosts
2022-08-26T18:42:31.516Z [INFO] hcdiag.product: running operation: product=host runner=iptables
2022-08-26T18:42:31.516Z [WARN] hcdiag.product: result: runner=iptables status=fail result="map[iptables -L -n -v:]" error="exec error, command=iptables -L -n -v, format=string, error=exec: "iptables": executable file not found in $PATH"
2022-08-26T18:42:31.516Z [INFO] hcdiag.product: running operation: product=host runner="/proc/ files"
2022-08-26T18:42:31.530Z [INFO] hcdiag.product: running operation: product=host runner=/etc/fstab
2022-08-26T18:42:31.532Z [INFO] hcdiag: Product done: product=host statuses="map[fail:1 success:9]"
2022-08-26T18:42:31.570Z [INFO] hcdiag.product: running operation: product=vault runner="vault status -format=json"
2022-08-26T18:42:31.637Z [INFO] hcdiag.product: running operation: product=vault runner="vault read sys/health -format=json"
2022-08-26T18:42:31.694Z [INFO] hcdiag.product: running operation: product=vault runner="vault read sys/seal-status -format=json"
2022-08-26T18:42:31.755Z [INFO] hcdiag.product: running operation: product=vault runner="vault read sys/host-info -format=json"
2022-08-26T18:42:31.817Z [INFO] hcdiag.product: running operation: product=vault runner="vault debug -output=/tmp/vault-hcdiag/hcdiag-2022-08-26T184231Z2701148618/VaultDebug.tar.gz -duration=10s -interval=5s"
2022-08-26T18:42:42.898Z [INFO] hcdiag.product: running operation: product=vault runner="log/docker vault"
2022-08-26T18:42:42.899Z [INFO] hcdiag.product: result: runner="log/docker vault" status=skip result= | /bin/sh: 1: docker: not found error="docker not found, container=vault, error=exec error, command=docker version, error=exit status 127"
2022-08-26T18:42:42.899Z [INFO] hcdiag.product: running operation: product=vault runner=journald
2022-08-26T18:42:42.902Z [INFO] hcdiag.product: result: runner=journald status=skip result= | /bin/sh: 1: journalctl: not found error="journald not found on this system, service=vault, error=exec error, command=journalctl --version, error=exit status 127"
2022-08-26T18:42:42.902Z [INFO] hcdiag: Product done: product=vault statuses="map[skip:2 success:6]"
2022-08-26T18:42:42.902Z [INFO] hcdiag: Recording manifest
2022-08-26T18:42:42.903Z [INFO] hcdiag: Created Results.json file: dest=/tmp/vault-hcdiag/hcdiag-2022-08-26T184231Z2701148618/Results.json
2022-08-26T18:42:42.903Z [INFO] hcdiag: Created Manifest.json file: dest=/tmp/vault-hcdiag/hcdiag-2022-08-26T184231Z2701148618/Manifest.json
2022-08-26T18:42:42.916Z [INFO] hcdiag: Compressed and archived output file: dest=hcdiag-2022-08-26T184231Z.tar.gz
2022-08-26T18:42:42.917Z [INFO] hcdiag: Writing summary of products and ops to standard output
product  success  fail  unknown  total
host     9        1     0        10
vault    6        0     2        8
If you compare this output to that of the hcdiag invocation you ran earlier, you'll notice that the Vault Debug information is not present in this example.
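If you want to check what the redaction pattern from diag.hcl matches before relying on it, you can try it against a sample line outside of hcdiag. The following is a minimal sketch, not part of hcdiag itself: the sample token is made up, and sed -E uses POSIX extended regular expressions rather than the regular expression engine hcdiag uses, which should behave the same for a pattern this simple but is an assumption.

```shell
#!/bin/sh
# Same pattern as in diag.hcl. The doubled backslash in the HCL file is
# string escaping, so the pattern itself contains a single backslash.
pattern='hvs\.[A-Za-z0-9]{24}'

# Made-up sample lines (not real tokens): one with 24 characters after
# "hvs." and one that is too short to match.
sample='Root Token: hvs.AbCdEfGhIjKlMnOpQrStUvWx'
short='Root Token: hvs.tooshort'

# Apply the substitution the way the redact block describes it:
# replace every match with the placeholder text.
redacted=$(printf '%s\n' "$sample" | sed -E "s/${pattern}/<VAULT TOKEN REDACTED>/g")
unchanged=$(printf '%s\n' "$short" | sed -E "s/${pattern}/<VAULT TOKEN REDACTED>/g")

printf '%s\n' "$redacted"
printf '%s\n' "$unchanged"
```

The first line is printed with the token replaced by the placeholder, while the second is printed as is because fewer than 24 characters follow the hvs. prefix. Tokens in a different format would pass through unredacted in the same way, which is worth keeping in mind when you write your own patterns.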
Cleanup
Exit the Ubuntu container to return to your terminal prompt.
$ exit
Stop the Docker container. Docker will automatically delete it because the --rm
flag instructs it to do so when the container stops.
$ docker stop vault
Production usage tips
By default, the hcdiag tool includes files for up to 72 hours back from the current time. You can specify the desired time range using the -include-since
flag.
If you have concerns about impacting performance of your Vault servers, you can ensure that runners run serially, instead of concurrently, by invoking hcdiag with the -serial
flag.
Deploying hcdiag in production involves a workflow like the following:
- Place the hcdiag binary on a system that can connect to the Vault server targeted by hcdiag, such as a bastion host or the host itself.
- When running with a configuration file and the -config flag, ensure that the specified configuration file is readable by the user that executes hcdiag.
- Ensure that the current directory, or the directory specified by the -dest flag, is writable by the user that executes hcdiag.
- Ensure connectivity to the HashiCorp products that hcdiag needs to connect to during the run. Export any required environment variables for establishing connection or passing authentication tokens as necessary.
- Decide on a duration for information gathering, noting that the default is to gather for up to 72 hours back in server log output. Adjust this as necessary with the -include-since flag. For example, to include 24 hours of log output, invoke as:
  $ hcdiag -vault -include-since 24h
- Limit what is gathered with the -includes flag. For example, -includes /var/log/vault-*,/var/log/nomad-* instructs hcdiag to only gather logs matching the specified Vault and Nomad filename patterns.
- Use redaction to prevent sensitive information like keys or passwords from reaching hcdiag's output or the generated bundle files.
- Use the -dryrun flag to observe what hcdiag would do without actually running anything, which is useful for testing configuration and options.
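The file and directory checks from the workflow above can be sketched as a small pre-flight helper. This is only an illustration under stated assumptions: hcdiag_preflight is a hypothetical function name, the destination flag is written as -dest on the assumption that this is the flag the workflow refers to, and the helper prints the resulting dry-run command for review instead of executing hcdiag.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of hcdiag).
# Usage: hcdiag_preflight CONFIG_FILE DEST_DIR [SINCE]
hcdiag_preflight() {
    config=$1
    dest=$2
    since=${3:-24h}   # illustrative default; hcdiag's own default is 72 hours

    # The configuration file must be readable by the user running hcdiag.
    if [ ! -r "$config" ]; then
        echo "config file not readable: $config" >&2
        return 1
    fi

    # The destination must be an existing directory writable by that user.
    if [ ! -d "$dest" ] || [ ! -w "$dest" ]; then
        echo "destination not writable: $dest" >&2
        return 1
    fi

    # Print the dry-run command rather than executing it, so it can be
    # reviewed first. The -dest flag name is an assumption (see above).
    echo "hcdiag -vault -config $config -dest $dest -include-since $since -serial -dryrun"
}
```

For example, hcdiag_preflight ./diag.hcl /tmp 24h prints a command line you can inspect and then run yourself, removing -dryrun once the plan looks right.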
Summary
In this tutorial, you learned about the hcdiag tool, and used it to gather information from a running Vault server environment. You also learned about some of hcdiag's configuration flags, the configuration file, and production specific tips for using hcdiag.
Next Steps
For additional information about the tool, check out the hcdiag
GitHub repository.
There are also hcdiag
guides for other HashiCorp tools including Nomad, Terraform, and Consul.