Bash for Validator Key-Ceremony and Sentry-Firewall Automation
Strict mode, trap-based cleanup, and the idempotent guard.
§ IFrame
Most shell scripts fail cheaply. A backup that exits early just runs again tomorrow. The scripts in today's Ops lesson are not that kind. A key-ceremony script that exits halfway and leaves an unencrypted key on disk has created the theft surface the whole architecture exists to remove. A failover script that starts a new signer before the old one dies has produced the double-sign that forfeits the stake. The shell is where validator security is enforced or quietly thrown away, and Bash gives you a small set of idioms that decide which.
This is the first Bash lesson in the Polyglot-Dev track, and it opens on the highest-stakes shell work in the curriculum on purpose. The language is forgiving by default and that default is the danger: a Bash script that hits an error keeps running, a variable that was never set expands to nothing, and a pipeline reports success as long as its last stage succeeded. Each of those defaults is a way for a security script to do half its job and announce victory. The first move in any serious script is to turn the defaults off.
The Linux Command Line — Shotts · Writing Shell Scripts · pp 500–501 · referenced (foundational-substrate)
Running Linux, 4th Edition — O'Reilly · Ch 17 Basic Security (firewalls) · pp 562–563 · referenced (foundational-substrate)
§ IILanguage Idiom — Strict Mode and the Trap
Bash decides whether a script is safe in its first two lines. The first is strict mode, the second is a cleanup trap, and a script that handles secret material without both is a script betting nothing will go wrong.
Strict mode is three options set together. set -e makes the script exit the moment any command returns a non-zero status, so a failed cp of a key file stops the script instead of letting the next line run against a key that is not there. set -u makes the expansion of an unset variable an error, so a typo in $KEY_DIR aborts rather than expanding to nothing and pointing a destructive command at the filesystem root. set -o pipefail makes a pipeline fail if any stage fails, not just the last, so gpg --decrypt key.gpg | install_to_signer reports the truth when the decrypt fails and the install reads empty input. The exit-status mechanism all three rest on is the one The Linux Command Line builds its flow-control chapter around: every command returns a status, zero means success, and disciplined scripts test that status rather than assuming it (The Linux Command Line, ch. 27, p. 393).
The canonical header reads:
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
The IFS line resets word-splitting to newline and tab only, removing the space from the separator set so that a path or filename containing a space survives expansion intact. Shotts treats shell scripts as ordinary programs that deserve ordinary rigor (The Linux Command Line, writing-shell-scripts material, pp. 500–501), and this header is rigor in three lines: stop on error, stop on the unset, stop on the broken pipe.
The trap is the second idiom. A key ceremony decrypts or generates key material into a working directory, hands it to the signer, and must leave nothing behind. If the script exits early, the working directory is exactly the unencrypted-key-on-disk that §I named as the theft surface. A trap binds a cleanup function to the EXIT signal so the cleanup runs no matter how the script ends, including the strict-mode abort:
WORK="$(mktemp -d)"
cleanup() {
if [[ -d "$WORK" ]]; then
find "$WORK" -type f -exec shred --remove --zero {} +
rmdir "$WORK"
fi
}
trap cleanup EXIT
Because the trap is registered before any secret touches WORK, every exit path runs through cleanup: normal completion, a set -e abort on a failed command, an operator's Ctrl-C, a kill. The shred overwrites the file content before unlinking, so the bytes do not survive on the disk for forensic recovery. A script that deletes secrets only on the happy path is a script that leaves secrets behind exactly when something went wrong, which is exactly when an attacker is most interested.
§ IIICode Worked Example — The No-Double-Signer Failover Guard
The Ops lesson's §IV failover has one absolute rule: never let two signers hold the key at once. The script that moves a signer enforces that rule with a lockfile and an ordering check, and the ordering is the whole point.
A lockfile makes the failover mutually exclusive with itself, so two operators running the script at once cannot interleave into a double start:
LOCK="/run/validator/failover.lock"
exec 9>"$LOCK"
if ! flock --nonblock 9; then
echo "failover already in progress; refusing to start a second run" >&2
exit 1
fi
flock on file descriptor 9 holds the lock for the life of the script and releases it automatically on exit, so a crashed run does not leave a stale lock wedged forever. The non-blocking flag turns "someone else is running this" into an immediate refusal rather than a silent wait that an impatient operator might Ctrl-C and retry.
The ordering check is the safety property. The new signer must not start until the old signer is provably dead, so the script confirms the old host's signer process is gone before it moves anything:
confirm_old_signer_dead() {
local host="$1"
if ssh "$host" 'pgrep -x tmkms >/dev/null'; then
echo "old signer still running on $host; aborting before any state moves" >&2
return 1
fi
return 0
}
Under set -e, a return 1 from this function inside the main flow aborts the whole script, so a still-running old signer stops the failover cold before a single byte of key state moves. The function is written to fail loud and fail early, because the expensive mistake is to proceed.
The move itself treats the key and the anti-double-sign state as one unit, transferred only after the death check passes:
migrate_signer() {
local old="$1" new="$2"
confirm_old_signer_dead "$old"
rsync --archive --remove-source-files \
"$old:/var/lib/tmkms/secrets/" "$new:/var/lib/tmkms/secrets/"
ssh "$new" 'systemctl start tmkms'
}
Because confirm_old_signer_dead runs first and aborts on failure, rsync and the systemctl start are unreachable while the old signer lives. The new signer therefore starts with the migrated state, inheriting the last-signed height rather than booting blank. The R-and-Python lesson two days ago drew a line between exploring a question and enforcing an answer; the same line runs here, between the operator's intent to fail over and the script's enforcement that the failover cannot double-sign.
§ IVConnection to Today's Ops Lesson
The Ops lesson named three primitives: isolate the key, isolate the network position, isolate the right to sign. Each lands in the shell as a specific idiom. Key isolation is the trap-protected key ceremony from §II, which guarantees the key never outlives the script that handled it. Network-position isolation is the sentry firewall, which Running Linux frames as the basic-security task of allowing only the connections you intend and denying the rest (Running Linux, ch. 17, pp. 562–563): the signer host's firewall script permits inbound only from the validator node's address and drops everything else, and it is written to be idempotent so that re-running it converges on the same rule set instead of stacking duplicate rules. Sign-right isolation is the §III lockfile-and-ordering guard, which makes "two signers at once" unreachable in code rather than merely discouraged in a runbook.
The connection runs the other way too. The Ops lesson's asymmetry, that a minute of downtime is cheap and a double-sign is fatal, is the reason the §III script fails loud and early at every check. A script that swallowed errors would optimize for finishing; this one optimizes for refusing to finish wrong.
§ VPrior-Lesson Reach
The R and Python lesson (Polyglot-Dev/R, 2026-06-11) coined the two-language discipline: explore in one tool, enforce in another. Bash sits on the enforce side of its own version of that split. An operator explores a failover interactively, typing commands and reading output, but the moment the procedure must be correct every time, it moves into a script with strict mode and traps, because the interactive shell forgives and the production shell must not.
The chain-indexing and validator-operations lessons earlier in the δ arc both leaned on one idea this lesson makes mechanical: persistent state plus reconciliation beats in-memory bookkeeping. The indexer committed a height cursor atomically with its data; the validator failover moves the anti-double-sign state as a unit with the key. Both refuse to trust memory across a restart, and both express that refusal in the boring, durable act of writing the right thing to disk in the right order.
§ VIClosing
Bash earns its place in this curriculum on exactly the work where its defaults are most dangerous. Three lines turn the defaults off, one trap guarantees secrets do not outlive the script that touched them, and one lockfile-plus-ordering check makes the catastrophic outcome unreachable rather than unlikely. The shell is not where validator security is designed. It is where it is enforced, and a security design that lives only in a runbook and not in a strict-mode script is a design that holds until the night an operator is tired.
Write the failover as a script before the disk fails. Read it cold the next morning, and ask of every command what happens when it is the one that breaks.
Paired Ops → δ-Chain/Synthesis-Lessons/2026-06-13-validator-signing-key-security-and-the-double-sign-prevention-discipline
Paired Cert → Cert-Prep/Interchain/2026-06-13-cometbft-consensus-and-the-validator-signing-surface
Grounding tome → The Linux Command Line — Shotts (Ch 27 p 393; writing-shell-scripts pp 500–501)
Synthesis Lesson · Bash (Ops & Automation) · 2026-06-13 · ROD v3