CometBFT Consensus and the Validator Signing Surface
priv_validator, the remote-signer protocol, and double-sign evidence in the Cosmos SDK.
§ I Frame
The first two Interchain cert lessons looked outward. The inaugural lesson covered the SDK upgrade module and IBC versioning, the slow coordination between sovereign chains. The second covered the IBC packet lifecycle and light-client verification, the fast cross-chain traffic. Both treated the chain as a thing that talks to other chains. Today turns the camera around and looks into the engine that makes a single chain agree with itself: CometBFT consensus, and the one act inside it that an operator can get catastrophically wrong, which is signing.
CometBFT (the consensus engine formerly named Tendermint Core) is what produces blocks on a Cosmos chain. The application logic lives in the Cosmos SDK above it, the cross-chain logic in IBC beside it, but the heartbeat is CometBFT, and its security model rests on a single promise from each validator: I will never sign two conflicting messages for the same height and round. The Ops lesson today protected the key that makes those signatures. The Dev lesson scripted the failover that keeps a second copy from ever signing. This lesson names what is actually being signed, the interface the engine uses to ask for it, and the protocol machinery that punishes a validator who breaks the promise.
tome_refs: [] — the Cosmos / CometBFT / Tendermint tome shelf is empty in the Atrium Lattice (R-010 BlockOps acquisition gap, same as the 2026-06-06 cert lesson). Lattice queries for the consensus, staking-module, and double-sign phrases returned zero canonical tome chunks. Acquisition priority surfaced to the Fajr brief §IV.
§ II Domain Foundations: What a Validator Signs
CometBFT runs each block height as a small state machine of rounds, and within a round a validator can be asked to sign three distinct kinds of message. Hold the three apart, because the protocol treats them very differently when one is duplicated.
A proposal is the block a designated proposer puts forward for a given height and round. Only the round's proposer signs a proposal, and proposer duty rotates deterministically by voting power. A prevote is each validator's first vote, cast after it receives and validates a proposal, declaring the block it is willing to accept this round (or a special nil prevote for "nothing valid yet"). A precommit is the binding second vote, cast once a validator has seen prevotes from more than two-thirds of voting power for the same block, declaring that it commits to that block. When more than two-thirds of voting power precommits the same block, the block is final, and on a Tendermint-family chain final means final: there is no reorganization, no probabilistic settling, no deeper-confirmation horizon. That instant finality is exactly why the indexer lesson could collapse its finality gate to a single block.
Two numbers govern the safety of this dance. A block needs prevotes and precommits from more than two-thirds of voting power to advance, and the protocol can tolerate faults from strictly less than one-third. The gap between those thresholds is the Byzantine fault tolerance margin: the chain stays both safe and live as long as less than a third of voting power is faulty. A validator that double-signs spends its stake against that margin, which is why the protocol does not merely frown at a double-sign but treats it as an attack on the chain's safety.
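The two thresholds can be checked with plain integer arithmetic. The sketch below is illustrative only: the function names are hypothetical and it is not CometBFT's own code, just the "more than two-thirds" and "less than one-third" conditions written out.

```python
def has_two_thirds_majority(power_for_block: int, total_power: int) -> bool:
    """True when power_for_block is strictly more than 2/3 of total_power."""
    # Cross-multiplying keeps the comparison exact at the boundary,
    # where floating-point division could round the wrong way.
    return 3 * power_for_block > 2 * total_power


def max_tolerated_faulty_power(total_power: int) -> int:
    """Largest faulty power f that still satisfies 3 * f < total_power."""
    return (total_power - 1) // 3


# With 100 units of voting power, 67 is a quorum and 66 is not.
print(has_two_thirds_majority(67, 100), has_two_thirds_majority(66, 100))
```

Note that exactly two-thirds is not enough: with 99 units of voting power, 66 fails the check and 67 passes.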
§ III The priv_validator Interface
CometBFT does not hard-code where signatures come from. It defines an interface, PrivValidator, with a small contract: report the public key, sign a vote, sign a proposal. Anything that satisfies that contract can be the validator's signer, and the choice of implementation is the whole security decision the Ops lesson dramatized.
The default implementation is FilePV, which reads an Ed25519 private key from priv_validator_key.json on the same host as the node and signs in-process. FilePV also owns the anti-double-sign state, persisted in priv_validator_state.json: the last height, round, and step it signed, together with what it signed there. Before signing anything, FilePV checks the request against that state. It refuses any request below the last height-round-step it signed, and at that exact height-round-step it will only hand back the signature it already produced for the same message, never a new signature over a different one. This file is the protocol-level expression of the "amnesia" death from the Ops lesson. Lose priv_validator_state.json and the signer forgets what it signed; a fresh state file resets the last-signed height to zero, and the next request for an already-signed height passes the check.
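The monotonicity check can be sketched as a comparison of (height, round, step) tuples. This is a simplified model, not FilePV's actual code: the names LastSignState and check_and_advance are hypothetical, the numeric step values are an assumption, and a real signer must also persist the new state to disk before releasing a signature.

```python
from dataclasses import dataclass

# Step ordering within a round; the numeric values are an assumption of this sketch.
STEP_PROPOSE, STEP_PREVOTE, STEP_PRECOMMIT = 1, 2, 3


class DoubleSignRefused(Exception):
    """Raised when signing would regress or conflict with the last-signed state."""


@dataclass
class LastSignState:
    # A fresh state starts at zero, which is what a lost state file looks like.
    height: int = 0
    round: int = 0
    step: int = 0
    sign_bytes: bytes = b""


def check_and_advance(state: LastSignState, height: int, round_: int,
                      step: int, sign_bytes: bytes) -> bool:
    """Return True if a new signature may be produced, False if the stored
    signature for the identical message can be reused; raise otherwise."""
    requested = (height, round_, step)
    last = (state.height, state.round, state.step)
    if requested < last:
        raise DoubleSignRefused(f"regression: {requested} is below {last}")
    if requested == last:
        if sign_bytes == state.sign_bytes:
            return False  # same message again: reuse, do not re-sign
        raise DoubleSignRefused(f"conflicting message at {requested}")
    # Strictly ahead of everything signed so far: record it, then sign.
    state.height, state.round, state.step = requested
    state.sign_bytes = sign_bytes
    return True
```

The amnesia failure falls out directly: a freshly constructed LastSignState() accepts a conflicting message at a height the old state would have refused, because the record of the earlier signature is gone.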
The production implementation replaces FilePV with a remote signer reached over a socket. The node holds no key; it speaks the remote-signer protocol to a separate process that implements the same PrivValidator contract on a different machine. The contract is identical, so CometBFT cannot tell whether it is talking to a local file or a remote hardware-backed signer, and that indifference is the design's strength: the security model swaps underneath the engine without the engine changing.
PrivValidator
    GetPubKey()         -> consensus public key
    SignVote(vote)      -> vote with signature, or error if it would double-sign
    SignProposal(prop)  -> proposal with signature, or error if it would double-sign
Read the contract carefully and the safety property is visible in the return types. SignVote and SignProposal are allowed to refuse. A correct signer returns an error rather than a signature whenever signing would violate the last-signed-height monotonicity, and the consensus engine treats that refusal as "I will miss this vote," which costs liveness, never as "sign anyway."
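A minimal sketch of that contract, assuming Python-style names rather than the engine's real Go signatures. ToySigner and cast_vote are hypothetical, the HMAC is only a placeholder for an Ed25519 signature, and the signer here refuses anything at or below its last-signed position, which is stricter than a signer that reuses a stored signature.

```python
import hashlib
import hmac
from typing import Optional, Protocol

STEP_PROPOSE = 1  # assumed step number for proposals in this sketch


class SignRefused(Exception):
    """The signer declined because signing would not advance its state."""


class PrivValidator(Protocol):
    def get_pub_key(self) -> bytes: ...
    def sign_vote(self, height: int, round_: int, step: int, block_hash: bytes) -> bytes: ...
    def sign_proposal(self, height: int, round_: int, block_hash: bytes) -> bytes: ...


class ToySigner:
    """In-memory signer that satisfies the contract (illustration only)."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._last = (0, 0, 0)  # last signed (height, round, step)

    def get_pub_key(self) -> bytes:
        # Placeholder identity; a real signer returns an Ed25519 public key.
        return hashlib.sha256(self._secret).digest()

    def sign_vote(self, height: int, round_: int, step: int, block_hash: bytes) -> bytes:
        if (height, round_, step) <= self._last:
            raise SignRefused("request does not advance past last-signed state")
        self._last = (height, round_, step)
        message = f"{height}/{round_}/{step}/".encode() + block_hash
        return hmac.new(self._secret, message, hashlib.sha256).digest()

    def sign_proposal(self, height: int, round_: int, block_hash: bytes) -> bytes:
        return self.sign_vote(height, round_, STEP_PROPOSE, block_hash)


def cast_vote(signer: PrivValidator, height: int, round_: int,
              step: int, block_hash: bytes) -> Optional[bytes]:
    """Engine side: a refusal means a missed vote, never 'sign anyway'."""
    try:
        return signer.sign_vote(height, round_, step, block_hash)
    except SignRefused:
        return None
```

Because cast_vote depends only on the PrivValidator protocol, a local or remote implementation can be swapped in without the caller changing.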
§ IV The Remote-Signer Protocol and Double-Sign Evidence
The remote-signer wire protocol is a request-response conversation over a length-prefixed connection. The node sends a sign-vote or sign-proposal request carrying the vote or proposal to be signed, along with the chain ID; the signer validates the request against its own monotonic state, signs if and only if the request advances past the last-signed height-round-step, and returns either the signature or an error. Two operational facts about this protocol decide whether it is safe in production. First, the signer, not the node, owns the anti-double-sign state, so the safety check lives with the key rather than with the easily-restarted node. Second, when the signer connection drops, the node misses votes rather than falling back to a local key, because a fallback to a local FilePV with stale state is precisely the path that re-signs an already-signed height.
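The request-response shape can be sketched with a toy framing layer. Everything here is a stand-in: the fixed 4-byte length prefix, the JSON payloads, the field names, and the placeholder signature string are assumptions of this sketch and do not reproduce the real wire format.

```python
import io
import json
import struct
from typing import List, Optional


def write_frame(stream, message: dict) -> None:
    """Write one message as a 4-byte big-endian length followed by the payload."""
    payload = json.dumps(message).encode()
    stream.write(struct.pack(">I", len(payload)) + payload)


def read_frame(stream) -> dict:
    """Read one length-prefixed message, or raise if the connection dropped."""
    header = stream.read(4)
    if len(header) != 4:
        raise ConnectionError("connection closed before a full length prefix arrived")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) != length:
        raise ConnectionError("connection closed mid-message")
    return json.loads(payload)


def signer_handle(last_signed: List[int], request: dict) -> dict:
    """Signer side: the monotonic state lives here, next to the key."""
    hrs = [request["height"], request["round"], request["step"]]
    if hrs <= last_signed:
        return {"error": "would double-sign"}
    last_signed[:] = hrs
    # Placeholder string instead of a real signature.
    return {"signature": "sig-over-{}-{}-{}-{}".format(*hrs, request["block"])}


def node_request_vote(to_signer, from_signer, request: dict) -> Optional[str]:
    """Node side: on a dropped connection or an error reply, miss the vote."""
    write_frame(to_signer, request)
    try:
        response = read_frame(from_signer)
    except ConnectionError:
        return None  # no local key to fall back to
    return response.get("signature")
```

The node-side function has exactly two outcomes, a signature or a missed vote, which is the second operational fact from the paragraph above expressed in code.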
When prevention fails and a validator does sign two conflicting messages, the protocol's response moves to the application layer through two Cosmos SDK modules. The x/evidence module processes evidence of misbehavior, and the relevant kind here is duplicate-vote evidence: two votes from the same validator, for the same height and round and vote type, on different block hashes. Any node that observes both votes can construct this evidence and gossip it to its peers; a proposer then includes it in a block, and CometBFT passes it to the application over ABCI, where x/evidence handles it. Both votes are signed by the validator's own consensus key and are therefore self-proving. No trust is required; the signatures convict.
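The duplicate-vote condition is a short predicate. The sketch below uses a hypothetical Vote record, and the signature_valid flag stands in for verifying each signature against the validator's consensus public key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    validator: str
    height: int
    round: int
    vote_type: str         # "prevote" or "precommit"
    block_hash: str
    signature_valid: bool  # stand-in for real signature verification


def is_duplicate_vote(a: Vote, b: Vote) -> bool:
    """True when the two votes together prove equivocation by one validator."""
    return (
        a.signature_valid and b.signature_valid
        and a.validator == b.validator
        and a.height == b.height
        and a.round == b.round
        and a.vote_type == b.vote_type
        and a.block_hash != b.block_hash
    )


first = Vote("val1", 100, 0, "precommit", "AAAA", True)
second = Vote("val1", 100, 0, "precommit", "BBBB", True)
print(is_duplicate_vote(first, second))
```

Votes in different rounds of the same height do not satisfy the predicate, and neither do two copies of the same vote.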
The x/slashing module then applies the penalty. On verified duplicate-vote evidence it slashes a fixed fraction of the validator's bonded stake (the slash_fraction_double_sign parameter, commonly five percent), jails the validator, and tombstones it. The tombstone is permanent: a tombstoned validator cannot be unjailed and can never rejoin the active set with that consensus key, so the only path back is a brand-new validator with a new key and a fresh delegation cycle. The slash hits delegators proportionally, which is why a double-sign is not only the operator's loss but a breach of trust with everyone who staked behind them. The same module handles the gentler downtime path: a validator that misses too many blocks in a signing window is slashed a much smaller slash_fraction_downtime and jailed temporarily. The two fractions encode the asymmetry the Ops lesson built its procedures around, written directly into chain parameters.
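The proportional hit is simple arithmetic. In this sketch the five percent figure comes from the text above, while the downtime fraction and the stake amounts are hypothetical values chosen only to show the asymmetry; real chains set both fractions as parameters, and the SDK's own accounting has more detail than this.

```python
from fractions import Fraction
from typing import Dict

SLASH_FRACTION_DOUBLE_SIGN = Fraction(5, 100)
# Hypothetical value for illustration; check the chain's actual parameter.
HYPOTHETICAL_SLASH_FRACTION_DOWNTIME = Fraction(1, 10_000)


def slash_amounts(bonded: Dict[str, int], fraction: Fraction) -> Dict[str, int]:
    """Tokens burned from each staker, proportional to what they bonded."""
    return {holder: int(amount * fraction) for holder, amount in bonded.items()}


# Hypothetical validator: operator self-bond plus one delegator.
bonded = {"operator": 1_000_000, "delegator": 9_000_000}

double_sign = slash_amounts(bonded, SLASH_FRACTION_DOUBLE_SIGN)
downtime = slash_amounts(bonded, HYPOTHETICAL_SLASH_FRACTION_DOWNTIME)
print(double_sign)  # {'operator': 50000, 'delegator': 450000}
print(downtime)     # {'operator': 100, 'delegator': 900}
```

With these numbers the delegator loses nine times what the operator does in absolute terms, because the slash follows stake, not fault.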
§ V Worked Example: A Dropped Signer Connection That Does Not Slash
Trace one incident through the surfaces above. A validator runs a remote signer; the socket between node and signer drops because the signer host reboots for a kernel patch. CometBFT's node now has no PrivValidator that answers, so SignVote requests time out. The correct node behavior is to miss prevotes and precommits for the blocks that pass during the outage, accumulating toward the downtime window but signing nothing. When the signer host returns and the socket re-establishes, the signer resumes from its persisted last-signed state, and because that state never reset, it refuses any stale re-sign and proceeds cleanly from the current height.
Compare the slashing branch the discipline avoids. Suppose the operator, anxious about missed blocks, had configured a local FilePV fallback and restored the signer host from a backup snapshot taken an hour earlier. The restored priv_validator_state.json now claims a last-signed height an hour in the past. Asked to sign at a height and round it had already signed before the reboot, the signer sees nothing in its rolled-back state preventing it and signs, while the rest of the network still holds the validator's earlier signature for that same height and round. Two signatures, same height and round, different blocks. A watching node builds duplicate-vote evidence, the evidence lands in a block, and x/slashing tombstones the validator. The protocol did exactly what it promises; the operator handed it the evidence. The lesson the chain enforces is the lesson the Ops procedure encodes: never restore signing state backward, and never give the node a key to fall back to.
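Both branches of the incident can be replayed in a toy simulation. The class and function names, the heights, and the block labels below are all hypothetical; the signer keeps only an in-memory (height, round, step) tuple, and the snapshot is modeled as a copy of the signer taken partway through.

```python
import copy
from typing import Dict, Optional, Tuple

Vote = Tuple[int, int, int, str]  # (height, round, step, block)
PRECOMMIT = 3  # assumed step number for this sketch


class ToyRemoteSigner:
    def __init__(self) -> None:
        self.last_signed = (0, 0, 0)

    def sign(self, height: int, round_: int, step: int, block: str) -> Optional[Vote]:
        """Sign only if the request advances past the last-signed state."""
        if (height, round_, step) <= self.last_signed:
            return None  # refusal: the node misses this vote
        self.last_signed = (height, round_, step)
        return (height, round_, step, block)


def run_until_reboot():
    """Sign heights 100..105, taking a backup snapshot after height 102."""
    signer = ToyRemoteSigner()
    recorded: Dict[Tuple[int, int, int], Vote] = {}
    snapshot = None
    for height in range(100, 106):
        vote = signer.sign(height, 0, PRECOMMIT, f"A{height}")
        recorded[vote[:3]] = vote  # what the rest of the network has seen
        if height == 102:
            snapshot = copy.deepcopy(signer)
    return signer, snapshot, recorded


def is_equivocation(recorded, vote: Optional[Vote]) -> bool:
    """A new vote conflicts if the network holds a different block for its slot."""
    if vote is None:
        return False
    earlier = recorded.get(vote[:3])
    return earlier is not None and earlier[3] != vote[3]


intact, restored, recorded = run_until_reboot()
stale_request = (104, 0, PRECOMMIT, "B104")

print(intact.sign(*stale_request))                              # None: refused
print(is_equivocation(recorded, restored.sign(*stale_request))) # True: slashable
```

The signer that kept its state refuses the stale request and costs only liveness; the signer restored from the earlier snapshot signs it and produces the second, conflicting vote.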
§ VI Connection to Today's Ops and Dev Lessons
The Ops lesson and this lesson are two descriptions of one boundary. The Ops lesson named the three deaths in operational terms: theft, operator duplication, lost anti-double-sign state. This lesson names the same three in protocol terms: a stolen consensus key signs valid votes from anywhere, a duplicated signer produces the conflicting votes that x/evidence convicts on, and a reset priv_validator_state.json defeats the monotonicity check that FilePV and every remote signer rely on. The remote-signer protocol is the wire-level form of the Ops lesson's first primitive, and the tombstone is the protocol-level form of its opening asymmetry.
The Dev lesson scripts the exact prevention this protocol assumes. The remote signer owning its monotonic state only helps if the failover never starts a second signer and never restores state backward, and those guarantees live in the lockfile-and-ordering guard the Bash lesson wrote. The protocol convicts on duplicate signatures; the shell script is what keeps two signers from ever existing to produce them.
§ VII Practice Questions
FilePV risks signing with stale or independent anti-double-sign state, producing a second signature for a height the remote signer already signed, which is exactly the duplicate-vote condition x/slashing tombstones.
Downtime applies a small slash_fraction_downtime with temporary jailing and is recoverable by unjailing; double-sign applies a much larger slash_fraction_double_sign plus a permanent tombstone that bars the consensus key forever. Both fractions are chain parameters in x/slashing.
§ VIII Closing
A Cosmos validator's entire risk reduces to one signed message it must never duplicate. CometBFT asks for three kinds of signature, the PrivValidator interface decides who answers, the remote-signer protocol moves that answer off the exposed node, and x/evidence plus x/slashing stand ready to convict on the validator's own signatures the instant the promise breaks. Study the priv_validator_state.json monotonicity check until it is obvious why restoring it backward is the one restore an operator must never run.
Prior arc → Cert-Prep/Interchain/2026-06-06-ibc-packet-lifecycle-and-the-light-client-verification-surface
Paired Ops → δ-Chain/Synthesis-Lessons/2026-06-13-validator-signing-key-security-and-the-double-sign-prevention-discipline
Paired Dev → Polyglot-Dev/Bash/2026-06-13-bash-for-validator-key-ceremony-and-sentry-firewall-automation
Cert-Prep Lesson · Interchain Dev (Cosmos) · 2026-06-13 · ROD v3