Hedronite · Cert-Prep Lesson · Interchain Dev (Cosmos) · Sat 2026-06-13

CometBFT Consensus and the Validator Signing Surface

priv_validator, the remote-signer protocol, and double-sign evidence in the Cosmos SDK.

Lesson Class: Cert-Prep Synthesis
Cert Slot: Interchain Dev (Cosmos) · Interchain Foundation
Word Count: ~2,550
Paired Ops: Validator Signing-Key Security and the Double-Sign-Prevention Discipline
Paired Dev: Bash for Validator Key-Ceremony and Sentry-Firewall Automation
Discipline: ROD v3 · q-card practice questions

§ I Frame

The first two Interchain cert lessons looked outward. The inaugural lesson covered the SDK upgrade module and IBC versioning, the slow coordination between sovereign chains. The second covered the IBC packet lifecycle and light-client verification, the fast cross-chain traffic. Both treated the chain as a thing that talks to other chains. Today turns the camera around and looks into the engine that makes a single chain agree with itself: CometBFT consensus, and the one act inside it that an operator can get catastrophically wrong, which is signing.

CometBFT (the consensus engine formerly named Tendermint Core) is what produces blocks on a Cosmos chain. The application logic lives in the Cosmos SDK above it, the cross-chain logic in IBC beside it, but the heartbeat is CometBFT, and its security model rests on a single promise from each validator: I will never sign two conflicting messages for the same height and round. The Ops lesson today protected the key that makes those signatures. The Dev lesson scripted the failover that keeps a second copy from ever signing. This lesson names what is actually being signed, the interface the engine uses to ask for it, and the protocol machinery that punishes a validator who breaks the promise.

Tome Grounding — Knowledge Gap tome_refs: [] — the Cosmos / CometBFT / Tendermint tome shelf is empty in the Atrium Lattice (R-010 BlockOps acquisition gap, same as the 2026-06-06 cert lesson). Lattice queries for the consensus, staking-module, and double-sign phrases returned zero canonical tome chunks. Acquisition priority surfaced to the Fajr brief §IV.

§ II Domain Foundations — What a Validator Signs

CometBFT runs each block height as a small state machine of rounds, and within a round a validator can be asked to sign three distinct kinds of message. Hold the three apart, because the protocol treats them very differently when one is duplicated.

A proposal is the block a designated proposer puts forward for a given height and round. Only the round's proposer signs a proposal, and proposer duty rotates deterministically by voting power. A prevote is each validator's first vote, cast after it receives and validates a proposal, declaring the block it is willing to accept this round (or a special nil prevote for "nothing valid yet"). A precommit is the binding second vote, cast once a validator has seen prevotes from more than two-thirds of voting power for the same block, declaring that it commits to that block. When more than two-thirds of voting power precommits the same block, the block is final, and on a Tendermint-family chain final means final: there is no reorganization, no probabilistic settling, no deeper-confirmation horizon. That instant finality is exactly why the indexer lesson could collapse its finality gate to a single block.
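Proposer duty that "rotates deterministically by voting power" works like a weighted round-robin. The sketch below is a simplified Python model. It assumes that each validator's priority grows by its voting power every round and that the winner is then charged the total power. CometBFT's real proposer-priority code also centers and rescales priorities and has its own tie-breaking rule, and none of that is modeled here. `Validator` and `next_proposer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Validator:
    """Hypothetical, stripped-down validator-set entry."""
    name: str
    power: int
    priority: int = 0


def next_proposer(vals: list[Validator]) -> str:
    """Advance every priority by one round and return the chosen proposer.

    Simplified weighted round-robin: no priority centering or rescaling.
    Ties go to the earlier list entry (an assumption of this sketch).
    """
    total = sum(v.power for v in vals)
    for v in vals:
        v.priority += v.power
    best = max(vals, key=lambda v: v.priority)  # max() returns the first of equal keys
    best.priority -= total
    return best.name


vals = [Validator("A", 3), Validator("B", 1)]
print([next_proposer(vals) for _ in range(4)])  # ['A', 'A', 'B', 'A']
```

Over any four rounds, A (power 3) proposes three times and B (power 1) proposes once. That proportionality is what the rotation is meant to deliver.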

Two numbers govern the safety of this dance. A block needs prevotes and precommits from more than two-thirds of voting power to advance, and the protocol can tolerate faults from strictly less than one-third. The gap between those thresholds is the Byzantine fault tolerance margin: the chain stays both safe and live as long as less than a third of voting power is faulty. A double-signing validator is, by definition, one of those faults. Its voting power counts against the one-third margin, and if faulty power ever reaches one-third, the protocol no longer guarantees that two conflicting blocks cannot both be committed. That is why the protocol does not merely frown at a double-sign but treats it as an attack on the chain's safety.
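Once both thresholds are cross-multiplied, they become plain integer comparisons, which avoids floating-point trouble at the boundary. Here is a minimal Python sketch. The helper names are hypothetical.

```python
def has_quorum(power: int, total: int) -> bool:
    """True when `power` is strictly more than two-thirds of `total`.

    Cross-multiplying keeps the comparison in exact integer arithmetic.
    """
    return 3 * power > 2 * total


def max_tolerated_faulty(total: int) -> int:
    """Largest faulty voting power that is still strictly less than one-third."""
    return (total - 1) // 3


total = 100
print(has_quorum(67, total), has_quorum(66, total))  # True False
print(max_tolerated_faulty(total))                   # 33
```

The two helpers agree with each other. For any total, subtracting the largest tolerated faulty power still leaves enough honest power to form a quorum.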

§ III The priv_validator Interface

CometBFT does not hard-code where signatures come from. It defines an interface, PrivValidator, with a small contract: report the public key, sign a vote, sign a proposal. Anything that satisfies that contract can be the validator's signer, and the choice of implementation is the whole security decision the Ops lesson dramatized.

The default implementation is FilePV. It reads an Ed25519 private key from priv_validator_key.json on the same host as the node and signs in-process. FilePV also owns the anti-double-sign state, persisted in priv_validator_state.json: the last height, round, and step it signed, along with the sign-bytes and signature of that last message. Before signing anything, FilePV checks the request against that state. It refuses anything below the last signed height-round-step outright. At exactly that height-round-step, it will only hand back the signature it already produced for the same data, and it never produces a fresh signature over different data. This file is the protocol-level expression of the "amnesia" death from the Ops lesson. Lose priv_validator_state.json and the signer forgets what it signed. A fresh state file resets the last-signed height to zero, so the next request for an already-signed height passes the check.
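FilePV's check can be sketched in Python as a simplified model under stated assumptions. The `LastSignState` record, the `check_and_sign` helper, and the step numbering (2 for a prevote, 3 for a precommit) are all hypothetical. The signing function is passed in rather than implemented. Real FilePV also tolerates a differing vote timestamp at the same height-round-step, and this sketch omits that.

```python
from dataclasses import dataclass
from typing import Callable


class DoubleSignRefused(Exception):
    """Raised instead of returning a signature that could double-sign."""


@dataclass
class LastSignState:
    """Hypothetical mirror of what priv_validator_state.json persists."""
    height: int = 0
    round: int = 0
    step: int = 0
    sign_bytes: bytes = b""
    signature: bytes = b""


def check_and_sign(state: LastSignState, height: int, round_: int, step: int,
                   sign_bytes: bytes, sign: Callable[[bytes], bytes]) -> bytes:
    """Sign only if (height, round, step) moves forward past the stored state."""
    requested = (height, round_, step)
    last = (state.height, state.round, state.step)
    if requested < last:  # tuple comparison is lexicographic: height, then round, then step
        raise DoubleSignRefused(f"regression: {requested} is below {last}")
    if requested == last:
        # Same height-round-step: only re-return the stored signature for identical bytes.
        if sign_bytes == state.sign_bytes:
            return state.signature
        raise DoubleSignRefused(f"conflicting data at {requested}")
    signature = sign(sign_bytes)
    # A real signer must durably persist this state before releasing the signature.
    state.height, state.round, state.step = requested
    state.sign_bytes, state.signature = sign_bytes, signature
    return signature


def fake_sign(data: bytes) -> bytes:
    """Stand-in signer for the demo; not real cryptography."""
    return b"sig:" + data


state = LastSignState()
check_and_sign(state, 10, 0, 2, b"prevote block A", fake_sign)
try:
    check_and_sign(state, 10, 0, 2, b"prevote block B", fake_sign)
except DoubleSignRefused as err:
    print(err)  # conflicting data at (10, 0, 2)
```

Replace the populated state with a fresh `LastSignState()` and the same conflicting request goes through. That single substitution is the amnesia failure.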

The production implementation replaces FilePV with a remote signer reached over a socket. Setting priv_validator_laddr in the node's config.toml makes the node listen on that address, and the signer process dials in to it. The node holds no key. Instead, it speaks the remote-signer protocol to a separate process on a different machine that implements the same PrivValidator contract. Because the contract is identical, CometBFT cannot tell whether it is talking to a local file or a remote hardware-backed signer. That indifference is the design's strength: the security model swaps underneath the engine without the engine changing.

PrivValidator
  GetPubKey()        -> consensus public key
  SignVote(vote)     -> vote with signature, or error if it would double-sign
  SignProposal(prop) -> proposal with signature, or error if it would double-sign

Read the contract carefully and the safety property is visible in the return types. SignVote and SignProposal are allowed to refuse. A correct signer returns an error rather than a signature whenever signing would violate last-signed height-round-step monotonicity. The consensus engine treats that refusal as "I will miss this vote," which costs liveness, and never as "sign anyway."
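The engine side of that contract fits in a short Python sketch. It assumes a hypothetical `SignRefused` exception that covers both a double-sign refusal and an unreachable signer. The `Protocol` class is a Python rendering of the three-method contract above, not CometBFT's Go API, and `cast_vote` and `OfflineSigner` are illustrative names.

```python
from typing import Optional, Protocol


class SignRefused(Exception):
    """Hypothetical error: the signer would double-sign or cannot be reached."""


class PrivValidator(Protocol):
    """Python rendering of the three-method contract (not CometBFT's Go API)."""

    def get_pub_key(self) -> bytes: ...
    def sign_vote(self, vote: bytes) -> bytes: ...
    def sign_proposal(self, proposal: bytes) -> bytes: ...


def cast_vote(pv: PrivValidator, vote: bytes) -> Optional[bytes]:
    """Return the signature, or None to record a missed vote.

    There is deliberately no retry against a second signer or a local key:
    a refusal costs liveness for this vote and nothing else.
    """
    try:
        return pv.sign_vote(vote)
    except SignRefused:
        return None


class OfflineSigner:
    """Stand-in for a remote signer whose socket has dropped."""

    def get_pub_key(self) -> bytes:
        return b"consensus-pubkey"

    def sign_vote(self, vote: bytes) -> bytes:
        raise SignRefused("signer connection closed")

    def sign_proposal(self, proposal: bytes) -> bytes:
        raise SignRefused("signer connection closed")


print(cast_vote(OfflineSigner(), b"precommit h=10 r=0"))  # None
```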

§ IV The Remote-Signer Protocol and Double-Sign Evidence

The remote-signer wire protocol is a request-response conversation over a length-prefixed connection. Each protobuf message is preceded by its length and carried either over a Unix socket or over TCP wrapped in CometBFT's authenticated, encrypted SecretConnection. The node sends a sign-vote or sign-proposal request carrying the vote or proposal plus the chain ID. The signer derives the canonical sign-bytes itself and validates the request against its own monotonic state. It signs only if the request advances past the last-signed height-round-step, then returns either the signed vote or proposal or an error describing the refusal. Two operational facts about this protocol decide whether it is safe in production. First, the signer, not the node, owns the anti-double-sign state, so the safety check lives with the key rather than with the easily restarted node. Second, when the signer connection drops, the node misses votes rather than falling back to a local key. A fallback to a local FilePV with stale state is precisely the path that re-signs an already-signed height.
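The length-prefixed framing takes only a few lines of Python to sketch. This assumes the unsigned-varint length prefix used by protobuf's delimited encoding. It treats each message as already-serialized bytes and leaves out the encryption layer. The helper names and the size cap are hypothetical.

```python
import io

MAX_MSG_SIZE = 10 * 1024  # hypothetical cap; a real signer bounds message size


def write_delimited(out: io.BufferedIOBase, payload: bytes) -> None:
    """Write `payload` preceded by its length encoded as an unsigned varint."""
    n = len(payload)
    prefix = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            prefix.append(low7 | 0x80)  # high bit set: more length bytes follow
        else:
            prefix.append(low7)
            break
    out.write(bytes(prefix) + payload)


def read_delimited(inp: io.BufferedIOBase) -> bytes:
    """Read one varint-length-prefixed message; raise EOFError on a short read."""
    length, shift = 0, 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("connection closed inside length prefix")
        length |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
        if shift > 63:
            raise ValueError("length prefix too long")
    if length > MAX_MSG_SIZE:
        raise ValueError(f"message of {length} bytes exceeds cap")
    payload = inp.read(length)
    if len(payload) != length:
        raise EOFError("connection closed inside message body")
    return payload


buf = io.BytesIO()
write_delimited(buf, b"sign-vote request")
write_delimited(buf, b"x" * 300)  # 300 needs a two-byte length prefix
buf.seek(0)
print(read_delimited(buf))       # b'sign-vote request'
print(len(read_delimited(buf)))  # 300
```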

When prevention fails and a validator does sign two conflicting messages, the response starts in CometBFT and finishes in two Cosmos SDK modules. The relevant misbehavior is duplicate-vote evidence: two votes from the same validator, for the same height, round, and vote type, on different block IDs. Any node that observes both votes can package them as evidence. CometBFT verifies it and gossips it through its evidence pool, and a proposer includes it in a block. From there it is passed to the application, where the x/evidence module handles it. No trust in the reporter is required. Both votes are signed by the validator's own consensus key, so they are self-proving, and the signatures convict.
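The conviction test combines a structural check with a cryptographic one. This minimal Python sketch uses a hypothetical `Vote` record. Signature verification is passed in as a callable rather than implemented, since the real check verifies each vote against the validator's known consensus public key (typically Ed25519).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Vote:
    """Hypothetical minimal vote record; block_id None stands for a nil vote."""
    validator: str
    height: int
    round: int
    vote_type: str            # "prevote" or "precommit"
    block_id: Optional[str]
    signature: bytes


def is_duplicate_vote(a: Vote, b: Vote, verify: Callable[[Vote], bool]) -> bool:
    """True only for two validly signed votes in the same slot on different blocks.

    `verify` stands in for checking a vote's signature against the validator's
    known consensus public key.
    """
    same_slot = ((a.validator, a.height, a.round, a.vote_type)
                 == (b.validator, b.height, b.round, b.vote_type))
    return same_slot and a.block_id != b.block_id and verify(a) and verify(b)


def always_valid(vote: Vote) -> bool:
    """Placeholder verifier for the demo; real code checks the signature."""
    return True


pv_a = Vote("val1", 10, 0, "prevote", "AAA", b"s1")
pv_b = Vote("val1", 10, 0, "prevote", "BBB", b"s2")
pc_a = Vote("val1", 10, 0, "precommit", "AAA", b"s3")
print(is_duplicate_vote(pv_a, pv_b, always_valid))  # True: conflicting prevotes
print(is_duplicate_vote(pv_a, pc_a, always_valid))  # False: different vote types
```

The second call reproduces the Q1 case: a prevote and a precommit for the same block never convict. A nil vote paired with a vote for a block does count as conflicting.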

The x/slashing module then applies the penalty. On verified duplicate-vote evidence, it slashes a fixed fraction of the validator's bonded stake (the slash_fraction_double_sign parameter, commonly five percent), jails the validator, and tombstones it. The tombstone is permanent. A tombstoned validator can never be unjailed, so it can never rejoin the active set with that consensus key. The only path back is a brand-new validator with a new key and a fresh delegation cycle. The slash hits delegators proportionally, which is why a double-sign is not only the operator's loss but a breach of trust with everyone who staked behind them. The same module handles the gentler downtime path. A validator that misses too many blocks in a signing window is slashed by the much smaller slash_fraction_downtime and jailed temporarily (for downtime_jail_duration), after which it can unjail. The two fractions write the asymmetry the Ops lesson built its procedures around directly into chain parameters.
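The proportional effect is plain arithmetic. Below is a simplified Python sketch with hypothetical stake figures and a hypothetical downtime fraction. The real module burns tokens from the validator's bonded pool (and from qualifying unbonding delegations and redelegations), so each delegator's shares lose value rather than being debited one by one. The module's own rounding rules are not reproduced here.

```python
from fractions import Fraction


def slash(delegations: dict[str, int], fraction: Fraction) -> dict[str, int]:
    """Return each delegator's stake after burning `fraction` of it (truncating)."""
    return {who: amount - int(amount * fraction) for who, amount in delegations.items()}


stake = {"operator": 100_000, "alice": 400_000, "bob": 500_000}  # hypothetical stake
double_sign = Fraction(5, 100)   # the "commonly five percent" figure
downtime = Fraction(1, 10_000)   # hypothetical, much smaller downtime fraction

print(slash(stake, double_sign))  # {'operator': 95000, 'alice': 380000, 'bob': 475000}
print(slash(stake, downtime))     # {'operator': 99990, 'alice': 399960, 'bob': 499950}
```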

§ V Worked Example — A Dropped Signer Connection That Does Not Slash

Trace one incident through the surfaces above. A validator runs a remote signer, and the socket between node and signer drops because the signer host reboots for a kernel patch. The node now has no PrivValidator that answers, so its SignVote requests time out. The correct node behavior is to miss prevotes and precommits for the blocks that pass during the outage. Those misses count toward the downtime window, and a short reboot stays well inside its allowance, but nothing gets signed. When the signer host returns and the socket re-establishes, the signer resumes from its persisted last-signed state. Because that state never reset, the signer refuses any stale re-sign and proceeds cleanly from the current height.
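Whether a reboot like this one slashes comes down to counting misses over the signing window. Here is a simplified Python sketch. It assumes the x/slashing rule that a validator is jailed for downtime once its missed blocks within a full signed_blocks_window exceed what min_signed_per_window allows. The class name and parameter values are hypothetical, and the real module's bookkeeping and start-height rules are not reproduced.

```python
from collections import deque
from fractions import Fraction


class MissedBlockWindow:
    """Simplified sliding record of one validator's signed and missed blocks."""

    def __init__(self, window: int, min_signed: Fraction) -> None:
        self.window = window
        # Misses allowed per window: everything beyond the required signed share.
        self.max_missed = window - round(window * min_signed)
        self.history: deque = deque(maxlen=window)  # True marks a missed block

    def record(self, missed: bool) -> bool:
        """Record one block; return True once the validator should be jailed."""
        self.history.append(missed)
        window_full = len(self.history) == self.window
        return window_full and sum(self.history) > self.max_missed


# Hypothetical parameters: a 100-block window that requires half the blocks signed.
reboot = MissedBlockWindow(window=100, min_signed=Fraction(1, 2))
print(any(reboot.record(m) for m in [True] * 30 + [False] * 70))  # False

long_outage = MissedBlockWindow(window=100, min_signed=Fraction(1, 2))
print(any(long_outage.record(m) for m in [True] * 60 + [False] * 40))  # True
```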

Compare the slashing branch the discipline avoids. Suppose the operator, anxious about missed blocks, had kept a local FilePV copy of the key as a fallback and restarted the node on it during the outage, so the fallback kept voting. Suppose further that the signer host came back from a backup snapshot taken an hour earlier, so its restored state claims a last-signed height an hour in the past. When the node is switched back to the remote signer, it asks for a vote at a height and round the fallback key has already voted on. The signer's state does not record what the fallback signed, and the hour-old snapshot does not even record what the signer itself signed before the reboot, so the check passes and the signer signs. If the new vote names a different block than the fallback's vote did (a block versus nil counts), the chain now holds two votes of the same type, for the same height and round, on different blocks. A watching node packages them as duplicate-vote evidence, the evidence lands in a block, and the validator is slashed and tombstoned. The protocol did exactly what it promises; the operator handed it the evidence. The lesson the chain enforces is the lesson the Ops procedure encodes: never restore signing state backward, and never give the node a key to fall back to.

§ VI Connection to Today's Ops and Dev Lessons

The Ops lesson and this lesson are two descriptions of one boundary. The Ops lesson named the three deaths in operational terms: theft, operator duplication, lost anti-double-sign state. This lesson names the same three in protocol terms: a stolen consensus key signs valid votes from anywhere, a duplicated signer produces the conflicting votes that x/evidence convicts on, and a reset priv_validator_state.json defeats the monotonicity check that FilePV and every remote signer rely on. The remote-signer protocol is the wire-level form of the Ops lesson's first primitive, and the tombstone is the protocol-level form of its opening asymmetry.

The Dev lesson scripts the exact prevention this protocol assumes. The remote signer owning its monotonic state only helps if the failover never starts a second signer and never restores state backward, and those guarantees live in the lockfile-and-ordering guard the Bash lesson wrote. The protocol convicts on duplicate signatures; the shell script is what keeps two signers from ever existing to produce them.

§ VII Practice Questions

Q1
A validator signs a prevote and a precommit for the same block at the same height and round. Is this double-sign evidence?
No. Double-sign (duplicate-vote) evidence requires two votes of the same type for the same height and round on different blocks. A prevote and a precommit are different vote types and are both expected within a normal round. The convicting case is two prevotes, or two precommits, on conflicting block hashes at one height and round.
Q2
The remote-signer connection drops mid-round. What is the correct node behavior, and why is a local-key fallback dangerous?
The node should miss the affected votes and sign nothing until the signer returns. A fallback to a local FilePV risks signing with stale or independent anti-double-sign state, producing a second signature for a height the remote signer already signed, which is exactly the duplicate-vote condition x/slashing tombstones.
Q3
Why is duplicate-vote evidence self-proving, requiring no trust in the submitter?
Both conflicting votes are signed by the validator's own consensus key. Any node can verify the signatures against the validator's known public key, so the evidence convicts on cryptography alone; the submitter's honesty is irrelevant.
Q4
What is the operational difference between the downtime slash and the double-sign slash, and where is that difference encoded?
Downtime applies a small slash_fraction_downtime with temporary jailing and is recoverable by unjailing; double-sign applies a much larger slash_fraction_double_sign plus a permanent tombstone that bars the consensus key forever. Both fractions are chain parameters in x/slashing.
Q5
Why can a Tendermint-family indexer collapse its reorg-finality gate to a single block?
CometBFT provides instant finality: once more than two-thirds of voting power precommits a block, it is final by protocol with no reorganization possible. A committed block can never be replaced, so the confirmation horizon is one block.

§ VIII Closing

A Cosmos validator's entire risk reduces to one signed message it must never duplicate. CometBFT asks for three kinds of signature, the PrivValidator interface decides who answers, the remote-signer protocol moves that answer off the exposed node, and x/evidence plus x/slashing stand ready to convict on the validator's own signatures the instant the promise breaks. Study the priv_validator_state.json monotonicity check until it is obvious why restoring it backward is the one restore an operator must never run.

Prior arc → Cert-Prep/Interchain/2026-06-06-ibc-packet-lifecycle-and-the-light-client-verification-surface
Paired Ops → δ-Chain/Synthesis-Lessons/2026-06-13-validator-signing-key-security-and-the-double-sign-prevention-discipline
Paired Dev → Polyglot-Dev/Bash/2026-06-13-bash-for-validator-key-ceremony-and-sentry-firewall-automation

🫡 ⚖️ 📜
Leo.Syri — Praetor Consulate of Imperium Luminaura
Cert-Prep Lesson · Interchain Dev (Cosmos) · 2026-06-13 · ROD v3