Go's errgroup and sync/semaphore for IBC Relayer Worker Pools
Per-channel goroutine trees, bounded-concurrency backpressure, and the idempotent-retry discipline.
§ I Frame
Today's Ops lesson named the per-channel worker as the unit of relayer concurrency. The worker held three primitives at once — a poll-or-subscribe loop watching the source chain, a proof-fetch-and-submit step constructing destination transactions, and an acknowledgment sweep relaying the destination's responses back. The worker's lifecycle started when the channel opened, ran until the channel closed or the relayer process exited, and had to survive RPC errors, chain restarts, and the operator's own deliberate cancellations.
A language that lets the operator write that worker as a single coherent artifact, rather than as three loosely-coupled processes glued by a coordinator, holds an advantage. Go is such a language. Its concurrency vocabulary — goroutines as the unit, contexts as the cancellation handle, channels as the work-queue, errgroups as the supervision pattern, semaphores as the rate gate — composes cleanly into the per-channel worker shape. A production IBC relayer written in Go looks almost exactly like the Ops architecture written in code.
The lesson teaches three pieces. The first is the errgroup.Group from golang.org/x/sync/errgroup, used with WithContext to bind three child goroutines to one cancellation context such that any failure in any child terminates all three together. The second is semaphore.Weighted from golang.org/x/sync/semaphore, used to cap the per-channel RPC concurrency against a shared chain-side budget. The third is a sequence-keyed idempotent-retry layer that lets the three-relayer race architecture from the Ops lesson exist without each relayer-process knowing the other two exist.
§ II Language Idiom: errgroup as the Per-Channel Supervisor
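errgroup.Group is a thin discipline over sync.WaitGroup plus error collection plus, in its most useful form via errgroup.WithContext, cancellation propagation. Construct the group with a parent context, call g.Go(func() error { … }) once per child goroutine, then call g.Wait() to block until all children return. The first child to return a non-nil error cancels the group's derived context, and g.Wait() returns that first error once every child has finished. Two details matter for the worker. Only a non-nil error triggers the cancellation: a child that returns nil exits quietly and leaves its siblings running. And the cancellation is cooperative: it makes the siblings unwind only if each of them selects on the derived context or passes it into its blocking calls.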
For the per-channel worker the supervision shape maps directly. The parent goroutine is the worker for one channel. The children are the three primitives: a poll-or-subscribe goroutine that pushes source-chain packet-send events onto a buffered channel; a submit-pump goroutine that reads from the buffered channel, fetches proofs, and posts destination transactions; an acknowledgment-sweep goroutine that watches the destination for new acknowledgments and posts them back to the source. Cancellation comes from one of three sources — a child failing fatally, the parent worker shutting down for a relayer restart, or the relayer process receiving a signal.
The Go-idiom version is plainer than the prose makes it sound.
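```go
import (
	"context"

	"golang.org/x/sync/errgroup"
)

// runChannelWorker supervises one channel's three children. ChannelID,
// Relayer, and PacketSend are the relayer's own types.
func runChannelWorker(ctx context.Context, ch ChannelID, rl *Relayer) error {
	g, gctx := errgroup.WithContext(ctx)
	work := make(chan PacketSend, 64) // expected: pollPacketSends closes work on exit
	g.Go(func() error { return pollPacketSends(gctx, ch, rl, work) })
	g.Go(func() error { return submitPump(gctx, ch, rl, work) })
	g.Go(func() error { return ackSweep(gctx, ch, rl) })
	return g.Wait() // first non-nil child error, once all three have returned
}
```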
The shape is the architecture. Three children, one supervisor, one cancellation context, one error returned. A reader who understands errgroup understands the per-channel worker's lifecycle from this function alone.
§ III Code Worked Example: A Bounded Submit Pump
The submit-pump is the child that most needs both supervision and rate-bounded concurrency. It reads packet-send events from a buffered channel, fetches the proof for each from the source RPC, and submits a MsgRecvPacket transaction to the destination RPC. Both calls cost RPC throughput on chains that may be shared with other channel workers and with other internal relayer machinery. Without a bound, a burst of packet-send events on one channel can monopolize the source RPC and starve every other channel the relayer serves.
semaphore.Weighted is the bound. It is a weighted counting semaphore: semaphore.NewWeighted(n) sets the total weight, Acquire(ctx, w) blocks until w units are free, and Release(w) returns them. Because Acquire takes a context, a cancellation during the wait makes it return ctx.Err() instead of blocking forever. The relayer constructs one semaphore.Weighted per shared RPC endpoint at startup and passes it into every worker that uses that endpoint.
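```go
// Needs "context" and "fmt". rl.srcRpcSem and rl.dstRpcSem are the shared
// per-endpoint semaphores for this channel's source and destination chains.
func submitPump(ctx context.Context, ch ChannelID, rl *Relayer, work <-chan PacketSend) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err() // worker shutting down, or a sibling failed
		case pkt, ok := <-work:
			if !ok {
				return nil // upstream closed the channel: clean exit
			}
			if rl.alreadyHandled(ch, pkt.Seq) {
				continue // settled earlier in this process: skip
			}

			// Hold a source permit only for the proof fetch.
			if err := rl.srcRpcSem.Acquire(ctx, 1); err != nil {
				return fmt.Errorf("submit pump src acquire ch=%s seq=%d: %w", ch, pkt.Seq, err)
			}
			proof, srcErr := rl.fetchProof(ctx, ch, pkt.Seq)
			rl.srcRpcSem.Release(1)
			if srcErr != nil {
				rl.recordRetry(ch, pkt.Seq, srcErr) // reads are cheap to retry later
				continue
			}

			// Hold a destination permit only for the submission.
			if err := rl.dstRpcSem.Acquire(ctx, 1); err != nil {
				return fmt.Errorf("submit pump dst acquire ch=%s seq=%d: %w", ch, pkt.Seq, err)
			}
			dstErr := rl.submitMsgRecvPacket(ctx, ch, pkt, proof)
			rl.dstRpcSem.Release(1)
			rl.recordResult(ch, pkt.Seq, dstErr) // success, redundant relay, or failure
		}
	}
}
```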
Several disciplines hold here in tension and converge in a small function.
The select at the top is the cancellation contract. The pump blocks on two channels — the worker's context done-channel and the upstream work channel — and a cancellation flows through immediately rather than waiting for the next packet.
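The alreadyHandled check at the head of the iteration is the idempotent-retry layer's first line. The relayer keeps a sync.Map of recently handled sequences keyed by channel and sequence, populated by recordResult once an outcome settles the packet. The map deduplicates within one relayer process: when the same packet reaches the work channel a second time (from a subscription replay or the polling backstop, say), it does not produce a wasted destination submission. It cannot see the other two relayers in the race architecture; their wins reach this process only as a redundant-relay response from the destination, which recordResult also counts as handled.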
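A minimal sketch of that in-process dedup layer, using only the standard library. The handledSet type, the string channel ID, and the method names are hypothetical stand-ins for the relayer's own; markHandled uses sync.Map's LoadOrStore so that the check and the insert are one atomic step.

```go
package main

import (
	"fmt"
	"sync"
)

// packetKey identifies a packet by channel and sequence. The string channel
// ID is a hypothetical stand-in for the document's ChannelID type.
type packetKey struct {
	ch  string
	seq uint64
}

// handledSet records packets this relayer process has seen settled.
// It is in-process only and cannot observe other relayer processes.
type handledSet struct{ m sync.Map }

func (h *handledSet) alreadyHandled(ch string, seq uint64) bool {
	_, ok := h.m.Load(packetKey{ch, seq})
	return ok
}

// markHandled records a settled packet and reports whether this call was
// the first to do so.
func (h *handledSet) markHandled(ch string, seq uint64) bool {
	_, loaded := h.m.LoadOrStore(packetKey{ch, seq}, struct{}{})
	return !loaded
}

func main() {
	var h handledSet
	fmt.Println(h.alreadyHandled("channel-0", 7)) // false
	fmt.Println(h.markHandled("channel-0", 7))    // true
	fmt.Println(h.markHandled("channel-0", 7))    // false
	fmt.Println(h.alreadyHandled("channel-0", 7)) // true
}
```

The sketch never evicts entries. A set that only means "recently handled" would also need a pruning policy, which is left open here.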
The two semaphore acquire-release pairs surround the two RPC calls. The source-side semaphore caps the fetch-proof concurrency against the source chain's RPC; the destination-side semaphore caps the submit concurrency against the destination's. The two are independent because the two RPC endpoints are independent resources. Channels sharing an endpoint share the semaphore and back-pressure each other under load rather than blowing the RPC's rate limit and inviting throttling.
The errors from the two RPC calls take different paths. A source-RPC failure goes to recordRetry, and the pump moves on to the next packet. A destination-RPC outcome, success or failure, goes to recordResult, which records it so that the next attempt at the same sequence is treated as a retry rather than as a duplicate. The split encodes an operational fact: source-side reads have no side effects and are cheap to retry, while destination-side writes are already protected by the protocol, which accepts a given packet at most once, so the only thing that needs care on that side is the relayer's local bookkeeping.
§ IV Connection to Today's Ops Lesson
The Ops lesson named three primitives of the per-channel worker. Each maps to one piece of this code.
The poll-or-subscribe loop is pollPacketSends, the submit pump's sibling goroutine in the errgroup. It loops until its context is canceled, subscribing to the source chain's Tendermint websocket for the channel's send_packet events and pushing each onto the work channel, with a polling fallback as a backstop for missed events.
The proof-fetch-and-submit step is the body of the submit pump's loop above. The Ops lesson named the distinct shapes failures take here (destination light client not yet updated, account out of fees, packet already relayed); the code encodes those shapes in the helpers' wrapped errors, which recordRetry and recordResult route to different retry policies.
The acknowledgment sweep is the ackSweep sibling goroutine. It mirrors the submit pump in the reverse direction: it watches the destination for acknowledgment events, fetches the acknowledgment payload and its proof, and submits a MsgAcknowledgement to the source. Because the two loops are symmetric, leaving the sweep out shows up as a visible asymmetry that a code reviewer can readily spot.
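The race itself resolves on the destination chain. When another relayer process delivers the packet first, the destination treats this process's MsgRecvPacket as redundant, so the packet cannot be delivered twice; recordResult treats that redundant-relay outcome as success from the protocol's point of view, and the local pump skips further attempts. The relayer does not know whether it won or lost the race; it only knows whether the packet is now handled. That not-knowing is the discipline that makes the architecture work.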
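A self-contained simulation of the race, assuming a fake destination that accepts each sequence at most once and answers duplicates with a hypothetical ErrAlreadyReceived sentinel. Three relayers submit independently; exactly one wins, and all three end up treating the packet as settled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAlreadyReceived is a hypothetical sentinel for the destination's
// redundant-relay response.
var ErrAlreadyReceived = errors.New("packet already received")

// fakeDest models a destination chain's at-most-once packet receipt.
type fakeDest struct {
	mu       sync.Mutex
	received map[uint64]bool
}

func (d *fakeDest) recvPacket(seq uint64) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.received[seq] {
		return ErrAlreadyReceived
	}
	d.received[seq] = true
	return nil
}

// settled reports whether an outcome means the packet is done, regardless
// of which relayer delivered it.
func settled(err error) bool {
	return err == nil || errors.Is(err, ErrAlreadyReceived)
}

// race has n independent relayers submit the same sequence concurrently and
// counts outright wins and relayers that consider the packet settled.
func race(n int, seq uint64) (wins, settledCount int) {
	dest := &fakeDest{received: make(map[uint64]bool)}
	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			err := dest.recvPacket(seq) // no relayer knows about the others
			mu.Lock()
			defer mu.Unlock()
			if err == nil {
				wins++
			}
			if settled(err) {
				settledCount++
			}
		}()
	}
	wg.Wait()
	return wins, settledCount
}

func main() {
	fmt.Println(race(3, 42)) // 1 3
}
```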
§ V Prior-Lesson Reach
The 2026-05-19 lesson on context.Context named context as the discipline for carrying distributed-tracing data across goroutine boundaries. The errgroup-with-context pattern is the same discipline at the supervision tier: the parent context defines the worker's lifetime, the derived context defines the children's, and cancellation reaches every context-aware RPC call the children hand the derived context to, with no further plumbing.
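The 2026-05-22 lesson on Go channels and pipeline patterns named the channel as the work-queue primitive. The work channel between pollPacketSends and submitPump is the same primitive applied to packet-send events. Its buffer size of sixty-four encodes the latency-versus-memory trade-off the pipeline lesson named: up to sixty-four events can queue while the pump is busy, and once the buffer is full the poller's sends block, which is the pipeline's backpressure.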
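A tiny illustration of the buffer's role, using only the standard library. With no consumer attached, exactly capacity sends complete before the next one would have to wait. The element type is int here for brevity; the relayer's buffer holds PacketSend values, so its memory cost scales with capacity times event size.

```go
package main

import "fmt"

// fillUntilBlocked sends into a buffered channel that has no consumer and
// counts how many sends complete before the next one would block. That
// count is the buffer's capacity: how far the poller can run ahead.
func fillUntilBlocked(capacity int) int {
	work := make(chan int, capacity)
	sent := 0
	for {
		select {
		case work <- sent:
			sent++
		default:
			return sent // full: a plain send here would wait for the pump
		}
	}
}

func main() {
	fmt.Println(fillUntilBlocked(64)) // 64
	fmt.Println(fillUntilBlocked(0))  // 0
}
```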
The 2026-05-25 lesson on worker pools and semaphores for inference-cost-aware model routing is the closest crossing. That lesson taught semaphore.Weighted as the bounded-concurrency primitive for capping inference cost against a budget; today's lesson takes the identical type and API and applies them to chain RPC limits against an operator budget. Bounded concurrency is one primitive that crosses every domain where the resource is finite and the workload is unbounded. Today is the second supercycle crossing of that primitive.
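One caution when reading a semaphore as an RPC budget: it caps concurrency, not requests per second. By Little's law the two are linked through latency, so the sustainable request rate can be sketched as follows, where N is the permit count and W the mean per-call latency (the numbers are illustrative):

```latex
\lambda_{\max} \approx \frac{N}{W}, \qquad
N = 8,\; W = 0.2\,\text{s} \;\Rightarrow\; \lambda_{\max} \approx \frac{8}{0.2\,\text{s}} = 40\ \text{requests/s}
```

If the endpoint's latency drops, the same permit count admits a higher rate, so a provider with a hard requests-per-second limit may also call for a token-bucket limiter, such as rate.Limiter from golang.org/x/time/rate, alongside the semaphore.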
The 2026-05-28 lesson on middleware chains and structured logging named the decorator-composition pattern. The semaphore acquire-release pair is one such decorator; a production relayer would express it as middleware wrapping a base RPC client rather than as explicit acquire-release at every call site. Today's code shows the pair inline for legibility; the production shape would decompose it into that middleware chain.
§ VI Closing
A per-channel IBC relayer worker in Go is three goroutines under an errgroup supervisor, two semaphores capping the two chains' RPC concurrency, and a small idempotent-retry layer keyed by per-channel packet sequence. Forty lines of Go express an operational shape that the prose lesson took twenty-five hundred words to name.
The semaphore-bounded resource has now crossed two domains in the curriculum — inference cost in cognition-pair, RPC rate in chain-pair — and the type that carries the bound is unchanged across the crossings. When the next domain introduces another bounded resource, the same primitive will likely fit, and the Go vocabulary for the discipline does not need re-derivation.
For now: examine the worker function above. Identify which of your own systems have an analog supervisor pattern over their concurrent children; identify which still rely on operator-managed restart of orphaned goroutines when one child fails. Where the supervisor is missing, the failure mode is implicit, and the discipline is the supervisor, not the operator.
Paired Ops → δ-Chain/Synthesis-Lessons/2026-06-06-ibc-relayer-operations-production-devops-for-cross-chain-packet-flow
Paired Cert → Cert-Prep/Interchain/2026-06-06-ibc-packet-lifecycle-and-the-light-client-verification-surface
Filed 2026-06-06 Saturday Fajr · Sixth Go lesson · First Go × δ-Chain crossing in the 12-week supercycle
Backward-Synergy-Reach → Go context (α 05-19) · Go channels (γ 05-22) · Go semaphores for inference (α 05-25) · Go middleware (β 05-28)