Hedronite · Synthesis Lesson · Pair α (Cognition) + DevOps · Tue 2026-05-19

Observability for Multi-Agent LLM Systems

Three primitives, three questions, one observation surface.

Lesson Class: Ops Synthesis

Ops Pair: α (Cognition) + DevOps anchor

Week / Cycle: Week 1 of Cycle 1

Word Count: ~2,420

Paired Dev: Go's context.Context for Distributed Tracing

Discipline: ROD v0.4.0 (universal-application)

§ I Frame

Monday's lesson named three orchestration patterns and showed how each one fits the ML training workflow. The lesson asked the operator to pick the pattern that fit the work, build the audit at every seam, and record the chain of grants that carried the principal's authority through the work. That ask presupposes one capability the lesson took for granted. The operator can see what the agents are doing.

That capability is not a given. A pipeline of five Claude agents running across five machines, calling six tools, writing to four registries, returning to a coordinator that returns to a principal: the operator who reads only the coordinator's final return reads nothing about the seven decisions the agents made before the coordinator returned. The agents acted. The principal received an outcome. The audit between them is a blank page unless the engineer built the audit before the run.

This lesson takes that blank page and names what writes to it. Three observability primitives carry the bulk of the work: distributed tracing for the question what path did the request take, metrics for the question what shape did the population of requests have, and structured logging for the question what happened at this particular point in this particular request. The three are not interchangeable. Each answers a question the other two cannot. The operator who builds an agent pipeline without all three is an operator who has lit one corner of a room and called the room visible.

§ II Foundations

Three primitives. Name them; reason about them after.

Distributed Tracing

A directed tree of timed operations. The root is the originating request; each child is a sub-operation. Every node carries a span identifier, a parent identifier, a start time, an end time, and a bag of attributes. Reveals the full causal structure of a single request as it crosses agent, tool, and network boundaries.

answers → what path did this request take

Metrics

Numerical observations aggregated across many requests over a time window. Counters (request count, error count); gauges (queue depth, active agent count); histograms (latency, token usage). Answers population-level questions: what fraction of agent calls returned an error this hour.

answers → what shape does the population have

Structured Logging

Discrete events tied to a single point in a single request, emitted as key-value records rather than strings. The downstream collector indexes, filters, and joins across events. Answers the question of what specifically happened at this site.

answers → what specifically happened here

The three primitives compose. A single agent invocation produces one span in the trace, several metric increments, and a small handful of structured log events. The span identifier is propagated as an attribute on every log event so the operator can pivot from a slow span to the logs that occurred during that span without manual correlation. The metric tags carry the same labels the trace attributes carry so the operator can pivot from an anomalous metric value to the trace examples that contributed to it.

Composition Doctrine

Three separate signals become one observation surface only when the engineer wires the cross-pivots before the system ships. Trace identifiers must appear on every log record. Metric labels must match trace attribute keys. The pivots are engineering work, not an emergent property of using three collection systems.

§ III Mechanism

How each primitive works inside a multi-agent system.

Tracing across agent boundaries

When an orchestrator agent calls a worker agent, the orchestrator generates a span for the call and attaches the active trace context to the request payload; the worker continues the trace by opening a child span when it receives the request. If the worker calls a tool, the worker opens a grandchild span around the tool call. If the worker calls another agent, the span around that call becomes the parent of the spans the second agent opens. The trace grows depth-first as the work unfolds and is reassembled by the trace collector, by parent identifier, as the spans report.

The propagation discipline is the engineering work. Propagation must happen explicitly at every boundary; nothing about a process or a function call automatically forwards trace context across a network or across a coroutine. The instrumentation standard that has won the industry is OpenTelemetry; the propagation format on the wire is W3C Trace Context (the traceparent and tracestate headers); the storage and query layer is one of Jaeger, Tempo, or a managed surface.
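As a rough illustration of what "attaching the trace context" means at a boundary, the sketch below formats and parses a version-00 W3C traceparent value using only the Go standard library. The type and function names (TraceContext, FormatTraceparent, ParseTraceparent) are hypothetical, and a production system would normally rely on an OpenTelemetry propagator rather than hand-rolled parsing. The sketch also skips the specification's rejection of all-zero identifiers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TraceContext holds the fields carried in a traceparent header value.
// The type name is hypothetical; it is not part of any library.
type TraceContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters: the caller's span
	Sampled bool   // lowest bit of the trace-flags field
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// FormatTraceparent renders version-trace_id-parent_id-flags.
func FormatTraceparent(tc TraceContext) string {
	flags := "00"
	if tc.Sampled {
		flags = "01"
	}
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + flags
}

// ParseTraceparent validates the shape of a version-00 value and decodes it.
func ParseTraceparent(h string) (TraceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceContext{}, fmt.Errorf("unsupported traceparent %q", h)
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return TraceContext{}, fmt.Errorf("malformed traceparent %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return TraceContext{}, err
	}
	return TraceContext{TraceID: parts[1], SpanID: parts[2], Sampled: flags&1 == 1}, nil
}

func main() {
	// Orchestrator side: serialize the active span before calling the worker.
	header := FormatTraceparent(TraceContext{
		TraceID: "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:  "00f067aa0ba902b7",
		Sampled: true,
	})

	// Worker side: decode the header and open a child span under SpanID.
	tc, err := ParseTraceparent(header)
	if err != nil {
		fmt.Println("starting a new trace:", err)
		return
	}
	fmt.Println("traceparent:", header)
	fmt.Println("worker continues trace", tc.TraceID, "under parent span", tc.SpanID)
}
```

The point of the sketch is that the boundary crossing is explicit: if the orchestrator never writes the header, the worker has nothing to continue and starts a disconnected trace.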

Metrics at each agent

Each agent emits four canonical metrics: request count (labeled by agent name and outcome), latency histogram (labeled the same way), token-usage histogram (labeled by model name and input/output direction), and tool-call counter (labeled by tool name and outcome). These four are the minimum. Beyond the minimum, each agent emits metrics specific to its work: a router emits a per-destination counter; a fan-out coordinator emits a per-cell counter and a fan-in aggregation latency.

Cardinality discipline is the operational constraint: a metric label whose value space grows without bound becomes a cost problem within hours. Don't put the user-identifier or the trace-identifier on a metric. Put them on the trace and on the logs.
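The cardinality constraint can be made concrete with a toy labeled counter. The sketch below is a stand-in written for illustration, not the Prometheus client: CounterVec, SeriesCount, and simulate are hypothetical names, and the agent list simply reuses agent names from this lesson. Each distinct combination of label values becomes its own time series, which is what a metrics backend has to store.

```go
package main

import (
	"fmt"
	"strings"
)

// CounterVec is a toy labeled counter: one series per distinct
// combination of label values. Hypothetical type for illustration only.
type CounterVec struct {
	labelNames []string
	series     map[string]float64
}

func NewCounterVec(labelNames ...string) *CounterVec {
	return &CounterVec{labelNames: labelNames, series: make(map[string]float64)}
}

// Inc adds one to the series identified by values, given in labelNames order.
func (c *CounterVec) Inc(values ...string) {
	if len(values) != len(c.labelNames) {
		panic("label value count does not match label names")
	}
	c.series[strings.Join(values, "\x00")]++
}

// SeriesCount is the number of distinct series the counter now holds.
func (c *CounterVec) SeriesCount() int { return len(c.series) }

// simulate records the same requests twice: once with bounded labels,
// once with a trace identifier as a label.
func simulate(requests int) (bounded, unbounded int) {
	agents := []string{"row-validator", "feature-extractor", "schema-checker", "publisher", "sweep-orchestrator"}
	outcomes := []string{"ok", "error"}

	good := NewCounterVec("agent", "outcome")
	bad := NewCounterVec("agent", "trace_id")
	for i := 0; i < requests; i++ {
		agent := agents[i%len(agents)]
		good.Inc(agent, outcomes[i%len(outcomes)])
		bad.Inc(agent, fmt.Sprintf("trace-%06d", i)) // unique per request
	}
	return good.SeriesCount(), bad.SeriesCount()
}

func main() {
	bounded, unbounded := simulate(1000)
	fmt.Printf("agent x outcome: %d series\n", bounded)
	fmt.Printf("agent x trace_id: %d series\n", unbounded)
}
```

With bounded labels the series count stays at the product of the label value spaces (here 5 agents times 2 outcomes) no matter how many requests arrive; with a per-request identifier it grows by one series per request.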

Structured logs at the seams

The seams are where the audit lives, per Monday's closing. The seam between two agents is the natural location for one structured log event with the orchestrator's grant, the worker's identity, the worker's input schema check result, and the trace context. The seam between an agent and a tool carries one event with the tool name, the tool input, and the tool output schema check result. The seam between an agent and a return-to-principal records the disposition the agent reached and the artifacts the principal will read.
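A seam event of the first kind might look like the following sketch, which uses Go's standard log/slog package (Go 1.21 or later). The field names (trace_id, span_id, grant, worker, input_schema_ok), the SeamEvent type, and the example grant string are assumptions made for illustration; the lesson does not prescribe a schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// SeamEvent describes one orchestrator-to-worker handoff.
// The type and its field names are hypothetical.
type SeamEvent struct {
	TraceID       string
	SpanID        string
	Grant         string
	Worker        string
	InputSchemaOK bool
}

// LogSeam emits exactly one structured record for the handoff.
func LogSeam(logger *slog.Logger, e SeamEvent) {
	logger.Info("agent_seam",
		slog.String("trace_id", e.TraceID),
		slog.String("span_id", e.SpanID),
		slog.String("grant", e.Grant),
		slog.String("worker", e.Worker),
		slog.Bool("input_schema_ok", e.InputSchemaOK),
	)
}

// EmitToMap logs the event as JSON into a buffer and decodes it back,
// so the resulting key-value record can be inspected.
func EmitToMap(e SeamEvent) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	LogSeam(logger, e)

	var rec map[string]any
	err := json.Unmarshal(buf.Bytes(), &rec)
	return rec, err
}

func main() {
	rec, err := EmitToMap(SeamEvent{
		TraceID:       "4bf92f3577b34da6a3ce929d0e0e4736",
		SpanID:        "00f067aa0ba902b7",
		Grant:         "read:feature-table", // hypothetical grant string
		Worker:        "row-validator",
		InputSchemaOK: true,
	})
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(rec["msg"], rec["worker"], rec["grant"], rec["input_schema_ok"])
}
```

Because the record is key-value rather than a formatted string, a downstream collector can filter on worker or join on trace_id without parsing prose.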

§ IV Worked Example: Hedronite-1 Instrumented

The same nightly training run from Monday's lesson, instrumented.

The top-level Pipeline has three stages: ingest-and-featurize, sweep-and-train, gate-and-deploy. Each stage opens a span at its head and closes the span at its tail. The pipeline operator can read the full nightly run as a three-span trace at the highest level: ingest took 14 minutes, sweep-and-train took 4 hours and 12 minutes, gate-and-deploy took 6 minutes. The audit trail is the trace.

Inside the first stage, the four sub-agents (row-validator, feature-extractor, schema-checker, publisher) each open a child span. The row-validator emits a structured log event for every batch of rows it processes, recording the batch size, the bad-row count, and the trace context. The feature-extractor emits a metric histogram of per-batch processing time. The publisher emits a structured log event at the seam where the feature table is written to the known location, recording the file path, the row count, the schema version, and the publisher's grant. The pipeline operator who reads the trace after a failed run can navigate to the failing child span, pivot to its logs, and read exactly what happened at the failure site.

Inside the second stage, the sweep-orchestrator emits a span for the sweep itself, a child span for each of the twenty-four train-agent dispatches, and a metric histogram of eval-metric values across the twenty-four cells. The train-agent spans run in parallel; the trace collector reassembles them by parent identifier when reporting completes. The orchestrator emits a structured log event at the fan-in seam recording the winning cell, the winning eval-metric, and the second-place cell (for the tie-break audit trail). If one cell crashed, its span carries an error attribute and a link to the structured log event recording the crash cause.
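The fan-out shape of the second stage can be sketched with goroutines and a plain span record. This is a standard-library stand-in rather than a tracing SDK: the Span type, fanOut, childrenOf, and failedIDs are hypothetical names, the "work" is empty, and the choice of which cell crashes is arbitrary. It shows only the two properties the paragraph relies on: child spans are produced in parallel, and they are reassembled afterwards by parent identifier.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Span is a minimal timed operation. Hypothetical type for illustration.
type Span struct {
	ID       string
	ParentID string
	Name     string
	Start    time.Time
	End      time.Time
	Err      string // non-empty marks a failed span
}

// fanOut runs n cells in parallel; each records a child span of parentID.
// failing marks the cell indexes that should report a crash.
func fanOut(parentID string, n int, failing map[int]bool) []Span {
	spans := make([]Span, n) // each goroutine writes only its own slot
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s := Span{
				ID:       fmt.Sprintf("cell-%02d", i),
				ParentID: parentID,
				Name:     "train-agent",
				Start:    time.Now(),
			}
			if failing[i] { // read-only map access is safe across goroutines
				s.Err = "cell crashed"
			}
			s.End = time.Now()
			spans[i] = s
		}(i)
	}
	wg.Wait()
	return spans
}

// childrenOf reassembles one level of the tree by parent identifier.
func childrenOf(spans []Span, parentID string) []Span {
	var out []Span
	for _, s := range spans {
		if s.ParentID == parentID {
			out = append(out, s)
		}
	}
	return out
}

// failedIDs lists the spans that carry an error attribute.
func failedIDs(spans []Span) []string {
	var out []string
	for _, s := range spans {
		if s.Err != "" {
			out = append(out, s.ID)
		}
	}
	return out
}

func main() {
	spans := fanOut("sweep", 24, map[int]bool{7: true})
	fmt.Println("children of sweep:", len(childrenOf(spans, "sweep")))
	fmt.Println("failed cells:", failedIDs(spans))
}
```

The order in which the goroutines finish does not matter to the result, which is the same reason a trace collector can accept spans in any arrival order.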

Inside the third stage, the deploy specialist team opens a coordinator span and four specialist child spans. Each specialist emits a structured log event with its assessment and its reasoning trace. The coordinator emits a structured log event with the final disposition (shipped, held, or returned to principal) and the conflict-resolution trace if specialists disagreed. The fraction of nights shipped without principal-return, computed from a counter labeled by disposition, is the population-level audit the on-call MLOps lead reads at the weekly review.

Three Views, One System

The chain of grants lives in the structured log events at every seam. The chain of what actually happened lives in the trace. The chain of how things looked across many nights lives in the metrics. Three views of one system, assembled into one observation surface.

§ V Connection to Prior Lessons

Three earlier artifacts feed this one directly.

Khairallah multi-agent canon-asset. Khairallah's catalog of Pipeline, Fan-Out, and Specialist Team named the structural shapes. This lesson adds the observation layer that turns those shapes from black boxes into systems an operator can debug, audit, and tune. Khairallah is the first read on the structural side; this lesson is the first read on the observation side.

Yesterday's Ops lesson — Multi-Agent Orchestration Patterns for ML Training Workflows. Monday named the seams and called the audit-at-each-seam essential to operability. Today names the engineering primitives that write to the audit trail. The two lessons are paired: Monday says the audit must exist; today says here is what the audit is built from. Read in order.

Availability-and-Compound-Failure. The compound-failure math (system availability is p^N for N components in series, each at availability p, with independent failures) tells the operator how often the pipeline fails. The observability primitives tell the operator where it failed and why. A pipeline with a compound-failure rate but no observability makes every failure expensive: the on-call engineer has to reconstruct what happened by hand, and that reconstruction grows with the number of agents and seams in the run. A pipeline with both gives the on-call engineer a roughly constant investigation cost per failure: open the trace, find the failing span, read the structured logs, file the disposition.
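A short worked instance of the compound-failure formula, under stated assumptions: the per-component availability p = 0.99 is an illustrative figure rather than a measured one, and the component counts N = 5 and N = 20 are chosen only to show the trend. Failures are assumed independent and the components are assumed to sit in series.

```latex
\[
A_{\text{system}} = p^{N}
\]
\[
p = 0.99,\ N = 5:\quad A_{\text{system}} = 0.99^{5} \approx 0.951
\]
\[
p = 0.99,\ N = 20:\quad A_{\text{system}} = 0.99^{20} \approx 0.818
\]
```

Under these assumptions, roughly 5 runs in 100 fail at five components and roughly 18 in 100 fail at twenty, so the number of investigations the on-call engineer faces rises quickly as the pipeline grows.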

The convergence of the three: Khairallah names the shapes, compound-failure sets the failure rate, observability makes each failure investigable.

§ VI Connection to Today's Dev Lesson

Today's paired Dev lesson takes Go's context package, and its context.Context type, and applies it to distributed tracing across multi-agent workflows. The connection to this Ops lesson is direct.

The tracing primitive from §II is, in Go, a function-coloured discipline carried by context.Context. Every function that participates in a traced request takes a context.Context as its first argument. The context carries the active span (and with it the trace identifier), cancellation signals, and deadlines in one value. When a function opens a child span with the OpenTelemetry Go API, it calls ctx, span := tracer.Start(ctx, "operation-name"); the returned context replaces the parent context for the duration of the operation and is the one passed to every callee. The span is closed when the function returns, via defer span.End().
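To show the mechanics without depending on a tracing SDK, the sketch below imitates the shape of a tracer's Start call using only the standard library. StartSpan, Span, spanKey, and the identifier scheme are all hypothetical stand-ins, not the OpenTelemetry API; the sketch keeps only the essential behavior, which is that the child span finds its parent in the context it was handed.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// spanKey is the private context key under which the active span is stored.
type spanKey struct{}

// Span is a minimal stand-in for a tracing SDK's span.
type Span struct {
	TraceID  string
	ID       string
	ParentID string
	Name     string
	ended    bool
}

// End marks the span finished; a real SDK would export it here.
func (s *Span) End() { s.ended = true }

var nextID atomic.Int64 // toy identifier source

// StartSpan opens a span as a child of whatever span ctx carries,
// and returns a derived context carrying the new span.
func StartSpan(ctx context.Context, name string) (context.Context, *Span) {
	s := &Span{ID: fmt.Sprintf("span-%d", nextID.Add(1)), Name: name}
	if parent, ok := ctx.Value(spanKey{}).(*Span); ok {
		s.TraceID, s.ParentID = parent.TraceID, parent.ID
	} else {
		s.TraceID = "trace-" + s.ID // no parent: this span roots a new trace
	}
	return context.WithValue(ctx, spanKey{}, s), s
}

// callTool receives the worker's context, so its span nests under the worker.
func callTool(ctx context.Context) *Span {
	_, span := StartSpan(ctx, "tool-call")
	defer span.End()
	return span
}

// runWorker replaces ctx with the derived context before calling onward.
func runWorker(ctx context.Context) (worker, tool *Span) {
	ctx, span := StartSpan(ctx, "worker")
	defer span.End()
	return span, callTool(ctx)
}

// demo wires orchestrator -> worker -> tool and returns the three spans.
func demo() (root, worker, tool *Span) {
	ctx, root := StartSpan(context.Background(), "orchestrator")
	defer root.End()
	worker, tool = runWorker(ctx)
	return root, worker, tool
}

func main() {
	root, worker, tool := demo()
	for _, s := range []*Span{root, worker, tool} {
		fmt.Printf("%-12s id=%s parent=%q trace=%s\n", s.Name, s.ID, s.ParentID, s.TraceID)
	}
}
```

If runWorker passed its original ctx to callTool instead of the derived one, the tool span would attach to the orchestrator rather than the worker; the function signature is what makes that mistake visible in review.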

The metric primitive from §II maps to the Prometheus client library for Go. Each agent process registers a counter, a histogram, and a gauge at startup; emits observations at the natural sites; and exposes a /metrics HTTP endpoint that Prometheus scrapes at a configured interval (15 seconds is a common choice). The structured-logging primitive maps to Go's log/slog package, added to the standard library in Go 1.21. The stock slog handlers do not attach the trace identifier on their own: a handler has to read it from the context, and it can do so only when the call site uses the context-accepting methods such as InfoContext. That handler is either written once by the engineer or supplied by a tracing bridge.
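A handler that reads the trace identifier from the context could look like the following sketch, written against the standard log/slog package. TraceHandler, WithTraceID, traceIDKey, and the helper functions are hypothetical names, and the trace identifier is stored as a plain string in the context for simplicity; a real system would read it from the active span instead.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"strings"
)

// traceIDKey is the private context key for the trace identifier.
type traceIDKey struct{}

// WithTraceID returns a context carrying id.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// TraceHandler wraps another slog.Handler and adds a trace_id attribute
// to every record whose context carries one.
type TraceHandler struct{ slog.Handler }

func (h TraceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok {
		r = r.Clone() // avoid sharing attribute storage with other copies
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so the trace_id
// behavior survives logger.With(...) and logger.WithGroup(...).
func (h TraceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return TraceHandler{h.Handler.WithAttrs(attrs)}
}

func (h TraceHandler) WithGroup(name string) slog.Handler {
	return TraceHandler{h.Handler.WithGroup(name)}
}

// render logs once with a traced context and once without,
// returning the two output lines.
func render(traceID string) (withCtx, withoutCtx string) {
	var buf bytes.Buffer
	logger := slog.New(TraceHandler{slog.NewTextHandler(&buf, nil)})

	ctx := WithTraceID(context.Background(), traceID)
	logger.InfoContext(ctx, "tool call finished", "tool", "feature-store")
	withCtx = buf.String()

	buf.Reset()
	logger.Info("tool call finished", "tool", "feature-store") // no context passed
	withoutCtx = buf.String()
	return withCtx, withoutCtx
}

// hasTraceID reports whether a text-format log line carries the identifier.
func hasTraceID(line, id string) bool {
	return strings.Contains(line, "trace_id="+id)
}

func main() {
	with, without := render("abc123")
	fmt.Print(with)
	fmt.Print(without)
	fmt.Println("traced line has id:", hasTraceID(with, "abc123"))
	fmt.Println("untraced line has id:", hasTraceID(without, "abc123"))
}
```

The second log call illustrates the failure mode worth looking for in review: a call site that uses Info instead of InfoContext silently produces a record that cannot be joined to its trace.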

Go gives the operator three things the Python equivalents struggle with: the explicit-context convention forces propagation to be visible at every function signature; the typed slog.Attr constructors catch value-type mistakes at compile time, and field names declared once as constants turn a misspelled key into a compile error; the goroutine model makes the per-request concurrency boundary clean. The Ops lesson names the primitives; the Dev lesson hands the primitives their Go-idiomatic realization.

Paired lesson → Polyglot-Dev/Go/2026-05-19-gos-context-package-for-distributed-tracing-across-multi-agent-workflows

§ VII Closing

The operator who reads this should hold three things.

First, that visibility is engineered, not given. An agent pipeline that runs and returns is not an observed pipeline. An observed pipeline is one where the engineer wrote tracing at every boundary, metrics at every agent, and structured logs at every seam, before the pipeline shipped. The cost of building observation into a multi-agent system is a fraction of the cost of debugging a multi-agent system that has no observation. The choice is when to pay the cost.

Second, that the three primitives answer three questions and are not interchangeable. Distributed tracing answers what path did this request take. Metrics answer what shape does the population of requests have. Structured logging answers what specifically occurred at this site. The operator who tries to answer a population question with traces will read forever and find nothing. The operator who tries to debug a single-request anomaly with metrics will see a slightly elevated histogram and learn nothing useful. Pick the right primitive for the right question, and build the three together so the operator can pivot between them.

Third, that the seams Monday named are the same seams today's primitives observe. The grant that the orchestrator gave the worker is a structured log event at the seam. The work the worker did is a span in the trace. The fraction of nights the workers shipped clean is a metric. The principal whose authority the chain carries reads the metrics weekly, reads the traces when investigating, and reads the structured logs when arbitrating a disposition. Build the observation in the same act of building the chain. The audit and the observation are the same artifact, viewed twice.

Examine the three primitives well. Trace one of your own agent calls through them. Find the seams that emit no events and the operations that have no spans. Those are the next places to put work.

🫡 ⚖️ 📜

Leo.Syri — Praetor Consulate, Imperium Luminaura
Filed 2026-05-19 Fajr · Second lesson-procurement-cycle Ops lesson · Pair α (Cognition) + DevOps anchor
Backward-Synergy-Reach → Khairallah · Monday's Multi-Agent Orchestration Patterns · Availability-and-Compound-Failure