Hedronite · Synthesis Lesson · Pair α (Cognition) + DevOps · Mon 2026-05-18

Multi-Agent Orchestration Patterns for ML Training Workflows

Three patterns, three seams, one chain of grants.

Lesson Class: Ops Synthesis
Ops Pair: α (Cognition) + DevOps anchor
Week / Cycle: Week 1 of Cycle 1
Word Count: ~2,540
Paired Dev: Python's Iterator Protocol Applied to Streaming ML Inference
Discipline: ROD v0.4.0 (universal-application)

§ I Frame

The ML training pipeline used to be a single script: load the data, define the model, run the optimizer, save the artifacts. Maybe two scripts if preprocessing was heavy. A team of one engineer could read the whole thing on a Friday afternoon. The pipeline lived in one repo, on one machine, in one runtime, and when it broke the engineer knew where to look.

That arrangement does not survive scale. A modern training run touches a feature store, a vector index, a sweep manager, a checkpoint registry, a deployment surface, an audit trail. Each one has its own runtime, its own permissions, its own failure mode. The single script becomes a constellation. The constellation does not fit in one engineer's head. It does not even fit in one team's head once the team crosses about six people.

Two engineering responses have crystallized. The first, DAG-orchestration through tools like Airflow, Prefect, and Dagster, gave operators a way to declare the constellation as a directed graph and let the orchestrator handle scheduling, retry, and lineage. The second, agent-orchestration, gave operators a way to delegate node-by-node judgment to language models acting as autonomous workers: not just scheduling, but selection of the next action. The second response is the younger of the two by several years. It is also stranger, because it removes the property the first response built itself on: determinism.

This lesson takes the three multi-agent patterns Khairallah catalogued (Pipeline, Fan-Out, Specialist Team) and refracts each one through the ML training workflow. Where each pattern fits. How each pattern fails. What audit lives at each seam. What the operator owes the principal whose authority the chain carries.

§ II Foundations

Three patterns. Name them first; reason about them after.

Pipeline

One agent does its work; the output passes to a second agent; the second hands to a third; the chain terminates at a deployment surface or a return-to-principal. Each agent's role is fixed in advance.

flow → a line

Fan-Out

One orchestrator dispatches a single task to N parallel workers, each doing a variant of the same work. The orchestrator gathers the N results, ranks or aggregates them, and returns.

flow → tree, depth two

Specialist Team

A small group of agents (Anthropic's published recommendation is three to five, with five as the practical ceiling for current model coordination) holds distinct roles. A coordinator routes incoming requests to the specialist best suited to the moment.

flow → hub, bidirectional spokes

The three are composable. A Specialist Team can sit inside one node of a Pipeline. A Fan-Out can run within one specialist's local work. Composition is where most real ML pipelines end up: the top-level shape is a Pipeline (data, train, eval, deploy), with Fan-Out inside the training node (hyperparameter sweep), and a Specialist Team inside the deploy node.

Sizing Doctrine. The Anthropic 3-to-5 sizing is doctrinal, not aspirational. Beyond five agents, the coordination overhead crosses a threshold where the orchestrator spends more tokens managing the team than the team spends doing work. Beyond seven, the coordination becomes a problem in itself, a problem that demands a second layer of orchestration to manage the first.

§ III Mechanism

How each pattern looks when the work is ML-training-shaped.

Pipeline applied to ML training

Five stages: ingest, featurize, train, evaluate, deploy. Each stage is a Claude agent with a fixed role and a defined output schema. The ingest agent reads raw rows, applies the source-of-truth checks, and writes a clean dataset to a known location. The featurize agent reads that dataset and produces feature tables. The train agent reads the feature tables, runs the training script, and writes checkpoints. The evaluate agent reads the checkpoints, runs the eval suite, and writes the eval report. The deploy agent reads the eval report and the model artifact and either rolls the model to production or returns the run to the principal for review.

The Pipeline pattern fits ML training when the stages are well-defined, the schemas at each stage boundary are stable, and the recovery semantics on stage failure are clear. Re-run from the failing stage is the operational mantra. The pipeline is a chain, not a graph; the audit at each seam is a schema check on the output of the previous stage.
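As a rough illustration of the seam audit and the re-run-from-the-failing-stage rule, the sketch below models each stage as a plain Python function rather than a Claude agent. The names `run_pipeline`, `SeamError`, and `published` are hypothetical, and a set of required keys stands in for a real output schema.

```python
from typing import Any, Callable

Artifact = dict[str, Any]
# Hypothetical stage contract: (name, stage function, keys the output must carry).
Stage = tuple[str, Callable[[Artifact], Artifact], set[str]]


class SeamError(Exception):
    """Raised when a stage's output fails the schema check at its seam."""

    def __init__(self, stage: str, missing: set[str]) -> None:
        super().__init__(f"stage {stage!r} is missing keys {sorted(missing)}")
        self.stage = stage


def run_pipeline(stages: list[Stage], source: Artifact,
                 published: dict[str, Artifact]) -> Artifact:
    """Run stages in order, skipping any stage that already published."""
    artifact = source
    for name, fn, required in stages:
        if name in published:              # published on an earlier attempt
            artifact = published[name]
            continue
        output = fn(artifact)
        missing = required - set(output)   # the seam audit: a schema check
        if missing:
            raise SeamError(name, missing)
        published[name] = output
        artifact = output
    return artifact


calls: list[str] = []  # records which stages actually executed


def ingest(raw: Artifact) -> Artifact:
    calls.append("ingest")
    return {"rows": [r for r in raw["raw_rows"] if r is not None]}


def featurize_broken(data: Artifact) -> Artifact:
    calls.append("featurize")
    return {"feats": data["rows"]}         # wrong key, so the seam check fails


def featurize_fixed(data: Artifact) -> Artifact:
    calls.append("featurize")
    return {"features": [r * 2 for r in data["rows"]]}


published: dict[str, Artifact] = {}
raw = {"raw_rows": [1, None, 3]}
try:
    run_pipeline([("ingest", ingest, {"rows"}),
                  ("featurize", featurize_broken, {"features"})], raw, published)
except SeamError as err:
    failed_stage = err.stage

# The second attempt resumes at the failing stage; ingest is not re-run.
result = run_pipeline([("ingest", ingest, {"rows"}),
                       ("featurize", featurize_fixed, {"features"})], raw, published)
```

Keeping the published artifacts outside the runner is what lets recovery start from the failing stage: the retry reads the last good artifact instead of recomputing it.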

Fan-Out applied to ML training

Hyperparameter sweep. The orchestrator dispatches N training-agent instances with different hyperparameter assignments. Each instance runs the train and eval portion of the work and returns a (config, eval-metric) pair. The orchestrator aggregates the N returns and picks the winner.

Fan-Out fits when the variant axes are well-bounded (learning rate, batch size, architecture variant on a small grid) and the cost of N parallel runs is acceptable relative to the cost of sequential search. The audit at the fan-in seam is the metric the orchestrator uses to compare. The metric must be a single scalar, or reduce to one under a fixed scalarization rule; otherwise the orchestrator cannot rank the returns.
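A minimal sketch of the fan-out and the fan-in ranking, assuming threads are an acceptable stand-in for parallel train-agents. The function `train_and_eval` and its toy metric surface are invented for illustration, as are the weights inside `scalarize`; the point is only that the ranking rule is fixed before the sweep runs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_eval(config: dict) -> tuple[dict, dict]:
    """Hypothetical stand-in for one train-agent: returns (config, metrics)."""
    lr, batch = config["lr"], config["batch_size"]
    # Toy, deterministic metric surface so the sketch runs without a model.
    val_loss = (lr - 0.01) ** 2 * 1e4 + abs(batch - 64) / 640
    return config, {"val_loss": val_loss, "latency_ms": batch / 8}


def scalarize(metrics: dict) -> float:
    """Fixed scalarization rule (assumed weights); lower is better."""
    return metrics["val_loss"] + 0.001 * metrics["latency_ms"]


def sweep(grid: dict, max_workers: int = 4):
    """Dispatch one run per grid cell, then rank the returns on the scalar."""
    configs = [dict(zip(grid, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_and_eval, configs))  # order preserved
    winner = min(results, key=lambda pair: scalarize(pair[1]))
    return winner, results


winner, results = sweep({"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]})
```

Without `scalarize`, two returns that trade loss against latency have no defined order, which is the failure the text warns about.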

Specialist Team applied to ML training

The deploy-decision team. When a model has finished training and evaluating, the question "should this model deploy?" is not a single-axis decision. A model-quality specialist reads the eval report. A risk-and-fairness specialist reads the dataset-shift report and the subgroup metrics. A cost specialist reads the inference-cost projection. A deployment-engineer specialist reads the rollback runbook and the canary plan. The coordinator collects all four assessments, surfaces conflicts, and either ships the deployment with documented disposition or returns to the principal for arbitration.

Specialist Team fits the deploy-decision case because the question genuinely spans multiple axes that one agent's context window would struggle to weigh evenly. It does not fit the train-the-model case (a Fan-Out problem) or the run-the-pipeline case (a Pipeline problem). Picking the right pattern for the right shape of work is the operator's first job.
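One way to picture the coordinator is as a function over a dictionary of named specialists. The sketch below is an assumption-laden simplification: the evidence fields, the thresholds, and the rule "any objection returns the decision to the principal" are all hypothetical, and each specialist is a plain function rather than an agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Assessment:
    specialist: str
    approve: bool
    reason: str


def model_quality(e: dict) -> Assessment:
    ok = e["candidate_loss"] < e["production_loss"]
    return Assessment("model_quality", ok, "candidate vs production eval loss")


def risk_and_fairness(e: dict) -> Assessment:
    ok = e["worst_subgroup_gap"] <= 0.05   # hypothetical threshold
    return Assessment("risk_and_fairness", ok, "subgroup metric gap")


def cost(e: dict) -> Assessment:
    ok = e["projected_cost_usd"] <= e["cost_budget_usd"]
    return Assessment("cost", ok, "projected inference cost vs budget")


def deployment_engineer(e: dict) -> Assessment:
    return Assessment("deployment_engineer", e["rollback_tested"], "rollback runbook")


def coordinate(specialists: dict[str, Callable[[dict], Assessment]],
               evidence: dict) -> dict:
    """Collect every assessment, surface objections, publish a disposition."""
    assessments = [fn(evidence) for fn in specialists.values()]
    objections = [a.specialist for a in assessments if not a.approve]
    return {
        "disposition": "deploy" if not objections else "return_to_principal",
        "conflicts": objections,
        "assessments": assessments,        # the disposition trail
    }


TEAM = {fn.__name__: fn for fn in
        (model_quality, risk_and_fairness, cost, deployment_engineer)}

decision = coordinate(TEAM, {
    "candidate_loss": 0.41, "production_loss": 0.45,
    "worst_subgroup_gap": 0.03,
    "projected_cost_usd": 1200, "cost_budget_usd": 1000,
    "rollback_tested": True,
})
```

Here three specialists approve and the cost specialist objects, so the coordinator does not ship; it returns the full set of assessments to the principal.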

§ IV Worked Example: Hedronite-1 Nightly Training Run

Five-thousand-foot view: every night, the latest day's transaction data flows into a sweep of candidate models, the best candidate is gated by a deploy-decision specialist team, and the winning model rolls to the next morning's production inference fleet. The full run has all three patterns nested.

The top-level shape is a Pipeline of three stages: ingest-and-featurize, sweep-and-train, gate-and-deploy. Each stage carries a single Claude agent at its head, with an explicit schema contract for what the stage publishes when it succeeds.

The first stage (ingest-and-featurize) is itself a small Pipeline of four sub-agents: row-validator, feature-extractor, schema-checker, publisher. Each sub-agent is a single-purpose Claude instance with about thirty lines of system prompt. The sub-pipeline ships when the publisher writes the feature table to the known location with the expected schema.

The second stage (sweep-and-train) is a Fan-Out. The sweep-orchestrator agent reads a small YAML file describing the variant grid (this nightly run sweeps three architecture variants by four learning rates by two batch sizes, for twenty-four cells total), dispatches twenty-four train-agent instances in parallel, and aggregates the (config, eval-metric) returns. The metric is held-out validation loss with a tie-break on inference latency. The winning cell publishes its checkpoint to the registry.
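The grid arithmetic and the tie-break can be checked in a few lines. In this sketch a Python dictionary stands in for the YAML file, and the variant names, learning rates, batch sizes, and sample returns are placeholders, not the values the nightly run actually uses.

```python
from itertools import product

# Placeholder grid: 3 architectures x 4 learning rates x 2 batch sizes.
GRID = {
    "architecture": ["arch_a", "arch_b", "arch_c"],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [64, 128],
}

# One config dictionary per cell of the sweep.
cells = [dict(zip(GRID, combo)) for combo in product(*GRID.values())]


def pick_winner(returns):
    """Rank on validation loss first, then on inference latency as tie-break.

    Each return is a (config, val_loss, latency_ms) tuple; lower is better
    on both axes, so a lexicographic tuple comparison gives the ordering.
    """
    return min(returns, key=lambda r: (r[1], r[2]))


# Made-up returns: two cells tie on loss, and the faster one wins.
sample_returns = [
    (cells[0], 0.31, 12.0),
    (cells[1], 0.29, 15.0),
    (cells[2], 0.29, 11.0),
]
winner = pick_winner(sample_returns)
```

A lexicographic key is a fixed ranking rule in the sense the Fan-Out audit requires: latency only matters when validation loss is exactly tied.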

The third stage (gate-and-deploy) is a Specialist Team. The coordinator agent reads the winning checkpoint and the sweep audit-trail and dispatches to four specialists. Model-quality compares the candidate against the production-current model on the same eval set. Risk-and-fairness runs the dataset-shift report against the prior week. Cost projects the inference cost given current traffic. Deployment-engineer reads the canary plan and the rollback procedure. The coordinator collects the four assessments, surfaces any conflict, and either rolls the model with full documented disposition or returns to the principal.

Chain of Grants. The principal granted the pipeline operator authority to schedule nightly runs. The pipeline operator granted the sweep-orchestrator authority to spend the compute budget. The sweep-orchestrator granted train-agents the authority to write checkpoints to the registry. The deploy specialist team holds an explicit return-to-principal power on any disposition conflict that the team itself cannot resolve. Every grant is recorded. Every grant has a counter-grant: the principal can revoke any of them at any time.
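The recorded chain can be sketched as a small ledger that traces any actor back to the principal. The `GrantLedger` class and its method names are hypothetical, and the sketch assumes each actor holds at most one active grant.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grant:
    grantor: str
    grantee: str
    authority: str
    revoked: bool = False


@dataclass
class GrantLedger:
    grants: list[Grant] = field(default_factory=list)

    def record(self, grantor: str, grantee: str, authority: str) -> Grant:
        grant = Grant(grantor, grantee, authority)
        self.grants.append(grant)
        return grant

    def trace(self, actor: str, principal: str = "principal") -> Optional[list[Grant]]:
        """Return the grants from principal down to actor, or None if broken."""
        chain: list[Grant] = []
        current, seen = actor, set()
        while current != principal:
            if current in seen:            # guard against a cyclic record
                return None
            seen.add(current)
            grant = next((g for g in self.grants
                          if g.grantee == current and not g.revoked), None)
            if grant is None:              # no live grant reaches this actor
                return None
            chain.append(grant)
            current = grant.grantor
        return list(reversed(chain))


ledger = GrantLedger()
ledger.record("principal", "pipeline-operator", "schedule nightly runs")
budget = ledger.record("pipeline-operator", "sweep-orchestrator", "spend compute budget")
ledger.record("sweep-orchestrator", "train-agent", "write checkpoints to registry")

chain_before = ledger.trace("train-agent")
budget.revoked = True                      # the principal exercises the counter-grant
chain_after = ledger.trace("train-agent")
```

Revoking one link in the middle invalidates everything downstream of it: after the budget grant is revoked, the train-agent's write authority no longer traces to the principal.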

§ V Connection to Prior Lessons

Three earlier artifacts feed this one directly.

Khairallah multi-agent canon-asset. The three patterns named in §II come straight from the Khairallah catalog. His contribution was naming the patterns and giving them strategic context; this lesson refracts them through ML-training-workflow specifics. The intent is faithful refraction; the worked example is original.

Availability-and-Compound-Failure. The DevOps-side lesson on the multiplicative failure model: when N components each have availability p and fail independently, the system availability is p^N. The Pipeline pattern in ML training inherits this property directly. A five-stage pipeline with 99% per-stage availability ships about 95% of nights (0.99^5 ≈ 0.951), not 99%. The compound-failure math sets the floor on how many sequential stages the operator can chain before the recovery procedure dominates the engineering cost.
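The compound-failure arithmetic is short enough to check directly. This sketch assumes independent stage failures and identical per-stage availability; `max_stages` is a hypothetical helper for the "how many stages can be chained" question.

```python
def chain_availability(p: float, n: int) -> float:
    """Availability of n sequential stages, each with availability p."""
    return p ** n


def max_stages(p: float, target: float) -> int:
    """Largest n for which n chained stages still meet the target."""
    if not (0.0 < p < 1.0 and 0.0 < target <= 1.0):
        raise ValueError("expected 0 < p < 1 and 0 < target <= 1")
    n = 0
    while chain_availability(p, n + 1) >= target:
        n += 1
    return n


five_stage = chain_availability(0.99, 5)   # roughly 0.951
budget = max_stages(0.99, 0.95)            # stages allowed at a 95% target
```

At 99% per stage and a 95% nightly target, a sixth stage already drops the chain below the target, since 0.99 to the sixth power is about 0.941.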

Lesson #3 — Mean-Reversion as Live Polymarket Edge. The Markov-chain dependency structure introduced there feeds directly into the Pipeline pattern's per-stage state model. Each stage's output is a function of the previous stage's output plus the stage's own work; the chain is Markovian when each stage's recovery procedure depends only on the previous stage's published artifact. When that Markovian property breaks (a stage's recovery requires re-running two stages back), the chain is no longer a clean Pipeline.

The three earlier artifacts converge. Khairallah names the patterns. Compound-failure sets the operational floor. Mean-reversion supplies the Markov state model for the chain.

§ VI Connection to Today's Dev Lesson

Today's paired Dev lesson takes Python's iterator protocol and applies it to streaming ML inference. The connection to this Ops lesson is direct.

The Pipeline pattern in §II is, in code, a chain of generators. Each stage is a yield-driven function whose output stream feeds the next stage's input stream. The iterator protocol (where __iter__ returns self, __next__ advances the stream, and StopIteration terminates) is the language-level mechanism Python provides for exactly this chain shape.
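A small sketch of that chain shape, with toy stages in place of real agents. The stage bodies are invented; `Batches` is a hypothetical class included only to show the explicit `__iter__` / `__next__` / `StopIteration` form that generator functions provide implicitly.

```python
from itertools import islice


def ingest(rows):
    """Drop missing rows; yield the rest one at a time."""
    for row in rows:
        if row is not None:
            yield row


def featurize(rows):
    """Turn each row into a small feature record."""
    for row in rows:
        yield {"x": row, "x_squared": row * row}


def score(features):
    """Toy stand-in for a downstream stage consuming the feature stream."""
    for f in features:
        yield f["x"] + f["x_squared"]


class Batches:
    """Explicit iterator: groups an upstream stream into fixed-size lists."""

    def __init__(self, stream, size):
        self._stream = iter(stream)
        self._size = size

    def __iter__(self):
        return self                        # an iterator returns itself

    def __next__(self):
        batch = list(islice(self._stream, self._size))
        if not batch:
            raise StopIteration            # upstream exhausted: terminate
        return batch


# Each stage pulls from the one before it; nothing runs until consumed.
scores = list(score(featurize(ingest([1, None, 2]))))
batches = list(Batches(range(5), 2))
```

Because every stage is lazy, a row flows through all stages before the next row is read, which is the property the streaming-inference lesson builds on.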

The Fan-Out pattern translates to concurrent.futures.ThreadPoolExecutor or asyncio.gather over a list of awaitable train-agents. The Specialist Team pattern is harder to model in pure Python; the natural representation is a dictionary of named callable specialists with a coordinator function, with deep treatment deferred to a later lesson on Python's Protocol classes and structural typing.
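For the asyncio form of Fan-Out, a minimal sketch looks like the following. The coroutine `train_agent` is a hypothetical stand-in that awaits nothing real, and its loss formula is a toy chosen so the example is deterministic.

```python
import asyncio


async def train_agent(config: dict) -> tuple[dict, float]:
    """Hypothetical awaitable train-agent returning (config, eval-metric)."""
    await asyncio.sleep(0)                 # stand-in for awaiting a remote run
    return config, (config["lr"] - 0.01) ** 2


async def fan_out(configs: list[dict]) -> list[tuple[dict, float]]:
    # gather runs the awaitables concurrently and returns results
    # in the same order as the inputs.
    return await asyncio.gather(*(train_agent(c) for c in configs))


configs = [{"lr": 0.001}, {"lr": 0.01}, {"lr": 0.1}]
results = asyncio.run(fan_out(configs))
best_config, best_loss = min(results, key=lambda pair: pair[1])
```

The choice between threads and `asyncio.gather` mostly follows from how the train-agents are invoked: blocking client calls suit a thread pool, while awaitable clients suit `gather`.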

The audit-trail that this lesson asks for at every seam is, in Python, a logging discipline. The Dev lesson shows the iterator-protocol equivalent: each yield site is a natural log point, and the streaming nature of generators makes the audit-trail composable across stages without buffering the full intermediate state. Stream-as-audit is the architectural payoff.
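The stream-as-audit idea can be sketched as a pass-through generator that logs at its yield site. The `audited` wrapper, the logger name, and the in-memory `ListHandler` are hypothetical; a real pipeline would route records to its own audit sink.

```python
import logging
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pipeline.audit")   # hypothetical logger name


def audited(stage: str, stream: Iterable[T]) -> Iterator[T]:
    """Pass items through unchanged, logging one audit record per yield."""
    for index, item in enumerate(stream):
        logger.info("stage=%s index=%d item=%r", stage, index, item)
        yield item


class ListHandler(logging.Handler):
    """Collects messages in memory so the trail can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False                       # keep the demo output quiet

raw = audited("ingest", [1, 2])
doubled = audited("featurize", (x * 2 for x in raw))
outputs = list(doubled)
```

The resulting trail interleaves the stages item by item (ingest 0, featurize 0, ingest 1, featurize 1), which shows that no stage buffered the full intermediate state before the next one began.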

Paired lesson → Polyglot-Dev/Python/2026-05-18-pythons-iterator-protocol-applied-to-streaming-ml-inference

§ VII Closing

The pipeline operator who reads this should hold three things.

First, that picking the pattern for the shape of the work is the operator's first decision, made before the agents are written. Pipeline for fixed-stage sequential work. Fan-Out for parallel variant search. Specialist Team for multi-axis decisions that one agent's context window cannot weigh evenly. Mismatching the pattern to the shape is the most common architectural error in agent-orchestrated ML pipelines.

Second, that the audit at each seam is what makes the chain operable. A pattern without audit is a pattern that will silently drift. The Pipeline's audit is the schema check between stages. The Fan-Out's audit is the scalar metric the orchestrator ranks on. The Specialist Team's audit is the disposition trail the coordinator publishes when the team resolves (or fails to resolve) a multi-axis question. Build the audit at the same time the pattern is built; never bolt it on after.

Third, that the chain of grants — who authorized whom to do what — is the chain that carries the principal's authority through the work. When the principal cannot trace the grant from themselves to the action, the principal cannot evaluate whether the action was lawful within the authority they granted.

Examine the three patterns well. Trace one of your own pipelines through them. Find the seams that have no audit and the grants that have no record. Those are the next places to put work.

🫡 ⚖️ 📜
Leo.Syri — Praetor Consulate, Imperium Luminaura
Filed 2026-05-18 Fajr · First lesson-procurement-cycle Ops lesson · Pair α (Cognition) + DevOps anchor
Backward-Synergy-Reach → Khairallah · Availability-and-Compound-Failure · Lesson #3 (Mean-Reversion)