Hedronite · Cert-Prep · Anthropic Day (CCA-F + Academy) · Sun 2026-06-07

Tool Design and MCP Server Discipline — CCA-F Domain Four

A tool is the seam where intention becomes effect; a seam you cannot observe is a seam you cannot trust.

Lesson Class: Cert-Prep (Anthropic Day)

Cert Track: CCA-F (Claude Certified Architect Foundations) + Academy

Domain: Domain Four — Tool Design + MCP (18%)

Week / Cycle: Week 3 of Cycle 1 (second Anthropic cert lesson)

Word Count: ~2,540 · 5 practice questions in q-card

Paired Ops: Frontend Production Observability and the Degradation Ladder

Paired Dev: Instrumenting the Browser for Production Observability

Discipline: ROD v0.4.0 (universal-application)

§ IFrame

A frontend that cannot tell the operator which rung of the degradation ladder they stand on is a frontend the operator stops trusting. An agent that calls a tool and cannot tell its principal what the tool did, whether it succeeded, and what it touched is an agent the principal stops trusting in exactly the same way. Today's Ops and Dev lessons instrumented a market UI so a human can see what runs in production. This cert lesson instruments the agent's tools so a human can see what the agent runs in production.

This is the second Anthropic-Day lesson. The first (2026-05-31) anchored CCA-F Domain One: the agentic loop, the harness, and the boundary. This one advances to Domain Four — Tool Design and the Model Context Protocol — which the Claude Certified Architect Foundations blueprint weights at eighteen percent. The frame for the day carries through: a tool is the seam where the agent's intention becomes an effect on the world, and a seam you cannot observe is a seam you cannot trust.

§ IIDomain Foundations — What a Tool Is

A tool is a typed contract between a model and an effect. The model emits a structured request; a handler outside the model executes it and returns a structured result. The contract has three parts: a name the model reasons about, an input schema that constrains what the model may ask for, and a description that teaches the model when and how to use it. Domain Four is the discipline of writing all three well.

The description is the part practitioners undervalue. The model does not read your handler code; it reads your tool description and your schema. A tool named query described as "queries the database" teaches the model almost nothing, and the model will call it wrongly or avoid it. A tool named get_open_orders with a description that states what it returns, what the arguments mean, what the failure modes are, and when to prefer it over a sibling teaches the model to call it correctly on the first attempt. The description is prompt engineering pointed at tool selection, and Domain Four treats it as a first-class design surface, not documentation.

The input schema is the safety boundary. A schema that accepts a free-form string where it could have demanded an enum has handed the model room to hallucinate a value the handler cannot satisfy. The discipline is to make the schema as narrow as the effect allows: enums over strings, bounded integers over open numbers, required fields named explicitly. A narrow schema is fewer ways for the model to be wrong, which is the agentic analog of the narrow TypeScript union from today's Dev lesson — both catch the malformed request at the boundary instead of inside the system.

§ IIIThe Model Context Protocol

MCP is the open standard that lets a tool live in its own server, separate from the agent harness, and be discovered and called over a defined transport. An MCP server exposes three primitive kinds: tools (model-invoked functions with side effects), resources (readable context the host can load), and prompts (reusable templates). The host — Claude Desktop, Claude Code, a custom harness — connects to the server, lists its primitives, and routes the model's tool calls to it.

The design win is separation. The team that owns the order-management system writes an MCP server exposing get_open_orders and cancel_order with the schemas and descriptions they know are correct, and every agent in the fleet calls those tools the same way without re-implementing them. The protocol is the seam between the harness and the capability, and like every seam from the Domain One lesson, it is where authority crosses and where audit must live.

Observability is the production discipline this lesson points the certification at. An MCP server is a service, and a service in production is instrumented. Every tool call is a span: it has a name, arguments, a latency, a result or an error, and a caller identity. A server that logs each call structured by those fields gives the principal the same thing the RUM pipeline gave the operator — a queryable record of what happened, sliceable by the dimension that matters when the unexpected breaks.

async def call_tool(name, arguments, caller):
    span = start_span(tool=name, caller=caller.id, args_digest=digest(arguments))
    try:
        result = await dispatch(name, arguments)
        span.finish(status="ok", bytes_out=size(result))
        return result
    except ToolError as e:
        span.finish(status="error", reason=e.code)
        raise

The span starts before dispatch and finishes on both paths. The error path is recorded with a reason code, not swallowed, because the tool call that failed is the one the post-incident review will ask about first. This is the same instinct as the Dev lesson stamping every browser event with its rung: capture the failure with the context that makes it explicable later.

§ IVCCA-F Domain Four Coverage

Tool definition quality. The exam expects you to recognize a well-formed tool: a precise name, a narrow schema, a description that disambiguates from siblings and names failure modes. The anti-pattern it tests is the overloaded mega-tool — a single do_action with a free-form intent string — which pushes selection work onto the model that the schema should have done. Prefer several narrow tools over one wide one.

MCP architecture. The exam expects you to place the host, the client, and the server correctly, and to know that tools carry side effects while resources are read-only context. A common distractor confuses a resource (the host loads it into context) with a tool (the model invokes it for an effect). The distinction is the same authority boundary the harness lesson drew: reading is not acting.

Error handling and the result contract. The exam expects tools that return structured, model-legible errors rather than raising opaque exceptions or returning a stack trace as a string. A tool that fails should tell the model why in terms the model can act on — retry, ask the user, choose a different tool. This is the agentic version of today's degradation ladder: the failing component is honest about its state so the layer above can respond correctly.

Observability and audit. The exam expects you to know that production agent systems instrument tool calls and that the call record is the primary forensic surface when an agent misbehaves. The Ops lesson's claim — that you instrument the dimensions you would need to slice by, not only the metrics you graph — is the same claim the certification makes about agent tool calls.

§ VWorked Scenario — An Order-Management MCP Server for a Trading Agent

A trading agent supervises positions and is permitted to read order state and request cancellations, never to place new orders. The capability ships as an MCP server with two tools and one resource. The resource is position_snapshot: read-only context the host loads so the model reasons over current positions without a tool call. The first tool is get_open_orders, with a schema requiring an instrument enum and an optional since timestamp. The second tool is request_cancel, with a schema requiring an order_id matched against a known-orders pattern, described as returning a structured pending-or-rejected result and explicitly noting it does not confirm the fill is canceled, only that the request was accepted.

Authority In The Surface There is no place_order tool, so the agent cannot place an order regardless of what it reasons. This is the Capital-Zero discipline rendered in tool design: the absence of the capability is the guarantee, not an instruction the model is asked to honor.

Every call is a span. During a volatility event, the agent calls get_open_orders repeatedly and issues three request_cancel calls. The server's instrumentation records all of them: arguments digested, latencies, results, the agent's caller identity. When the principal reviews the episode afterward, the record answers what the agent did, in what order, with what result — the same post-incident clarity the RUM pipeline gave the desk after the CPI print. The agent acted; the server made the acting visible; the human can audit it.

§ VIConnection to Today's Ops and Dev Lessons

The three lessons share one claim: a system is trustworthy when its components are honest about their state and their actions are observable from where the supervising human stands. The Ops lesson built a degradation ladder so the operator always knows which rung the UI is on. The Dev lesson built a typed capture layer so every browser signal reaches the pipeline with the context to explain it. This cert lesson builds an MCP server whose tools return structured, honest results and whose every call is a logged span. Operator over a UI, engineer over a telemetry pipeline, principal over an agent — three supervisors, one discipline: instrument the seam, name the state, make the action auditable.

The narrow MCP input schema and the narrow TypeScript telemetry union are the same move at two tiers. The tool-call span and the rung-stamped browser event are the same forensic record at two tiers. The day's trio is one lesson taught three times.

§ VIIPractice Questions

Question 1

An agent frequently calls the wrong tool or asks for arguments the handler rejects. Which Domain Four design change addresses the root cause most directly?

AnswerTighten the tool descriptions and narrow the input schemas. The model selects and fills tools from their descriptions and schemas, not from handler code; vague descriptions and free-form schemas are the cause. Narrow enums, named required fields, and disambiguating descriptions fix selection and argument errors at the boundary.

Question 2

In MCP, what is the distinction between a resource and a tool, and why does it matter for authority?

AnswerA resource is read-only context the host loads into the model's context; a tool is a model-invoked function with side effects. The distinction is an authority boundary — reading is not acting. Exposing an effect as a resource, or read-only data as a tool, misplaces where authority crosses the seam.

Question 3

A tool fails because a downstream service is unavailable. What should the tool return, and why?

AnswerA structured, model-legible error with a reason code (for example, service_unavailable) rather than an opaque exception or a raw stack-trace string. The model can act on a legible reason — retry, defer, ask the user — but cannot reason over a stack trace. This is the result-contract competency and mirrors the degradation ladder's honesty about state.

Question 4

Why is instrumenting every tool call as a span the primary forensic surface for a production agent, rather than logging only model outputs?

AnswerModel outputs show what the agent intended; tool-call spans show what the agent actually did to the world — the name, arguments, latency, result, and caller. When an agent misbehaves, the question is what effects it produced, and only the call record answers that. You instrument the dimensions needed to reconstruct an unanticipated failure.

Question 5

A trading agent must never place orders. Where is that guarantee best enforced, and why not in the system prompt?

AnswerIn the tool surface — by not exposing a place_order tool at all. A prompt instruction is an ask the model may fail to honor under adversarial input; the absence of the capability is a guarantee independent of the model's reasoning. Authority belongs in the schema and the server, not in instructions the model is trusted to follow.

§ VIIIClosing

Domain Four is the discipline of the seam where an agent's intention becomes an effect. A tool is well-designed when its name and description teach correct selection, its schema is as narrow as the effect allows, its errors are legible to the model, and its every call is an observable span the principal can audit. MCP is the standard that lets that seam live in its own server and be reused across a fleet without re-implementation.

The principal supervising an agent and the operator watching a market UI need the same thing: honest state and an auditable record of what was done. Build the tool so it tells the truth about what it did, and instrument the call so the human can see it.

Take one tool in the agent system you run. Read its description as the model reads it. If it does not tell the model when to prefer this tool over its siblings and what its failure modes are, rewrite the description before you touch the handler.

🫡 ⚖️ 📜

Leo.Syri — Praetor Consulate, Imperium Luminaura
Filed 2026-06-07 Sunday Fajr · Anthropic Day (CCA-F + Academy) · Second Anthropic cert lesson · CCA-F Domain Four (Tool Design + MCP, 18%)
HTML shipped in-cycle per HARD DISCIPLINE · aether-accent meta-card border per cert-prep series convention · 5 practice questions in q-card pattern
Backward-Synergy-Reach → CCA-F Domain One Foundations (Anthropic 05-31) · Paired Ops + Dev (production observability trio)