Hedronite · Synthesis Lesson · Dev · Python · Thu 2026-05-21

Python's contextvars for Per-Task Memory Propagation

In async agent pipelines, where threads do not service requests one-to-one.

Lesson Class: Dev Synthesis
Language: Python (Mon+Thu Week 1 = Python)
Week / Cycle: Week 1 of Cycle 1
Word Count: ~2,400
Paired Ops: Agent Memory Layers in Production
Discipline: ROD v0.4.0 (universal-application)

§ I Frame

Today's Ops lesson named three memory layers an agent runtime must operate, and warned that the most embarrassing production failures arrive when the runtime serves the wrong session's memory to the right session's tool call. The Python-level cause of that failure has a name in the standard library: thread-local state in a coroutine world. A threading.local() instance gives correct results in a synchronous server because one OS thread services one request. The same instance gives nonsense in an asyncio server, because one OS thread services many coroutines and the local store sees whichever coroutine touched it last.
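The cross-talk is easy to reproduce. The sketch below is an illustration, not the runtime's code: `handle` and `run_demo` are hypothetical names, and `asyncio.sleep(0)` stands in for any real await such as a database or model call.

```python
import asyncio
import threading

# Request-scoped store written the pre-asyncio way.
_local = threading.local()


async def handle(session_id: str, seen: dict[str, str]) -> None:
    _local.session_id = session_id        # looks request-scoped, is thread-scoped
    await asyncio.sleep(0)                # yield: the other coroutine runs on this same thread
    seen[session_id] = _local.session_id  # reads whatever was written last


async def run_demo() -> dict[str, str]:
    seen: dict[str, str] = {}
    await asyncio.gather(handle("S-1", seen), handle("S-2", seen))
    return seen


if __name__ == "__main__":
    # Session S-1 reads S-2's identifier after the await.
    print(asyncio.run(run_demo()))  # {'S-1': 'S-2', 'S-2': 'S-2'}
```

Both coroutines share one OS thread, so the second write overwrites the first before either one reads.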

Python solved this in 3.7 with the contextvars module. The primitive looks small. The implications are not. This lesson treats the primitive at the level of working agent-runtime code: what a ContextVar actually is, what copy_context() does at the asyncio.create_task boundary, where the pattern fails when developers reach for it the wrong way, and how the runtime in Thursday's Ops worked example uses the primitive to keep each ticket's session-learning cache, trace context, and tenant identifier flowing through every tool call the agent fires.

§ II Language Idiom: What ContextVar Actually Is

A ContextVar is a lookup key into the current context. Each asyncio task holds its own Context, a mapping from context variables to values, and the event loop runs every step of the task inside that context. When the task suspends at an await, the loop resumes another task inside that task's own context. The mechanism is built into asyncio's task machinery; the developer does not call Context.run() explicitly except in advanced cases.
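The key-into-a-mapping model can be checked without an event loop. The sketch below uses only the standard library; `set_and_read` and `run_demo` are hypothetical names. It takes a snapshot with `copy_context()`, reads it like a mapping, and runs a function inside it with `Context.run()`, which is roughly what asyncio does for each step of a task.

```python
import contextvars

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")


def set_and_read(value: str) -> str:
    session_id_var.set(value)  # lands in whichever context is active right now
    return session_id_var.get()


def run_demo() -> tuple[str, str, str, str]:
    session_id_var.set("outer")
    ctx = contextvars.copy_context()        # snapshot: a mapping of ContextVar -> value
    before = ctx[session_id_var]            # the snapshot holds "outer"
    inner = ctx.run(set_and_read, "inner")  # the set happens inside the snapshot only
    in_snapshot = ctx[session_id_var]       # the snapshot now holds "inner"
    in_caller = session_id_var.get()        # the calling context still holds "outer"
    return before, inner, in_snapshot, in_caller
```

The write made inside `ctx.run()` stays in the snapshot, and the caller's value is untouched.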

import contextvars

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")

async def handle_turn(session_id: str, user_message: str) -> str:
    token = session_id_var.set(session_id)
    try:
        return await dispatch_to_agent(user_message)
    finally:
        session_id_var.reset(token)

The set returns a token. The reset consumes the token to restore the prior value. The try/finally pattern keeps the context tidy even when dispatch_to_agent raises. Crucially, any code path the task awaits into can read session_id_var.get() and see the same value, without anyone passing session_id through the call stack.
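The token is what makes nesting safe: each reset restores exactly the value that was current before the matching set. A minimal sketch, with hypothetical names and an explicit fallback passed to the final `get()` only so the demo can show the unset state:

```python
import contextvars

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")


def run_demo() -> list[str]:
    observed: list[str] = []
    outer_token = session_id_var.set("S-outer")
    try:
        inner_token = session_id_var.set("S-inner")
        try:
            observed.append(session_id_var.get())   # inner value is visible
        finally:
            session_id_var.reset(inner_token)       # back to "S-outer"
        observed.append(session_id_var.get())
    finally:
        session_id_var.reset(outer_token)           # back to "never set"
    observed.append(session_id_var.get("<unset>"))  # fallback used: nothing is set
    return observed
```

Calling `run_demo()` returns `["S-inner", "S-outer", "<unset>"]`.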

The contrast with thread-local is sharp. A threading.local() shares state across coroutines on the same thread, which is the opposite of what an async handler wants. A ContextVar isolates state across tasks on the same thread, which is exactly what the handler wants.
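The isolation can be observed with two handlers interleaving on one thread. In this sketch `handle` and `run_demo` are hypothetical names, and `asyncio.sleep(0)` stands in for a real await; `asyncio.gather` wraps each coroutine in its own task, so each gets its own copy of the context.

```python
import asyncio
import contextvars

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")


async def handle(session_id: str, seen: dict[str, str]) -> None:
    session_id_var.set(session_id)          # written into this task's context only
    await asyncio.sleep(0)                  # the other task runs here, on the same thread
    seen[session_id] = session_id_var.get()


async def run_demo() -> dict[str, str]:
    seen: dict[str, str] = {}
    await asyncio.gather(handle("S-1", seen), handle("S-2", seen))
    return seen


if __name__ == "__main__":
    print(asyncio.run(run_demo()))  # {'S-1': 'S-1', 'S-2': 'S-2'}
```

Each task reads back its own identifier even though the other task wrote to the same variable in between.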

§ III Code Worked Example: The Runtime's Per-Task Memory Plumbing

Below is the inner shape of the agent runtime the Ops lesson described, rendered in Python. Three context variables carry per-task state: the session identifier (which routes the session-learning cache lookups), the trace context (which makes the OTLP spans hang together correctly), and the tenant identifier (which scopes the persistent recall store query). The runtime sets all three at the inbound boundary, and every downstream tool call reads what it needs without an explicit pass-through.

The runtime's HTTP handler looks like this. It receives a ticket turn from a customer and dispatches it through the supervisor agent.

import contextvars
import asyncio
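from opentelemetry import trace

# Module-level tracer; the handler below starts its span from it.
tracer = trace.get_tracer(__name__)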

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")
trace_ctx_var: contextvars.ContextVar[trace.SpanContext] = contextvars.ContextVar("trace_ctx")
tenant_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("tenant_id")

async def handle_ticket_turn(req: TicketTurnRequest) -> TicketTurnResponse:
    token_sid = session_id_var.set(req.session_id)
    token_tid = tenant_id_var.set(req.tenant_id)
    span = tracer.start_span("handle_ticket_turn")
    token_trace = trace_ctx_var.set(span.get_span_context())
    try:
        result = await supervisor_agent.dispatch(req.user_message)
        return TicketTurnResponse(reply=result)
    finally:
        trace_ctx_var.reset(token_trace)
        tenant_id_var.reset(token_tid)
        session_id_var.reset(token_sid)
        span.end()

Deeper in the stack, the persistent recall store reads tenant_id_var.get() to scope its WHERE tenant_id = $1 clause. The session-learning cache reads session_id_var.get() to find the right Redis logical database. The OTLP span emitter reads trace_ctx_var.get() to attach child spans to the right parent. None of these functions accept the values as parameters; they pull them from context.
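The set-and-reset bookkeeping at the inbound boundary can be folded into one helper. This is one possible shape, not the runtime's actual code: `bound`, `current_scope`, and `run_demo` are hypothetical names. The helper sets every variable it is given and resets them in reverse order on exit, including when the body raises.

```python
import contextlib
import contextvars
from typing import Any, Iterator

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")
tenant_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("tenant_id")


@contextlib.contextmanager
def bound(bindings: dict[contextvars.ContextVar[Any], Any]) -> Iterator[None]:
    """Set every variable in `bindings`; reset them in reverse order on exit."""
    tokens: list[tuple[contextvars.ContextVar[Any], contextvars.Token[Any]]] = []
    try:
        for var, value in bindings.items():
            tokens.append((var, var.set(value)))
        yield
    finally:
        for var, token in reversed(tokens):
            var.reset(token)


def current_scope() -> tuple[str, str]:
    # Stands in for a downstream tool call that takes no explicit parameters.
    return tenant_id_var.get(), session_id_var.get()


def run_demo() -> tuple[tuple[str, str], str]:
    with bound({tenant_id_var: "T-9", session_id_var: "S-1234"}):
        inside = current_scope()
    after = tenant_id_var.get("<unset>")  # fallback only to show the reset happened
    return inside, after
```

A plain `with` statement works inside an async handler here because the set and the reset both run in the same task, which is the context the tokens belong to.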

async def query_recall_store(query_text: str, k: int = 5) -> list[Memory]:
    # No fallback on purpose: an unset tenant raises LookupError instead of running an unscoped query.
    tenant_id = tenant_id_var.get()
    embedding = await embed(query_text)
    rows = await pg.fetch(
        """
        SELECT id, text_content
        FROM recall_store
        WHERE tenant_id = $1 AND embedding_model = $2
        ORDER BY embedding <-> $3
        LIMIT $4
        """,
        tenant_id, CURRENT_EMBEDDING_MODEL, embedding, k,
    )
    return [Memory.from_row(r) for r in rows]

The function reads tenant_id_var.get() at the top. If the value was never set in the calling task's context, get() raises LookupError. That failure mode is loud and immediate, which is the correct behavior. The alternative, defaulting to None, fails quietly: bound into this query, a None becomes NULL and the tenant filter matches nothing, and in any code path that builds its filter conditionally a None can drop the tenant scope and expose another tenant's rows. The Ops lesson named this as the silent-correctness failure; the language-level discipline that prevents it is to never give a ContextVar a default value when the absence of a value indicates a bug.
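The difference between the two declarations is small on the page and large at runtime. In the sketch below, `lenient_tenant_var` is a hypothetical counter-example and both `scope_*` functions are illustrative stand-ins for code that builds a query parameter.

```python
import contextvars
from typing import Optional

# No default: a missing tenant is a bug, so get() raises.
tenant_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("tenant_id")

# Counter-example: the default hides the missing value from every reader.
lenient_tenant_var: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "lenient_tenant", default=None
)


def scope_strict() -> str:
    return tenant_id_var.get()       # LookupError when nothing was set


def scope_lenient() -> Optional[str]:
    return lenient_tenant_var.get()  # silently returns None


def run_demo() -> tuple[Optional[str], str]:
    lenient = scope_lenient()
    try:
        scope_strict()
        strict = "no error"
    except LookupError:
        strict = "LookupError"
    return lenient, strict
```

With neither variable set, `run_demo()` returns `(None, "LookupError")`: the lenient reader hands a None to whatever comes next, and the strict reader stops the call.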

§ IV Where the Pattern Fails: Three Sharp Edges

Edge one: asyncio.create_task and the context snapshot. When a coroutine calls asyncio.create_task(coro), the new task is scheduled with a copy of the parent task's current context. Changes the parent makes to its own context after the task is scheduled do not propagate. Changes the child makes to its context do not propagate back. This is the correct behavior for isolation; it is also the source of one of the most common new-developer bugs. The fix is to set the context variables before the create_task call, not after.

# Wrong
task = asyncio.create_task(child_coro())
session_id_var.set("S-1234")  # child does not see this

# Right
session_id_var.set("S-1234")
task = asyncio.create_task(child_coro())  # child inherits S-1234
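The fragments above can be turned into a runnable check. Every function name in this sketch is hypothetical, and the explicit fallback passed to `get()` exists only so the demo can report the unset case instead of raising. Each scenario runs in its own task so that one scenario's writes cannot reach the next.

```python
import asyncio
import contextvars

session_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("session_id")


async def child() -> str:
    await asyncio.sleep(0)
    return session_id_var.get("<unset>")


async def set_after() -> str:
    task = asyncio.create_task(child())  # the context is copied here
    session_id_var.set("S-1234")         # too late: the child holds the older snapshot
    return await task


async def set_before() -> str:
    session_id_var.set("S-1234")
    task = asyncio.create_task(child())  # the snapshot includes S-1234
    return await task


async def child_sets() -> None:
    session_id_var.set("S-child")        # written into the child's copy only


async def child_write_stays_local() -> str:
    session_id_var.set("S-parent")
    await asyncio.create_task(child_sets())
    return session_id_var.get()          # the parent still sees its own value


async def run_demo() -> tuple[str, str, str]:
    late = await asyncio.create_task(set_after())
    early = await asyncio.create_task(set_before())
    isolated = await asyncio.create_task(child_write_stays_local())
    return late, early, isolated


if __name__ == "__main__":
    print(asyncio.run(run_demo()))  # ('<unset>', 'S-1234', 'S-parent')
```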

Edge two: thread-pool executors. When async code calls await loop.run_in_executor(None, sync_func), the synchronous function runs on a thread-pool thread. That thread does not inherit the calling task's context. If sync_func calls tenant_id_var.get(), it raises LookupError. The fix is to capture the values before the executor call and pass them explicitly into the synchronous function, or to snapshot the context with contextvars.copy_context() and submit the snapshot's run method with sync_func as its argument. On Python 3.9 and later, asyncio.to_thread(sync_func) copies the current context for the worker thread automatically.

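loop = asyncio.get_running_loop()
ctx = contextvars.copy_context()  # snapshot taken on the event-loop thread
result = await loop.run_in_executor(None, ctx.run, sync_func)  # sync_func runs inside the snapshot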
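A side-by-side sketch of the three ways to reach a worker thread, assuming a standard CPython build and Python 3.9 or later for `asyncio.to_thread`. `sync_func` here is a hypothetical stand-in that reports a marker string instead of raising, so all three results can be compared.

```python
import asyncio
import contextvars

tenant_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("tenant_id")


def sync_func() -> str:
    # Fallback only so the demo can report the missing case instead of raising.
    return tenant_id_var.get("<missing>")


async def run_demo() -> tuple[str, str, str]:
    tenant_id_var.set("T-9")
    loop = asyncio.get_running_loop()

    # Bare executor call: the worker thread is not handed this task's context.
    bare = await loop.run_in_executor(None, sync_func)

    # Explicit snapshot: sync_func runs inside a copy of this task's context.
    ctx = contextvars.copy_context()
    wrapped = await loop.run_in_executor(None, ctx.run, sync_func)

    # asyncio.to_thread propagates the current context on its own.
    via_to_thread = await asyncio.to_thread(sync_func)
    return bare, wrapped, via_to_thread


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

The second and third results are "T-9". The first is the marker string on a standard build, which is the LookupError case once the fallback is removed.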

Edge three: third-party async libraries that predate contextvars. Such libraries carry request state through their own internal stores rather than through the standard primitive. When the agent runtime calls into one, the library's view of the world does not include the runtime's context variables. Two responses: wrap the library's entry points with explicit pass-through code, or replace the library with one that respects the standard primitive. The wrap is cheaper short-term; the replacement is cheaper long-term.
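One shape the wrap can take is a small adapter that captures the caller's context when a callback is registered and replays it when the library invokes the callback. Everything here is an assumption for illustration: `bind_context` is a hypothetical helper, and `legacy_fetch` is a stand-in for a library that runs callbacks on a thread of its own.

```python
import contextvars
import functools
import threading
from typing import Any, Callable

tenant_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("tenant_id")


def bind_context(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Capture the caller's context now; run `fn` inside a copy of it later."""
    ctx = contextvars.copy_context()

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Copy per call: a single Context object cannot be entered twice at once.
        return ctx.copy().run(fn, *args, **kwargs)

    return wrapper


def legacy_fetch(callback: Callable[[str], None]) -> None:
    """Hypothetical legacy library: invokes the callback on its own thread."""
    worker = threading.Thread(target=callback, args=("payload",))
    worker.start()
    worker.join()


def run_demo() -> list[str]:
    seen: list[str] = []

    def on_result(payload: str) -> None:
        seen.append(tenant_id_var.get("<missing>"))

    tenant_id_var.set("T-9")
    legacy_fetch(on_result)                # library thread, no runtime context
    legacy_fetch(bind_context(on_result))  # the snapshot carries T-9 across
    return seen
```

The wrapped callback records "T-9". The adapter keeps the library unchanged, which is why it is the cheaper short-term option.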

§ V Connection to Today's Ops Lesson

The Ops lesson named the embedding-version-fence as the central audit pattern at the persistent recall store. The Dev side of that fence lives in the query function above: embedding_model appears explicitly in the WHERE clause, and the CURRENT_EMBEDDING_MODEL module-level constant changes only on a planned migration event. The tenant_id filter sits beside it, sourced from the per-task context. The two filters together produce the audit guarantee the Ops lesson promised: this query returns rows that belong to this tenant and were embedded under this embedding generation, or it returns nothing and the caller knows the store had nothing to say.

The supervisor agent in the Ops worked example reads the session-learning cache before farming work to specialists. The Python implementation reads it through session_id_var.get(), which means the supervisor function itself accepts no session_id parameter and yet always sees the right cache. The same supervisor function services every concurrent ticket the runtime is handling; the contextvars machinery is what keeps the concurrent calls from cross-talking.

The OTLP spans Tuesday's lesson named carry per-call trace context; the Dev side of that trace context is trace_ctx_var. The runtime's span emitter reads the variable, attaches the child span to the parent span context, and emits. Every span the runtime emits during the handling of ticket turn S-1234 hangs together as one trace tree, because every coroutine that emits a span reads the same trace_ctx_var value, because every coroutine inherits its parent task's context at creation time.

Paired Ops lesson → Archmagus-Stack/α-Cognition/Synthesis-Lessons/2026-05-21-agent-memory-layers-in-production-...-operator-discipline

§ VI Prior-Lesson Reach and Closing

Monday's Python lesson treated iterator protocol as the lazy-evaluation primitive for streaming ML inference. Tuesday's Go lesson treated context.Context as Go's per-call state primitive, used for both cancellation and tracing across goroutines. The Python contextvars module is the asyncio-native cousin of Go's context.Context, with one structural difference: Go's context.Context is an explicit argument to every function that needs it, while Python's ContextVar is an implicit lookup against the running task. Both designs scope state to the call rather than to the thread; Go carries it through the signature, Python through the task machinery.

Implicit-vs-Explicit Doctrine: Go's explicit-context style is more verbose but catches more errors at compile time. Python's implicit-context style is cleaner to read but pushes the validation to runtime, which means the test suite has to exercise every ContextVar.get() path under realistic concurrent load. The agent runtime that treats ContextVar.get() as if it were as safe as a function argument will ship a LookupError to a customer eventually.
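A test can force the missing-context path by running the code under test from an empty Context, so that nothing set earlier in the test process leaks in. The sketch below is one way to do that with the standard library alone; `query_scope`, `run_in_empty_context`, and the two `check_*` functions are hypothetical names, and the helper must be called from synchronous test code because it starts its own event loop.

```python
import asyncio
import contextvars
from typing import Any, Awaitable, Callable

tenant_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("tenant_id")


async def query_scope() -> str:
    """Stand-in for a tool call that requires a tenant."""
    return tenant_id_var.get()


def run_in_empty_context(coro_fn: Callable[[], Awaitable[Any]]) -> Any:
    """Run `coro_fn` on a fresh event loop whose main task starts with no variables set."""
    # asyncio.run copies the context that is current when it is called,
    # and inside Context().run that context is empty.
    return contextvars.Context().run(asyncio.run, coro_fn())


def check_missing_tenant_raises() -> bool:
    try:
        run_in_empty_context(query_scope)
    except LookupError:
        return True
    return False


def check_bound_tenant_is_visible() -> str:
    async def scenario() -> str:
        tenant_id_var.set("T-9")
        return await query_scope()

    return run_in_empty_context(scenario)
```

The first check passes only when the tool call fails loudly without a tenant. The second confirms the same call succeeds once the boundary has set one.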

Test the missing-context path. Set the variables at the boundary. Reset them in a finally. Read tomorrow's Friday Go lesson for the Adversarial-Markets-side companion: per-request context in a high-throughput Polymarket order-router, where the per-task state is the strategy identifier and the latency budget.

🫡 ⚖️ 📜
Leo.Syri — Praetor Consulate, Imperium Luminaura
Filed 2026-05-21 Thursday Fajr · Python Dev lesson #2 · Pair α (Cognition) refraction
Backward-Synergy-Reach → Python Iterator Protocol (Mon) · Go context.Context (Tue) · today's Ops Agent Memory Layers
HTML render backfilled 2026-05-25 under approved scaffold + sea-green aether palette