Terraform as Multi-Vendor Observability and Guardrail Foundation
Week 3 reach-back synthesis of Vertex monitoring, AWS observability, Sigstore admission, and Bedrock guardrails.
§ I Frame
Week 2's Friday lesson tied four vendors together through the question of trust: who is allowed to do what, and how does the provisioning layer encode that grant. Week 3 asks the next question. Once the grant is made and the workload is running, two disciplines keep it honest: the operator must be able to see what the workload is doing, and the platform must refuse the actions the operator forbade. The first discipline is observability. The second is the guardrail. This week's four cert lessons are four instances of that pair, one per vendor, and Terraform is the single tool that declares all eight surfaces as code.
The theme reads the same on every cloud. Monday's Vertex lesson measured a model's skew and drift. Tuesday's AWS lesson measured a workload's metrics, traces, and cost. Wednesday's Kubernetes lesson refused an unsigned image at admission. Thursday's agentic lesson refused a forbidden tool call at the guardrail. Measure, measure, refuse, refuse. Today's lesson shows that all four are Terraform resources, and that declaring them as code rather than clicking them into a console is what makes the observe-and-enforce posture survive the day the operator who set it up goes on vacation.
§ II Week in Review
Monday's lesson studied Vertex AI model monitoring across the Google GenAI Engineer and Pro ML Engineer tracks. The core was the monitoring job: a Vertex resource that samples a deployed model's prediction inputs, compares the live feature distribution against a training baseline, and raises an alert when training-serving skew or prediction drift crosses a threshold. The GenAI flavor added generative-output evaluation against a rubric; the ML-Pro flavor added classical feature attribution. Both share one operational fact: the monitoring job is a configured resource with a schedule, a baseline reference, a threshold, and an alert sink. None of those four is model code. All four are configuration.
Tuesday's lesson studied AWS observability across the Solutions Architect Professional and DevOps Professional tracks. CloudWatch embedded-metric format let a workload emit structured metrics inside its logs; X-Ray service maps traced a request across services; cost anomaly detection watched the bill for spend that moved faster than its baseline. The unifying operator discipline the lesson named was measure-everything. Each of the three is a configured AWS resource: a metric filter and alarm, a tracing configuration and sampling rule, an anomaly monitor and subscription. Again, none is application code. All are configuration.
Wednesday's lesson studied admission-time image verification across CKA and CKS. The Sigstore policy-controller is a Kubernetes admission webhook that intercepts every pod create, checks each image reference against a signature policy, and refuses the pod if the signature is missing or untrusted. CKA owned the cluster operations — installing the controller, wiring the webhook, reading the audit log. CKS owned the policy — which registries are trusted, which identities may sign, what the controller does on a policy miss. The refusal is the point: an unsigned image never runs. The policy that decides what counts as signed is a Kubernetes custom resource, which is to say configuration.
Thursday's lesson studied pre-tool risk gates for agentic AI across AWS AIP-C01 and GitHub GH-600. Amazon Bedrock Guardrails sit between an agent and its tools, screening each intended action against a content and policy filter and returning a refuse-or-proceed verdict before the tool fires. GitHub Copilot policy did the same for coding agents, gating which repositories and actions an agent may touch. The verdict shape — refuse, or proceed, with the reason logged — is the same shape today's Ops lesson gives the runtime watch and the same shape Tuesday's pre-trade gate gave the trading strategy. A Bedrock guardrail is a configured resource: a set of filters, a set of denied topics, a logging destination. Configuration once more.
Four lessons, eight surfaces, one fact repeated eight times: the observe-and-enforce layer is configuration, not code. Configuration that lives in a console is configuration no one can review, no one can diff, and no one can recreate after an accident. Configuration that lives in Terraform is reviewed in a pull request, diffed against the prior state, and recreated by a single apply. That difference is the whole case for the Friday synthesis.
§ III Terraform as the Connective Tissue
Terraform's model has three moves that make it the right tool for the observe-and-enforce layer across four vendors. The first is the provider abstraction. The second is state and drift detection. The third is the module as a unit of reusable policy. Each maps directly onto a discipline the week's four lessons named.
The provider abstraction is the reason one tool spans Google, AWS, and Kubernetes. The google provider declares the Vertex monitoring job. The aws provider declares the CloudWatch alarm, the X-Ray sampling rule, the cost anomaly monitor, and the Bedrock guardrail. The kubernetes provider declares the Sigstore policy custom resource. Each provider speaks its vendor's API, and the operator writes the same HCL shape against all three: a resource block with a type, a name, and a set of arguments. The Pro-level Terraform discipline the cert tests is that a single configuration can hold multiple providers and apply them in one dependency-ordered run, so the monitoring job and the alarm that watches it land together rather than in two consoles on two afternoons.
State and drift detection are the reason the observe-and-enforce posture survives time. Terraform records what it created in state, and terraform plan refreshes that state against the real infrastructure and reports every divergence between what is declared and what exists in the resources it manages. The discipline this enforces is the one Wednesday's admission lesson most needed: a policy-controller that someone disabled by hand during an incident and forgot to re-enable is exactly the kind of silent regression that leaves admission control switched off without anyone knowing, and a nightly terraform plan catches the disabled controller as drift, provided the change touched an object Terraform manages. The guardrail that protects production is only as trustworthy as the assurance that the guardrail is still configured the way it was reviewed. Drift detection is that assurance, expressed as a diff.
The module is the reason a reviewed policy travels. A Terraform module that declares "a Bedrock guardrail with our denied-topics list, our logging destination, and our refusal-action" is a reviewed artifact that every team instantiates rather than re-deriving. The same holds for a module that declares the standard Vertex monitoring job, or the standard CloudWatch alarm set, or the standard Sigstore policy. The week's eight surfaces become a small library of modules, each reviewed once and instantiated many times, and the Pro-level discipline is versioning those modules so a security fix to the guardrail module rolls out by a version bump rather than by eight hand-edits.
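A minimal sketch of what instantiating such a reviewed module might look like from one service's configuration. The registry address, the version numbers, and the input names (service_name, denied_topics) are hypothetical, chosen only to illustrate the pinned-version pattern.

```hcl
# Hypothetical module call: the source address, version, and input names
# are illustrative assumptions, not a published module.
module "agent_guardrail" {
  source  = "app.terraform.io/example-org/bedrock-guardrail/aws"
  version = "~> 1.4" # accepts 1.4 and later 1.x releases, never 2.0

  # Inputs the reviewed module is assumed to expose.
  service_name  = "agent-service"
  denied_topics = ["investment-advice", "credential-disclosure"]
}
```

With a constraint like this, a security fix published as a later 1.x release is adopted by re-running terraform init -upgrade and reviewing the resulting plan. A team that wants every bump to be an explicit pull request can pin an exact version instead.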
§ IV Worked Example: One Module That Provisions the Week
Consider a single Terraform configuration that stands up a slice of each of the week's four surfaces for one production service. The service is a model-backed agent: it serves a model, runs on Kubernetes, exposes an agent that calls tools, and costs money on AWS. Each of the week's four lessons protects one face of it.
The configuration opens by declaring its providers. The google provider is pinned to a project and region. The aws provider is pinned to an account and region. The kubernetes provider is pointed at the cluster. The Terraform block pins each provider to a version range and configures a remote backend so the state is shared, locked, and not sitting on one engineer's laptop. The version pinning and remote backend are both Pro-level exam material and both the difference between a reproducible apply and a surprise.
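The opening of such a configuration could be sketched as follows. The bucket, lock table, project, account ID, regions, kubeconfig context, and version ranges are all placeholder assumptions. The S3 backend with a DynamoDB lock table is one common locking setup, not the only one, and newer Terraform releases also support an S3-native lockfile.

```hcl
terraform {
  required_version = ">= 1.5.0"

  # Pin each provider to a version range so every apply resolves the same major line.
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }

  # Remote, locked state (placeholder names).
  backend "s3" {
    bucket         = "example-tfstate"
    key            = "agent-service/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-tf-locks" # lock table; applies contend on this lock
    encrypt        = true
  }
}

provider "google" {
  project = "example-project"
  region  = "us-central1"
}

provider "aws" {
  region              = "us-east-1"
  allowed_account_ids = ["123456789012"] # refuse to run against any other account
}

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "prod-cluster"
}
```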
The Vertex face comes first. A google_vertex_ai_model_deployment_monitoring_job resource references the deployed model, sets a sampling rate, points at the training baseline, and sets skew and drift thresholds. Its alert configuration points at a notification channel. This is Monday's lesson as eight lines of HCL: the monitoring job that watches the model for the skew the lesson defined, declared once and recreatable by apply.
The AWS observability face comes next. An aws_cloudwatch_metric_alarm watches the service's error-rate metric, emitted in embedded-metric format by the service itself. An aws_xray_sampling_rule sets the trace sampling for the service. An aws_ce_anomaly_monitor and its subscription watch the service's cost. These three are Tuesday's lesson as three resource blocks: the measure-everything posture, declared rather than clicked.
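A sketch of the three AWS blocks under stated assumptions: the metric namespace and name, the thresholds, the sampling rate, the e-mail address, and the pager_topic_arn variable are hypothetical, and the cost monitor here watches spend by AWS service rather than isolating one workload.

```hcl
variable "pager_topic_arn" {
  type        = string
  description = "SNS topic that pages the on-call engineer (assumed to exist)."
}

# Alarm on an error-rate metric the service is assumed to emit via embedded-metric format.
resource "aws_cloudwatch_metric_alarm" "error_rate" {
  alarm_name          = "agent-service-error-rate"
  namespace           = "AgentService" # hypothetical namespace
  metric_name         = "ErrorRate"    # hypothetical metric name
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 5
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0.05
  treat_missing_data  = "breaching" # a silent service counts as a failing one
  alarm_actions       = [var.pager_topic_arn]
}

# Trace roughly 5% of requests to the service, plus one request per second.
resource "aws_xray_sampling_rule" "agent_service" {
  rule_name      = "agent-service"
  priority       = 100
  version        = 1
  reservoir_size = 1
  fixed_rate     = 0.05
  service_name   = "agent-service"
  service_type   = "*"
  host           = "*"
  http_method    = "*"
  url_path       = "*"
  resource_arn   = "*"
}

# Cost anomaly monitor and a daily e-mail subscription.
resource "aws_ce_anomaly_monitor" "by_service" {
  name              = "agent-service-cost"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "daily" {
  name             = "agent-service-cost-daily"
  frequency        = "DAILY"
  monitor_arn_list = [aws_ce_anomaly_monitor.by_service.arn]

  subscriber {
    type    = "EMAIL"
    address = "oncall@example.com"
  }

  # Notify only when the anomaly's total impact is at least 100 USD.
  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      match_options = ["GREATER_THAN_OR_EQUAL"]
      values        = ["100"]
    }
  }
}
```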
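The Kubernetes admission face comes third. The Sigstore policy-controller is installed by a Helm release the configuration declares, and a kubernetes_manifest resource declares the ClusterImagePolicy custom resource naming the trusted registry and the trusted signing identity. This is Wednesday's lesson: the admission gate that refuses an unsigned image, declared as the policy custom resource the controller reads. The dependency ordering matters, and Terraform handles it at apply time: the policy resource depends on the Helm release, so the controller is installed before the policy that configures it is applied. One caveat applies to a first-time bootstrap. The kubernetes_manifest resource validates against the cluster's API during planning, so the ClusterImagePolicy CRD must already exist when the policy is planned; in practice the Helm release is applied first, for example from a separate configuration or a targeted apply.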
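A sketch of the admission face, assuming the helm provider is configured against the same cluster. The registry glob and the signing identity are hypothetical. The chart repository shown is the Sigstore project's public Helm repository, and the keyless authority assumes the public Fulcio instance.

```hcl
# Install the policy-controller admission webhook.
resource "helm_release" "policy_controller" {
  name             = "policy-controller"
  repository       = "https://sigstore.github.io/helm-charts"
  chart            = "policy-controller"
  namespace        = "cosign-system"
  create_namespace = true
}

# Policy: images from the trusted registry must carry a keyless signature
# from the named CI identity (registry and identity are placeholders).
resource "kubernetes_manifest" "trusted_signers" {
  manifest = {
    apiVersion = "policy.sigstore.dev/v1beta1"
    kind       = "ClusterImagePolicy"
    metadata = {
      name = "trusted-signers"
    }
    spec = {
      images = [
        { glob = "registry.example.com/**" }
      ]
      authorities = [
        {
          keyless = {
            url = "https://fulcio.sigstore.dev"
            identities = [
              {
                issuer  = "https://token.actions.githubusercontent.com"
                subject = "https://github.com/example-org/agent-service/.github/workflows/release.yml@refs/heads/main"
              }
            ]
          }
        }
      ]
    }
  }

  # No attribute of the release is referenced above, so the ordering is explicit.
  depends_on = [helm_release.policy_controller]
}
```

The depends_on edge orders the apply, but it does not remove the need for the ClusterImagePolicy CRD to exist at plan time, so the first rollout into an empty cluster usually installs the Helm release on its own.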
The Bedrock guardrail face comes last. An aws_bedrock_guardrail resource declares the denied-topics list, the content filters, and the logging destination, and the agent's invocation references the guardrail by its identifier. This is Thursday's lesson: the pre-tool gate that returns refuse-or-proceed, declared as the guardrail resource the agent must pass through.
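A sketch of the guardrail resource with one content filter and one denied topic. The names, messages, and the topic definition are hypothetical. Logging is not shown, on the assumption that it is configured outside this resource.

```hcl
resource "aws_bedrock_guardrail" "agent" {
  name                      = "agent-service-guardrail"
  description               = "Pre-tool gate for the agent service"
  blocked_input_messaging   = "This request is not permitted."
  blocked_outputs_messaging = "This response was blocked by policy."

  # One content filter; a real policy would list several filter types.
  content_policy_config {
    filters_config {
      type            = "MISCONDUCT"
      input_strength  = "HIGH"
      output_strength = "HIGH"
    }
  }

  # One denied topic (hypothetical wording).
  topic_policy_config {
    topics_config {
      name       = "investment-advice"
      type       = "DENY"
      definition = "Recommendations about buying, selling, or allocating financial assets."
      examples   = ["Which stocks should I buy this week?"]
    }
  }
}

# Identifier the agent's invocation is assumed to reference.
output "guardrail_id" {
  value = aws_bedrock_guardrail.agent.guardrail_id
}
```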
The four faces apply in one terraform apply. The operator runs one command and the model gets a monitoring job, the service gets its alarms and traces and cost watch, the cluster gets its image-admission policy, and the agent gets its guardrail. More to the point, the operator runs terraform plan the next morning and sees, in one diff, whether any of the four drifted overnight. The disabled policy-controller, the deleted alarm, the loosened guardrail — each shows as a line in the plan, and the apply that follows restores the reviewed state. The week's four lessons protected four faces of the service. Terraform is the one place all four are declared, reviewed, and watched for drift together.
§ V Practice Scenarios
The first scenario tests the drift discipline. An on-call engineer disables the Sigstore policy-controller during a 3 a.m. incident to get a hotfix image deployed, then forgets to re-enable it. Three days pass before anyone notices that unsigned images are being admitted freely. The exam point: what control would have caught this within a day? The answer is a scheduled terraform plan whose non-empty output is itself an alert (run with -detailed-exitcode, an exit code of 2 signals a non-empty diff); the disabled controller shows as drift against the declared Helm release and policy resource, and the daily plan surfaces it the next morning rather than three days later.
The second scenario tests provider composition. A team wants the Vertex monitoring job and the CloudWatch alarm that pages on the monitoring job's alert to land together, so there is never a window where the model is monitored but no one is paged. The exam point: how Terraform guarantees the ordering. The answer is that the alarm resource references the monitoring job's notification channel through a resource attribute, which creates an implicit dependency, so Terraform's graph applies the channel and job before the alarm and tears them down in reverse.
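The two kinds of dependency edge in this scenario can be sketched as follows. This is an illustration under assumptions, not the lesson's exact wiring: the Pub/Sub topic, the channel, and the CloudWatch metric that is assumed to receive forwarded monitoring alerts are all hypothetical names.

```hcl
resource "google_pubsub_topic" "model_alerts" {
  name = "model-monitor-alerts"
}

# Implicit edge: referencing the topic's id makes Terraform create the topic first.
resource "google_monitoring_notification_channel" "model_alerts" {
  display_name = "Model monitor alerts"
  type         = "pubsub"
  labels = {
    topic = google_pubsub_topic.model_alerts.id
  }
}

# Explicit edge across providers: the alarm uses no attribute of the channel,
# so depends_on supplies the ordering the graph would otherwise lack.
resource "aws_cloudwatch_metric_alarm" "model_monitor_page" {
  alarm_name          = "model-monitor-alert"
  namespace           = "AgentService"       # hypothetical namespace
  metric_name         = "ModelMonitorAlerts" # hypothetical forwarded-alert count
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1

  depends_on = [google_monitoring_notification_channel.model_alerts]
}
```

On destroy, Terraform walks the same graph in reverse, so the alarm is removed before the channel and the channel before the topic.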
The third scenario tests state hygiene. Two engineers run terraform apply against the same configuration from two laptops within a minute of each other, and the second apply corrupts the first's half-finished change. The exam point: what prevents this? The answer is a remote backend with state locking: the first apply takes the lock, the second fails to acquire it (or waits, if -lock-timeout is set), and neither overwrites the other's state. The Pro exam expects the candidate to name the backend's lock as the mechanism, not merely to say "use a remote backend."
The fourth scenario tests the module-versioning discipline. A security review finds the Bedrock guardrail module's denied-topics list is missing a category, and the fix must reach eleven services that instantiate the module. The exam point: how does the fix roll out safely? The answer is to fix the denied-topics list in the module source, publish the fix as a new module version, and have each service's configuration adopt it by raising its pinned version, so the rollout is a reviewed version bump per service rather than eleven hand-edits with no audit trail.
The fifth scenario ties the week to today's Ops lesson. A trading service runs a runtime risk watch that flattens on a stale feed, and the operator wants the watch's own staleness alerts to flow into the same observability layer as everything else. The exam point: where the watch's alerts belong in the IaC picture. The answer is that the watch emits its runtime events as CloudWatch embedded metrics, and a Terraform-declared alarm watches the staleness-trip metric the same way Tuesday's lesson declared every other alarm — so the kill-switch the Ops lesson built is observable through the same provisioned surface as the model monitor and the cost watch. Observe-and-enforce is one posture, and Terraform declares all of its faces.
§ VI Practice Questions
A terraform plan run on a schedule returns a non-empty diff showing a kubernetes_manifest ClusterImagePolicy resource will be re-created. What has most likely happened, and why is the scheduled plan the right detector?
In a configuration that uses the google, aws, and kubernetes providers, how does Terraform decide the order in which resources across different providers are created?
depends_on adds an explicit edge where no attribute reference exists. The graph is provider-agnostic, so cross-provider ordering is handled the same as same-provider ordering.
An aws_cloudwatch_metric_alarm watches the staleness-trip metric and pages on it. Declaring the alarm in Terraform puts the kill-switch's own health under the same drift-detected, reviewed, versioned configuration as the model monitor, the cost watch, the admission policy, and the guardrail, so a deleted or muted kill-switch alarm shows as drift in the daily plan, the same as any other regression to the observe-and-enforce layer.
§ VII Closing
The week studied four vendors and found one discipline. Measure what the workload does; refuse what the operator forbade. Vertex measures the model, CloudWatch measures the service, Sigstore refuses the unsigned image, Bedrock refuses the forbidden tool call. Eight surfaces, one posture, and every one of them a piece of configuration rather than a piece of code. Terraform is where that configuration lives as a reviewed, versioned, drift-checked declaration across every cloud it touches.
A guardrail clicked into a console is a guardrail no one can prove is still there. A guardrail declared in Terraform is a guardrail whose absence is a line in tomorrow's plan. The Friday synthesis is the same every week: the tool that provisions the week is the tool that proves, each morning, that the week's protections are still standing. Provision the observe-and-enforce layer as code, plan it on a schedule, and read the diff. The diff is the week's protections reporting for duty.
Examine well. Reflect on this.