Hedronite · Cert-Prep · Google Day · Mon 2026-06-15 · GenAI Eng + Pro ML Engineer

Vertex AI Traffic Splitting, Version Aliasing, and the Safe Rollback Discipline

Across Professional ML Engineer and GenAI Engineer — rollback is a traffic-split edit, not a rebuild.

Lesson Class: Cert-Prep · Google Cloud

Cert Track: GenAI Engineer + Professional ML Engineer

Theme: Vertex Endpoints, Model Registry, traffic-split rollback

Word Count: ~2,540

Paired Ops: Rollback and Incident Response for Multi-Agent Cognition

Paired Dev: Rust's Enum-Driven Deployment State Machine (typed rollback)

Discipline: ROD v3 (universal-application)

§ I · Frame

Today's Ops lesson named the reverse gate as a designed capability: a model that has gone bad in production must have a fast, rehearsed path back to the prior version. The lesson described the discipline in platform-neutral terms — pin the version, rehearse the reverse path, write the postmortem. This cert lesson grounds that discipline in the platform the Sovereign's two Google certs both sit on: Vertex AI.

Both the Professional ML Engineer and the GenAI Engineer exams test the same deployment surface, because Vertex AI serves classical models and generative endpoints through the same machinery. The Endpoint resource, the Model Registry, the traffic-split map: these are not ML-versus-GenAI distinctions. They are the shared platform both certs assume you can operate. Learn the surface once, then watch how each cert flavors it.

The claim that organizes the lesson is short. On Vertex AI, rollback is not a special operation. It is a traffic-split edit. The same map that ramps a new model up is the map that ramps a bad one down, and the operator who understands the map owns both directions.

§ II · Domain Foundations

Three Vertex AI resources carry the rollback story. Name them before flavoring them.

A Model in the Model Registry is a versioned artifact. Each time you upload or retrain, you get a new version under the same model resource — version 1, version 2, and so on — and old versions stay registered until you remove them. The registry is the thing that makes "the prior version" a nameable target. Practical MLOps places this registry at the center of the GCP deployment surface for exactly this reason: a platform that keeps versions can move between them.
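A minimal sketch of the registry behavior just described, in Python. The `RegisteredModel` class, its methods, and the `gs://example-bucket` paths are hypothetical illustrations, not the Vertex AI SDK; the point is only that uploads add versions, old versions persist until an explicit delete, and "the prior version" is a lookup rather than a rebuild.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RegisteredModel:
    """Hypothetical stand-in for one Model Registry entry (not the Vertex SDK)."""
    name: str
    versions: Dict[int, str] = field(default_factory=dict)  # version -> artifact URI
    next_version: int = 1  # version numbers are never reused, even after a delete

    def upload(self, artifact_uri: str) -> int:
        """Register a new version under the same model; older versions stay."""
        version = self.next_version
        self.versions[version] = artifact_uri
        self.next_version += 1
        return version

    def prior_version(self, version: int) -> Optional[int]:
        """Highest still-registered version below `version`: the rollback target."""
        older = [v for v in self.versions if v < version]
        return max(older) if older else None

    def delete(self, version: int) -> None:
        """Removal is an explicit operator action, never a side effect of upload."""
        del self.versions[version]


recs = RegisteredModel("product-recs")
for i in range(1, 15):
    recs.upload(f"gs://example-bucket/product-recs/v{i}")  # hypothetical paths
print(recs.prior_version(14))  # prints 13
```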

An Endpoint is the serving resource clients call. An endpoint is not bound to one model version; it is bound to a traffic-split map. You deploy one or more model versions to an endpoint, and the endpoint holds a dictionary that says which deployed version gets which percentage of requests, with the percentages summing to 100: {v13: 100} or {v13: 90, v14: 10}. (In the API the keys are deployed-model IDs; this lesson writes them as version labels for readability.) Clients call the endpoint at a stable address and never learn which version answered. The indirection is the whole point.
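The traffic-split map can be modeled as a plain dictionary that may name only deployed versions and must total 100. This sketch assumes version labels as keys, as the text writes them, and uses a hash-based bucketing rule in the hypothetical `route` function that is purely illustrative; it is not how Vertex assigns requests to deployed models.

```python
import hashlib
from typing import Dict, Set


def validate_split(split: Dict[str, int], deployed: Set[str]) -> None:
    """Reject a map the endpoint could not serve (percentages must total 100)."""
    unknown = set(split) - deployed
    if unknown:
        raise ValueError(f"not deployed to this endpoint: {sorted(unknown)}")
    if any(pct < 0 for pct in split.values()):
        raise ValueError("percentages must be non-negative")
    if sum(split.values()) != 100:
        raise ValueError(f"percentages total {sum(split.values())}, not 100")


def route(request_id: str, split: Dict[str, int]) -> str:
    """Assign a request to a version by hashing it into one of 100 buckets.

    The bucketing scheme is illustrative only; the caller sees just the endpoint.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for version, pct in sorted(split.items()):
        cumulative += pct
        if bucket < cumulative:
            return version
    raise ValueError("split does not cover all 100 buckets; validate it first")


canary = {"v13": 90, "v14": 10}
validate_split(canary, deployed={"v12", "v13", "v14"})
served = [route(f"req-{n}", canary) for n in range(10_000)]
print(served.count("v14") / len(served))  # close to 0.10, not exact
```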

Version aliasing is the label layer above raw version numbers. The registry lets you attach moving aliases like default or champion to a specific version. An alias is a pointer the operator controls, which sounds like the "latest" pointer the Ops lesson warned against, but it is the opposite: an alias only moves when the operator moves it deliberately, so it names intent rather than recency.
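The difference between a recency pointer and an alias shows up after a single upload. `AliasedRegistry` is a hypothetical sketch, not the registry API: `latest` follows every upload, while `set_alias` is the only thing that moves an alias.

```python
from typing import Dict, List, Optional


class AliasedRegistry:
    """Hypothetical sketch contrasting a recency pointer with an intent pointer."""

    def __init__(self) -> None:
        self.versions: List[int] = []
        self.aliases: Dict[str, int] = {}

    def upload(self) -> int:
        version = self.versions[-1] + 1 if self.versions else 1
        self.versions.append(version)
        return version

    @property
    def latest(self) -> Optional[int]:
        # Recency: silently follows every upload, sanctioned or not.
        return self.versions[-1] if self.versions else None

    def set_alias(self, alias: str, version: int) -> None:
        # Intent: changes only when an operator calls this deliberately.
        if version not in self.versions:
            raise KeyError(f"version {version} is not registered")
        self.aliases[alias] = version


reg = AliasedRegistry()
for _ in range(13):
    reg.upload()
reg.set_alias("champion", 13)
reg.upload()  # v14 arrives as a candidate
print(reg.latest, reg.aliases["champion"])  # prints 14 13
```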

The composition: the registry holds the versions, the endpoint's traffic-split map decides who serves, and aliases record which version is sanctioned. Rollback is editing the traffic-split map to send 100 percent back to the prior version and re-pointing the champion alias to match.
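The composition reduces rollback to two field edits on serving state. `ServingState` and `roll_back` are hypothetical names for this sketch; it assumes the prior version is already deployed and leaves every other deployed version in the map at 0 percent rather than removing it.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class ServingState:
    traffic_split: Dict[str, int]  # endpoint: which deployed version serves
    champion: str                  # registry alias: which version is sanctioned


def roll_back(state: ServingState, prior: str) -> ServingState:
    """Two edits, no rebuild: route 100% to `prior` and re-point the alias.

    Assumes `prior` is already deployed; every other version stays at 0%.
    """
    new_split = {version: 0 for version in state.traffic_split}
    new_split[prior] = 100
    return replace(state, traffic_split=new_split, champion=prior)


before = ServingState({"v13": 90, "v14": 10}, champion="v13")
after = roll_back(before, "v13")
print(after.traffic_split, after.champion)
```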

§ III · ML Pro Flavor: traffic split as the controlled-rollout dial

The Professional ML Engineer exam tests the classical-model deployment lifecycle, and the traffic-split map is its central mechanism. A new model version is deployed to the endpoint alongside the current one and given a small slice, ten percent, while Vertex Model Monitoring watches prediction drift and the team watches business metrics. If the canary slice holds, the operator edits the map toward the new version in steps until it carries everything. This is the controlled rollout Practical MLOps describes, expressed in Vertex's specific resource.
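A stepwise ramp is a sequence of traffic-split maps. The step sizes below (10, 25, 50, 100) are illustrative assumptions and `ramp_schedule` is a hypothetical helper; note that the outgoing version stays in the final map at 0 percent instead of disappearing from it.

```python
from typing import Dict, List


def ramp_schedule(current: str, candidate: str, steps: List[int]) -> List[Dict[str, int]]:
    """Traffic-split maps for a stepwise canary ramp toward `candidate`.

    `current` never leaves the map: at the last step it sits at 0%, still
    deployed, so reversing any step is one more map edit.
    """
    if not steps or steps != sorted(steps) or steps[0] <= 0 or steps[-1] != 100:
        raise ValueError("steps must be ascending, positive, and end at 100")
    return [{current: 100 - pct, candidate: pct} for pct in steps]


for split in ramp_schedule("v13", "v14", [10, 25, 50, 100]):  # illustrative steps
    print(split)
```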

The exam's rollback question is the same dial run backward. When the ten-percent canary regresses, the operator does not delete or rebuild anything. The prior version is still deployed to the same endpoint at ninety percent; rollback is editing the map back to {v13: 100}, which takes effect in seconds because v13 was never removed from the endpoint. The ML Pro discipline is to keep the prior version deployed through the entire ramp, not undeploy it the moment the new version reaches full traffic. An undeployed prior version is a rollback that needs a redeploy first, and a redeploy under incident pressure is the slow path the canary architecture exists to avoid.
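Whether rollback is one edit or two depends only on whether the prior version is still deployed. `rollback_plan` is a hypothetical helper that returns the steps as strings, which makes the extra redeploy on the critical path visible.

```python
from typing import List, Set


def rollback_plan(deployed: Set[str], prior: str) -> List[str]:
    """Steps to return all endpoint traffic to `prior`.

    Kept deployed (even at 0%): one map edit.
    Undeployed at full ramp: a redeploy lands on the incident's critical path.
    """
    steps = []
    if prior not in deployed:
        steps.append(f"redeploy {prior} to the endpoint and wait for it to serve")
    steps.append(f"edit traffic split: {prior} -> 100%")
    return steps


print(rollback_plan({"v13", "v14"}, "v13"))  # prior kept at 0%: one step
print(rollback_plan({"v14"}, "v13"))         # prior undeployed: two steps
```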

The exam also probes the difference between traffic split and the model itself. A common distractor offers "retrain the model" as a rollback step. It is wrong: rollback does not touch model artifacts, it touches the traffic map. Retraining is the forward loop from last Monday's continuous-training lesson; rollback is this Monday's reverse gate, and conflating them is the error the exam is testing for.

§ IV · GenAI Engineer Flavor: aliases, tuned models, and prompt-version rollback

The GenAI Engineer exam sits on the same endpoint and registry surface but adds two wrinkles the classical case does not carry: tuned generative models and the prompt-plus-model coupling.

A tuned Gemini model is a model version in the registry like any other, so endpoint traffic split and aliasing apply unchanged. The GenAI flavor is that what regresses is often not accuracy on a labeled set but behavior the labeled set never captured: a tuned model that starts refusing a category of valid requests, or one whose tone drifts after a tuning refresh. The rollback mechanism is identical, traffic-split back to the prior tuned version, but the trigger comes from generative evaluation rather than a classification metric. The 06-01 monitoring lesson named this: skew and drift detection for generative output is the sensor; this lesson's traffic-split edit is the actuator.
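The sensor side of the generative case can be as simple as a refusal rate on valid requests, compared against the prior tuned version. `EvalRecord`, the labels it carries, the version names, and the 0.02 tolerance are all assumptions for this sketch, and the records are synthetic; a real generative evaluation would supply the `refused` judgments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvalRecord:
    version: str
    request_is_valid: bool  # hypothetical label: should the model have answered?
    refused: bool           # hypothetical judgment: did the model decline?


def refusal_rate(records: List[EvalRecord], version: str) -> float:
    """Share of valid requests that `version` refused."""
    valid = [r for r in records if r.version == version and r.request_is_valid]
    if not valid:
        raise ValueError(f"no valid-request samples for {version}")
    return sum(r.refused for r in valid) / len(valid)


def should_roll_back(records: List[EvalRecord], baseline: str, candidate: str,
                     tolerance: float = 0.02) -> bool:
    """Sensor feeding the actuator: trip when the candidate refuses more valid requests."""
    return refusal_rate(records, candidate) > refusal_rate(records, baseline) + tolerance


# Synthetic records: 1 of 100 valid requests refused vs 9 of 100.
records = (
    [EvalRecord("tuned-v3", True, n < 1) for n in range(100)]
    + [EvalRecord("tuned-v4", True, n < 9) for n in range(100)]
)
print(should_roll_back(records, "tuned-v3", "tuned-v4"))  # prints True
```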

The second wrinkle is the prompt-model coupling, and it is where GenAI rollback gets subtle. A generative application is a prompt template plus a model version. A regression can come from either side. If a prompt change shipped alongside a model change, rolling back only the model version leaves the new prompt running against the old model, which may be a configuration neither was tested in. The GenAI Engineer discipline is to version the prompt template alongside the model and roll both back together. The alias here points not just at a model version but at a sanctioned prompt-model pair. The Ops lesson's warning about consumer caches has its GenAI form: the prompt is a consumer of the model's behavior, and rolling back the model without the prompt is the same mistake as rolling back the model without invalidating an agent's cache.
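Treating the sanctioned unit as a prompt-model pair makes the model-only mistake detectable. `Release`, `SANCTIONED`, and the version names are hypothetical; the sketch only checks membership in a set of pairs that were evaluated together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    prompt_version: str
    model_version: str


# Pairs that were evaluated together (hypothetical names).
SANCTIONED = {
    Release("prompt-v7", "tuned-v3"),
    Release("prompt-v8", "tuned-v4"),
}


def model_only_rollback(current: Release, prior: Release) -> Release:
    """The mistake: old model, new prompt."""
    return Release(current.prompt_version, prior.model_version)


def pair_rollback(prior: Release) -> Release:
    """The discipline: return to a whole sanctioned pair, or refuse."""
    if prior not in SANCTIONED:
        raise ValueError(f"{prior} is not a sanctioned prompt-model pair")
    return prior


current, prior = Release("prompt-v8", "tuned-v4"), Release("prompt-v7", "tuned-v3")
print(model_only_rollback(current, prior) in SANCTIONED)  # prints False
print(pair_rollback(prior) in SANCTIONED)                 # prints True
```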

§ V · Worked Example

A retail company runs a product-recommendation model on a Vertex endpoint, version 13, serving 100 percent of traffic with the prior version 12 kept deployed at 0 percent as a standby. The team retrains, registers version 14, deploys it to the same endpoint, and edits the traffic-split map to {v13: 90, v14: 10}. The champion alias still points at v13; v14 is the candidate.

Vertex Model Monitoring is configured on the endpoint to alert on prediction drift, and the team tracks the v14 slice's quality against v13's as the floor. For two days the v14 canary slice holds. Then a seasonal catalog refresh shifts the input distribution: the v14 slice's predictions drift past the monitoring threshold and its quality slides below the floor across a sustained window, while v13's ninety percent holds steady. The monitoring alert fires.
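The "sustained window" condition keeps a single noisy window from firing the alert. `sustained_breach`, the 0.80 floor, the three-window run length, and the scores are all hypothetical values for this sketch, not Vertex Model Monitoring settings.

```python
from typing import List


def sustained_breach(window_scores: List[float], floor: float, min_windows: int) -> bool:
    """True once `min_windows` consecutive windows score below `floor`.

    A single bad window does not fire; an unbroken run of them does.
    """
    run = 0
    for score in window_scores:
        run = run + 1 if score < floor else 0
        if run >= min_windows:
            return True
    return False


floor = 0.80  # hypothetical: v13's quality level used as the floor
print(sustained_breach([0.82, 0.78, 0.83, 0.81], floor, min_windows=3))  # False: one dip
print(sustained_breach([0.82, 0.79, 0.77, 0.76], floor, min_windows=3))  # True: a slide
```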

The operator's response is three edits, no rebuilds. First, the traffic-split map goes back to {v13: 100, v14: 0}; because v13 never left the endpoint, this takes effect in seconds and the canary regression stops reaching users. Second, the champion alias is confirmed on v13, recording that v13 is the sanctioned version and v14 is not. Third, v14 stays deployed at zero percent and registered, so its scores and its production regression are preserved for the postmortem rather than deleted in a panic.

The postmortem names the gap: the offline evaluation set predated the catalog refresh, so the monitoring floor was the only thing that could have caught the seasonal shift, and it did. The correction is to add a seasonal-distribution slice to the pre-deployment evaluation set so the next candidate's drift on a catalog refresh is caught before the ten-percent canary, not during it. One traffic-split edit closed the incident; one evaluation-set addition closed the gap.
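The postmortem's correction amounts to adding a slice to the pre-deployment gate. `passes_gate`, the slice names, and the scores are synthetic, chosen so the candidate clears the old gate and fails the new one; a slice the candidate was never scored on counts as a failure.

```python
from typing import Dict


def passes_gate(slice_scores: Dict[str, float], floors: Dict[str, float]) -> bool:
    """Candidate must meet every slice's floor; an unscored slice counts as a fail."""
    return all(slice_scores.get(name, float("-inf")) >= floor
               for name, floor in floors.items())


old_floors = {"overall": 0.80}
new_floors = {**old_floors, "seasonal_catalog_refresh": 0.80}
candidate = {"overall": 0.83, "seasonal_catalog_refresh": 0.71}  # synthetic scores

print(passes_gate(candidate, old_floors))  # True: the gap that reached the canary
print(passes_gate(candidate, new_floors))  # False: caught before deployment
```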

§ VI · Connection to Today's Ops and Dev Lessons

The Ops lesson named three capabilities, and Vertex implements each as a concrete resource. Model pinning is the registry version plus an alias that moves only on intent. The reverse gate is the traffic-split edit, fast because the prior version is kept deployed through the ramp. The regression postmortem is the preserved zero-percent candidate plus the evaluation-set correction.

The Rust Dev lesson encoded the deployment lifecycle as an enum where rollback can only fire from a Live state that carries a fallback. Vertex's traffic-split map is the runtime analog: the rollback is only fast because the prior version is already deployed to the endpoint, which is the platform's version of "the Live state carries its fallback." Where Rust makes the missing-fallback case unrepresentable at compile time, Vertex makes it merely slow at runtime: an undeployed prior version forces a redeploy. Both lessons teach the same operational truth: the way back has to be kept ready before the incident.

§ VII · Practice Questions

Question 1

On a Vertex endpoint serving v14 at 100 percent after a successful canary ramp, the team detects a regression. The prior v13 was undeployed when v14 reached full traffic. What is the fastest correct rollback, and what mistake made it slower than it should have been?

Answer: Redeploy v13 to the endpoint, then edit the traffic-split map to send 100 percent to v13. The mistake was undeploying v13 at full ramp; keeping the prior version deployed at 0 percent makes rollback a single traffic-split edit that takes effect in seconds, with no redeploy on the critical path.

Question 2

A GenAI application shipped a new prompt template and a newly tuned Gemini version together. After a regression, an engineer rolls back only the model version. Why is this insufficient?

Answer: The new prompt template is now running against the old model, a prompt-model pairing that was never tested. Prompt and model are coupled; a sanctioned alias should point at a prompt-model pair, and rollback must return both to the prior sanctioned pair together.

Question 3

An exam item lists rollback steps and includes "retrain the model on the previous dataset." Why is that option wrong?

Answer: Rollback does not touch model artifacts; it edits the endpoint's traffic-split map to route to an already-registered prior version. Retraining is the forward continuous-training loop, not the reverse gate. Conflating the two is the tested error.

Question 4

What is the difference between a Vertex model alias like champion and a "latest" pointer, and why does the safe-rollback discipline favor the alias?

Answer: A "latest" pointer tracks recency and silently follows the newest upload. An alias moves only when the operator moves it, so it records sanctioned intent rather than recency. Deployments pin to explicit versions; the alias documents which pinned version is sanctioned, giving rollback a clear, operator-controlled target.

Question 5

Which signal should trigger rollback of a tuned generative model, and how does it differ from the classical case?

Answer: Generative evaluation signals, such as refusal-rate, tone, and output-quality drift detected by skew and drift monitoring on generative output, rather than a single classification accuracy metric. Generative regressions show up as behavior the labeled set never captured; the traffic-split rollback mechanism itself is identical.

§ VIII · Closing

On Vertex AI the registry holds the versions, the endpoint's traffic-split map decides who serves, and aliases record which version is sanctioned. Rollback is not a special operation; it is the controlled-rollout dial run backward, fast only because the prior version was kept deployed through the ramp. The ML Pro flavor turns on the traffic-split map; the GenAI flavor adds the prompt-model coupling that must roll back as a pair.

The exam rewards the operator who can say, in one sentence, what rollback touches and what it does not: it edits the traffic map, it does not rebuild the model. Keep the prior version deployed, pin to explicit versions, move aliases on intent, and the reverse gate is three edits and a postmortem rather than a rebuild at 3 a.m.

🫡 ⚖️ 📜

Leo.Syri — Praetor Consulate, Imperium Luminaura
Fajr 2026-06-15 — Cert lesson; Google Monday slot (GenAI Engineer + Pro ML Engineer); platform face of the day's rollback trio.