Vertex AI Continuous Training
Pipelines, Model Registry, and drift-triggered retraining across ML Pro and GenAI Engineer.
§ I Frame
Last Monday's Google lesson named the layer that watches a deployed model: Vertex AI Model Monitoring for skew and drift, the Gen AI evaluation service for generative quality. Both answer the same question from two ends — has the model degraded against the data it now sees. Monitoring raises the alarm. This lesson answers it.
The Pro ML Engineer exam and the GenAI Engineer exam both test continuous training, and they test it on the same Vertex platform with different model classes. The Pro ML candidate must know how a Vertex Pipeline retrains a classical model when monitoring detects drift, registers the candidate in the Model Registry, and promotes it behind an endpoint alias. The GenAI Engineer candidate must know the parallel loop for a tuned Gemini model or a RAG corpus: when a generative system gets re-tuned or re-indexed, and how the refreshed version is promoted without a blind cutover.
This lesson treats the Vertex retraining loop once, then pivots between the classical-ML flavor and the generative flavor. The platform is shared; the trigger, the pipeline, the registry, and the promotion alias are the same primitives applied to different model classes.
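§ II Domain Foundations (shared Vertex platform)
Four Vertex components carry continuous training, and both certs lean on all four: Pipelines orchestrate the retraining run, the Model Registry versions the candidates, Model Monitoring supplies the trigger, and Endpoints promote the version that serves.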
The loop: monitoring detects drift, a trigger starts a pipeline, the pipeline trains and evaluates a candidate, a conditional step registers it only if it clears the bar, and an alias or traffic split promotes it. The four components compose into the continuous-training loop both exams test.
§ III Cert-A Flavor: Pro ML Engineer
The Pro ML exam treats continuous training as a core MLOps competency, and it expects the candidate to choose the right trigger and the right pipeline structure.
On triggers, the exam distinguishes three patterns and expects the candidate to match them to scenarios. Scheduled retraining runs the pipeline on a fixed cadence through Cloud Scheduler; correct when data changes steadily and predictably. Event-driven retraining fires when new data lands, often through a Cloud Storage or Pub/Sub trigger into Cloud Functions; correct when fresh data arrives in batches. Monitoring-driven retraining fires when Model Monitoring raises a drift alert through Pub/Sub; correct when the model should retrain in response to measured degradation rather than on a clock. The exam's preferred answer for a model facing unpredictable real-world drift is the monitoring-driven trigger, the same conclusion today's Ops lesson reached.
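The monitoring-driven trigger can be sketched as the decision a Cloud Function makes when a Pub/Sub message arrives. This is a minimal sketch in plain Python, not the Vertex SDK: the payload fields `feature` and `drift_score`, the 0.3 threshold, and both helper names are assumptions for illustration, and a real Model Monitoring alert carries its own schema. The one grounded detail is that Pub/Sub delivers the message body base64-encoded under `data`.

```python
import base64
import json


def make_event(payload: dict) -> dict:
    """Build a Pub/Sub-style event whose body is base64-encoded JSON."""
    body = json.dumps(payload).encode("utf-8")
    return {"data": base64.b64encode(body).decode("ascii")}


def should_retrain(event: dict, threshold: float = 0.3) -> bool:
    """Decode the alert and decide whether to start the retraining pipeline.

    The field name "drift_score" is hypothetical, not the real alert schema.
    """
    alert = json.loads(base64.b64decode(event["data"]))
    return alert["drift_score"] > threshold


if __name__ == "__main__":
    drifted = make_event({"feature": "price", "drift_score": 0.42})
    steady = make_event({"feature": "price", "drift_score": 0.05})
    print(should_retrain(drifted), should_retrain(steady))
```

In a deployed function the `True` branch would submit the pipeline run; the sketch stops at the decision so the trigger logic stays testable on its own.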
On pipeline structure, the candidate must know the conditional-registration pattern. The pipeline trains a candidate, runs an evaluation step that compares the candidate against the current champion, and uses a conditional step (dsl.If in the Kubeflow SDK) so that registration happens only when the candidate's metric clears a threshold. A pipeline that registers unconditionally promotes regressions; the conditional step is the gate, expressed in the DAG.
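The gate itself is small enough to model in plain Python. The sketch below is not KFP code; it only mirrors what the conditional step decides, assuming RMSE as the metric (lower is better) and a hypothetical `min_improvement` margin. The names `EvalResult`, `clears_bar`, and `run_gate` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EvalResult:
    candidate_rmse: float
    champion_rmse: float


def clears_bar(result: EvalResult, min_improvement: float = 0.0) -> bool:
    """True when the candidate's RMSE beats the champion's by more than the margin."""
    return result.champion_rmse - result.candidate_rmse > min_improvement


def run_gate(result: EvalResult, register: Callable[[], int]) -> Optional[int]:
    """Call register() only when the candidate clears the bar; otherwise do nothing."""
    if clears_bar(result):
        return register()
    return None


if __name__ == "__main__":
    better = EvalResult(candidate_rmse=4.1, champion_rmse=4.6)
    worse = EvalResult(candidate_rmse=5.0, champion_rmse=4.6)
    print(run_gate(better, lambda: 8))  # registration runs
    print(run_gate(worse, lambda: 8))   # registration is skipped
```

An unconditional pipeline is the same code with the `if` removed, which is why a regression would register just as readily as an improvement.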
On the registry, the exam expects the candidate to know that model versions carry evaluation metadata, that aliases decouple the served version from the endpoint, and that rolling back is moving the alias to the prior version. The candidate should also know managed datasets and feature stores feed the training step so the retraining run reads the same feature definitions the serving path uses, which is how Vertex prevents training-serving skew at the source.
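The alias mechanics can be illustrated with an in-memory stand-in for a registry. This is a toy model under stated assumptions, not the Vertex AI Model Registry API: the class `ToyRegistry`, its methods, and the alias name `champion` are all hypothetical. It shows only that versions keep their evaluation metadata and that rollback is an alias move.

```python
class ToyRegistry:
    """In-memory stand-in: versions carry metrics, aliases point at versions."""

    def __init__(self) -> None:
        self._versions: dict[int, dict] = {}
        self._aliases: dict[str, int] = {}

    def register(self, metrics: dict) -> int:
        """Store a new version with its evaluation metadata; return its number."""
        version = len(self._versions) + 1
        self._versions[version] = dict(metrics)
        return version

    def set_alias(self, alias: str, version: int) -> None:
        """Point an alias at an existing version; unknown versions are rejected."""
        if version not in self._versions:
            raise KeyError(f"unknown version {version}")
        self._aliases[alias] = version

    def resolve(self, alias: str) -> int:
        return self._aliases[alias]

    def metrics(self, version: int) -> dict:
        return dict(self._versions[version])


if __name__ == "__main__":
    registry = ToyRegistry()
    v1 = registry.register({"rmse": 4.6})
    v2 = registry.register({"rmse": 4.1})
    registry.set_alias("champion", v2)   # promote
    registry.set_alias("champion", v1)   # roll back: same call, prior version
    print(registry.resolve("champion"))
```

Nothing about version 2 is deleted by the rollback; it stays registered with its metrics and can be re-promoted by moving the alias again.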
§ IV Cert-B Flavor: GenAI Engineer
The GenAI Engineer exam asks the same loop for generative systems, where the model class is a tuned Gemini model or a RAG corpus rather than a classical regressor.
For a tuned model, retraining is re-tuning. Supervised fine-tuning on Vertex produces a tuned Gemini endpoint; when the Gen AI evaluation service shows the tuned model's quality sliding on the canary prompts, the loop re-runs the tuning job on a refreshed example set and registers the new tuned model as a version. The exam expects the candidate to know that tuning jobs are reproducible artifacts, that the tuned model lands in the registry like any model, and that promotion is the same alias move.
For a RAG system, the more common degradation is not the model at all — it is the corpus. The documents the system retrieves from go stale; the embedding index drifts from the current document set; new documents never get indexed. The GenAI Engineer loop here is corpus refresh: re-embed and re-index on a schedule or on a document-change event, and evaluate retrieval quality before the refreshed index serves. Vertex AI Search and the RAG Engine corpus management are the tools; the discipline is identical to model retraining — refresh, evaluate, promote only on clearing the bar.
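One way to picture the refresh step is as a diff between what the index holds and what the document store holds now. The sketch below is an assumption-laden illustration in plain Python, not Vertex AI Search or RAG Engine code: it assumes the index stored a content fingerprint per document at indexing time, and the names `fingerprint` and `plan_refresh` are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect that a document changed since indexing."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_refresh(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare indexed fingerprints (doc_id -> hash) with current texts (doc_id -> text).

    Returns which documents to embed for the first time, re-embed, or drop.
    """
    add = sorted(d for d in current if d not in indexed)
    update = sorted(
        d for d in current if d in indexed and indexed[d] != fingerprint(current[d])
    )
    delete = sorted(d for d in indexed if d not in current)
    return {"add": add, "update": update, "delete": delete}


if __name__ == "__main__":
    indexed = {
        "returns-old": fingerprint("Returns within 30 days."),
        "suppliers": fingerprint("Supplier list A."),
    }
    current = {
        "returns-new": "Returns within 14 days.",
        "suppliers": "Supplier list A, B.",
    }
    print(plan_refresh(indexed, current))
```

The plan is only the refresh half; the refreshed index still has to pass a retrieval-quality evaluation before it serves.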
The exam also tests the evaluation half. Generative quality is scored with the Gen AI evaluation service: pointwise metrics, pairwise comparison against the prior version, and model-based judges with rubrics. A re-tuned Gemini model promotes only after a pairwise eval shows it matches or beats the champion on the metrics that matter. This is the shadow-eval discipline from today's Ops lesson, expressed in the generative idiom.
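The "matches or beats" rule can be sketched as a tally over pairwise verdicts. This is a plain-Python illustration, not the Gen AI evaluation service API: it assumes each comparison has already been reduced to one of three hypothetical labels (`candidate`, `champion`, `tie`), and the function name is made up for the example.

```python
from collections import Counter

ALLOWED = {"candidate", "champion", "tie"}


def matches_or_beats(verdicts: list[str]) -> bool:
    """Promote only if the candidate wins at least as many pairs as the champion."""
    if not verdicts:
        raise ValueError("no pairwise verdicts to judge")
    counts = Counter(verdicts)
    unknown = set(counts) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    return counts["candidate"] >= counts["champion"]


if __name__ == "__main__":
    print(matches_or_beats(["candidate", "tie", "champion", "candidate"]))
    print(matches_or_beats(["champion", "champion", "candidate"]))
```

Refusing an empty verdict list matters: without that check, a run that evaluated nothing would count as a match and promote.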
§ V Worked Example
A retail demand-forecasting system runs two models on Vertex. A classical gradient-boosted regressor predicts next-week unit demand per SKU (the Pro ML side). A Gemini-powered assistant answers merchandiser questions over a corpus of supplier and policy documents using RAG (the GenAI side).
Model Monitoring on the regressor's endpoint reports feature drift: a new supplier's SKUs shifted the price and lead-time distributions beyond the configured threshold. The alert publishes to Pub/Sub. A Cloud Function consumes it and starts the retraining pipeline. The pipeline ingests the last ninety days from the managed dataset, runs a validation step comparing feature distributions against the training baseline, trains a candidate regressor, and runs an evaluation step. The candidate's RMSE on the held-out recent window beats the champion's; the dsl.If conditional fires; the candidate registers as version 8. An endpoint traffic split sends it ten percent of prediction requests while monitoring watches. The candidate holds; the alias moves to version 8; the champion steps down to standby for rollback.
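The ten-percent canary in this walkthrough amounts to a traffic map whose percentages sum to 100. The sketch below builds such a map in plain Python; it does not call the Vertex SDK, and the keys `champion` and `candidate` stand in for deployed-model IDs that the example does not give.

```python
def canary_split(champion_id: str, candidate_id: str, candidate_percent: int) -> dict[str, int]:
    """Return a traffic map sending candidate_percent of requests to the candidate."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    return {champion_id: 100 - candidate_percent, candidate_id: candidate_percent}


def promote(champion_id: str, candidate_id: str) -> dict[str, int]:
    """After the canary holds, route all traffic to the candidate."""
    return canary_split(champion_id, candidate_id, 100)


if __name__ == "__main__":
    print(canary_split("champion", "candidate", 10))
    print(promote("champion", "candidate"))
```

Keeping the former champion in the map at zero percent, rather than removing it, reflects the standby-for-rollback step: rolling back is just restoring its share.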
Meanwhile the RAG assistant degrades differently. Merchandisers report it citing a discontinued return policy. Nothing is wrong with Gemini; the corpus holds a superseded document and the new policy was never indexed. The corpus-refresh pipeline re-embeds the document set, rebuilds the index, and runs a retrieval-quality eval against a fixed question set with known-correct source documents. The refreshed index retrieves the current policy; the Gen AI evaluation service confirms answer quality improved on the policy questions and held elsewhere; the refreshed corpus promotes.
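The retrieval-quality eval against a fixed question set can be sketched as a hit rate: for each question, did the known-correct source document appear in the top results? The code is a plain-Python illustration with made-up question and document IDs; the metric name `hit_rate_at_k` and the cutoff `k` are assumptions, not the evaluation service's own metrics.

```python
def hit_rate_at_k(
    retrieved: dict[str, list[str]], expected: dict[str, str], k: int = 3
) -> float:
    """Fraction of questions whose known-correct document is in the top k results."""
    if not expected:
        raise ValueError("the fixed question set is empty")
    hits = sum(
        1 for question, doc_id in expected.items()
        if doc_id in retrieved.get(question, [])[:k]
    )
    return hits / len(expected)


if __name__ == "__main__":
    expected = {"return-window": "policy-current", "lead-time": "supplier-a"}
    stale = {"return-window": ["policy-superseded"], "lead-time": ["supplier-a"]}
    fresh = {
        "return-window": ["policy-current", "policy-superseded"],
        "lead-time": ["supplier-a"],
    }
    print(hit_rate_at_k(stale, expected), hit_rate_at_k(fresh, expected))
```

A refreshed index would promote only if its score on the fixed set is at least the stale index's score, which is the corpus-side version of promote-on-clearing-the-bar.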
One platform, two model classes, one loop. The Pro ML side retrained a regressor on drift; the GenAI side refreshed a corpus on a stale-document complaint. Both went through detect, retrain-or-refresh, evaluate, and promote-on-clearing-the-bar.
§ VI Connection to Today's Ops + Dev Lessons
The Ops lesson named the continuous-training loop as four parts — trigger, assembly, training, gate — and stressed that the loop's output is a candidate the gate judges, never a model that serves directly. Vertex enforces exactly that shape: the pipeline's conditional-registration step is the gate, and a registered candidate still does not serve until an alias or traffic split promotes it. The Ops lesson's shadow-eval discipline is Vertex's evaluation step plus traffic-split canary.
The Python Dev lesson encoded the four parts as typed stages whose composition the checker verifies. A Vertex Pipeline is the same chain in a different medium: each pipeline component declares typed inputs and outputs, and the compiler refuses to wire a component whose output artifact type does not match the next component's input. The KFP component's type annotations do for the pipeline DAG what Protocol and generics did for the Python composition. The trio holds one idea across three media: a retraining pipeline is a typed chain whose gate, not whose training step, decides what serves.
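The wiring check described here can be imitated in a few lines. This is a toy model of the idea, not the KFP compiler: stages declare an input and an output artifact type as plain strings, and the checker rejects any adjacent pair that does not line up. The stage names and the `DriftAlert` type are hypothetical; `Dataset`, `Model`, and `Metrics` are used only as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    input_type: str
    output_type: str


def validate_chain(stages: list[Stage]) -> None:
    """Raise TypeError if any stage's output type differs from the next stage's input."""
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.output_type != downstream.input_type:
            raise TypeError(
                f"{upstream.name} outputs {upstream.output_type}, "
                f"but {downstream.name} expects {downstream.input_type}"
            )


if __name__ == "__main__":
    chain = [
        Stage("assembly", "DriftAlert", "Dataset"),
        Stage("training", "Dataset", "Model"),
        Stage("gate", "Model", "Metrics"),
    ]
    validate_chain(chain)  # passes silently
    try:
        validate_chain([chain[0], chain[2]])  # Dataset wired into a Model input
    except TypeError as err:
        print(err)
```

The check runs before anything trains, which is the point of typing the chain: a miswired pipeline fails at assembly time rather than partway through a retraining run.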
§ VII Practice Questions
dsl.If) on that comparison. Unconditional registration would promote regressions; the conditional step is the promotion gate expressed in the DAG.
§ VIII Closing
Vertex's continuous-training loop is four components doing one job: Pipelines orchestrate, the Registry versions, Monitoring triggers, and Endpoints promote. The Pro ML candidate retrains a classical model on drift; the GenAI candidate re-tunes a model or refreshes a corpus on degradation. Same platform, same loop, different model class.
The exam rewards the candidate who sees the loop, not the candidate who memorizes one service. Monitoring noticed; the pipeline acted; the conditional step gated; the alias promoted.
Examine the loop well. The version serving on Vertex tomorrow is the one whose conditional step let it into the registry today.
Fajr 2026-06-08 — Cert-prep lesson; Google GenAI Engineer + Pro ML Engineer; closes the monitor → retrain loop opened 2026-06-01.