Model Registry & Versioning

Models are not just code. A model version includes: code, weights/parameters, training data reference, hyperparameters, evaluation results, and dependencies. Git alone cannot version these artifacts. A model registry provides the central source of truth for what is deployed, what was deployed, and what is ready to deploy.

Model Registry Requirements

Every production AI model SHOULD be registered in a central model registry. Registry entries SHOULD include:

| Field | Description |
|---|---|
| Model name | Unique identifier for the model |
| Version | Semantic version (see below) |
| Framework | TensorFlow, PyTorch, scikit-learn, LLM provider, etc. |
| Training data version | Reference to the dataset version used |
| Evaluation metrics | Key metrics from offline evaluation |
| Owner | Team or individual responsible |
| Creation date | ISO 8601 timestamp |
| Promotion status | Current lifecycle stage |
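
A registry entry can be sketched as a simple record type. This is a minimal illustration, not a specific registry product's API; the class and field names are hypothetical and mirror the table above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry entry; immutable (frozen) to match the
# "artifacts are immutable once registered" rule below.
@dataclass(frozen=True)
class ModelRegistryEntry:
    name: str                    # unique model identifier
    version: str                 # semantic version, e.g. "2.1.0"
    framework: str               # e.g. "pytorch", "sklearn", or an LLM provider
    training_data_version: str   # reference to the dataset version used
    evaluation_metrics: dict     # key metrics from offline evaluation
    owner: str                   # responsible team or individual
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    promotion_status: str = "Development"  # every model starts in Development

entry = ModelRegistryEntry(
    name="churn-classifier",
    version="2.1.0",
    framework="sklearn",
    training_data_version="customers-2026-01",
    evaluation_metrics={"auc": 0.91, "f1": 0.78},
    owner="growth-ml-team",
)
```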

The registry SHOULD support the following model lifecycle stages:

Development → Staging → Canary → Production → Archived
  • Development: Being trained, tuned, or evaluated. Not exposed to real traffic.
  • Staging: Passed offline evaluation, under integration testing.
  • Canary: Serving a small percentage of production traffic for validation.
  • Production: Serving full production traffic.
  • Archived: Retired from production, retained for audit or rollback.
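
The stage sequence above implies that only forward, single-step transitions are valid promotions. A minimal sketch of that rule (the function name is an assumption, not part of any standard API):

```python
# Lifecycle stages in promotion order, and the forward transitions
# each stage allows.
ALLOWED_TRANSITIONS = {
    "Development": {"Staging"},
    "Staging": {"Canary"},
    "Canary": {"Production"},
    "Production": {"Archived"},
    "Archived": set(),  # terminal stage: no further promotion
}

def can_promote(current: str, target: str) -> bool:
    """Return True only if target is a valid next stage for current."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Note that rollback (discussed later in this section) is a separate operation from promotion and would not go through this check.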

Access controls SHOULD restrict who can promote models to Production. Promotion SHOULD require approval from both the model owner and a second reviewer.
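
The dual-approval rule can be encoded as a small predicate: the owner must sign off, plus at least one reviewer who is not the owner. A sketch under those assumptions:

```python
# Hypothetical approval check for promotion to Production.
def approved_for_production(approvers: set, owner: str) -> bool:
    """Require the model owner plus at least one distinct second reviewer."""
    return owner in approvers and len(approvers - {owner}) >= 1
```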

Version Management

Use semantic versioning adapted for ML artifacts:

| Version Component | Trigger |
|---|---|
| Major (X.0.0) | Architecture change, new model type, fundamentally different approach |
| Minor (0.X.0) | Retraining with new data, hyperparameter tuning, prompt revision |
| Patch (0.0.X) | Configuration change, threshold adjustment, dependency update |
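
The bump rules in the table can be sketched as a small helper. The function name and change-type labels are illustrative assumptions:

```python
# Bump a model's semantic version according to the trigger table above.
def bump_version(version: str, change: str) -> str:
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "major":    # architecture change, new model type
        return f"{major + 1}.0.0"
    if change == "minor":    # retraining, hyperparameter tuning, prompt revision
        return f"{major}.{minor + 1}.0"
    if change == "patch":    # config change, threshold adjustment, dependency update
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

Note that, as in standard semantic versioning, major and minor bumps reset the lower-order components to zero.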

Model artifacts SHOULD be immutable once registered. If a correction is needed, register a new version.

Every registered model SHOULD include a model card with:

  • Intended use and target users
  • Known limitations and failure modes
  • Evaluation results (overall and by segment)
  • Fairness assessment summary (see Fairness & Bias Assessment)
  • Training data summary (source, size, date range)
  • Responsible team and escalation contact
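
A model card covering the fields above can be kept as structured data alongside the registry entry, with a completeness check at registration time. All field names and values below are hypothetical examples:

```python
# Hypothetical model card; keys correspond to the required sections above.
model_card = {
    "intended_use": "Predict customer churn for retention outreach; internal only.",
    "limitations": ["Not validated for accounts younger than 30 days"],
    "evaluation": {"overall_auc": 0.91, "by_segment": {"enterprise": 0.93, "smb": 0.88}},
    "fairness_summary": "No significant disparity across assessed segments.",
    "training_data": {"source": "customers table", "size": 1_200_000,
                      "date_range": "2025-01..2025-12"},
    "owner": {"team": "growth-ml-team", "escalation": "ml-oncall@example.com"},
}

def card_is_complete(card: dict) -> bool:
    """Registration-time check that every required section is present."""
    required = {"intended_use", "limitations", "evaluation",
                "fairness_summary", "training_data", "owner"}
    return required <= card.keys()
```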

Artifact Storage

Model weights, scalers, encoders, and associated artifacts SHOULD be stored in versioned artifact storage (S3, GCS, Azure Blob with versioning, or a dedicated ML artifact store).

  • Artifacts SHOULD be checksummed (SHA-256) at registration for integrity verification.
  • Checksums SHOULD be verified at deployment time.
  • Large artifacts SHOULD be stored outside git, with git tracking only metadata and checksum references.
  • Artifact storage SHOULD enforce access controls and retention policies aligned with each model's lifecycle stage.
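
The checksum requirements above amount to computing a SHA-256 digest at registration and recomputing it at deployment. A minimal sketch over in-memory bytes (a production version would stream file contents in chunks rather than load whole artifacts into memory; function names are illustrative):

```python
import hashlib

def artifact_checksum(data: bytes) -> str:
    """Compute the SHA-256 digest recorded in the registry at registration."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, registered_checksum: str) -> bool:
    """Deployment-time integrity check against the registry's checksum."""
    return artifact_checksum(data) == registered_checksum
```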

Promotion Workflow

Define explicit promotion gates between lifecycle stages:

| Transition | Required Gates |
|---|---|
| Development → Staging | Automated offline evaluation passes thresholds |
| Staging → Canary | Human review of evaluation results + fairness check |
| Canary → Production | SLO validation in canary + stakeholder sign-off |
| Production → Archived | Replacement model promoted + archival checklist complete |

Each promotion SHOULD be logged with: promoter identity, date, evaluation evidence, approvers, and any conditions.
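
A promotion audit record capturing those fields can be sketched as follows; the function signature and example values are hypothetical:

```python
from datetime import datetime, timezone

def log_promotion(model, version, from_stage, to_stage, promoter,
                  evidence, approvers, conditions=None):
    """Build an audit record for one promotion event."""
    return {
        "model": model,
        "version": version,
        "transition": f"{from_stage} -> {to_stage}",
        "promoter": promoter,
        "date": datetime.now(timezone.utc).isoformat(),
        "evaluation_evidence": evidence,   # e.g. links to eval reports
        "approvers": list(approvers),
        "conditions": conditions or [],    # any conditions attached to approval
    }

record = log_promotion(
    "churn-classifier", "2.1.0", "Staging", "Canary",
    promoter="alice", evidence=["eval-report-123"], approvers=["alice", "bob"],
)
```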

Rollback SHOULD be possible to any previously promoted Production version within the retention period. Rollback procedures SHOULD be tested regularly.

Third-Party Model Management

When using third-party or foundation models, register the provider model version in the registry.

  • Monitor provider model version changes — subscribe to changelogs and deprecation notices.
  • Qualify new provider versions before promoting to production: regression tests, safety evaluation, cost/latency assessment, API compatibility.
  • Pin provider model versions for production workloads to prevent unexpected behavior changes.
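
Version pinning can be enforced at the configuration layer by rejecting floating aliases before a request is made. A sketch with hypothetical config keys and model names:

```python
# Hypothetical production config: requests always name an exact, qualified
# model version, never a floating alias such as "latest".
PRODUCTION_MODEL_CONFIG = {
    "provider": "provider-llm",
    "model": "provider-llm-v3",
    "version": "3.5-turbo-2026-01",  # pinned to a qualified version
}

def resolve_model(config: dict) -> str:
    """Return the fully pinned model identifier, rejecting floating aliases."""
    if config["version"].lower() in {"latest", ""}:
        raise ValueError("production workloads must pin an exact model version")
    return f'{config["model"]}@{config["version"]}'
```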

Maintain a provider model change log:

| Provider Model | Version | Change Date | Impact Assessment | Qualification Status |
|---|---|---|---|---|
| provider-llm-v3 | 3.5-turbo-2026-01 | 2026-01-20 | Minor quality improvement | Qualified |
| provider-llm-v4 | 4.0-2026-02 | 2026-02-10 | New capabilities, prompt changes | Under evaluation |

Cross-References