Model Registry & Versioning
Models are not just code. A model version includes: code, weights/parameters, training data reference, hyperparameters, evaluation results, and dependencies. Git alone cannot version these artifacts effectively: weights and datasets are large binaries that do not belong in a source repository. A model registry provides the central source of truth for what is deployed, what was deployed, and what is ready to deploy.
Model Registry Requirements
Every production AI model SHOULD be registered in a central model registry. Registry entries SHOULD include:
| Field | Description |
|---|---|
| Model name | Unique identifier for the model |
| Version | Semantic version (see below) |
| Framework | TensorFlow, PyTorch, scikit-learn, LLM provider, etc. |
| Training data version | Reference to the dataset version used |
| Evaluation metrics | Key metrics from offline evaluation |
| Owner | Team or individual responsible |
| Creation date | ISO 8601 timestamp |
| Promotion status | Current lifecycle stage |
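The fields above can be captured as a simple structured record. This is a minimal sketch, assuming an in-house registry; the field names and example values are illustrative, not tied to any specific registry product:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class RegistryEntry:
    # Illustrative schema mirroring the required fields table; not a standard.
    model_name: str                 # unique identifier for the model
    version: str                    # semantic version, e.g. "2.1.0"
    framework: str                  # "pytorch", "sklearn", "llm-provider", ...
    training_data_version: str      # reference to the dataset version used
    evaluation_metrics: dict        # key offline metrics, e.g. {"auc": 0.91}
    owner: str                      # team or individual responsible
    creation_date: str              # ISO 8601 timestamp
    promotion_status: str = "development"  # current lifecycle stage

# Hypothetical example entry.
entry = RegistryEntry(
    model_name="fraud-scorer",
    version="2.1.0",
    framework="pytorch",
    training_data_version="transactions-2026-01",
    evaluation_metrics={"auc": 0.91},
    owner="risk-ml-team",
    creation_date=datetime.now(timezone.utc).isoformat(),
)
```

Using a frozen dataclass reflects the immutability requirement below: a registered entry is never mutated, only superseded by a new version.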
The registry SHOULD support the following model lifecycle stages:
Development → Staging → Canary → Production → Archived
- Development: Being trained, tuned, or evaluated. Not exposed to real traffic.
- Staging: Passed offline evaluation, under integration testing.
- Canary: Serving a small percentage of production traffic for validation.
- Production: Serving full production traffic.
- Archived: Retired from production, retained for audit or rollback.
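The lifecycle stages form a state machine, and a registry can enforce it by rejecting invalid transitions. A minimal sketch, assuming demotion back one stage is allowed for Staging and Canary (rollback paths beyond that are omitted for brevity):

```python
# Allowed lifecycle transitions per the stage diagram above.
ALLOWED_TRANSITIONS = {
    "development": {"staging"},
    "staging": {"canary", "development"},
    "canary": {"production", "staging"},
    "production": {"archived"},
    "archived": set(),  # terminal stage
}

def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is a valid transition."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Encoding the transitions as data rather than scattered `if` checks makes the policy auditable and easy to change in one place.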
Access controls SHOULD restrict who can promote models to Production. Promotion SHOULD require approval from both the model owner and a second reviewer.
Version Management
Use semantic versioning adapted for ML artifacts:
| Version Component | Trigger |
|---|---|
| Major (X.0.0) | Architecture change, new model type, fundamentally different approach |
| Minor (0.X.0) | Retraining with new data, hyperparameter tuning, prompt revision |
| Patch (0.0.X) | Configuration change, threshold adjustment, dependency update |
Model artifacts SHOULD be immutable once registered. If a correction is needed, register a new version.
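The trigger table above maps directly onto a version-bump helper. A sketch, assuming the three-part `major.minor.patch` scheme with no pre-release suffixes:

```python
def bump_version(version: str, change: str) -> str:
    """Bump a semantic version per the ML trigger table.

    change: "major" (architecture change, new model type),
            "minor" (retraining, hyperparameter tuning, prompt revision),
            "patch" (configuration change, threshold adjustment, dependency update).
    """
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

Because registered artifacts are immutable, a correction always flows through this helper to mint a new version rather than overwriting an existing one.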
Every registered model SHOULD include a model card with:
- Intended use and target users
- Known limitations and failure modes
- Evaluation results (overall and by segment)
- Fairness assessment summary (see Fairness & Bias Assessment)
- Training data summary (source, size, date range)
- Responsible team and escalation contact
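A model card can be stored as structured data alongside the registry entry so its completeness is machine-checkable. A minimal sketch; the field names and example values below are assumptions, not a standard schema:

```python
# Hypothetical model card covering the required contents listed above.
model_card = {
    "model": "fraud-scorer",
    "version": "2.1.0",
    "intended_use": "Rank transactions for manual fraud review.",
    "target_users": ["fraud-ops analysts"],
    "known_limitations": ["Degrades on merchants with under 30 days of history"],
    "evaluation": {"overall": {"auc": 0.91},
                   "by_segment": {"new_merchants": {"auc": 0.84}}},
    "fairness_assessment": "See Fairness & Bias Assessment report",
    "training_data": {"source": "transactions table", "size": "48M rows",
                      "date_range": "2025-01 to 2025-12"},
    "responsible_team": "risk-ml-team",
    "escalation_contact": "risk-ml-oncall",
}

# Registration can reject models whose cards are missing required sections.
REQUIRED_FIELDS = {"intended_use", "known_limitations", "evaluation",
                   "fairness_assessment", "training_data", "responsible_team"}

def card_is_complete(card: dict) -> bool:
    """Check that every required model card section is present."""
    return REQUIRED_FIELDS <= card.keys()
```

Treating the card as data rather than free-form prose lets the promotion pipeline block models with incomplete documentation.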
Artifact Storage
Model weights, scalers, encoders, and associated artifacts SHOULD be stored in versioned artifact storage (S3, GCS, Azure Blob with versioning, or a dedicated ML artifact store).
- Artifacts SHOULD be checksummed (SHA-256) at registration for integrity verification.
- Checksums SHOULD be verified at deployment time.
- Large artifacts SHOULD be stored outside git, with git tracking only metadata and checksum references.
- Artifact storage SHOULD have access controls and retention policies aligned with the model's lifecycle stage.
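Checksum computation and deployment-time verification can be sketched as follows, streaming the file so large weight artifacts do not need to fit in memory:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of an artifact, streaming in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(path: Path, registered_checksum: str) -> None:
    """Abort deployment if the artifact does not match its registered checksum."""
    actual = sha256_of(path)
    if actual != registered_checksum:
        raise RuntimeError(
            f"checksum mismatch for {path}: "
            f"expected {registered_checksum}, got {actual}"
        )
```

The checksum computed at registration is what git tracks alongside the metadata; the deployment pipeline recomputes it against the artifact pulled from storage.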
Promotion Workflow
Define explicit promotion gates between lifecycle stages:
| Transition | Required Gates |
|---|---|
| Development → Staging | Automated offline evaluation passes thresholds |
| Staging → Canary | Human review of evaluation results + fairness check |
| Canary → Production | SLO validation in canary + stakeholder sign-off |
| Production → Archived | Replacement model promoted + archival checklist complete |
Each promotion SHOULD be logged with: promoter identity, date, evaluation evidence, approvers, and any conditions.
Rollback SHOULD be possible to any previously promoted Production version within the retention period. Rollback procedures SHOULD be tested regularly.
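The gate table and logging requirement above can be combined into a single promotion entry point. A sketch, assuming gates are recorded as string flags and the audit log is an append-only list; all names are illustrative:

```python
from datetime import datetime, timezone

# Required gates per transition, mirroring the table above (names illustrative).
REQUIRED_GATES = {
    ("development", "staging"): {"offline_eval_passed"},
    ("staging", "canary"): {"human_review", "fairness_check"},
    ("canary", "production"): {"slo_validated", "stakeholder_signoff"},
    ("production", "archived"): {"replacement_promoted", "archival_checklist"},
}

def promote(model: str, version: str, current: str, target: str,
            passed_gates: set, promoter: str, approvers: list,
            audit_log: list) -> None:
    """Promote a model only if every required gate has passed, logging the event."""
    required = REQUIRED_GATES.get((current, target))
    if required is None:
        raise ValueError(f"invalid transition {current} -> {target}")
    missing = required - passed_gates
    if missing:
        raise PermissionError(
            f"missing gates for {model} {version}: {sorted(missing)}")
    audit_log.append({
        "model": model, "version": version,
        "from": current, "to": target,
        "promoter": promoter, "approvers": approvers,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
```

Routing every promotion through one function means the audit log cannot be bypassed, and the same records support rollback by identifying which Production versions were previously promoted.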
Third-Party Model Management
When using third-party or foundation models, register the provider model version in the registry.
- Monitor provider model version changes — subscribe to changelogs and deprecation notices.
- Qualify new provider versions before promoting to production: regression tests, safety evaluation, cost/latency assessment, API compatibility.
- Pin provider model versions for production workloads to prevent unexpected behavior changes.
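Version pinning can be enforced in deployment configuration rather than left to convention. A sketch; the config keys and model names below are hypothetical:

```python
# Hypothetical production config: pin the dated snapshot, never a floating
# alias like "latest", so provider-side updates cannot change behavior
# without a qualification pass.
PRODUCTION_MODEL_CONFIG = {
    "provider_model": "provider-llm-v3",
    "pinned_version": "3.5-turbo-2026-01",
    "qualification_status": "qualified",
}

def resolve_model_version(config: dict) -> str:
    """Refuse to deploy an unpinned or unqualified provider model version."""
    if not config.get("pinned_version"):
        raise RuntimeError("provider model version must be pinned explicitly")
    if config.get("qualification_status") != "qualified":
        raise RuntimeError("provider model version not qualified for production")
    return config["pinned_version"]
```

A new provider version then enters production only by updating the pin after qualification, which keeps the change visible in config review and in the provider change log below.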
Maintain a provider model change log:
| Provider Model | Version | Change Date | Impact Assessment | Qualification Status |
|---|---|---|---|---|
| provider-llm-v3 | 3.5-turbo-2026-01 | 2026-01-20 | Minor quality improvement | Qualified |
| provider-llm-v4 | 4.0-2026-02 | 2026-02-10 | New capabilities, prompt changes | Under evaluation |