Orchestration Rules and State Machine

The orchestrator is the control layer that routes work between agents, enforces stage order, and handles failures. This document defines the rules that govern the orchestrator's behavior. It is mandatory per PRD-STD-009 REQ-009-05.

State Machine Definition

Every work item in the pipeline exists in exactly one state at any time. The orchestrator manages state transitions.

┌─────────────────────────────────────────────────────────────────────────────┐
│                         ORCHESTRATION STATE MACHINE                         │
│                                                                             │
│   ┌──────────┐                                                              │
│   │  INTAKE  │                                                              │
│   └────┬─────┘                                                              │
│        │                                                                    │
│        v                                                                    │
│  ┌────────────┐  PASS   ┌──────────┐  PASS   ┌──────────────┐               │
│  │REQUIREMENTS│────────>│  DESIGN  │────────>│IMPLEMENTATION│               │
│  │  (Stage 1) │         │(Stage 2) │         │  (Stage 3)   │               │
│  └─────┬──────┘         └────┬─────┘         └──────┬───────┘               │
│        │ FAIL                │ FAIL                 │ FAIL                  │
│        v                     v                      v                       │
│     REWORK                REWORK                 REWORK                     │
│    (Stage 1)             (Stage 2)              (Stage 3)                   │
│                                                     │ PASS                  │
│                                                     v                       │
│                           ┌──────────┐  PASS   ┌──────────┐                 │
│                           │ SECURITY │<────────│ TESTING  │                 │
│                           │(Stage 5) │         │(Stage 4) │                 │
│                           └────┬─────┘         └────┬─────┘                 │
│                                │ FAIL               │ FAIL                  │
│                                v                    v                       │
│                             REWORK               REWORK                     │
│                           (Stage 3/5)          (Stage 3/4)                  │
│                                │ PASS                                       │
│                                v                                            │
│                           ┌──────────┐  PASS   ┌──────────┐                 │
│                           │DEPLOYMENT│────────>│OPERATIONS│                 │
│                           │(Stage 6) │         │(Stage 7) │                 │
│                           └────┬─────┘         └────┬─────┘                 │
│                                │ FAIL               │ INCIDENT              │
│                                v                    v                       │
│                             REWORK              ROLLBACK                    │
│                            (Stage 6)            + REWORK                    │
│                                                     │                       │
│                                                 FEEDBACK                    │
│                                                 TO INTAKE                   │
│                                                                             │
│  Terminal states: DEPLOYED, ROLLED_BACK, CANCELLED                          │
└─────────────────────────────────────────────────────────────────────────────┘

State Definitions

State	Description	Owner Agent	Valid Transitions
`INTAKE`	Work item received, not yet assigned	Orchestrator	→ `REQUIREMENTS`
`REQUIREMENTS`	Stage 1 active	`product-agent`, `scrum-agent`	→ `DESIGN` (pass), → `REWORK_REQ` (fail)
`DESIGN`	Stage 2 active	`architect-agent`	→ `IMPLEMENTATION` (pass), → `REWORK_DESIGN` (fail)
`IMPLEMENTATION`	Stage 3 active	`developer-agent`	→ `TESTING` (pass), → `REWORK_IMPL` (fail)
`TESTING`	Stage 4 active	`qa-agent`, `devmgr-agent`	→ `SECURITY` (pass), → `REWORK_IMPL` (code fix), → `REWORK_TEST` (test fix), → `REWORK_REQ` (acceptance criteria mismatch)
`SECURITY`	Stage 5 active	`security-agent`, `compliance-agent`	→ `DEPLOYMENT` (pass), → `REWORK_IMPL` (remediation), → `REWORK_SEC` (scan config)
`DEPLOYMENT`	Stage 6 active	`platform-agent`	→ `OPERATIONS` (pass), → `REWORK_DEPLOY` (config fix)
`OPERATIONS`	Stage 7 active	`ops-agent`, `executive-agent`	→ `DEPLOYED` (stable), → `ROLLBACK` (incident)
`REWORK_*`	Rework in progress at specific stage	Stage-specific agent	→ Return to originating stage
`DEPLOYED`	Successfully in production (terminal)	—	→ `INTAKE` (new work via feedback)
`ROLLED_BACK`	Rolled back from production (terminal)	—	→ `INTAKE` (rework item created)
`CANCELLED`	Work item cancelled (terminal)	—	None
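
The state table can be encoded as a lookup that rejects any transition it does not list. The following is a minimal Python sketch, not a normative implementation: `VALID_TRANSITIONS` and `can_transition` are illustrative names, the `REWORK_*` states are left out because the table only says they return to the originating stage, and the `TESTING` to `REWORK_REQ` edge is taken from the rework routing table rather than from this one.

```python
# Allowed targets per state, transcribed from the State Definitions table.
# Assumption: names are used exactly as written there, including ROLLBACK
# as the target of OPERATIONS.
VALID_TRANSITIONS = {
    "INTAKE": {"REQUIREMENTS"},
    "REQUIREMENTS": {"DESIGN", "REWORK_REQ"},
    "DESIGN": {"IMPLEMENTATION", "REWORK_DESIGN"},
    "IMPLEMENTATION": {"TESTING", "REWORK_IMPL"},
    # REWORK_REQ comes from the rework routing table (acceptance criteria mismatch).
    "TESTING": {"SECURITY", "REWORK_IMPL", "REWORK_TEST", "REWORK_REQ"},
    "SECURITY": {"DEPLOYMENT", "REWORK_IMPL", "REWORK_SEC"},
    "DEPLOYMENT": {"OPERATIONS", "REWORK_DEPLOY"},
    "OPERATIONS": {"DEPLOYED", "ROLLBACK"},
    "DEPLOYED": {"INTAKE"},      # new work via feedback
    "ROLLED_BACK": {"INTAKE"},   # rework item created
    "CANCELLED": set(),          # no outgoing transitions
}


def can_transition(current: str, target: str) -> bool:
    """Return True only if the table lists target as reachable from current."""
    return target in VALID_TRANSITIONS.get(current, set())
```

For example, `can_transition("REQUIREMENTS", "IMPLEMENTATION")` is false because Design cannot be skipped.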

Transition Rules

Forward Transitions (Happy Path)

Every forward transition requires:

- Gate criteria met: all checks for the current stage passed
- Handoff artifact produced: structured output per PRD-STD-009 REQ-009-06
- Target agent available: next agent has capacity and valid contract
- No blocking escalations: no unresolved escalation requests

# Example transition rule
transition:
  from: IMPLEMENTATION
  to: TESTING
  requires:
    - gate_3_passed: true
    - handoff_artifact: present
    - ai_metadata: [AI-Usage, Agent-IDs, AI-Prompt-Ref]
    - unit_tests: passing
    - lint: passing
  produces:
    - handoff: "HO-developer-agent-qa-agent-{timestamp}"
    - state_change: "IMPLEMENTATION → TESTING"
    - audit_record: true
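
The four forward-transition requirements can be checked in one place before the orchestrator changes state. This is a hedged sketch: `TransitionContext`, its fields, and `forward_blockers` are hypothetical names chosen for illustration, and how each field is populated (gate results, agent capacity, escalation queue) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class TransitionContext:
    """Facts the orchestrator has gathered about one pending forward transition."""
    gate_passed: bool               # all checks for the current stage passed
    handoff_artifact_present: bool  # structured handoff output exists
    target_agent_available: bool    # next agent has capacity and a valid contract
    open_escalations: int           # unresolved escalation requests


def forward_blockers(ctx: TransitionContext) -> list:
    """Return the unmet requirements; an empty list means the transition may proceed."""
    blockers = []
    if not ctx.gate_passed:
        blockers.append("gate criteria not met")
    if not ctx.handoff_artifact_present:
        blockers.append("handoff artifact missing")
    if not ctx.target_agent_available:
        blockers.append("target agent unavailable")
    if ctx.open_escalations > 0:
        blockers.append("unresolved escalations")
    return blockers
```

Returning the full list of blockers, rather than a single boolean, gives the audit record something specific to log when a transition is refused.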

Failure Transitions (Rework Routing)

When a gate fails, the orchestrator must route the work item to the correct agent for rework. The routing depends on the failure type.

Current Stage	Failure Type	Route To	Rework Agent	Max Rework Iterations
Testing (4)	Test failure (code bug)	`REWORK_IMPL`	`developer-agent`	3
Testing (4)	Test gap (missing coverage)	`REWORK_TEST`	`qa-agent`	2
Testing (4)	Acceptance criteria mismatch	`REWORK_REQ`	`product-agent`	1
Security (5)	Vulnerability found	`REWORK_IMPL`	`developer-agent`	3
Security (5)	License violation	`REWORK_IMPL`	`developer-agent`	2
Security (5)	Compliance evidence gap	`REWORK_SEC`	`compliance-agent`	2
Deployment (6)	Configuration error	`REWORK_DEPLOY`	`platform-agent`	2
Deployment (6)	Environment mismatch	`REWORK_DEPLOY`	`platform-agent`	2
Operations (7)	Health check failure	`ROLLBACK`	`ops-agent` + `platform-agent`	1 (then escalate)
Operations (7)	Critical incident	`ROLLBACK`	`ops-agent` + human	Immediate
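
The routing table maps naturally onto a dictionary keyed by stage and failure type. The sketch below is illustrative only: the failure-type keys such as `"code_bug"` are invented labels for the table's rows, the Operations rollback rows are omitted because they do not follow the rework pattern, and escalation to the Development Manager on an exhausted limit follows the escalation table.

```python
# (stage, failure type) -> (rework state, rework agent, max rework iterations)
REWORK_ROUTES = {
    ("TESTING", "code_bug"): ("REWORK_IMPL", "developer-agent", 3),
    ("TESTING", "test_gap"): ("REWORK_TEST", "qa-agent", 2),
    ("TESTING", "acceptance_mismatch"): ("REWORK_REQ", "product-agent", 1),
    ("SECURITY", "vulnerability"): ("REWORK_IMPL", "developer-agent", 3),
    ("SECURITY", "license_violation"): ("REWORK_IMPL", "developer-agent", 2),
    ("SECURITY", "compliance_gap"): ("REWORK_SEC", "compliance-agent", 2),
    ("DEPLOYMENT", "config_error"): ("REWORK_DEPLOY", "platform-agent", 2),
    ("DEPLOYMENT", "environment_mismatch"): ("REWORK_DEPLOY", "platform-agent", 2),
}

ESCALATE = ("ESCALATE", "Development Manager")


def route_failure(stage: str, failure_type: str, completed_reworks: int):
    """Pick the rework state and agent, or escalate if the row's limit is used up.

    completed_reworks counts rework iterations already spent on this failure type.
    """
    route = REWORK_ROUTES.get((stage, failure_type))
    if route is None:
        return ESCALATE  # unknown failure type: a human decides
    state, agent, max_iterations = route
    if completed_reworks >= max_iterations:
        return ESCALATE  # starting another iteration would exceed the limit
    return (state, agent)
```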

Escalation Transitions

When an agent cannot resolve an issue within its contract, it must escalate to a human.

Trigger	Escalation Target	Response SLA	Action if SLA Breached
Architecture-impacting decision	Solution Architect	4 hours	Block pipeline, notify CTO
Auth/crypto/PII change	Security Engineer	2 hours	Block pipeline, notify Security lead
Rework iteration limit reached	Development Manager	4 hours	Block pipeline, create incident
Agent contract violation	CTO	1 hour	Suspend agent immediately
Cross-agent conflict (contradictory outputs)	Solution Architect	4 hours	Block pipeline, convene review
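
A breach check for the response SLAs might look like the following sketch. The trigger keys and the `breach_action` function are hypothetical; the targets, durations, and breach actions are copied from the table, and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone

# trigger -> (escalation target, response SLA, action if the SLA is breached)
ESCALATION_SLA = {
    "architecture_decision": ("Solution Architect", timedelta(hours=4), "Block pipeline, notify CTO"),
    "auth_crypto_pii_change": ("Security Engineer", timedelta(hours=2), "Block pipeline, notify Security lead"),
    "rework_limit_reached": ("Development Manager", timedelta(hours=4), "Block pipeline, create incident"),
    "contract_violation": ("CTO", timedelta(hours=1), "Suspend agent immediately"),
    "cross_agent_conflict": ("Solution Architect", timedelta(hours=4), "Block pipeline, convene review"),
}


def breach_action(trigger: str, raised_at: datetime, now: datetime):
    """Return the breach action if the SLA has elapsed without a response, else None."""
    _target, sla, action = ESCALATION_SLA[trigger]
    return action if now - raised_at > sla else None
```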

Iteration Limits and Deadlock Prevention

Maximum Iteration Thresholds

Per PRD-STD-009 REQ-009-07, autonomous loops must enforce maximum iteration limits.

Loop Type	Max Iterations	On Breach
Single agent rework (same stage)	3	Escalate to human owner of that stage
Cross-stage rework (bouncing between stages)	5 total across all stages	Escalate to Development Manager
Full pipeline retry (Stage 1 restart)	2	Escalate to CTO, likely needs scope change
Deployment retry	2	Block deployment, human investigation required
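
One way to enforce these thresholds is to keep a counter per loop type and compare it against the limit on every iteration. The sketch below assumes the counters include the iteration about to start, so a breach is reported when a count exceeds its maximum; the dictionary keys and `check_limits` are illustrative names.

```python
# Loop type -> maximum iterations, from the thresholds table.
LIMITS = {
    "single_stage": 3,
    "cross_stage": 5,
    "full_pipeline": 2,
    "deployment": 2,
}

# Loop type -> action when the limit is exceeded, from the same table.
ON_BREACH = {
    "single_stage": "Escalate to human owner of that stage",
    "cross_stage": "Escalate to Development Manager",
    "full_pipeline": "Escalate to CTO",
    "deployment": "Block deployment, human investigation required",
}


def check_limits(counts: dict) -> list:
    """Return the breach actions for every loop type whose count exceeds its limit."""
    return [
        ON_BREACH[loop]
        for loop, limit in LIMITS.items()
        if counts.get(loop, 0) > limit
    ]
```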

Deadlock Detection

A deadlock occurs when two or more agents are waiting for each other's output. The orchestrator must detect and resolve these.

Detection rules:

- Circular wait: Agent A waits for Agent B, Agent B waits for Agent A
- Timeout: Any agent state unchanged for >2x its expected execution time
- Contradictory outputs: Two agents produce conflicting recommendations with no resolution path
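
The first two detection rules are mechanical and can be sketched in a few lines. The example assumes the orchestrator keeps a wait-for map in which each blocked agent waits on at most one other agent; `find_circular_wait` and `is_timed_out` are hypothetical helper names. Detecting contradictory outputs requires comparing agent recommendations and is not covered here.

```python
from typing import Optional


def find_circular_wait(waits_for: dict) -> Optional[list]:
    """Return the agents forming a wait cycle, or None if there is none.

    waits_for maps each blocked agent to the single agent it is waiting on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        current = waits_for.get(start)
        while current is not None:
            if current == start:
                return path  # followed the chain back to where we began
            if current in seen:
                break  # a cycle that excludes start; another start will find it
            seen.add(current)
            path.append(current)
            current = waits_for.get(current)
    return None


def is_timed_out(elapsed_seconds: float, expected_seconds: float) -> bool:
    """Timeout rule: state unchanged for more than 2x the expected execution time."""
    return elapsed_seconds > 2 * expected_seconds
```

Because each agent has at most one outgoing edge in this model, following the chain from every agent is enough; a general wait-for graph with multiple edges per agent would need a full depth-first search.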

Resolution protocol:

1. Orchestrator detects deadlock condition
2. Orchestrator pauses all involved agents
3. Orchestrator notifies the Development Manager with:
   - Deadlock type (circular, timeout, contradictory)
   - Involved agents and their current states
   - Last handoff artifacts from each agent
   - Suggested resolution (human decision needed)
4. Development Manager resolves by:
   - Choosing one agent's output over the other
   - Providing additional context to break the tie
   - Escalating to Solution Architect for architecture decisions
   - Cancelling the work item if resolution is not feasible
5. Orchestrator resumes pipeline with resolution applied

Parallel Execution Rules

Some stages can run in parallel to reduce cycle time. The orchestrator manages parallelism.

Allowed Parallel Paths

Stage 3 (Implementation) completes
        │
        ├──> Stage 4 (qa-agent) ──────────────┐
        │                                       │
        └──> Stage 5 (security-agent) ─────────┤ ──> Merge results ──> Gate 5
             Stage 5 (compliance-agent) ────────┘

Rules for parallel execution:

- qa-agent and security-agent MAY run in parallel after Gate 3
- compliance-agent MAY run in parallel with security-agent
- Both paths must complete and pass before Gate 5 is evaluated
- If one path fails, the other continues but the work item cannot advance
- devmgr-agent runs after both qa-agent and security-agent complete (needs both outputs)
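
The merge step before Gate 5 can be expressed as a small status function. This sketch assumes each parallel agent reports `"pass"`, `"fail"`, or `None` while still running; the status strings and the `merge_status` name are invented for illustration.

```python
# Agents that run on the parallel paths after Gate 3.
PARALLEL_AGENTS = ("qa-agent", "security-agent", "compliance-agent")


def merge_status(results: dict) -> str:
    """Decide what the orchestrator does with the parallel paths' results.

    results maps agent id -> "pass", "fail", or None (still running).
    """
    # A failed path does not stop the others, so wait until all have finished.
    if any(results.get(agent) is None for agent in PARALLEL_AGENTS):
        return "WAITING"
    # Every path finished; any failure blocks the work item from advancing.
    if any(results[agent] == "fail" for agent in PARALLEL_AGENTS):
        return "BLOCKED"
    return "EVALUATE_GATE_5"
```

Note that a failure on one path still yields `"WAITING"` while another path is running, which matches the rule that the surviving path continues.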

Forbidden Parallel Paths

These stages MUST run sequentially:

Stage A	Stage B	Reason
Requirements (1)	Design (2)	Design depends on approved requirements
Design (2)	Implementation (3)	Code depends on approved design
Security (5)	Deployment (6)	Cannot deploy security-uncleared code
Deployment (6)	Operations (7)	Cannot monitor what is not deployed

Orchestrator Configuration

Work Item Metadata

Every work item tracked by the orchestrator carries this metadata:

work_item:
  id: "WI-{project}-{sequence}"
  title: "{descriptive title}"
  risk_tier: 1|2|3|4
  data_classification: public|internal|confidential|restricted
  current_state: "{state from state machine}"
  current_agent: "{agent-id or null}"
  created_at: "{ISO 8601}"
  updated_at: "{ISO 8601}"
  stage_history:
    - stage: 1
      agent: "product-agent"
      entered_at: "{ISO 8601}"
      exited_at: "{ISO 8601}"
      result: "pass|fail|escalate"
      handoff_id: "HO-{id}"
      iteration: 1
  rework_count: 0
  total_iterations: 0
  escalation_history: []
  trust_levels:
    product-agent: 1
    architect-agent: 0
    developer-agent: 2
    qa-agent: 1
    security-agent: 1
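
A loader for this metadata would typically validate the constrained fields before accepting a work item. The following sketch checks only three of them; the ID regular expression is an assumption inferred from the `WI-{project}-{sequence}` template, and `validate_work_item` is a hypothetical helper.

```python
import re

CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}
RISK_TIERS = {1, 2, 3, 4}
# Assumed shape: "WI-", a project slug, "-", a numeric sequence.
ID_PATTERN = re.compile(r"^WI-[A-Za-z0-9_]+-\d+$")


def validate_work_item(item: dict) -> list:
    """Return the names of fields that violate the metadata template."""
    errors = []
    if not ID_PATTERN.match(str(item.get("id", ""))):
        errors.append("id")
    if item.get("risk_tier") not in RISK_TIERS:
        errors.append("risk_tier")
    if item.get("data_classification") not in CLASSIFICATIONS:
        errors.append("data_classification")
    return errors
```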

Orchestrator Health Checks

The orchestrator itself must be monitored:

Metric	Threshold	Action on Breach
Queue depth	>50 work items	Alert Development Manager, assess capacity
Average cycle time	>2x baseline	Investigate bottleneck stages
Deadlock rate	>1 per week	Review agent contracts for conflicts
Escalation rate	>10% of work items	Review trust levels and agent capabilities
Gate failure rate	>30% at any single gate	Investigate root cause, retrain agents
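
These thresholds can be evaluated from a periodic metrics snapshot. The sketch below assumes a dictionary with the keys shown in the docstring; those key names and the alert labels are illustrative and are not defined by the standard.

```python
def health_alerts(metrics: dict) -> list:
    """Return a label for every health threshold the snapshot breaches.

    Expected keys (assumed): queue_depth, avg_cycle_time, baseline_cycle_time,
    deadlocks_this_week, escalated_items, total_items, gate_failure_rates.
    """
    alerts = []
    if metrics["queue_depth"] > 50:
        alerts.append("queue_depth")
    if metrics["avg_cycle_time"] > 2 * metrics["baseline_cycle_time"]:
        alerts.append("cycle_time")
    if metrics["deadlocks_this_week"] > 1:
        alerts.append("deadlock_rate")
    total = metrics["total_items"]
    if total and metrics["escalated_items"] / total > 0.10:
        alerts.append("escalation_rate")
    # gate_failure_rates maps a gate name to its failure rate as a fraction.
    for gate, rate in metrics["gate_failure_rates"].items():
        if rate > 0.30:
            alerts.append(f"gate_failure_rate:{gate}")
    return alerts
```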

Event Log Format

Every state transition produces an event in the orchestration log:

{
  "event_id": "EVT-{uuid}",
  "timestamp": "2026-02-23T14:30:00Z",
  "work_item_id": "WI-myproject-042",
  "transition": {
    "from_state": "IMPLEMENTATION",
    "to_state": "TESTING",
    "trigger": "gate_3_passed"
  },
  "agent": {
    "source": "developer-agent",
    "target": "qa-agent"
  },
  "handoff_id": "HO-developer-agent-qa-agent-20260223T143000",
  "gate_results": {
    "lint": "pass",
    "unit_tests": "pass",
    "sast_basic": "pass",
    "ai_metadata": "pass"
  },
  "trust_level": 2,
  "human_approval": null,
  "duration_seconds": 3420
}

This log format satisfies PRD-STD-009 REQ-009-14 (auditable run records) and PRD-STD-005 (documentation requirements).
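
An event in this shape can be assembled with the standard library alone. The sketch below is one possible builder, not the orchestrator's actual code: `build_event` and its parameters are hypothetical, the handoff ID follows the `HO-{source}-{target}-{timestamp}` pattern from the example above, and `now` is assumed to be a UTC datetime.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event(work_item_id, from_state, to_state, trigger, source, target,
                gate_results, trust_level, duration_seconds,
                human_approval=None, now=None):
    """Build one orchestration log event as a JSON-serializable dict."""
    now = now or datetime.now(timezone.utc)
    return {
        "event_id": f"EVT-{uuid.uuid4()}",
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "work_item_id": work_item_id,
        "transition": {"from_state": from_state, "to_state": to_state, "trigger": trigger},
        "agent": {"source": source, "target": target},
        "handoff_id": f"HO-{source}-{target}-{now.strftime('%Y%m%dT%H%M%S')}",
        "gate_results": gate_results,
        "trust_level": trust_level,
        "human_approval": human_approval,
        "duration_seconds": duration_seconds,
    }


event = build_event(
    "WI-myproject-042", "IMPLEMENTATION", "TESTING", "gate_3_passed",
    "developer-agent", "qa-agent",
    {"lint": "pass", "unit_tests": "pass"}, trust_level=2, duration_seconds=3420,
    now=datetime(2026, 2, 23, 14, 30, 0, tzinfo=timezone.utc),
)
line = json.dumps(event)  # one log line, ready to append to the orchestration log
```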

Quick Reference: Orchestration Decision Tree

Is the work item new?
├── YES → State: INTAKE → Route to product-agent (Stage 1)
└── NO → Is the current gate passed?
    ├── YES → Is there a next stage?
    │   ├── YES → Can parallel paths run?
    │   │   ├── YES → Launch parallel agents
    │   │   └── NO → Route to next stage's owner agent
    │   └── NO → State: DEPLOYED (terminal)
    └── NO → Is the rework limit reached?
        ├── YES → Escalate to human (Development Manager)
        └── NO → What type of failure?
            ├── Code bug → Route to developer-agent
            ├── Test gap → Route to qa-agent
            ├── Security finding → Route to developer-agent (remediate)
            ├── Compliance gap → Route to compliance-agent
            ├── Config error → Route to platform-agent
            └── Unclear → Escalate to human (Development Manager)
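
The decision tree reads directly as nested conditionals. The following sketch mirrors its branches one for one; the function name, the parameter names, the failure-type keys, and the `"route:"`/`"escalate:"` result strings are all illustrative choices rather than a defined interface.

```python
# Failure type -> agent that performs the rework, per the decision tree.
FAILURE_ROUTES = {
    "code_bug": "developer-agent",
    "test_gap": "qa-agent",
    "security_finding": "developer-agent",
    "compliance_gap": "compliance-agent",
    "config_error": "platform-agent",
}


def next_action(is_new, gate_passed=False, has_next_stage=True,
                can_parallelize=False, rework_limit_reached=False,
                failure_type=None) -> str:
    """Walk the orchestration decision tree and return the chosen action."""
    if is_new:
        return "route:product-agent"  # INTAKE, then Stage 1
    if gate_passed:
        if not has_next_stage:
            return "state:DEPLOYED"
        return "launch:parallel-agents" if can_parallelize else "route:next-stage-owner"
    if rework_limit_reached:
        return "escalate:Development Manager"
    agent = FAILURE_ROUTES.get(failure_type)
    # An unrecognized or missing failure type is the tree's "Unclear" branch.
    return f"route:{agent}" if agent else "escalate:Development Manager"
```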
