Agent Swarm Patterns
Agent Swarm enables parallel execution of complex development tasks by coordinating multiple AI agents. This guide provides proven patterns for implementing swarm workflows effectively while maintaining governance and quality.
Before implementing swarm patterns, review PRD-STD-019: Agent Swarm Coordination for mandatory governance requirements.
When to Use Agent Swarm
Appropriate Use Cases
| Scenario | Benefit | Example |
|---|---|---|
| Large codebase refactoring | 4.5x speedup | Migrate 100+ files to TypeScript |
| Multi-component updates | Parallel execution | Update auth across frontend, backend, mobile |
| Comprehensive testing | Coverage in parallel | Generate tests for all API endpoints |
| Documentation updates | Consistency across scope | Update all README files with new API |
| Dependency upgrades | Ripple effect handling | React 17→18 upgrade across codebase |
Inappropriate Use Cases
| Scenario | Why Not Swarm | Better Approach |
|---|---|---|
| Single-file changes | Coordination overhead | Single agent |
| Sequential dependencies | Cannot parallelize | Sequential handoffs |
| Security-critical code | Requires focused review | Senior engineer + single agent |
| Novel architecture design | Needs coherent vision | Single agent with deep reasoning |
The Parallelization Test
Before using swarm, verify your task passes the PARALLEL test:
- Partitionable — Can be divided into independent sub-tasks?
- Aggregatable — Can sub-results be combined into coherent output?
- Reviewable — Can you verify each sub-task independently?
- Accountable — Can you identify ownership for each sub-task?
- Limited dependencies — Are cross-task dependencies minimal?
- Logged — Can you capture full audit trail?
- Estimable — Can you estimate cost/time for each sub-task?
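The checklist above can be recorded as a small data structure so that a failed criterion is named explicitly before a swarm is launched. This is a minimal sketch; `ParallelCheck`, `failed_criteria`, and `passes_parallel_test` are hypothetical names for illustration, not part of any tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ParallelCheck:
    """One boolean per criterion of the test above (hypothetical structure)."""
    partitionable: bool
    aggregatable: bool
    reviewable: bool
    accountable: bool
    limited_dependencies: bool
    logged: bool
    estimable: bool


def failed_criteria(check: ParallelCheck) -> list[str]:
    """Return the names of the criteria the task does not meet."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def passes_parallel_test(check: ParallelCheck) -> bool:
    """A task passes only when every criterion holds."""
    return not failed_criteria(check)
```

A task that fails any single criterion is a candidate for a single agent or sequential handoffs instead.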
Core Patterns
Pattern 1: Domain-Based Decomposition
Divide work by architectural domain.
Task: Migrate monolith to microservices
Orchestrator
├── Frontend Agent (React components)
├── Backend API Agent (Express routes)
├── Database Agent (schema, migrations)
├── Worker Agent (background jobs)
└── Integration Test Agent (depends on all)
Implementation:
kimi --mode swarm \
--decomposition domain \
--checkpoint-interval 3 \
"Migrate user service to microservices architecture"
Governance Notes:
- Each domain agent receives only relevant context
- Integration agent waits for domain agents (explicit dependency)
- Domain expertise documented in agent configuration
When to Use: Large architectural changes affecting multiple layers
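The explicit dependency of the integration agent on the domain agents can be modeled as a dependency graph and resolved into waves of agents that may run in parallel. The sketch below uses Python's standard `graphlib`; the agent names and the `execution_waves` helper are illustrative assumptions, not the behavior of any specific orchestrator.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group agents into waves; agents in the same wave have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical agent names mirroring the diagram above.
domain_agents = {"frontend", "backend_api", "database", "worker"}
plan = {name: set() for name in domain_agents}
plan["integration_test"] = set(domain_agents)  # waits for every domain agent
```

Calling `execution_waves(plan)` places the four domain agents in the first wave and the integration test agent alone in the second.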
Pattern 2: Component-Based Decomposition
Divide work by discrete components.
Task: Add OAuth2 authentication
Orchestrator
├── Login Component Agent (UI)
├── Token Service Agent (backend)
├── Middleware Agent (auth checks)
├── Database Agent (user sessions)
└── E2E Test Agent (depends on all)
Implementation:
kimi --mode swarm \
--decomposition component \
--max-agents 10 \
"Implement OAuth2 authentication flow"
Governance Notes:
- Component interfaces defined before swarm launch
- Contract tests between components
- Clear ownership per component
When to Use: Feature implementation spanning multiple services/modules
Pattern 3: Data-Based Decomposition
Divide work by data partitions.
Task: Process and migrate user data
Orchestrator
├── Shard A Agent (users A-F)
├── Shard B Agent (users G-M)
├── Shard C Agent (users N-S)
├── Shard D Agent (users T-Z)
└── Aggregation Agent (depends on all shards)
Implementation:
kimi --mode swarm \
--decomposition data \
--shards 4 \
--shard-key "user_id" \
"Migrate user preferences to new schema"
Governance Notes:
- Idempotency required (safe to retry)
- Shard boundaries must not overlap
- Aggregation validates completeness
When to Use: Batch processing, data migrations, ETL workflows
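Two of the governance notes above lend themselves to automated checks: shard boundaries must not overlap, and the aggregation step must confirm that every record was processed exactly once. The following sketch assumes users are partitioned by the first letter of a name, as in the diagram; `build_shard_map` and `check_completeness` are hypothetical helpers.

```python
import string
from collections import Counter


def build_shard_map(ranges):
    """Map each uppercase letter to one shard; reject overlaps and gaps."""
    owner = {}
    for shard, (first, last) in ranges.items():
        for code in range(ord(first), ord(last) + 1):
            letter = chr(code)
            if letter in owner:
                raise ValueError(f"{letter!r} is claimed by both {owner[letter]} and {shard}")
            owner[letter] = shard
    missing = sorted(set(string.ascii_uppercase) - set(owner))
    if missing:
        raise ValueError(f"letters not covered by any shard: {missing}")
    return owner


def check_completeness(expected_ids, shard_results):
    """Return (missing, duplicated) record IDs across all shard outputs."""
    counts = Counter(uid for ids in shard_results.values() for uid in ids)
    missing = set(expected_ids) - set(counts)
    duplicated = {uid for uid, n in counts.items() if n > 1}
    return missing, duplicated


# Shard ranges taken from the diagram above.
SHARDS = {"A": ("A", "F"), "B": ("G", "M"), "C": ("N", "S"), "D": ("T", "Z")}
shard_map = build_shard_map(SHARDS)
```

A duplicated ID is only harmless if the migration is idempotent, which is why idempotency and completeness checks belong together.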
Pattern 4: Stage-Based Decomposition
Divide by pipeline stages.
Task: CI/CD pipeline optimization
Orchestrator
├── Build Agent (compile, bundle)
├── Test Agent (unit, integration)
├── Security Agent (SAST, dependency scan)
├── Deploy Agent (staging)
└── Verify Agent (smoke tests, depends on deploy)
Implementation:
kimi --mode swarm \
--decomposition pipeline \
--stage-gates \
"Optimize CI/CD pipeline for faster builds"
Governance Notes:
- Stage gates require approval before progression
- Each stage has defined success criteria
- Rollback triggers defined per stage
When to Use: CI/CD improvements, release automation, quality gates
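A stage gate can be thought of as two checks per stage: the stage's own success criteria, then an approval before the next stage starts. The sketch below illustrates that control flow only; `run_with_gates` and the `approve` callback (standing in for a human approval step) are assumptions for illustration.

```python
def run_with_gates(stages, approve):
    """Run stages in order; stop at the first failed criterion or denied gate.

    stages:  list of (name, run) pairs, where run() returns True on success.
    approve: callable(name) -> bool, standing in for a human approval step.
    Returns (completed_stage_names, status_message).
    """
    completed = []
    for name, run in stages:
        if not run():
            return completed, f"{name}: success criteria not met"
        if not approve(name):
            return completed, f"{name}: gate not approved"
        completed.append(name)
    return completed, "all stages passed"
```

The list of completed stages returned on failure is what a rollback trigger would act on.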
Pattern 5: Expertise-Based Decomposition
Divide by specialized knowledge areas.
Task: Comprehensive security audit
Orchestrator
├── Authentication Agent (auth flows)
├── Input Validation Agent (sanitization)
├── Secrets Agent (credential handling)
├── API Security Agent (endpoints)
└── Reporting Agent (aggregate findings)
Implementation:
kimi --mode swarm \
--decomposition expertise \
--expert-config ".security-experts.yaml" \
"Conduct security audit of payment module"
Governance Notes:
- Expert agents configured with domain-specific rules
- Findings require human security review
- Severity classification per finding
When to Use: Security audits, compliance checks, specialized reviews
Advanced Patterns
Pattern 6: Hierarchical Swarm
Nested swarms for very large tasks.
Orchestrator (Level 1)
├── Service A Lead
│ ├── A-Frontend Agent
│ ├── A-Backend Agent
│ └── A-Test Agent
├── Service B Lead
│ ├── B-Frontend Agent
│ ├── B-Backend Agent
│ └── B-Test Agent
└── Integration Lead
└── Cross-Service Test Agent
Governance Notes:
- Each lead manages their own sub-swarm
- Clear escalation paths between levels
- Aggregated reporting at each level
When to Use: Enterprise-scale migrations (100+ services)
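Aggregated reporting at each level amounts to a recursive roll-up over the swarm tree. The sketch below assumes each node tracks its own token usage and failure flag; `SwarmNode` and `rollup` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SwarmNode:
    """A lead or leaf agent in the hierarchy (hypothetical structure)."""
    name: str
    tokens_used: int = 0
    failed: bool = False
    children: list["SwarmNode"] = field(default_factory=list)


def rollup(node: SwarmNode) -> dict:
    """Summarize this node and everything beneath it."""
    tokens = node.tokens_used
    failures = [node.name] if node.failed else []
    for child in node.children:
        child_report = rollup(child)
        tokens += child_report["tokens"]
        failures += child_report["failures"]
    return {"name": node.name, "tokens": tokens, "failures": failures}
```

Each lead can call `rollup` on its own subtree, and the level-1 orchestrator calls it on the root, so the same report shape is available at every level.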
Pattern 7: Competitive Swarm
Multiple agents solve the same problem, and the best result is selected.
Orchestrator
├── Algorithm A Agent (recursive approach)
├── Algorithm B Agent (iterative approach)
├── Algorithm C Agent (functional approach)
└── Evaluation Agent (benchmarks, selects winner)
Governance Notes:
- Objective evaluation criteria defined upfront
- Human review of selected solution
- Alternative approaches documented
When to Use: Algorithm optimization, architectural decisions, complex problem-solving
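Objective evaluation criteria can be as simple as "correct on all test cases, then fastest". The sketch below applies that rule to three toy factorial implementations standing in for the competing agents' outputs; `select_winner` and the candidates are illustrative assumptions, and a real evaluation would use criteria agreed before the swarm starts.

```python
import timeit
from functools import reduce


def fact_recursive(n):
    return 1 if n <= 1 else n * fact_recursive(n - 1)


def fact_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def fact_functional(n):
    return reduce(lambda a, b: a * b, range(1, n + 1), 1)


def select_winner(candidates, cases, repeats=200):
    """Disqualify incorrect candidates, then pick the fastest on the last case."""
    timings = {}
    for name, fn in candidates.items():
        if any(fn(arg) != expected for arg, expected in cases):
            continue  # correctness is a hard requirement
        timings[name] = timeit.timeit(lambda: fn(cases[-1][0]), number=repeats)
    if not timings:
        raise ValueError("no candidate passed the correctness cases")
    return min(timings, key=timings.get), timings


CANDIDATES = {
    "recursive": fact_recursive,
    "iterative": fact_iterative,
    "functional": fact_functional,
}
CASES = [(0, 1), (1, 1), (5, 120), (20, 2432902008176640000)]
```

The returned timings double as documentation of the alternative approaches; the selected winner still goes to human review.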
Pattern 8: Verification Swarm
Separate agents implement and verify.
Orchestrator
├── Implementation Agent (generates code)
├── Review Agent 1 (checks correctness)
├── Review Agent 2 (checks performance)
├── Review Agent 3 (checks security)
└── Consolidation Agent (addresses findings)
Governance Notes:
- Review agents use different criteria
- Conflicting findings escalated to human
- Final approval still requires human
When to Use: Critical path code, high-stakes implementations
Implementation Guidelines
Task Decomposition Best Practices
- Define Clear Boundaries
  Bad: "Handle authentication"
  Good: "Implement JWT token generation in auth.service.ts"
- Minimize Cross-Dependencies
  - Aim for <10% of sub-tasks having dependencies
  - Document all dependencies explicitly
  - Consider dependency order in agent scheduling
- Size Appropriately
  | Total Work | Sub-Tasks | Agents |
  |---|---|---|
  | Small (1-10 files) | 2-3 | 2-3 |
  | Medium (10-50 files) | 5-10 | 5-10 |
  | Large (50-200 files) | 10-20 | 10-20 |
  | Enterprise (200+ files) | 20-50 | 20-50 |
- Include Validation Criteria
  Each sub-task must have measurable completion criteria:
  - Tests pass
  - Linting clean
  - Type checking passes
  - Human review checkpoint (for production)
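The cross-dependency guideline above (fewer than 10% of sub-tasks having dependencies) can be checked against a decomposition plan before launch. A minimal sketch, assuming the plan is a mapping from each sub-task to the set of sub-tasks it depends on; `dependency_ratio` is a hypothetical helper.

```python
def dependency_ratio(plan: dict[str, set[str]]) -> float:
    """Fraction of sub-tasks that depend on at least one other sub-task."""
    if not plan:
        return 0.0
    dependent = sum(1 for deps in plan.values() if deps)
    return dependent / len(plan)


# Hypothetical plan: 11 independent file migrations plus one final test run.
example_plan = {f"migrate-file-{i}": set() for i in range(11)}
example_plan["run-tests"] = set(example_plan)
```

Here 1 of 12 sub-tasks has dependencies, which is about 8% and within the guideline.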
Error Handling Strategies
Strategy 1: Fail Fast
kimi --mode swarm \
--fail-fast \
--notify-on-failure \
"Critical production fix"
Any sub-agent failure stops the entire swarm. Use for: Critical changes where partial completion is dangerous.
Strategy 2: Continue with Logging
kimi --mode swarm \
--continue-on-failure \
--failure-log "/var/log/swarm-failures.log" \
"Batch documentation updates"
Failed agents are logged; the others continue. Use for: Non-critical batch work where partial completion is acceptable.
Strategy 3: Retry with Backoff
kimi --mode swarm \
--retry 3 \
--retry-delay 5s \
--retry-backoff exponential \
"API integration updates"
Automatic retry for transient failures. Use for: External dependency work, network-dependent tasks.
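For teams implementing retries themselves, exponential backoff means doubling the wait after each failed attempt. The sketch below shows one common formulation (delay = base delay × 2^attempt); whether the CLI flags above compute delays exactly this way is an assumption, and `retry_with_backoff` is a hypothetical helper.

```python
import time


def retry_with_backoff(operation, retries=3, base_delay=5.0,
                       transient=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call operation(); on a transient error wait and retry, doubling the delay.

    With retries=3 and base_delay=5.0 the waits are 5s, 10s, and 20s.
    Non-transient exceptions propagate immediately.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except transient:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the function can be exercised without real waiting, for example by passing `delays.append` to record the computed delays.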
Strategy 4: Human Escalation
kimi --mode swarm \
--escalate-on-failure \
--escalation-contact "oncall@company.com" \
--escalation-timeout 300 \
"Complex database migration"
Human intervention required for failures. Use for: High-risk changes requiring expert judgment.
Cost Management
Swarm execution can consume significant token budgets:
# Estimate before execution
kimi --mode swarm --estimate-only "Large refactoring task"
# Output: Estimated 500K input tokens, 200K output tokens, ~$4.00
# Set hard budget
kimi --mode swarm \
--token-budget-input 1000000 \
--token-budget-output 500000 \
--action-on-budget-exceed notify-and-pause \
"Feature implementation"
Cost Optimization Tips:
- Use smaller models for sub-agents when possible
- Cache common context across agents
- Set checkpoints to enable early termination if quality degrades
- Review decomposition plan to minimize coordination overhead
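In a custom orchestrator, a hard budget comes down to accumulating usage after every agent call and pausing once either limit is crossed. This is a minimal sketch under that assumption; `TokenBudget` is a hypothetical class, and the limits mirror the example values above.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks cumulative token usage against separate input and output limits."""
    input_limit: int
    output_limit: int
    input_used: int = 0
    output_used: int = 0

    def within_budget(self) -> bool:
        return (self.input_used <= self.input_limit
                and self.output_used <= self.output_limit)

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one call's usage; return False when the swarm should pause."""
        self.input_used += input_tokens
        self.output_used += output_tokens
        return self.within_budget()
```

An orchestrator would call `record` after each agent response and, on `False`, notify the owner and stop scheduling new work.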
Monitoring and Observability
# Real-time dashboard
kimi --mode swarm \
--dashboard-port 8080 \
--metrics-prometheus \
"Production deployment"
# Structured logging
kimi --mode swarm \
--log-format json \
--log-destination /var/log/swarm/$(date +%Y%m%d-%H%M%S).json \
"Audit-required change"
Key Metrics to Track:
- Execution time per agent
- Token usage per agent and total
- Success/failure rates
- Conflict count
- Human intervention frequency
- Time to completion vs. estimate
Anti-Patterns and Mitigations
Anti-Pattern 1: The Uncoordinated Stampede
Problem: Multiple agents edit the same files simultaneously.
Symptoms:
- Merge conflicts
- Inconsistent changes
- Lost work
Mitigation:
- Clear file ownership per agent
- Pre-execution file locking
- Conflict detection in orchestrator
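File ownership can be verified before execution by checking that no path is assigned to more than one agent. A minimal sketch, assuming the orchestrator holds a mapping from agent name to the files it may edit; `find_ownership_conflicts` and the example paths are hypothetical.

```python
from collections import defaultdict


def find_ownership_conflicts(assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every file claimed by more than one agent, with the claiming agents."""
    owners = defaultdict(list)
    for agent, files in assignments.items():
        for path in files:
            owners[path].append(agent)
    return {path: agents for path, agents in owners.items() if len(agents) > 1}


# Hypothetical assignment with one contested file.
assignments = {
    "login-ui": ["src/Login.tsx", "src/auth/types.ts"],
    "token-service": ["src/auth/token.ts", "src/auth/types.ts"],
    "middleware": ["src/auth/middleware.ts"],
}
```

An empty result means the plan is safe to launch; any entry should be resolved by reassigning the file or serializing the two agents.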
Anti-Pattern 2: The Cascade Failure
Problem: One failed agent causes the entire swarm to collapse.
Symptoms:
- All agents stop on first failure
- Partial work lost
- No recovery mechanism
Mitigation:
- --continue-on-failure for non-critical tasks
- Dependency isolation
- Checkpoint recovery
Anti-Pattern 3: The Silent Conflict
Problem: Agents produce contradictory outputs that auto-merge.
Symptoms:
- Inconsistent code style
- Conflicting implementations
- Test failures post-merge
Mitigation:
- Conflict detection rules
- Human review for divergent outputs
- Standardized patterns enforced
Anti-Pattern 4: The Runaway Swarm
Problem: Excessive token usage or indefinite execution.
Symptoms:
- Budget exceeded
- Agents stuck in loops
- No progress visibility
Mitigation:
- Token budgets with enforcement
- Timeouts per agent and total
- Progress checkpoints
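Per-agent and total timeouts can be layered with `asyncio.wait_for`: each agent gets its own deadline, and the whole batch gets an outer one. The sketch below assumes agents are exposed as coroutines; `run_with_timeouts` is a hypothetical helper.

```python
import asyncio


async def run_with_timeouts(agents, per_agent_timeout, total_timeout):
    """Run agent coroutines concurrently with per-agent and overall deadlines.

    agents: dict mapping agent name -> coroutine object.
    An agent that exceeds its own deadline is cancelled and reported as "timeout".
    If the whole batch exceeds total_timeout, a timeout error is raised.
    """
    async def guarded(name, coro):
        try:
            return name, await asyncio.wait_for(coro, per_agent_timeout)
        except asyncio.TimeoutError:
            return name, "timeout"

    batch = asyncio.gather(*(guarded(name, coro) for name, coro in agents.items()))
    return dict(await asyncio.wait_for(batch, total_timeout))
```

A stuck agent then costs at most `per_agent_timeout` seconds, and the other agents' results are still returned.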
Anti-Pattern 5: The Over-Decomposition
Problem: Too many sub-tasks create coordination overhead.
Symptoms:
- More time coordinating than working
- Excessive inter-agent messaging
- Diminishing returns
Mitigation:
- Target 5-20 sub-tasks per swarm
- Batch small related tasks
- Measure coordination overhead
Tool-Specific Implementation
Kimi Code Agent Swarm
Configuration File (.kimi/swarm-config.yaml):
swarm:
  max_agents: 20
  decomposition: domain
  checkpoint_interval: 5
  token_budget:
    input: 1000000
    output: 500000
  error_handling:
    strategy: continue_with_logging
    max_retries: 3
  logging:
    level: debug
    destination: ./logs/swarm/
  governance:
    owner: "senior-dev@company.com"
    require_decomposition_review: true
    human_approval_checkpoints: [5, 10, 15]
Execution:
kimi --mode swarm --config .kimi/swarm-config.yaml "Migrate to TypeScript"
Custom Swarm Implementation
For teams building custom orchestration:
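# Simplified swarm orchestrator pattern.
# Assumed interfaces: config has max_agents and token_budget; llm has
# decompose(task); each agent has run(), state, and tokens_used.
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


class SwarmOrchestrator:
    def __init__(self, config, llm, agent_factory):
        self.config = config
        self.llm = llm
        self.agent_factory = agent_factory  # e.g. a SubAgent class
        self.max_agents = config.max_agents
        self.token_budget = config.token_budget
        self.agents = []
        self.checkpoints = []

    def decompose(self, task):
        # Use AI to decompose the task into a list of sub-tasks
        return self.llm.decompose(task)

    def spawn_agents(self, subtasks):
        # Create one agent per sub-task, up to max_agents, and return the
        # sub-tasks left unscheduled so none are silently dropped
        capacity = max(self.max_agents - len(self.agents), 0)
        for subtask in subtasks[:capacity]:
            self.agents.append(self.agent_factory(subtask, self.config))
        return subtasks[capacity:]

    def coordinate(self):
        # Run all agents in parallel; dependency resolution is omitted here
        # and must happen before dependent agents are scheduled
        with ThreadPoolExecutor(max_workers=max(len(self.agents), 1)) as pool:
            results = list(pool.map(lambda agent: agent.run(), self.agents))
        return self.aggregate(results)

    def aggregate(self, results):
        # Placeholder: return per-agent results in agent order
        return results

    def token_usage(self):
        return sum(agent.tokens_used for agent in self.agents)

    def checkpoint(self):
        # Save recoverable state
        self.checkpoints.append({
            'agents': [a.state for a in self.agents],
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'tokens_used': self.token_usage(),
        })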
Related Resources
- PRD-STD-019: Agent Swarm Coordination — Governance requirements
- Kimi Code Guide — Tool-specific capabilities
- Agentic Software Engineering — Theoretical foundations