Vision-to-Code Workflows

Vision-to-code capabilities generate implementation code directly from visual specifications. This guide covers workflows for mockup-to-code, screenshot-to-code, video-to-code, and design-system-to-code transformations while maintaining quality, accessibility, and IP compliance.

Prerequisites

Before implementing vision-to-code workflows, review PRD-STD-018: Multi-Modal AI Governance for mandatory requirements.

Capabilities Overview

What Vision-to-Code Can Do

| Input Type | Output | Tools |
|---|---|---|
| UI Mockups | React/Vue/Angular components | Kimi, Gemini |
| Screenshots | HTML/CSS reproduction | Kimi, Gemini, Claude |
| Wireframes | Structured layout code | Kimi, Gemini |
| Video Walkthroughs | Full page/application | Kimi |
| Design Systems | Component library | Kimi, Gemini |
| Hand Sketches | Digital implementation | Kimi |

Tool Capabilities Comparison

| Capability | Kimi K2.5 | Gemini 2.5 | Claude 4.5 |
|---|---|---|---|
| Native multimodal | Yes | Yes | Limited |
| Video understanding | Yes | Yes | No |
| Autonomous visual debugging | Yes | No | No |
| Code from wireframes | Yes | Yes | Limited |
| Responsive generation | Yes | Yes | Yes |
| Animation extraction | Yes | Limited | No |

Core Workflows

Workflow 1: Mockup to Component

Use Case: Convert Figma/Sketch designs to React components

Step 1: Input Preparation

✓ Export mockup as PNG (high resolution, 2x if possible)
✓ Export individual assets (icons, images) separately
✓ Note typography specifications
✓ Document color palette (hex codes)
✓ Verify design system tokens if applicable

Step 2: Prompt Engineering

## Context
Design System: Material-UI v5
Tech Stack: React, TypeScript, Tailwind CSS
Target: Reusable component

## Visual Input
[Attach mockup.png]

## Requirements
- Implement as React functional component
- Use TypeScript with proper interfaces
- Match visual design precisely (pixel-perfect)
- Implement responsive breakpoints: sm, md, lg
- Ensure WCAG 2.1 AA accessibility
- Add Storybook stories
- Include unit tests with React Testing Library

## Constraints
- Use design system components where available
- Don't include hardcoded colors (use theme)
- Optimize images for web
- Add proper ARIA labels

## Output Format
1. Component file (tsx)
2. Styles (tailwind classes)
3. Types definition
4. Storybook story
5. Test file

Step 3: Generation

kimi vision --input mockup.png --prompt workflow.md

Step 4: Verification Checklist

| Check | Method | Pass Criteria |
|---|---|---|
| Visual fidelity | Pixel-by-pixel comparison | <2px deviation |
| Responsive | Resize browser | Breakpoints correct |
| Accessibility | axe DevTools | 0 violations |
| Typography | Inspect element | Font, size, weight match |
| Colors | Color picker | Hex values match design |
| Spacing | Measure tool | Margin/padding match |
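
The spacing and typography rows of this checklist can be partly automated. The following is a minimal sketch under stated assumptions, not part of any tool's API: it assumes pixel measurements have already been collected from the design and from the rendered component (for example with browser dev tools), and the names `Measurements`, `Deviation`, and `findDeviations` are hypothetical. It reports every property that misses the "<2px deviation" criterion.

```typescript
// Hypothetical helper: compares measured pixel values against the design spec.
type Measurements = Record<string, number>;

interface Deviation {
  property: string;
  expected: number;
  actual: number | null; // null when the rendered component lacks the property
}

// Returns every property whose deviation is not under the tolerance (default 2px).
function findDeviations(
  design: Measurements,
  rendered: Measurements,
  tolerancePx: number = 2
): Deviation[] {
  const deviations: Deviation[] = [];
  for (const property of Object.keys(design)) {
    const expected = design[property];
    const actual = rendered[property];
    if (actual === undefined) {
      deviations.push({ property, expected, actual: null });
    } else if (Math.abs(actual - expected) >= tolerancePx) {
      deviations.push({ property, expected, actual });
    }
  }
  return deviations;
}

// Illustrative values only: paddingLeft is 4px off, fontSize is 1px off.
const issues = findDeviations(
  { paddingLeft: 16, fontSize: 14, marginTop: 24 },
  { paddingLeft: 12, fontSize: 15, marginTop: 24 }
);
console.log(issues); // reports paddingLeft only; fontSize stays within tolerance
```

A report like this could feed the refinement prompt in Step 5, since each entry names the property, the expected value, and the measured value.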

Step 5: Refinement

# If visual diff shows issues
kimi vision --input mockup.png --input current-implementation.png \
--prompt "Fix: button padding should be 16px not 12px"

Workflow 2: Screenshot to Code

Use Case: Reproduce existing UI (competitor analysis, legacy system migration)

⚠️ IP Warning: Only use for systems you own or have explicit permission to reproduce.

Step 1: Input Preparation

✓ Screenshot target UI (no PII or sensitive data)
✓ Capture at multiple viewport sizes
✓ Note interactive states (hover, active)
✓ Document color scheme
✓ Identify fonts used

Step 2: Generation with IP Safety

## Task
Create a dashboard layout with the following visual structure:
[Attach screenshot for structural reference only]

## IP Compliance
- Do NOT copy specific icons - use generic equivalents
- Do NOT reproduce proprietary graphics
- Do NOT use exact color values from screenshot
- Use original content only
- Structure reference only, not visual copying

## Output
Clean implementation using:
- Heroicons for icons
- Standard Tailwind color palette
- System fonts or licensed fonts only

Step 3: Legal Review Checkpoint

  • No copyrighted images reproduced
  • No trademarked logos included
  • Color scheme sufficiently distinct
  • Typography uses licensed fonts
  • Layout structure is generic pattern

Workflow 3: Video to Application

Use Case: Reconstruct full application from video walkthrough

Step 1: Video Preparation

✓ Extract keyframes at state changes
✓ Document navigation flows
✓ Note animation timings
✓ Identify data flows
✓ Timestamp important interactions

Step 2: Staged Generation

# Stage 1: Sample one frame per second as candidate keyframes
mkdir -p keyframes
ffmpeg -i walkthrough.mp4 -vf "fps=1,scale=1920:-1" keyframes/%04d.png

# Stage 2: Generate page structure from keyframes
kimi vision --input keyframes/ --mode thinking \
  "Identify all pages and navigation structure"

# Stage 3: Implement each page
for frame in keyframes/*.png; do
  kimi vision --input "$frame" \
    --prompt "Implement this page in Next.js"
done

# Stage 4: Connect navigation
kimi --mode agent \
"Wire up all pages with proper routing"

Step 3: Animation Reconstruction

## Animation Specifications
From video analysis:
- Page transition: 300ms ease-in-out
- Button hover: scale(1.05), 150ms
- Modal open: fade + slide up, 250ms
- Loading skeleton: pulse animation, 1.5s loop

## Implementation
Use Framer Motion for React animations matching these specifications.
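
The timings above can be kept as data and converted into transition objects. The sketch below assumes Framer Motion's convention of durations in seconds and camelCase easing names; it builds plain objects and does not import the library. The names `AnimationSpec`, `MotionTransition`, and `toMotionTransition` are hypothetical, and only the page transition has an easing because the specification lists none for the others.

```typescript
type Easing = "ease-in-out" | "ease-in" | "ease-out" | "linear";

interface AnimationSpec {
  name: string;
  durationMs: number;
  easing?: Easing;
  repeat?: boolean; // true for looping animations such as the skeleton pulse
}

interface MotionTransition {
  duration: number; // seconds
  ease?: string;
  repeat?: number;
}

// CSS easing keywords mapped to the camelCase names used by Framer Motion.
const easingNames: Record<Easing, string> = {
  "ease-in-out": "easeInOut",
  "ease-in": "easeIn",
  "ease-out": "easeOut",
  linear: "linear",
};

function toMotionTransition(spec: AnimationSpec): MotionTransition {
  const transition: MotionTransition = { duration: spec.durationMs / 1000 };
  if (spec.easing) transition.ease = easingNames[spec.easing];
  if (spec.repeat) transition.repeat = Infinity;
  return transition;
}

// Values taken from the animation specifications listed above.
const specs: AnimationSpec[] = [
  { name: "pageTransition", durationMs: 300, easing: "ease-in-out" },
  { name: "buttonHover", durationMs: 150 },
  { name: "modalOpen", durationMs: 250 },
  { name: "skeletonPulse", durationMs: 1500, repeat: true },
];

for (const spec of specs) {
  console.log(spec.name, toMotionTransition(spec));
}
```

Keeping the specification as data makes it straightforward to review the extracted timings against the video before they reach component code.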

Workflow 4: Design System Generation

Use Case: Generate component library from design system documentation

Step 1: Design System Documentation

✓ Component gallery image
✓ Token specifications (colors, typography, spacing)
✓ Usage examples
✓ Do/don't guidelines

Step 2: Token Extraction

kimi vision --input design-system.png \
--prompt "Extract all design tokens: colors, typography, spacing, shadows"
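
Once tokens are extracted, they need a typed home in the codebase. The following sketch shows one possible shape and a conversion to CSS custom properties. The `DesignTokens` interface, the `toCssVariables` function, and the sample values are assumptions for illustration; the actual output format of the extraction step is not specified here.

```typescript
// Hypothetical token shape; adjust it to whatever the extraction step returns.
interface DesignTokens {
  colors: Record<string, string>;    // e.g. { primary: "#3B82F6" }
  spacing: Record<string, number>;   // pixel values
  fontSizes: Record<string, number>; // pixel values
  shadows: Record<string, string>;   // raw CSS box-shadow values
}

// Builds one "--prefix-name: value;" declaration per token in a group.
function declarations(
  prefix: string,
  group: Record<string, string | number>,
  unit: string
): string[] {
  return Object.keys(group).map(
    (name) => `  --${prefix}-${name}: ${group[name]}${unit};`
  );
}

// Flattens the tokens into a :root block of CSS custom properties.
function toCssVariables(tokens: DesignTokens): string {
  const lines = [
    ...declarations("color", tokens.colors, ""),
    ...declarations("spacing", tokens.spacing, "px"),
    ...declarations("font-size", tokens.fontSizes, "px"),
    ...declarations("shadow", tokens.shadows, ""),
  ];
  return `:root {\n${lines.join("\n")}\n}`;
}

// Sample values for illustration only.
const sampleTokens: DesignTokens = {
  colors: { primary: "#3B82F6" },
  spacing: { sm: 8, md: 16 },
  fontSizes: { body: 16 },
  shadows: { sm: "0 1px 2px rgba(0, 0, 0, 0.05)" },
};

console.log(toCssVariables(sampleTokens));
```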

Step 3: Component Generation

# Generate base components
kimi vision --input button-examples.png \
--prompt "Generate Button component with all variants"

kimi vision --input input-examples.png \
--prompt "Generate Input component with all states"

# Continue for all components...

Step 4: Documentation Generation

kimi --mode agent \
--input components/ \
--prompt "Generate Storybook documentation for all components"

Quality Assurance

Visual Diff Testing

// Using Playwright for visual regression
import { test, expect } from '@playwright/test';

test('component matches design', async ({ page }) => {
  await page.goto('/component');
  await expect(page).toHaveScreenshot('component.png', {
    maxDiffPixelRatio: 0.001, // allow up to 0.1% of pixels to differ
  });
});
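
To make a pixel-difference budget such as 0.1% concrete, the sketch below computes the share of pixels that differ between two same-sized RGBA buffers. It illustrates the metric only and is not Playwright's comparison algorithm; `diffPixelRatio` and `channelTolerance` are hypothetical names.

```typescript
// Returns the fraction of pixels (0 to 1) that differ between two RGBA buffers.
function diffPixelRatio(
  expected: Uint8Array,
  actual: Uint8Array,
  channelTolerance: number = 0
): number {
  if (expected.length !== actual.length || expected.length % 4 !== 0) {
    throw new Error("Buffers must hold RGBA data of identical dimensions");
  }
  const pixelCount = expected.length / 4;
  if (pixelCount === 0) return 0;

  let differing = 0;
  for (let offset = 0; offset < expected.length; offset += 4) {
    for (let channel = 0; channel < 4; channel++) {
      const delta = Math.abs(expected[offset + channel] - actual[offset + channel]);
      if (delta > channelTolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing / pixelCount;
}

// Two 2x2 images (4 pixels, 16 bytes); only the last pixel differs.
const baselineImage = new Uint8Array(16);
const candidateImage = new Uint8Array(16);
candidateImage[12] = 200; // red channel of the fourth pixel

const ratio = diffPixelRatio(baselineImage, candidateImage);
console.log(ratio);          // 0.25
console.log(ratio <= 0.001); // false: one pixel in four exceeds a 0.1% budget
```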

Accessibility Audit

# Automated accessibility check (axe CLI from the @axe-core/cli package)
npx @axe-core/cli http://localhost:3000 --save axe-results.json

# Manual checklist
✓ Color contrast ratio ≥ 4.5:1 for normal text (≥ 3:1 for large text)
✓ Focus indicators visible
✓ ARIA labels present
✓ Keyboard navigation works
✓ Screen reader compatible
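
The contrast item in the manual checklist can be checked in code. The sketch below implements the WCAG 2.x relative luminance and contrast ratio formulas for 6-digit hex colors; the function names are hypothetical, and it covers opaque colors only (no alpha blending).

```typescript
// Parses "#RRGGBB" (or "RRGGBB") into 8-bit red, green, and blue channels.
function parseHexColor(hex: string): [number, number, number] {
  const digits = hex.trim().replace(/^#/, "");
  if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const value = parseInt(digits, 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Converts an 8-bit sRGB channel to its linear-light value (WCAG 2.x definition).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHexColor(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio ranges from 1 (identical colors) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(foreground);
  const l2 = relativeLuminance(background);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// WCAG AA requires 4.5:1 for normal text and 3:1 for large text.
function meetsWcagAA(
  foreground: string,
  background: string,
  largeText: boolean = false
): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#000000", "#FFFFFF").toFixed(2)); // "21.00"
console.log(meetsWcagAA("#777777", "#FFFFFF"));              // false (about 4.48:1)
console.log(meetsWcagAA("#767676", "#FFFFFF"));              // true (about 4.54:1)
```

Automated checks like this cover contrast only; focus order, keyboard navigation, and screen reader behavior still need the manual steps above.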

Responsive Verification

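# Test at multiple viewports
# `playwright test` has no viewport flag; this assumes playwright.config.ts
# reads VIEWPORT_WIDTH and sets `use.viewport` from it
for width in 375 768 1024 1440 1920; do
  VIEWPORT_WIDTH="$width" npx playwright test
done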

Common Pitfalls and Solutions

Pitfall 1: Hardcoded Values

Problem: Generated code uses hardcoded colors/sizes instead of design tokens.

Solution:

## Constraint (add to prompt)
- Use design system tokens ONLY
- Reference theme.colors.primary not #3B82F6
- Use spacing scale: 4, 8, 12, 16, 24, 32, 48
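
A prompt constraint does not guarantee compliance, so it can help to scan the generated output as well. The sketch below is a heuristic under stated assumptions: `findHardcodedColors` is a hypothetical helper that flags hex color literals line by line, and it may also flag hex-looking strings that are not colors.

```typescript
interface HardcodedColor {
  line: number;  // 1-based line number in the scanned source
  value: string; // the matched hex literal
}

// Heuristic: matches #RGB, #RGBA, #RRGGBB, and #RRGGBBAA literals.
const HEX_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;

function findHardcodedColors(source: string): HardcodedColor[] {
  const findings: HardcodedColor[] = [];
  source.split("\n").forEach((text, index) => {
    const matches = text.match(HEX_COLOR) || [];
    for (const value of matches) {
      findings.push({ line: index + 1, value });
    }
  });
  return findings;
}

// Illustrative generated snippet: one hardcoded color, one theme reference.
const generatedSource = [
  "const style = {",
  "  background: '#3B82F6',",
  "  color: theme.colors.onPrimary,",
  "};",
].join("\n");

console.log(findHardcodedColors(generatedSource));
// one finding: line 2, value "#3B82F6"
```

A check like this could run in CI so that hardcoded colors are caught before code review rather than during it.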

Pitfall 2: Missing Responsive Behavior

Problem: Component works at one size only.

Solution:

## Responsive Requirements
- Mobile (<640px): Stack layout, full-width buttons
- Tablet (640-1024px): Side-by-side layout
- Desktop (>1024px): Full layout with max-width container
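
Encoding these ranges once keeps the prompt, the tests, and the component logic in agreement. The sketch below maps a viewport width to one of the three layouts exactly as the ranges above state them, so 1024px still counts as tablet; `Layout` and `layoutForWidth` are hypothetical names.

```typescript
type Layout = "mobile" | "tablet" | "desktop";

// Mirrors the prompt: <640px mobile, 640-1024px tablet, >1024px desktop.
function layoutForWidth(widthPx: number): Layout {
  if (widthPx < 640) return "mobile";
  if (widthPx <= 1024) return "tablet";
  return "desktop";
}

// The viewport widths used in the Responsive Verification loop.
const viewportWidths = [375, 768, 1024, 1440, 1920];
const expectedLayouts = viewportWidths.map((width) => layoutForWidth(width));

console.log(expectedLayouts);
// ["mobile", "tablet", "tablet", "desktop", "desktop"]
```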

Pitfall 3: Accessibility Oversight

Problem: Missing ARIA labels, poor contrast, no keyboard support.

Solution:

## Accessibility Requirements
- All interactive elements keyboard accessible
- ARIA labels for icon-only buttons
- Color contrast WCAG AA compliant
- Focus management for modals
- Screen reader announcements for state changes

Pitfall 4: Asset Management

Problem: Generated code references missing images.

Solution:

## Asset Handling
- Use placeholder images from placehold.co
- Mark image sources with TODO comments
- Provide image dimensions for layout stability
- Use Next.js Image component with proper sizing
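
The first three asset-handling rules can be combined in one small helper. This is a sketch under stated assumptions: `placeholderImage` and `PlaceholderImage` are hypothetical names, and the URL follows placehold.co's `width x height` path format. The returned object carries explicit dimensions so a framework image component can reserve layout space.

```typescript
interface PlaceholderImage {
  src: string;
  width: number;
  height: number;
  alt: string;
  todo: string; // reminder to swap in the real, licensed asset
}

function isPositiveInteger(value: number): boolean {
  return value > 0 && Math.floor(value) === value;
}

// Builds a placeholder descriptor with explicit dimensions for layout stability.
function placeholderImage(width: number, height: number, alt: string): PlaceholderImage {
  if (!isPositiveInteger(width) || !isPositiveInteger(height)) {
    throw new Error("width and height must be positive integers");
  }
  return {
    src: `https://placehold.co/${width}x${height}`,
    width,
    height,
    alt,
    todo: `TODO: replace placeholder for "${alt}" with the final asset`,
  };
}

const hero = placeholderImage(1200, 600, "Hero banner");
console.log(hero.src);  // https://placehold.co/1200x600
console.log(hero.todo); // TODO: replace placeholder for "Hero banner" with the final asset
```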

Pre-Generation Checklist

  • Do you own the visual design or hold a license to use it?
  • Are fonts properly licensed?
  • Are images stock or original?
  • Does reproduction stay within fair use?
  • Is this for competitive analysis (verify legality)?

Post-Generation Checklist

  • No copyrighted images in output
  • No trademarked logos reproduced
  • Color scheme distinct enough
  • Typography uses licensed fonts only
  • Documentation of visual source

Documentation Template

## Visual Source Documentation
- **Source:** [Figma file / Screenshot / Video]
- **License:** [Owned / Licensed / Public domain]
- **Assets:** [List of extracted assets and licenses]
- **Attribution:** [If required]
- **Generated:** [Date, Tool version]
- **Reviewed by:** [Name, Role]
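
Filling in the template from a typed record makes missing fields a compile-time error rather than a review finding. The sketch below is one possible approach; `VisualSourceRecord` and `renderVisualSourceDoc` are hypothetical names, and the sample values are placeholders.

```typescript
interface VisualSourceRecord {
  source: "Figma file" | "Screenshot" | "Video";
  license: "Owned" | "Licensed" | "Public domain";
  assets: string[];     // each entry names an asset and its license
  attribution?: string; // only when the license requires it
  generatedOn: string;  // date of generation
  toolVersion: string;
  reviewerName: string;
  reviewerRole: string;
}

// Renders the record in the same layout as the documentation template.
function renderVisualSourceDoc(record: VisualSourceRecord): string {
  const assets = record.assets.length > 0 ? record.assets.join("; ") : "None";
  const lines = [
    "## Visual Source Documentation",
    `- **Source:** ${record.source}`,
    `- **License:** ${record.license}`,
    `- **Assets:** ${assets}`,
  ];
  if (record.attribution) {
    lines.push(`- **Attribution:** ${record.attribution}`);
  }
  lines.push(`- **Generated:** ${record.generatedOn}, ${record.toolVersion}`);
  lines.push(`- **Reviewed by:** ${record.reviewerName}, ${record.reviewerRole}`);
  return lines.join("\n");
}

// Placeholder values for illustration only.
const sampleRecord: VisualSourceRecord = {
  source: "Figma file",
  license: "Owned",
  assets: ["logo.svg (owned)", "hero.png (licensed stock)"],
  generatedOn: "2025-01-15",
  toolVersion: "Kimi K2.5",
  reviewerName: "A. Reviewer",
  reviewerRole: "Design lead",
};

console.log(renderVisualSourceDoc(sampleRecord));
```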

Tool-Specific Tips

Kimi K2.5

Strengths: Native multimodal, autonomous visual debugging

Best Practices:

# Use Thinking mode for complex designs
kimi --mode thinking vision --input mockup.png

# Enable visual debugging for refinement
kimi --mode agent --visual-debug \
--input mockup.png --input draft-implementation.png \
"Fix visual discrepancies"

# Use Agent Swarm for design system
kimi --mode swarm \
--input component-gallery.png \
"Generate all components in parallel"

Gemini 2.5

Strengths: 1M-token context window, Google ecosystem integration

Best Practices:

# Include entire design system in context
gemini vision --input design-system.pdf \
--context tokens 1000000 \
"Generate components following this system"

Claude (Limited Vision)

Strengths: Reasoning about visual content

Best Practices:

# Use for analysis rather than generation
claude vision --input mockup.png \
"Describe the layout structure, color palette, and components"

# Then use text-based generation
claude "Implement the described component"

Cost Optimization

Resolution Strategy

| Use Case | Resolution | Reason |
|---|---|---|
| Layout structure | 1024px width | Sufficient for structure, lower cost |
| Component details | 1920px width | Need fine details for pixel-matching |
| Color extraction | Original | Accurate color sampling |
| Typography | 2x resolution | Sharp text for font identification |

Token Budgeting

# Estimate vision token usage
kimi vision --estimate --input mockup.png
# Output: ~5000 vision tokens

# Batch similar components
kimi vision --batch component-mockups/ \
--shared-prompt "Generate React components"

Integration with AEEF Standards

PRD-STD-001: Prompt Engineering

Vision prompts follow the CRAFT framework:

  • Context: Design system, tech stack
  • Requirements: Visual fidelity, responsive, accessible
  • Assumptions: Asset availability, license status
  • Format: Component files, stories, tests
  • Tests: Visual diff, accessibility audit
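
The five CRAFT sections map naturally onto a typed structure that renders a prompt file. The sketch below is an illustration, not a format mandated by PRD-STD-001: `CraftPrompt` and `renderCraftPrompt` are hypothetical names, and the `## Section` layout is borrowed from the prompt examples earlier in this guide. It rejects prompts with an empty section.

```typescript
interface CraftPrompt {
  context: string[];
  requirements: string[];
  assumptions: string[];
  format: string[];
  tests: string[];
}

// Renders each CRAFT section as a heading followed by a bullet list.
function renderCraftPrompt(prompt: CraftPrompt): string {
  const sections: Array<[string, string[]]> = [
    ["Context", prompt.context],
    ["Requirements", prompt.requirements],
    ["Assumptions", prompt.assumptions],
    ["Format", prompt.format],
    ["Tests", prompt.tests],
  ];
  return sections
    .map(([title, items]) => {
      if (items.length === 0) {
        throw new Error(`CRAFT section "${title}" is empty`);
      }
      return `## ${title}\n` + items.map((item) => `- ${item}`).join("\n");
    })
    .join("\n\n");
}

// Sample content drawn from the workflows in this guide.
const mockupPrompt: CraftPrompt = {
  context: ["Design system: Material-UI v5", "Tech stack: React, TypeScript, Tailwind CSS"],
  requirements: ["Match the mockup", "Responsive at sm, md, lg", "WCAG 2.1 AA"],
  assumptions: ["Icons are exported separately", "Fonts are licensed"],
  format: ["Component file (tsx)", "Storybook story", "Test file"],
  tests: ["Visual diff", "Accessibility audit"],
};

console.log(renderCraftPrompt(mockupPrompt));
```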

PRD-STD-002: Code Review

Vision-generated code requires:

  • Visual diff verification
  • Designer review for fidelity
  • Accessibility audit
  • IP clearance documentation

PRD-STD-018: Multi-Modal Governance

Mandatory compliance:

  • Visual input provenance logged
  • IP clearance verified
  • Accessibility requirements met
  • Audit trail maintained