PRD-STD-015: Multilingual AI Quality & Safety
| Standard ID | Version | Status | Compliance Level | Effective Date | Last Reviewed |
|---|---|---|---|---|---|
| PRD-STD-015 | 1.0 | Active | Level 2 (Managed) | 2026-02-22 | 2026-02-22 |
This page is the normative source of requirements for this control area. Use it to define policy, evidence expectations, and audit/compliance criteria.
For implementation and rollout support:
- Execution plan: Apply-Ready Rollout Kit
- Adoption sequencing: Production Rollout Paths
- Hands-on tutorials: Production Tutorials & Starter Guides
- Runnable repos / apply paths: Reference Implementations
Use the Compliance Level metadata on this page to sequence adoption with other PRD-STDs.
1. Purpose
This standard defines mandatory quality and safety controls for AI products that operate across multiple languages, dialects, or scripts. AI models exhibit significant performance variance across languages — safety filters calibrated for English often fail for Arabic, code-switching inputs produce unpredictable outputs, and bias manifests differently across linguistic and cultural contexts.
Without explicit multilingual controls, organizations risk deploying AI features that are safe in one language but harmful, inaccurate, or unusable in others.
2. Scope
This standard applies to:
- Any AI product feature that supports more than one language, processes multilingual input, or serves users across linguistic communities
- Conversational AI, content generation, classification, moderation, search, and recommendation features operating in multilingual contexts
- Single-language products serving dialect-diverse populations (e.g., Arabic dialects: MSA, Egyptian, Gulf, Levantine, Maghrebi)
This standard does not replace PRD-STD-010 or PRD-STD-001. It adds language-specific controls required for multilingual AI product operation.
3. Definitions
| Term | Definition |
|---|---|
| Supported Language | A language for which the AI product claims functional coverage, including quality, safety, and performance guarantees |
| Language Coverage Matrix | A documented mapping of supported languages to evaluated quality metrics, safety test results, and known limitations per language |
| Code-Switching | The practice of alternating between two or more languages within a single conversation, sentence, or input — common in multilingual user populations |
| Dialect Variant | A regional or social variation of a language with distinct vocabulary, grammar, or pragmatic norms that may affect AI model performance |
| Cross-Language Parity | The degree to which AI product quality, safety, and fairness metrics are consistent across supported languages |
| Multilingual Safety Evaluation | Structured testing of harmful output, policy violations, and abuse patterns across all supported languages |
| Script Normalization | The process of standardizing text encoding, directionality (LTR/RTL), and character representations to ensure consistent AI processing |
4. Requirements
4.1 Multilingual Evaluation Standards
REQ-015-01: Every AI product MUST maintain a Language Coverage Matrix documenting all supported languages with evaluated quality benchmarks, safety test status, and known limitations.
REQ-015-02: Quality evaluation MUST be performed independently for each supported language. Aggregate cross-language metrics MUST NOT be used as the sole indicator of per-language quality.
REQ-015-03: Minimum evaluation coverage MUST include task accuracy, response relevance, fluency, and factual consistency per supported language.
REQ-015-04: Organizations SHOULD maintain language-specific evaluation datasets curated with native-speaker review, refreshed at least annually.
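The per-language gating in REQ-015-02 can be sketched in a few lines. This is an illustrative check, not a mandated implementation; the metric names and the 75-point threshold are assumptions for the example, not values set by this standard.

```python
# Sketch of REQ-015-02: evaluate each language independently and show why
# an aggregate score alone can mask a failing language. The threshold is
# a hypothetical value, not one mandated by this standard.

QUALITY_THRESHOLD = 75  # illustrative minimum per-language quality score


def evaluate_coverage(per_language_scores: dict[str, float]) -> dict:
    """Report per-language pass/fail alongside the aggregate."""
    aggregate = sum(per_language_scores.values()) / len(per_language_scores)
    failing = {lang: score for lang, score in per_language_scores.items()
               if score < QUALITY_THRESHOLD}
    return {
        "aggregate": round(aggregate, 1),
        "aggregate_passes": aggregate >= QUALITY_THRESHOLD,
        "failing_languages": failing,
        "all_languages_pass": not failing,
    }


# Scores taken from the example matrix in Section 5: the aggregate (82.3)
# clears the threshold, but Urdu does not -- the exact situation
# REQ-015-02 guards against.
report = evaluate_coverage({"en": 92, "ar-MSA": 87, "ur": 68})
```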
4.2 Cross-Language Safety Testing
REQ-015-05: Safety evaluation MUST be executed independently for every supported language before release. A feature MUST NOT launch in a language that has not passed safety evaluation.
REQ-015-06: Adversarial abuse testing MUST cover language-specific attack patterns, including culturally specific harmful content, script-based obfuscation, and transliteration-based policy evasion.
REQ-015-07: Cross-lingual transfer attacks — where harmful prompts in one language exploit model behavior in another — MUST be included in Tier 2 and Tier 3 safety evaluation.
REQ-015-08: Organizations SHOULD maintain per-language harmful content taxonomies that account for culturally specific sensitivities, taboo topics, and regulatory differences.
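The launch gate in REQ-015-05 reduces to a simple invariant: no supported language may ship without a passed safety evaluation. A minimal sketch, assuming a status map produced by an upstream evaluation harness (the status vocabulary is an assumption for illustration):

```python
# Sketch of the REQ-015-05 launch gate: a feature MUST NOT launch in any
# language whose safety evaluation has not passed. The status strings
# ("passed", "failed", "not_evaluated") are illustrative assumptions.

def launch_allowed(safety_results: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (allowed, blocking_languages) given per-language statuses."""
    blocked = [lang for lang, status in safety_results.items()
               if status != "passed"]
    return (not blocked, blocked)


# One unevaluated language blocks the whole multilingual launch.
ok, blocked = launch_allowed({
    "en": "passed",
    "ar-MSA": "passed",
    "ur": "not_evaluated",
})
```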
4.3 Dialect & Code-Switching Handling
REQ-015-09: When an AI product serves dialect-diverse populations, evaluation MUST include the major dialect variants relevant to the user population with documented coverage and known limitations.
REQ-015-10: AI features MUST handle code-switching input without producing errors, truncated responses, or language confusion. Graceful degradation to a dominant language is acceptable if documented.
REQ-015-11: Organizations SHOULD implement script normalization for languages with multiple encoding standards (e.g., Arabic Unicode normalization forms, CJK unified ideographs) to ensure consistent AI processing.
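The script normalization in REQ-015-11 is typically a Unicode normalization pass applied before any model call. A minimal sketch using the standard library: NFKC folds Arabic presentation forms and fullwidth Latin into their canonical letters, so visually identical inputs reach the model in one form. (Whether NFC or NFKC is appropriate depends on the product; NFKC is shown here as one common choice, not a requirement of this standard.)

```python
# Sketch of REQ-015-11: canonicalize text before AI processing so
# visually identical inputs map to a single encoding. NFKC is one common
# choice; pick the form deliberately per language and product.
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization to the input."""
    return unicodedata.normalize("NFKC", text)


# U+FEE1 (ARABIC LETTER MEEM ISOLATED FORM) folds to U+0645 (basic MEEM),
# and fullwidth "A" (U+FF21) folds to ASCII "A".
canonical_meem = normalize_text("\uFEE1")
canonical_a = normalize_text("\uFF21")
```

Note that NFKC does not touch directionality marks (e.g. U+200F) and deliberately preserves joiners such as ZWNJ, which carry meaning in Persian and Urdu; stripping those requires a separate, language-aware decision.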
4.4 Multilingual Bias & Fairness Assessment
REQ-015-12: Fairness evaluation MUST be conducted per supported language, not only on aggregated cross-language results.
REQ-015-13: AI products MUST test for and document cross-language quality parity gaps where performance in one supported language is materially worse than others.
REQ-015-14: When significant cross-language parity gaps are detected, the organization MUST either remediate before launch, restrict the affected language to a lower capability tier with user disclosure, or document the gap as a known limitation with a remediation timeline.
REQ-015-15: Organizations SHOULD evaluate demographic fairness within each supported language (e.g., gender bias in Arabic vs. English may manifest differently due to grammatical gender systems).
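The parity check in REQ-015-13/14 can be made concrete. Section 5 tracks the gap as a max/min quality ratio; the sketch below expresses the same quantity as the relative shortfall of the weakest language, (max − min) / max, so the 20% waiver floor from Section 6 applies directly. The scores are the illustrative values from the example matrix, and this particular gap formula is an assumption of the example, not a definition fixed by the standard.

```python
# Sketch of the REQ-015-13/14 parity check. Gap = relative shortfall of
# the weakest language against the strongest; the 0.20 threshold mirrors
# the waiver floor in Section 6. Formula choice is illustrative.

def parity_gap(scores: dict[str, float]) -> float:
    """Return (max - min) / max, e.g. 0.26 for a 26% gap."""
    best, worst = max(scores.values()), min(scores.values())
    return (best - worst) / best


# Illustrative scores from the Section 5 example matrix:
# (92 - 68) / 92 is roughly 0.26, which exceeds the 20% floor, so
# REQ-015-14 requires remediation, a restricted capability tier with
# disclosure, or a documented limitation with a remediation timeline.
gap = parity_gap({"en": 92, "ar-EG": 79, "ur": 68})
exceeds_waiver_floor = gap > 0.20
```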
4.5 Language-Specific Prompt Engineering
REQ-015-16: System prompts and safety instructions MUST be validated in each supported language. Direct translation of English-language prompts without validation is prohibited.
REQ-015-17: Prompt libraries MUST include language-specific variants where prompt effectiveness varies by language (e.g., instruction-following patterns, formatting conventions, politeness norms).
REQ-015-18: Organizations SHOULD implement language detection and routing to direct inputs to language-optimized model configurations or prompt variants.
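The detection-and-routing pattern in REQ-015-18 might look like the sketch below. The script-range heuristic is a deliberately naive stand-in for a production language identifier (e.g. a fastText or CLD3 model), it only distinguishes Arabic from Latin script, and the prompt variant names are hypothetical.

```python
# Sketch of REQ-015-18: detect the dominant script of an input and route
# it to a language-specific prompt variant. The two-way Arabic/Latin
# heuristic and the variant names are illustrative assumptions; a real
# system would use a trained language identifier.
import unicodedata

PROMPT_VARIANTS = {
    "Arabic": "system_prompt_ar",  # hypothetical variant identifiers
    "Latin": "system_prompt_en",
}
DEFAULT_VARIANT = "system_prompt_en"


def dominant_script(text: str) -> str:
    """Count alphabetic characters per script and return the majority."""
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            script = "Arabic" if name.startswith("ARABIC") else "Latin"
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "Latin"


def route_prompt(text: str) -> str:
    """Select the prompt variant for the input's dominant script."""
    return PROMPT_VARIANTS.get(dominant_script(text), DEFAULT_VARIANT)
```

For code-switched input this majority vote degrades gracefully to the dominant script, in line with the documented-degradation allowance in REQ-015-10.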
5. Implementation Guidance
Minimum Multilingual Governance Pack
Teams SHOULD establish:
- Language Coverage Matrix template
- Per-language safety evaluation protocol
- Dialect coverage assessment for primary user populations
- Cross-language parity dashboard
- Language-specific prompt validation checklist
- Multilingual adversarial test suite
Example Language Coverage Matrix
| Language | Quality Score | Safety Status | Dialect Coverage | Known Limitations | Last Evaluated |
|---|---|---|---|---|---|
| English (en) | 92/100 | Passed | N/A | None | 2026-02-15 |
| Arabic (ar-MSA) | 87/100 | Passed | MSA baseline | Reduced accuracy for technical domains | 2026-02-15 |
| Arabic (ar-EG) | 79/100 | Passed | Egyptian dialect | Code-switching with English degrades quality by ~8% | 2026-02-15 |
| Arabic (ar-SA) | 81/100 | Passed | Gulf dialect | Limited Najdi sub-dialect coverage | 2026-02-15 |
| French (fr) | 85/100 | Passed | Metropolitan French | Quebec French not evaluated | 2026-02-15 |
| Urdu (ur) | 68/100 | Conditional | Standard Urdu | Script rendering issues; safety tests incomplete for 2 categories | 2026-01-30 |
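The matrix above is easiest to enforce when kept in machine-readable form. A minimal sketch with two rows from the example and a freshness check supporting the annual refresh expectation in REQ-015-04; the field names and record layout are illustrative assumptions, not a schema mandated by this standard.

```python
# Machine-readable sketch of the Language Coverage Matrix above, with a
# staleness check supporting the annual refresh in REQ-015-04. Field
# names are illustrative, not a mandated schema.
from datetime import date

MATRIX = [
    {"lang": "en", "quality": 92, "safety": "Passed",
     "evaluated": date(2026, 2, 15)},
    {"lang": "ur", "quality": 68, "safety": "Conditional",
     "evaluated": date(2026, 1, 30)},
]


def stale_entries(matrix: list[dict], today: date,
                  max_age_days: int = 365) -> list[str]:
    """Return languages whose last evaluation is older than max_age_days."""
    return [row["lang"] for row in matrix
            if (today - row["evaluated"]).days > max_age_days]
```

Keeping the matrix as data also lets the parity dashboard and launch gates from Section 4 read directly from a single source of truth.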
Minimum Operational Metrics
Track at least:
- per-language quality score trend
- cross-language parity gap (max/min quality ratio)
- per-language safety evaluation pass rate
- code-switching error rate
- dialect coverage percentage for primary markets
- language-specific user satisfaction scores
6. Exceptions & Waiver Process
Waivers are limited to non-safety procedural controls and MUST include:
- business justification
- compensating controls
- named approver
- expiration date (maximum 30 days)
No waivers are permitted for:
- launching in a language without safety evaluation
- ignoring cross-language parity gaps exceeding 20% without a documented remediation plan
- deploying untranslated English safety instructions in non-English language surfaces
7. Related Standards
- PRD-STD-001: Prompt Engineering Standards
- PRD-STD-010: AI Product Safety & Trust Controls
- PRD-STD-011: Model & Data Governance
- KSA Regulatory Profile
- Fairness & Bias Assessment
- AI Product Lifecycle
8. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-02-22 | AEEF Standards Committee | Initial release |