Added comprehensive analysis of Utf8ToAsciiConverter normalization coverage:
- Created Utf8ToAsciiConverterNormalizationCoverageTests to analyze which
  character mappings are covered by Unicode normalization and which require
  the dictionary
- Generated utf8-converter-normalization-coverage.md documentation with:
  - Coverage statistics: 487/1308 (37.2%) covered by normalization
  - Detailed categorization of the 821 dictionary-required characters
  - Breakdown by category: ligatures, special Latin, Cyrillic, punctuation,
    numbers, and extended Latin
  - Examples and rationale for each category
  - Language coverage analysis
  - Design rationale and future extensibility notes

Key findings:
- Normalization automatically handles common accented Latin characters
  (French, Spanish, German, Polish, Czech, Vietnamese, etc.)
- The dictionary is required for: ligatures (Æ, Œ, ß, ﬀ, ﬁ), special Latin
  (Ð, Þ, Ø, Ł), Cyrillic transliteration, symbols, and numbers
- The two-tier approach (normalization plus dictionary) reduces maintenance
  while providing 100% backward compatibility
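
The two-tier idea from the key findings can be sketched roughly as follows.
This is an illustrative Python sketch, not the converter's actual code: it
assumes NFD decomposition with combining marks stripped as the normalization
tier, and the names FALLBACK, strip_marks, to_ascii, and coverage are
hypothetical. The small table stands in for the real 821-entry dictionary.

```python
import unicodedata

# Hypothetical stand-in for the converter's dictionary (illustration only).
FALLBACK = {
    "Æ": "AE", "Œ": "OE", "ß": "ss",
    "Ð": "D", "Þ": "TH", "Ø": "O", "Ł": "L",
}


def strip_marks(text):
    """Decompose to NFD and drop combining marks (assumed normalization tier)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_covered(ch):
    """True when normalization alone turns the character into ASCII."""
    stripped = strip_marks(ch)
    return stripped != "" and stripped.isascii()


def to_ascii(text, fallback=FALLBACK, placeholder="?"):
    """Tier 1: normalization. Tier 2: dictionary lookup, else a placeholder."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        elif is_covered(ch):
            out.append(strip_marks(ch))
        else:
            out.append(fallback.get(ch, placeholder))
    return "".join(out)


def coverage(chars):
    """Return (characters covered by normalization, total characters)."""
    return sum(1 for ch in chars if is_covered(ch)), len(chars)
```

With this sketch, to_ascii("Łódź") returns "Lodz": "ó" and "ź" are handled
by normalization, while "Ł" has no canonical decomposition and falls through
to the dictionary. coverage(["é", "ñ", "Ł", "ß"]) returns (2, 4).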
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>