Umbraco-CMS

Files

yv01p 6adb654ec4 docs(strings): document normalization coverage and character mapping analysis

Added comprehensive analysis of Utf8ToAsciiConverter normalization coverage:
- Created Utf8ToAsciiConverterNormalizationCoverageTests to analyze which
  character mappings are covered by Unicode normalization vs require dictionary
- Generated utf8-converter-normalization-coverage.md documentation with:
  - Coverage statistics: 487/1308 (37.2%) covered by normalization
  - Detailed categorization of 821 dictionary-required characters
  - Breakdown by category: ligatures, special Latin, Cyrillic, punctuation,
    numbers, and extended Latin
  - Examples and rationale for each category
  - Language coverage analysis
  - Design rationale and future extensibility notes

Key findings:
- Normalization automatically handles common European accented characters
  (French, Spanish, German, Polish, Czech, Vietnamese, etc.)
- Dictionary required for: ligatures (Æ, Œ, ß, ﬀ, ﬁ), special Latin
  (Ð, Þ, Ø, Ł), Cyrillic transliteration, symbols, and numbers
- Two-tier approach reduces maintenance while providing 100% backward
  compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-13 04:00:42 +00:00

utf8-converter-normalization-coverage.md

docs(strings): document normalization coverage and character mapping analysis

2025-12-13 04:00:42 +00:00