2 Commits

Author SHA1 Message Date
45edc5916b docs(strings): fix Cyrillic mapping examples to match actual implementation
Update design doc and implementation plan to reflect that the actual
Cyrillic mappings use simplified transliterations for backward
compatibility with existing Umbraco URLs:
- Щ→"Sh" (not "Shch")
- Ц→"F" (not "Ts")

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 04:28:42 +00:00
6adb654ec4 docs(strings): document normalization coverage and character mapping analysis
Add an analysis of Utf8ToAsciiConverter normalization coverage:
- Create Utf8ToAsciiConverterNormalizationCoverageTests to analyze which
  character mappings are covered by Unicode normalization and which require
  a dictionary entry
- Generate utf8-converter-normalization-coverage.md documentation with:
  - Coverage statistics: 487/1308 (37.2%) covered by normalization
  - Detailed categorization of 821 dictionary-required characters
  - Breakdown by category: ligatures, special Latin, Cyrillic, punctuation,
    numbers, and extended Latin
  - Examples and rationale for each category
  - Language coverage analysis
  - Design rationale and future extensibility notes

Key findings:
- Normalization automatically handles common European accented characters
  (French, Spanish, German, Polish, Czech, Vietnamese, etc.)
- Dictionary required for: ligatures (Æ, Œ, ß, ﬀ, ﬁ), special Latin
  (Ð, Þ, Ø, Ł), Cyrillic transliteration, symbols, and numbers
- Two-tier approach reduces maintenance while providing 100% backward
  compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 04:00:42 +00:00
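
The two-tier approach described in 6adb654ec4 can be illustrated with a minimal Python sketch. It is not the Utf8ToAsciiConverter implementation. The function name `to_ascii`, the `FALLBACK` table, the `?` placeholder, and the order of the two tiers are all assumptions made for illustration. Only the two Cyrillic entries (Щ→"Sh", Ц→"F") are taken from the commit messages above; the other dictionary entries are hypothetical.

```python
import unicodedata

# Hypothetical dictionary tier. Only the two Cyrillic entries come from
# commit 45edc5916b; the remaining values are assumed for illustration.
FALLBACK = {
    "Щ": "Sh",  # from the commit message
    "Ц": "F",   # from the commit message
    "Æ": "AE",  # assumed
    "ß": "ss",  # assumed
    "Ø": "O",   # assumed
    "Ł": "L",   # assumed
}


def to_ascii(text, fail="?"):
    """Convert text to ASCII: try normalization first, then the dictionary."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII
            continue
        # Tier 1: canonical decomposition (NFD), then drop combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base and all(ord(c) < 128 for c in base):
            out.append(base)
        # Tier 2: explicit dictionary for characters that do not decompose.
        elif ch in FALLBACK:
            out.append(FALLBACK[ch])
        else:
            out.append(fail)  # no mapping known in this sketch
    return "".join(out)


print(to_ascii("Łódź"))  # Ł needs the dictionary; ó and ź decompose
```

Characters such as é or ź decompose into an ASCII base letter plus a combining mark, so they need no table entry. Characters such as Ł, Æ, or Щ have no canonical decomposition, which is why a dictionary tier is still needed for them.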