5 Commits

Author SHA1 Message Date
1102b34e88 feat(strings): implement SIMD-optimized Utf8ToAsciiConverterNew with golden file tests
Implements Task 4 of the Utf8ToAsciiConverter refactor plan.

Key features:
- SIMD-optimized ASCII detection using SearchValues (AVX-512 capable)
- Unicode normalization for accented characters (FormD decomposition)
- FrozenDictionary for ligatures, Cyrillic, and special Latin mappings
- Span-based API for zero-allocation scenarios
- ArrayPool usage for temporary buffers
- Comprehensive test coverage (21 unit tests, all passing)

Implementation details:
- Fast path for pure ASCII input (no conversion needed)
- Dictionary lookup for special cases (ligatures, Cyrillic, etc.)
- Unicode normalization fallback for accented characters
- Control character stripping and whitespace normalization
- Proper surrogate pair handling
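
The lookup order described above (ASCII fast path, then dictionary, then normalization fallback) can be sketched as follows. This is a minimal illustration, not the Utf8ToAsciiConverterNew code: the class name `AsciiFoldSketch`, the method `Fold`, and the six-entry table are hypothetical stand-ins, and the sketch skips surrogate pairs and whitespace normalization entirely.

```csharp
using System;
using System.Buffers;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: names and table contents are assumptions for illustration.
public static class AsciiFoldSketch
{
    // Printable ASCII (0x20..0x7E), used to detect input that needs no conversion.
    private static readonly SearchValues<char> PrintableAscii =
        SearchValues.Create(BuildPrintableAscii());

    // Tiny stand-in for the JSON-backed mapping tables.
    private static readonly FrozenDictionary<char, string> Special =
        new Dictionary<char, string>
        {
            ['Æ'] = "AE", ['ß'] = "ss", ['Œ'] = "OE",
            ['Ł'] = "L", ['Ø'] = "O", ['Þ'] = "TH",
        }.ToFrozenDictionary();

    private static string BuildPrintableAscii()
    {
        var sb = new StringBuilder();
        for (char c = ' '; c <= '~'; c++) sb.Append(c);
        return sb.ToString();
    }

    public static string Fold(string? input)
    {
        if (string.IsNullOrEmpty(input)) return string.Empty;

        // Fast path: every character is printable ASCII, so return the input as-is.
        if (input.AsSpan().IndexOfAnyExcept(PrintableAscii) < 0) return input;

        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsSurrogate(c))
            {
                // Simplification: this sketch drops non-BMP characters.
                continue;
            }

            if (c < 0x80)
            {
                // ASCII: keep it unless it is a control character.
                if (!char.IsControl(c)) result.Append(c);
            }
            else if (Special.TryGetValue(c, out string? mapped))
            {
                // Dictionary lookup for ligatures and special Latin letters.
                result.Append(mapped);
            }
            else
            {
                // Fallback: decompose (FormD) and keep only the ASCII base characters,
                // which discards combining marks such as U+0301.
                foreach (char d in c.ToString().Normalize(NormalizationForm.FormD))
                {
                    if (d < 0x80 && !char.IsControl(d)) result.Append(d);
                }
            }
        }
        return result.ToString();
    }
}
```

With this table, `AsciiFoldSketch.Fold("café")` goes through the normalization fallback for `é`, while `Fold("Æther")` is resolved by the dictionary. Checking the dictionary before normalizing matters for letters such as `Ł` and `Ø`, which have no canonical decomposition.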

Test coverage:
- Null/empty string handling
- ASCII fast path verification
- Accented character normalization (café → cafe)
- Ligature expansion (Æ → AE, ß → ss, Œ → OE)
- Cyrillic transliteration (Москва → Moskva, Щ → Shch)
- Special Latin characters (Ł → L, Ø → O, Þ → TH)
- Span API for zero-allocation scenarios
- Mixed content handling

Golden file tests are included for regression testing against the original
implementation, but they cannot run until the test data file is configured.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 00:13:11 +00:00
72dfd667c5 test(strings): add edge case tests for CharacterMappingLoader
Add comprehensive edge case testing for CharacterMappingLoader:
- Test priority override behavior (user mappings vs built-in)
- Test graceful handling of invalid user mapping files
- Test multi-character key warning logging
- Add logging for multi-character keys that are skipped
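
One way the override and skip behavior listed above could work is sketched below. This is a sketch under assumptions, not the CharacterMappingLoader implementation: the `MappingSet` record, the numeric `Priority` field, the rule that the higher priority wins, and the `warn` callback are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: types, names, and the priority rule are assumptions.
public static class MappingMergeSketch
{
    public sealed record MappingSet(
        string Name,
        int Priority,
        IReadOnlyDictionary<string, string> Mappings);

    public static Dictionary<char, string> Merge(
        IEnumerable<MappingSet> sets,
        Action<string> warn)
    {
        var merged = new Dictionary<char, string>();

        // Apply low-priority sets first so higher-priority sets overwrite them.
        foreach (MappingSet set in sets.OrderBy(s => s.Priority))
        {
            foreach (KeyValuePair<string, string> pair in set.Mappings)
            {
                if (pair.Key.Length != 1)
                {
                    // Keys must be a single char; log and skip anything else.
                    warn($"Skipping non-single-character key '{pair.Key}' in {set.Name}");
                    continue;
                }
                merged[pair.Key[0]] = pair.Value;
            }
        }
        return merged;
    }
}
```

In this sketch a user file would be given a higher priority than the built-in tables, so a user entry for `ß` replaces the built-in one while the other built-in entries stay in place.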

All tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-13 00:01:52 +00:00
ca05d69be2 feat(strings): implement CharacterMappingLoader for JSON-based character mappings
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-12 23:52:41 +00:00
486aa6be81 feat(strings): add character mapping JSON files and golden test data
- Extract 1,308 character mappings from original Utf8ToAsciiConverter.cs switch statement
- Create golden-mappings.json test data file with complete mappings for regression testing
- Create ligatures.json (14 mappings: Æ, Œ, IJ, ß, ff, fi, fl, ffi, ffl, st ligatures)
- Create special-latin.json (14 mappings: Ð, Đ, Ħ, Ł, Ŀ, Ø, Þ, Ŧ and lowercase variants)
- Create cyrillic.json (66 mappings: Russian Cyrillic alphabet transliteration)
- Update Umbraco.Core.csproj to embed JSON files as resources
- Verified embedded resources in compiled DLL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-12 23:38:33 +00:00
f750f37a32 feat(strings): add IUtf8ToAsciiConverter and ICharacterMappingLoader interfaces
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-12 23:31:53 +00:00