From f384d9b332f747baa84592040962d82d15a61efc Mon Sep 17 00:00:00 2001 From: yv01p Date: Sun, 7 Dec 2025 20:43:57 +0000 Subject: [PATCH] docs: add StringExtensions refactor design document MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Comprehensive design for splitting the 1,600-line StringExtensions.cs into 5 focused partial class files with performance optimizations. Key decisions: - Partial class approach to prevent breaking changes - 5 files: Culture, Manipulation, Encoding, Parsing, Sanitization - Performance fixes for regex caching and char operations - 4-phase implementation with individual plans and summaries 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- ...12-07-string-extensions-refactor-design.md | 342 ++++++++++++++++++ 1 file changed, 342 insertions(+) create mode 100644 docs/plans/2025-12-07-string-extensions-refactor-design.md diff --git a/docs/plans/2025-12-07-string-extensions-refactor-design.md b/docs/plans/2025-12-07-string-extensions-refactor-design.md new file mode 100644 index 0000000000..65952122f7 --- /dev/null +++ b/docs/plans/2025-12-07-string-extensions-refactor-design.md @@ -0,0 +1,342 @@ +# StringExtensions Refactor Design + +**Date**: 2025-12-07 +**Branch**: `refactor/StringExtensions` +**Status**: Approved + +--- + +## 1. Overview + +### Problem Statement + +The `StringExtensions.cs` file in `Umbraco.Core` is a 1,600-line "utility dumping ground" with: +- Many unrelated string operations mixed together +- Performance issues in several methods +- Difficult to navigate and maintain + +### Goals + +1. **Consolidate & Organize**: Split into logical partial class files +2. **Reduce Complexity**: Each file focuses on one category of operations +3. **Fix Performance**: Optimize methods with known inefficiencies +4. **Maintain Compatibility**: Zero breaking changes + +### Scope + +**In Scope:** +- `src/Umbraco.Core/Extensions/StringExtensions.cs` (1,600 lines) - split and optimize + +**Out of Scope (no changes):** +- `src/Umbraco.Web.Common/Extensions/StringExtensions.cs` - internal, project-specific +- `src/Umbraco.Cms.Persistence.EFCore/StringExtensions.cs` - internal, project-specific +- Test project StringExtensions files - separate concern + +--- + +## 2. File Structure + +Using **partial class** approach to prevent breaking changes: + +``` +src/Umbraco.Core/Extensions/ +├── StringExtensions.cs → DELETE +├── StringExtensions.Culture.cs → NEW +├── StringExtensions.Manipulation.cs → NEW +├── StringExtensions.Encoding.cs → NEW +├── StringExtensions.Parsing.cs → NEW +└── StringExtensions.Sanitization.cs → NEW +``` + +Each file follows this pattern: +```csharp +// Copyright (c) Umbraco. +// See LICENSE for more details. + +namespace Umbraco.Extensions; + +public static partial class StringExtensions +{ + // Methods for this category +} +``` + +--- + +## 3. Method Assignments + +### StringExtensions.Culture.cs (~80 lines) + +**Methods:** +- `InvariantEquals` +- `InvariantStartsWith` +- `InvariantEndsWith` +- `InvariantContains` (both overloads) +- `InvariantIndexOf` +- `InvariantLastIndexOf` +- `InvariantFormat` +- `ToInvariantString` (int and long overloads) +- `EnsureCultureCode` + +**Static Fields:** None + +--- + +### StringExtensions.Manipulation.cs (~300 lines) + +**Methods:** +- `Trim`, `TrimStart`, `TrimEnd` (string overloads) +- `EnsureStartsWith`, `EnsureEndsWith` (all overloads) +- `ToFirstUpper`, `ToFirstLower` (all overloads including culture/invariant) +- `ReplaceMany`, `ReplaceFirst`, `Replace` (StringComparison overload) +- `ReplaceNonAlphanumericChars` (both overloads) +- `ExceptChars` +- `Truncate`, `StripWhitespace`, `StripNewLines`, `ToSingleLine` +- `MakePluralName`, `IsVowel` +- `IsLowerCase`, `IsUpperCase` +- All `IShortStringHelper` wrappers: + - `ToSafeAlias` (3 overloads) + - `ToUrlSegment` (2 overloads) + - `ToCleanString` (4 overloads) + - `SplitPascalCasing` + - `ToSafeFileName` (2 overloads) + - `SpaceCamelCasing` (internal) + +**Static Fields:** +- `Whitespace` (Lazy) - shared, used by `StripWhitespace` + +--- + +### StringExtensions.Encoding.cs (~200 lines) + +**Methods:** +- `GenerateHash` (both overloads) +- `ToSHA1` +- `ToUrlBase64`, `FromUrlBase64` +- `UrlTokenEncode` (note: extends `byte[]`, not `string`) +- `UrlTokenDecode` +- `ConvertToHex`, `DecodeFromHex` +- `EncodeAsGuid`, `ToGuid` +- `CreateGuidFromHash` (internal) +- `SwapByteOrder` (internal) +- `ToCSharpString` +- `EncodeJsString` +- Private `GenerateHash(string, string?)` helper + +**Static Fields:** +- `ToCSharpHexDigitLower` +- `ToCSharpEscapeChars` +- `UrlNamespace` + +**Static Constructor:** Yes (initializes `ToCSharpEscapeChars`) + +--- + +### StringExtensions.Parsing.cs (~150 lines) + +**Methods:** +- `IsNullOrWhiteSpace`, `IfNullOrWhiteSpace`, `OrIfNullOrWhiteSpace`, `NullOrWhiteSpaceAsNull` +- `DetectIsJson`, `DetectIsEmptyJson` +- `ParseInto` (both overloads) +- `EnumTryParse`, `EnumParse` +- `ToDelimitedList` +- `EscapedSplit` +- `ContainsAny`, `CsvContains`, `CountOccurrences` +- `GetIdsFromPathReversed` + +**Static Fields:** +- `JsonEmpties` +- `DefaultEscapedStringEscapeChar` (const) + +--- + +### StringExtensions.Sanitization.cs (~150 lines) + +**Methods:** +- `CleanForXss`, `StripHtml`, `ToValidXmlString` +- `EscapeRegexSpecialCharacters` +- `StripFileExtension`, `GetFileExtension` +- `NormaliseDirectoryPath`, `IsFullPath` +- `AppendQueryStringToUrl` +- `ToFriendlyName` +- `IsEmail` +- `GenerateStreamFromString` (internal) + +**Static Fields:** +- `CleanForXssChars` +- `InvalidXmlChars` (Lazy) + +--- + +## 4. Performance Fixes + +Applied during Phase 3: + +### 4.1 Regex Caching + +| Method | Current Issue | Fix | +|--------|---------------|-----| +| `StripWhitespace` | `Regex.Replace(txt, @"\s", ...)` each call | Create dedicated cached `Lazy` | +| `GetFileExtension` | `new Regex(pattern)` each call | Create cached `Lazy` | +| `StripHtml` | `Regex.Replace(..., Compiled)` each call | Create cached `Lazy` | + +### 4.2 Char Case Checks + +```csharp +// Current (allocates string) +public static bool IsLowerCase(this char ch) => + ch.ToString(CultureInfo.InvariantCulture) == + ch.ToString(CultureInfo.InvariantCulture).ToLowerInvariant(); + +// Fixed (no allocation) +public static bool IsLowerCase(this char ch) => char.IsLower(ch); +public static bool IsUpperCase(this char ch) => char.IsUpper(ch); +``` + +### 4.3 ReplaceNonAlphanumericChars(string) + +Only the `string` overload needs fixing (the `char` overload is already optimized): + +```csharp +// Current (LINQ + string allocations in loop) +foreach (var c in mName.ToCharArray().Where(c => !char.IsLetterOrDigit(c))) +{ + mName = mName.Replace(c.ToString(...), replacement); +} + +// Fixed (single pass with StringBuilder for multi-char replacement) +``` + +--- + +## 5. Testing Strategy + +### Baseline (Phase 1) +- Run existing tests, record pass/fail state +- Run existing benchmarks, record baseline numbers +- Add missing unit tests for methods being optimized + +### Verification (Phase 4) +- Run all unit tests - must match Phase 1 results +- Run benchmarks - compare against baseline +- Expected improvements: + - Regex methods: significant (no repeated compilation) + - Char case checks: ~10-100x faster (no allocation) + - ReplaceNonAlphanumericChars: moderate + +### New Test File +`tests/Umbraco.Tests.UnitTests/Umbraco.Core/Extensions/StringExtensionsRefactorTests.cs` + +### Benchmark Additions +Add to `StringExtensionsBenchmarks.cs`: +- `StripWhitespace_Benchmark` +- `GetFileExtension_Benchmark` +- `StripHtml_Benchmark` +- `IsLowerCase_Benchmark` +- `ReplaceNonAlphanumericChars_Benchmark` + +--- + +## 6. Implementation Phases + +### Workflow for Each Phase + +1. **Create Implementation Plan** - Detailed step-by-step plan +2. **User Review** - User approves the plan +3. **Save Plan** - Write to `docs/plans/` folder +4. **Execute** - Use subagent-driven development to implement +5. **Completion Summary** - Copy plan, update with results, save to `docs/plans/` + +--- + +### Phase 1: Baseline Testing + +**Objective:** Establish baseline test results and performance metrics before any changes. + +**Deliverables:** +- Run existing StringExtensions unit tests +- Run existing benchmarks and record results +- Create new unit tests for methods being optimized +- Create new benchmarks for methods being optimized +- Document baseline metrics + +**Plan Document:** `docs/plans/phase-1-baseline-testing-plan.md` +**Summary Document:** `docs/plans/phase-1-baseline-testing-summary.md` + +--- + +### Phase 2: File Split + +**Objective:** Split `StringExtensions.cs` into 5 partial class files with no functional changes. + +**Deliverables:** +- Create 5 new partial class files +- Delete original `StringExtensions.cs` +- Verify all tests still pass (no behavioral changes) +- Single atomic commit + +**Plan Document:** `docs/plans/phase-2-file-split-plan.md` +**Summary Document:** `docs/plans/phase-2-file-split-summary.md` + +--- + +### Phase 3: Performance Fixes + +**Objective:** Apply performance optimizations to identified methods. + +**Deliverables:** +- Cache regex patterns for `StripWhitespace`, `GetFileExtension`, `StripHtml` +- Optimize `IsLowerCase`, `IsUpperCase` +- Optimize `ReplaceNonAlphanumericChars(string)` +- Verify all tests still pass + +**Plan Document:** `docs/plans/phase-3-performance-fixes-plan.md` +**Summary Document:** `docs/plans/phase-3-performance-fixes-summary.md` + +--- + +### Phase 4: Verification + +**Objective:** Confirm refactor success through testing and benchmarking. + +**Deliverables:** +- Run all unit tests - compare to Phase 1 baseline +- Run all benchmarks - compare to Phase 1 baseline +- Document performance improvements +- Final review and cleanup + +**Plan Document:** `docs/plans/phase-4-verification-plan.md` +**Summary Document:** `docs/plans/phase-4-verification-summary.md` + +--- + +## 7. Risk Mitigation + +| Risk | Mitigation | +|------|------------| +| Breaking changes | Partial class approach preserves class name | +| Method duplication | Atomic commit: create new files + delete old file together | +| Behavioral changes | Comprehensive baseline tests before changes | +| Performance regression | Benchmark before/after comparison | + +--- + +## 8. Files Unchanged + +These files remain untouched: + +| File | Reason | +|------|--------| +| `src/Umbraco.Web.Common/Extensions/StringExtensions.cs` | Internal, project-specific | +| `src/Umbraco.Cms.Persistence.EFCore/StringExtensions.cs` | Internal, project-specific | +| `tests/.../StringExtensions.cs` (various) | Test utilities, separate concern | + +--- + +## Approval + +- [x] Design reviewed and approved +- [ ] Phase 1 plan approved +- [ ] Phase 2 plan approved +- [ ] Phase 3 plan approved +- [ ] Phase 4 plan approved