Comprehensive design for splitting the 1,600-line StringExtensions.cs into 5 focused partial class files with performance optimizations.
Key decisions:
- Partial class approach to prevent breaking changes
- 5 files: Culture, Manipulation, Encoding, Parsing, Sanitization
- Performance fixes for regex caching and char operations
- 4-phase implementation with individual plans and summaries
StringExtensions Refactor Design
Date: 2025-12-07
Branch: refactor/StringExtensions
Status: Approved
1. Overview
Problem Statement
The StringExtensions.cs file in Umbraco.Core is a 1,600-line "utility dumping ground":
- Many unrelated string operations are mixed together
- Several methods have performance issues
- The file is difficult to navigate and maintain
Goals
- Consolidate & Organize: Split into logical partial class files
- Reduce Complexity: Each file focuses on one category of operations
- Fix Performance: Optimize methods with known inefficiencies
- Maintain Compatibility: Zero breaking changes
Scope
In Scope:
- src/Umbraco.Core/Extensions/StringExtensions.cs (1,600 lines) - split and optimize
Out of Scope (no changes):
- src/Umbraco.Web.Common/Extensions/StringExtensions.cs - internal, project-specific
- src/Umbraco.Cms.Persistence.EFCore/StringExtensions.cs - internal, project-specific
- Test project StringExtensions files - separate concern
2. File Structure
The split uses a partial class, so the public type name and every method signature stay the same and no breaking changes are introduced:
src/Umbraco.Core/Extensions/
├── StringExtensions.cs → DELETE
├── StringExtensions.Culture.cs → NEW
├── StringExtensions.Manipulation.cs → NEW
├── StringExtensions.Encoding.cs → NEW
├── StringExtensions.Parsing.cs → NEW
└── StringExtensions.Sanitization.cs → NEW
Each file follows this pattern:
// Copyright (c) Umbraco.
// See LICENSE for more details.
namespace Umbraco.Extensions;
public static partial class StringExtensions
{
    // Methods for this category
}
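As a minimal sketch of why the partial class approach is non-breaking: the compiler merges all parts into a single type, so callers cannot tell which file a method lives in. The two method bodies below are simplified stand-ins written for illustration, not copies of the Umbraco implementations, and the type is declared without a namespace only to keep the example short.

```csharp
using System;

// Part 1: would live in StringExtensions.Culture.cs
public static partial class StringExtensions
{
    // Simplified stand-in (assumption): case-insensitive invariant comparison.
    public static bool InvariantEquals(this string? compare, string? compareTo) =>
        string.Equals(compare, compareTo, StringComparison.InvariantCultureIgnoreCase);
}

// Part 2: would live in StringExtensions.Manipulation.cs
public static partial class StringExtensions
{
    // Simplified stand-in (assumption): append the char only when it is missing.
    public static string EnsureEndsWith(this string input, char value) =>
        input.EndsWith(value) ? input : input + value;
}
```

Both calls resolve against the same `StringExtensions` type, for example `"ABC".InvariantEquals("abc")` and `"path".EnsureEndsWith('/')`, which is exactly how existing call sites already look.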
3. Method Assignments
StringExtensions.Culture.cs (~80 lines)
Methods:
- InvariantEquals
- InvariantStartsWith
- InvariantEndsWith
- InvariantContains (both overloads)
- InvariantIndexOf
- InvariantLastIndexOf
- InvariantFormat
- ToInvariantString (int and long overloads)
- EnsureCultureCode
Static Fields: None
StringExtensions.Manipulation.cs (~300 lines)
Methods:
- Trim, TrimStart, TrimEnd (string overloads)
- EnsureStartsWith, EnsureEndsWith (all overloads)
- ToFirstUpper, ToFirstLower (all overloads including culture/invariant)
- ReplaceMany, ReplaceFirst, Replace (StringComparison overload)
- ReplaceNonAlphanumericChars (both overloads)
- ExceptChars
- Truncate, StripWhitespace, StripNewLines, ToSingleLine
- MakePluralName, IsVowel
- IsLowerCase, IsUpperCase
- All IShortStringHelper wrappers:
  - ToSafeAlias (3 overloads)
  - ToUrlSegment (2 overloads)
  - ToCleanString (4 overloads)
  - SplitPascalCasing
  - ToSafeFileName (2 overloads)
  - SpaceCamelCasing (internal)
Static Fields:
- Whitespace (Lazy) - shared, used by StripWhitespace
StringExtensions.Encoding.cs (~200 lines)
Methods:
- GenerateHash (both overloads)
- ToSHA1
- ToUrlBase64, FromUrlBase64
- UrlTokenEncode (note: extends byte[], not string)
- UrlTokenDecode
- ConvertToHex, DecodeFromHex
- EncodeAsGuid, ToGuid
- CreateGuidFromHash (internal)
- SwapByteOrder (internal)
- ToCSharpString
- EncodeJsString
- Private GenerateHash(string, string?) helper
Static Fields:
- ToCSharpHexDigitLower
- ToCSharpEscapeChars
- UrlNamespace
Static Constructor: Yes (initializes ToCSharpEscapeChars)
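One point worth keeping in mind when static fields and a static constructor end up in different partial files: C# runs static field initializers in textual order within each part, but leaves the order between parts unspecified, and all of them run before the static constructor body. The sketch below uses hypothetical names (`InitOrderSketch`, `EscapeChars`, `Marker`) to show the safe shape, where each part's static state is self-contained and nothing reads another part's field during initialization.

```csharp
using System;

// Part A: owns the static constructor, as StringExtensions.Encoding.cs would.
public static partial class InitOrderSketch
{
    // Hypothetical lookup table, filled in by the static constructor.
    private static readonly char[] EscapeChars;

    static InitOrderSketch()
    {
        EscapeChars = new char[128];
        EscapeChars['\n'] = 'n';
        EscapeChars['\t'] = 't';
    }

    // Returns the escape letter for a character, or '\0' when none is defined.
    public static char EscapeFor(char c) =>
        c < EscapeChars.Length ? EscapeChars[c] : '\0';
}

// Part B: a self-contained Lazy field that does not depend on Part A's fields.
public static partial class InitOrderSketch
{
    private static readonly Lazy<string> Marker = new Lazy<string>(() => "ready");

    public static string Status => Marker.Value;
}
```

Because the static fields listed for each file in this section are only used by methods in the same file, moving them together with their methods should keep the split free of cross-file initialization dependencies; that is an assumption to confirm during Phase 2.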
StringExtensions.Parsing.cs (~150 lines)
Methods:
- IsNullOrWhiteSpace, IfNullOrWhiteSpace, OrIfNullOrWhiteSpace, NullOrWhiteSpaceAsNull
- DetectIsJson, DetectIsEmptyJson
- ParseInto (both overloads)
- EnumTryParse, EnumParse
- ToDelimitedList
- EscapedSplit
- ContainsAny, CsvContains, CountOccurrences
- GetIdsFromPathReversed
Static Fields:
- JsonEmpties
- DefaultEscapedStringEscapeChar (const)
StringExtensions.Sanitization.cs (~150 lines)
Methods:
- CleanForXss, StripHtml, ToValidXmlString
- EscapeRegexSpecialCharacters
- StripFileExtension, GetFileExtension
- NormaliseDirectoryPath, IsFullPath
- AppendQueryStringToUrl
- ToFriendlyName
- IsEmail
- GenerateStreamFromString (internal)
Static Fields:
- CleanForXssChars
- InvalidXmlChars (Lazy)
4. Performance Fixes
Applied during Phase 3:
4.1 Regex Caching
| Method | Current Issue | Fix |
|---|---|---|
| StripWhitespace | Regex.Replace(txt, @"\s", ...) each call | Create dedicated cached Lazy<Regex> |
| GetFileExtension | new Regex(pattern) each call | Create cached Lazy<Regex> |
| StripHtml | Regex.Replace(..., Compiled) each call | Create cached Lazy<Regex> |
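The caching fix in the table can be sketched as follows for StripWhitespace. This is an illustration under assumptions rather than the final implementation: the class name `RegexCachingSketch` and the field name `WhitespaceChars` are hypothetical, the method is written as a plain static method instead of an extension, and `RegexOptions.Compiled` is one possible choice. The `\s` pattern is the one quoted in the table.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCachingSketch
{
    // Constructed once, on first access, then reused by every call.
    // Lazy<T> is thread-safe by default, and Regex instances can be shared across threads.
    private static readonly Lazy<Regex> WhitespaceChars =
        new Lazy<Regex>(() => new Regex(@"\s", RegexOptions.Compiled));

    // Removes every whitespace character using the cached regex.
    public static string StripWhitespace(string txt) =>
        WhitespaceChars.Value.Replace(txt, string.Empty);
}
```

The same shape would apply to GetFileExtension and StripHtml, each with its own pattern and its own cached field.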
4.2 Char Case Checks
// Current (allocates string)
public static bool IsLowerCase(this char ch) =>
    ch.ToString(CultureInfo.InvariantCulture) ==
    ch.ToString(CultureInfo.InvariantCulture).ToLowerInvariant();
// Fixed (no allocation)
public static bool IsLowerCase(this char ch) => char.IsLower(ch);
public static bool IsUpperCase(this char ch) => char.IsUpper(ch);
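One caveat for the zero-breaking-changes goal: the current and fixed versions do not agree on characters that have no case. The current code returns true for a digit such as '1' (lowercasing leaves it unchanged), while char.IsLower('1') returns false. The sketch below places both versions side by side and adds a third, hypothetical variant (`IsLowerCasePreserving`, not part of the design) that avoids the allocation while still returning true for caseless characters. Whether it matches the current behavior for every character is something the Phase 1 baseline tests would need to confirm.

```csharp
using System.Globalization;

public static class CharCaseSketch
{
    // Current implementation, reproduced from this section for comparison.
    public static bool IsLowerCaseCurrent(char ch) =>
        ch.ToString(CultureInfo.InvariantCulture) ==
        ch.ToString(CultureInfo.InvariantCulture).ToLowerInvariant();

    // Proposed fix from this section: true only for lowercase letters.
    public static bool IsLowerCaseProposed(char ch) => char.IsLower(ch);

    // Hypothetical alternative: no allocation, true when lowercasing changes nothing.
    public static bool IsLowerCasePreserving(char ch) =>
        ch == char.ToLowerInvariant(ch);
}
```

If the semantic change for digits and punctuation is intended, it is worth recording as a deliberate behavioral change; if not, the preserving variant is one candidate to benchmark alongside the proposed fix.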
4.3 ReplaceNonAlphanumericChars(string)
Only the string overload needs fixing (the char overload is already optimized):
// Current (LINQ + string allocations in loop)
foreach (var c in mName.ToCharArray().Where(c => !char.IsLetterOrDigit(c)))
{
    mName = mName.Replace(c.ToString(...), replacement);
}
// Fixed (single pass with StringBuilder for multi-char replacement)
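A minimal sketch of the single-pass approach, assuming a plain static method in a hypothetical `ReplaceSketch` class rather than the real extension method, and an early return for null or empty input that the original may or may not have:

```csharp
using System.Text;

public static class ReplaceSketch
{
    // Copies letters and digits as-is and writes the replacement for everything else.
    public static string ReplaceNonAlphanumericChars(string input, string replacement)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var sb = new StringBuilder(input.Length);
        foreach (var c in input)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(c);
            }
            else
            {
                sb.Append(replacement);
            }
        }

        return sb.ToString();
    }
}
```

For example, `ReplaceSketch.ReplaceNonAlphanumericChars("a b-c", "_")` yields `a_b_c`. One edge case differs from the current loop: because the loop calls Replace repeatedly on the already-modified string, a replacement that itself contains non-alphanumeric characters can be rewritten again by a later iteration, whereas the single pass never revisits its own output. Baseline tests should cover that case before the optimization is applied.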
5. Testing Strategy
Baseline (Phase 1)
- Run existing tests, record pass/fail state
- Run existing benchmarks, record baseline numbers
- Add missing unit tests for methods being optimized
Verification (Phase 4)
- Run all unit tests - must match Phase 1 results
- Run benchmarks - compare against baseline
- Expected improvements:
  - Regex methods: significant (no repeated compilation)
  - Char case checks: ~10-100x faster (no allocation)
  - ReplaceNonAlphanumericChars: moderate
New Test File
tests/Umbraco.Tests.UnitTests/Umbraco.Core/Extensions/StringExtensionsRefactorTests.cs
Benchmark Additions
Add to StringExtensionsBenchmarks.cs:
- StripWhitespace_Benchmark
- GetFileExtension_Benchmark
- StripHtml_Benchmark
- IsLowerCase_Benchmark
- ReplaceNonAlphanumericChars_Benchmark
6. Implementation Phases
Workflow for Each Phase
- Create Implementation Plan - Detailed step-by-step plan
- User Review - User approves the plan
- Save Plan - Write to docs/plans/ folder
- Execute - Use subagent-driven development to implement
- Completion Summary - Copy plan, update with results, save to docs/plans/
Phase 1: Baseline Testing
Objective: Establish baseline test results and performance metrics before any changes.
Deliverables:
- Run existing StringExtensions unit tests
- Run existing benchmarks and record results
- Create new unit tests for methods being optimized
- Create new benchmarks for methods being optimized
- Document baseline metrics
Plan Document: docs/plans/phase-1-baseline-testing-plan.md
Summary Document: docs/plans/phase-1-baseline-testing-summary.md
Phase 2: File Split
Objective: Split StringExtensions.cs into 5 partial class files with no functional changes.
Deliverables:
- Create 5 new partial class files
- Delete original StringExtensions.cs
- Verify all tests still pass (no behavioral changes)
- Single atomic commit
Plan Document: docs/plans/phase-2-file-split-plan.md
Summary Document: docs/plans/phase-2-file-split-summary.md
Phase 3: Performance Fixes
Objective: Apply performance optimizations to identified methods.
Deliverables:
- Cache regex patterns for StripWhitespace, GetFileExtension, StripHtml
- Optimize IsLowerCase, IsUpperCase
- Optimize ReplaceNonAlphanumericChars(string)
- Verify all tests still pass
Plan Document: docs/plans/phase-3-performance-fixes-plan.md
Summary Document: docs/plans/phase-3-performance-fixes-summary.md
Phase 4: Verification
Objective: Confirm refactor success through testing and benchmarking.
Deliverables:
- Run all unit tests - compare to Phase 1 baseline
- Run all benchmarks - compare to Phase 1 baseline
- Document performance improvements
- Final review and cleanup
Plan Document: docs/plans/phase-4-verification-plan.md
Summary Document: docs/plans/phase-4-verification-summary.md
7. Risk Mitigation
| Risk | Mitigation |
|---|---|
| Breaking changes | Partial class approach preserves class name |
| Method duplication | Atomic commit: create new files + delete old file together |
| Behavioral changes | Comprehensive baseline tests before changes |
| Performance regression | Benchmark before/after comparison |
8. Files Unchanged
These files remain untouched:
| File | Reason |
|---|---|
| src/Umbraco.Web.Common/Extensions/StringExtensions.cs | Internal, project-specific |
| src/Umbraco.Cms.Persistence.EFCore/StringExtensions.cs | Internal, project-specific |
| tests/.../StringExtensions.cs (various) | Test utilities, separate concern |
Approval
- Design reviewed and approved
- Phase 1 plan approved
- Phase 2 plan approved
- Phase 3 plan approved
- Phase 4 plan approved