From 43031ca472c8895be650883cc530496ad4156502 Mon Sep 17 00:00:00 2001 From: yv01p Date: Thu, 25 Dec 2025 00:03:35 +0000 Subject: [PATCH] docs: add further refactoring recommendations for ContentService MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Post-Phase 8 analysis identifying improvement opportunities: - High priority: batch HasChildren to fix 142% regression - Medium priority: N+1 query fixes (GetIdsForKeys, GetSchedulesByContentIds) - Low priority: split ContentPublishOperationService (1758 lines) Includes implementation roadmap and success metrics. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../FurtherRefactoringRecommendations.md | 325 ++++++++++++++++++ 1 file changed, 325 insertions(+) create mode 100644 docs/plans/FurtherRefactoringRecommendations.md diff --git a/docs/plans/FurtherRefactoringRecommendations.md b/docs/plans/FurtherRefactoringRecommendations.md new file mode 100644 index 0000000000..87d39a82f9 --- /dev/null +++ b/docs/plans/FurtherRefactoringRecommendations.md @@ -0,0 +1,325 @@ +# ContentService Refactoring - Further Recommendations + +**Generated:** 2025-12-25 +**Branch:** `refactor/ContentService` +**Status:** Post-Phase 8 analysis + +--- + +## Executive Summary + +The ContentService refactoring successfully reduced a 3800-line monolithic class to a 923-line facade with 7 specialized services. Performance benchmarks show an overall 4.1% improvement, but identified specific areas for further optimization. + +This document outlines actionable recommendations to address: +- One significant performance regression (HasChildren +142%) +- Unimplemented N+1 query optimizations from the original design +- Architectural improvements for long-term maintainability + +--- + +## High Priority Recommendations + +### 1. Fix HasChildren Regression + +**Problem:** The `HasChildren_100Nodes` benchmark shows a +142% regression (65ms → 157ms). + +**Root Cause:** Each `HasChildren(int id)` call creates a new scope and executes a separate COUNT query: + +```csharp +// Current implementation in ContentCrudService.cs:380-388 +public bool HasChildren(int id) +{ + using (ICoreScope scope = ScopeProvider.CreateCoreScope(autoComplete: true)) + { + scope.ReadLock(Constants.Locks.ContentTree); + IQuery? query = Query()?.Where(x => x.ParentId == id); + var count = DocumentRepository.Count(query); + return count > 0; + } +} +``` + +When called 100 times (as in the benchmark), this creates 100 scopes, acquires 100 read locks, and executes 100 database queries. + +**Solution:** Add a batch `HasChildren` overload: + +```csharp +// Add to IContentCrudService.cs +/// +/// Checks if multiple content items have children. +/// +/// The content IDs to check. +/// Dictionary mapping each ID to whether it has children. +/// +/// Performance: Single database query regardless of input size. +/// Use this instead of calling HasChildren(int) in a loop. +/// +IReadOnlyDictionary HasChildren(IEnumerable ids); +``` + +```csharp +// Implementation in ContentCrudService.cs +public IReadOnlyDictionary HasChildren(IEnumerable ids) +{ + var idList = ids.ToList(); + if (idList.Count == 0) + return new Dictionary(); + + using var scope = ScopeProvider.CreateCoreScope(autoComplete: true); + scope.ReadLock(Constants.Locks.ContentTree); + + // Single query: SELECT ParentId, COUNT(*) FROM umbracoNode + // WHERE ParentId IN (...) GROUP BY ParentId + var childCounts = DocumentRepository.CountChildrenByParentIds(idList); + + return idList.ToDictionary( + id => id, + id => childCounts.GetValueOrDefault(id, 0) > 0 + ); +} +``` + +**Repository addition required:** + +```csharp +// Add to IDocumentRepository.cs +IReadOnlyDictionary CountChildrenByParentIds(IEnumerable parentIds); +``` + +**Effort:** 2-4 hours +**Impact:** Reduces 100 database round-trips to 1, fixing the 142% regression + +--- + +## Medium Priority Recommendations + +### 2. Implement Planned N+1 Query Fixes + +The original design document (`2025-12-19-contentservice-refactor-design.md`) lists batch operations that were planned but not implemented: + +| Planned Method | Purpose | Location | +|----------------|---------|----------| +| `GetIdsForKeys(Guid[] keys)` | Batch key-to-id resolution | IIdKeyMap | +| `GetSchedulesByContentIds(int[] ids)` | Batch schedule lookups | ContentScheduleRepository | +| `ArePathsPublished(int[] contentIds)` | Batch path validation | ContentCrudService | +| `GetParents(int[] contentIds)` | Batch ancestor lookups | ContentCrudService | + +**Key hotspots identified in design doc:** +- `GetContentSchedulesByIds` (line 1025-1049): N+1 in `_idKeyMap.GetIdForKey` calls +- `IsPathPublishable` (line 1070): Repeated single-item lookups +- `GetAncestors` (line 792): Repeated single-item lookups + +**Recommended implementation order:** +1. `GetIdsForKeys` - Used across multiple services +2. `GetSchedulesByContentIds` - Direct performance impact +3. `ArePathsPublished` - Affects publish operations +4. `GetParents` - Affects tree operations + +**Effort:** 4-8 hours per method +**Impact:** Eliminates N+1 query patterns in identified hotspots + +--- + +### 3. Reduce Scope Creation Overhead + +Many single-item operations create a new scope per call. For repeated operations, this adds measurable overhead. + +**Current pattern:** +```csharp +public bool HasChildren(int id) +{ + using (ICoreScope scope = ScopeProvider.CreateCoreScope(autoComplete: true)) + { + // ... operation + } +} +``` + +**Improved pattern - add internal overloads:** +```csharp +// Public API - creates scope +public bool HasChildren(int id) +{ + using var scope = ScopeProvider.CreateCoreScope(autoComplete: true); + scope.ReadLock(Constants.Locks.ContentTree); + return HasChildrenInternal(id); +} + +// Internal - reuses existing scope (for batch operations) +internal bool HasChildrenInternal(int id) +{ + IQuery? query = Query()?.Where(x => x.ParentId == id); + return DocumentRepository.Count(query) > 0; +} + +// Batch operation reuses scope for all items +public IReadOnlyDictionary HasChildren(IEnumerable ids) +{ + using var scope = ScopeProvider.CreateCoreScope(autoComplete: true); + scope.ReadLock(Constants.Locks.ContentTree); + + // Prefer single query, but fallback shows the pattern + return ids.ToDictionary(id => id, HasChildrenInternal); +} +``` + +**Effort:** 2-4 hours +**Impact:** Reduces overhead for internal batch operations + +--- + +### 4. Add Missing Lock Documentation + +The design document specifies that each service should document its lock contracts. This was partially implemented but could be more comprehensive. + +**Current state:** Some methods have lock documentation in XML comments. + +**Recommended additions:** + +```csharp +/// +/// Saves content items. +/// +/// +/// Lock contract: Acquires WriteLock(ContentTree) before any modifications. +/// Cache invalidation: Fires ContentCacheRefresher for all saved items. +/// Notifications: ContentSavingNotification (cancellable), ContentSavedNotification. +/// +OperationResult Save(IContent content, int userId = Constants.Security.SuperUserId); +``` + +**Effort:** 2-4 hours +**Impact:** Improved developer experience and debugging + +--- + +## Low Priority Recommendations + +### 5. Consider Splitting ContentPublishOperationService + +At 1758 lines, `ContentPublishOperationService` is the largest service - exceeding the original goal of keeping services under 800 lines. + +**Current line counts:** +| Service | Lines | +|---------|------:| +| ContentService (facade) | 923 | +| ContentCrudService | 806 | +| ContentPublishOperationService | **1758** | +| ContentMoveOperationService | 605 | +| ContentVersionOperationService | 230 | +| ContentQueryOperationService | 169 | + +**Potential extractions:** + +1. **ContentScheduleService** (~200-300 lines) + - `PerformScheduledPublish` + - `GetContentSchedulesByIds` + - Schedule-related validation + +2. **ContentPublishBranchService** (~300-400 lines) + - `PublishBranch` + - `PublishBranchInternal` + - Branch validation logic + +**Effort:** 8-16 hours +**Impact:** Better maintainability, clearer separation of concerns + +--- + +### 6. Add Performance Documentation to Interfaces + +Help developers choose the right methods by documenting performance characteristics: + +```csharp +/// +/// Gets a single content item by ID. +/// +/// +/// Performance: O(1) database query. +/// For multiple items, use +/// to avoid N+1 queries. +/// +IContent? GetById(int id); + +/// +/// Gets multiple content items by ID. +/// +/// +/// Performance: Single database query regardless of input size. +/// Preferred over calling GetById in a loop. +/// +IEnumerable GetByIds(IEnumerable ids); +``` + +**Effort:** 2-4 hours +**Impact:** Helps developers avoid performance pitfalls + +--- + +### 7. Address Remaining TODO Comments + +One TODO was found in the codebase: + +```csharp +// src/Umbraco.Infrastructure/Services/ContentListViewServiceBase.cs:227 +// TODO: Optimize the way we filter out only the nodes the user is allowed to see - +// instead of checking one by one +``` + +This suggests a similar N+1 pattern in permission checking that could benefit from batch optimization. + +**Effort:** 4-8 hours +**Impact:** Improved list view performance + +--- + +## Implementation Roadmap + +### Phase 1: Quick Wins (1-2 days) +- [ ] Implement batch `HasChildren(IEnumerable)` +- [ ] Add `CountChildrenByParentIds` to repository +- [ ] Update benchmark to use batch method where appropriate + +### Phase 2: N+1 Elimination (3-5 days) +- [ ] Implement `GetIdsForKeys(Guid[] keys)` in IIdKeyMap +- [ ] Implement `GetSchedulesByContentIds(int[] ids)` +- [ ] Add internal scope-reusing method overloads + +### Phase 3: Documentation & Polish (1-2 days) +- [ ] Add lock contract documentation to all public methods +- [ ] Add performance documentation to interfaces +- [ ] Update design document with lessons learned + +### Phase 4: Architectural (Optional, 1-2 weeks) +- [ ] Evaluate splitting ContentPublishOperationService +- [ ] Address ContentListViewServiceBase TODO +- [ ] Implement remaining batch methods (ArePathsPublished, GetParents) + +--- + +## Success Metrics + +After implementing these recommendations, re-run benchmarks and verify: + +| Metric | Current | Target | +|--------|---------|--------| +| HasChildren_100Nodes | 157ms | <70ms (back to baseline) | +| Overall regression count | 5 | 0 (none >20%) | +| ContentPublishOperationService size | 1758 lines | <1000 lines | +| N+1 hotspots | 4 identified | 0 | + +--- + +## References + +- **Benchmark Results:** `docs/plans/PerformanceBenchmarks.md` +- **Original Design:** `docs/plans/2025-12-19-contentservice-refactor-design.md` +- **Phase Summaries:** `docs/plans/2025-12-2*-contentservice-refactor-phase*-implementation-summary*.md` + +--- + +## Conclusion + +The ContentService refactoring achieved its primary goals of improved maintainability and code organization. The recommendations in this document represent "Phase 2" optimizations that would further improve performance and address technical debt identified during the refactoring process. + +The highest-impact change is implementing batch `HasChildren`, which would resolve the only significant performance regression and establish a pattern for eliminating other N+1 queries throughout the codebase.