Files
Umbraco-CMS/docs/plans/2025-12-19-contentservice-refactor-design-critical-review-3.md
yv01p 9962df50ee feat(core): register IContentCrudService in DI container
Adds IContentCrudService registration to UmbracoBuilder alongside
IContentService. Both services are now resolvable from DI.

Includes integration test verifying successful resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-21 03:31:23 +00:00

23 KiB

Critical Architectural Review: ContentService Refactoring Design v1.6

Reviewed Document: docs/plans/2025-12-19-contentservice-refactor-design.md Review Date: 2025-12-20 Reviewer Role: Senior Principal Software Architect Document Revision: 1.6 (post-critical-review changes applied) Review Revision: 3.0


Executive Summary

The ContentService refactoring design v1.6 is approved with changes. The design has matured through two prior reviews and addresses the core architectural challenges. This review identifies remaining refinements and documents resolutions for all outstanding issues.

Resolution Summary

Issue Decision
Facade permanence Keep facade, mark for future deprecation
Regression threshold Add automated guard script
Cross-service failures Add failure mode documentation
Phase 9 scope Split into sub-phases 9a-9d
Phase 0 deliverables Add test/benchmark file creation
ContentServiceBase testing Add unit tests
Async expansion Defer to future initiative
Rollback triggers Document for future reference (not required)

1. Overall Assessment

Strengths

  • Mature design - Two prior reviews have addressed major structural issues (naming, transactions, method mapping)
  • Comprehensive documentation - Lock contracts, notification matrix, 80+ method mapping all present
  • Risk-conscious phasing - Clear separation of extraction (0-8), locking (9), optimization (10+)
  • Test-first approach - 15 targeted tests + 33 benchmarks with phase gates
  • Architectural alignment - Follows Umbraco Core patterns (interface-first, notification system, scoping)

Issues Addressed in This Review

Category Severity Issue Resolution
Strategic Medium Facade permanence Future deprecation noted
Operational Medium Manual regression threshold Automated guard added
Technical Low-Medium Cross-service failure modes Documentation added
Process Low Phase 9 scope Split into sub-phases

2. Critical Issues and Resolutions

2.1 Facade Deprecation Strategy

Issue: The design declared IContentService as "permanent public API" with no deprecation path, creating indefinite dual-maintenance burden.

Resolution: The facade remains the stable public API for this refactoring but will be marked for future deprecation.

Required Design Update:

### Facade Deprecation Strategy

> The `IContentService` interface and its facade implementation remain the **stable public API**
> for this refactoring initiative. External consumers should continue using `IContentService`.
>
> **Future Deprecation:** In a future major version, `IContentService` will be marked as
> `[Obsolete]` with guidance to migrate to granular services (`IContentCrudService`,
> `IContentPublishOperationService`, etc.). The deprecation timeline will be announced
> separately and will include:
> - Minimum 2 major versions warning period
> - Migration guide documentation
> - Analyzer rules to identify usage patterns
>
> For now, the granular services are available for:
> - Internal Umbraco code
> - Advanced scenarios requiring fine-grained control
> - Early adopters willing to accept API changes

2.2 Automated Regression Guard

Issue: The design specified "20% degradation triggers investigation (manual, not automated)" which risks regressions slipping through unnoticed.

Resolution: Add an automated guard script to the phase gate process.

Required Design Update - Add to "Benchmark Execution Commands" section:

### Automated Regression Detection

Add this script to CI or use at each phase gate:

```bash
#!/bin/bash
# regression-guard.sh - Fails if any benchmark regresses >20%

set -e

BASELINE_FILE="${1:-baseline.json}"
CURRENT_FILE="${2:-current.json}"
THRESHOLD=${3:-20}

if [ ! -f "$BASELINE_FILE" ] || [ ! -f "$CURRENT_FILE" ]; then
    echo "Usage: regression-guard.sh <baseline.json> <current.json> [threshold]"
    exit 1
fi

# Calculate regressions and count violations
REGRESSION_COUNT=$(jq -s --argjson threshold "$THRESHOLD" '
  .[0].results as $base |
  .[1].results |
  to_entries |
  map(
    select(
      $base[.key] != null and
      ((.value.elapsedMs - $base[.key].elapsedMs) / $base[.key].elapsedMs * 100) > $threshold
    ) |
    {
      name: .key,
      baseline: $base[.key].elapsedMs,
      current: .value.elapsedMs,
      change: (((.value.elapsedMs - $base[.key].elapsedMs) / $base[.key].elapsedMs * 100) | . * 10 | round / 10)
    }
  )
' "$BASELINE_FILE" "$CURRENT_FILE" | jq 'length')

if [ "$REGRESSION_COUNT" -gt 0 ]; then
    echo "ERROR: $REGRESSION_COUNT benchmark(s) regressed more than ${THRESHOLD}%"
    echo ""
    echo "Regressions detected:"
    jq -s --argjson threshold "$THRESHOLD" '
      .[0].results as $base |
      .[1].results |
      to_entries |
      map(
        select(
          $base[.key] != null and
          ((.value.elapsedMs - $base[.key].elapsedMs) / $base[.key].elapsedMs * 100) > $threshold
        )
      ) |
      .[] |
      "  - \(.key): \($base[.key].elapsedMs)ms -> \(.value.elapsedMs)ms (+\(((.value.elapsedMs - $base[.key].elapsedMs) / $base[.key].elapsedMs * 100) | . * 10 | round / 10)%)"
    ' "$BASELINE_FILE" "$CURRENT_FILE" -r
    exit 1
fi

echo "SUCCESS: No regressions exceeding ${THRESHOLD}% threshold"
exit 0

Usage at Phase Gates:

# After running benchmarks at a phase gate
./regression-guard.sh baseline.json current.json 20

# Returns exit code 0 if all benchmarks within threshold
# Returns exit code 1 if any benchmark regresses >20%

---

### 2.3 Cross-Service Failure Mode Documentation

**Issue:** The design documented happy-path orchestration but didn't specify failure scenarios for facade methods that coordinate multiple services.

**Resolution:** Add explicit failure mode documentation.

**Required Design Update - Add new section:**

```markdown
## Failure Mode Documentation

### Orchestrated Operation Failure Behavior

The facade methods that coordinate multiple services use ambient scope transactions. All operations within a scope are atomic - if any operation fails, the entire transaction rolls back.

| Facade Method | Step 1 | Step 2 | Failure During Step 1 | Failure During Step 2 |
|---------------|--------|--------|----------------------|----------------------|
| `MoveToRecycleBin` | Unpublish (if published) | Move to bin | No changes persisted, content remains published | Entire operation rolls back, content remains published at original location |
| `DeleteOfType` | Move descendants to bin | Delete type content | No changes persisted | Entire operation rolls back, descendants remain at original locations |
| `DeleteOfTypes` | Move descendants to bin | Delete types content | No changes persisted | Entire operation rolls back, descendants remain at original locations |

### Transaction Rollback Guarantees

```csharp
// Example: MoveToRecycleBin transaction boundary
public OperationResult MoveToRecycleBin(IContent content, int userId)
{
    using ICoreScope scope = ScopeProvider.CreateCoreScope();

    try
    {
        // Step 1: Unpublish if published
        if (content.Published)
        {
            var unpublishResult = _publishOperationService.Unpublish(content, "*", userId);
            if (!unpublishResult.Success)
            {
                // Scope NOT completed - transaction rolls back
                return OperationResult.Failed(...);
            }
        }

        // Step 2: Move to recycle bin
        var moveResult = _moveService.MoveToRecycleBinInternal(content, userId);
        if (!moveResult.Success)
        {
            // Scope NOT completed - Unpublish also rolls back
            return moveResult;
        }

        scope.Complete(); // Only now does transaction commit
        return moveResult;
    }
    catch (Exception)
    {
        // Scope disposed without Complete() - full rollback
        throw;
    }
}

Notification Failure Handling

If a notification handler throws an exception:

  1. Cancellable notifications (*Saving, *Publishing, etc.): Operation is aborted, transaction rolls back
  2. Post-operation notifications (*Saved, *Published, etc.): Exception propagates, but database changes are already committed

Important: Post-operation notification failures can leave the system in an inconsistent state where database changes are persisted but downstream effects (cache invalidation, webhooks) may be incomplete. This is existing behavior that is preserved in the refactoring.

State After Failed Operations

Scenario Content State Cache State Notifications Fired
Unpublish fails during MoveToRecycleBin Unchanged (published, original location) Unchanged ContentUnpublishingNotification only
Move fails after successful unpublish Unchanged (published, original location)* Unchanged None persisted
Notification handler throws on ContentMovedNotification Moved to bin May be stale Partial

*Due to transaction rollback, the unpublish is also reverted.

// Consumer code handling orchestrated operations
var result = contentService.MoveToRecycleBin(content, userId);
if (!result.Success)
{
    // Content is guaranteed to be in original state
    // No partial changes have been persisted
    logger.LogWarning("MoveToRecycleBin failed: {Status}", result.OperationStatus);
}

---

### 2.4 Phase 9 Split Into Sub-Phases

**Issue:** Phase 9 (Hierarchical Locking) was described as a single phase but encompasses significant infrastructure changes with 58 lock acquisition points to migrate.

**Resolution:** Split Phase 9 into sub-phases for better risk management.

**Required Design Update - Replace Phase 9 in implementation order:**

```markdown
### Phase 9: Locking Optimization (Split)

Phase 9 is split into sub-phases for risk management:

| Sub-Phase | Focus | Gate | Benchmarks |
|-----------|-------|------|------------|
| 9a | Lock hierarchy design + scope infrastructure changes | Design reviewed, infrastructure tests pass | - |
| 9b | Migrate services with existing lock semantics (behavioral parity) | All tests pass, no behavioral change | - |
| 9c | Optimize hot paths (Move, PublishBranch) with granular locks | All tests pass | **Run benchmarks** |
| 9d | Final validation and documentation | All tests pass, lock contracts updated | **Run benchmarks** |

#### Phase 9a: Lock Hierarchy Design

**Deliverables:**
- Lock hierarchy design document (path-based vs operation-based vs hybrid)
- Scope infrastructure changes (if needed)
- Unit tests for new locking primitives

**Possible Approaches:**
```csharp
// Option A: Path-based locks (lock subtree only)
scope.WriteLock(Constants.Locks.ContentTree, content.Path);

// Option B: Operation-specific locks
scope.WriteLock(Constants.Locks.ContentTreeMove);
scope.WriteLock(Constants.Locks.ContentTreePublish);

// Option C: Hybrid (operation + path)
scope.WriteLock(Constants.Locks.ContentTreeMove, content.Path);

Phase 9b: Behavioral Parity Migration

Deliverables:

  • Each service updated to use new lock infrastructure
  • Existing lock behavior preserved (same scope, same duration)
  • All integration tests pass without modification

Key Rule: No optimization in this phase. Goal is to prove the new infrastructure works.

Phase 9c: Hot Path Optimization

Deliverables:

  • PerformMoveLocked optimized (batch updates, reduced lock duration)
  • PublishBranch optimized (subtree locks where applicable)
  • Lock contention tests added

Target Improvements:

  • Move operations: Lock duration reduced from O(n) to O(batch)
  • Branch publish: Concurrent non-overlapping branches enabled

Phase 9d: Validation and Documentation

Deliverables:

  • All lock contracts in interface documentation updated
  • Performance comparison report (baseline vs post-optimization)
  • Lock contention metrics documented

Git Checkpoints:

git tag phase-9a-complete -m "Lock infrastructure ready"
git tag phase-9b-complete -m "Services migrated to new locks (parity)"
git tag phase-9c-complete -m "Hot paths optimized"
git tag phase-9d-complete -m "Phase 9 complete - locking optimization done"

---

### 2.5 Phase 0 Deliverables Expansion

**Issue:** Phase 0 was described as "Write Tests" but didn't explicitly include benchmark file creation or baseline capture.

**Resolution:** Expand Phase 0 deliverables.

**Required Design Update - Replace Phase 0 details:**

```markdown
### Phase 0: Test and Benchmark Infrastructure

**Deliverables Checklist:**

- [ ] Create `ContentServiceRefactoringTests.cs` (15 integration tests)
  - 2 notification ordering tests
  - 3 sort operation tests
  - 3 DeleteOfType tests
  - 4 permission tests
  - 3 transaction boundary tests

- [ ] Create `ContentServiceRefactoringBenchmarks.cs` (33 benchmarks)
  - 7 CRUD operation benchmarks
  - 6 query operation benchmarks
  - 7 publish operation benchmarks
  - 8 move operation benchmarks
  - 4 version operation benchmarks
  - 1 baseline comparison test

- [ ] Create `ContentServiceBaseTests.cs` (unit tests for shared infrastructure)
  - Audit helper method tests
  - Scope provider access pattern tests
  - Logger injection tests

- [ ] Run baseline capture
  ```bash
  dotnet test tests/Umbraco.Tests.Integration \
    --filter "Category=Benchmark&FullyQualifiedName~ContentServiceRefactoringBenchmarks" \
    --logger "console;verbosity=detailed" | tee benchmark-baseline.txt

  # Extract JSON
  grep -oP '\[BENCHMARK_JSON\]\K.*(?=\[/BENCHMARK_JSON\])' benchmark-baseline.txt > baseline.json

  # Tag baseline commit
  git tag phase-0-baseline -m "Baseline benchmarks captured"
  • Verify all 15 tests pass against current ContentService
  • Verify all 33 benchmarks complete without error
  • Commit baseline.json to repository for comparison

Gate: All tests pass, baseline captured, tagged commit created.


---

### 2.6 ContentServiceBase Unit Tests

**Issue:** The design introduces `ContentServiceBase` as shared infrastructure but the 15 new tests focus on behavior validation, not base class coverage.

**Resolution:** Add unit tests for ContentServiceBase.

**Required Design Update - Add to Phase 0 deliverables and Test Strategy section:**

```markdown
### ContentServiceBase Unit Tests

Create `tests/Umbraco.Tests.UnitTests/Umbraco.Infrastructure/Services/ContentServiceBaseTests.cs`:

```csharp
[TestFixture]
public class ContentServiceBaseTests
{
    // Audit helper method tests
    [Test]
    public void Audit_WithValidParameters_CreatesAuditEntry()
    {
        // Arrange
        var auditService = Substitute.For<IAuditService>();
        var service = CreateTestService(auditService: auditService);

        // Act
        service.TestAudit(AuditType.Save, userId: 1, objectId: 100, message: "Test");

        // Assert
        auditService.Received(1).Write(
            Arg.Is<int>(1),
            Arg.Is<string>("Test"),
            Arg.Any<string>(),
            Arg.Is<int>(100));
    }

    [Test]
    public void Audit_WithNullMessage_UsesDefaultMessage()
    {
        // Verify default audit message behavior
    }

    // Scope provider access pattern tests
    [Test]
    public void CreateScope_ReturnsValidCoreScope()
    {
        // Verify scope creation works correctly
    }

    [Test]
    public void CreateScope_WithAmbientScope_ReusesExisting()
    {
        // Verify ambient scope detection
    }

    // Logger injection tests
    [Test]
    public void Logger_IsInjectedCorrectly()
    {
        // Verify logger is accessible and functional
    }

    [Test]
    public void Logger_UsesCorrectCategoryName()
    {
        // Verify log category matches service type
    }

    // Repository access tests
    [Test]
    public void DocumentRepository_IsAccessibleWithinScope()
    {
        // Verify repository access pattern
    }
}

Test Count: 7 unit tests for ContentServiceBase infrastructure.

Updated Test Summary:

Test File Test Count Purpose
ContentServiceRefactoringTests.cs 15 Integration tests for behavior validation
ContentServiceRefactoringBenchmarks.cs 33 Performance benchmarks
ContentServiceBaseTests.cs 7 Unit tests for shared infrastructure
Total 55

---

### 2.7 Async Expansion Strategy

**Issue:** The design mentioned async overloads for long-running operations but interface definitions showed only `EmptyRecycleBinAsync`.

**Resolution:** Async expansion is deferred to a future initiative.

**Required Design Update - Replace Async Considerations section:**

```markdown
## Async Considerations

### Current State

Only `EmptyRecycleBinAsync` currently has an async variant. This refactoring preserves the existing async surface.

### Future Async Expansion (Deferred)

The following operations are candidates for async variants in a future initiative:

| Operation | Reason for Async | Priority |
|-----------|-----------------|----------|
| `PublishBranch` | Tree traversal with many DB operations | High |
| `DeleteOfType`/`DeleteOfTypes` | Bulk operations | Medium |
| `Save` (batch) | Large batch operations | Medium |
| `Copy` (recursive) | Deep copy operations | Low |

**Decision:** Async expansion is out of scope for this refactoring. A separate initiative should address async patterns across all Umbraco services consistently.

**Tracking:** Create issue "Async service methods initiative" post-refactoring.

2.8 Production Rollback Triggers (Future Reference)

Issue: No guidance for production deployment rollback decisions.

Resolution: Document rollback triggers for future reference, but they are not required for this initiative.

Required Design Update - Add new section:

## Production Deployment Guidance (Future Reference)

This section documents rollback considerations for production deployments. These are **not required** for this refactoring initiative but are preserved for future reference.

### Rollback Trigger Indicators

If deploying phases incrementally to production, consider rollback if:

| Indicator | Threshold | Action |
|-----------|-----------|--------|
| Lock timeout rate | >5% increase | Investigate, consider pausing |
| P99 latency | >50% degradation | Rollback to previous phase |
| Error rate | >1% increase | Rollback immediately |
| CPU utilization | >30% increase sustained | Investigate before proceeding |

### Rollback Procedure

```bash
# Rollback to previous phase
git checkout phase-{N-1}-complete

# Rebuild and redeploy
dotnet build -c Release
# ... deployment steps ...

Phase-Specific Risks

Phase Risk Level Special Considerations
1-3 Low Read-heavy, minimal locking changes
4 Medium Move/Copy affect tree structure
5 High Publishing is core functionality
6-7 Low Permissions and blueprints are isolated
8 Medium Facade wiring affects all operations
9a-9d High Locking changes affect concurrency

Monitoring Recommendations

For production deployments, ensure monitoring covers:

  • Content operation latency (Save, Publish, Move)
  • Lock acquisition time
  • Database connection pool utilization
  • Cache hit/miss rates
  • Notification handler execution time

Note: This guidance is for future production deployments. The refactoring should be thoroughly tested in non-production environments before any production consideration.


---

## 3. Alternative Architectural Approaches

### Considered: Event-Sourced State Changes

An alternative approach using event sourcing was considered:
- All content mutations emit domain events
- Repository changes become projections
- Notifications become event handlers
- Locking becomes optimistic concurrency

**Decision:** The current approach (decomposition with existing patterns) is appropriate. Event sourcing would require a massive architectural shift that doesn't fit Umbraco's existing patterns. If Phase 9 locking optimization proves insufficient for high-concurrency scenarios, event sourcing could be revisited as a longer-term solution.

---

## 4. Final Recommendation

### Approved With Required Changes

The design is approved for implementation after incorporating the changes documented in this review.

### Required Changes Summary

| Priority | Change | Section |
|----------|--------|---------|
| **P0** | Add facade deprecation strategy | 2.1 |
| **P0** | Add automated regression guard script | 2.2 |
| **P0** | Add failure mode documentation | 2.3 |
| **P0** | Split Phase 9 into 9a-9d | 2.4 |
| **P0** | Expand Phase 0 deliverables | 2.5 |
| **P1** | Add ContentServiceBase unit tests | 2.6 |
| **P1** | Document async expansion as future work | 2.7 |
| **P2** | Add production rollback guidance | 2.8 |

### Implementation Readiness

Once the above changes are incorporated into the design document (resulting in version 1.7), implementation can proceed with Phase 0.

---

## Appendix A: Review History

| Review | Date | Key Issues | Resolution |
|--------|------|------------|------------|
| 1.0 | 2025-12-19 | Naming collision, transaction boundaries, method mapping | Addressed in v1.1-1.3 |
| 2.0 | 2025-12-20 | Lock contention, benchmark infrastructure, Phase 9 scope | Addressed in v1.4-1.6 |
| 3.0 | 2025-12-20 | Facade strategy, regression automation, failure modes | This review |

## Appendix B: Cumulative Issue Tracker

| Issue | Review Identified | Resolution | Design Version |
|-------|------------------|------------|----------------|
| Naming collision | 1.0 | Renamed to IContentPublishOperationService | 1.1 |
| Transaction boundaries | 1.0 | Ambient scope pattern documented | 1.2 |
| Method mapping | 1.0 | Complete 80+ method mapping | 1.2 |
| Public/internal classification | 1.0 | Query and Versioning promoted to public | 1.2 |
| Notification matrix | 1.0 | Matrix added | 1.3 |
| Lock contention | 2.0 | Deferred to Phase 9 | 1.5 |
| Benchmark infrastructure | 2.0 | Integration test pattern | 1.5 |
| Git checkpoints | 2.0 | Tagging convention added | 1.6 |
| Facade permanence | 2.0 | Future deprecation noted | **1.7** |
| Regression automation | 3.0 | Guard script added | **1.7** |
| Failure modes | 3.0 | Documentation added | **1.7** |
| Phase 9 scope | 3.0 | Split into 9a-9d | **1.7** |
| Phase 0 deliverables | 3.0 | Expanded checklist | **1.7** |
| ContentServiceBase tests | 3.0 | Unit tests added | **1.7** |
| Async expansion | 3.0 | Deferred to future | **1.7** |
| Rollback triggers | 3.0 | Documented for reference | **1.7** |

## Appendix C: Updated Phase Structure

| Phase | Description | Gate | Benchmarks |
|-------|-------------|------|------------|
| 0 | Test and benchmark infrastructure | All tests pass, baseline captured | **Baseline** |
| 1 | CRUD Service | All tests pass | **Run** |
| 2 | Query Service | All tests pass | - |
| 3 | Version Service | All tests pass | - |
| 4 | Move Service | All tests pass | - |
| 5 | Publish Operation Service | All tests pass | **Run** |
| 6 | Permission Manager | All tests pass | - |
| 7 | Blueprint Manager | All tests pass | - |
| 8 | Facade | Full suite passes | **Run** |
| 9a | Lock hierarchy design | Design reviewed | - |
| 9b | Lock migration (parity) | All tests pass | - |
| 9c | Hot path optimization | All tests pass | **Run** |
| 9d | Validation and documentation | All tests pass | **Run** |
| 10+ | Performance optimization | Per-optimization | Per-optimization |

---

*End of Critical Architectural Review v3.0*