Performance & Scalability Guide
This guide covers performance optimization strategies, scalability patterns, and tuning recommendations for Tenant Operator.
Performance Architecture
Three-Layer Reconciliation Strategy
Tenant Operator reconciles through three layers, from immediate event handling to periodic datasource sync. This keeps status fresh while limiting API server load:
```text
┌─────────────────────────────────────────────────┐
│ Layer 1: Event-Driven (Immediate)               │
│ - Watch predicates filter changes               │
│ - Only Generation/Annotation changes trigger    │
│ - Namespace changes via labels                  │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│ Layer 2: Periodic Reconciliation (30 seconds)   │
│ - Fast status reflection                        │
│ - Child resource status changes                 │
│ - Drift detection                               │
└─────────────────────────────────────────────────┘
                        ↓
┌─────────────────────────────────────────────────┐
│ Layer 3: Database Sync (Configurable, 1 min)    │
│ - Registry syncs with datasource                │
│ - Create/Update/Delete Tenant CRs               │
└─────────────────────────────────────────────────┘
```
Key Optimizations
1. Smart Watch Predicates ✅
Filters unnecessary reconciliations by watching only meaningful changes:
```go
// predicate is sigs.k8s.io/controller-runtime/pkg/predicate.
// Only reconcile on:
// - Generation changes (spec updates)
// - Annotation changes
// - Excludes status-only updates (they do not bump metadata.generation)
ownedResourcePredicate := predicate.Or(
	predicate.GenerationChangedPredicate{},
	predicate.AnnotationChangedPredicate{},
)
```
Impact:
- 70-80% reduction in reconciliation overhead
- Eliminates status update loops
- CPU usage reduced by ~50%
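The filtering rule can be made concrete with a stdlib-only sketch of the decision that `predicate.Or(GenerationChangedPredicate, AnnotationChangedPredicate)` makes for an update event. The `objectMeta` type and the helper functions are hypothetical stand-ins for the object metadata controller-runtime inspects. Only update events are modeled; create and delete events are not shown.

```go
package main

import (
	"fmt"
	"reflect"
)

// objectMeta is a hypothetical stand-in for the metadata fields the
// predicates look at; it is not a Kubernetes type.
type objectMeta struct {
	Generation  int64
	Annotations map[string]string
}

// generationChanged: spec edits bump metadata.generation, status-only writes do not.
func generationChanged(oldObj, newObj objectMeta) bool {
	return oldObj.Generation != newObj.Generation
}

// annotationChanged compares the annotation maps with reflect.DeepEqual.
func annotationChanged(oldObj, newObj objectMeta) bool {
	return !reflect.DeepEqual(oldObj.Annotations, newObj.Annotations)
}

// shouldReconcileUpdate ORs the two checks, as predicate.Or does.
func shouldReconcileUpdate(oldObj, newObj objectMeta) bool {
	return generationChanged(oldObj, newObj) || annotationChanged(oldObj, newObj)
}

func main() {
	base := objectMeta{Generation: 3, Annotations: map[string]string{"team": "a"}}
	statusOnly := base // a status write leaves generation and annotations untouched
	specEdit := objectMeta{Generation: 4, Annotations: base.Annotations}
	annotated := objectMeta{Generation: 3, Annotations: map[string]string{"team": "b"}}

	fmt.Println(shouldReconcileUpdate(base, statusOnly)) // false
	fmt.Println(shouldReconcileUpdate(base, specEdit))   // true
	fmt.Println(shouldReconcileUpdate(base, annotated))  // true
}
```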
2. Fast Requeue Interval ✅
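The periodic requeue interval was reduced from 5 minutes to 30 seconds:
```go
// ctrl is sigs.k8s.io/controller-runtime.
// Before: requeue every 5 minutes
// return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil

// After: requeue every 30 seconds
return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
```
Impact: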
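- Child resource status reflected within 30s
- Worst-case status delay cut 10x (5 minutes → 30 seconds)
- Keeps cluster load balanced: the periodic cost is one reconcile per Tenant every 30 seconds, so it grows linearly with Tenant count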
3. Namespace Tracking ✅
Namespaces are cluster-scoped, so they cannot carry an ownerReference to a namespaced Tenant. The operator tracks them with labels instead:
```go
// Automatic labels added to Namespaces
labels["kubernetes-tenants.org/tenant"] = tenant.Name
labels["kubernetes-tenants.org/tenant-namespace"] = tenant.Namespace
```
Impact:
- Immediate namespace change detection
- No polling required
- Efficient label-based queries
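The following stdlib-only sketch shows how a Namespace event could be mapped back to its Tenant using the two labels above. The `tenantKey` type and the `tenantForNamespace` function are hypothetical. In a controller-runtime controller, this logic would typically live in a mapping function passed to `handler.EnqueueRequestsFromMapFunc`, with `types.NamespacedName` as the key.

```go
package main

import "fmt"

// Label keys the operator sets on each Namespace it manages.
const (
	tenantLabel          = "kubernetes-tenants.org/tenant"
	tenantNamespaceLabel = "kubernetes-tenants.org/tenant-namespace"
)

// tenantKey is a hypothetical stand-in for a reconcile request key.
type tenantKey struct {
	Namespace, Name string
}

// tenantForNamespace maps a Namespace's labels back to the owning Tenant.
// It returns false for Namespaces without both labels, so their events can
// be dropped without enqueueing any work.
func tenantForNamespace(labels map[string]string) (tenantKey, bool) {
	name, hasName := labels[tenantLabel]
	ns, hasNS := labels[tenantNamespaceLabel]
	if !hasName || !hasNS || name == "" || ns == "" {
		return tenantKey{}, false
	}
	return tenantKey{Namespace: ns, Name: name}, true
}

func main() {
	managed := map[string]string{tenantLabel: "acme", tenantNamespaceLabel: "tenants"}
	unmanaged := map[string]string{"team": "infra"}
	for _, labels := range []map[string]string{managed, unmanaged} {
		if key, ok := tenantForNamespace(labels); ok {
			fmt.Printf("enqueue Tenant %s/%s\n", key.Namespace, key.Name)
		} else {
			fmt.Println("not managed by the operator; ignored")
		}
	}
}
```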
4. Server-Side Apply (SSA)
Default patch strategy uses Kubernetes SSA:
```yaml
patchStrategy: apply # Default
```
Benefits:
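- Fewer update conflicts: there is no read-modify-write cycle, and conflicts are reported per field rather than per object
- Field-level ownership (tracked in `metadata.managedFields`)
- The API server computes the merge, so the operator sends only the fields it manages
- Preserves fields set by other controllers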
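The ownership rules behind these benefits can be illustrated with a toy model. This is not the Kubernetes implementation: the API server records ownership in `metadata.managedFields` and works on structured field paths. The `object` type, the `apply` method, and the manager names are all hypothetical. The model also omits one real behavior, the removal of fields that a manager stops sending.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a toy model: a value per field path plus the set of managers owning it.
type object struct {
	values map[string]string
	owners map[string]map[string]bool
}

func newObject() *object {
	return &object{values: map[string]string{}, owners: map[string]map[string]bool{}}
}

// apply submits cfg on behalf of manager. If cfg would change a field owned by
// another manager, the whole apply is rejected and the conflicting paths are
// returned, unless force is set, in which case the applier takes ownership.
func (o *object) apply(manager string, cfg map[string]string, force bool) []string {
	var conflicts []string
	for path, v := range cfg {
		if cur, exists := o.values[path]; !exists || cur == v {
			continue
		}
		for owner := range o.owners[path] {
			if owner != manager {
				conflicts = append(conflicts, path)
				break
			}
		}
	}
	if len(conflicts) > 0 && !force {
		sort.Strings(conflicts)
		return conflicts
	}
	for path, v := range cfg {
		if cur, exists := o.values[path]; exists && cur == v {
			o.owners[path][manager] = true // same value: shared ownership
			continue
		}
		o.values[path] = v
		o.owners[path] = map[string]bool{manager: true} // new or changed: sole owner
	}
	return nil
}

func main() {
	obj := newObject()
	obj.apply("tenant-operator", map[string]string{"spec.image": "app:v1"}, false)
	obj.apply("other-controller", map[string]string{"spec.replicas": "5"}, false)
	// The operator re-applies only its own fields; spec.replicas is preserved.
	obj.apply("tenant-operator", map[string]string{"spec.image": "app:v2"}, false)
	fmt.Println(obj.values["spec.image"], obj.values["spec.replicas"]) // app:v2 5
	// Changing a field owned by another manager without force is rejected.
	fmt.Println(obj.apply("tenant-operator", map[string]string{"spec.replicas": "2"}, false)) // [spec.replicas]
}
```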
Scalability Benchmarks
Tested Configurations
| Tenants | Templates | Resources/Tenant | Total Resources | Reconciliation Time | Memory Usage |
|---|---|---|---|---|---|
| TODO | TODO | TODO | TODO | TODO | TODO |
Data needed
Benchmark figures are placeholders; capture real metrics from staging clusters before relying on these numbers.
Resource Optimization
1. Template Efficiency
Good - Efficient template:
```yaml
nameTemplate: "{{ .uid }}-app"
```
Bad - Complex template:
```yaml
nameTemplate: "{{ .uid }}-{{ .region }}-{{ .planId }}-{{ .timestamp }}"
# Avoid: timestamp, random values, complex logic
```
Tips:
- Keep templates simple
- Avoid random/timestamp values: they render a different name on every reconcile, which breaks caching
- Use consistent naming patterns
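The next sketch shows why determinism matters. It assumes Go `text/template` semantics, which the `{{ .uid }}` syntax suggests; the operator's actual template engine and helper functions may differ, and `renderName` is hypothetical. Rendering the same tenant data twice yields the same name, while a timestamp that changes between reconciles yields a new name each time.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderName executes a nameTemplate against per-tenant data; missing keys
// are reported as errors instead of rendering "<no value>".
func renderName(nameTemplate string, data map[string]any) (string, error) {
	t, err := template.New("name").Option("missingkey=error").Parse(nameTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	tenant := map[string]any{"uid": "acme", "region": "eu", "planId": "pro"}

	// Deterministic: the same tenant data always renders the same name.
	a, _ := renderName("{{ .uid }}-app", tenant)
	b, _ := renderName("{{ .uid }}-app", tenant)
	fmt.Println(a, a == b) // acme-app true

	// Non-deterministic: simulate two reconciles with different timestamps.
	tenant["timestamp"] = "1700000000"
	c, _ := renderName("{{ .uid }}-{{ .timestamp }}", tenant)
	tenant["timestamp"] = "1700000030"
	d, _ := renderName("{{ .uid }}-{{ .timestamp }}", tenant)
	fmt.Println(c == d) // false: each reconcile would target a different object
}
```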
2. Dependency Graph Optimization
Good - Shallow dependency tree:
```yaml
resources:
  - id: ns   # No dependencies
  - id: app  # Depends on: ns
  - id: svc  # Depends on: app
# Depth: 3
```
Bad - Deep dependency tree:
```yaml
resources:
  - id: a  # Depends on: none
  - id: b  # Depends on: a
  - id: c  # Depends on: b
  - id: d  # Depends on: c
  - id: e  # Depends on: d
# Depth: 5 (slow)
```
Impact:
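- Shallow trees: resources at the same depth have no ordering constraint between them, so they can be applied in parallel
- Deep trees: each level waits for the one before it, so execution is sequential and time grows with depth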
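Depth can be turned into an ordering by grouping resources into dependency levels with Kahn's topological sort, as sketched below. `applyLevels` is hypothetical and says nothing about how Tenant Operator schedules applies internally. The `wide` graph is a hypothetical variant of the good example in which `svc` depends only on `ns`.

```go
package main

import (
	"fmt"
	"sort"
)

// applyLevels groups resource IDs into levels: every resource in a level
// depends only on resources in earlier levels. The number of levels equals
// the length of the longest dependency chain. deps maps each ID to the IDs
// it depends on; cycles and unknown IDs are reported as errors.
func applyLevels(deps map[string][]string) ([][]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for id, ds := range deps {
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("%s depends on unknown resource %q", id, d)
			}
			dependents[d] = append(dependents[d], id)
		}
		indegree[id] = len(ds)
	}
	var current []string
	for id, n := range indegree {
		if n == 0 {
			current = append(current, id)
		}
	}
	var levels [][]string
	placed := 0
	for len(current) > 0 {
		sort.Strings(current) // deterministic output
		levels = append(levels, current)
		placed += len(current)
		var next []string
		for _, id := range current {
			for _, dep := range dependents[id] {
				indegree[dep]--
				if indegree[dep] == 0 {
					next = append(next, dep)
				}
			}
		}
		current = next
	}
	if placed != len(deps) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return levels, nil
}

func main() {
	chain := map[string][]string{"a": nil, "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}
	wide := map[string][]string{"ns": nil, "app": {"ns"}, "svc": {"ns"}}
	for _, g := range []map[string][]string{chain, wide} {
		levels, err := applyLevels(g)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(len(levels), levels) // 5 [[a] [b] [c] [d] [e]], then 2 [[ns] [app svc]]
	}
}
```

The good example above is itself a three-level chain. Letting `svc` depend only on `ns` would collapse it to two levels, with `app` and `svc` side by side.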
Monitoring Performance
Key Metrics
Work in progress
Document recommended alert thresholds and dashboards after validating metrics in production.
Performance Alerts
Work in progress
Define actionable alert thresholds (latency, failure rates) once production benchmarks are finalized.
Advanced Optimization Techniques
Sharding
Planned feature
Sharding support is under design for v1.2 to scale across multiple controller replicas. It is not yet implemented.
See Also
- Monitoring Guide - Metrics and alerting
- Configuration Guide - Operator configuration
- Troubleshooting Guide - Common issues
