Troubleshooting Guide
Common issues and solutions for Tenant Operator.
General Debugging
Check Operator Status
# Check if operator is running
kubectl get pods -n tenant-operator-system
# View operator logs
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager -f
# Check operator events
kubectl get events -n tenant-operator-system --sort-by='.lastTimestamp'
Check CRD Status
# List all Tenant CRs
kubectl get tenants --all-namespaces
# Describe a specific Tenant
kubectl describe tenant <tenant-name>
# Get Tenant status
kubectl get tenant <tenant-name> -o jsonpath='{.status}'
Common Issues
1. Webhook TLS Certificate Errors
Error:
open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory
Cause: Webhook TLS certificates not found
Solutions:
Install cert-manager:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
2. Tenant Not Creating Resources
Symptoms:
- Tenant CR exists
- Status shows desiredResources > 0
- But readyResources = 0
Diagnosis:
# Check Tenant status
kubectl get tenant <name> -o yaml
# Check events
kubectl describe tenant <name>
# Check operator logs
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager | grep <tenant-name>
Common Causes:
A. Template Rendering Error
# Look for: "Failed to render resource"
kubectl describe tenant <name> | grep -A5 "TemplateRenderError"
Solution: Fix template syntax in TenantTemplate
B. Missing Variable
# Look for: "map has no entry for key"
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager | grep -E "missing|map has no entry for key"
Solution: Add missing variable to extraValueMappings
C. Resource Conflict
# Look for: "ResourceConflict"
kubectl describe tenant <name> | grep "ResourceConflict"
Solution: Delete conflicting resource or use conflictPolicy: Force
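When many Tenants exist, it can help to list only those that match the symptoms of this issue (desiredResources > 0 with fewer ready resources). The following is a minimal sketch, not part of the operator: the helper names `tenant_status_lines` and `filter_not_ready` are hypothetical, and it assumes the `desiredResources` and `readyResources` status fields used elsewhere in this guide.

```shell
# Hypothetical helper: print one line per Tenant as "namespace/name desired ready".
# An unset status field is printed as an empty string.
tenant_status_lines() {
  kubectl get tenants --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.status.desiredResources}{" "}{.status.readyResources}{"\n"}{end}'
}

# Hypothetical helper: keep only Tenants that want resources but have fewer ready.
# A missing ready count is treated as 0.
filter_not_ready() {
  awk '$2 > 0 && $3 + 0 < $2 { printf "%s desired=%d ready=%d\n", $1, $2, $3 }'
}
```

Running `tenant_status_lines | filter_not_ready` prints the Tenants worth inspecting with the diagnosis commands above.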
3. Database Connection Failures
Error:
Failed to query database: dial tcp: connect: connection refused
Diagnosis:
# Check secret exists
kubectl get secret <mysql-secret> -o yaml
# Check Registry status
kubectl get tenantregistry <name> -o yaml
# Test database connection from a pod
kubectl run -it --rm mysql-test --image=mysql:8 --restart=Never -- \
  mysql -h <host> -u <user> -p<password> -e "SELECT 1"
Solutions:
A. Verify credentials:
kubectl get secret <mysql-secret> -o jsonpath='{.data.password}' | base64 -d
B. Check network connectivity:
kubectl exec -n tenant-operator-system deployment/tenant-operator-controller-manager -- \
  nc -zv <mysql-host> 3306
C. Verify TenantRegistry configuration:
spec:
  source:
    mysql:
      host: mysql.default.svc.cluster.local # Correct FQDN
      port: 3306
      database: tenants
4. Tenant Status Not Updating
Symptoms:
- Resources are ready in cluster
- Tenant status shows readyResources = 0
Causes:
- Reconciliation not triggered
- Readiness check failing
Solutions:
A. Force reconciliation:
# Add annotation to trigger reconciliation
kubectl annotate tenant <name> force-sync="$(date +%s)" --overwrite
B. Check readiness logic:
# For Deployments
kubectl get deployment <name> -o jsonpath='{.status}'
# Check if replicas match
kubectl get deployment <name> -o jsonpath='{.spec.replicas} {.status.availableReplicas}'
C. Wait longer (resources take time to become ready):
- Deployments: 30s - 2min
- Jobs: Variable
- Ingresses: 10s - 1min
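The replica comparison in step B can be scripted so that the result is explicit. This is a simplified sketch: `replicas_ready` is a hypothetical helper, and the operator's own readiness logic may check more than available replicas. Kubernetes omits `availableReplicas` from the status when it is zero, so a missing second value is treated as 0.

```shell
# Hypothetical helper: compare desired replicas with available replicas.
# $1: .spec.replicas   $2: .status.availableReplicas (may be missing)
replicas_ready() {
  desired="${1:-0}"
  available="${2:-0}"
  if [ "$available" -ge "$desired" ]; then
    echo "ready ($available/$desired available)"
  else
    echo "not ready ($available/$desired available)"
    return 1
  fi
}

# Usage (the unquoted substitution is intentional so the two numbers become two arguments):
# replicas_ready $(kubectl get deployment <name> -o jsonpath='{.spec.replicas} {.status.availableReplicas}')
```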
5. Template Variables Not Substituting
Symptoms:
- Template shows {{ .uid }} literally in resources
- Variables not replaced
Cause: Templates not rendered correctly
Diagnosis:
# Check rendered Tenant spec
kubectl get tenant <name> -o jsonpath='{.spec.deployments[0].nameTemplate}'
Solution:
- Ensure Registry has correct valueMappings
- Check database column names match mappings
- Verify tenant row has non-empty values
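To confirm the symptom on a resource that is already in the cluster, one option is to scan its manifest for leftover template delimiters. This is a sketch only: `find_unrendered` is a hypothetical helper, and a match is a hint rather than proof, since some resources legitimately contain `{{` (for example, embedded configuration for other templating tools).

```shell
# Hypothetical helper: read a manifest on stdin and print every line
# (with its line number) that still contains a literal "{{".
# Exits non-zero when nothing is found.
find_unrendered() {
  grep -nF '{{'
}

# Usage:
# kubectl get deployment <name> -o yaml | find_unrendered
```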
6. Slow Tenant Provisioning
Symptoms:
- Tenants taking > 5 minutes to provision
- High operator CPU usage
Diagnosis:
# Check reconciliation times
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager | \
  grep "Reconciliation completed" | tail -20
# Check resource counts
kubectl get tenants -o json | jq '.items[] | {name: .metadata.name, desired: .status.desiredResources}'
Solutions:
A. Disable readiness waits:
waitForReady: false
B. Increase concurrency:
args:
- --tenant-concurrency=20
C. Optimize templates (see Performance Guide)
7. Memory/CPU Issues
Symptoms:
- Operator pod OOMKilled
- High CPU usage
Diagnosis:
# Check resource usage
kubectl top pod -n tenant-operator-system
# Check logs from the previous (restarted) container for signs of memory leaks
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager --previous
Solutions:
A. Increase resource limits:
resources:
  limits:
    cpu: 2000m
    memory: 2Gi
B. Reduce concurrency:
args:
- --tenant-concurrency=5
C. Increase requeue interval:
args:
- --requeue-interval=1m
8. Finalizer Stuck
Symptoms:
- Tenant CR stuck in Terminating state
- Can't delete Tenant
Diagnosis:
# Check finalizers
kubectl get tenant <name> -o jsonpath='{.metadata.finalizers}'
# Check deletion timestamp
kubectl get tenant <name> -o jsonpath='{.metadata.deletionTimestamp}'
Solutions:
A. Check operator logs for deletion errors:
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager | \
  grep "Failed to delete"
B. Force remove finalizer (last resort):
kubectl patch tenant <name> -p '{"metadata":{"finalizers":[]}}' --type=merge
Warning: This may leave orphaned resources!
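Because removing finalizers skips the operator's cleanup, it may be worth guarding the patch so it only runs on a Tenant that is actually being deleted. The sketch below wraps the two commands from this section; `remove_finalizers_if_terminating` is a hypothetical helper name, not an operator feature.

```shell
# Hypothetical helper: clear finalizers only when deletionTimestamp is set.
# $1: Tenant name (in the current namespace)
remove_finalizers_if_terminating() {
  name="$1"
  ts=$(kubectl get tenant "$name" -o jsonpath='{.metadata.deletionTimestamp}') || return 1
  if [ -z "$ts" ]; then
    echo "tenant $name is not terminating; leaving finalizers in place" >&2
    return 1
  fi
  # Same last-resort patch as above; orphaned resources are still possible.
  kubectl patch tenant "$name" -p '{"metadata":{"finalizers":[]}}' --type=merge
}
```

After running it, checking for leftover resources that belonged to the Tenant is still advisable.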
9. Registry Not Syncing
Symptoms:
- Database has active rows
- No Tenant CRs created
Diagnosis:
# Check Registry status
kubectl get tenantregistry <name> -o yaml
# Check operator logs
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager | \
  grep "Registry"
Common Causes:
A. Incorrect valueMappings:
# Must match database columns exactly
valueMappings:
  uid: tenant_id # Column must exist
  hostOrUrl: tenant_url # Column must exist
  activate: is_active # Column must exist
B. No active rows:
-- Check for active tenants
SELECT COUNT(*) FROM tenants WHERE is_active = TRUE;
C. Database query error:
# Check logs for SQL errors
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager | \
  grep "Failed to query"
10. Multi-Template Issues
Symptoms:
- Expected 2× tenants, only seeing 1×
- Wrong desired count
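To put a number on the expected count: the Registry's desired value is the number of referencing templates multiplied by the number of active rows. A tiny worked example, with `expected_tenants` as a hypothetical helper:

```shell
# Hypothetical helper: expected Tenant count for one registry.
# $1: number of templates referencing the registry
# $2: number of active rows in the database
expected_tenants() {
  echo $(( $1 * $2 ))
}

# Two templates referencing a registry with 10 active rows:
expected_tenants 2 10   # prints 20
```

If the Registry status reports fewer than this, at least one template is probably not referencing the registry.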
Diagnosis:
# Check Registry status
kubectl get tenantregistry <name> -o jsonpath='{.status}'
# Should show:
# referencingTemplates: 2
# desired: <templates> × <rows>
# Check templates reference same registry
kubectl get tenanttemplates -o jsonpath='{.items[*].spec.registryId}'
Solution: Ensure all templates correctly reference the registry:
spec:
  registryId: my-registry # Must match exactly
Debugging Workflows
Debug Template Rendering
- Create test Tenant manually:
apiVersion: operator.kubernetes-tenants.org/v1
kind: Tenant
metadata:
  name: test-tenant
  annotations:
    kubernetes-tenants.org/uid: "test-123"
    kubernetes-tenants.org/host: "test.example.com"
spec:
  # ... copy from template
- Check rendered resources:
kubectl get tenant test-tenant -o yaml
- Check operator logs:
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager -f
Debug Database Connection
- Create test pod:
kubectl run -it --rm mysql-test --image=mysql:8 --restart=Never -- bash
- Inside pod:
mysql -h <host> -u <user> -p<password> <database> -e "SELECT * FROM tenants LIMIT 5"
Debug Reconciliation
- Enable debug logging:
# config/manager/manager.yaml
args:
- --zap-log-level=debug
- Watch reconciliation:
kubectl logs -n tenant-operator-system deployment/tenant-operator-controller-manager -f | \
  grep "Reconciling"
Getting Help
- Check operator logs
- Check Tenant events: kubectl describe tenant <name>
- Check Registry status: kubectl get tenantregistry <name> -o yaml
- Review Performance Guide
- Open issue: https://github.com/kubernetes-tenants/tenant-operator/issues
Include in bug reports:
- Operator version
- Kubernetes version
- Operator logs
- Tenant/Registry/Template YAML
- Steps to reproduce
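Most of the items above can be gathered in one pass. The following is a rough sketch under stated assumptions: `collect_bug_report` is a hypothetical helper, the operator version is approximated by the container image of the controller deployment, and the namespace and deployment names are the ones used throughout this guide.

```shell
# Hypothetical helper: collect bug-report material into a directory.
# $1: output directory (default: tenant-operator-report)
collect_bug_report() {
  out="${1:-tenant-operator-report}"
  ns="tenant-operator-system"
  deploy="deployment/tenant-operator-controller-manager"
  mkdir -p "$out" || return 1

  # Kubernetes client and server versions
  kubectl version > "$out/kubernetes-version.txt" 2>&1
  # Operator image (assumed here to carry the operator version as its tag)
  kubectl get "$deploy" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].image}' > "$out/operator-image.txt" 2>&1
  # Recent operator logs
  kubectl logs -n "$ns" "$deploy" --tail=2000 > "$out/operator.log" 2>&1
  # Tenant, Registry, and Template YAML
  for kind in tenants tenantregistry tenanttemplates; do
    kubectl get "$kind" --all-namespaces -o yaml > "$out/$kind.yaml" 2>&1
  done
  echo "$out"
}
```

Review the collected files before attaching them to an issue, since the YAML and logs can contain hostnames or other environment details.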
