Kubernetes Operations

Table of Contents

  1. Purpose
  2. Who Is This For?
  3. AKS Cluster Overview
  4. Namespace Strategy
  5. Workload Architecture
  6. Ingress and TLS Configuration
  7. Secrets Management
  8. Common kubectl Operations
  9. Scaling and Resource Management
  10. Troubleshooting

Purpose

This document describes how the Kubernetes (AKS) cluster for the Algesta platform is operated. It covers cluster architecture, namespace organization, ingress configuration, secrets management, and common operational procedures.

By following this guide, you will understand:

  • AKS cluster design and node pool configuration
  • Namespace-based environment isolation (development, testing, production, monitoring)
  • Ingress routing with TLS certificates via cert-manager
  • Kubernetes Secrets and Azure Key Vault integration
  • Common kubectl commands for deployments, services, and troubleshooting

Who Is This For?

This guide is for DevOps engineers managing the AKS cluster, SREs troubleshooting production issues, and platform engineers deploying applications. It assumes familiarity with Kubernetes, kubectl, and AKS.


AKS Cluster Overview

Cluster Details

| Property | Value | Notes |
| --- | --- | --- |
| Cluster Name | aks-algesta-{environment} | Per-environment clusters (dev, production) |
| Azure Region | East US | Configurable in Terraform |
| Kubernetes Version | 1.28+ | Managed by Azure (auto-upgrade available) |
| SKU Tier | Free | Production should use Standard tier for SLA |
| Network Plugin | kubenet | Default AKS networking (consider Azure CNI for advanced features) |
| DNS Prefix | algesta-{environment} | Used for API server FQDN |
| Identity Type | System-assigned Managed Identity | For Azure resource access (ACR, Key Vault) |

Node Pools

System Node Pool (default):

  • Purpose: Runs AKS system components (CoreDNS, metrics-server, tunnelfront)
  • VM Size: Standard_B2s (2 vCPU, 4 GB RAM)
  • Node Count: 1 (auto-scaling: min 1, max 1)
  • OS: Linux (Ubuntu)
  • Mode: System
  • Taints: None (workloads can schedule here if needed)

User Node Pool (stdar{environment}):

  • Purpose: Runs application workloads (microservices, monitoring)
  • VM Size: Standard_B2s (2 vCPU, 4 GB RAM)
  • Node Count: 1 (auto-scaling: min 1, max 3)
  • OS: Linux (Ubuntu)
  • Mode: User
  • Taints: None

Scaling Behavior:

  • Development: Scales down to 1 node during idle periods
  • Production: Maintains min 1 node, scales up to 3 under load
  • Metrics: CPU utilization > 80% triggers scale-up

Cluster Add-ons

| Add-on | Purpose | Configuration |
| --- | --- | --- |
| Web App Routing | Managed ingress controller (nginx) | Enabled via Terraform (web_app_routing block) |
| Monitoring | Azure Monitor integration | Optional (currently using Prometheus/Grafana) |
| Azure Policy | Enforce security policies | Not enabled (consider for compliance) |
| Secrets Store CSI Driver | Azure Key Vault integration | Not enabled (future implementation) |

Namespace Strategy

Environment Isolation

Each environment has dedicated namespaces for isolation and RBAC:

| Namespace | Purpose | Ingress Host | Resource Quotas |
| --- | --- | --- | --- |
| development | Development deployments | algesta-api-dev.3astronautas.com | No limits (small cluster) |
| testing | QA and integration testing | algesta-api-test.3astronautas.com | No limits |
| production | Live customer traffic | algesta-api-prod.3astronautas.com | CPU: 4 cores, Memory: 8 GB (recommended) |
| monitoring | Grafana, Prometheus, Loki | algesta.grafana.3astronautas.com | CPU: 2 cores, Memory: 4 GB |
| cert-manager | Certificate management | N/A | Minimal resources |
| connect-devops | CI/CD service accounts | N/A | Minimal resources |

Namespace Conventions:

  • Environment namespaces (development, testing, production) host microservices
  • Shared services (monitoring, cert-manager) in dedicated namespaces
  • No default namespace usage (all resources in named namespaces)

Creating Namespaces

Development Namespace:

kubectl create namespace development
# Add labels for organization
kubectl label namespace development environment=dev team=backend

With Resource Quotas (production):

kubectl create namespace production
# Apply resource quota
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "50"
EOF
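
One side effect of the quota above: once a ResourceQuota constrains CPU and memory requests and limits, Kubernetes rejects pods in that namespace that do not declare them. A LimitRange can supply defaults for such pods. The sketch below only prints a manifest. The name production-defaults is hypothetical, and the values are taken from the microservice row of the Recommended Settings table in this guide as an assumption, not as a confirmed cluster setting.

```shell
# Prints a LimitRange manifest that gives containers default requests and limits.
# Assumption: "production-defaults" is a made-up name; the values are illustrative.
limitrange_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: production-defaults
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 256Mi
      default:
        cpu: 500m
        memory: 512Mi
EOF
}
```

To try it against a cluster, the output could be piped into kubectl: `limitrange_manifest | kubectl apply -f -`.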

Workload Architecture

Microservices Deployment Pattern

Each microservice follows this standard deployment structure:

{microservice-name}-{environment}/
├── Deployment
├── Service (ClusterIP)
├── HorizontalPodAutoscaler (HPA)
└── ConfigMap / Secret (env vars)

Example: API Gateway Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway-production
  namespace: production
  labels:
    app: api-gateway
    environment: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api-gateway
      environment: production
  template:
    metadata:
      labels:
        app: api-gateway
        environment: production
    spec:
      containers:
        - name: api-gateway
          image: acralgestaproduction.azurecr.io/api-gateway:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: NODE_ENV
              value: "production"
            - name: PORT
              value: "3000"
            - name: MONGODB_URI
              valueFrom:
                secretKeyRef:
                  name: mongodb-credentials
                  key: uri
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: jwt-credentials
                  key: secret
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: api-gateway-production
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: api-gateway
    environment: production
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
      name: http
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-production-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway-production
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Current Deployed Microservices

| Microservice | Namespace | Deployment Name | Service Name | Port |
| --- | --- | --- | --- | --- |
| API Gateway | development, testing, production | api-gateway-{env} | api-gateway-{env} | 3000 |
| Orders Service | development, testing, production | ms-orders-{env} | ms-orders-{env} | 3001 |
| Notifications Service | development, testing, production | ms-notifications-{env} | ms-notifications-{env} | 3002 |
| Provider Service | development, testing, production | ms-provider-{env} | ms-provider-{env} | 3003 |
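
The naming convention in the table can be expressed as a few small helpers, which may be handy in deploy scripts. This is a sketch: the function names are hypothetical, the -hpa suffix is inferred from the API Gateway example, and it assumes the namespace is named after the environment, as in the production examples in this guide.

```shell
# Deployment and Service share one name: <service>-<environment>.
workload_name() {
  printf '%s-%s\n' "$1" "$2"
}

# HPA name as used in the API Gateway example: <workload>-hpa.
hpa_name() {
  printf '%s-hpa\n' "$(workload_name "$1" "$2")"
}

# In-cluster DNS name: <service-name>.<namespace>.svc.cluster.local
# Assumption: the namespace has the same name as the environment.
service_dns() {
  printf '%s.%s.svc.cluster.local\n' "$(workload_name "$1" "$2")" "$2"
}
```

For example, `service_dns api-gateway production` yields the host name used in the in-cluster connectivity test later in this guide.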

Viewing Deployed Workloads:

# List all deployments in production namespace
kubectl get deployments -n production
# List all services
kubectl get services -n production
# List all pods with labels
kubectl get pods -n production --show-labels

Ingress and TLS Configuration

Ingress Controller

Type: Azure Web App Routing (managed nginx ingress)

Features:

  • Managed by AKS (automatic updates)
  • Integrated with Azure DNS (optional)
  • Supports cert-manager for TLS automation

Ingress Class:

ingressClassName: webapprouting.kubernetes.azure.com

Alternative: Manual nginx-ingress deployment (more control, requires maintenance)

Ingress Resources

Development Environment (ops-algesta/resources-k8s/ingress-nginx/ingress-aks/ingress-development.yaml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-development
  namespace: development
  annotations:
    kubernetes.io/ingress.class: webapprouting.kubernetes.azure.com
    cert-manager.io/cluster-issuer: letsencrypt-prod-webapprouting
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-body-size: "300m"
spec:
  tls:
    - hosts:
        - algesta-api-dev.3astronautas.com
      secretName: algesta-api-dev-tls
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
    - host: algesta-api-dev.3astronautas.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-gateway-development
                port:
                  number: 80

Key Configuration:

  • TLS Certificate: Automatically provisioned by cert-manager via Let’s Encrypt
  • Timeout Settings: Extended to 600s for long-running requests (PDF generation)
  • Body Size: 300MB limit for file uploads (development), 10MB (production/testing)

Testing Environment (ingress-testing.yaml):

  • Host: algesta-api-test.3astronautas.com
  • TLS Secret: algesta-api-test-tls
  • Backend Service: api-gateway-testing

Production Environment (ingress-production.yaml):

  • Host: algesta-api-prod.3astronautas.com
  • TLS Secret: algesta-api-prod-tls
  • Backend Service: api-gateway-production

Monitoring Ingress (ingress-monitoring.yaml):

  • Host: algesta.grafana.3astronautas.com
  • Backend Service: grafana-service (monitoring namespace)

TLS Certificate Management

cert-manager Configuration:

ClusterIssuer (ops-algesta/resources-k8s/cert-manager/Issuer.yaml):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: j.leon@tresastronautas.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

ClusterIssuer for Web App Routing (ops-algesta/resources-k8s/cert-manager/Issuer-webapprouting.yaml):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-webapprouting
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: j.leon@tresastronautas.com
    privateKeySecretRef:
      name: letsencrypt-prod-webapprouting
    solvers:
      - http01:
          ingress:
            ingressClassName: webapprouting.kubernetes.azure.com

Certificate Lifecycle:

  1. Ingress created with cert-manager.io/cluster-issuer annotation
  2. cert-manager detects annotation, creates Certificate resource
  3. cert-manager initiates ACME challenge (HTTP-01)
  4. Let’s Encrypt validates domain ownership
  5. Certificate issued and stored in Secret (e.g., algesta-api-prod-tls)
  6. Ingress controller uses Secret for TLS termination
  7. Auto-renewal 30 days before expiration
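
To make the last step concrete, the renewal window can be checked with plain date arithmetic. The sketch below works on Unix epoch seconds so that it stays independent of any particular date tool. The function names are hypothetical, and the 30-day default mirrors the renewal window stated in the list above.

```shell
# days_left EXPIRY_EPOCH NOW_EPOCH
# Prints the whole number of days between now and the certificate's notAfter.
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# in_renewal_window EXPIRY_EPOCH NOW_EPOCH [WINDOW_DAYS]
# Succeeds (exit 0) when the certificate is within the renewal window.
in_renewal_window() {
  window_days=${3:-30}
  [ "$(days_left "$1" "$2")" -le "$window_days" ]
}
```

On a live cluster, `openssl x509 -noout -checkend 2592000` applied to the decoded tls.crt answers the same question directly: it exits non-zero when the certificate expires within the next 2592000 seconds (30 days).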

Checking Certificate Status:

# List certificates in namespace
kubectl get certificates -n production
# Check certificate details
kubectl describe certificate algesta-api-prod-tls -n production
# Check certificate expiration
kubectl get secret algesta-api-prod-tls -n production -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -enddate

Manual Certificate Renewal (if auto-renewal fails):

# Delete certificate to trigger re-issuance
kubectl delete certificate algesta-api-prod-tls -n production
# cert-manager will automatically recreate it
kubectl get certificate algesta-api-prod-tls -n production -w

Secrets Management

Kubernetes Secrets

Current Approach: Manual secret creation via kubectl

Production MongoDB Secret:

kubectl create secret generic mongodb-credentials \
--from-literal=uri="mongodb+srv://admin:SecurePassword@cluster.mongodb.net/algesta?retryWrites=true&w=majority" \
--namespace=production
# Verify secret created
kubectl get secret mongodb-credentials -n production

JWT Secret:

kubectl create secret generic jwt-credentials \
--from-literal=secret="supersecurejwtkey12345" \
--namespace=production

Listing Secrets:

# List all secrets in namespace
kubectl get secrets -n production
# View secret details (encoded)
kubectl get secret mongodb-credentials -n production -o yaml
# Decode secret value
kubectl get secret mongodb-credentials -n production -o jsonpath='{.data.uri}' | base64 -d
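
The decode step works because values under .data in a Secret are base64-encoded, not encrypted, so anyone who can read the Secret can recover the plaintext. The round trip can be reproduced locally without a cluster. The helper names below are hypothetical.

```shell
# Encode a value the way it appears under .data in a Secret manifest.
# Newlines are stripped because some base64 tools wrap long output.
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a value copied from .data (same flag as the kubectl example above).
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}
```

For instance, `decode_secret_value "$(encode_secret_value 'abc')"` returns the original string, which is why RBAC on Secrets matters more than the encoding itself.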

Azure Key Vault Integration (Future)

Recommended: Use Azure Key Vault Provider for Secrets Store CSI Driver

Benefits:

  • Centralized secret management in Azure Key Vault
  • Automatic secret rotation
  • Audit logging
  • Integration with Azure RBAC

Implementation (future):

  1. Enable CSI Driver in AKS:
az aks enable-addons --addons azure-keyvault-secrets-provider \
  --resource-group rg-algesta-production \
  --name aks-algesta-production
  2. Create SecretProviderClass:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-keyvault-provider
  namespace: production
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "<managed-identity-client-id>"
    keyvaultName: "akv-algesta-production"
    objects: |
      array:
        - |
          objectName: mongodb-uri
          objectType: secret
          objectVersion: ""
        - |
          objectName: jwt-secret
          objectType: secret
          objectVersion: ""
    tenantId: "<azure-tenant-id>"
  3. Mount Secrets in Pods:
spec:
  containers:
    - name: api-gateway
      volumeMounts:
        - name: secrets-store
          mountPath: "/mnt/secrets"
          readOnly: true
  volumes:
    - name: secrets-store
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "azure-keyvault-provider"
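
With this mount, each Key Vault object appears as a file named after its objectName under the mount path, so the application (or an entrypoint script) reads files rather than environment variables. A minimal sketch of such a reader follows. The function name is hypothetical, and the default directory simply mirrors the mountPath above.

```shell
# read_mounted_secret OBJECT_NAME [MOUNT_DIR]
# Prints the content of a secret file mounted by the CSI driver.
# Assumption: MOUNT_DIR defaults to the mountPath from the pod spec above.
read_mounted_secret() {
  secret_dir=${2:-/mnt/secrets}
  cat "$secret_dir/$1"
}
```

An entrypoint could then export the value before starting the service, for example `MONGODB_URI=$(read_mounted_secret mongodb-uri)`.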

Common kubectl Operations

Cluster Access

Configure kubectl:

# Get credentials for AKS cluster
az aks get-credentials \
--resource-group rg-algesta-production \
--name aks-algesta-production \
--overwrite-existing
# Verify connection
kubectl cluster-info
kubectl get nodes

Set Default Namespace:

# Set default namespace to production
kubectl config set-context --current --namespace=production
# Verify current namespace
kubectl config view --minify | grep namespace:

Deployment Operations

Deploy New Version:

# Update deployment image
kubectl set image deployment/api-gateway-production \
api-gateway=acralgestaproduction.azurecr.io/api-gateway:12345 \
--namespace=production
# Watch rollout progress
kubectl rollout status deployment/api-gateway-production -n production
# Check rollout history
kubectl rollout history deployment/api-gateway-production -n production

Rollback Deployment:

# Rollback to previous version
kubectl rollout undo deployment/api-gateway-production -n production
# Rollback to specific revision
kubectl rollout undo deployment/api-gateway-production --to-revision=3 -n production
# Verify rollback
kubectl get pods -n production -l app=api-gateway

Scale Deployment:

# Scale to 5 replicas
kubectl scale deployment/api-gateway-production --replicas=5 -n production
# Verify scaling
kubectl get pods -n production -l app=api-gateway

Restart Deployment (forces pod recreation):

kubectl rollout restart deployment/api-gateway-production -n production

Pod Operations

Viewing Pods:

# List all pods in namespace
kubectl get pods -n production
# List pods with more details
kubectl get pods -n production -o wide
# Watch pods in real-time
kubectl get pods -n production -w
# Filter by label
kubectl get pods -n production -l app=api-gateway

Pod Logs:

# View logs for single pod
kubectl logs api-gateway-production-xxxxx-yyyyy -n production
# Follow logs (tail -f)
kubectl logs -f api-gateway-production-xxxxx-yyyyy -n production
# View logs from previous container (after crash)
kubectl logs api-gateway-production-xxxxx-yyyyy -n production --previous
# View logs from all pods in deployment
kubectl logs -l app=api-gateway -n production --tail=100

Execute Commands in Pod:

# Open shell in pod
kubectl exec -it api-gateway-production-xxxxx-yyyyy -n production -- /bin/sh
# Run single command
kubectl exec api-gateway-production-xxxxx-yyyyy -n production -- env | grep NODE_ENV
# Check application health
kubectl exec api-gateway-production-xxxxx-yyyyy -n production -- curl localhost:3000/health

Debugging Pods:

# Describe pod (shows events, status, volumes)
kubectl describe pod api-gateway-production-xxxxx-yyyyy -n production
# Check resource usage
kubectl top pod api-gateway-production-xxxxx-yyyyy -n production
# Check pod IP and node assignment
kubectl get pod api-gateway-production-xxxxx-yyyyy -n production -o jsonpath='{.status.podIP}{"\n"}{.spec.nodeName}{"\n"}'

Service Operations

Viewing Services:

# List services
kubectl get services -n production
# Describe service
kubectl describe service api-gateway-production -n production
# Get service endpoints
kubectl get endpoints api-gateway-production -n production

Testing Service Connectivity:

# From within cluster (create debug pod)
kubectl run -it --rm debug --image=curlimages/curl:latest --restart=Never -n production -- sh
# Inside pod:
curl http://api-gateway-production.production.svc.cluster.local/health

Ingress Operations

Viewing Ingress:

# List ingress resources
kubectl get ingress -n production
# Describe ingress (shows rules, backends)
kubectl describe ingress ingress-production -n production
# Check ingress external IP
kubectl get ingress ingress-production -n production -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Testing Ingress:

# Test HTTP (should redirect to HTTPS)
curl -v http://algesta-api-prod.3astronautas.com/health
# Test HTTPS
curl -v https://algesta-api-prod.3astronautas.com/health
# Check TLS certificate
openssl s_client -connect algesta-api-prod.3astronautas.com:443 -servername algesta-api-prod.3astronautas.com < /dev/null 2>/dev/null | openssl x509 -noout -dates

Scaling and Resource Management

Horizontal Pod Autoscaling (HPA)

Current HPA Configuration:

# List HPAs
kubectl get hpa -n production
# Describe HPA
kubectl describe hpa api-gateway-production-hpa -n production

HPA Metrics:

  • CPU Target: 70% utilization
  • Memory Target: 80% utilization
  • Min Replicas: 2 (production), 1 (dev/test)
  • Max Replicas: 10 (production), 3 (dev/test)
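
These targets feed the HPA's documented scaling rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min and max replica counts. The sketch below reproduces that arithmetic with integers for a single metric. The function name is hypothetical, and it deliberately ignores the controller's tolerance band and stabilization windows, so treat it as an estimate rather than a prediction.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Utilization values are whole percentages, e.g. 140 for 140%.
desired_replicas() {
  cur=$1; util=$2; target=$3; min=$4; max=$5
  # Integer ceiling of cur * util / target.
  want=$(( (cur * util + target - 1) / target ))
  if [ "$want" -lt "$min" ]; then want=$min; fi
  if [ "$want" -gt "$max" ]; then want=$max; fi
  echo "$want"
}
```

With the production settings above, `desired_replicas 2 140 70 2 10` prints 4: two pods running at 140% of their CPU request against a 70% target call for four pods. When several metrics are configured, as with CPU and memory here, the HPA evaluates each one and uses the largest result.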

Manually Disable HPA (for testing):

# Pin the HPA to exactly 1 replica (minReplicas = maxReplicas = 1), which effectively stops autoscaling.
# Restore the original min/max values afterwards.
kubectl patch hpa api-gateway-production-hpa -n production --patch '{"spec":{"minReplicas":1,"maxReplicas":1}}'

Cluster Autoscaling

Node Pool Autoscaling:

  • Managed by Azure AKS (configured in Terraform)
  • Trigger: Pods in Pending state due to insufficient resources
  • Scale Up: Add nodes to user node pool (max 3)
  • Scale Down: Remove idle nodes after 10 minutes

Check Node Status:

# List nodes
kubectl get nodes
# Check node resource usage
kubectl top nodes
# Count pods per node (reads the node name directly, so extra spaces in the RESTARTS column cannot shift the result)
kubectl get pods -A -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c

Resource Quotas and Limits

Namespace Resource Quota (production):

# View quota
kubectl get resourcequota -n production
# Check quota usage
kubectl describe resourcequota production-quota -n production

Pod Resource Requests/Limits:

  • Requests: Minimum resources guaranteed (used for scheduling)
  • Limits: Maximum resources allowed (enforced by kubelet)

Recommended Settings:

| Workload | CPU Request | Memory Request | CPU Limit | Memory Limit |
| --- | --- | --- | --- | --- |
| API Gateway | 250m | 512Mi | 1000m | 1Gi |
| Microservices | 100m | 256Mi | 500m | 512Mi |
| Monitoring (Grafana) | 500m | 1Gi | 1000m | 2Gi |
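
Because requests count against the namespace quota, it can help to check whether a replica count still fits before raising an HPA maximum. The sketch below compares total CPU requests with a quota value. The function names are hypothetical, and the parser only understands whole cores ("4") and millicores ("250m"), not fractional forms such as "0.5".

```shell
# cpu_to_millicores "250m" -> 250, cpu_to_millicores "4" -> 4000
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# fits_cpu_quota REPLICAS REQUEST_PER_POD QUOTA
# Succeeds (exit 0) when REPLICAS x REQUEST_PER_POD does not exceed QUOTA.
fits_cpu_quota() {
  total=$(( $1 * $(cpu_to_millicores "$2") ))
  [ "$total" -le "$(cpu_to_millicores "$3")" ]
}
```

With the values in this guide, `fits_cpu_quota 10 250m 4` succeeds: ten API Gateway replicas request 2500m of the 4-core requests.cpu quota, which leaves 1500m for the other services in the namespace.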

Troubleshooting

Pod Not Starting (ImagePullBackOff)

Symptoms:

kubectl get pods -n production
# NAME READY STATUS RESTARTS AGE
# api-gateway-production-xxxxx-yyyyy 0/1 ImagePullBackOff 0 2m

Diagnosis:

kubectl describe pod api-gateway-production-xxxxx-yyyyy -n production
# Events:
# Failed to pull image "acralgestaproduction.azurecr.io/api-gateway:12345": rpc error: code = Unknown desc = Error response from daemon: unauthorized: authentication required

Solutions:

  1. Verify ACR Integration:
az aks check-acr --name aks-algesta-production \
  --resource-group rg-algesta-production \
  --acr acralgestaproduction.azurecr.io
  2. Grant AcrPull Role to AKS:
ACR_ID=$(az acr show --name acralgestaproduction --query id --output tsv)
AKS_IDENTITY=$(az aks show --name aks-algesta-production --resource-group rg-algesta-production --query identityProfile.kubeletidentity.objectId --output tsv)
az role assignment create --assignee "$AKS_IDENTITY" --role AcrPull --scope "$ACR_ID"
  3. Verify Image Exists:
az acr repository show-tags --name acralgestaproduction --repository api-gateway

Pod Crashing (CrashLoopBackOff)

Symptoms:

kubectl get pods -n production
# NAME READY STATUS RESTARTS AGE
# api-gateway-production-xxxxx-yyyyy 0/1 CrashLoopBackOff 5 10m

Diagnosis:

# Check logs
kubectl logs api-gateway-production-xxxxx-yyyyy -n production
# Check previous container logs (after crash)
kubectl logs api-gateway-production-xxxxx-yyyyy -n production --previous
# Check events
kubectl describe pod api-gateway-production-xxxxx-yyyyy -n production

Common Causes:

  1. Missing Environment Variables:
kubectl exec api-gateway-production-xxxxx-yyyyy -n production -- env | grep -E "(MONGODB|JWT|NODE_ENV)"
  2. Database Connection Failure:
# Test MongoDB connection from pod
kubectl exec api-gateway-production-xxxxx-yyyyy -n production -- nc -zv cluster.mongodb.net 27017
  3. Insufficient Resources:
kubectl describe pod api-gateway-production-xxxxx-yyyyy -n production | grep -A 5 "Resource"
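
The two pod failure modes covered so far map to different first steps, which a small lookup can capture for on-call scripts. This is only a sketch: the function name is hypothetical, the hints summarize the commands already shown in this guide, and the Pending case is borrowed from the Cluster Autoscaling section.

```shell
# triage_hint POD_STATUS
# Prints a suggested first diagnostic step for a given pod status.
triage_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "verify registry access: az aks check-acr" ;;
    CrashLoopBackOff)
      echo "read the crashed container's logs: kubectl logs --previous" ;;
    Pending)
      echo "check scheduling events: kubectl describe pod" ;;
    *)
      echo "inspect events: kubectl describe pod" ;;
  esac
}
```

The status argument would typically come from the STATUS column of `kubectl get pods -n production`.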

Service Not Reachable

Symptoms:

curl: (7) Failed to connect to algesta-api-prod.3astronautas.com port 443: Connection refused

Diagnosis:

  1. Check Ingress:
kubectl get ingress ingress-production -n production
# Ensure ADDRESS column shows IP
  2. Check Service:
kubectl get service api-gateway-production -n production
# Ensure CLUSTER-IP is assigned
# Check endpoints (should list pod IPs)
kubectl get endpoints api-gateway-production -n production
  3. Test Service from within Cluster:
kubectl run -it --rm debug --image=curlimages/curl:latest --restart=Never -n production -- \
  curl http://api-gateway-production.production.svc.cluster.local/health
  4. Check DNS Resolution:
# From Azure VM or local machine with VPN
nslookup algesta-api-prod.3astronautas.com
# Should resolve to ingress external IP

For Support:

  • Check pod status: kubectl get pods -n production
  • Review logs: kubectl logs -f <pod-name> -n production
  • Describe resources: kubectl describe deployment/service/ingress <name> -n production
  • Contact DevOps team for AKS access and credentials