Troubleshooting
Table of Contents
- Purpose
- Who Is This For?
- Troubleshooting Methodology
- Development Environment Issues
- Database Issues
- Docker and Container Issues
- Testing Issues
- Deployment Issues
- Runtime and Production Issues
- Integration and External Services
- Performance Issues
Purpose
This guide provides systematic troubleshooting procedures for common issues in the Algesta platform across development, testing, deployment, and production environments. It offers diagnostic commands, root cause analysis, and step-by-step solutions.
By following this guide, you will be able to:
- Diagnose problems methodically using proven techniques
- Resolve common development and deployment problems
- Use diagnostic commands to gather information
- Escalate unresolved problems effectively
Who Is This For?
This guide is for developers debugging local issues, DevOps engineers resolving deployment problems, and support engineers handling production incidents. It assumes familiarity with command-line tools, log analysis, and the Algesta architecture.
Troubleshooting Methodology
Systematic Approach
graph LR
A[1. Identify Symptoms<br/>Error messages?<br/>Failed checks?] --> B[2. Gather Info<br/>Logs, errors<br/>Environment]
B --> C[3. Isolate Problem<br/>Which component?<br/>Which environment?]
C --> D[4. Hypothesize Cause<br/>Analyze symptoms<br/>Check logs]
D --> E[5. Test Hypothesis<br/>Run diagnostics<br/>Reproduce issue]
E --> F[6. Apply Solution<br/>Fix and verify]
F --> G[7. Document<br/>Update guide<br/>Share knowledge]
style A fill:#ffebee,stroke:#c62828
style B fill:#e1f5fe,stroke:#0277bd
style C fill:#fff9c4,stroke:#f9a825
style D fill:#f3e5f5,stroke:#6a1b9a
style E fill:#e8f5e9,stroke:#2e7d32
style F fill:#fce4ec,stroke:#ad1457
style G fill:#e0f2f1,stroke:#00695c
Steps:
- Identify Symptoms: What is the observed behavior? Error messages? Failed health checks?
- Gather Information: Logs, error stacks, environment state
- Isolate the Problem: Which component? Which environment?
- Hypothesize Cause: Based on symptoms and logs
- Test Hypothesis: Apply diagnostic commands, reproduce the issue
- Apply Solution: Fix and verify
- Document: Update this guide if a new issue is discovered
Diagnostic Commands Cheat Sheet
# Check service health
curl http://localhost:3001/health
# View logs
docker logs algesta-orders-ms -f
# or
npm run start:dev # Watch console output
# Check running processes
ps aux | grep node
# Check port usage
lsof -i :3001
# Test database connection
mongosh "mongodb://localhost:27017/orders_ms" --eval "db.runCommand({ ping: 1 })"
# Test Redis connection
redis-cli -h localhost -p 6379 ping
# Check environment variables
printenv | grep MONGODB_URI
# Check Docker containers
docker ps
docker-compose ps
# Check disk space
df -h
# Check memory usage
free -h # Linux
vm_stat # macOS
Development Environment Issues
Problem: Service fails to start with “Cannot find module”
Symptoms:
Error: Cannot find module '@nestjs/common'
Diagnosis:
# Check if node_modules exists
ls -la node_modules/
# Verify package.json exists
cat package.json
Root Cause: Dependencies are not installed, or node_modules was deleted.
Solution:
# Reinstall dependencies
npm install --force
# Or reinstall from scratch with a clean cache
rm -rf node_modules package-lock.json
npm cache clean --force
npm install
Verification:
npm run start:dev
# Service should start successfully
Problem: Port already in use
Symptoms:
Error: listen EADDRINUSE: address already in use :::3001
Diagnosis:
# Find process using port 3001
lsof -i :3001
# or
netstat -tuln | grep 3001
Example Output:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
node 1234 user 22u IPv6 0x... 0t0 TCP *:3001 (LISTEN)
Root Cause: Another instance of the service, or a different service, is using the same port.
Solution:
Option 1: Kill process
kill -9 1234 # Replace with actual PID
Option 2: Change port
# Edit .env
PORT=3011
Option 3: Find and stop other instance
# If it's a Docker container
docker ps | grep 3001
docker stop <container-id>
Verification:
lsof -i :3001
# No output = port is free
Problem: Environment variables not loading
Symptoms:
Configuration validation error: MONGODB_URI is required
Diagnosis:
# Check if .env file exists
ls -la .env
# Verify file permissions
ls -l .env
# Check variable in environment
printenv MONGODB_URI
Root Cause: The .env file is missing, in the wrong location, or not being loaded.
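A startup check that lists every missing variable at once can make this kind of failure easier to read than a single validation error. The following is a minimal sketch, not the Algesta config module: `missingEnvVars` is a hypothetical helper, and `REDIS_URL` is an assumed variable name used only for illustration.

```typescript
// Hypothetical helper (not part of the Algesta codebase): returns the names
// of required variables that are undefined or blank in the given environment.
type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Hand-written sample environment; in a service you would pass process.env.
const sampleEnv: Env = {
  MONGODB_URI: "mongodb://localhost:27017/orders_ms",
  PORT: "",
};

// REDIS_URL is an assumed variable name used only for illustration.
const missing = missingEnvVars(sampleEnv, ["MONGODB_URI", "PORT", "REDIS_URL"]);
console.log(missing.length === 0 ? "config ok" : `missing: ${missing.join(", ")}`);
```

Treating blank values as missing is a design choice here: an empty `PORT=` line in .env usually signals an incomplete copy of .env.example rather than an intentional setting.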
Solution:
1. Create .env file:
cp .env.example .env
2. Verify location:
# .env must be in repo root (e.g., algesta-ms-orders-nestjs/.env)
ls -la .env
3. Check dotenv loading:
// In main.ts or config module
import * as dotenv from "dotenv";
dotenv.config();
// Temporary debug output; remove it afterwards, since the URI may contain credentials
console.log("MONGODB_URI:", process.env.MONGODB_URI);
4. Restart service:
# Kill all node processes (this also stops unrelated Node.js programs)
pkill -f node
# Restart
npm run start:dev
Verification:
# Service logs should show no config errors
curl http://localhost:3001/health
Problem: TypeScript compilation errors
Symptoms:
error TS2307: Cannot find module '@nestjs/core' or its corresponding type declarations
Diagnosis:
# Check TypeScript version
npx tsc --version
# Verify tsconfig.json exists
cat tsconfig.json
Root Cause: Missing type definitions or an incompatible TypeScript version.
Solution:
# Install type definitions
npm install --save-dev @types/node
# Reinstall dependencies (ensures compatible versions)
rm -rf node_modules package-lock.json
npm install
# Clean build cache
rm -rf dist
npm run build
Verification:
npm run build
# Should compile without errors
Database Issues
Problem: Cannot connect to MongoDB
Symptoms:
MongoServerError: connect ECONNREFUSED 127.0.0.1:27017
Diagnosis:
# Check if MongoDB is running
mongosh --eval "db.version()"
# Check port
lsof -i :27017
# Test connection with URI
mongosh "mongodb://localhost:27017/orders_ms" --eval "db.runCommand({ ping: 1 })"
Root Cause:
- MongoDB not running
- Wrong connection string
- Firewall blocking connection
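To rule out a wrong connection string, it can help to print the host, port, and database the URI actually resolves to. The sketch below relies on the standard `URL` class and assumes a single-host `mongodb://` URI; `summarizeMongoUri` is a hypothetical helper, and multi-host or `mongodb+srv://` strings are out of its scope.

```typescript
// Hypothetical helper: extracts the parts of a single-host mongodb:// URI
// that are most often wrong. Credentials are deliberately not returned.
interface MongoUriSummary {
  host: string;
  port: string;
  database: string;
  authSource: string | null;
}

function summarizeMongoUri(uri: string): MongoUriSummary {
  if (!uri.startsWith("mongodb://")) {
    throw new Error("expected a mongodb:// URI");
  }
  const url = new URL(uri);
  return {
    host: url.hostname,
    port: url.port || "27017", // 27017 is MongoDB's default port
    database: url.pathname.replace(/^\//, ""),
    authSource: url.searchParams.get("authSource"),
  };
}

console.log(
  summarizeMongoUri("mongodb://user:pass@localhost:27017/orders_ms?authSource=admin"),
);
```

Inside a Docker Compose network, a `host` of `localhost` in this summary is a common sign that the URI should name the MongoDB service instead.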
Solution:
1. Start MongoDB:
# macOS
brew services start mongodb-community@7.0
# Linux
sudo systemctl start mongod
# Docker
docker start algesta-mongodb
2. Verify connection string:
# Correct format
MONGODB_URI=mongodb://localhost:27017/orders_ms
# With auth
MONGODB_URI=mongodb://user:pass@localhost:27017/orders_ms?authSource=admin
3. Check firewall (production):
# Allow port 27017
sudo ufw allow 27017
Verification:
mongosh "mongodb://localhost:27017/orders_ms"
# Should connect successfully
# Restart service
npm run start:dev
Problem: Slow database queries
Symptoms:
- API responses take > 5 seconds
- Logs show “Slow query detected”
Diagnosis:
// In mongosh
db.setProfilingLevel(1, { slowms: 100 });
// View slow queries
db.system.profile
  .find({ millis: { $gt: 100 } })
  .sort({ ts: -1 })
  .limit(10);
Example Output:
{
  "op": "query",
  "ns": "orders_ms.orders",
  "command": { "find": "orders", "filter": { "status": "NEW" } },
  "millis": 3245,
  "planSummary": "COLLSCAN" // ❌ Collection scan (no index)
}
Root Cause: Missing indexes, inefficient queries.
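When the profiler returns many entries, filtering for collection scans narrows the list to the queries most likely to need an index. This is a minimal sketch over plain objects shaped like the example output above; `findCollectionScans` is a hypothetical helper, and the `orders_ms.providers` namespace is sample data, not a confirmed collection.

```typescript
// Shape of the profiler fields used here, taken from the example output.
interface ProfileEntry {
  op: string;
  ns: string;
  millis: number;
  planSummary?: string;
}

// Hypothetical helper: keeps entries at or above minMillis whose plan is a
// collection scan, slowest first. The input array is not modified.
function findCollectionScans(
  entries: readonly ProfileEntry[],
  minMillis: number,
): ProfileEntry[] {
  return entries
    .filter((e) => e.millis >= minMillis && (e.planSummary ?? "").startsWith("COLLSCAN"))
    .sort((a, b) => b.millis - a.millis);
}

// Sample data for illustration only.
const sampleProfile: ProfileEntry[] = [
  { op: "query", ns: "orders_ms.orders", millis: 3245, planSummary: "COLLSCAN" },
  { op: "query", ns: "orders_ms.orders", millis: 12, planSummary: "IXSCAN { status: 1 }" },
  { op: "query", ns: "orders_ms.providers", millis: 480, planSummary: "COLLSCAN" },
];

for (const entry of findCollectionScans(sampleProfile, 100)) {
  console.log(`${entry.ns} ${entry.op} ${entry.millis}ms`);
}
```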
Solution:
1. Create missing indexes:
db.orders.createIndex({ status: 1 });
2. Verify index usage:
db.orders.find({ status: "NEW" }).explain("executionStats");
Expected:
{
  "winningPlan": {
    "stage": "IXSCAN", // ✅ Index scan
    "indexName": "status_1"
  },
  "executionStats": {
    "executionTimeMillis": 5 // Fast!
  }
}
3. Optimize query:
// Bad: Fetch all fields
await this.orderModel.find({ status: "NEW" });
// Good: Project only needed fields
await this.orderModel.find({ status: "NEW" }, "orderId service address");
Verification:
# API response time should improve (the URL is quoted so the shell does not expand "?")
curl -w "\nTime: %{time_total}s\n" "http://localhost:3001/orders?status=NEW"
Problem: Database connection pool exhausted
Symptoms:
MongoServerError: connection pool is full, waiting for connection
Diagnosis:
// Check current connections
db.serverStatus().connections;
Example Output:
{
  "current": 100, // Max reached!
  "available": 0
}
Root Cause:
- maxPoolSize too low for traffic
- Connection leaks (not closing cursors)
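The `current` and `available` counters from the diagnosis above can be turned into a single utilization figure for quick checks or alerts. This is a sketch under stated assumptions: both helper names are hypothetical, and the 0.8 threshold is an arbitrary illustrative default, not a recommended value.

```typescript
// Fields taken from the db.serverStatus().connections example output.
interface ConnectionStats {
  current: number;
  available: number;
}

// Fraction of connection capacity in use, from 0 to 1.
function connectionUtilization(stats: ConnectionStats): number {
  const capacity = stats.current + stats.available;
  return capacity === 0 ? 0 : stats.current / capacity;
}

// The default threshold of 0.8 is an assumption chosen for illustration.
function isNearExhaustion(stats: ConnectionStats, threshold = 0.8): boolean {
  return connectionUtilization(stats) >= threshold;
}

console.log(isNearExhaustion({ current: 100, available: 0 }));
console.log(isNearExhaustion({ current: 20, available: 80 }));
```

If utilization keeps climbing while traffic stays flat, a leak is a more likely cause than an undersized pool.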
Solution:
1. Increase pool size:
maxPoolSize: parseInt(process.env.DB_POOL_MAX ?? '50', 10), // Increase from 20
2. Find connection leaks:
// Bad: Cursor opened but never closed
const cursor = this.orderModel.find().cursor();
// ... (never closed)
// Good: Let the query resolve to an array (Mongoose queries have no toArray())
const orders = await this.orderModel.find().exec();
// or, when streaming with a cursor, close it explicitly: await cursor.close();
3. Restart service:
docker restart algesta-orders-ms
Verification:
db.serverStatus().connections;
// current should be < maxPoolSize
Docker and Container Issues
Problem: Docker build fails with bcrypt error
Symptoms:
error /app/node_modules/bcrypt: Command failed.
Diagnosis:
docker logs <container-id>
Root Cause: The bcrypt native module was compiled for the wrong architecture.
Solution:
1. Rebuild in Dockerfile:
# In final stage
RUN npm install --force --only=production bcrypt
2. Rebuild image:
docker-compose build --no-cache orders-ms
Verification:
docker run -it algesta-orders-ms node -e "require('bcrypt').hash('test', 10, console.log)"
# Should output hash without errors
Problem: Container starts then exits immediately
Symptoms:
docker ps
# Container not listed (exited)
docker ps -a
# Shows "Exited (1) 2 seconds ago"
Diagnosis:
docker logs algesta-orders-ms
Example Output:
Error: Cannot connect to MongoDB
Root Cause: The application crashes on startup (DB connection, missing env vars).
Solution:
1. Check environment variables:
# docker exec only works on running containers, so inspect the exited one instead
docker inspect algesta-orders-ms --format '{{range .Config.Env}}{{println .}}{{end}}' | grep MONGODB_URI
2. Fix docker-compose.yml:
environment:
  - MONGODB_URI=mongodb://mongodb:27017/orders_ms # Use service name, not localhost
3. Ensure dependencies healthy:
depends_on:
  mongodb:
    condition: service_healthy # Requires a healthcheck on the mongodb service
4. Restart:
docker-compose down
docker-compose up -d
Verification:
docker-compose ps
# All services should show "Up"
Problem: Puppeteer/Chromium not found in container
Symptoms:
Error: Failed to launch Chrome!
Diagnosis:
docker exec algesta-orders-ms which chromium
# Should return path
Root Cause: Missing Chromium dependencies in Dockerfile.
Solution:
1. Verify Dockerfile has dependencies:
RUN apt-get update && \
    apt-get install -y wget ca-certificates fonts-liberation libappindicator3-1 ...
2. Set environment variable:
environment:
  - PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=false
3. Rebuild:
docker-compose build --no-cache orders-ms
Verification:
docker exec algesta-orders-ms chromium --version
# Chrome/Chromium 120.x.x
Testing Issues
Problem: Tests fail with “Jest has detected the following 1 open handle”
Symptoms:
Jest has detected the following 1 open handle potentially keeping Jest from exiting:
  ● TCPSERVERWRAP
Diagnosis:
npm run test:cov
# Look for open handles warning
Root Cause: A database connection or server was not closed after the tests.
Solution:
1. Add global teardown:
// test/global-teardown.ts
import mongoose from "mongoose";
export default async function globalTeardown() {
  await mongoose.disconnect();
}
2. Update jest.config.js:
module.exports = {
  globalTeardown: "<rootDir>/test/global-teardown.ts",
};
3. Close connections in tests:
afterAll(async () => {
  await app.close();
  await mongoose.disconnect();
});
Verification:
npm run test
# Should exit cleanly without "open handle" warning
Problem: E2E tests fail with “Timeout waiting for selector”
Symptoms (Playwright):
TimeoutError: Waiting for selector ".orders-table" timed out after 30000ms
Diagnosis:
# Run with debug mode
pnpm test:e2e:debug
Root Cause: The element is not rendered in time, or the network is slow.
Solution:
1. Increase timeout:
await page.waitForSelector(".orders-table", { timeout: 60000 });
2. Use better wait strategy:
// Bad: Fixed timeout
await page.waitForTimeout(5000);
// Good: Wait for network idle
await page.waitForLoadState("networkidle");
await page.waitForSelector(".orders-table");
3. Check API is running:
curl http://localhost:3000/health
Verification:
pnpm test:e2e
# Tests should pass
Deployment Issues
Problem: Azure Pipeline fails at Docker push stage
Symptoms:
Error: unauthorized: authentication required
Diagnosis: Check Azure Pipeline logs for the “Push Docker Image” step.
Root Cause: GCP service connection expired or misconfigured.
Solution:
1. Verify service connection:
- Azure DevOps → Project Settings → Service Connections
- Select “GCR_ServiceConnection”
- Test connection
2. Update service account key:
# Generate new key in GCP
gcloud iam service-accounts keys create key.json \
  --iam-account=azure-pipelines@project.iam.gserviceaccount.com
# Update KEY_GCP secret variable in Azure Pipeline
3. Retry pipeline:
# In Azure DevOps UI, click "Retry" on failed stage
Verification: The pipeline should complete successfully, with the image pushed to Artifact Registry.
Problem: Terraform apply fails with “resource already exists”
Symptoms:
Error: Error creating Service: googleapi: Error 409: Resource 'algesta-orders-ms-dev' already exists
Diagnosis:
cd infrastructure/terraform-dev
terraform state list
Root Cause: The resource exists in GCP but not in Terraform state.
Solution:
1. Import existing resource:
# google_cloud_run_service uses the locations/.../namespaces/... import ID format
terraform import google_cloud_run_service.orders_ms \
  locations/us-east1/namespaces/PROJECT_ID/services/algesta-orders-ms-dev
2. Re-run apply:
terraform plan
terraform apply
Verification:
terraform state show google_cloud_run_service.orders_ms
# Should show resource details
Runtime and Production Issues
Problem: Service returns 502 Bad Gateway
Symptoms: Users report 502 errors when accessing the API.
Diagnosis:
# Check Cloud Run service logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=algesta-orders-ms-prod" --limit 50
Example Output:
ERROR: Unhandled rejection: MongoServerError: connection timeout
Root Cause: The service crashes on startup or is unhealthy.
Solution:
1. Check service health:
gcloud run services describe algesta-orders-ms-prod --region=us-east1
# Look for "Ready: False"
2. Verify environment variables:
gcloud run services describe algesta-orders-ms-prod --region=us-east1 --format="value(spec.template.spec.containers[0].env)"
3. Check database connectivity:
# Test from Cloud Shell
mongosh "$MONGODB_URI" --eval "db.runCommand({ ping: 1 })"
4. Roll back to previous version:
gcloud run services update-traffic algesta-orders-ms-prod \
  --to-revisions=algesta-orders-ms-prod-00005-abc=100 \
  --region=us-east1
Verification:
curl https://algesta-orders-ms-prod-xyz.run.app/health
# Should return 200 OK
Problem: High memory usage, OOM kills
Symptoms:
Error: Process out of memory
Diagnosis:
# Check Cloud Run metrics
gcloud run services describe algesta-orders-ms-prod --region=us-east1
# View memory usage
gcloud logging read "resource.type=cloud_run_revision AND textPayload=~'out of memory'" --limit 10
Root Cause: Memory leaks, large data processing, or insufficient memory allocation.
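Logging memory figures periodically helps tell a steady leak from a one-off spike before changing any limits. The sketch below formats a memory sample against a container limit; `toMiB` and `describeMemory` are hypothetical helpers, and the sample numbers are made up for illustration.

```typescript
// Subset of the fields returned by Node's process.memoryUsage().
interface MemorySample {
  rss: number;
  heapUsed: number;
  heapTotal: number;
}

// Formats a byte count as mebibytes with one decimal place.
function toMiB(bytes: number): string {
  return (bytes / (1024 * 1024)).toFixed(1) + " MiB";
}

// Summarizes a sample and shows resident memory as a share of the limit.
function describeMemory(sample: MemorySample, limitBytes: number): string {
  const percent = ((sample.rss / limitBytes) * 100).toFixed(0);
  return (
    `rss=${toMiB(sample.rss)} heapUsed=${toMiB(sample.heapUsed)} ` +
    `heapTotal=${toMiB(sample.heapTotal)} (${percent}% of limit)`
  );
}

// Made-up sample: 400 MiB resident against a 512 MiB container limit.
const MIB = 1024 * 1024;
console.log(
  describeMemory({ rss: 400 * MIB, heapUsed: 310 * MIB, heapTotal: 350 * MIB }, 512 * MIB),
);
```

In the service itself, the sample would come from `process.memoryUsage()`, for example inside a `setInterval` callback that writes the summary to the log.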
Solution:
1. Increase memory limit (Terraform):
resource "google_cloud_run_service" "orders_ms" {
  template {
    spec {
      containers {
        resources {
          limits = {
            memory = "1Gi" # Increase from 512Mi
          }
        }
      }
    }
  }
}
2. Profile memory:
// Add heap snapshot
import * as v8 from "v8";
v8.writeHeapSnapshot();
3. Check for leaks:
# Use clinic.js or node --inspect
node --inspect dist/main.js
Verification: Monitor Cloud Run metrics after deployment; memory usage should stabilize.
Integration and External Services
Problem: SendGrid emails not sending
Symptoms: Users report not receiving emails.
Diagnosis:
# Check logs for SendGrid errors
grep -i sendgrid logs/app.log
# Test SendGrid API key
curl -X POST https://api.sendgrid.com/v3/mail/send \
  -H "Authorization: Bearer $SENDGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"personalizations":[{"to":[{"email":"test@example.com"}]}],"from":{"email":"noreply@algesta.com"},"subject":"Test","content":[{"type":"text/plain","value":"Test"}]}'
Root Cause:
- Invalid API key
- Email quota exceeded
- Sender not verified
Solution:
1. Verify API key:
# In SendGrid dashboard, check API key status
2. Check quota:
- SendGrid free tier: 100 emails/day
- Upgrade if needed
3. Verify sender:
- SendGrid → Sender Authentication
- Verify “noreply@algesta.com”
Verification:
// Send test email
await this.emailService.sendEmail({
  to: "test@yourmail.com",
  subject: "Test",
  body: "Test email",
});
Performance Issues
Problem: API response times > 5 seconds
Symptoms: The dashboard feels slow and API requests time out.
Diagnosis:
# Profile API request
curl -w "\nTotal time: %{time_total}s\n" http://localhost:3001/orders
# Check database query times (run in mongosh)
db.system.profile.find({ millis: { $gt: 1000 } })
Root Cause:
- Missing indexes
- N+1 queries
- Large data fetches
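Replacing an N+1 pattern with one batched lookup still leaves the step of matching each order to its provider in memory. A `Map` keyed by `providerId` does that in a single pass. This is a minimal sketch over plain objects: `attachProviders` is a hypothetical helper, and the `name` field on providers is assumed for illustration.

```typescript
interface Order {
  orderId: string;
  providerId: string;
}

// The name field is an assumption; only providerId appears in this guide.
interface Provider {
  providerId: string;
  name: string;
}

type OrderWithProvider = Order & { provider: Provider | null };

// Hypothetical helper: joins orders to already-fetched providers without
// issuing one query per order. Unknown provider IDs map to null.
function attachProviders(
  orders: readonly Order[],
  providers: readonly Provider[],
): OrderWithProvider[] {
  const byId = new Map<string, Provider>();
  for (const provider of providers) {
    byId.set(provider.providerId, provider);
  }
  return orders.map((order) => ({
    ...order,
    provider: byId.get(order.providerId) ?? null,
  }));
}

// Sample data for illustration only.
const joined = attachProviders(
  [
    { orderId: "o-1", providerId: "p-1" },
    { orderId: "o-2", providerId: "p-9" },
  ],
  [{ providerId: "p-1", name: "Sample Provider" }],
);
console.log(joined.map((o) => `${o.orderId}: ${o.provider?.name ?? "no provider"}`));
```

The join costs one pass over each list, so the total work stays proportional to the number of orders plus providers, with a single database round trip for the provider fetch.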
Solution:
1. Add indexes (see Database Issues):
db.orders.createIndex({ status: 1, createdAt: -1 });
2. Optimize queries:
// Bad: N+1 query
for (const order of orders) {
  const provider = await this.providerModel.findOne({
    providerId: order.providerId,
  });
}
// Good: Batch fetch
const providerIds = orders.map((o) => o.providerId);
const providers = await this.providerModel.find({
  providerId: { $in: providerIds },
});
3. Add caching (Redis):
const cached = await this.redis.get(`order:${orderId}`);
if (cached) return JSON.parse(cached);
const order = await this.orderModel.findOne({ orderId });
await this.redis.setex(`order:${orderId}`, 3600, JSON.stringify(order));
Verification:
curl -w "\nTotal time: %{time_total}s\n" http://localhost:3001/orders
# Time should be < 1s
Related Guides:
- Local Development Setup: Setup-related issues
- [Database Setup](/04-guides/database-setup/): Database troubleshooting
- Docker Setup: Container issues
- Testing Guide: Test failures
- Deployment Guide: CI/CD and production issues
For Support:
- Review service logs: docker logs <container> or Cloud Run logs
- Check Azure Pipeline logs for build/deploy failures
- Contact team lead for production access and escalation