Troubleshooting

Table of Contents

Purpose
Who Is This For?
Troubleshooting Methodology
Development Environment Issues
Database Issues
Docker and Container Issues
Testing Issues
Deployment Issues
Runtime and Production Issues
Integration and External Services
Performance Issues

Purpose

This guide provides systematic troubleshooting procedures for common issues on the Algesta platform across development, testing, deployment, and production environments. It offers diagnostic commands, root cause analysis, and step-by-step solutions.

By following this guide, you will be able to:

Diagnose problems methodically using proven techniques
Resolve common development and deployment problems
Use diagnostic commands to gather information
Escalate unresolved problems effectively

Who Is This For?

This guide is for developers debugging local problems, DevOps engineers resolving deployment problems, and support engineers handling production incidents. It assumes familiarity with command-line tools, log analysis, and the Algesta architecture.

Troubleshooting Methodology

Systematic Approach

graph LR
    A[1. Identify Symptoms<br/>Error messages?<br/>Failed checks?] --> B[2. Gather Info<br/>Logs, errors<br/>Environment]
    B --> C[3. Isolate Problem<br/>Which component?<br/>Which environment?]
    C --> D[4. Hypothesize Cause<br/>Analyze symptoms<br/>Check logs]
    D --> E[5. Test Hypothesis<br/>Run diagnostics<br/>Reproduce issue]
    E --> F[6. Apply Solution<br/>Fix and verify]
    F --> G[7. Document<br/>Update guide<br/>Share knowledge]

    style A fill:#ffebee,stroke:#c62828
    style B fill:#e1f5fe,stroke:#0277bd
    style C fill:#fff9c4,stroke:#f9a825
    style D fill:#f3e5f5,stroke:#6a1b9a
    style E fill:#e8f5e9,stroke:#2e7d32
    style F fill:#fce4ec,stroke:#ad1457
    style G fill:#e0f2f1,stroke:#00695c

Steps:

Identify Symptoms: What is the observed behavior? Are there error messages or failed health checks?
Gather Information: Logs, error stacks, environment state
Isolate the Problem: Which component? Which environment?
Hypothesize Cause: Based on symptoms and logs
Test Hypothesis: Run diagnostic commands, reproduce the issue
Apply Solution: Fix and verify
Document: Update this guide if a new issue is discovered

Diagnostic Commands Cheat Sheet

# Check service health
curl http://localhost:3001/health

# View logs
docker logs algesta-orders-ms -f
# or
npm run start:dev  # Watch console output

# Check running processes
ps aux | grep node

# Check port usage
lsof -i :3001

# Test database connection
mongosh "mongodb://localhost:27017/orders_ms" --eval "db.runCommand({ ping: 1 })"

# Test Redis connection
redis-cli -h localhost -p 6379 ping

# Check environment variables
printenv | grep MONGODB_URI

# Check Docker containers
docker ps
docker-compose ps

# Check disk space
df -h

# Check memory usage
free -h  # Linux
vm_stat  # macOS

Development Environment Issues

Problem: Service fails to start with “Cannot find module”

Symptoms:

Error: Cannot find module '@nestjs/common'

Diagnosis:

# Check if node_modules exists
ls -la node_modules/

# Verify package.json exists
cat package.json

Root Cause: Dependencies were not installed, or node_modules was deleted.

Solution:

# Reinstall dependencies
npm install --force

# Or with clean cache
rm -rf node_modules package-lock.json
npm cache clean --force
npm install

Verification:

npm run start:dev
# Service should start successfully

Problem: Port already in use

Symptoms:

Error: listen EADDRINUSE: address already in use :::3001

Diagnosis:

# Find process using port 3001
lsof -i :3001
# or
netstat -tuln | grep 3001

Example Output:

COMMAND   PID   USER   FD   TYPE  DEVICE  SIZE/OFF  NODE  NAME
node      1234  user   22u  IPv6  0x...   0t0       TCP   *:3001 (LISTEN)

Root Cause: Another instance of the service, or a different service, is using the same port.

Solution:

Option 1: Kill process

kill -9 1234  # Replace with actual PID

Option 2: Change port

# Edit .env
PORT=3011

Option 3: Find and stop other instance

# If it's a Docker container
docker ps | grep 3001
docker stop <container-id>

Verification:

lsof -i :3001
# No output = port is free
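When the port check needs to run from a script rather than by eye, the lsof output can be parsed for listening processes. The following is a minimal sketch, not part of the Algesta codebase: `parseLsofListeners` is a hypothetical helper, and it assumes lsof prints numeric ports (running lsof with `-P` prevents port numbers from being shown as service names).

```typescript
// Hypothetical helper: extracts listening processes from `lsof -i :PORT` output.
interface Listener {
  command: string;
  pid: number;
  port: number;
}

function parseLsofListeners(output: string): Listener[] {
  const listeners: Listener[] = [];
  for (const line of output.split("\n")) {
    const cols = line.trim().split(/\s+/);
    // Listening sockets end with "(LISTEN)"; the header and other states are skipped.
    if (cols.length < 4 || cols[cols.length - 1] !== "(LISTEN)") continue;
    const pid = Number(cols[1]);
    // The address column looks like "*:3001" or "127.0.0.1:3001".
    const address = cols[cols.length - 2];
    const port = Number(address.slice(address.lastIndexOf(":") + 1));
    if (Number.isInteger(pid) && Number.isInteger(port)) {
      listeners.push({ command: cols[0], pid, port });
    }
  }
  return listeners;
}

const sample = [
  "COMMAND   PID   USER   FD   TYPE  DEVICE  SIZE/OFF  NODE  NAME",
  "node      1234  user   22u  IPv6  0x...   0t0       TCP   *:3001 (LISTEN)",
].join("\n");

console.log(parseLsofListeners(sample));
// [ { command: 'node', pid: 1234, port: 3001 } ]
```

An empty result corresponds to the "no output" case: nothing is listening on the port.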

Problem: Environment variables not loading

Symptoms:

Configuration validation error: MONGODB_URI is required

Diagnosis:

# Check if .env file exists
ls -la .env

# Verify file permissions
ls -l .env

# Check variable in environment
printenv MONGODB_URI

Root Cause: The .env file is missing, in the wrong location, or not loaded.

Solution:

1. Create .env file:

cp .env.example .env

2. Verify location:

# .env must be in repo root (e.g., algesta-ms-orders-nestjs/.env)
ls -la .env

3. Check dotenv loading:

// In main.ts or config module
import * as dotenv from "dotenv";
dotenv.config();
console.log("MONGODB_URI:", process.env.MONGODB_URI);

4. Restart service:

# Kill all node processes
pkill -f node

# Restart
npm run start:dev

Verification:

# Service logs should show no config errors
curl http://localhost:3001/health
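The "MONGODB_URI is required" error in this section is a fail-fast configuration check. A minimal sketch of such a check follows; `REQUIRED_VARS`, `findMissingVars`, and `assertEnv` are illustrative names, and the variable list is an assumption rather than Algesta's actual configuration schema.

```typescript
// Assumed list of required variables; adjust to the service's real configuration.
const REQUIRED_VARS = ["MONGODB_URI", "PORT"] as const;

// Returns the names of required variables that are unset or blank.
function findMissingVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_VARS,
): string[] {
  // Unset and blank values are treated alike: both usually mean .env was not loaded.
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Throws one error naming every missing variable, instead of failing on the first.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Configuration validation error: missing ${missing.join(", ")}`);
  }
}

// Usage at startup, after dotenv.config() and before the app is created:
// assertEnv(process.env);
console.log(findMissingVars({ PORT: "3001" })); // [ 'MONGODB_URI' ]
```

Reporting all missing names at once saves a restart per variable when a fresh .env is incomplete.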

Problem: TypeScript compilation errors

Symptoms:

error TS2307: Cannot find module '@nestjs/core' or its corresponding type declarations

Diagnosis:

# Check TypeScript version
npx tsc --version

# Verify tsconfig.json exists
cat tsconfig.json

Root Cause: Missing type definitions or an incompatible TypeScript version.

Solution:

# Install type definitions
npm install --save-dev @types/node

# Reinstall dependencies (ensures compatible versions)
rm -rf node_modules package-lock.json
npm install

# Clean build cache
rm -rf dist
npm run build

Verification:

npm run build
# Should compile without errors

Database Issues

Problem: Cannot connect to MongoDB

Symptoms:

MongoServerError: connect ECONNREFUSED 127.0.0.1:27017

Diagnosis:

# Check if MongoDB is running
mongosh --eval "db.version()"

# Check port
lsof -i :27017

# Test connection with URI
mongosh "mongodb://localhost:27017/orders_ms" --eval "db.runCommand({ ping: 1 })"

Root Cause:

MongoDB not running
Wrong connection string
Firewall blocking connection

Solution:

1. Start MongoDB:

# macOS
brew services start mongodb-community@7.0

# Linux
sudo systemctl start mongod

# Docker
docker start algesta-mongodb

2. Verify connection string:

# Correct format
MONGODB_URI=mongodb://localhost:27017/orders_ms

# With auth
MONGODB_URI=mongodb://user:pass@localhost:27017/orders_ms?authSource=admin

3. Check firewall (production):

# Allow port 27017 from the application host only (replace <app-server-ip>)
sudo ufw allow from <app-server-ip> to any port 27017

Verification:

mongosh "mongodb://localhost:27017/orders_ms"
# Should connect successfully

# Restart service
npm run start:dev
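A wrong connection string is easier to spot once it is broken into parts. The sketch below uses Node's built-in URL parser to show which host, port, and database a URI actually points at. `describeMongoUri` is a hypothetical helper; it assumes a single-host `mongodb://` URI and does not handle replica-set host lists or `mongodb+srv://`. The password is deliberately left out of the result so it can be logged safely.

```typescript
import { URL } from "node:url";

interface MongoTarget {
  host: string;
  port: number;
  database: string;
  authSource: string | null;
  hasCredentials: boolean;
}

// Hypothetical helper: summarizes where a single-host mongodb:// URI points.
function describeMongoUri(uri: string): MongoTarget {
  const url = new URL(uri); // throws TypeError on a malformed URI
  if (url.protocol !== "mongodb:") {
    throw new Error(`Unsupported scheme: ${url.protocol}`);
  }
  return {
    host: url.hostname,
    // 27017 is MongoDB's default port when none is given.
    port: url.port === "" ? 27017 : Number(url.port),
    database: url.pathname.replace(/^\//, ""),
    authSource: url.searchParams.get("authSource"),
    hasCredentials: url.username !== "",
  };
}

console.log(
  describeMongoUri("mongodb://user:pass@localhost:27017/orders_ms?authSource=admin"),
);
```

A `host` of `localhost` is worth a second look when the service runs in a container, where the database is normally reached by its service name.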

Problem: Slow database queries

Symptoms:

API responses take > 5 seconds
Logs show “Slow query detected”

Diagnosis:

// In mongosh
db.setProfilingLevel(1, { slowms: 100 });

// View slow queries
db.system.profile
  .find({ millis: { $gt: 100 } })
  .sort({ ts: -1 })
  .limit(10);

Example Output:

{
  "op": "query",
  "ns": "orders_ms.orders",
  "command": {
    "find": "orders",
    "filter": { "status": "NEW" }
  },
  "millis": 3245,
  "planSummary": "COLLSCAN" // ❌ Collection scan (no index)
}

Root Cause: Missing indexes or inefficient queries.

Solution:

1. Create missing indexes:

db.orders.createIndex({ status: 1 });

2. Verify index usage:

db.orders.find({ status: "NEW" }).explain("executionStats");

Expected:

{
  "winningPlan": {
    "stage": "IXSCAN", // ✅ Index scan
    "indexName": "status_1"
  },
  "executionStats": {
    "executionTimeMillis": 5 // Fast!
  }
}

3. Optimize query:

// Bad: Fetch all fields
await this.orderModel.find({ status: "NEW" });

// Good: Project only needed fields
await this.orderModel.find({ status: "NEW" }, "orderId service address");

Verification:

# API response time should improve (URL quoted so the shell does not expand "?")
curl -w "\nTime: %{time_total}s\n" "http://localhost:3001/orders?status=NEW"
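Once profiler entries have been exported, picking out the unindexed ones can be automated. The sketch below filters entries shaped like the example output in this section. `ProfileEntry` covers only the fields used here, and `findCollectionScans` is a hypothetical helper, not a MongoDB API.

```typescript
// Only the fields used below; real profiler documents carry many more.
interface ProfileEntry {
  ns: string;
  millis: number;
  planSummary?: string;
}

// Hypothetical helper: returns slow operations that ran as collection scans.
function findCollectionScans(entries: ProfileEntry[], slowMs = 100): ProfileEntry[] {
  return entries
    .filter((e) => e.millis > slowMs && (e.planSummary ?? "").includes("COLLSCAN"))
    .sort((a, b) => b.millis - a.millis); // slowest first
}

const entries: ProfileEntry[] = [
  { ns: "orders_ms.orders", millis: 3245, planSummary: "COLLSCAN" },
  { ns: "orders_ms.orders", millis: 140, planSummary: "IXSCAN { status: 1 }" },
];

for (const e of findCollectionScans(entries)) {
  console.log(`${e.ns}: ${e.millis} ms without an index`);
}
```

Each reported namespace is a candidate for the index check described in the solution steps.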

Problem: Database connection pool exhausted

Symptoms:

MongoServerError: connection pool is full, waiting for connection

Diagnosis:

// Check current connections
db.serverStatus().connections;

Example Output:

{
  "current": 100, // Max reached!
  "available": 0
}

Root Cause:

maxPoolSize too low for traffic
Connection leaks (cursors not closed)

Solution:

1. Increase pool size:

maxPoolSize: parseInt(process.env.DB_POOL_MAX ?? '50', 10),  // Increase from 20

2. Find connection leaks:

// Bad: cursor opened but never closed
const cursor = this.orderModel.find().cursor();
// ... (never closed)

// Good: let Mongoose return the full result set, or close the cursor explicitly
const orders = await this.orderModel.find().exec();

3. Restart service:

docker restart algesta-orders-ms

Verification:

db.serverStatus().connections;
// current should be < maxPoolSize
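The `connections` document can be turned into a single utilization figure for alerting. Note that these counters describe all clients connected to the MongoDB server, not one service's driver pool. The sketch below is illustrative: `connectionUtilization`, `connectionAlert`, and the 0.8 threshold are assumptions, not values taken from the Algesta setup.

```typescript
// Subset of db.serverStatus().connections used here.
interface ConnectionStats {
  current: number;
  available: number;
}

// Fraction of the server's connection capacity in use (0..1).
function connectionUtilization(stats: ConnectionStats): number {
  const capacity = stats.current + stats.available;
  return capacity === 0 ? 0 : stats.current / capacity;
}

// Returns a warning string above the threshold, otherwise null.
function connectionAlert(stats: ConnectionStats, threshold = 0.8): string | null {
  const ratio = connectionUtilization(stats);
  return ratio >= threshold
    ? `Connections at ${(ratio * 100).toFixed(0)}% of capacity`
    : null;
}

console.log(connectionAlert({ current: 100, available: 0 }));
// Connections at 100% of capacity
```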

Docker and Container Issues

Problem: Docker build fails with bcrypt error

Symptoms:

error /app/node_modules/bcrypt: Command failed.

Diagnosis:

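# The failure happens at build time, so read the build output rather than container logs
docker-compose build --no-cache orders-ms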

Root Cause: bcrypt native module compiled for the wrong architecture.

Solution:

1. Rebuild in Dockerfile:

# In final stage
RUN npm install --force --only=production bcrypt

2. Rebuild image:

docker-compose build --no-cache orders-ms

Verification:

docker run -it algesta-orders-ms node -e "require('bcrypt').hash('test', 10, console.log)"
# Should output hash without errors
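A native module such as bcrypt is built for a specific operating system, CPU architecture, and Node ABI version, so comparing those values between the build machine and the container can confirm the root cause. A minimal sketch using Node's built-in `process` fields; `nativeTarget` is a hypothetical helper name.

```typescript
// Describes the target that native addons must match:
// OS platform, CPU architecture, and Node's native module ABI version.
function nativeTarget(): string {
  return `${process.platform}-${process.arch}-abi${process.versions.modules}`;
}

console.log(nativeTarget());
```

If the string printed where node_modules was installed differs from the one printed inside the container, the compiled bcrypt binary will not load there and needs to be rebuilt in the image.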

Problem: Container starts then exits immediately

Symptoms:

docker ps
# Container not listed (exited)

docker ps -a
# Shows "Exited (1) 2 seconds ago"

Diagnosis:

docker logs algesta-orders-ms

Example Output:

Error: Cannot connect to MongoDB

Root Cause: The application crashes on startup (database connection failure, missing environment variables).

Solution:

1. Check environment variables:

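# docker exec needs a running container; docker inspect also works on an exited one
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' algesta-orders-ms | grep MONGODB_URI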

2. Fix docker-compose.yml:

environment:
  - MONGODB_URI=mongodb://mongodb:27017/orders_ms # Use service name, not localhost

3. Ensure dependencies healthy:

depends_on:
  mongodb:
    condition: service_healthy # Requires a healthcheck on the mongodb service

4. Restart:

docker-compose down
docker-compose up -d

Verification:

docker-compose ps
# All services should show "Up"
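Besides ordering containers with depends_on, a service can tolerate a database that is briefly unavailable by retrying its first connection. This is an alternative technique, not something this guide states Algesta uses. In the sketch below, `backoffDelays` and `connectWithRetry` are hypothetical helpers and the delay values are illustrative.

```typescript
// Delays double from baseMs and are capped at maxMs.
function backoffDelays(attempts: number, baseMs: number, maxMs: number): number[] {
  return Array.from({ length: attempts }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

// Tries `connect` immediately, then once more after each delay.
// Rethrows the last error when every attempt fails.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  delays: number[],
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (const delay of [0, ...delays]) {
    if (delay > 0) await sleep(delay);
    try {
      return await connect();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

console.log(backoffDelays(5, 500, 4000)); // [ 500, 1000, 2000, 4000, 4000 ]
```

The `sleep` parameter exists so tests can substitute an instant timer; in production the default waits for real.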

Problem: Puppeteer/Chromium not found in container

Symptoms:

Error: Failed to launch Chrome!

Diagnosis:

docker exec algesta-orders-ms which chromium
# Should return path

Root Cause: Missing Chromium dependencies in the Dockerfile.

Solution:

1. Verify Dockerfile has dependencies:

RUN apt-get update && \
    apt-get install -y wget ca-certificates fonts-liberation libappindicator3-1 ...

2. Set environment variable:

environment:
  - PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=false

3. Rebuild:

docker-compose build --no-cache orders-ms

Verification:

docker exec algesta-orders-ms chromium --version
# Chrome/Chromium 120.x.x

Testing Issues

Problem: Tests fail with “Jest has detected the following 1 open handle”

Symptoms:

Jest has detected the following 1 open handle potentially keeping Jest from exiting:
  ● TCPSERVERWRAP

Diagnosis:

npm run test:cov
# Look for the open handles warning, then locate where the handle was created
npx jest --detectOpenHandles

Root Cause: Database connection or server not closed after tests.

Solution:

1. Add global teardown:

import mongoose from "mongoose";

export default async function globalTeardown() {
  await mongoose.disconnect();
}

2. Update jest.config.js:

module.exports = {
  globalTeardown: "<rootDir>/test/global-teardown.ts",
};

3. Close connections in tests:

afterAll(async () => {
  await app.close();
  await mongoose.disconnect();
});

Verification:

npm run test
# Should exit cleanly without "open handle" warning

Problem: E2E tests fail with “Timeout waiting for selector”

Symptoms (Playwright):

TimeoutError: Waiting for selector ".orders-table" timed out after 30000ms

Diagnosis:

# Run with debug mode
pnpm test:e2e:debug

Root Cause: Element not rendered in time, or network delay.

Solution:

1. Increase timeout:

await page.waitForSelector(".orders-table", { timeout: 60000 });

2. Use better wait strategy:

// Bad: Fixed timeout
await page.waitForTimeout(5000);

// Good: Wait for network idle
await page.waitForLoadState("networkidle");
await page.waitForSelector(".orders-table");

3. Check API is running:

curl http://localhost:3000/health

Verification:

pnpm test:e2e
# Tests should pass

Deployment Issues

Problem: Azure Pipeline fails at Docker push stage

Symptoms:

Error: unauthorized: authentication required

Diagnosis: Check Azure Pipeline logs for the “Push Docker Image” step.

Root Cause: GCP service connection expired or misconfigured.

Solution:

1. Verify service connection:

Azure DevOps → Project Settings → Service Connections
Select “GCR_ServiceConnection”
Test connection

2. Update service account key:

# Generate new key in GCP
gcloud iam service-accounts keys create key.json \
  --iam-account=azure-pipelines@project.iam.gserviceaccount.com

# Update KEY_GCP secret variable in Azure Pipeline

3. Retry pipeline:

# In Azure DevOps UI, click "Retry" on failed stage

Verification: The pipeline should complete successfully, with the image pushed to Artifact Registry.

Problem: Terraform apply fails with “resource already exists”

Symptoms:

Error: Error creating Service: googleapi: Error 409: Resource 'algesta-orders-ms-dev' already exists

Diagnosis:

cd infrastructure/terraform-dev
terraform state list

Root Cause: The resource exists in GCP but not in Terraform state.

Solution:

1. Import existing resource:

terraform import google_cloud_run_service.orders_ms \
  locations/us-east1/namespaces/PROJECT_ID/services/algesta-orders-ms-dev

2. Re-run apply:

terraform plan
terraform apply

Verification:

terraform state show google_cloud_run_service.orders_ms
# Should show resource details

Runtime and Production Issues

Problem: Service returns 502 Bad Gateway

Symptoms: Users report 502 errors when accessing the API.

Diagnosis:

# Check Cloud Run service logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=algesta-orders-ms-prod" --limit 50

Example Output:

ERROR: Unhandled rejection: MongoServerError: connection timeout

Root Cause: The service crashes on startup or is unhealthy.

Solution:

1. Check service health:

gcloud run services describe algesta-orders-ms-prod --region=us-east1
# Look for "Ready: False"

2. Verify environment variables:

gcloud run services describe algesta-orders-ms-prod --region=us-east1 --format="value(spec.template.spec.containers[0].env)"

3. Check database connectivity:

# Test from Cloud Shell
mongosh "$MONGODB_URI" --eval "db.runCommand({ ping: 1 })"

4. Rollback to previous version:

gcloud run services update-traffic algesta-orders-ms-prod \
  --to-revisions=algesta-orders-ms-prod-00005-abc=100 \
  --region=us-east1

Verification:

curl https://algesta-orders-ms-prod-xyz.run.app/health
# Should return 200 OK

Problem: High memory usage, OOM kills

Symptoms:

Error: Process out of memory

Diagnosis:

# Check Cloud Run metrics
gcloud run services describe algesta-orders-ms-prod --region=us-east1

# View memory usage
gcloud logging read "resource.type=cloud_run_revision AND textPayload=~'out of memory'" --limit 10

Root Cause: Memory leaks, large data processing, or insufficient memory allocation.

Solution:

1. Increase memory limit (Terraform):

resource "google_cloud_run_service" "orders_ms" {
  template {
    spec {
      containers {
        resources {
          limits = {
            memory = "1Gi"  # Increase from 512Mi
          }
        }
      }
    }
  }
}

2. Profile memory:

// Add heap snapshot
import * as v8 from "v8";
v8.writeHeapSnapshot();

3. Check for leaks:

# Use clinic.js or node --inspect
node --inspect dist/main.js

Verification: Monitor Cloud Run metrics after deployment; memory usage should stabilize.
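To watch memory from inside the process as well, the service can log its own usage periodically. The sketch below relies only on Node built-ins (`process.memoryUsage()` and `v8.getHeapStatistics()`); the helper names and the 60-second interval are illustrative assumptions. The V8 heap limit is not the same as the container's memory limit, which covers the whole process, so `rss` is the figure to compare against the Cloud Run setting.

```typescript
import * as v8 from "node:v8";

const MIB = 1024 * 1024;

// Fraction of V8's heap size limit currently in use (0..1).
function heapUsageRatio(): number {
  const stats = v8.getHeapStatistics();
  return stats.used_heap_size / stats.heap_size_limit;
}

// One-line summary suitable for a log entry.
function memorySummary(): string {
  const { rss, heapUsed } = process.memoryUsage();
  const pct = (heapUsageRatio() * 100).toFixed(1);
  return `rss=${(rss / MIB).toFixed(0)}MiB heapUsed=${(heapUsed / MIB).toFixed(0)}MiB heapLimitUsed=${pct}%`;
}

// Log once a minute; unref() keeps the timer from holding the process open.
const memoryTimer = setInterval(() => console.log(memorySummary()), 60_000);
memoryTimer.unref();

console.log(memorySummary());
```

A steadily rising `heapUsed` between log lines under constant load is the usual cue to take the heap snapshot described in step 2.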

Integration and External Services

Problem: SendGrid emails not sending

Symptoms: Users report not receiving emails.

Diagnosis:

# Check logs for SendGrid errors
grep -i sendgrid logs/app.log

# Test SendGrid API key
curl -X POST https://api.sendgrid.com/v3/mail/send \
  -H "Authorization: Bearer $SENDGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"personalizations":[{"to":[{"email":"test@example.com"}]}],"from":{"email":"noreply@algesta.com"},"subject":"Test","content":[{"type":"text/plain","value":"Test"}]}'

Root Cause:

Invalid API key
Email quota exceeded
Sender not verified

Solution:

1. Verify API key:

# In SendGrid dashboard, check API key status

2. Check quota:

SendGrid free tier: 100 emails/day
Upgrade if needed

3. Verify sender:

SendGrid → Sender Authentication
Verify “noreply@algesta.com”

Verification:

// Send test email
await this.emailService.sendEmail({
  to: "test@yourmail.com",
  subject: "Test",
  body: "Test email",
});
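The JSON body in the curl test is dense and easy to mistype. A typed builder makes its structure explicit; the sketch below reproduces exactly the fields used in that curl command. `MailSendPayload` and `buildMailSendPayload` are illustrative names, not part of the SendGrid SDK or of Algesta's emailService.

```typescript
// Mirrors only the fields used in the curl test above.
interface MailSendPayload {
  personalizations: { to: { email: string }[] }[];
  from: { email: string };
  subject: string;
  content: { type: string; value: string }[];
}

// Hypothetical helper: builds a plain-text mail/send request body.
function buildMailSendPayload(
  to: string[],
  from: string,
  subject: string,
  text: string,
): MailSendPayload {
  if (to.length === 0) throw new Error("At least one recipient is required");
  return {
    personalizations: [{ to: to.map((email) => ({ email })) }],
    from: { email: from },
    subject,
    content: [{ type: "text/plain", value: text }],
  };
}

const payload = buildMailSendPayload(
  ["test@example.com"],
  "noreply@algesta.com",
  "Test",
  "Test",
);
console.log(JSON.stringify(payload));
```

The printed string matches the `-d` argument of the curl command, so it can be pasted in when testing other recipients or subjects.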

Performance Issues

Problem: API response times > 5 seconds

Symptoms: Dashboard feels slow, API requests time out.

Diagnosis:

# Profile API request
curl -w "\nTotal time: %{time_total}s\n" http://localhost:3001/orders

# Check database query times (run this line in mongosh, not the shell)
db.system.profile.find({ millis: { $gt: 1000 } })

Root Cause:

Missing indexes
N+1 queries
Large data fetches

Solution:

1. Add indexes (see Database Issues):

db.orders.createIndex({ status: 1, createdAt: -1 });

2. Optimize queries:

// Bad: N+1 query
for (const order of orders) {
  const provider = await this.providerModel.findOne({
    providerId: order.providerId,
  });
}

// Good: Batch fetch
const providerIds = orders.map((o) => o.providerId);
const providers = await this.providerModel.find({
  providerId: { $in: providerIds },
});

3. Add caching (Redis):

const cached = await this.redis.get(`order:${orderId}`);
if (cached) return JSON.parse(cached);

const order = await this.orderModel.findOne({ orderId });
// Cache only real hits so a missing order is not served as null for an hour
if (order) await this.redis.setex(`order:${orderId}`, 3600, JSON.stringify(order));
return order;

Verification:

curl -w "\nTotal time: %{time_total}s\n" http://localhost:3001/orders
# Time should be < 1s
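The batch fetch in step 2 returns providers as a flat list, which still has to be matched back to each order. A Map keyed by providerId does this in one pass. The sketch below uses in-memory data; the `Order` and `Provider` shapes, including the `name` field, are assumptions for illustration rather than Algesta's schemas.

```typescript
// Assumed minimal shapes; the real documents have more fields.
interface Order {
  orderId: string;
  providerId: string;
}
interface Provider {
  providerId: string;
  name: string;
}

// Deduplicates IDs so the $in filter carries each provider once.
function uniqueProviderIds(orders: Order[]): string[] {
  return Array.from(new Set(orders.map((o) => o.providerId)));
}

// Joins providers onto orders with one Map lookup per order.
function attachProviders(
  orders: Order[],
  providers: Provider[],
): (Order & { provider: Provider | null })[] {
  const byId = new Map<string, Provider>();
  for (const p of providers) byId.set(p.providerId, p);
  return orders.map((o) => ({ ...o, provider: byId.get(o.providerId) ?? null }));
}

const orders: Order[] = [
  { orderId: "o1", providerId: "p1" },
  { orderId: "o2", providerId: "p1" },
  { orderId: "o3", providerId: "p9" },
];
const providers: Provider[] = [{ providerId: "p1", name: "Acme" }];

console.log(uniqueProviderIds(orders)); // [ 'p1', 'p9' ]
console.log(attachProviders(orders, providers));
```

Orders whose provider is missing from the batch result get `provider: null` instead of triggering another query.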

Related Guides:

Local Development Setup: Setup-related issues
Database Setup: Database troubleshooting
Docker Setup: Container issues
Testing Guide: Test failures
Deployment Guide: CI/CD and production issues

For Support:

Review service logs: docker logs <container> or Cloud Run logs
Check Azure Pipeline logs for build/deploy failures
Contact team lead for production access and escalation