
Troubleshooting

Table of Contents

  1. Purpose
  2. Who is this for?
  3. Troubleshooting Methodology
  4. Development Environment Issues
  5. Database Issues
  6. Docker and Container Issues
  7. Testing Issues
  8. Deployment Issues
  9. Runtime and Production Issues
  10. Integration and External Services
  11. Performance Issues

Purpose

This guide provides systematic troubleshooting procedures for common issues on the Algesta platform across development, testing, deployment, and production environments. It offers diagnostic commands, root cause analysis, and step-by-step solutions.

By following this guide, you will be able to:

  • Diagnose problems methodically using proven techniques
  • Resolve common development and deployment problems
  • Use diagnostic commands to gather information
  • Escalate unresolved problems effectively

Who is this for?

This guide is for developers debugging local problems, DevOps engineers resolving deployment problems, and support engineers handling production incidents. It assumes familiarity with command-line tools, log analysis, and the Algesta architecture.

Troubleshooting Methodology

Systematic Approach

graph LR
    A[1. Identify Symptoms<br/>Error messages?<br/>Failed checks?] --> B[2. Gather Info<br/>Logs, errors<br/>Environment]
    B --> C[3. Isolate Problem<br/>Which component?<br/>Which environment?]
    C --> D[4. Hypothesize Cause<br/>Analyze symptoms<br/>Check logs]
    D --> E[5. Test Hypothesis<br/>Run diagnostics<br/>Reproduce issue]
    E --> F[6. Apply Solution<br/>Fix and verify]
    F --> G[7. Document<br/>Update guide<br/>Share knowledge]

    style A fill:#ffebee,stroke:#c62828
    style B fill:#e1f5fe,stroke:#0277bd
    style C fill:#fff9c4,stroke:#f9a825
    style D fill:#f3e5f5,stroke:#6a1b9a
    style E fill:#e8f5e9,stroke:#2e7d32
    style F fill:#fce4ec,stroke:#ad1457
    style G fill:#e0f2f1,stroke:#00695c

Steps:

  1. Identify Symptoms: What is the observed behavior? Are there error messages or failed health checks?
  2. Gather Information: Logs, error stacks, environment state
  3. Isolate the Problem: Which component? Which environment?
  4. Hypothesize Cause: Based on symptoms and logs
  5. Test Hypothesis: Run diagnostic commands, reproduce the issue
  6. Apply Solution: Fix and verify
  7. Document: Update this guide if a new issue is discovered

Diagnostic Commands Cheat Sheet

# Check service health
curl http://localhost:3001/health
# View logs
docker logs algesta-orders-ms -f
# or
npm run start:dev # Watch console output
# Check running processes
ps aux | grep node
# Check port usage
lsof -i :3001
# Test database connection
mongosh "mongodb://localhost:27017/orders_ms" --eval "db.runCommand({ ping: 1 })"
# Test Redis connection
redis-cli -h localhost -p 6379 ping
# Check environment variables
printenv | grep MONGODB_URI
# Check Docker containers
docker ps
docker-compose ps
# Check disk space
df -h
# Check memory usage
free -h # Linux
vm_stat # macOS

Development Environment Issues

Problem: Service fails to start with “Cannot find module”

Symptoms:

Error: Cannot find module '@nestjs/common'

Diagnosis:

# Check if node_modules exists
ls -la node_modules/
# Verify package.json exists
cat package.json

Root Cause: Dependencies are not installed, or node_modules was deleted.

Solution:

# Reinstall dependencies
npm install
# Or with clean cache
rm -rf node_modules package-lock.json
npm cache clean --force
npm install

Verification:

npm run start:dev
# Service should start successfully

Problem: Port already in use

Symptoms:

Error: listen EADDRINUSE: address already in use :::3001

Diagnosis:

# Find process using port 3001
lsof -i :3001
# or
netstat -tuln | grep 3001

Example Output:

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
node 1234 user 22u IPv6 0x... 0t0 TCP *:3001 (LISTEN)

Root Cause: Another instance of the service, or a different service, is using the same port.

Solution:

Option 1: Kill process

kill 1234 # Replace with actual PID; use kill -9 only if the process ignores SIGTERM

Option 2: Change port

# Edit .env
PORT=3011

Option 3: Find and stop other instance

# If it's a Docker container
docker ps | grep 3001
docker stop <container-id>

Verification:

lsof -i :3001
# No output = port is free

Problem: Environment variables not loading

Symptoms:

Configuration validation error: MONGODB_URI is required

Diagnosis:

# Check if .env file exists
ls -la .env
# Verify file permissions
ls -l .env
# Check variable in environment
printenv MONGODB_URI

Root Cause: The .env file is missing, in the wrong location, or not loaded.

Solution:

1. Create .env file:

cp .env.example .env

2. Verify location:

# .env must be in repo root (e.g., algesta-ms-orders-nestjs/.env)
ls -la .env

3. Check dotenv loading:

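// In main.ts or config module (load before anything reads process.env)
import * as dotenv from "dotenv";
dotenv.config();
// Log only whether the variable is set; the URI may contain credentials
console.log("MONGODB_URI set:", Boolean(process.env.MONGODB_URI));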

4. Restart service:

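# Stop the running service (Ctrl+C), or kill all node processes
# Warning: pkill -f node matches every process whose command line contains "node"
pkill -f node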
# Restart
npm run start:dev

Verification:

# Service logs should show no config errors
curl http://localhost:3001/health
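
The “Configuration validation error” shown in the symptoms suggests the service checks its configuration at startup. A minimal sketch of that kind of check follows, assuming a hypothetical `findMissingEnv` helper and an illustrative list of required names (only `MONGODB_URI` and `PORT` appear in this guide). It reports which variables are absent without printing their values.

```typescript
// Hypothetical helper: returns the names of required variables that are unset or blank.
type Env = Record<string, string | undefined>;

function findMissingEnv(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Illustrative list; adjust it to the variables your service actually needs.
const REQUIRED_VARS = ["MONGODB_URI", "PORT"] as const;

// A plain object stands in for process.env so the sketch runs anywhere.
const missing = findMissingEnv({ PORT: "3001", MONGODB_URI: "  " }, REQUIRED_VARS);
console.log(missing.length === 0 ? "Config OK" : `Missing: ${missing.join(", ")}`);
```

In a real service the first argument would be `process.env`, and a non-empty result would normally abort startup with a clear message.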

Problem: TypeScript compilation errors

Symptoms:

error TS2307: Cannot find module '@nestjs/core' or its corresponding type declarations

Diagnosis:

# Check TypeScript version
npx tsc --version
# Verify tsconfig.json exists
cat tsconfig.json

Root Cause: The package or its type definitions are missing, or the TypeScript version is incompatible.

Solution:

# Install type definitions
npm install --save-dev @types/node
# Reinstall dependencies (ensures compatible versions)
rm -rf node_modules package-lock.json
npm install
# Clean build cache
rm -rf dist
npm run build

Verification:

npm run build
# Should compile without errors

Database Issues

Problem: Cannot connect to MongoDB

Symptoms:

MongoServerError: connect ECONNREFUSED 127.0.0.1:27017

Diagnosis:

# Check if MongoDB is running
mongosh --eval "db.version()"
# Check port
lsof -i :27017
# Test connection with URI
mongosh "mongodb://localhost:27017/orders_ms" --eval "db.runCommand({ ping: 1 })"

Root Cause:

  • MongoDB not running
  • Wrong connection string
  • Firewall blocking connection

Solution:

1. Start MongoDB:

# macOS
brew services start mongodb-community@7.0
# Linux
sudo systemctl start mongod
# Docker
docker start algesta-mongodb

2. Verify connection string:

# Correct format
MONGODB_URI=mongodb://localhost:27017/orders_ms
# With auth
MONGODB_URI=mongodb://user:pass@localhost:27017/orders_ms?authSource=admin

3. Check firewall (production):

# Allow port 27017 only from the application host (replace the placeholder)
sudo ufw allow from <app-server-ip> to any port 27017 proto tcp

Verification:

mongosh "mongodb://localhost:27017/orders_ms"
# Should connect successfully
# Restart service
npm run start:dev
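
A wrong connection string is easier to spot once it is broken into parts. The sketch below is one way to do that, assuming a hypothetical `parseMongoUri` helper built on the standard `URL` class. It only covers single-host `mongodb://` URIs like the ones in this guide; multi-host replica-set URIs and `mongodb+srv://` are not handled.

```typescript
// Hypothetical helper: extracts the parts of a single-host mongodb:// URI for inspection.
interface MongoUriParts {
  host: string;
  port: number;
  database: string;
  hasCredentials: boolean;
  authSource: string | null;
}

function parseMongoUri(uri: string): MongoUriParts {
  const url = new URL(uri); // throws a TypeError on malformed input
  if (url.protocol !== "mongodb:") {
    throw new Error(`Unexpected scheme: ${url.protocol}`);
  }
  return {
    host: url.hostname,
    port: url.port ? Number(url.port) : 27017, // 27017 is MongoDB's default port
    database: url.pathname.replace(/^\//, ""),
    hasCredentials: url.username !== "", // reported as a flag so secrets are never printed
    authSource: url.searchParams.get("authSource"),
  };
}

console.log(parseMongoUri("mongodb://localhost:27017/orders_ms"));
```

Inside a container, a `host` of `localhost` is a common sign that the URI should use the Docker service name instead.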

Problem: Slow database queries

Symptoms:

  • API responses take > 5 seconds
  • Logs show “Slow query detected”

Diagnosis:

// In mongosh
db.setProfilingLevel(1, { slowms: 100 });
// View slow queries
db.system.profile
.find({ millis: { $gt: 100 } })
.sort({ ts: -1 })
.limit(10);

Example Output:

{
  "op": "query",
  "ns": "orders_ms.orders",
  "command": {
    "find": "orders",
    "filter": { "status": "NEW" }
  },
  "millis": 3245,
  "planSummary": "COLLSCAN" // ❌ Collection scan (no index)
}

Root Cause: Missing indexes or inefficient queries.

Solution:

1. Create missing indexes:

db.orders.createIndex({ status: 1 });

2. Verify index usage:

db.orders.find({ status: "NEW" }).explain("executionStats");

Expected (abridged):

{
  "winningPlan": {
    "stage": "IXSCAN", // ✅ Index scan
    "indexName": "status_1"
  },
  "executionStats": {
    "executionTimeMillis": 5 // Fast!
  }
}

3. Optimize query:

// Bad: Fetch all fields
await this.orderModel.find({ status: "NEW" });
// Good: Project only needed fields
await this.orderModel.find({ status: "NEW" }, "orderId service address");

Verification:

# API response time should improve (URL quoted so the shell does not expand "?")
curl -w "\nTime: %{time_total}s\n" "http://localhost:3001/orders?status=NEW"
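
Checking explain output by eye gets tedious when several queries are involved. The following sketch walks a plan tree and reports whether any stage is a collection scan. The `PlanStage` shape and both helper names are assumptions for illustration; the field names `stage`, `inputStage`, and `inputStages` follow the classic explain format, and newer server versions may nest the tree differently.

```typescript
// Abridged shape of a query-plan node as it appears in explain() output.
interface PlanStage {
  stage: string;
  inputStage?: PlanStage;
  inputStages?: PlanStage[];
}

// Collects every stage name in a plan tree, outermost first.
function collectStages(plan: PlanStage, found: string[] = []): string[] {
  found.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, found);
  for (const child of plan.inputStages ?? []) collectStages(child, found);
  return found;
}

// True when any stage in the tree reads the whole collection.
function usesCollectionScan(plan: PlanStage): boolean {
  return collectStages(plan).indexOf("COLLSCAN") !== -1;
}

// Sample plans: an indexed lookup (FETCH over IXSCAN) and an unindexed one.
const indexedPlan: PlanStage = { stage: "FETCH", inputStage: { stage: "IXSCAN" } };
const unindexedPlan: PlanStage = { stage: "COLLSCAN" };
console.log(usesCollectionScan(indexedPlan), usesCollectionScan(unindexedPlan));
```

The plan passed in would typically be the `queryPlanner.winningPlan` object from `explain("executionStats")`.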

Problem: Database connection pool exhausted

Symptoms:

MongoServerError: connection pool is full, waiting for connection

Diagnosis:

// Check current connections
db.serverStatus().connections;

Example Output:

{
  "current": 100, // Max reached!
  "available": 0
}

Root Cause:

  • maxPoolSize too low for traffic
  • Connection leaks (not closing cursors)

Solution:

1. Increase pool size:

// database.config.ts
maxPoolSize: parseInt(process.env.DB_POOL_MAX ?? '50', 10), // Increase from 20

2. Find connection leaks:

// Bad: Cursor opened but never closed
const cursor = this.orderModel.find().cursor();
// ... (never closed)
// Good: Await the query to get an array (Mongoose queries have no toArray()),
// or close the cursor explicitly when you do need one
const orders = await this.orderModel.find().exec();

3. Restart service:

docker restart algesta-orders-ms

Verification:

db.serverStatus().connections;
// current should be < maxPoolSize
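
When a cursor really is needed (for example, to stream a large result set), closing it in a `finally` block returns the connection even if processing throws. This is a minimal sketch under stated assumptions: `DocCursor` and `drainCursor` are hypothetical names, and the interface mirrors the `next()` and `close()` methods that Mongoose query cursors expose. An in-memory fake cursor keeps the example runnable without a database.

```typescript
// Minimal cursor shape; Mongoose query cursors offer next() and close() like this.
interface DocCursor<T> {
  next(): Promise<T | null>; // resolves to null when no documents are left
  close(): Promise<void>;
}

// Processes every document and always closes the cursor, even if handle() throws.
async function drainCursor<T>(
  cursor: DocCursor<T>,
  handle: (doc: T) => void | Promise<void>,
): Promise<number> {
  let count = 0;
  try {
    for (let doc = await cursor.next(); doc != null; doc = await cursor.next()) {
      await handle(doc);
      count++;
    }
  } finally {
    await cursor.close();
  }
  return count;
}

// In-memory stand-in for a database cursor, used only for demonstration.
function fakeCursor<T>(items: T[]): DocCursor<T> & { closed: boolean } {
  const queue = items.slice();
  return {
    closed: false,
    async next() {
      return queue.length > 0 ? (queue.shift() as T) : null;
    },
    async close() {
      this.closed = true;
    },
  };
}

drainCursor(fakeCursor(["order-1", "order-2"]), (id) => console.log("processing", id))
  .then((n) => console.log("documents processed:", n));
```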

Docker and Container Issues

Problem: Docker build fails with bcrypt error

Symptoms:

error /app/node_modules/bcrypt: Command failed.

Diagnosis:

docker logs <container-id>

Root Cause: The bcrypt native module was compiled for the wrong architecture.

Solution:

1. Rebuild in Dockerfile:

# In final stage
RUN npm install --force --only=production bcrypt

2. Rebuild image:

docker-compose build --no-cache orders-ms

Verification:

docker run -it algesta-orders-ms node -e "require('bcrypt').hash('test', 10, console.log)"
# Should output hash without errors

Problem: Container starts then exits immediately

Symptoms:

docker ps
# Container not listed (exited)
docker ps -a
# Shows "Exited (1) 2 seconds ago"

Diagnosis:

docker logs algesta-orders-ms

Example Output:

Error: Cannot connect to MongoDB

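Root Cause: The application crashes on startup (for example, a failed DB connection or missing env vars).

Solution:

1. Check environment variables:

# docker exec needs a running container; docker inspect also works on an exited one
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' algesta-orders-ms | grep MONGODB_URI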

2. Fix docker-compose.yml:

environment:
  - MONGODB_URI=mongodb://mongodb:27017/orders_ms # Use service name, not localhost

3. Ensure dependencies are healthy:

depends_on:
  mongodb:
    condition: service_healthy # Requires a healthcheck on the mongodb service

4. Restart:

docker-compose down
docker-compose up -d

Verification:

docker-compose ps
# All services should show "Up"
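
Even with `depends_on`, a dependency can be briefly unreachable while it starts, and one failed connection attempt is enough to make the container exit. One common mitigation is to retry the startup connection with exponential backoff. The sketch below uses a hypothetical `retryWithBackoff` helper; the attempt count and delays are illustrative defaults, not values taken from the Algesta services.

```typescript
// Hypothetical helper: retries an async operation, doubling the wait after each failure.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1); // 500, 1000, 2000, ...
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // every attempt failed; surface the last error
}

// Demonstration: an operation that fails twice before succeeding.
let calls = 0;
retryWithBackoff(async () => {
  calls++;
  if (calls < 3) throw new Error("connection refused");
  return "connected";
}, 5, 10).then((result) => console.log(result, "after", calls, "attempts"));
```

In a service, the operation would be the initial database connection, for example `() => mongoose.connect(uri)`. That wiring is an assumption about how the helper might be used, not a description of the existing code.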

Problem: Puppeteer/Chromium not found in container

Symptoms:

Error: Failed to launch Chrome!

Diagnosis:

docker exec algesta-orders-ms which chromium
# Should return path

Root Cause: Chromium dependencies are missing from the Dockerfile.

Solution:

1. Verify Dockerfile has dependencies:

RUN apt-get update && \
apt-get install -y wget ca-certificates fonts-liberation libappindicator3-1 ...

2. Set environment variable:

environment:
  - PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=false

3. Rebuild:

docker-compose build --no-cache orders-ms

Verification:

docker exec algesta-orders-ms chromium --version
# Chrome/Chromium 120.x.x

Testing Issues

Problem: Tests fail with “Jest has detected the following 1 open handle”

Symptoms:

Jest has detected the following 1 open handle potentially keeping Jest from exiting:
● TCPSERVERWRAP

Diagnosis:

npm run test:cov
# Look for open handles warning

Root Cause: A database connection or server is not closed after the tests.

Solution:

1. Add global teardown:

// test/global-teardown.ts
import * as mongoose from "mongoose";

export default async function globalTeardown() {
  await mongoose.disconnect();
}

2. Update jest.config.js:

module.exports = {
globalTeardown: "<rootDir>/test/global-teardown.ts",
};

3. Close connections in tests:

afterAll(async () => {
await app.close();
await mongoose.disconnect();
});

Verification:

npm run test
# Should exit cleanly without "open handle" warning

Problem: E2E tests fail with “Timeout waiting for selector”

Symptoms (Playwright):

TimeoutError: Waiting for selector ".orders-table" timed out after 30000ms

Diagnosis:

# Run with debug mode
pnpm test:e2e:debug

Root Cause: The element is not rendered in time, often because of network delay.

Solution:

1. Increase timeout:

await page.waitForSelector(".orders-table", { timeout: 60000 });

2. Use better wait strategy:

// Bad: Fixed timeout
await page.waitForTimeout(5000);
// Good: Wait for network idle
await page.waitForLoadState("networkidle");
await page.waitForSelector(".orders-table");

3. Check API is running:

curl http://localhost:3000/health

Verification:

pnpm test:e2e
# Tests should pass

Deployment Issues

Problem: Azure Pipeline fails at Docker push stage

Symptoms:

Error: unauthorized: authentication required

Diagnosis: Check the Azure Pipeline logs for the “Push Docker Image” step.

Root Cause: The GCP service connection is expired or misconfigured.

Solution:

1. Verify service connection:

  • Azure DevOps → Project Settings → Service Connections
  • Select “GCR_ServiceConnection”
  • Test connection

2. Update service account key:

# Generate new key in GCP
gcloud iam service-accounts keys create key.json \
--iam-account=azure-pipelines@project.iam.gserviceaccount.com
# Update KEY_GCP secret variable in Azure Pipeline

3. Retry pipeline:

# In Azure DevOps UI, click "Retry" on failed stage

Verification: The pipeline should complete successfully, with the image pushed to Artifact Registry.


Problem: Terraform apply fails with “resource already exists”

Symptoms:

Error: Error creating Service: googleapi: Error 409: Resource 'algesta-orders-ms-dev' already exists

Diagnosis:

cd infrastructure/terraform-dev
terraform state list

Root Cause: The resource exists in GCP but not in the Terraform state.

Solution:

1. Import existing resource:

terraform import google_cloud_run_service.orders_ms \
projects/PROJECT_ID/locations/us-east1/services/algesta-orders-ms-dev

2. Re-run apply:

terraform plan
terraform apply

Verification:

terraform state show google_cloud_run_service.orders_ms
# Should show resource details

Runtime and Production Issues

Problem: Service returns 502 Bad Gateway

Symptoms: Users report 502 errors when accessing the API.

Diagnosis:

# Check Cloud Run service logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=algesta-orders-ms-prod" --limit 50

Example Output:

ERROR: Unhandled rejection: MongoServerError: connection timeout

Root Cause: The service crashes on startup or is unhealthy.

Solution:

1. Check service health:

gcloud run services describe algesta-orders-ms-prod --region=us-east1
# Look for "Ready: False"

2. Verify environment variables:

gcloud run services describe algesta-orders-ms-prod --region=us-east1 --format="value(spec.template.spec.containers[0].env)"

3. Check database connectivity:

# Test from Cloud Shell
mongosh "$MONGODB_URI" --eval "db.runCommand({ ping: 1 })"

4. Rollback to previous version:

gcloud run services update-traffic algesta-orders-ms-prod \
--to-revisions=algesta-orders-ms-prod-00005-abc=100 \
--region=us-east1

Verification:

curl https://algesta-orders-ms-prod-xyz.run.app/health
# Should return 200 OK

Problem: High memory usage, OOM kills

Symptoms:

Error: Process out of memory

Diagnosis:

# Check Cloud Run metrics
gcloud run services describe algesta-orders-ms-prod --region=us-east1
# View memory usage
gcloud logging read "resource.type=cloud_run_revision AND textPayload=~'out of memory'" --limit 10

Root Cause: Memory leaks, large data processing, or insufficient memory allocation.

Solution:

1. Increase memory limit (Terraform):

resource "google_cloud_run_service" "orders_ms" {
  template {
    spec {
      containers {
        resources {
          limits = {
            memory = "1Gi" # Increase from 512Mi
          }
        }
      }
    }
  }
}

2. Profile memory:

// Add heap snapshot
import * as v8 from "v8";
v8.writeHeapSnapshot();

3. Check for leaks:

# Use clinic.js or node --inspect
node --inspect dist/main.js

Verification: Monitor Cloud Run metrics after deployment; memory usage should stabilize.
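
Whether memory usage "stabilizes" is easier to judge from a series of samples than from a single reading. The sketch below is a rough heuristic, not a leak detector: it flags a heap that grows by at least a threshold between every pair of consecutive samples. The helper names and the 1 MiB threshold are assumptions for illustration.

```typescript
// Hypothetical helper: true when every consecutive sample grows by at least minGrowthBytes.
function isSteadilyGrowing(samplesBytes: number[], minGrowthBytes = 1024 * 1024): boolean {
  if (samplesBytes.length < 3) return false; // too few samples to call it a trend
  for (let i = 1; i < samplesBytes.length; i++) {
    if (samplesBytes[i] - samplesBytes[i - 1] < minGrowthBytes) return false;
  }
  return true;
}

// Formats a byte count for log output.
function toMiB(bytes: number): string {
  return `${(bytes / (1024 * 1024)).toFixed(1)} MiB`;
}

// Fixed sample values stand in for readings taken at regular intervals.
const MIB = 1024 * 1024;
const heapSamples = [100 * MIB, 120 * MIB, 150 * MIB];
console.log(heapSamples.map(toMiB).join(" -> "), "growing:", isSteadilyGrowing(heapSamples));
```

In a running Node.js service the samples could come from `process.memoryUsage().heapUsed`, collected on a timer. A heap that keeps climbing across garbage collections is a reason to take the heap snapshot described in step 2.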


Integration and External Services

Problem: SendGrid emails not sending

Symptoms: Users report not receiving emails.

Diagnosis:

# Check logs for SendGrid errors
grep -i sendgrid logs/app.log
# Test SendGrid API key
curl -X POST https://api.sendgrid.com/v3/mail/send \
-H "Authorization: Bearer $SENDGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{"personalizations":[{"to":[{"email":"test@example.com"}]}],"from":{"email":"noreply@algesta.com"},"subject":"Test","content":[{"type":"text/plain","value":"Test"}]}'

Root Cause:

  • Invalid API key
  • Email quota exceeded
  • Sender not verified

Solution:

1. Verify API key:

# In SendGrid dashboard, check API key status

2. Check quota:

  • SendGrid free tier: 100 emails/day
  • Upgrade if needed

3. Verify sender:

Verification:

// Send test email
await this.emailService.sendEmail({
to: "test@yourmail.com",
subject: "Test",
body: "Test email",
});

Performance Issues

Problem: API response times > 5 seconds

Symptoms: The dashboard feels slow and API requests time out.

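Diagnosis:

# Profile API request
curl -w "\nTotal time: %{time_total}s\n" http://localhost:3001/orders
# Check database query times (a mongosh query, so run it through mongosh; profiling must be enabled)
mongosh "mongodb://localhost:27017/orders_ms" --eval 'db.system.profile.find({ millis: { $gt: 1000 } }).toArray()'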

Root Cause:

  • Missing indexes
  • N+1 queries
  • Large data fetches

Solution:

1. Add indexes (see Database Issues):

db.orders.createIndex({ status: 1, createdAt: -1 });

2. Optimize queries:

// Bad: N+1 query
for (const order of orders) {
const provider = await this.providerModel.findOne({
providerId: order.providerId,
});
}
// Good: Batch fetch
const providerIds = orders.map((o) => o.providerId);
const providers = await this.providerModel.find({
providerId: { $in: providerIds },
});

3. Add caching (Redis):

const cached = await this.redis.get(`order:${orderId}`);
if (cached) return JSON.parse(cached);
const order = await this.orderModel.findOne({ orderId });
await this.redis.setex(`order:${orderId}`, 3600, JSON.stringify(order));
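
The caching snippet above also stores the result when no order is found, so a missing order would be served as `null` from the cache for an hour. A variant of the same cache-aside pattern that skips caching misses is sketched below. `KeyValueStore`, `MemoryStore`, and `getOrLoad` are hypothetical names; the interface is shaped to match the `get` and `setex` calls used above, and the in-memory store exists only so the sketch runs without Redis.

```typescript
// Minimal async key-value interface, shaped like the get/setex calls used above.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  setex(key: string, ttlSeconds: number, value: string): Promise<unknown>;
}

// In-memory stand-in so the sketch runs without a Redis server.
class MemoryStore implements KeyValueStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: behave as if the key was never set
      return null;
    }
    return entry.value;
  }

  async setex(key: string, ttlSeconds: number, value: string): Promise<unknown> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return "OK";
  }
}

// Cache-aside: return the cached value if present, otherwise load and cache it.
async function getOrLoad<T>(
  store: KeyValueStore,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T | null>,
): Promise<T | null> {
  const cached = await store.get(key);
  if (cached !== null) return JSON.parse(cached) as T;
  const fresh = await load();
  if (fresh !== null) {
    await store.setex(key, ttlSeconds, JSON.stringify(fresh)); // misses are not cached
  }
  return fresh;
}

// Demonstration with a fake loader in place of the database query.
const demoStore = new MemoryStore();
getOrLoad(demoStore, "order:A1", 3600, async () => ({ orderId: "A1", status: "NEW" }))
  .then((order) => console.log("loaded", order));
```

Cached entries should also be invalidated or overwritten when an order changes; otherwise readers can see stale data for up to the TTL.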

Verification:

curl -w "\nTotal time: %{time_total}s\n" http://localhost:3001/orders
# Time should be < 1s
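
The batch fetch in step 2 returns providers as a flat list, so each order still has to be matched with its provider. Indexing the providers by ID first keeps that join linear instead of scanning the list once per order. This is a minimal sketch: the `Order` and `Provider` shapes, including the `name` field, and the `attachProviders` helper are assumptions for illustration.

```typescript
// Illustrative document shapes; only orderId and providerId appear in this guide.
interface Order {
  orderId: string;
  providerId: string;
}

interface Provider {
  providerId: string;
  name: string; // hypothetical field
}

// Joins each order with its provider using a lookup table built once.
function attachProviders(
  orders: Order[],
  providers: Provider[],
): Array<Order & { provider: Provider | null }> {
  const byId = new Map<string, Provider>();
  for (const provider of providers) byId.set(provider.providerId, provider);
  return orders.map((order) => ({
    ...order,
    provider: byId.get(order.providerId) ?? null, // null when the provider is missing
  }));
}

const joined = attachProviders(
  [
    { orderId: "O1", providerId: "P1" },
    { orderId: "O2", providerId: "P9" },
  ],
  [{ providerId: "P1", name: "Example Provider" }],
);
console.log(joined);
```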

For Support:

  • Review service logs: docker logs <container> or Cloud Run logs
  • Check Azure Pipeline logs for build/deploy failures
  • Contact team lead for production access and escalation