Infrastructure as Code

Table of Contents

  1. Purpose
  2. Who Is This For?
  3. Infrastructure Overview
  4. Terraform Structure
  5. Modules Reference
  6. Environment Configuration
  7. Terraform State Management
  8. Common Operations
  9. Best Practices
  10. Architecture Diagram

Purpose

This document describes the Infrastructure as Code (IaC) implementation for the Algesta platform using Terraform. All Azure infrastructure is provisioned and managed through Terraform modules, including AKS clusters, Azure Container Registry, Key Vault, and Storage Accounts. The MongoDB Atlas databases are managed through Terraform as well.

By following this guide, you will understand:

  • The modular Terraform structure used across environments
  • How to provision and update Azure infrastructure
  • State management and backend configuration
  • Environment-specific configurations and how they differ
  • Common operations and troubleshooting

Who Is This For?

This guide is for DevOps engineers managing infrastructure, platform engineers provisioning resources, and SREs troubleshooting infrastructure issues. It assumes familiarity with Terraform, Azure, Kubernetes, and general infrastructure concepts.


Infrastructure Overview

The Algesta platform infrastructure consists of:

| Component | Technology | Purpose | Managed By |
| --- | --- | --- | --- |
| Kubernetes Cluster | Azure AKS | Container orchestration | Terraform |
| Container Registry | Azure ACR | Docker image storage | Terraform |
| Secrets Management | Azure Key Vault | Credentials and secrets | Terraform |
| Object Storage | Azure Storage Account | Terraform state, application data | Terraform |
| Database | MongoDB Atlas | Primary database (M0 free tier) | Terraform |
| Load Balancing | AKS Ingress (nginx) | External traffic routing | Kubernetes manifests |
| TLS Certificates | cert-manager + Let’s Encrypt | HTTPS encryption | Kubernetes manifests |
| Monitoring | Grafana + Prometheus + Loki | Observability stack | Helm charts |

Regions:

  • Azure Primary Region: East US (configurable)
  • MongoDB Atlas Region: US-EAST-1 (AWS)

Terraform Structure

All infrastructure code is located in the ops-algesta/infrastructure/ directory:

ops-algesta/infrastructure/
├── envs/
│   ├── dev/
│   │   └── Azure/
│   │       ├── main.tf        # Dev environment resources
│   │       ├── variables.tf   # Dev-specific variables
│   │       ├── provider.tf    # Azure provider config
│   │       └── outputs.tf     # Resource outputs
│   └── production/
│       └── Azure/
│           ├── main.tf        # Production environment resources
│           ├── variables.tf   # Production-specific variables
│           ├── provider.tf    # Azure provider config
│           └── outputs.tf     # Resource outputs
└── modules/
    ├── aks/                   # Azure Kubernetes Service
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── acr/                   # Azure Container Registry
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── secrets/               # Azure Key Vault
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── storage/               # Azure Storage Account
    │   ├── main.tf
    │   └── variables.tf
    ├── database/
    │   └── MongoDB/           # MongoDB Atlas
    │       ├── main.tf
    │       ├── variables.tf
    │       └── outputs.tf
    └── events/                # Azure Event Grid (optional)
        ├── main.tf
        ├── variables.tf
        └── outputs.tf

Design Principles:

  • Environment Isolation: Each environment (dev, production) has separate Terraform state and resources
  • Module Reusability: Common infrastructure patterns abstracted into reusable modules
  • Configuration Management: Environment-specific values live in variables.tf, not hardcoded in resources
  • State Separation: Each environment maintains independent state to prevent cross-environment drift

Modules Reference

1. AKS Module (modules/aks/)

Purpose: Provisions an Azure Kubernetes Service cluster with configurable node pools

Key Resources:

  • azurerm_kubernetes_cluster: Main AKS cluster
  • azurerm_kubernetes_cluster_node_pool: Additional node pools (user workloads)

Configuration Example (from envs/production/Azure/main.tf:20-71):

module "aks" {
  source              = "../../../modules/aks"
  cluster_name        = "aks-${var.project_name}-${var.environment}"
  location            = var.location
  resource_group_name = var.rg_name
  dns_prefix          = "${var.project_name}-${var.environment}"
  sku_tier            = "Free"

  tags = {
    environment = var.environment
  }

  # System pool: hosts AKS system services, pinned to a single node
  default_node_pool = {
    name                = "default"
    vm_size             = "Standard_B2s"
    node_count          = 1
    os_disk_size_gb     = 30
    os_type             = "Linux"
    mode                = "System"
    enable_auto_scaling = true
    max_count           = 1
    min_count           = 1
  }

  # User pool: runs application workloads, scales between 1 and 3 nodes
  node_pools = {
    stdar_pool = {
      name                = "stdar${var.environment}"
      vm_size             = "Standard_B2s"
      node_count          = 1
      mode                = "User"
      os_disk_size_gb     = 30
      os_type             = "Linux"
      enable_auto_scaling = true
      min_count           = 1
      max_count           = 3
    }
  }
}

Key Features:

  • System Node Pool: Required for AKS system services (CoreDNS, metrics-server)
  • User Node Pool: Runs application workloads, supports auto-scaling (1-3 nodes)
  • Web App Routing: Enabled for ingress controller integration
  • Managed Identity: System-assigned identity for Azure resource access

Outputs:

  • aks_id: Cluster resource ID
  • kube_config: Kubernetes admin config (sensitive)
  • kubelet_identity_object_id: Identity for ACR pull permissions
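
The outputs listed above could be declared roughly as follows. This is a minimal sketch, assuming the cluster resource inside the module is named `azurerm_kubernetes_cluster.aks` (the address used in the state commands in this document). The actual `modules/aks/outputs.tf` may differ.

```hcl
# modules/aks/outputs.tf (illustrative sketch, not the actual file)

output "aks_id" {
  description = "Resource ID of the AKS cluster"
  value       = azurerm_kubernetes_cluster.aks.id
}

output "kube_config" {
  description = "Raw admin kubeconfig for the cluster"
  value       = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive   = true # keeps the credentials out of plan/apply output
}

output "kubelet_identity_object_id" {
  description = "Object ID of the kubelet identity, used for the AcrPull role assignment"
  value       = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}
```

Marking `kube_config` as sensitive only hides it from CLI output; the value is still stored in the state file, which is one reason access to the state container should be restricted.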

2. ACR Module (modules/acr/)

Purpose: Provisions an Azure Container Registry for Docker images

Configuration Example (from envs/production/Azure/main.tf:73-84):

module "acr" {
  source              = "../../../modules/acr"
  acr_name            = "acr${var.project_name}${var.environment}"
  resource_group_name = var.rg_name
  location            = var.location
  sku                 = "Basic"
  admin_enabled       = true

  tags = {
    environment = var.environment
  }
}

SKU Comparison:

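| Feature | Basic | Standard | Premium |
| --- | --- | --- | --- |
| Storage | 10 GB | 100 GB | 500 GB |
| Webhooks | 2 | 10 | 500 |
| Geo-replication | No | No | Yes |
| Content Trust | No | No | Yes |
| Cost | $ | $$ | $$$ |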

Current Setup: Basic tier (dev/prod), upgradeable as needed

ACR Integration with AKS:

resource "azurerm_role_assignment" "aks_to_acr" {
  scope                = module.acr.acr_id
  role_definition_name = "AcrPull"
  principal_id         = module.aks.kubelet_identity_object_id
}

This grants AKS nodes permission to pull images from ACR without manual credentials.


3. Key Vault Module (modules/secrets/)

Purpose: Manages secrets, keys, and certificates for the platform

Configuration Example (from envs/production/Azure/main.tf:86-94):

module "keyvault" {
  source              = "../../../modules/secrets"
  name                = "akv-${var.project_name}-${var.environment}"
  location            = var.location
  resource_group_name = var.rg_name
  tenant_id           = var.tenant_id
  sku_name            = "standard"
}

Key Features (from modules/secrets/main.tf):

  • RBAC Authorization: enable_rbac_authorization = true
  • Soft Delete: 7-day retention for deleted secrets
  • Disk Encryption: Enabled for VM disk encryption keys
  • Secret Management: Supports bulk secret creation via for_each

Typical Secrets Stored:

  • MongoDB connection strings (MONGODB_URI)
  • JWT signing keys (JWT_SECRET)
  • API keys for external services
  • TLS certificates (if not using cert-manager)

Access Control:

  • Use Azure RBAC roles: Key Vault Secrets Officer, Key Vault Secrets User
  • Service principals for CI/CD pipelines
  • Managed identities for AKS pods (via Azure Key Vault Provider for Secrets Store CSI Driver)
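
As an illustration of the RBAC approach above, a role assignment could grant read-only secret access to a cluster identity. This is a sketch under assumptions: `key_vault_id` is a hypothetical output of the secrets module, and the kubelet identity is used only because the AKS module already exposes it. The identity the CSI driver actually uses depends on how the driver is configured.

```hcl
# Illustrative sketch: let the AKS kubelet identity read secrets from the vault.
# `key_vault_id` is a hypothetical module output; adjust to the real output name.
resource "azurerm_role_assignment" "aks_to_keyvault" {
  scope                = module.keyvault.key_vault_id
  role_definition_name = "Key Vault Secrets User" # read-only access to secret values
  principal_id         = module.aks.kubelet_identity_object_id
}
```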

4. Storage Module (modules/storage/)

Purpose: Provisions an Azure Storage Account for Terraform state and application data

Configuration Example (from envs/production/Azure/main.tf:1-17):

module "storage" {
  source               = "../../../modules/storage"
  name_storage_account = "storage${var.project_name}${var.environment}"
  resource_group_name  = var.rg_name
  location             = var.location

  containers = [
    {
      container_name        = "tf-state"
      container_access_type = "private"
    },
    {
      container_name        = "data"
      container_access_type = "private"
    }
  ]
}

Containers:

  • tf-state: Stores Terraform state files (backend)
  • data: Application file uploads, logs, backups

Storage Configuration (from modules/storage/main.tf):

  • Account Tier: Standard (default, configurable to Premium)
  • Replication: LRS (Locally Redundant Storage) by default, upgradeable to GRS for production
  • CORS: Configurable for web application access
  • Access Tier: Hot (for frequently accessed data)
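
The settings above could be expressed in the module roughly as follows. This is a minimal sketch, assuming the azurerm 3.x provider. The input names mirror the module call shown earlier, but the resource labels and the `for_each` approach are assumptions rather than the actual module code.

```hcl
# modules/storage/main.tf (illustrative sketch; resource names are assumptions)

variable "name_storage_account" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }

variable "containers" {
  type = list(object({
    container_name        = string
    container_access_type = string
  }))
  default = []
}

resource "azurerm_storage_account" "this" {
  name                     = var.name_storage_account
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS" # switch to "GRS" for geo-redundancy
  access_tier              = "Hot"
}

# Key each container by name so adding or removing one does not shift the others.
resource "azurerm_storage_container" "this" {
  for_each = { for c in var.containers : c.container_name => c }

  name                  = each.value.container_name
  storage_account_name  = azurerm_storage_account.this.name
  container_access_type = each.value.container_access_type
}
```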

5. MongoDB Atlas Module (modules/database/MongoDB/)

Purpose: Provisions a MongoDB Atlas cluster (M0 free tier)

Configuration Example:

module "mongodb" {
  source                    = "../../../modules/database/MongoDB"
  project_id                = var.mongodb_project_id
  cluster_name_mongodbatlas = var.project_name
  current_environment       = var.environment
  cluster_type              = "REPLICASET"

  replication_specs = [
    {
      num_shards = 1
      regions_config = [
        {
          region_name     = "US_EAST_1"
          electable_nodes = 3
          priority        = 7
          read_only_nodes = 0
        }
      ]
    }
  ]

  cloud_backup                 = true
  auto_scaling_disk_gb_enabled = true
  mongo_db_major_version       = "7.0"
  provider_name                = "TENANT" # M0 free tier
  provider_instance_size_name  = "M0"

  admin_username = var.mongodb_admin_username
  admin_password = var.mongodb_admin_password

  cidr_block_mongodbatlas_project_ip_access_list = "0.0.0.0/0" # Restrict in production
  comment_mongodbatlas_project_ip_access_list    = "Allow all (for dev)"
}

Key Features (from modules/database/MongoDB/main.tf):

  • Cluster Type: ReplicaSet (3 nodes for high availability)
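  • Cloud Backup: cloud_backup = true requests automated snapshots; confirm in the Atlas documentation whether backups and point-in-time restore are available on the M0 tier, since these features are generally tied to dedicated tiers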
  • IP Whitelist: mongodbatlas_project_ip_access_list (currently open, should be restricted)
  • Database User: Admin user with readWrite and atlasAdmin roles

Production Recommendations:

  1. Upgrade to M10+ for dedicated resources and advanced features
  2. Restrict IP whitelist to AKS cluster egress IPs
  3. Enable VPC peering for private connectivity
  4. Configure backup schedules and retention policies
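
Recommendation 2 could be applied through the existing module inputs. The following is a sketch under assumptions: `aks_egress_cidr` is a hypothetical variable introduced here for illustration, and the cluster's outbound IP has to be looked up separately.

```hcl
# Illustrative sketch: restrict Atlas access to the cluster's outbound IP.
# `aks_egress_cidr` is a hypothetical variable, not part of the existing configuration.
variable "aks_egress_cidr" {
  description = "CIDR of the AKS outbound public IP, for example 203.0.113.10/32"
  type        = string

  validation {
    # cidrhost() fails on malformed input, so can() turns that into false
    condition     = can(cidrhost(var.aks_egress_cidr, 0)) && var.aks_egress_cidr != "0.0.0.0/0"
    error_message = "Provide a valid CIDR block other than 0.0.0.0/0."
  }
}

module "mongodb" {
  source = "../../../modules/database/MongoDB"
  # ... other arguments unchanged ...

  cidr_block_mongodbatlas_project_ip_access_list = var.aks_egress_cidr
  comment_mongodbatlas_project_ip_access_list    = "AKS egress only"
}
```

The validation block rejects the open `0.0.0.0/0` range at plan time, so the permissive development setting cannot be carried into production by accident.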

Environment Configuration

Development Environment (envs/dev/Azure/)

Characteristics:

  • Purpose: Testing, development, CI/CD integration
  • AKS: 1 system node, 1-3 user nodes (auto-scaling)
  • ACR: Basic tier
  • MongoDB: M0 free tier, 0.0.0.0/0 IP whitelist
  • Cost: Minimal (~$50-100/month)

Namespace Conventions:

  • development: Application deployments
  • monitoring: Grafana, Prometheus, Loki

Production Environment (envs/production/Azure/)

Characteristics:

  • Purpose: Live customer traffic
  • AKS: 1 system node, 1-3 user nodes (auto-scaling)
  • ACR: Basic tier (consider Standard for webhooks)
  • MongoDB: M0 free tier (upgrade to M10+ recommended)
  • Cost: ~$100-200/month (depends on usage)

Namespace Conventions:

  • production: Application deployments
  • monitoring: Observability stack

Production-Specific Settings:

  • TLS certificates from Let’s Encrypt (via cert-manager)
  • Ingress hosts: algesta-api-prod.3astronautas.com
  • Auto-scaling enabled for resilience

Environment Variables

Each environment requires these variables (in variables.tf):

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "algesta"
}

variable "environment" {
  description = "Environment (dev, production)"
  type        = string
}

variable "location" {
  description = "Azure region"
  type        = string
  default     = "East US"
}

variable "rg_name" {
  description = "Resource group name"
  type        = string
}

variable "tenant_id" {
  description = "Azure AD tenant ID"
  type        = string
}

variable "mongodb_project_id" {
  description = "MongoDB Atlas project ID"
  type        = string
  sensitive   = true
}

variable "mongodb_admin_username" {
  description = "MongoDB admin username"
  type        = string
  sensitive   = true
}

variable "mongodb_admin_password" {
  description = "MongoDB admin password"
  type        = string
  sensitive   = true
}
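
Because `environment` feeds into resource names and has no default, a typo would silently produce a differently named set of resources. One way to guard against that is a validation block. This is a sketch of an optional hardening step, not something the current configuration is known to include.

```hcl
# Illustrative sketch: reject environment names other than the two documented ones.
variable "environment" {
  description = "Environment (dev, production)"
  type        = string

  validation {
    condition     = contains(["dev", "production"], var.environment)
    error_message = "The environment must be \"dev\" or \"production\"."
  }
}
```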

Setting Variables:

  1. Create terraform.tfvars (git-ignored):

    environment = "production"
    rg_name = "rg-algesta-production"
    tenant_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    mongodb_project_id = "xxxxxxxxxxxxxxxxxxxxxxxx"
    mongodb_admin_username = "admin"
    mongodb_admin_password = "SecurePassword123!"
  2. Or set via environment variables:

    export TF_VAR_mongodb_project_id="xxx"
    export TF_VAR_mongodb_admin_password="xxx"

Terraform State Management

Backend Configuration

Terraform state is stored in Azure Storage Account (created by the storage module).

Backend Config (in provider.tf):

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-algesta-production"
    storage_account_name = "storagealgestaproduction"
    container_name       = "tf-state"
    key                  = "production.tfstate"
  }
}

State File Naming:

  • Dev: dev.tfstate
  • Production: production.tfstate
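
For comparison, the dev backend would differ mainly in the state key. The sketch below assumes the dev resource group and storage account follow the same naming pattern as production. Those two names are guesses for illustration and should be checked against the actual dev environment.

```hcl
# envs/dev/Azure/provider.tf (illustrative sketch; resource group and account names are assumed)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-algesta-dev"
    storage_account_name = "storagealgestadev"
    container_name       = "tf-state"
    key                  = "dev.tfstate" # separate key keeps dev state independent
  }
}
```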

State Operations

View Current State:

terraform state list

Inspect Resource:

terraform state show module.aks.azurerm_kubernetes_cluster.aks

Remove Resource from State (dangerous):

terraform state rm module.aks.azurerm_kubernetes_cluster.aks

Import Existing Resource:

terraform import module.aks.azurerm_kubernetes_cluster.aks \
/subscriptions/xxx/resourceGroups/rg-algesta-production/providers/Microsoft.ContainerService/managedClusters/aks-algesta-production
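
On Terraform 1.5 or later, the same import can also be declared in configuration, which lets it go through the normal plan and review cycle. This is an alternative sketch, assuming the Terraform version in use supports `import` blocks. The resource ID is the same placeholder as in the command above.

```hcl
# Illustrative sketch: declarative import, placed in the environment's root module.
import {
  to = module.aks.azurerm_kubernetes_cluster.aks
  id = "/subscriptions/xxx/resourceGroups/rg-algesta-production/providers/Microsoft.ContainerService/managedClusters/aks-algesta-production"
}
```

Running `terraform plan` then shows the pending import alongside any other changes, and the block can be removed once the apply has succeeded.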

State Locking

Azure Storage backend supports state locking automatically (via blob lease mechanism). If a lock is stuck:

terraform force-unlock LOCK_ID

Best Practices:

  • Never edit state files manually
  • Use terraform state mv for refactoring
  • Enable versioning on storage account (Azure Blob versioning)
  • Restrict access to state storage (RBAC + private Endpoints)
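
Blob versioning from the list above can be switched on in the storage account resource itself. This is a minimal sketch, assuming the azurerm 3.x provider. The resource label and variable names are illustrative and may not match the actual storage module.

```hcl
# Illustrative sketch: storage account with blob versioning for state recovery.
variable "name_storage_account" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }

resource "azurerm_storage_account" "state" {
  name                     = var.name_storage_account
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  blob_properties {
    versioning_enabled = true # keeps prior versions of each state blob
  }
}
```

With versioning on, an earlier version of the state blob can be restored from the Azure Portal or CLI if a state file is corrupted or overwritten.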

Common Operations

Initial Infrastructure Provisioning

1. Clone Repository:

git clone https://dev.azure.com/tres-astronautas/Algesta/_git/ops-algesta
cd ops-algesta/infrastructure/envs/production/Azure

2. Authenticate to Azure:

az login
az account set --subscription "Algesta Production"

3. Authenticate to MongoDB Atlas:

export MONGODB_ATLAS_PUBLIC_KEY="xxx"
export MONGODB_ATLAS_PRIVATE_KEY="xxx"

4. Initialize Terraform:

terraform init

5. Plan Changes:

terraform plan -out=tfplan

6. Apply Infrastructure:

terraform apply tfplan

7. Save Outputs:

terraform output -json > outputs.json

Updating Infrastructure

Scenario: Increase AKS node pool max_count from 3 to 5

1. Edit Configuration:

# envs/production/Azure/main.tf
node_pools = {
  stdar_pool = {
    # ...
    max_count = 5 # Changed from 3
  }
}

2. Plan and Review:

terraform plan
# Review changes, ensure only node pool updated

3. Apply Changes:

terraform apply

4. Verify in AKS:

az aks show --resource-group rg-algesta-production --name aks-algesta-production \
  --query "agentPoolProfiles[].maxCount"

Adding New Secrets to Key Vault

1. Update Module Call:

module "keyvault" {
  source = "../../../modules/secrets"
  # ...
  secrets = {
    "mongodb-uri" = var.mongodb_uri
    "jwt-secret"  = var.jwt_secret
    "new-api-key" = var.new_api_key # New secret
  }
}

2. Add Variable:

# variables.tf
variable "new_api_key" {
  description = "API key for external service"
  type        = string
  sensitive   = true
}

3. Set Value in terraform.tfvars:

new_api_key = "sk_live_xxxxxxxxxxxx"

4. Apply:

terraform apply

Destroying Infrastructure (USE WITH CAUTION)

Development Environment Only:

cd envs/dev/Azure
terraform plan -destroy
terraform destroy # Requires confirmation

Production: Never run terraform destroy on production without explicit approval and backup verification.
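
As a technical backstop to this rule, Terraform's `prevent_destroy` lifecycle setting makes any plan that would delete a resource fail. The sketch below applies it to a container registry. The resource label and variables are assumptions for illustration. Because `lifecycle` arguments must be literal values, the setting cannot be toggled per environment through a variable, so placing it in a shared module would protect dev as well.

```hcl
# Illustrative sketch: a registry that Terraform refuses to destroy.
variable "acr_name" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "sku" { type = string }

resource "azurerm_container_registry" "this" {
  name                = var.acr_name
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.sku

  lifecycle {
    prevent_destroy = true # terraform destroy (or a replacing change) errors out
  }
}
```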


Best Practices

Security

  1. Never Commit Secrets:

    • Use .gitignore for terraform.tfvars, *.tfstate
    • Store sensitive variables in Azure Key Vault or environment variables
  2. Use Managed Identities:

    • Avoid service principal credentials in code
    • Use AKS managed identity for Azure resource access
  3. Restrict Network Access:

    • Configure MongoDB IP whitelist to AKS egress IPs only
    • Use Azure Private Link for Key Vault, ACR, Storage
  4. Enable RBAC:

    • Use Azure RBAC for resource access control
    • Kubernetes RBAC for pod-level permissions

Operations

  1. Always Plan Before Apply:

    terraform plan -out=tfplan
    # Review changes thoroughly
    terraform apply tfplan
  2. Use Workspaces for Isolation:

    terraform workspace new staging
    terraform workspace select production
  3. Version Pin Providers:

    terraform {
      required_providers {
        azurerm = {
          source  = "hashicorp/azurerm"
          version = "~> 3.0"
        }
        mongodbatlas = {
          source  = "mongodb/mongodbatlas"
          version = "~> 1.0"
        }
      }
    }
  4. Document Changes:

    • Add comments in main.tf for complex logic
    • Update this wiki when infrastructure evolves

CI/CD Integration

  • Terraform apply is planned to run in the Azure Pipeline DeployToK8S stage (not yet implemented)
  • Use pipeline secret variables for credentials
  • Implement approval gates for production Terraform changes

Architecture Diagram

graph TB
    subgraph "Azure Cloud"
        subgraph "Resource Group: rg-algesta-production"
            AKS[AKS Cluster<br/>aks-algesta-production<br/>Free Tier]
            ACR[Azure Container Registry<br/>acralgestaproduction<br/>Basic SKU]
            KV[Key Vault<br/>akv-algesta-production<br/>Secrets + Certificates]
            SA[Storage Account<br/>storagealgestaproduction<br/>LRS]

            subgraph "AKS Node Pools"
                SYS[System Pool<br/>default<br/>1 node<br/>Standard_B2s]
                USER[User Pool<br/>stdarproduction<br/>1-3 nodes<br/>Auto-scaling]
            end

            subgraph "Storage Containers"
                TFS[tf-state<br/>Terraform State]
                DATA[data<br/>Application Data]
            end
        end
    end

    subgraph "MongoDB Atlas"
        MONGO[MongoDB Cluster<br/>M0 Free Tier<br/>US-EAST-1<br/>3-node ReplicaSet]
    end

    subgraph "Kubernetes Workloads"
        NS_PROD[Namespace: production<br/>Microservices]
        NS_MON[Namespace: monitoring<br/>Grafana + Prometheus + Loki]
        INGRESS[Ingress<br/>nginx + cert-manager<br/>TLS via Let's Encrypt]
    end

    AKS --> SYS
    AKS --> USER
    USER --> NS_PROD
    USER --> NS_MON
    USER --> INGRESS

    ACR -->|Image Pull| USER
    KV -->|Secrets| USER
    SA --> TFS
    SA --> DATA

    NS_PROD -->|Database Queries| MONGO

    INGRESS -->|HTTPS Traffic| NS_PROD

    style AKS fill:#0078d4,stroke:#004578,color:#fff
    style ACR fill:#0078d4,stroke:#004578,color:#fff
    style KV fill:#ffb900,stroke:#c29400,color:#000
    style SA fill:#0078d4,stroke:#004578,color:#fff
    style MONGO fill:#13aa52,stroke:#0e7a3a,color:#fff

Key Resource Relationships:

  1. AKS → ACR: Kubelet identity granted AcrPull role for image pulls
  2. AKS → Key Vault: Pods access secrets via CSI driver (future implementation)
  3. Terraform State → Storage Account: Backend stores state in tf-state container
  4. Microservices → MongoDB: Connection string stored in Key Vault, accessed via environment variables

For Support:

  • Check Terraform plan output for errors
  • Review Azure Portal for resource status
  • Verify MongoDB Atlas cluster health in Atlas console
  • Contact infrastructure team for access and credentials