AWS Cloud Engineering System - Complete Summary

Version: 1.0
Date: October 2025
Status: Phase 5 Complete ✅


Overview

A complete AWS cloud engineering orchestration system for RooCode, featuring 1 orchestrator and 9 specialized agents that cover serverless, databases, AI/ML services, and DevOps operations.

File: aws-complete.yaml (290KB)


What's Included

1 Orchestrator

☁️ AWS Cloud Orchestrator (aws-cloud-orchestrator-export.yaml)

  • Coordinates Lambda, DynamoDB, S3, API Gateway, Step Functions, EventBridge workflows
  • Launches specialists based on problem category (troubleshooting, architecture, DevOps, cost)
  • Aggregates results from multiple specialists
  • Provides complete solutions with IaC, validation, and cost analysis

Triggers specialists for:

  • Troubleshooting → Lambda, Database, Integration, AI Services troubleshooters
  • Architecture → Serverless, Contact Center, AI Solutions architects
  • DevOps → Cloud Engineer (IaC/CI/CD)
  • Cost → Cost Optimizer
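
The routing above can be pictured as a simple lookup from problem category to candidate specialists. The following is only a conceptual sketch: the orchestrator is a RooCode mode driven by its prompt, not by this code, and the `Category` enum, the slug-style names, and `candidate_specialists` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Category(Enum):
    TROUBLESHOOTING = "troubleshooting"
    ARCHITECTURE = "architecture"
    DEVOPS = "devops"
    COST = "cost"


# Illustrative dispatch table. The groupings mirror the list above;
# the slug spellings are assumptions, not the real mode identifiers.
SPECIALISTS = {
    Category.TROUBLESHOOTING: [
        "lambda-troubleshooter",
        "database-troubleshooter",
        "integration-troubleshooter",
        "ai-services-troubleshooter",
    ],
    Category.ARCHITECTURE: [
        "serverless-architect",
        "contact-center-architect",
        "ai-solutions-architect",
    ],
    Category.DEVOPS: ["cloud-engineer"],
    Category.COST: ["cost-optimizer"],
}


def candidate_specialists(category: Category) -> list[str]:
    """Return the specialists the orchestrator could launch for a category."""
    return list(SPECIALISTS[category])
```

A single request often spans categories (for example, a troubleshooting finding followed by an IaC fix), so in practice the orchestrator would combine candidates from more than one entry.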

9 Specialists

Troubleshooting (4 Specialists)

1. λ Lambda Troubleshooter

File: lambda-troubleshooter-export.yaml

Diagnoses:

  • Timeout issues (VPC connectivity, slow downstream services, inefficient code)
  • Out of memory errors (large files, memory leaks, insufficient allocation)
  • Permission errors (IAM role troubleshooting)
  • VPC connectivity (VPC endpoints, NAT Gateway requirements)
  • Cold start optimization (reduce init time, provisioned concurrency)

Provides:

  • CloudWatch Log analysis
  • Configuration fixes (timeout, memory, VPC)
  • IAM policy additions
  • Code optimizations
  • Cost impact analysis

2. 🗄️ Database Troubleshooter

File: database-troubleshooter-export.yaml

Diagnoses:

  • Throttling (hot partitions, insufficient capacity, GSI throttling)
  • Slow queries (Scan vs Query, missing GSI, large result sets)
  • Capacity planning (on-demand vs provisioned analysis)
  • Schema design issues (access pattern mismatches)
  • Cost optimization (right-sizing, TTL, storage optimization)

Provides:

  • Capacity mode recommendations (on-demand vs provisioned)
  • GSI creation for access patterns
  • Schema redesign for single-table patterns
  • Auto-scaling configuration
  • Cost calculations and savings estimates
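
As a rough illustration of the on-demand vs provisioned comparison, the sketch below computes a monthly cost for each capacity mode. All prices are passed in as arguments because they vary by Region and change over time; the function names are hypothetical, and the model ignores storage, free tier, auto-scaling behavior, and reserved capacity.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly cost estimates


def on_demand_monthly_cost(read_units, write_units,
                           price_per_million_reads,
                           price_per_million_writes):
    """Cost when paying per request unit consumed during the month."""
    return (read_units / 1e6 * price_per_million_reads
            + write_units / 1e6 * price_per_million_writes)


def provisioned_monthly_cost(rcu, wcu, price_per_rcu_hour, price_per_wcu_hour):
    """Cost when paying per provisioned capacity unit per hour."""
    return (rcu * price_per_rcu_hour + wcu * price_per_wcu_hour) * HOURS_PER_MONTH


def cheaper_mode(on_demand_cost, provisioned_cost):
    """Name the cheaper capacity mode for the given monthly estimates."""
    return "on-demand" if on_demand_cost <= provisioned_cost else "provisioned"
```

The comparison tends to favor provisioned capacity for steady, predictable traffic and on-demand for spiky or low-volume traffic, but that depends on how closely provisioned capacity tracks actual usage.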

3. 🔗 Integration Troubleshooter

File: integration-troubleshooter-export.yaml

Diagnoses:

  • API Gateway errors (502, 403, 429, 504)
  • Step Functions failures (state errors, timeouts, permission issues)
  • EventBridge issues (events not triggering, pattern matching)
  • SNS/SQS problems (message delivery, permissions, DLQ)
  • Cross-service permission errors

Provides:

  • Request flow tracing
  • IAM permission fixes
  • API Gateway configuration
  • EventBridge pattern debugging
  • Step Functions error handling patterns
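
When an EventBridge rule does not fire, it helps to find which field of the pattern fails to match. The sketch below is a deliberately simplified local matcher, not the EventBridge implementation: it handles only exact-value matching (nested objects with lists of allowed values) and ignores operators such as prefix, numeric, and exists filters. The function name and the sample event shape are hypothetical.

```python
def first_mismatch(pattern, event, path=""):
    """Return the dotted path of the first pattern field the event fails,
    or None if the event satisfies every field in the pattern."""
    for key, expected in pattern.items():
        here = f"{path}.{key}" if path else key
        if key not in event:
            return here
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event must contain a matching nested object.
            if not isinstance(actual, dict):
                return here
            nested = first_mismatch(expected, actual, here)
            if nested is not None:
                return nested
        else:
            # Leaf: a list of allowed values; an array in the event
            # matches if any of its elements is allowed.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return here
    return None
```

For instance, with the pattern `{"source": ["demo.orders"], "detail": {"status": ["SHIPPED"]}}`, an event whose `detail.status` is `"PENDING"` yields `"detail.status"`, pointing directly at the field to fix.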

4. 🤖 AI Services Troubleshooter

File: ai-services-troubleshooter-export.yaml

Diagnoses:

  • Amazon Bedrock errors (access denied, throttling, validation)
  • Amazon Lex issues (intent not recognized, slot capture failures, fulfillment errors)
  • Prompt engineering problems (hallucinations, poor responses)
  • Quota management (tokens per minute, requests per second)
  • Performance optimization (latency, streaming, caching)

Provides:

  • Model access enablement steps
  • Bedrock retry logic with exponential backoff
  • Lex bot configuration fixes
  • Prompt engineering improvements (RAG patterns)
  • Quota increase requests and rate limiting
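
The retry logic mentioned above might look roughly like the following sketch of exponential backoff with full jitter. It is SDK-agnostic on purpose: `ThrottledError` is a hypothetical stand-in for whatever throttling error the real client raises, and `call_with_backoff` and its parameters are illustrative names, not part of any AWS library.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from a service client."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ThrottledError with exponential backoff.

    The wait before retry n is a random fraction of
    min(max_delay, base_delay * 2**n), which spreads retries out so that
    many throttled callers do not retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting; production callers would leave them at their defaults.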

Architecture (3 Specialists)

5. ⚡ Serverless Architect

File: serverless-architect-export.yaml

Designs:

  • REST API architectures (Lambda + API Gateway + DynamoDB)
  • Event-driven architectures (EventBridge, SNS/SQS)
  • Step Functions workflows (multi-step processes with retry/error handling)
  • Data processing pipelines (S3 + Lambda)
  • Microservices with loose coupling

Delivers:

  • Complete architecture diagrams
  • DynamoDB single-table schema designs
  • Lambda function specifications (memory, timeout, IAM)
  • API Gateway configuration (HTTP API vs REST API)
  • Security design (JWT auth, IAM, encryption)
  • Cost estimates (1M, 10M requests/month)
  • Monitoring and alerting setup

6. 📞 Contact Center Architect

File: contact-center-architect-export.yaml

Designs:

  • Amazon Connect contact flow architectures (IVR trees, routing logic)
  • Customer data management (DynamoDB schemas for call state, history)
  • Call recording and analytics (S3 + Transcribe + EventBridge)
  • Real-time dashboards (EventBridge rules for VIP alerts, long waits)
  • CRM integration patterns (Salesforce, custom CRMs)
  • Omnichannel design (voice, chat, tasks)

Delivers:

  • Contact flow designs with Lambda integration points
  • DynamoDB schemas (Customers, ContactHistory, CallState, AgentStatus)
  • S3 bucket structure with lifecycle policies
  • EventBridge event patterns for real-time processing
  • Cost estimates (10K, 100K contacts/month)
  • PCI/HIPAA compliance controls

7. 🧠 AI Solutions Architect

File: ai-solutions-architect-export.yaml

Designs:

  • RAG architectures (embeddings, vector search, context injection)
  • Conversational AI (Lex bots + Bedrock integration)
  • Document processing (Textract + Bedrock summarization)
  • AI agent patterns (tool use, planning, memory)
  • Cost-optimized AI systems (model selection, caching, batching)

Delivers:

  • Complete RAG pipeline (ingestion + query with vector DB)
  • Bedrock integration patterns (streaming, batch, prompt caching)
  • Lex bot designs (intents, slots, fulfillment with Bedrock)
  • Tool use implementation (multi-agent orchestration)
  • Cost optimization strategies (up to 90% savings with prompt caching)
  • Prompt engineering templates
  • Quality metrics and evaluation

DevOps (2 Specialists)

8. 🛠️ Cloud Engineer

File: cloud-engineer-export.yaml

Implements:

  • Infrastructure as Code (CloudFormation, CDK, Terraform)
  • IAM roles and policies (least privilege)
  • CI/CD pipelines (GitHub Actions, CodePipeline)
  • Deployment strategies (blue/green, canary with auto-rollback)
  • Monitoring and alerting (CloudWatch dashboards, alarms)

Provides:

  • Complete CloudFormation templates (VPC, Lambda, DynamoDB, API Gateway)
  • Full IAM policies (no wildcards, condition-based security)
  • GitHub Actions workflows (OIDC auth, multi-stage deployment)
  • CloudWatch dashboards with custom metrics
  • CodeDeploy configurations (canary with smoke tests)
  • Rollback procedures for all deployments

9. 💰 Cost Optimizer

File: cost-optimizer-export.yaml

Analyzes:

  • AWS cost breakdown by service
  • Lambda right-sizing (memory utilization analysis)
  • DynamoDB capacity mode optimization (on-demand vs provisioned)
  • S3 lifecycle policies and storage class transitions
  • Architecture cost efficiency
  • Waste identification (unused resources)

Provides:

  • Cost Explorer analysis with 6-month trends
  • Service-specific optimization recommendations
  • Lambda memory sizing with cost calculations
  • DynamoDB reserved capacity analysis (up to 77% savings)
  • S3 lifecycle policy templates
  • Architecture cost review (API Gateway REST → HTTP API migration)
  • Monthly savings estimates with implementation priority
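
A Lambda memory-sizing cost calculation can be sketched as below. It follows the usual billing shape (GB-seconds of compute plus a per-request charge), but the prices are arguments because they differ by Region and architecture, and the function name is hypothetical. The sketch ignores the free tier, ephemeral storage, and provisioned concurrency.

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_gb_second, price_per_million_requests):
    """Estimate monthly Lambda cost from invocation count, average
    billed duration, and configured memory."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * price_per_gb_second
    requests = invocations / 1e6 * price_per_million_requests
    return compute + requests
```

One consequence worth noting: if doubling memory halves the average duration, the GB-seconds term is unchanged, so the function gets faster at the same compute cost. Right-sizing therefore needs measured durations at each memory setting rather than memory utilization alone.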

1 Context File

AWS Services Reference

File: aws-services-complete.md

Covers:

  • Lambda: Architecture, common issues (timeout, memory, VPC, permissions), best practices
  • DynamoDB: Data model, capacity modes, throttling, schema design, cost optimization
  • API Gateway: REST vs HTTP API, error codes (502, 403, 429), configuration, cost
  • Step Functions: State types, error handling, retry patterns, cost (Standard vs Express)
  • EventBridge: Event patterns, rules, targets, DLQ configuration
  • Bedrock: Model invocation, prompt engineering, streaming, quotas, cost
  • Lex: Bot configuration, intents, slots, fulfillment, Lambda integration
  • IAM: Least privilege patterns, common policies by service
  • CloudWatch: Key metrics, alarms, log analysis, X-Ray tracing

How It Works

Example 1: Troubleshooting Workflow

User: "My Lambda function keeps timing out when calling DynamoDB"

AWS Cloud Orchestrator:

  1. Asks clarifying questions (function name, timeout setting, VPC status)
  2. Launches Lambda Troubleshooter → Diagnoses a VPC-attached function with no DynamoDB VPC endpoint
  3. Launches Database Troubleshooter → Confirms DynamoDB has capacity
  4. Launches Cloud Engineer → Provides VPC endpoint CloudFormation template
  5. Aggregates results → Complete solution with validation steps

Output:

  • Root cause: Lambda in VPC cannot reach DynamoDB (no VPC endpoint or NAT Gateway)
  • Solution: Add a DynamoDB gateway VPC endpoint (no additional charge, and traffic stays on the AWS network rather than going through a NAT Gateway)
  • CloudFormation template for VPC endpoint
  • Validation commands
  • Cost impact: $0 (gateway VPC endpoints for DynamoDB and S3 carry no additional charge)
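
To give a sense of what the VPC endpoint template in this output could contain, here is a minimal sketch that builds a CloudFormation template for a DynamoDB gateway endpoint as a Python dictionary. It is not the Cloud Engineer's actual output. The function name and logical resource ID are hypothetical, and the VPC ID, route table IDs, and Region must be supplied by the caller.

```python
import json


def dynamodb_gateway_endpoint_template(vpc_id, route_table_ids, region):
    """Build a minimal CloudFormation template (as a dict) that adds a
    DynamoDB gateway endpoint to an existing VPC."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "DynamoDB gateway VPC endpoint (illustrative sketch)",
        "Resources": {
            # Logical ID chosen for illustration only.
            "DynamoDBGatewayEndpoint": {
                "Type": "AWS::EC2::VPCEndpoint",
                "Properties": {
                    "VpcId": vpc_id,
                    "ServiceName": f"com.amazonaws.{region}.dynamodb",
                    "VpcEndpointType": "Gateway",
                    "RouteTableIds": list(route_table_ids),
                },
            }
        },
    }


def render(template):
    """Serialize the template to JSON text that CloudFormation accepts."""
    return json.dumps(template, indent=2)
```

The route tables passed in should be the ones associated with the subnets the Lambda function runs in; otherwise the function's traffic will not be routed through the endpoint.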

Example 2: Architecture Workflow

User: "Design a serverless API for managing user data with authentication"

AWS Cloud Orchestrator:

  1. Asks requirements (traffic, data access patterns, auth method)
  2. Launches Serverless Architect → Designs complete API architecture
  3. Launches Cloud Engineer → Provides CloudFormation + GitHub Actions CI/CD
  4. Aggregates results → Production-ready architecture + deployment pipeline

Output:

  • Architecture: API Gateway (HTTP API) + Lambda + DynamoDB (single-table design)
  • Security: JWT authorizer, IAM least privilege, encryption at rest
  • Monitoring: CloudWatch dashboards, alarms, X-Ray tracing
  • Cost estimate: ~$12-18/month at 1M requests
  • CloudFormation template (all infrastructure)
  • GitHub Actions workflow (lint, test, deploy-staging, deploy-production)
  • Deployment runbook with rollback procedures

Example 3: AI Solutions Workflow

User: "Build a RAG system for our knowledge base with Bedrock"

AWS Cloud Orchestrator:

  1. Asks about knowledge base (size, format, query volume)
  2. Launches AI Solutions Architect → Designs RAG architecture
  3. Launches Cloud Engineer → Implements ingestion + query pipeline
  4. Aggregates results → Complete RAG system ready to deploy

Output:

  • Architecture: S3 (docs) → Lambda (chunking) → Bedrock (embeddings) → OpenSearch Serverless (vector DB)
  • Query pipeline: User query → Bedrock embeddings → k-NN search → context assembly → Bedrock LLM
  • Prompt templates with caching (up to 90% cost savings)
  • Complete Python implementation
  • CloudFormation for all infrastructure
  • Cost estimate: ~$611/month for 10K queries/day (optimized to ~$100 with caching)
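
The query pipeline above (embed, k-NN search, context assembly, LLM call) can be illustrated with a toy, in-memory version of its middle two steps. This is a sketch under heavy assumptions: the vectors are hand-written stand-ins for Bedrock embeddings, a sorted list replaces the OpenSearch Serverless k-NN index, and the function names and prompt wording are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vector, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force k-NN)."""
    ranked = sorted(chunks,
                    key=lambda c: cosine(query_vector, c["vector"]),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble retrieved chunk text into a prompt for the LLM step."""
    context = "\n\n".join(c["text"] for c in retrieved)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

In a real deployment the embedding and final generation steps would be model calls and the search would run in the vector database; only the overall data flow carries over from this sketch.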

Import Instructions

Quick Import (Recommended)

# File: aws-complete.yaml (290KB)
# Import via RooCode: Settings → Custom Modes → Import

Includes all 10 modes:

  • 1 orchestrator
  • 9 specialists

Selective Import

# Troubleshooting only
cat specialists/aws/troubleshooting/*.yaml > aws-troubleshooting-only.yaml

# Architects only
cat specialists/aws/architects/*.yaml > aws-architects-only.yaml

# DevOps only
cat specialists/aws/devops/*.yaml > aws-devops-only.yaml

Key Differentiators

vs Network Engineering System

  • Cloud-native: Serverless, managed services, pay-per-use
  • DevOps integrated: CI/CD, IaC, deployment pipelines
  • AI/ML capabilities: Bedrock, Lex, RAG architectures
  • Cost-conscious: Every specialist considers cost optimization
  • Always online: No offline mode (AWS requires connectivity)

Coverage

  • Troubleshooting: Lambda, DynamoDB, API Gateway, Step Functions, EventBridge, SNS/SQS, Bedrock, Lex
  • Architecture: Serverless APIs, contact centers, AI solutions, event-driven systems
  • Implementation: CloudFormation/CDK, IAM, CI/CD, monitoring, blue/green deployments
  • Optimization: Cost analysis, right-sizing, architecture efficiency

Use Cases

Troubleshooting

  • Lambda timeout in VPC → VPC endpoint setup
  • DynamoDB throttling → Capacity mode switch or GSI creation
  • API Gateway 502 errors → Lambda response format fix
  • Bedrock access denied → Model access enablement
  • EventBridge events not triggering → Pattern matching fix
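
For the API Gateway 502 case, one common cause with Lambda proxy integration is a handler that returns something other than the expected response shape, such as a bare dictionary or a non-string `body`. A minimal sketch of a well-formed response follows; the `proxy_response` helper is a hypothetical convenience, not part of any AWS library.

```python
import json


def proxy_response(status_code, payload, headers=None):
    """Build a response in the shape Lambda proxy integration expects:
    an integer statusCode and a body that is a string, not a dict."""
    return {
        "statusCode": int(status_code),
        "headers": {"Content-Type": "application/json", **(headers or {})},
        "body": json.dumps(payload),
        "isBase64Encoded": False,
    }


def handler(event, context):
    """Example handler that always returns a well-formed JSON response."""
    return proxy_response(200, {"message": "ok"})
```

Routing error paths through the same helper keeps exceptions from leaking out of the handler as malformed responses.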

Architecture

  • REST API design → Complete serverless stack with auth
  • Contact center backend → Amazon Connect + Lambda + DynamoDB
  • RAG system → Document ingestion + vector search + LLM
  • Event-driven microservices → EventBridge + Lambda
  • Multi-step workflows → Step Functions with error handling

Implementation

  • Infrastructure deployment → CloudFormation templates
  • CI/CD setup → GitHub Actions with blue/green deployment
  • Monitoring → CloudWatch dashboards + alarms + X-Ray
  • Security → IAM least privilege policies
  • Cost optimization → Right-sizing + architecture review

Integration with Network Engineering System

Complementary Systems:

  • Network Engineering: On-prem routing, switching, VoIP, ISE (Phases 1-4)
  • AWS Cloud: Serverless, databases, AI, DevOps (Phase 5)

Full-stack engineers can import both:

  • Total: 4 orchestrators + 20 specialists
  • Use RooCode mode groups to switch contexts
  • Network work → Network/VoIP/Security orchestrators
  • Cloud work → AWS Cloud orchestrator

Files Created

roocode-network-engineering/
├── aws-complete.yaml (290KB) ← Import this file
│
├── orchestrators/
│   └── aws-cloud-orchestrator-export.yaml (23KB)
│
├── specialists/aws/
│   ├── troubleshooting/
│   │   ├── lambda-troubleshooter-export.yaml (27KB)
│   │   ├── database-troubleshooter-export.yaml (31KB)
│   │   ├── integration-troubleshooter-export.yaml (25KB)
│   │   └── ai-services-troubleshooter-export.yaml (29KB)
│   │
│   ├── architects/
│   │   ├── serverless-architect-export.yaml (32KB)
│   │   ├── contact-center-architect-export.yaml (35KB)
│   │   └── ai-solutions-architect-export.yaml (38KB)
│   │
│   └── devops/
│       ├── cloud-engineer-export.yaml (42KB)
│       └── cost-optimizer-export.yaml (28KB)
│
└── contexts/
    └── aws-services-complete.md (18KB)

Next Steps

  1. Import: Load aws-complete.yaml into RooCode
  2. Test: Try AWS Cloud Orchestrator with "Lambda timing out when calling DynamoDB"
  3. Organize: Create "AWS Cloud" mode group if using both network + cloud systems
  4. Use: Start using for real AWS troubleshooting, architecture, and DevOps work

Success Metrics

✅ Complete coverage: Lambda, DynamoDB, API Gateway, Step Functions, EventBridge, Bedrock, Lex
✅ Production-ready: All IaC templates are complete (no placeholders)
✅ Cost-conscious: Every specialist provides cost analysis
✅ DevOps integrated: CI/CD, monitoring, blue/green deployments included
✅ AI-enabled: RAG, conversational AI, prompt engineering covered


Phase 5 Complete! AWS Cloud Engineering System ready for use. 🚀