
Commit e4f3da0

icarthick and claude committed
feat(migration-to-aws): v2 with 6-phase workflow, AI workload detection, and billing discovery
Rewrite the migration-to-aws plugin from a 4-phase to a 6-phase workflow (discover → clarify → design → estimate → generate → feedback) with three parallel discovery paths: infrastructure, application code, and billing.

Key changes:

Discover phase:
- Add app-code discovery path scanning source for AI/ML frameworks (Gemini, Vertex AI, OpenAI, traditional ML like TensorFlow/PyTorch)
- Add billing discovery path with GCP billing export analysis
- Enhance IaC discovery with improved Terraform resource clustering using typed-edge strategy and classification rules

Clarify phase:
- Implement adaptive category-based questioning (global, compute, database, AI, AI-only) that activates based on discover findings
- Skip categories when discover already provides sufficient signal

Design phase (new):
- Separate design from discover with dedicated design-infra, design-ai, and design-billing reference documents
- Source-specific AI model mapping via ai-gemini-to-bedrock and ai-openai-to-bedrock reference tables

Estimate phase:
- Split into estimate-infra, estimate-ai, and estimate-billing
- Add pricing-cache with validated rates and confidence levels
- Use awspricing MCP server for real-time price validation

Generate phase:
- Produce Terraform configurations from templates (main.tf, variables.tf)
- Generate AI provider adapter (provider_adapter.py) for SDK migration
- Generate Bedrock setup scripts and comparison test harnesses
- Add billing artifact generation and documentation artifacts
- Structured artifact specs for infra, AI, billing, docs, and scripts

Feedback phase (new):
- Anonymized telemetry trace capturing phase timings, confidence scores, and migration complexity metrics
- No PII or source code in traces

Supporting changes:
- Add JSON schemas for discover-ai, discover-billing, discover-iac, estimate-infra, and phase-status data structures
- Update plugin.json version and README
- Enhance design-refs with confidence levels, factual corrections, and improved service mapping tables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c51ed0d commit e4f3da0

56 files changed

Lines changed: 7561 additions & 1580 deletions


plugins/migration-to-aws/.claude-plugin/plugin.json

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,9 @@
1111
"migration",
1212
"cloud-migration",
1313
"terraform",
14-
"fargate"
14+
"fargate",
15+
"rds",
16+
"eks"
1517
],
1618
"license": "Apache-2.0",
1719
"name": "migration-to-aws",

plugins/migration-to-aws/.mcp.json

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
1010
],
1111
"command": "uvx",
1212
"env": {
13+
"AWS_REGION": "us-east-1",
1314
"FASTMCP_LOG_LEVEL": "ERROR"
1415
},
1516
"timeout": 120000,

plugins/migration-to-aws/README.md

Lines changed: 22 additions & 10 deletions
@@ -1,16 +1,23 @@
11
# GCP-to-AWS Migration Plugin
22

3-
Migrate workloads from Google Cloud Platform to AWS with a 5-phase guided process.
3+
Migrate workloads from Google Cloud Platform to AWS with a 6-phase guided process.
44

55
## Overview
66

77
This plugin guides you through migrating GCP infrastructure to AWS by:
88

9-
1. **Discover** - Scan Terraform files for GCP resources
10-
2. **Clarify** - Answer 8 questions about your migration requirements
9+
1. **Discover** - Scan Terraform files, application code, and/or billing exports for GCP resources
10+
2. **Clarify** - Answer adaptive questions about your migration requirements
1111
3. **Design** - Map GCP services to equivalent AWS services
1212
4. **Estimate** - Calculate monthly costs and ROI
13-
5. **Execute** - Plan your migration timeline and rollback procedures
13+
5. **Generate** - Generate Terraform, migration scripts, AI adapters, and documentation
14+
6. **Feedback** - Collect optional feedback and migration trace (optional)
15+
16+
## Skills
17+
18+
| Skill | Description |
19+
| ------------ | --------------------------------------------------------- |
20+
| `gcp-to-aws` | Migrate GCP workloads to AWS via a 6-phase guided process |
1421

1522
## Usage
1623

@@ -20,12 +27,13 @@ Invoke the skill with migration-related phrases:
2027
- "Move off Google Cloud"
2128
- "Migrate Cloud SQL to RDS"
2229
- "GCP to AWS migration plan"
30+
- "Migrate our Vertex AI workloads to Bedrock"
31+
- "Estimate the cost of moving from GCP to AWS"
2332

2433
## Scope (v1.0)
2534

26-
- **Supports**: Terraform-based GCP infrastructure
27-
- **Generates**: AWS architecture design, cost estimates, execution timeline
28-
- **Does not include** (v1.1+): App code scanning, billing data import, CDK code generation
35+
- **Supports**: Terraform IaC, application code (AI workload detection), and GCP billing exports
36+
- **Generates**: AWS architecture design, cost estimates, Terraform configurations, migration scripts, AI migration code, and documentation
2937

3038
## MCP Servers
3139

@@ -47,10 +55,14 @@ The plugin uses state files (`.migration/[MMDD-HHMM]/`) to track migration progr
4755

4856
- `.phase-status.json` - Current phase and status
4957
- `gcp-resource-inventory.json` - Discovered GCP resources
50-
- `clarified.json` - User requirements
58+
- `preferences.json` - User requirements
5159
- `aws-design.json` - Mapped AWS services
52-
- `estimation.json` - Cost analysis
53-
- `execution.json` - Timeline and risks
60+
- `estimation-infra.json` / `estimation-ai.json` / `estimation-billing.json` - Cost analysis
61+
- `generation-infra.json` / `generation-ai.json` / `generation-billing.json` - Migration plans
62+
- `terraform/` - Generated Terraform configurations
63+
- `scripts/` - Migration scripts
64+
- `ai-migration/` - AI provider adapter and test harness
65+
- `MIGRATION_GUIDE.md` - Step-by-step migration guide
5466

5567
## Installation
5668

plugins/migration-to-aws/skills/gcp-to-aws/SKILL.md

Lines changed: 209 additions & 95 deletions
Large diffs are not rendered by default.

plugins/migration-to-aws/skills/gcp-to-aws/references/clustering/terraform/classification-rules.md

Lines changed: 61 additions & 26 deletions
@@ -2,29 +2,57 @@
22

33
Hardcoded lists for classifying GCP resources as PRIMARY or SECONDARY.
44

5+
Each PRIMARY resource is assigned a `tier` indicating its infrastructure layer.
6+
57
## Priority 1: PRIMARY Resources (Workload-Bearing)
68

79
These resource types are always PRIMARY:
810

11+
### Compute (`tier: "compute"`)
12+
913
- `google_cloud_run_service` — Serverless container workload
14+
- `google_cloud_run_v2_service` — Serverless container workload (v2 API)
1015
- `google_container_cluster` — Kubernetes cluster
16+
- `google_container_node_pool` — Kubernetes node pool
1117
- `google_compute_instance` — Virtual machine
12-
- `google_cloudfunctions_function` — Serverless function
18+
- `google_cloudfunctions_function` — Serverless function (Gen 1)
19+
- `google_cloudfunctions2_function` — Serverless function (Gen 2)
20+
- `google_app_engine_application` — App Engine application
21+
22+
### Database (`tier: "database"`)
23+
1324
- `google_sql_database_instance` — Relational database
14-
- `google_firestore_database` — Document database (Firestore instance)
15-
- `google_firestore_document` — Document database (Firestore document resource)
16-
- `google_bigquery_dataset` — Data warehouse
17-
- `google_storage_bucket` — Object storage
25+
- `google_spanner_instance` — Globally-distributed relational database
26+
- `google_firestore_database` — Document database
27+
- `google_bigtable_instance` — Wide-column NoSQL database
1828
- `google_redis_instance` — In-memory cache
29+
30+
### Storage (`tier: "storage"`)
31+
32+
- `google_storage_bucket` — Object storage
33+
- `google_filestore_instance` — Managed NFS file storage
34+
- `google_bigquery_dataset` — Data warehouse
35+
36+
### Messaging (`tier: "messaging"`)
37+
1938
- `google_pubsub_topic` — Message queue
20-
- `google_compute_network` — Virtual network (VPC). Anchors the networking cluster (see clustering-algorithm.md Rule 1)
21-
- `google_dns_managed_zone` — DNS zone
22-
- `google_app_engine_application` — App Engine application
2339
- `google_cloud_tasks_queue` — Task queue
24-
- `google_compute_forwarding_rule` — Load balancer forwarding rule
25-
- `module.*` — Terraform module (treated as primary container)
2640

27-
**Action**: Mark as `PRIMARY`, classification done. No secondary_role.
41+
### Networking (`tier: "networking"`)
42+
43+
- `google_compute_network` — Virtual network (VPC — primary because it defines topology)
44+
- `google_compute_security_policy` — Web application firewall (Cloud Armor)
45+
- `google_dns_managed_zone` — DNS zone
46+
47+
### Monitoring (`tier: "monitoring"`)
48+
49+
- `google_monitoring_alert_policy` — Alert policy
50+
51+
### Other
52+
53+
- `module.*` — Terraform module that wraps primary resources (tier inferred from wrapped resource)
54+
55+
**Action**: Mark as `PRIMARY` with assigned `tier`. Classification done. No secondary_role.
2856

2957
## Priority 2: SECONDARY Resources by Role
3058

@@ -33,10 +61,11 @@ Match resource type against secondary classification table. Each match assigns a
3361
### Identity (`identity`)
3462

3563
- `google_service_account` — Workload identity
64+
- `data.google_service_account` — Data source reference to existing service account
3665

3766
### Access Control (`access_control`)
3867

39-
- `google_*_iam_member` — IAM binding (all variants)
68+
- `google_*_iam_member` — IAM binding (all variants: project, cloud_run_service, storage_bucket, etc.)
4069
- `google_*_iam_policy` — IAM policy (all variants)
4170

4271
### Network Path (`network_path`)
@@ -46,15 +75,18 @@ Match resource type against secondary classification table. Each match assigns a
4675
- `google_compute_firewall` — Firewall rule
4776
- `google_compute_router` — Cloud router
4877
- `google_compute_router_nat` — NAT rule
78+
- `google_compute_global_address` — Global IP address (for VPC peering, load balancing)
4979
- `google_service_networking_connection` — VPC peering
5080

5181
### Configuration (`configuration`)
5282

5383
- `google_sql_database` — SQL schema
5484
- `google_sql_user` — SQL user
85+
- `google_spanner_database` — Spanner database schema
5586
- `google_secret_manager_secret` — Secret vault
87+
- `google_secret_manager_secret_version` — Secret value
5688
- `google_dns_record_set` — DNS record
57-
- `google_monitoring_alert_policy` — Alert rule (skipped in design; no AWS equivalent)
89+
- `google_monitoring_notification_channel` — Alert notification target
5890

5991
### Encryption (`encryption`)
6092

@@ -71,21 +103,24 @@ Match resource type against secondary classification table. Each match assigns a
71103

72104
## Priority 3: LLM Inference Fallback
73105

74-
If resource type not in Priority 1 or 2, apply heuristic patterns:
106+
If resource type not in Priority 1 or 2, apply these **deterministic fallback heuristics** BEFORE free-form LLM reasoning:
75107

76-
- Name contains `scheduler`, `task`, `job` → `SECONDARY` / `orchestration`
77-
- Name contains `log`, `metric`, `alert`, `trace` → `SECONDARY` / `configuration`
78-
- Type contains `policy` or `binding` → `SECONDARY` / `access_control`
79-
- Type contains `network` or `subnet` → `SECONDARY` / `network_path`
108+
| Pattern | Classification | secondary_role | confidence |
109+
| ---------------------------------------------------- | ----------------- | -------------- | ---------- |
110+
| Name contains `scheduler`, `task`, `job`, `workflow` | SECONDARY | orchestration | 0.65 |
111+
| Name contains `log`, `metric`, `alert`, `dashboard` | SECONDARY | configuration | 0.60 |
112+
| Resource has zero references to/from other resources | SECONDARY | configuration | 0.50 |
113+
| Resource only referenced by a `module` block | SECONDARY | configuration | 0.55 |
114+
| Type contains `policy` or `binding` | SECONDARY | access_control | 0.65 |
115+
| Type contains `network` or `subnet` | SECONDARY | network_path | 0.60 |
116+
| None of the above match | Use LLM reasoning | | 0.50-0.75 |
80117

81-
**Default**: If all heuristics fail: `SECONDARY` / `configuration` with confidence 0.5
118+
If still uncertain after heuristics, use LLM reasoning. Mark with:
82119

83-
**Downstream flagging for low-confidence classifications**: Any resource classified with confidence ≤ 0.5 (including the default fallback) MUST be:
120+
- `classification_source: "llm_inference"`
121+
- `confidence: 0.5-0.75`
84122

85-
1. Flagged in `gcp-resource-inventory.json` with `"confidence": 0.5` on the resource entry
86-
2. Added to a `low_confidence_resources[]` warning array in inventory metadata
87-
3. Reported to the user during Phase 1 completion: "⚠️ N resources were classified with low confidence and may need manual review: [list of addresses]"
88-
4. Passed through to Phase 3 (Design) where they appear in `warnings[]` as: "Low-confidence classification for [address] (classified as [role]). Verify AWS mapping is correct."
123+
**Default**: If all heuristics and LLM fail: `SECONDARY` / `configuration` with confidence 0.5. It is safer to under-classify (secondary) than over-classify (primary), because secondaries are grouped into existing clusters while primaries create new clusters.
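
The deterministic fallback table above could be sketched as an ordered rule list. This is a minimal illustration, not the plugin's implementation: the `Resource` fields, the `classify_fallback` name, and the assumption that rows are evaluated top to bottom with the first match winning are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """Hypothetical view of a Terraform resource for fallback classification."""
    resource_type: str
    name: str
    reference_count: int = 1          # references to/from other resources
    only_module_referenced: bool = False


def _name_has(*words: str) -> Callable[[Resource], bool]:
    return lambda r: any(w in r.name.lower() for w in words)


def _type_has(*words: str) -> Callable[[Resource], bool]:
    return lambda r: any(w in r.resource_type.lower() for w in words)


# (predicate, secondary_role, confidence), in the same order as the table.
# Assumption: first matching row wins.
RULES: List[Tuple[Callable[[Resource], bool], str, float]] = [
    (_name_has("scheduler", "task", "job", "workflow"), "orchestration", 0.65),
    (_name_has("log", "metric", "alert", "dashboard"), "configuration", 0.60),
    (lambda r: r.reference_count == 0, "configuration", 0.50),
    (lambda r: r.only_module_referenced, "configuration", 0.55),
    (_type_has("policy", "binding"), "access_control", 0.65),
    (_type_has("network", "subnet"), "network_path", 0.60),
]


def classify_fallback(resource: Resource) -> Optional[dict]:
    """Return a SECONDARY classification, or None to defer to LLM reasoning."""
    for predicate, role, confidence in RULES:
        if predicate(resource):
            return {
                "classification": "SECONDARY",
                "secondary_role": role,
                "confidence": confidence,
            }
    return None
```

A `None` result corresponds to the last table row: no heuristic matched, so the caller would fall through to LLM reasoning and, failing that, to the `SECONDARY` / `configuration` default.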
89124

90125
## Serves[] Population
91126

@@ -95,6 +130,6 @@ For SECONDARY resources, populate `serves[]` array (list of PRIMARY resources it
95130
2. Include direct references: `field = resource_type.name.id` patterns
96131
3. Include transitive chains: if referenced resource is also SECONDARY, trace to PRIMARY
97132

98-
**Example**: `google_compute_firewall` → references `google_compute_network` (PRIMARY, network cluster anchor). The firewall is a `network_path` SECONDARY that serves the network cluster. Its `serves[]` includes the PRIMARY `google_compute_network.vpc`.
133+
**Example**: `google_compute_firewall` → references `google_compute_network` (SECONDARY) → serves `google_compute_instance.web` (PRIMARY)
99134

100-
**Serves array**: Points to the PRIMARY resources this SECONDARY supports. For `network_path` secondaries, this is the `google_compute_network` PRIMARY that anchors the network cluster (see clustering-algorithm.md Rule 1).
135+
**Serves array**: Points back to PRIMARY workloads affected by this firewall rule. Trace through SECONDARY resources until a PRIMARY is reached.
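
The tracing step described above (follow SECONDARY resources until a PRIMARY is reached) might look like the following sketch. The `links` and `classification` mappings and the `resolve_serves` name are assumptions for illustration; in particular, `links` is taken to already hold whichever reference direction the discovery step records.

```python
from typing import Dict, List


def resolve_serves(
    address: str,
    links: Dict[str, List[str]],
    classification: Dict[str, str],
) -> List[str]:
    """Collect PRIMARY addresses reachable from a SECONDARY resource.

    Traversal stops at the first PRIMARY on each path and continues
    through SECONDARY resources. A visited set guards against cycles.
    """
    served = set()
    seen = {address}
    stack = list(links.get(address, []))
    while stack:
        target = stack.pop()
        if target in seen:
            continue
        seen.add(target)
        if classification.get(target) == "PRIMARY":
            served.add(target)
        else:
            stack.extend(links.get(target, []))
    return sorted(served)


# Hypothetical data mirroring the firewall example above.
links = {
    "google_compute_firewall.allow": ["google_compute_network.vpc"],
    "google_compute_network.vpc": ["google_compute_instance.web"],
}
classification = {
    "google_compute_firewall.allow": "SECONDARY",
    "google_compute_network.vpc": "SECONDARY",
    "google_compute_instance.web": "PRIMARY",
}
serves = resolve_serves("google_compute_firewall.allow", links, classification)
```

Sorting the result keeps `serves[]` stable across runs, which fits the determinism goal of the clustering step.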

plugins/migration-to-aws/skills/gcp-to-aws/references/clustering/terraform/clustering-algorithm.md

Lines changed: 55 additions & 16 deletions
@@ -17,7 +17,7 @@ All resources with fields:
1717
**IF** `google_compute_network` resource exists:
1818

1919
- Group: `google_compute_network` + ALL network_path secondaries (subnetworks, firewalls, routers)
20-
- Cluster ID: `network_vpc_{gcp_region}_{sequence}` (e.g., `network_vpc_us-central1_001`)
20+
- Cluster ID: `networking_vpc_{gcp_region}_001` (e.g., `networking_vpc_us-central1_001`)
2121
- **Reasoning**: Network is shared infrastructure; groups all config together
2222

2323
**Output**: 1 cluster (or 0 if no networks found)
@@ -70,7 +70,7 @@ All resources with fields:
7070

7171
**Output**: ONE cluster per resource type (not per resource)
7272

73-
**Reasoning**: Identical workloads of the same GCP service type migrate together, share operational characteristics, and should be managed as a unit.
73+
**Reasoning**: Identical workloads of the same GCP service type migrate together, share operational characteristics, and are managed as a unit.
7474

7575
**Mark all resources of this type as clustered; remove from unassigned pool.**
7676

@@ -89,16 +89,15 @@ All resources with fields:
8989

9090
### Rule 4: Merge on Dependencies
9191

92-
**IF** two clusters have bidirectional or data_dependency edges between their PRIMARY resources:
92+
**IF** two clusters have **bidirectional** `data_dependency` edges between their PRIMARY resources (A→B AND B→A):
9393

94-
- **AND** they form a single logical deployment unit (determined by: shared infrastructure, sequential deploy, business logic)
9594
- **THEN** merge clusters
9695

9796
**Action**: Combine into one cluster; update ID to reflect both (e.g., `web-api_us-central1_001`)
9897

99-
**Reasoning**: Some workloads must deploy together (e.g., two Cloud Runs sharing database)
98+
**Reasoning**: Bidirectional data dependencies indicate a tightly coupled deployment unit that must migrate together.
10099

101-
**Heuristic**: Merge if one PRIMARY depends on another's output (e.g., Function → Database). Do NOT merge independent workloads.
100+
**Do NOT merge** when edges are unidirectional (A→B only). Unidirectional dependencies are captured in `dependencies[]` instead.
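
A sketch of the revised Rule 4 check, under the assumption that "bidirectional" is evaluated at cluster level (some PRIMARY in A depends on one in B, and some PRIMARY in B depends on one in A). The `find_merge_pairs` name and the `cluster_of` mapping from PRIMARY address to cluster ID are hypothetical.

```python
from typing import Dict, List, Tuple


def find_merge_pairs(
    edges: List[dict],
    cluster_of: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return cluster pairs linked by data_dependency edges in both directions."""
    directed = set()
    for edge in edges:
        if edge["relationship_type"] != "data_dependency":
            continue
        src = cluster_of.get(edge["from"])
        dst = cluster_of.get(edge["to"])
        # Ignore edges touching non-PRIMARY resources or staying inside a cluster.
        if src is None or dst is None or src == dst:
            continue
        directed.add((src, dst))
    # Keep a pair only if the reverse direction is also present.
    pairs = {tuple(sorted(p)) for p in directed if (p[1], p[0]) in directed}
    return sorted(pairs)
```

Unidirectional pairs are not returned, so they stay separate clusters and would be recorded in `dependencies[]` instead.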
102101

103102
### Rule 5: Skip API Services
104103

@@ -115,7 +114,7 @@ All resources with fields:
115114
Apply consistent cluster naming:
116115

117116
- **Format**: `{service_category}_{service_type}_{gcp_region}_{sequence}`
118-
- **service_category**: One of: `compute`, `database`, `storage`, `network`, `messaging`, `analytics`, `security`
117+
- **service_category**: One of: `compute`, `database`, `storage`, `networking`, `messaging`, `monitoring`, `analytics`, `security`
119118
- **service_type**: GCP service shortname (e.g., `cloudrun`, `sql`, `bucket`, `vpc`)
120119
- **gcp_region**: Source region (e.g., `us-central1`)
121120
- **sequence**: Zero-padded counter (e.g., `001`, `002`)
@@ -125,36 +124,76 @@ Apply consistent cluster naming:
125124
- `compute_cloudrun_us-central1_001`
126125
- `database_sql_us-west1_001`
127126
- `storage_bucket_multi-region_001`
128-
- `network_vpc_us-central1_001` (rule 1 network cluster)
127+
- `networking_vpc_us-central1_001` (rule 1 network cluster)
129128

130129
**Reasoning**: Names reflect deployment intent; deterministic for reproducibility.
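
For illustration, the naming format could be produced by a small helper like the one below. The `make_cluster_id` name and the validation behaviour are assumptions; the category set is the one listed in the added line of this hunk.

```python
SERVICE_CATEGORIES = {
    "compute", "database", "storage", "networking",
    "messaging", "monitoring", "analytics", "security",
}


def make_cluster_id(category: str, service_type: str, gcp_region: str, sequence: int) -> str:
    """Build `{service_category}_{service_type}_{gcp_region}_{sequence}`."""
    if category not in SERVICE_CATEGORIES:
        raise ValueError(f"unknown service_category: {category!r}")
    if sequence < 1:
        raise ValueError("sequence starts at 1")
    # Zero-pad the counter to three digits, e.g. 1 -> "001".
    return f"{category}_{service_type}_{gcp_region}_{sequence:03d}"


example_id = make_cluster_id("compute", "cloudrun", "us-central1", 1)
# example_id == "compute_cloudrun_us-central1_001"
```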
131130

131+
## Post-Clustering: Populate Cluster Metadata
132+
133+
After all clusters are formed, populate these fields for each cluster:
134+
135+
### `network`
136+
137+
Identify which VPC/network the cluster's resources belong to. Trace `network_path` edges from resources in this cluster to find the `google_compute_network` they reference. Store the network cluster ID (e.g., `networking_vpc_us-central1_001`). Set to `null` if resources have no network association.
138+
139+
### `must_migrate_together`
140+
141+
Default: `true` for all clusters. Set to `false` only if the cluster contains resources that can be independently migrated without breaking dependencies (rare — most clusters are atomic).
142+
143+
### `dependencies`
144+
145+
Derive from Primary→Primary edges that cross cluster boundaries. If cluster A contains a resource with a `data_dependency` edge to a resource in cluster B, then cluster A depends on cluster B. Store as array of cluster IDs.
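
The derivation above can be sketched as a single pass over the edge list. The `cluster_dependencies` name and the `cluster_of` mapping (PRIMARY address to cluster ID) are hypothetical; the edge fields follow the output cluster schema.

```python
from typing import Dict, List


def cluster_dependencies(
    edges: List[dict],
    cluster_of: Dict[str, str],
) -> Dict[str, List[str]]:
    """Map each cluster ID to the sorted cluster IDs it depends on."""
    deps = {cluster_id: set() for cluster_id in cluster_of.values()}
    for edge in edges:
        if edge["relationship_type"] != "data_dependency":
            continue
        src = cluster_of.get(edge["from"])
        dst = cluster_of.get(edge["to"])
        # Only Primary-to-Primary edges that cross a cluster boundary count.
        if src is not None and dst is not None and src != dst:
            deps[src].add(dst)
    return {cluster_id: sorted(targets) for cluster_id, targets in sorted(deps.items())}


edges = [{
    "from": "google_cloud_run_service.app",
    "to": "google_sql_database_instance.db",
    "relationship_type": "data_dependency",
}]
cluster_of = {
    "google_cloud_run_service.app": "compute_cloudrun_us-central1_001",
    "google_sql_database_instance.db": "database_sql_us-central1_001",
}
dependencies = cluster_dependencies(edges, cluster_of)
```

Here the Cloud Run cluster ends up depending on the SQL cluster, and the SQL cluster has an empty dependency list.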
146+
147+
### `creation_order`
148+
149+
Build a global ordering of clusters by depth level:
150+
151+
```json
152+
"creation_order": [
153+
{ "depth": 0, "clusters": ["networking_vpc_us-central1_001"] },
154+
{ "depth": 1, "clusters": ["security_iam_us-central1_001"] },
155+
{ "depth": 2, "clusters": ["database_sql_us-central1_001", "storage_gcs_us-central1_001"] },
156+
{ "depth": 3, "clusters": ["compute_cloudrun_us-central1_001"] }
157+
]
158+
```
159+
160+
Cluster depth = minimum depth across all primary resources in the cluster. Clusters at the same depth can be migrated in parallel.
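
Assuming each cluster already carries a `creation_order_depth` value, the global ordering could be assembled by grouping cluster IDs per depth. The `build_creation_order` name is hypothetical, and how depth is computed per resource is left out of this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def build_creation_order(clusters: List[dict]) -> List[Dict[str, object]]:
    """Group cluster IDs by depth, ordered from shallowest to deepest."""
    by_depth = defaultdict(list)
    for cluster in clusters:
        by_depth[cluster["creation_order_depth"]].append(cluster["cluster_id"])
    # Sort depths and IDs so the output is identical on every run.
    return [
        {"depth": depth, "clusters": sorted(ids)}
        for depth, ids in sorted(by_depth.items())
    ]
```

Clusters that land in the same entry share a depth and, per the text above, can be migrated in parallel.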
161+
132162
## Output Cluster Schema
133163

134164
Each cluster includes:
135165

136166
```json
137167
{
138168
"cluster_id": "compute_cloudrun_us-central1_001",
139-
"name": "Cloud Run Application",
140-
"type": "compute",
141-
"description": "Primary: cloud_run_service.app, Secondary: service_account, iam_policy",
142169
"gcp_region": "us-central1",
143170
"primary_resources": ["google_cloud_run_service.app"],
144171
"secondary_resources": ["google_service_account.app_runner"],
145-
"network": "network_vpc_us-central1_001",
172+
"network": "networking_vpc_us-central1_001",
146173
"creation_order_depth": 2,
147174
"must_migrate_together": true,
148-
"dependencies": [],
149-
"edges": [{ "from": "...", "to": "...", "relationship_type": "..." }]
175+
"dependencies": ["database_sql_us-central1_001"],
176+
"edges": [
177+
{
178+
"from": "google_cloud_run_service.app",
179+
"to": "google_sql_database_instance.db",
180+
"relationship_type": "data_dependency",
181+
"evidence": {
182+
"field_path": "template.spec.containers[0].env[].value",
183+
"reference": "DATABASE_URL"
184+
}
185+
}
186+
]
150187
}
151188
```
152189

153190
## Determinism Guarantee
154191

155-
Given same Terraform input, algorithm produces same cluster structure every run:
192+
Given the same classified resource inputs, the clustering algorithm produces the same cluster structure every run:
156193

157194
1. Rules applied in fixed order
158195
2. Sequence counters increment deterministically
159196
3. Naming reflects source state, not random IDs
160-
4. Deterministic for Priority 1 and Priority 2 resources. Priority 3 (LLM inference fallback in classification-rules.md and typed-edges-strategy.md) may produce non-deterministic results for unknown resource types
197+
4. All clustering heuristics are deterministic (no LLM-based decisions within the clustering algorithm itself)
198+
199+
**Note:** Resource classification (see `classification-rules.md`) may use LLM inference as a fallback for resource types not in the hardcoded tables. If LLM-classified resources enter the pipeline, overall reproducibility depends on the LLM producing consistent classifications.
