Closed
Changes from 10 commits
19 commits
e3cf4c6
feat(aws-observability): Add AWS Observability plugin
theagenticguy Feb 26, 2026
5e77452
Merge branch 'main' into aws-observability
krokoko Feb 27, 2026
a4cf8d0
Merge branch 'main' into aws-observability
krokoko Mar 3, 2026
0f37872
fix: address PR review feedback for aws-observability plugin
theagenticguy Mar 5, 2026
cee0518
fix: use awsknowledge HTTP MCP server instead of local docs server
theagenticguy Mar 5, 2026
4fdcef6
Merge branch 'main' into aws-observability
theagenticguy Mar 5, 2026
5d73b18
fix: address round 3 Copilot review feedback
theagenticguy Mar 5, 2026
00016e8
refactor: slim SKILL.md to be agent-focused, not README-like
theagenticguy Mar 5, 2026
4515dd5
Merge branch 'main' into aws-observability
krokoko Mar 6, 2026
32af043
Merge branch 'main' into aws-observability
krokoko Mar 9, 2026
04931e6
Merge branch 'main' into aws-observability
krokoko Mar 10, 2026
3583878
Merge branch 'main' into aws-observability
krokoko Mar 11, 2026
66d4543
Merge branch 'main' into aws-observability
krokoko Mar 18, 2026
9b2d41f
Merge branch 'main' into aws-observability
theagenticguy Mar 19, 2026
23b0778
fix(aws-observability): address PR review - split oversized refs, fix…
theagenticguy Mar 19, 2026
439775e
refactor(aws-observability): rename plugin to observability-on-aws, f…
theagenticguy Mar 19, 2026
0ae372f
fix(observability-on-aws): revert tool name change - cost-explorer is…
theagenticguy Mar 19, 2026
d914cce
Merge branch 'main' into aws-observability
theagenticguy Mar 19, 2026
d456452
chore(observability-on-aws): add billing, cost-management, finops key…
theagenticguy Mar 19, 2026
22 changes: 22 additions & 0 deletions .claude-plugin/marketplace.json
@@ -48,6 +48,28 @@
"tags": ["aws", "location", "maps", "geospatial"],
"version": "1.0.0"
},
{
"category": "observability",
"description": "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis.",
"keywords": [
"aws",
"observability",
"cloudwatch",
"monitoring",
"logs",
"metrics",
"alarms",
"application-signals",
"apm",
"cloudtrail",
"security",
"tracing"
],
Comment on lines +62 to +81
Copilot AI Mar 19, 2026

This marketplace entry description omits Billing/Cost Management, but the plugin ships with the billing-cost-management MCP server and the skill docs describe cost analysis workflows. Updating the marketplace description/keywords would make the listing accurately reflect the plugin's capabilities.

"name": "aws-observability",
"source": "./plugins/aws-observability",
"tags": ["aws", "observability", "monitoring", "cloudwatch"],
"version": "1.0.0"
},
{
"category": "migration",
"description": "This no-cost tool assesses your current cloud provider's usage, geography, and billing data to estimate and compare AWS services and pricing, and recommends migration or continued use of your current provider. AWS pricing is based on current published pricing and may vary over time. The tool may generate a .migration folder containing comparison and migration execution data, which you may delete upon completion or use to migrate to AWS.",
25 changes: 25 additions & 0 deletions plugins/aws-observability/.claude-plugin/plugin.json
@@ -0,0 +1,25 @@
{
"author": {
"name": "Amazon Web Services"
},
"description": "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis for complete monitoring, troubleshooting, and optimization.",
"homepage": "https://github.com/awslabs/agent-plugins",
"keywords": [
"aws",
"observability",
"cloudwatch",
"monitoring",
"logs",
"metrics",
"alarms",
"application-signals",
"apm",
"cloudtrail",
"security",
"tracing"
],
"license": "Apache-2.0",
"name": "aws-observability",
"repository": "https://github.com/awslabs/agent-plugins",
"version": "1.0.0"
}
52 changes: 52 additions & 0 deletions plugins/aws-observability/.mcp.json
@@ -0,0 +1,52 @@
{
"mcpServers": {
"awslabs.cloudwatch-mcp-server": {
"command": "uvx",
"args": [
"awslabs.cloudwatch-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awslabs.cloudwatch-applicationsignals-mcp-server": {
"command": "uvx",
"args": [
"awslabs.cloudwatch-applicationsignals-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awslabs.cloudtrail-mcp-server": {
"command": "uvx",
"args": [
"awslabs.cloudtrail-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awslabs.billing-cost-management-mcp-server": {
"command": "uvx",
"args": [
"awslabs.billing-cost-management-mcp-server@latest"
],
"env": {
"AWS_PROFILE": "default",
"AWS_REGION": "us-east-1",
"FASTMCP_LOG_LEVEL": "ERROR"
}
},
"awsknowledge": {
"type": "http",
"url": "https://knowledge-mcp.global.api.aws"
}
Comment thread (outdated): theagenticguy marked this conversation as resolved.
}
}
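A quick way to sanity-check a config shaped like the one above is to list each server with its transport, AWS profile, and region, and to flag packages that float on `@latest`. The sketch below is only an illustration: it inlines a trimmed copy of the config (two of the five servers), and the `summarize_servers` helper and its output fields are hypothetical names, not part of the plugin.

```python
import json
from typing import Dict, List

# Trimmed copy of the .mcp.json above, inlined so the sketch is self-contained.
MCP_JSON = """
{
  "mcpServers": {
    "awslabs.cloudwatch-mcp-server": {
      "command": "uvx",
      "args": ["awslabs.cloudwatch-mcp-server@latest"],
      "env": {
        "AWS_PROFILE": "default",
        "AWS_REGION": "us-east-1",
        "FASTMCP_LOG_LEVEL": "ERROR"
      }
    },
    "awsknowledge": {
      "type": "http",
      "url": "https://knowledge-mcp.global.api.aws"
    }
  }
}
"""


def summarize_servers(config: Dict) -> List[Dict]:
    """Return one summary row per entry under "mcpServers" (hypothetical helper)."""
    rows = []
    for name, spec in config.get("mcpServers", {}).items():
        env = spec.get("env", {})
        rows.append({
            "name": name,
            # Entries without "type": "http" are treated as stdio servers here.
            "transport": "http" if spec.get("type") == "http" else "stdio",
            "profile": env.get("AWS_PROFILE"),
            "region": env.get("AWS_REGION"),
            # True when any argument floats on the latest published version.
            "unpinned": any(arg.endswith("@latest") for arg in spec.get("args", [])),
        })
    return rows


rows = summarize_servers(json.loads(MCP_JSON))
for row in rows:
    print(row)
```

Running a check like this before shipping makes it easy to spot a server that silently inherits the `default` profile or `us-east-1` when a different account or region was intended.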
88 changes: 88 additions & 0 deletions plugins/aws-observability/skills/aws-observability/SKILL.md
@@ -0,0 +1,88 @@
---
name: aws-observability
description: "Comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, Billing & Cost Management, and automated codebase observability gap analysis. Triggers on phrases like: CloudWatch logs, metrics, alarms, monitoring, observability, application signals, APM, distributed tracing, performance, latency, errors, troubleshooting, root cause analysis, security audit, CloudTrail, log analysis, alerting, SLO, incident response, observability gaps, missing instrumentation, AWS costs, billing, cost anomaly."
---

# AWS Observability

Requires AWS CLI credentials. All stdio MCP servers use `AWS_PROFILE` and `AWS_REGION` from their env config (defaults: `default` profile, `us-east-1`).

## Capabilities

| Capability | MCP Server | Use When |
| --------------------------- | -------------------------------------------------- | -------------------------------------------------------- |
| CloudWatch Logs | `awslabs.cloudwatch-mcp-server` | Log queries, pattern detection, anomaly analysis |
| Metrics & Alarms | `awslabs.cloudwatch-mcp-server` | Metric data, alarm recommendations, trend analysis |
| Application Signals (APM) | `awslabs.cloudwatch-applicationsignals-mcp-server` | Service health, SLOs, distributed tracing, error budgets |
| CloudTrail Security | `awslabs.cloudtrail-mcp-server` | IAM changes, resource deletions, compliance audits |
| Billing & Cost Management | `awslabs.billing-cost-management-mcp-server` | Cost analysis, forecasting, Compute Optimizer, budgets |
| AWS Documentation | `awsknowledge` (HTTP) | Troubleshooting, best practices, API references |
| Codebase Observability Gaps | _(file analysis, no MCP)_ | Identify missing logging, metrics, tracing in code |

## Workflow Decision Tree

**User reports an incident or error?**
-> Load [Incident Response](references/incident-response.md). Start with a wildcard `audit_services` call, then correlate alarms, logs, traces, and CloudTrail changes.

**User asks about logs or wants to query logs?**
-> Load [Log Analysis](references/log-analysis.md). Use `execute_log_insights_query`. Always include `| limit` in queries.

**User wants to set up or tune alarms?**
-> Load [Alerting Setup](references/alerting-setup.md). Use `get_recommended_metric_alarms` for best-practice thresholds.

**User asks about service performance, latency, or SLOs?**
-> Load [Performance Monitoring](references/performance-monitoring.md). Start with `audit_services`, then `search_transaction_spans` for 100% trace visibility.

**User needs security audit or compliance review?**
-> Load [Security Auditing](references/security-auditing.md). Follow data source priority: CloudTrail Lake > CloudWatch Logs > Lookup Events API.

**User wants to assess codebase observability?**
-> Load [Observability Gap Analysis](references/observability-gap-analysis.md). Analyze logging, metrics, tracing, error handling, health checks.

**User is setting up Application Signals for the first time?**
-> Load [Application Signals Setup](references/application-signals-setup.md). Start with `get_enablement_guide`.

**CloudTrail data source priority reference** (loaded by security-auditing.md, not directly):
-> [CloudTrail Data Source Selection](references/cloudtrail-data-source-selection.md)
Comment thread (outdated): MichaelWalker-git marked this conversation as resolved.
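
The decision tree above behaves like a first-match routing table from the user's request to a reference file. The sketch below illustrates that idea under loud assumptions: the reference paths are taken from the tree, but the trigger keywords and the `pick_reference` helper are hypothetical, and a real agent would match on intent rather than on substrings.

```python
from typing import Optional

# Reference paths come from the decision tree; the keywords are illustrative guesses.
ROUTES = [
    (("incident", "outage", "error"), "references/incident-response.md"),
    (("log",), "references/log-analysis.md"),
    (("alarm", "alert"), "references/alerting-setup.md"),
    (("latency", "performance", "slo"), "references/performance-monitoring.md"),
    (("security", "compliance", "audit"), "references/security-auditing.md"),
    (("gap", "instrumentation"), "references/observability-gap-analysis.md"),
    (("application signals",), "references/application-signals-setup.md"),
]


def pick_reference(request: str) -> Optional[str]:
    """Return the first reference whose keywords appear in the request, else None."""
    text = request.lower()
    for keywords, reference in ROUTES:
        if any(keyword in text for keyword in keywords):
            return reference
    return None


print(pick_reference("Our checkout service had an outage"))
print(pick_reference("Tune my CPU alarm thresholds"))
print(pick_reference("What is the weather today"))
```

Order matters in a table like this: because the incident branch is listed first, a request that mentions both an error and logs is routed to incident response, which mirrors the order of the tree.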

## Essential Log Query Patterns

### Error Search

```
fields @timestamp, @message, @logStream, level
| filter level = "ERROR"
| sort @timestamp desc
| limit 100
```

### Performance Analysis

```
stats count() as requestCount,
avg(duration) as avgDuration,
pct(duration, 95) as p95Duration,
pct(duration, 99) as p99Duration
by endpoint
| filter requestCount > 10
| sort p95Duration desc
| limit 100
```

### Error Rate Over Time

```
stats count() as total,
sum(statusCode >= 500) as errors,
(sum(statusCode >= 500) / count()) * 100 as errorRate
by bin(5m) as timeWindow
| sort timeWindow
```
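
To make the arithmetic in the error-rate query concrete, the following sketch reproduces the same calculation in plain Python: requests are grouped into 5-minute bins, and each bin reports its total, its count of 5xx responses, and the percentage. The sample records and the `error_rate_by_bin` helper are made up for illustration. This is not how Logs Insights evaluates the query internally.

```python
from collections import defaultdict

BIN_SECONDS = 300  # 5-minute windows, matching bin(5m) in the query


def error_rate_by_bin(records):
    """Group (epoch_seconds, status_code) pairs into bins.

    Returns {bin_start: (total, errors, error_rate_percent)}, sorted by bin start.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for timestamp, status_code in records:
        bin_start = timestamp - timestamp % BIN_SECONDS
        totals[bin_start] += 1
        if status_code >= 500:
            errors[bin_start] += 1
    return {
        start: (totals[start], errors[start], errors[start] / totals[start] * 100)
        for start in sorted(totals)
    }


# Hypothetical sample data: (seconds since start, HTTP status code).
sample = [(0, 200), (10, 500), (299, 503), (300, 200), (450, 404), (599, 200), (600, 502)]
result = error_rate_by_bin(sample)
for start, (total, errs, rate) in result.items():
    print(f"bin={start:>4}s total={total} errors={errs} errorRate={rate:.1f}%")
```

Note that a 404 counts toward the total but not toward errors, since only status codes of 500 and above are treated as failures.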

## Key Tool Entry Points

- **Application Signals**: Start with `audit_services` using `[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]` for wildcard discovery
- **Logs**: Use `describe_log_groups` to discover groups, then `execute_log_insights_query`
- **Metrics**: Use Sum for count metrics, Average for utilization, percentiles for latency
- **CloudTrail**: Check Lake first (`list_event_data_stores`), fall back to CloudWatch Logs, then `lookup_events`
- **Costs**: Use the `cost-explorer` tool for spend analysis and `compute-optimizer` for right-sizing
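
Two of the entry points above are simple selection rules, sketched below in Python. The CloudTrail function encodes the stated fallback order (Lake, then CloudWatch Logs, then `lookup_events`), and the statistic table encodes the metric guidance. The function names, the return labels, and the choice of `p99` as the latency percentile are assumptions made for illustration. No MCP tool is called here.

```python
def choose_cloudtrail_source(event_data_stores, cloudtrail_log_groups):
    """Pick a CloudTrail data source following the priority in the list above.

    event_data_stores: what a `list_event_data_stores` call returned (may be empty).
    cloudtrail_log_groups: log groups known to receive CloudTrail events (may be empty).
    """
    if event_data_stores:
        return "cloudtrail-lake"
    if cloudtrail_log_groups:
        return "cloudwatch-logs"
    return "lookup-events"  # last resort: the Lookup Events API


# Metric kind -> statistic. "p99" is one illustrative percentile for latency.
STATISTIC_BY_KIND = {
    "count": "Sum",
    "utilization": "Average",
    "latency": "p99",
}


def pick_statistic(metric_kind):
    """Return the suggested statistic for a metric kind, or raise on unknown kinds."""
    try:
        return STATISTIC_BY_KIND[metric_kind]
    except KeyError:
        raise ValueError(f"unknown metric kind: {metric_kind!r}")


print(choose_cloudtrail_source(["eds-example"], []))
print(choose_cloudtrail_source([], ["/aws/cloudtrail/example"]))
print(choose_cloudtrail_source([], []))
print(pick_statistic("latency"))
```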