2 changes: 2 additions & 0 deletions .claude/CLAUDE.md
@@ -125,6 +125,8 @@ Only these extensions are permitted inside skill directories (enforced by `skill
5. Add evaluation tests (`.skilleval.yaml` and `evals/` directory).
6. Test the skill with DevOps Agent before submitting.
7. Update the root `README.md` skills table with the new skill's name, description, agent types, author, and docs link.
8. Update the `llms.txt` file at the repo root — add the new skill to the "Available Skills" section following the existing format: `- [Skill Name](skills/<name>/SKILL.md): One-line description`.
9. If the skill requires IAM permissions beyond the `AIDevOpsAgentAccessPolicy` managed policy, add a new parameter, condition, and inline policy resource to `cloudformation/devops-agent-skill-policies.yaml`, and update the `SkillPolicySummary` output.

## Zipping for Upload

1 change: 1 addition & 0 deletions .kiro/steering/project-conventions.md
@@ -125,6 +125,7 @@ Only these extensions are permitted inside skill directories (enforced by `skill
6. Test the skill with DevOps Agent before submitting.
7. Update the root `README.md` skills table with the new skill's name, description, agent types, author, and docs link.
8. Update the `llms.txt` file at the repo root — add the new skill to the "Available Skills" section following the existing format: `- [Skill Name](skills/<name>/SKILL.md): One-line description`.
9. If the skill requires IAM permissions beyond the `AIDevOpsAgentAccessPolicy` managed policy, add a new parameter, condition, and inline policy resource to `cloudformation/devops-agent-skill-policies.yaml`, and update the `SkillPolicySummary` output.

## Maintaining llms.txt

1 change: 1 addition & 0 deletions README.md
@@ -44,6 +44,7 @@ Skills enable DevOps Agent to:
| [skip-scheduled-maintenance](skills/skip-scheduled-maintenance/) | **Sample skill** demonstrating how to skip low-priority incidents during a scheduled maintenance window for the Incident Triage agent type | Incident Triage | [dgorin6](https://github.com/dgorin6) | [README](skills/skip-scheduled-maintenance/README.md) |
| [enrich-with-aws-security-agent](skills/enrich-with-aws-security-agent/) | Queries AWS Security Agent CloudWatch logs to retrieve code-level security findings (file, line number, vulnerability type) during incident investigations with potential security root causes | Chat tasks, Incident RCA | [yakiratz-aws](https://github.com/yakiratz-aws) | [README](skills/enrich-with-aws-security-agent/README.md) |
| [investigation-cost-guardrail](skills/investigation-cost-guardrail/) | Estimates the AWS API cost of an incident investigation before any query runs, shows a per-step cost plan, and cancels if the estimate exceeds a configurable threshold | Incident RCA | [inesttia](https://github.com/inesttia) | [README](skills/investigation-cost-guardrail/README.md) |
| [service-quota-check](skills/service-quota-check/) | Checks AWS service quota utilization during investigations and before provisioning resources, flags quotas at 85%+ utilization, and requests increases via the Service Quotas API or recommends support cases | Chat tasks, Incident RCA | [yuriypr](https://github.com/yuriypr) | [README](skills/service-quota-check/README.md) |

## Getting Started

40 changes: 40 additions & 0 deletions cloudformation/devops-agent-skill-policies.yaml
@@ -24,6 +24,7 @@ Metadata:
- EnableEnrichWithSecurityAgent
- EnableCrmInvestigationGuidelines
- EnableSkipScheduledMaintenance
- EnableServiceQuotaCheck
- Label:
default: Optional Resource Scoping
Parameters:
@@ -90,12 +91,19 @@ Parameters:
    AllowedValues: ['true', 'false']
    Default: 'true'

  EnableServiceQuotaCheck:
    Type: String
    Description: Service Quota Check skill (Service Quotas + CloudWatch).
    AllowedValues: ['true', 'false']
    Default: 'true'

Conditions:
  CreateNewRole: !Equals [!Ref ExistingRoleName, '']
  SkillAwsHealthEvents: !Equals [!Ref EnableAwsHealthEvents, 'true']
  SkillSupportCases: !Equals [!Ref EnableSupportCases, 'true']
  SkillRdsOperationReview: !Equals [!Ref EnableRdsOperationReview, 'true']
  SkillInvestigationCostGuardrail: !Equals [!Ref EnableInvestigationCostGuardrail, 'true']
  SkillServiceQuotaCheck: !Equals [!Ref EnableServiceQuotaCheck, 'true']
  HasRegionRestriction: !Not [!Equals [!Join ['', !Ref AllowedRegions], '']]

Resources:
@@ -213,6 +221,37 @@ Resources:
              - pricing:GetProducts
            Resource: '*'

  # service-quota-check: adds scoped servicequotas read/request actions and cloudwatch:GetMetricData/GetMetricStatistics
  PolicyServiceQuotaCheck:
    Type: AWS::IAM::Policy
    Condition: SkillServiceQuotaCheck
    Properties:
      PolicyName: DevOpsAgentSkill-ServiceQuotaCheck
      Roles:
        - !If [CreateNewRole, !Ref DevOpsAgentRole, !Ref ExistingRoleName]
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: ServiceQuotasReadAndRequest
            Effect: Allow
            Action:
              - servicequotas:ListServices
              - servicequotas:ListServiceQuotas
              - servicequotas:GetServiceQuota
              - servicequotas:GetAWSDefaultServiceQuota
              - servicequotas:ListRequestedServiceQuotaChangeHistory
              - servicequotas:ListRequestedServiceQuotaChangeHistoryByQuota
              - servicequotas:GetRequestedServiceQuotaChange
              - servicequotas:RequestServiceQuotaIncrease
              - servicequotas:CreateSupportCase
            Resource: '*'
          - Sid: CloudWatchUsageMetrics
            Effect: Allow
            Action:
              - cloudwatch:GetMetricData
              - cloudwatch:GetMetricStatistics
            Resource: '*'

  # Optional: restrict agent to specific regions
  PolicyRegionalRestriction:
    Type: AWS::IAM::Policy
@@ -257,6 +296,7 @@ Outputs:
- support-cases: ${EnableSupportCases} (support:DescribeCommunications)
- rds-operation-review: ${EnableRdsOperationReview} (rds:DownloadDBLogFilePortion, logs:GetLogEvents)
- investigation-cost-guardrail: ${EnableInvestigationCostGuardrail} (pricing:GetProducts)
- service-quota-check: ${EnableServiceQuotaCheck} (servicequotas read/list actions plus RequestServiceQuotaIncrease and CreateSupportCase, cloudwatch:GetMetricData/GetMetricStatistics)
Skills covered by AIDevOpsAgentAccessPolicy (no extra policy needed):
- eks-operation-review, enrich-with-aws-security-agent, crm-production-investigation-guidelines
No IAM required:
12 changes: 12 additions & 0 deletions custom-agents/service-quotas-monitor/CHANGELOG.md
@@ -0,0 +1,12 @@
# Changelog

## 1.0.0

- Initial version
- System prompt with Goal/Approach/Constraints/Output structure
- Multi-region quota discovery and utilization assessment
- Automatic quota increase requests for adjustable quotas at 85%+ utilization
- Support case creation fallback for non-adjustable quotas
- Recommendation creation for items requiring manual user follow-up
- Notification integration for flagged quotas
- Deduplication of recommendations for the same quota/region
66 changes: 66 additions & 0 deletions custom-agents/service-quotas-monitor/README.md
@@ -0,0 +1,66 @@
# Service Quotas Monitor — Custom Agent

## Purpose

This custom agent proactively monitors AWS service quotas across all active regions, identifies quotas approaching their limits (85%+ utilization), and takes automated action — requesting quota increases via the Service Quotas API or escalating through support cases when programmatic increases are not possible.

## Key Capabilities

- Discovers all enabled regions and checks quotas across the entire account footprint
- Evaluates utilization for every service quota with available usage data
- Automatically requests quota increases for adjustable quotas at 85%+ utilization
- Creates support cases or recommendations when automatic increases are not possible
- Sends notifications via integrated communication tools when quotas are flagged
- Deduplicates recommendations to avoid alert fatigue

## Prerequisites

- An AWS DevOps Agent space
- IAM permissions for Service Quotas:
  - `servicequotas:ListServices`
  - `servicequotas:ListServiceQuotas`
  - `servicequotas:GetServiceQuota`
  - `servicequotas:RequestServiceQuotaIncrease`
  - `servicequotas:CreateSupportCase`
- IAM permissions for EC2 region discovery: `ec2:DescribeRegions`
- (Optional) AWS Support API access for creating support cases: `support:CreateCase`
- (Optional) The [service-quota-check skill](../../skills/service-quota-check/) uploaded to your Agent Space for enhanced domain knowledge

## Creating the Agent

1. In the DevOps Agent web app, go to the "Agents" menu (on the bottom left pane)
2. Click "Create agent" (on the right side), then, in the menu that appears, click "Form" (the left-most option)
3. In the "Name" field, use "service-quotas-monitor"
4. Copy the content of the "SYSTEM_PROMPT.md" file from this directory, and paste it into the "System prompt" field in the custom agent creation form
5. (Optional) In the "Skills" drop-down list, select the "service-quota-check" skill if available, and click "Create agent"
6. Add the `use_aws` tool: in the new custom agent's window, click "Edit"
7. In the window that appears, select "Chat". A new chat starts on the left side. Wait for DevOps Agent to finish thinking; it will then ask what you would like to change
8. Type "Add the use_aws tool to this custom agent". Once the chat is finished, verify on the custom agent's page that `use_aws` is shown under "Tools" for this custom agent

## Executing the Agent

This agent is designed to run on a recurring schedule (e.g., daily or weekly) to catch quotas approaching their limits before they cause disruptions. You can also run it on-demand.

### Scheduled Execution (Recommended)

Follow the [Executing custom agents guide](https://docs.aws.amazon.com/devopsagent/latest/userguide/custom-agents-executing-custom-agents.html) to set up a recurring schedule. A daily run is recommended for production accounts with active scaling.

### On-Demand Execution

Run from the custom agent page or via chat. You can provide custom prompts:

- "Check quotas only in us-east-1 and eu-west-1"
- "Check only EC2 and VPC quotas"
- "Report quotas above 70% utilization instead of 85%"

## Output

The agent produces:
- **Task journal entry** — a text summary of all findings and actions taken
- **Recommendations** — for any quotas requiring manual user intervention
- **Notifications** — sent via integrated communication tools (e.g., Slack) if quotas are flagged

## Related

- [service-quota-check skill](../../skills/service-quota-check/) — the domain knowledge skill for quota checking methodology
- [AWS DevOps Agent custom agents documentation](https://docs.aws.amazon.com/devopsagent/latest/userguide/working-with-devops-agent-custom-agents-index.html)
66 changes: 66 additions & 0 deletions custom-agents/service-quotas-monitor/SYSTEM_PROMPT.md
@@ -0,0 +1,66 @@
You are a Service Quotas monitoring agent that proactively identifies AWS service quotas approaching their limits and takes action to prevent service disruptions.

## Goal

Check all AWS service quotas across all active regions, identify any with utilization at 85% or above, and take appropriate action: request quota increases automatically when possible, or escalate when manual intervention is required.

## Approach

1. **Discover active regions** — Call `use_aws` with EC2 `describe_regions` to get all enabled regions for the account.

2. **List all services with quotas** — For each region, call Service Quotas `list_services` to get all services that have quotas.

3. **Check quota utilization** — For each service in each region:
   - Call `list_service_quotas` to get all quotas for the service
   - For each quota, determine current usage (for example, from the CloudWatch metric named in the quota's `UsageMetric` field, when one is present) and compare it against the quota value
   - Flag any quota where utilization is **85% or higher**

4. **Take action on flagged quotas** — For each quota at or above 85% utilization:

   a. **If the quota is adjustable** (`Adjustable: true`):
      - Calculate the new requested value: current quota value × 1.5 (50% increase)
      - Call `request_service_quota_increase` with the new value
      - Record the outcome (success or failure)

   b. **If the quota is not adjustable OR the increase request fails**:
      - Attempt to create an AWS Support case using `create_case` with:
        - Service code: `service-quotas`
        - Category: `general-guidance`
        - Severity: `normal`
        - Subject: "Service Quota Increase Request: [service] - [quota name] in [region]"
        - Body: Include current quota value, current utilization, and requested increase
      - If support case creation fails (insufficient permissions), create a **Recommendation** for the user to manually open a support case, including all relevant details

5. **Send notification** — If any quotas were flagged (regardless of action taken):
   - Check if a communication tool integration exists (Slack or similar)
   - If available, send a summary notification including:
     - Total quotas checked
     - Number of quotas at/above 85% utilization
     - For each flagged quota: service, quota name, region, utilization %, action taken, and outcome
     - Any items requiring user attention (failed increases, manual support cases needed)
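
The decision logic in steps 3 and 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: the `Quota` class, `utilization_percent`, and `plan_action` are hypothetical names, and no AWS calls are made.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD_PERCENT = 85.0  # flagging threshold stated in the prompt
INCREASE_FACTOR = 1.5     # "current quota value x 1.5"


@dataclass
class Quota:
    """Hypothetical, simplified view of one quota in one region."""
    service_code: str
    quota_name: str
    region: str
    value: float            # current quota value
    usage: Optional[float]  # current usage, or None when no usage data exists
    adjustable: bool


def utilization_percent(quota: Quota) -> Optional[float]:
    """Return usage as a percentage of the quota value, or None if unknown."""
    if quota.usage is None or quota.value <= 0:
        return None
    return 100.0 * quota.usage / quota.value


def plan_action(quota: Quota) -> dict:
    """Decide what step 4 would do for one quota, without calling AWS."""
    pct = utilization_percent(quota)
    if pct is None or pct < THRESHOLD_PERCENT:
        return {"action": "none", "utilization": pct}
    requested = quota.value * INCREASE_FACTOR
    if quota.adjustable:
        return {"action": "request_increase", "utilization": pct,
                "requested_value": requested}
    # Non-adjustable quotas go to the support-case path (step 4b).
    return {"action": "support_case", "utilization": pct,
            "requested_value": requested}
```

Keeping the decision separate from the API calls makes the threshold and the 1.5 multiplier easy to check in isolation; a failed increase request would still need to fall through to the support-case path at call time.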

## Constraints

- Keep discovery read-only; perform writes only for quota increase requests and support cases
- Do not request increases for quotas below 85% utilization
- Do not retry failed API calls more than once
- If a region is inaccessible, log the error and continue with other regions
- Respect API rate limits — add brief pauses between high-volume API calls if needed
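
The retry and region-failure constraints above could look roughly like this in Python. It is a sketch only: `call_with_one_retry`, `check_regions`, and the `check_region` callback are hypothetical helpers, and the one-second pause is an assumed value rather than something the prompt specifies.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def call_with_one_retry(call: Callable[[], T], pause_seconds: float = 1.0) -> T:
    """Run `call`; on failure, pause briefly and retry exactly once."""
    try:
        return call()
    except Exception:
        time.sleep(pause_seconds)
        return call()  # a second failure propagates to the caller


def check_regions(
    regions: Iterable[str],
    check_region: Callable[[str], T],
    pause_seconds: float = 1.0,
) -> Tuple[Dict[str, T], List[str]]:
    """Check every region, recording errors instead of stopping the run."""
    results: Dict[str, T] = {}
    errors: List[str] = []
    for region in regions:
        try:
            results[region] = call_with_one_retry(
                lambda: check_region(region), pause_seconds
            )
        except Exception as exc:
            # Log the inaccessible region and continue with the others.
            errors.append(f"{region}: {exc}")
    return results, errors
```

The returned `errors` list maps onto the "Any errors encountered" item of the run summary.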

## Output

Produce a text summary in the task journal containing:
- Timestamp and account ID
- Regions checked
- Total quotas evaluated
- List of quotas at/above 85% with utilization details and actions taken
- Any errors encountered
- Clear indication of items requiring user follow-up

If any quota required action but could not be resolved automatically (non-adjustable quota, failed API call, insufficient permissions for support case), create a **Recommendation** with:
- Title: "Manual quota increase needed: [service] - [quota name]"
- Details: region, current value, current utilization, suggested new value, and reason automatic action failed

Before creating a new Recommendation, check if one already exists for the same quota in the same region — update it instead of creating a duplicate.

If a communication integration exists and any quotas were flagged, send a notification summarizing the run. Do not send a notification if all quotas are healthy.
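
As a rough illustration of the notification rule, the sketch below builds the summary text and returns nothing when every quota is healthy. The function name `build_notification` and the dictionary keys are assumptions made for this example; the actual delivery channel (Slack or similar) is out of scope.

```python
from typing import List, Optional


def build_notification(total_checked: int, flagged: List[dict]) -> Optional[str]:
    """Return summary text for flagged quotas, or None when all are healthy."""
    if not flagged:
        return None  # healthy run: no notification is sent
    lines = [
        f"Quotas checked: {total_checked}",
        f"Quotas at/above 85% utilization: {len(flagged)}",
    ]
    for item in flagged:
        lines.append(
            f"- {item['service']} / {item['quota_name']} ({item['region']}): "
            f"{item['utilization']:.0f}% utilized, action={item['action']}, "
            f"outcome={item['outcome']}"
        )
    return "\n".join(lines)
```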
1 change: 1 addition & 0 deletions llms.txt
@@ -20,6 +20,7 @@ Skills can be used with these AWS DevOps Agent types:
- [CRM Production Investigation Guidelines Skill](skills/crm-production-investigation-guidelines/SKILL.md): Sample skill demonstrating how to write production investigation guidelines for the Incident Triage agent type, showing application-specific architecture, incident isolation rules, and structured investigation procedures
- [Skip Scheduled Maintenance Skill](skills/skip-scheduled-maintenance/SKILL.md): Sample skill demonstrating how to skip low-priority incidents during a scheduled maintenance window, filtering MEDIUM and LOW severity alarms while preserving escalation for HIGH and CRITICAL incidents
- [Enrich with AWS Security Agent Skill](skills/enrich-with-aws-security-agent/SKILL.md): Queries AWS Security Agent CloudWatch logs to retrieve code-level security findings (file, line number, vulnerability type) during incident investigations with potential security root causes
- [Service Quota Check Skill](skills/service-quota-check/SKILL.md): Checks AWS service quota utilization during investigations and before provisioning resources, flags quotas at 85%+ utilization, and requests increases via the Service Quotas API or recommends support cases

## Key Concepts

3 changes: 3 additions & 0 deletions skills/service-quota-check/.skilleval.yaml
@@ -0,0 +1,3 @@
audit:
  ignore:
    - STR-016  # README alongside SKILL.md is intentional
12 changes: 12 additions & 0 deletions skills/service-quota-check/CHANGELOG.md
@@ -0,0 +1,12 @@
# Changelog

## 1.0.0

- Initial version
- Quota value retrieval via get-service-quota and list-service-quotas
- Utilization calculation using CloudWatch UsageMetric or resource counting
- Risk assessment with 85% threshold for triggering increase recommendations
- Automated quota increase request via request-service-quota-increase API
- Support case recommendation for non-adjustable quotas
- Duplicate request detection via pending request check
- Common quota codes reference for frequently checked services
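
The duplicate request detection listed above might be sketched like this. It assumes the change history has already been fetched as a list of dictionaries carrying `ServiceCode`, `QuotaCode`, and `Status` fields; the helper name `has_open_request` and the choice of which statuses count as still open are assumptions for illustration.

```python
from typing import Iterable

# Statuses treated as "still open" in this sketch (an assumption).
OPEN_STATUSES = {"PENDING", "CASE_OPENED"}


def has_open_request(
    history: Iterable[dict], service_code: str, quota_code: str
) -> bool:
    """Return True if an increase request for this quota is still open."""
    return any(
        entry.get("ServiceCode") == service_code
        and entry.get("QuotaCode") == quota_code
        and entry.get("Status") in OPEN_STATUSES
        for entry in history
    )
```

A caller would skip a new increase request whenever this returns `True`, which avoids stacking duplicate requests for the same quota.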