gemini-cli-extensions/sre

Note: Given the recent deprecation of Gemini CLI, this Extension is also fully functional as a Plugin for agy CLI, Claude Code, and Codex.

About

The SRE Gemini CLI Extension is a dedicated toolkit comprising specialized Skills designed to augment Site Reliability Engineers (SREs). By integrating deeply with the Gemini CLI, this extension empowers SREs to investigate outages, configure MCP servers, formulate mitigations, and detect anomalies more rapidly.

Installation

For detailed installation and configuration instructions across all CLI environments, please refer to the Installation Guide (INSTALL.md).

If you have just (a modern, make-style command runner) installed, you can set up the extension quickly. If you don't have it yet, install it with brew install just or sudo apt-get install just (see casey/just for more options).

Once installed:

# Google Antigravity CLI (agy)
just install-agy
# Google Gemini CLI (deprecated)
just install-gemini
# Claude Code
just install-claude

Available Skills

🛠️ Core SRE Skills

  • investigation-entrypoint: Primary entrypoint for investigating production outages, orchestrating SRE response, and mitigating incidents. Start here when an incident occurs!
  • gcp-playbooks: Follows established SRE playbooks for GCP/GKE investigations, including infrastructure discovery and common mitigation steps.
  • gcp-mcp-setup: Automates enabling services and Google Managed MCP (OneMCP) servers, generating API keys, and configuring ~/.gemini/settings.json.
  • gcp-slo-management: Discovers Monitoring Services, lists existing SLOs, and creates new SLOs (Availability/Latency) via the REST API.
  • postmortem-generator: Generates a postmortem, given enough context about a resolved incident or outage.

☁️ Cloud Capabilities

  • cloud-build-investigation: Expert-level SRE skill for Google Cloud Build (GCB) and Cloud Run investigations. Correlates git commits with build failures and analyzes logs.
  • cloud-logging: Skill for interacting with and analyzing Google Cloud Logging and Error Reporting. Processes large JSON logs or converts them to Apache format.
  • cloud-monitoring: Interacts with Google Cloud Monitoring via APIs to avoid context bloat. Exports time-series data and helps set up SLOs.
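
To illustrate the JSON-to-Apache conversion mentioned for cloud-logging, here is a minimal Python sketch that renders one log entry as an Apache combined-format line. It is not the skill's actual code. It assumes the entry carries a timestamp and an httpRequest object with the usual Cloud Logging field names (requestMethod, requestUrl, status, responseSize, remoteIp, userAgent, referer, protocol); the function name to_apache_combined is hypothetical.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlsplit


def to_apache_combined(entry: dict) -> str:
    """Render one log entry (hypothetical helper) as an Apache combined log line."""
    req = entry.get("httpRequest", {})

    # Timestamps are RFC 3339 and may carry nanoseconds; drop the fractional
    # part and normalize a trailing "Z" so datetime.fromisoformat accepts it.
    raw_ts = re.sub(r"\.\d+", "", entry["timestamp"]).replace("Z", "+00:00")
    when = datetime.fromisoformat(raw_ts).astimezone(timezone.utc)
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")

    # Apache logs the path and query string, not the full URL.
    url = urlsplit(req.get("requestUrl", "-"))
    path = url.path or "-"
    if url.query:
        path += "?" + url.query

    return '{ip} - - [{stamp}] "{method} {path} {proto}" {status} {size} "{ref}" "{ua}"'.format(
        ip=req.get("remoteIp", "-"),
        stamp=stamp,
        method=req.get("requestMethod", "-"),
        path=path,
        proto=req.get("protocol", "-"),
        status=req.get("status", "-"),
        size=req.get("responseSize", "-"),
        ref=req.get("referer", "-"),
        ua=req.get("userAgent", "-"),
    )
```

Missing fields fall back to "-", which is the Apache convention for absent values, so entries without a referer or user agent still yield a parseable line.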

📊 Detection, Graphs & Mitigations

  • generic-mitigations: High-level classification logic and actuation plan for generic mitigations.
  • monitoring-graphs: Generates high-quality, annotated incident graphs for post-mortems using Python to visualize outages and error rates (nice graphs visible here).
  • anomaly-detection: Detects anomalies in time-series data from various sources, using Isolation Forest, KNN, and Z-score methods.
  • data-ingestion: Fetches and parses time-series data from various sources for downstream analysis.
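
Of the methods listed for anomaly-detection, Z-score is the simplest to show. The sketch below is a minimal, standard-library illustration of the idea, not the skill's implementation. The function name zscore_anomalies and the default threshold of 3.0 are assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of points whose absolute Z-score exceeds the threshold.

    Hypothetical helper: the threshold of 3.0 is a common rule of thumb,
    not a value taken from the extension.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant series has no spread, so nothing can be flagged.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Twenty steady latency samples followed by one spike.
    latencies_ms = [10] * 20 + [100]
    print(zscore_anomalies(latencies_ms))
```

Z-score assumes roughly stable, unimodal data and is itself skewed by large outliers, which is one reason a toolkit might also offer Isolation Forest or KNN for messier series.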

Compatibility & Harness Support

Capability Gemini CLI Antigravity (agy) Claude Code Codex
Type Extension Plugin Plugin Plugin
Install 🟢
MCP Setup 🟢 🟢
SRE Skills 🟢 🟢
GKE Investigation 🟢 🟢

Legend: ✅ Works (Tested) | 🟢 Works (Untested) | 🔴 Doesn't Work (Red Flag)

Quickstart

  1. Install this extension by following the instructions in INSTALL.md.
  2. On first use only, run the gcp-setup and gcp-mcp-setup skills to ensure your GCP project and MCP servers are set up correctly:
    $ agy
    /gcp-setup Setup my GCP project "foo-bar-123"
    with email `jane-doe-sre@credible-company.com`.
    [..]
    /gcp-mcp-setup Also set up MCP access to Cloud Logging,
    Cloud Monitoring, GKE and Documentation (Developer Knowledge). Skip
    BQ and Cloud Run for this time.
  3. Invoke the entrypoint skill with your incident request. For example:
    $ agy
    /investigation-entrypoint Use investigation entrypoint skill
    with this new incident: GKE cluster with frontend 1.2.3.4 is reported 
    down by numerous customers, please investigate.
  4. The agent takes it from there: fetching context, querying metrics, and formulating mitigations.

For detailed instructions on setup and usage, please refer to the User Manual.

Contributing

Check CONTRIBUTING.md.

Feedback

Please report bugs and feature requests in the issue tracker. Any other feedback is welcome via this form: SRE Extension Survey

Thanks

Program Lead: Riccardo
