[INS-246] Add Google Gemini API key detector #4649
| PhraseAccessToken = 1037; | ||
| Photoroom = 1038; | ||
| JWT = 1039; | ||
| GoogleGemini = 1040; |
GoogleGemini is already quite specific, but are there different types of credentials available for Gemini (for example, API keys vs tokens)? If so, we might consider using a more explicit detector type like GoogleGeminiApiKey or GoogleGeminiToken instead of the generic GoogleGemini. Just a suggestion, not a blocker at all.
Thanks for this. Yes, Google Gemini does have other ways of authenticating. I'll make the change.
| _, _ = io.Copy(io.Discard, res.Body) | ||
| }() | ||
|
|
||
| switch res.StatusCode { |
It's common to receive a 403 response in a few situations:
- the key is not scoped to Gemini, but is still valid for other Google services
- the key is "restricted", e.g. by IP address or referer
It might make sense to add a case for 403s so that it doesn't throw an error when those cases are normal.
Thanks for this!
You are very much right. I just confirmed this by generating a Google Cloud API Key. I also realized it's not just about adding this case. Getting a 403 means that the key is live, it just does not have the Generative Language API scope enabled.
Now I'm wondering if it makes sense to create a GoogleGeminiAPIKey detector, or simply a GoogleAPIKey detector. What do you think?
I recommend keeping the original intent here and authoring a detector for only Gemini. If any other Google API services surface that are similarly risky, we can adapt then.
I agree with the approach of authoring a detector only for Gemini. My only concern here is that for 403 we'll mark the credential as inactive/rotated, but that's misleading, because the credential will be live, just not scoped to Gemini.
I would endorse that we mark an API key "LIVE" if we're certain that a 403 Forbidden response implies the key is valid and capable of accessing Google services beyond just Gemini.
Agreed.
I've gone ahead and made the changes to make this a GoogleCloudAPIKey detector. @joeleonjr let me know if you have concerns and we can discuss.
After internal discussion, it has been decided to make this a GoogleGeminiAPIKey detector as originally intended. For other Google Cloud API keys that have Gemini disabled, we will mark them as disabled but set "active_google_key": "true" in the ExtraData field.
Is it possible to merge this in the next week?
As a safety precaution for EE, we're only merging one detector per week. I'll ask the team if we can merge this one next.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
| return false, nil, nil | ||
| default: | ||
| return false, nil, fmt.Errorf("unexpected status code: %d", res.StatusCode) | ||
| } |
Missing HTTP 401 handler for invalid API keys
Medium Severity
The verify function's status code switch handles 200, 403, and 400 but not 401 (Unauthorized). Google's Generative Language API returns 401 for invalid, missing, or automatically blocked API keys (e.g., keys leaked in public repos). These cases fall through to the default branch, producing a verification error instead of correctly returning false, nil, nil. Many other detectors in the codebase (e.g., browshot, dropbox, elevenlabs) handle both StatusBadRequest and StatusUnauthorized together for invalid keys.
I have verified with multiple tests that the API never returns 401. It is either 400 for expired, invalid and revoked keys, or 403 for keys without the generative AI permission.
* add google gemini api key detector
* change detector name to google cloud api key, mark as verified if 403 is returned
* add build tags for integration test
* changes in defaults.go
* Revert "changes in defaults.go" This reverts commit 12e7b6f.
* Revert "change detector name to google cloud api key, mark as verified if 403 is returned" This reverts commit e46bb29.
* revert google cloud api changes, change keyword to gemini, add extra field active_google_key
* use aizasy as keyword instead of gemini
* close response body after draining
* remove \b from regex to support keys that end with -
* add \b to the beginning


Description:
This PR closes #4623 by introducing the Google Gemini API Key detector.
All Google Gemini API Keys follow a strict pattern: a prefix `AIzaSy` followed by `33` characters.
Regex for the detector: `\b(AIzaSy[A-Za-z0-9_-]{33})`
Note: A trailing word boundary is not added because the key can end with a hyphen, and a trailing `\b` would not match such a key.
I have verified the regex by generating 5-10 keys.
For verification, we're using Gemini's `GET /v1/models` endpoint. This is a safe endpoint that only lists the available models, which means no costs will be incurred. For non-Gemini keys, this endpoint will return `403`, which indicates that the key is active, just not scoped to Gemini. In this case, we will mark the key as unverified, but set `"active_google_key": "true"` in the `ExtraData`, so that the user can distinguish.
Added unit and integration tests as well.
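The status handling described above can be sketched as a small pure function. This is an illustration under assumptions, not the PR's actual code: `classifyStatus` is a hypothetical name, and the three return values simply mirror the `return false, nil, nil` shape visible in the reviewed diff. The HTTP call itself is left out so the sketch stays free of network access.

```go
package main

import (
	"fmt"
	"net/http"
)

// classifyStatus (hypothetical helper) maps the response code of the
// models-listing request to a verification result:
//   - 200: the key works for Gemini, so it is verified
//   - 403: the key is live but not scoped to Gemini; unverified, flagged in extra data
//   - 400: the key is invalid, expired, or revoked; unverified
//   - anything else: verification could not be determined, so an error is returned
func classifyStatus(code int) (verified bool, extraData map[string]string, err error) {
	switch code {
	case http.StatusOK:
		return true, nil, nil
	case http.StatusForbidden:
		return false, map[string]string{"active_google_key": "true"}, nil
	case http.StatusBadRequest:
		return false, nil, nil
	default:
		return false, nil, fmt.Errorf("unexpected status code: %d", code)
	}
}

func main() {
	for _, code := range []int{http.StatusOK, http.StatusForbidden, http.StatusBadRequest, http.StatusInternalServerError} {
		verified, extra, err := classifyStatus(code)
		fmt.Printf("status=%d verified=%t extra=%v err=%v\n", code, verified, extra, err)
	}
}
```

Keeping the 403 case separate from the default branch means a restricted or differently scoped key is reported without a verification error, while the extra field still lets a reader tell it apart from a dead key.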
Update: I've tested this with our new method of detector testing. I ran it against both files (the 30 GB and the 3 GB one) and this detector showed up in both of them, which is expected because the Google Cloud API key is a common one. However, it was not in the top 5, so we should be good. See screenshots:


Checklist:
make test-community)?make lintthis requires golangci-lint)?Note
Medium Risk
Introduces a new network-verifying detector and a new protobuf enum value; risk is mainly around false positives/verification behavior and downstream compatibility with the updated detector type list.
Overview
Adds a new `googlegemini` detector that finds API keys matching `AIzaSy…` via regex prefiltered by the `aizasy` keyword, and optionally verifies keys by calling Gemini’s `GET /v1/models` endpoint.
Verification distinguishes Gemini-enabled keys (`200`) from active but not Gemini-scoped Google API keys (`403`, reported with `ExtraData["active_google_key"]=true`), while invalid keys (`400`) are treated as unverified. The detector is wired into default scanning (`defaults.go`) and introduces a new `DetectorType_GoogleGeminiAPIKey` enum value in `proto/detectors.proto` (and regenerated `detectors.pb.go`), with accompanying unit + integration tests and a benchmark.
Written by Cursor Bugbot for commit a0989d8.