Add AI-powered duplicate issue detection system #1613
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9862d314b7
Add a three-part system for detecting and managing duplicate GitHub issues:
1. Detection workflow (dedupe-issues.yml): Triggers on new issues or manual dispatch. Uses actions/ai-inference with GitHub MCP to find up to 3 duplicates via structured JSON Schema output, then posts a comment via comment-on-duplicates.sh with a 3-day grace period.
2. Auto-close workflow (auto-close-duplicates.yml): Runs daily to close issues that were flagged as duplicates over 3 days ago with no human activity or author opt-out (👎 reaction).
3. Backfill workflow (backfill-duplicate-detection.yml): Manual dispatch to trigger duplicate detection on existing open issues that haven't been checked yet.
Shared GitHub API helpers (pagination, typed interfaces, authentication) live in scripts/lib/github.ts. Bot detection uses an HTML comment sentinel (<!-- ai-duplicate-check -->) instead of fragile login heuristics. All actions are pinned to SHA hashes.
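The sentinel-based detection mentioned in the commit message could look roughly like the sketch below. The `CommentLike` shape and the `findDuplicateNotice` name are hypothetical and chosen for illustration; the PR's actual interfaces live in scripts/lib/github.ts and may differ.

```typescript
// Hypothetical comment shape (an assumption, not the PR's real interface).
interface CommentLike {
  body: string | null;
  created_at: string; // ISO 8601 timestamp
}

// Sentinel string quoted in the commit message.
const SENTINEL = "<!-- ai-duplicate-check -->";

// Returns the earliest comment that carries the sentinel, or undefined if none does.
function findDuplicateNotice<T extends CommentLike>(comments: T[]): T | undefined {
  return comments
    .filter((c) => (c.body ?? "").includes(SENTINEL)) // filter() copies, so the input is not mutated
    .sort((a, b) => Date.parse(a.created_at) - Date.parse(b.created_at))[0];
}
```

Matching on the comment body rather than the author login means the check keeps working if the posting account changes.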
Cursor Bugbot has reviewed your changes and found 4 potential issues.
    echo "Error: duplicate issue must be a number, got: $dup" >&2
    exit 1
  fi
done
Missing self-reference validation allows confusing comments
Medium Severity
The script validates that duplicate issue numbers are numeric and exist, but doesn't check whether any duplicate is the same as the base issue. If the AI model returns the queried issue number as a duplicate of itself (e.g., [123] for issue #123), the script would post a confusing comment saying "Found 1 possible duplicate issue: #123" pointing to itself. The prompt doesn't explicitly exclude self-references, making this scenario possible.
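One way to guard against the self-reference case is to drop the base issue from the candidate list before any comment is built. The check in the PR would belong in comment-on-duplicates.sh, which is a shell script; the sketch below only illustrates the logic in TypeScript, and `excludeSelfReference` is a hypothetical name.

```typescript
// Removes the base issue number from the model's candidate duplicates.
// Hypothetical helper; it illustrates the missing validation, not the PR's code.
function excludeSelfReference(baseIssue: number, candidates: number[]): number[] {
  return candidates.filter((n) => n !== baseIssue);
}
```

If the filtered list is empty, the caller can skip posting a comment altogether instead of reporting an issue as a duplicate of itself.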
  });

  return laterComments.length > 0;
}
Redundant API calls fetch same comments twice
Low Severity
The hasActivityAfterComment function fetches issue comments via getIssueComments(issue.number), but main() already fetches the same comments at line 108 to find the bot comment. This results in duplicate API calls for each processed issue. The already-fetched comments array could be passed as a parameter to hasActivityAfterComment instead of re-fetching.
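The suggested refactor might be sketched as follows, with the already-fetched comments passed in as an argument. The `IssueComment` shape is an assumption. Excluding bots via `user.type === "Bot"` and skipping comments from deleted accounts (`user === null`) are also assumptions, since the original filter body is not shown in the diff excerpt.

```typescript
// Hypothetical comment shape (assumption); the PR's GitHubComment type may differ.
interface IssueComment {
  user: { login: string; type: string } | null; // null for deleted accounts
  created_at: string; // ISO 8601 timestamp
}

// Uses comments the caller has already fetched, so no second API request is made.
function hasActivityAfterComment(comments: IssueComment[], botComment: IssueComment): boolean {
  const cutoff = Date.parse(botComment.created_at);
  const laterComments = comments.filter(
    (c) =>
      Date.parse(c.created_at) > cutoff &&
      c.user !== null && // assumption: deleted accounts do not count as activity
      c.user.type !== "Bot", // assumption: bot accounts are identified by user.type
  );
  return laterComments.length > 0;
}
```

With this signature, main() would fetch the comments once, locate the bot comment in that array, and hand the same array to the activity check.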
    content: You are a duplicate issue detector. You have access to GitHub MCP tools to read and search issues.
  - role: user
    content: |
      Find up to 3 likely duplicate issues for issue #{{issue_number}} in the acacode/swagger-typescript-api repository.
Hardcoded repository breaks portability to forks
High Severity
The AI prompt hardcodes the repository name as acacode/swagger-typescript-api, while the workflows and scripts use dynamic repository references via github.repository_owner and github.event.repository.name. If this workflow runs in a fork or different repository, the AI searches for duplicates in the original repo but comments are posted to the current repo with links constructed from $GITHUB_REPOSITORY. This results in issue numbers from one repo being presented as duplicates in another, with broken or misleading links.
  await fetchGitHub(url, {
    method: "POST",
    body: JSON.stringify({
      ref: "main",
Hardcoded branch ref breaks non-main default branches
Medium Severity
The triggerWorkflow function hardcodes ref: "main" when dispatching the dedupe workflow. If the repository uses a different default branch (such as master), the workflow dispatch API call will fail with a 422 error because the workflow file won't exist on a branch named main. The ref parameter determines which branch the workflow file is read from, so this prevents the backfill script from working in repositories that don't use main as their default branch.
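A possible fix is to look up the repository's default branch and pass it as the dispatch ref. The sketch below assumes an injectable `JsonFetcher` (a hypothetical stand-in for the PR's fetchGitHub helper, whose real signature is not shown) and a workflow input named `issue_number`, which is also an assumption. The `default_branch` field comes from GitHub's "get a repository" REST endpoint.

```typescript
// Hypothetical fetch abstraction (assumption); stands in for the PR's fetchGitHub helper.
type JsonFetcher = (url: string, init?: { method: string; body: string }) => Promise<unknown>;

// Reads default_branch from GET /repos/{owner}/{repo}.
async function getDefaultBranch(fetchJson: JsonFetcher, owner: string, repo: string): Promise<string> {
  const data = (await fetchJson(`https://api.github.com/repos/${owner}/${repo}`)) as {
    default_branch?: string;
  };
  if (!data.default_branch) {
    throw new Error(`Could not determine default branch for ${owner}/${repo}`);
  }
  return data.default_branch;
}

// Builds the workflow_dispatch request body with a caller-supplied ref instead of "main".
// The input name "issue_number" is an assumption about the dedupe workflow's inputs.
function buildDispatchBody(ref: string, issueNumber: number): string {
  return JSON.stringify({ ref, inputs: { issue_number: String(issueNumber) } });
}
```

The branch only needs to be resolved once per backfill run and can then be reused for every dispatch.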


Problem
Duplicate issues accumulate without any systematic detection. Maintainers spend time triaging issues that have already been reported, and reporters don't discover existing issues to contribute to.
Solution
Three workflows that form a pipeline: detect → notify → auto-close.
Detect (dedupe-issues.yml)
Runs on every new issue (and manual dispatch). Uses actions/ai-inference@v2 with enable-github-mcp so the model reads the issue and searches for duplicates directly. The prompt (dedupe.prompt.yml) instructs the model to try at least 5 search queries and return up to 3 candidate issue numbers. The response is constrained to a strict JSON Schema ({"duplicates": [int]}), so there is no free-form text to parse.
The workflow extracts the array with jq, then calls comment-on-duplicates.sh, which validates every input (numeric format, issue existence, max 3 items) before posting. The comment includes an <!-- ai-duplicate-check --> HTML sentinel that the other two scripts use to identify bot-generated duplicate notices.
Auto-close (auto-close-duplicates.yml + auto-close-duplicates.ts)
Runs daily at 09:00 UTC. Finds open issues that have a duplicate-detection comment older than 3 days, then checks two opt-out conditions before closing:
- Human activity after the detection comment ([bot] accounts are excluded)
- A 👎 reaction from the issue author
If neither condition is met, the script adds a duplicate label, closes the issue as not_planned, and posts an explanation comment.
Backfill (backfill-duplicate-detection.yml + backfill-duplicate-detection.ts)
Manual-dispatch-only workflow for retroactively checking existing open issues. Configurable lookback window (default 90 days) and dry-run mode (default on). Skips issues that already have a detection comment. Triggers the dedupe workflow via workflow_dispatch for each unchecked issue.
Shared API layer (scripts/lib/github.ts)
Both TypeScript scripts import from a shared module that provides:
- Typed interfaces (GitHubIssue, GitHubComment, GitHubReaction) with nullable user fields for deleted accounts
- fetchAllPages(): follows Link header pagination so results aren't silently truncated at 100 items
- fetchGitHub(): single-request helper for mutations (POST/PATCH)
- [...] pull_request field
Files
- .github/prompts/dedupe.prompt.yml
- .github/workflows/dedupe-issues.yml
- .github/workflows/auto-close-duplicates.yml
- .github/workflows/backfill-duplicate-detection.yml
- scripts/lib/github.ts
- scripts/auto-close-duplicates.ts
- scripts/backfill-duplicate-detection.ts
- scripts/comment-on-duplicates.sh
Verification
- comment-on-duplicates.sh rejects missing GITHUB_REPOSITORY, non-numeric inputs, and nonexistent issue numbers
- Run the backfill workflow with dry_run: true and verify it logs which issues would be triggered without actually dispatching
Note
Medium Risk
Automates issue commenting/labeling/closure via scheduled workflows and GitHub API calls; misclassification or logic errors could incorrectly close/label issues, though there are opt-outs and age/activity checks.
Overview
Adds an AI-assisted issue deduplication pipeline that runs on new issues (or manually) to suggest up to 3 duplicates via actions/ai-inference and posts a standardized comment containing an <!-- ai-duplicate-check --> sentinel.
Introduces scheduled and manual automation to (a) backfill dedupe checks across recent open issues and (b) auto-close issues that were marked as duplicates and left inactive for 3 days, with opt-outs for post-comment human activity or an author 👎 reaction. Includes new Bun/TS and shell scripts plus a small shared GitHub API helper (scripts/lib/github.ts) to paginate and mutate issues (comments/labels/close).
Written by Cursor Bugbot for commit 0d6a449.