
Add AI-powered duplicate issue detection system #1613

Merged: smorimoto merged 1 commit into main from add-ai-duplicate-detection on Feb 9, 2026

smorimoto (Collaborator) commented Feb 8, 2026

Problem

Duplicate issues accumulate without any systematic detection. Maintainers spend time triaging issues that have already been reported, and reporters don't discover existing issues to contribute to.

Solution

Three workflows that form a pipeline: detect → notify → auto-close.

Detect (dedupe-issues.yml)

Runs on every new issue (and manual dispatch). Uses actions/ai-inference@v2 with enable-github-mcp so the model reads the issue and searches for duplicates directly. The prompt (dedupe.prompt.yml) instructs the model to try at least 5 search queries and return up to 3 candidate issue numbers. The response is constrained to a strict JSON Schema ({"duplicates": [int]}) — no free-form text to parse.
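
To illustrate the response shape described above, here is a minimal sketch of parsing the schema-constrained output. The PR itself extracts the array with jq; the `parseDuplicates` helper, its error messages, and the `MAX_CANDIDATES` constant are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: the PR extracts the array with jq, not with this code.
// It only illustrates the {"duplicates": [int]} shape and the 3-candidate cap.
const MAX_CANDIDATES = 3;

function parseDuplicates(raw: string): number[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("response is not a JSON object");
  }
  const duplicates = (parsed as { duplicates?: unknown }).duplicates;
  if (!Array.isArray(duplicates)) {
    throw new Error('"duplicates" must be an array');
  }
  if (duplicates.length > MAX_CANDIDATES) {
    throw new Error(`expected at most ${MAX_CANDIDATES} candidates`);
  }
  return duplicates.map((value) => {
    // Issue numbers are positive integers; reject anything else.
    if (typeof value !== "number" || !Number.isInteger(value) || value <= 0) {
      throw new Error(`not a positive integer issue number: ${String(value)}`);
    }
    return value;
  });
}
```

Because the schema leaves no free-form text, a strict parser like this can fail loudly instead of guessing at malformed output.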

The workflow extracts the array with jq, then calls comment-on-duplicates.sh which validates every input (numeric format, issue existence, max 3 items) before posting. The comment includes an <!-- ai-duplicate-check --> HTML sentinel that the other two scripts use to identify bot-generated duplicate notices.
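
The sentinel mechanism can be sketched as follows. Only the `<!-- ai-duplicate-check -->` marker comes from the PR; the function names and the exact comment layout are assumptions, and the real comment is posted by the shell script comment-on-duplicates.sh rather than by TypeScript.

```typescript
// The sentinel string is taken from the PR description.
const SENTINEL = "<!-- ai-duplicate-check -->";

// Hypothetical comment builder; the layout of the real comment may differ.
function buildDuplicateComment(repository: string, duplicates: number[]): string {
  const noun = duplicates.length === 1 ? "issue" : "issues";
  const links = duplicates.map(
    (issueNumber, index) =>
      `${index + 1}. https://github.com/${repository}/issues/${issueNumber}`,
  );
  return [
    `Found ${duplicates.length} possible duplicate ${noun}:`,
    "",
    ...links,
    "",
    SENTINEL, // invisible in rendered Markdown, but present in the raw body
  ].join("\n");
}

// Other scripts can recognize bot notices by the sentinel alone,
// without inspecting the commenter's login.
function isDetectionComment(body: string | null | undefined): boolean {
  return typeof body === "string" && body.includes(SENTINEL);
}
```

Matching on the sentinel rather than on the account name keeps detection stable if the posting identity ever changes.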

Auto-close (auto-close-duplicates.yml + auto-close-duplicates.ts)

Runs daily at 09:00 UTC. Finds open issues that have a duplicate-detection comment older than 3 days, then checks two opt-out conditions before closing:

  1. Human activity — any non-bot comment after the detection comment (comments from [bot] accounts are excluded)
  2. Author objection — a 👎 reaction on the detection comment from the issue author specifically

If neither condition is met, the script adds a duplicate label, closes the issue as not_planned, and posts an explanation comment.
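
The decision described above could be sketched as a pure function. All type and function names here are hypothetical (the PR's real interfaces live in scripts/lib/github.ts), and two choices are assumptions: the first sentinel comment is treated as the detection comment, and a comment from a deleted account (null user) counts as human activity so that the script errs toward not closing.

```typescript
// Hypothetical shapes; not the PR's actual interfaces.
interface SketchUser {
  login: string;
}
interface SketchComment {
  id: number;
  body: string;
  created_at: string; // ISO 8601 timestamp, as in the GitHub REST API
  user: SketchUser | null; // null for deleted accounts
}
interface SketchReaction {
  content: string; // the GitHub REST API reports a thumbs-down as "-1"
  user: SketchUser | null;
}

type CloseDecision =
  | "close"
  | "no-detection-comment"
  | "too-recent"
  | "human-activity"
  | "author-objected";

const DETECTION_SENTINEL = "<!-- ai-duplicate-check -->";
const GRACE_PERIOD_MS = 3 * 24 * 60 * 60 * 1000; // 3 days

function isBotUser(user: SketchUser | null): boolean {
  return user !== null && user.login.endsWith("[bot]");
}

function decideAutoClose(
  authorLogin: string | null,
  comments: SketchComment[],
  reactionsOnDetection: SketchReaction[],
  now: Date,
): CloseDecision {
  const detection = comments.find((c) => c.body.includes(DETECTION_SENTINEL));
  if (!detection) return "no-detection-comment";

  const detectedAt = new Date(detection.created_at).getTime();
  if (now.getTime() - detectedAt <= GRACE_PERIOD_MS) return "too-recent";

  // Opt-out 1: any non-bot comment posted after the detection comment.
  const humanActivity = comments.some(
    (c) => new Date(c.created_at).getTime() > detectedAt && !isBotUser(c.user),
  );
  if (humanActivity) return "human-activity";

  // Opt-out 2: a thumbs-down from the issue author specifically.
  const authorObjected = reactionsOnDetection.some(
    (r) => r.content === "-1" && r.user !== null && r.user.login === authorLogin,
  );
  if (authorObjected) return "author-objected";

  return "close";
}
```

Keeping the decision free of API calls makes each opt-out condition easy to exercise with fixed timestamps.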

Backfill (backfill-duplicate-detection.yml + backfill-duplicate-detection.ts)

Manual-dispatch-only workflow for retroactively checking existing open issues. Configurable lookback window (default 90 days) and dry-run mode (default on). Skips issues that already have a detection comment. Triggers the dedupe workflow via workflow_dispatch for each unchecked issue.
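
A rough sketch of the backfill selection and dry-run behaviour follows. The names are hypothetical, and the sketch assumes the lookback window is measured against each issue's creation date, which the description does not state. The `dispatch` and `log` callbacks stand in for the real workflow_dispatch call and logger.

```typescript
// Hypothetical issue shape; pull_request is present only on pull requests.
interface BackfillIssue {
  number: number;
  created_at: string; // ISO 8601 timestamp
  pull_request?: unknown;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Picks open issues inside the lookback window that have no detection comment yet.
function selectIssuesToBackfill(
  openIssues: BackfillIssue[],
  alreadyChecked: Set<number>, // issue numbers that already carry a detection comment
  lookbackDays: number,
  now: Date,
): number[] {
  const cutoff = now.getTime() - lookbackDays * DAY_MS;
  return openIssues
    .filter((issue) => issue.pull_request === undefined)
    .filter((issue) => new Date(issue.created_at).getTime() >= cutoff)
    .filter((issue) => !alreadyChecked.has(issue.number))
    .map((issue) => issue.number);
}

// In dry-run mode nothing is dispatched; the function only logs what would happen.
function runBackfill(
  issueNumbers: number[],
  dryRun: boolean,
  dispatch: (issueNumber: number) => void,
  log: (message: string) => void,
): number {
  let dispatchedCount = 0;
  for (const issueNumber of issueNumbers) {
    if (dryRun) {
      log(`[dry-run] would trigger dedupe workflow for #${issueNumber}`);
      continue;
    }
    dispatch(issueNumber);
    dispatchedCount += 1;
  }
  return dispatchedCount;
}
```

Defaulting `dryRun` to on, as the workflow does, means an accidental manual dispatch cannot fan out into many workflow runs.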

Shared API layer (scripts/lib/github.ts)

Both TypeScript scripts import from a shared module that provides:

  • Typed interfaces for the GitHub API responses (GitHubIssue, GitHubComment, GitHubReaction) with nullable user fields for deleted accounts
  • fetchAllPages() — follows Link header pagination so results aren't silently truncated at 100 items
  • fetchGitHub() — single-request helper for mutations (POST/PATCH)
  • Pull request filtering — the GitHub Issues API returns PRs mixed in with issues; both scripts filter them out via the pull_request field
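
The pagination and filtering helpers listed above might look roughly like this. It is a sketch under assumptions, not the PR's fetchAllPages(): the names are hypothetical, the page fetcher is injected so the example carries no network code, and the Link parser assumes GitHub's usual `<url>; rel="next"` form.

```typescript
// Extracts the rel="next" URL from a Link header, or null when there is no next page.
function parseNextLink(linkHeader: string | null): string | null {
  if (linkHeader === null) return null;
  const match = /<([^>]+)>\s*;\s*rel="next"/.exec(linkHeader);
  return match ? (match[1] ?? null) : null;
}

// The Issues API marks pull requests with a pull_request field.
interface IssueLike {
  number: number;
  pull_request?: unknown;
}

function isRealIssue(item: IssueLike): boolean {
  return item.pull_request === undefined;
}

interface PageOfItems<T> {
  items: T[];
  linkHeader: string | null;
}

// Follows rel="next" links until the last page. getPage is a hypothetical
// callback that performs one HTTP request and returns the parsed body
// together with the raw Link header.
async function collectAllPages<T>(
  firstUrl: string,
  getPage: (url: string) => Promise<PageOfItems<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let nextUrl: string | null = firstUrl;
  while (nextUrl !== null) {
    const page: PageOfItems<T> = await getPage(nextUrl);
    all.push(...page.items);
    nextUrl = parseNextLink(page.linkHeader);
  }
  return all;
}
```

A caller would then apply `isRealIssue` to the collected items, for example `(await collectAllPages(url, getPage)).filter(isRealIssue)`.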

Files

| File | Purpose |
| --- | --- |
| .github/prompts/dedupe.prompt.yml | AI prompt with structured JSON Schema output |
| .github/workflows/dedupe-issues.yml | Trigger: issue opened / manual dispatch |
| .github/workflows/auto-close-duplicates.yml | Trigger: daily cron / manual dispatch |
| .github/workflows/backfill-duplicate-detection.yml | Trigger: manual dispatch only |
| scripts/lib/github.ts | Shared GitHub API types, auth, and paginated fetching |
| scripts/auto-close-duplicates.ts | Closes stale duplicates with opt-out checks |
| scripts/backfill-duplicate-detection.ts | Dispatches dedupe workflow for unchecked issues |
| scripts/comment-on-duplicates.sh | Input-validated comment posting |

Verification

  • Trigger the dedupe workflow manually on a known-duplicate issue and verify the comment appears with correct format and sentinel
  • Verify comment-on-duplicates.sh rejects missing GITHUB_REPOSITORY, non-numeric inputs, and nonexistent issue numbers
  • Check that the auto-close script skips issues with human comments after the bot comment
  • Check that the auto-close script skips issues where the author reacted 👎
  • Run the backfill script with dry_run: true and verify it logs which issues would be triggered without actually dispatching

Note

Medium Risk
Automates issue commenting/labeling/closure via scheduled workflows and GitHub API calls; misclassification or logic errors could incorrectly close/label issues, though there are opt-outs and age/activity checks.

Overview
Adds an AI-assisted issue deduplication pipeline that runs on new issues (or manually) to suggest up to 3 duplicates via actions/ai-inference and posts a standardized comment containing an <!-- ai-duplicate-check --> sentinel.

Introduces scheduled and manual automation to (a) backfill dedupe checks across recent open issues and (b) auto-close issues that were marked as duplicates and left inactive for 3 days, with opt-outs for post-comment human activity or an author 👎 reaction. Includes new Bun/TS and shell scripts plus a small shared GitHub API helper (scripts/lib/github.ts) to paginate and mutate issues (comments/labels/close).

Written by Cursor Bugbot for commit 0d6a449. This will update automatically on new commits.

changeset-bot (bot) commented Feb 8, 2026

⚠️ No Changeset found

Latest commit: 0d6a449

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9862d314b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

smorimoto changed the title from "Add AI-powered duplicate issue detection system" to "Improve AI duplicate detection system" on Feb 9, 2026
smorimoto force-pushed the add-ai-duplicate-detection branch from 6277a11 to 737c0ae on February 9, 2026 02:38
smorimoto changed the title from "Improve AI duplicate detection system" to "Add AI-powered duplicate issue detection system" on Feb 9, 2026
Add a three-part system for detecting and managing duplicate GitHub issues:

1. Detection workflow (dedupe-issues.yml): Triggers on new issues or manual
   dispatch. Uses actions/ai-inference with GitHub MCP to find up to 3
   duplicates via structured JSON Schema output, then posts a comment via
   comment-on-duplicates.sh with a 3-day grace period.

2. Auto-close workflow (auto-close-duplicates.yml): Runs daily to close
   issues that were flagged as duplicates over 3 days ago with no human
   activity or author opt-out (👎 reaction).

3. Backfill workflow (backfill-duplicate-detection.yml): Manual dispatch
   to trigger duplicate detection on existing open issues that haven't
   been checked yet.

Shared GitHub API helpers (pagination, typed interfaces, authentication)
live in scripts/lib/github.ts. Bot detection uses an HTML comment
sentinel (<!-- ai-duplicate-check -->) instead of fragile login heuristics.
All actions are pinned to SHA hashes.

smorimoto force-pushed the add-ai-duplicate-detection branch from 737c0ae to 0d6a449 on February 9, 2026 02:48
smorimoto merged commit e291ccb into main on Feb 9, 2026
12 checks passed
smorimoto deleted the add-ai-duplicate-detection branch on February 9, 2026 02:49
cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

    echo "Error: duplicate issue must be a number, got: $dup" >&2
    exit 1
  fi
done

Missing self-reference validation allows confusing comments

Medium Severity

The script validates that duplicate issue numbers are numeric and exist, but doesn't check whether any duplicate is the same as the base issue. If the AI model returns the queried issue number as a duplicate of itself (e.g., [123] for issue #123), the script would post a confusing comment saying "Found 1 possible duplicate issue: #123" pointing to itself. The prompt doesn't explicitly exclude self-references, making this scenario possible.
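
One way the missing check could look is sketched below. The real validation lives in the shell script comment-on-duplicates.sh, so this TypeScript version only illustrates the logic; `excludeSelfReference` is a hypothetical name, and dropping repeated candidates is an extra assumption beyond what the review asks for.

```typescript
// Removes the base issue itself (and repeated numbers) from the candidate list.
function excludeSelfReference(baseIssue: number, candidates: number[]): number[] {
  const seen = new Set<number>();
  const result: number[] = [];
  for (const candidate of candidates) {
    if (candidate === baseIssue || seen.has(candidate)) continue;
    seen.add(candidate);
    result.push(candidate);
  }
  return result;
}
```

If the filtered list comes back empty, the caller would presumably skip posting a comment altogether rather than report zero duplicates.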

  });

  return laterComments.length > 0;
}

Redundant API calls fetch same comments twice

Low Severity

The hasActivityAfterComment function fetches issue comments via getIssueComments(issue.number), but main() already fetches the same comments at line 108 to find the bot comment. This results in duplicate API calls for each processed issue. The already-fetched comments array could be passed as a parameter to hasActivityAfterComment instead of re-fetching.
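
The suggested refactor might take roughly this shape. Only the function name hasActivityAfterComment and the `laterComments` variable come from the reviewed code; the parameter list, the `FetchedComment` type, and the bot check by login suffix are assumptions.

```typescript
// Hypothetical comment shape; user is null for deleted accounts.
interface FetchedComment {
  id: number;
  created_at: string; // ISO 8601 timestamp
  user: { login: string } | null;
}

// Receives the comments that the caller has already fetched, so no second
// API request is needed for the same issue.
function hasActivityAfterComment(
  detection: FetchedComment,
  comments: FetchedComment[],
): boolean {
  const detectedAt = new Date(detection.created_at).getTime();
  const laterComments = comments.filter((comment) => {
    if (comment.id === detection.id) return false;
    if (comment.user !== null && comment.user.login.endsWith("[bot]")) return false;
    return new Date(comment.created_at).getTime() > detectedAt;
  });
  return laterComments.length > 0;
}
```

The caller would fetch the comments once, locate the detection comment in that array, and pass both values in.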

    content: You are a duplicate issue detector. You have access to GitHub MCP tools to read and search issues.
  - role: user
    content: |
      Find up to 3 likely duplicate issues for issue #{{issue_number}} in the acacode/swagger-typescript-api repository.

Hardcoded repository breaks portability to forks

High Severity

The AI prompt hardcodes the repository name as acacode/swagger-typescript-api, while the workflows and scripts use dynamic repository references via github.repository_owner and github.event.repository.name. If this workflow runs in a fork or different repository, the AI searches for duplicates in the original repo but comments are posted to the current repo with links constructed from $GITHUB_REPOSITORY. This results in issue numbers from one repo being presented as duplicates in another, with broken or misleading links.
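
A possible direction for the fix is to treat the repository as a template variable alongside the existing `{{issue_number}}` placeholder. The sketch below only illustrates placeholder substitution in general: `renderPrompt` and the `{{repository}}` placeholder are hypothetical, and in the actual workflow substitution would be handled by the prompt tooling, not by this function.

```typescript
// Replaces {{name}} placeholders and fails loudly when a variable is missing,
// so a prompt can never silently fall back to a hardcoded repository.
function renderPrompt(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_placeholder: string, key: string) => {
    const value = variables[key];
    if (value === undefined) {
      throw new Error(`missing prompt variable: ${key}`);
    }
    return value;
  });
}

// Example: the repository comes from the environment of the running workflow
// (for instance the value of GITHUB_REPOSITORY) instead of being fixed in the prompt.
const renderedPrompt = renderPrompt(
  "Find up to 3 likely duplicate issues for issue #{{issue_number}} in the {{repository}} repository.",
  { issue_number: "42", repository: "octo/fork" },
);
```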

  await fetchGitHub(url, {
    method: "POST",
    body: JSON.stringify({
      ref: "main",

Hardcoded branch ref breaks non-main default branches

Medium Severity

The triggerWorkflow function hardcodes ref: "main" when dispatching the dedupe workflow. If the repository uses a different default branch (such as master), the workflow dispatch API call will fail with a 422 error because the workflow file won't exist on a branch named main. The ref parameter determines which branch the workflow file is read from, so this prevents the backfill script from working in repositories that don't use main as their default branch.
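
One way to avoid the hardcoded ref is to read `default_branch` from the repository object (GitHub's REST API returns it from GET /repos/{owner}/{repo}) and pass it to the dispatch call. The sketch below builds the request without sending it. The helper names are hypothetical, and the `issue_number` input name is an assumption about the dedupe workflow's inputs.

```typescript
// Subset of the repository object returned by GET /repos/{owner}/{repo}.
interface RepositoryInfo {
  default_branch: string;
}

interface DispatchRequest {
  url: string;
  method: "POST";
  body: string;
}

// Prefers an explicit override, otherwise falls back to the default branch.
function resolveDispatchRef(repository: RepositoryInfo, override?: string): string {
  return override !== undefined && override !== "" ? override : repository.default_branch;
}

// Builds the workflow_dispatch request; sending it is left to the caller.
function buildDispatchRequest(
  owner: string,
  repo: string,
  workflowFile: string,
  ref: string,
  issueNumber: number,
): DispatchRequest {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/actions/workflows/${workflowFile}/dispatches`,
    method: "POST",
    body: JSON.stringify({
      ref,
      inputs: { issue_number: String(issueNumber) }, // assumed input name
    }),
  };
}
```

With this shape, a repository whose default branch is master would dispatch against master without any code change.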
