Add AI-powered duplicate issue detection system #1613
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| messages: | ||
| - role: system | ||
| content: You are a duplicate issue detector. You have access to GitHub MCP tools to read and search issues. | ||
| - role: user | ||
| content: | | ||
| Find up to 3 likely duplicate issues for issue #{{issue_number}} in the acacode/swagger-typescript-api repository. | ||
|
|
||
| To do this, follow these steps precisely: | ||
|
|
||
| 1. Read issue #{{issue_number}} including its comments. Check if the issue (a) is closed, (b) does not need to be deduped (e.g. because it is broad product feedback without a specific solution, or positive feedback), or (c) already has a duplicates comment containing `<!-- ai-duplicate-check -->`. If so, do not proceed — return an empty duplicates array. | ||
| 2. Summarize the issue: what is the core problem, symptoms, and affected features. | ||
| 3. Search for duplicates of this issue using diverse keywords and search approaches based on the summary. Try at least 5 different search queries to maximize coverage. | ||
| 4. Filter out false positives that are likely not actually duplicates of the original issue. If there are no duplicates remaining, return an empty duplicates array. | ||
| 5. Return the remaining duplicate issue numbers (up to 3), ranked by confidence (highest first). | ||
| model: openai/gpt-4o | ||
| responseFormat: json_schema | ||
| jsonSchema: |- | ||
| { | ||
| "name": "duplicate_detection_result", | ||
| "strict": true, | ||
| "schema": { | ||
| "type": "object", | ||
| "properties": { | ||
| "duplicates": { | ||
| "type": "array", | ||
| "items": { | ||
| "type": "integer" | ||
| }, | ||
| "description": "Issue numbers of potential duplicates, ranked by confidence (highest first). Empty array if no duplicates found or issue should be skipped." | ||
| } | ||
| }, | ||
| "additionalProperties": false, | ||
| "required": ["duplicates"] | ||
| } | ||
| } | ||
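
The schema above constrains `duplicates` to an array of integers but, as written, does not cap its length or exclude the issue being checked. A consumer may therefore want to re-validate the response before acting on it. The following is a minimal sketch, not part of this PR; `extractDuplicates` and `isDuplicateDetectionResult` are hypothetical helper names, and the limit of 3 is taken from the prompt text.

```typescript
// Shape declared by the `duplicate_detection_result` schema above.
interface DuplicateDetectionResult {
  duplicates: number[];
}

// Hypothetical type guard (not part of the PR): checks the parsed JSON
// against the shape the schema promises.
function isDuplicateDetectionResult(
  value: unknown,
): value is DuplicateDetectionResult {
  if (typeof value !== "object" || value === null) return false;
  const duplicates = (value as { duplicates?: unknown }).duplicates;
  return (
    Array.isArray(duplicates) && duplicates.every((n) => Number.isInteger(n))
  );
}

// Hypothetical helper: parses the raw model response, drops repeated numbers
// and the base issue itself, and keeps at most `limit` entries in ranked order.
function extractDuplicates(
  raw: string,
  baseIssue: number,
  limit = 3,
): number[] {
  const parsed: unknown = JSON.parse(raw); // throws on malformed JSON
  if (!isDuplicateDetectionResult(parsed)) {
    throw new Error("Response does not match duplicate_detection_result");
  }
  // A Set preserves insertion order, so the confidence ranking survives.
  const unique = Array.from(new Set(parsed.duplicates));
  return unique.filter((n) => n > 0 && n !== baseIssue).slice(0, limit);
}
```

Throwing on a malformed response, rather than returning an empty array, is a design choice in this sketch: it keeps a broken model response visible in the workflow log instead of looking like "no duplicates found".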
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| name: Auto-Close Duplicate Issues | ||
|
|
||
| on: | ||
| schedule: | ||
| - cron: 0 9 * * * | ||
| workflow_dispatch: | ||
|
|
||
| permissions: | ||
| contents: read | ||
| issues: write | ||
|
|
||
| jobs: | ||
| auto-close: | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 10 | ||
|
|
||
| steps: | ||
| - name: Checkout tree | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Set-up Mise | ||
| uses: jdx/mise-action@6d1e696aa24c1aa1bcc1adea0212707c71ab78a8 # v3.6.1 | ||
| with: | ||
| cache: false | ||
| - name: Run auto-close script | ||
| run: bun run scripts/auto-close-duplicates.ts | ||
|
||
| env: | ||
| GITHUB_TOKEN: ${{ github.token }} | ||
| GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }} | ||
| GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }} | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| name: Backfill Duplicate Detection | ||
|
|
||
| on: | ||
| workflow_dispatch: | ||
| inputs: | ||
| days_back: | ||
| description: Number of days to look back for issues | ||
| required: false | ||
| default: "90" | ||
| dry_run: | ||
| description: Run in dry-run mode (only log, do not trigger workflows) | ||
| required: false | ||
| default: "true" | ||
| type: choice | ||
| options: | ||
| - "true" | ||
| - "false" | ||
|
|
||
| permissions: | ||
| contents: read | ||
| issues: read | ||
| actions: write | ||
|
|
||
| jobs: | ||
| backfill: | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 30 | ||
|
|
||
| steps: | ||
| - name: Checkout tree | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
| - name: Set-up Mise | ||
| uses: jdx/mise-action@6d1e696aa24c1aa1bcc1adea0212707c71ab78a8 # v3.6.1 | ||
| with: | ||
| cache: false | ||
| - name: Run backfill script | ||
| run: ./scripts/backfill-duplicate-detection.ts | ||
| env: | ||
| GITHUB_TOKEN: ${{ github.token }} | ||
| GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }} | ||
| GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }} | ||
| DAYS_BACK: ${{ github.event.inputs.days_back }} | ||
| DRY_RUN: ${{ github.event.inputs.dry_run }} |
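
The backfill script itself is not shown in this part of the diff, so the following is only a sketch of how the `DAYS_BACK` and `DRY_RUN` variables set above could be read. `readBackfillOptions` and `BackfillOptions` are hypothetical names, and the defaults mirror the workflow inputs (`"90"` and `"true"`).

```typescript
// Hypothetical options object; the real script may structure this differently.
interface BackfillOptions {
  daysBack: number;
  dryRun: boolean;
  since: Date; // issues created on or after this instant are in scope
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Takes the environment as a parameter so the function stays pure and testable;
// a caller would pass `process.env`.
function readBackfillOptions(
  env: Record<string, string | undefined>,
  now: Date = new Date(),
): BackfillOptions {
  // `||` rather than `??` so that an empty string also falls back to the default.
  const daysBack = Number.parseInt(env["DAYS_BACK"] || "90", 10);
  if (!Number.isFinite(daysBack) || daysBack <= 0) {
    throw new Error(`DAYS_BACK must be a positive integer, got "${env["DAYS_BACK"]}"`);
  }
  // Only the literal "false" disables dry-run, so a typo cannot trigger real dispatches.
  const dryRun = env["DRY_RUN"] !== "false";
  const since = new Date(now.getTime() - daysBack * DAY_MS);
  return { daysBack, dryRun, since };
}
```

Treating every value except the literal `"false"` as a dry run errs on the side of logging only, which matches the workflow's default of `"true"`.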
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| name: Issue Dedupe | ||
|
|
||
| on: | ||
| issues: | ||
| types: | ||
| - opened | ||
| workflow_dispatch: | ||
| inputs: | ||
| issue_number: | ||
| description: Issue number to check for duplicates | ||
| required: true | ||
| type: number | ||
|
|
||
| permissions: | ||
| contents: read | ||
| issues: write | ||
| models: read | ||
|
|
||
| jobs: | ||
| dedupe: | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 10 | ||
|
|
||
| steps: | ||
| - name: Checkout tree | ||
| uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 | ||
|
|
||
| - name: AI duplicate detection | ||
| uses: actions/ai-inference@a380166897b5408b8fb7dddd148142794cb5624a # v2.0.6 | ||
| id: ai | ||
| with: | ||
| prompt-file: .github/prompts/dedupe.prompt.yml | ||
| input: | | ||
| issue_number: ${{ github.event.issue.number || inputs.issue_number }} | ||
| enable-github-mcp: true | ||
|
|
||
| - name: Post comment if duplicates found | ||
| run: | | ||
| DUPLICATES=$(echo "$AI_RESPONSE" | jq -r '.duplicates | map(tostring) | join(" ")') | ||
|
|
||
| if [ -z "$DUPLICATES" ] || [ "$DUPLICATES" = "null" ]; then | ||
| echo "No duplicates found" | ||
| exit 0 | ||
| fi | ||
|
|
||
| echo "Duplicates found: $DUPLICATES" | ||
| ./scripts/comment-on-duplicates.sh --base-issue "$ISSUE_NUMBER" --potential-duplicates $DUPLICATES | ||
| env: | ||
| AI_RESPONSE: ${{ steps.ai.outputs.response }} | ||
| GH_TOKEN: ${{ github.token }} | ||
| ISSUE_NUMBER: ${{ github.event.issue.number || inputs.issue_number }} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,167 @@ | ||
| #!/usr/bin/env bun | ||
|
|
||
| import { consola } from "consola"; | ||
| import type { | ||
| GitHubComment, | ||
| GitHubIssue, | ||
| GitHubReaction, | ||
| } from "./lib/github.js"; | ||
| import { | ||
| API_BASE, | ||
| fetchAllPages, | ||
| fetchGitHub, | ||
| GITHUB_REPOSITORY_NAME, | ||
| GITHUB_REPOSITORY_OWNER, | ||
| getIssueComments, | ||
| } from "./lib/github.js"; | ||
|
|
||
| const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000; | ||
|
|
||
| async function getOpenIssuesOlderThan3Days(): Promise<GitHubIssue[]> { | ||
| const threeDaysAgo = new Date(Date.now() - THREE_DAYS_MS); | ||
| const url = `${API_BASE}/issues?state=open&per_page=100&sort=created&direction=asc`; | ||
|
|
||
| const issues = await fetchAllPages<GitHubIssue>(url); | ||
|
|
||
| return issues.filter((issue) => { | ||
| if (issue.pull_request) return false; | ||
| return new Date(issue.created_at) < threeDaysAgo; | ||
| }); | ||
| } | ||
|
||
|
|
||
| async function getCommentReactions( | ||
| commentId: number, | ||
| ): Promise<GitHubReaction[]> { | ||
| const url = `${API_BASE}/issues/comments/${commentId}/reactions?per_page=100`; | ||
| return fetchAllPages<GitHubReaction>(url); | ||
| } | ||
|
|
||
| async function closeIssue(issueNumber: number, reason: string): Promise<void> { | ||
| await fetchGitHub(`${API_BASE}/issues/${issueNumber}/labels`, { | ||
| method: "POST", | ||
| body: JSON.stringify({ labels: ["duplicate"] }), | ||
| }); | ||
|
|
||
| await fetchGitHub(`${API_BASE}/issues/${issueNumber}`, { | ||
| method: "PATCH", | ||
| body: JSON.stringify({ | ||
| state: "closed", | ||
| state_reason: "not_planned", | ||
| }), | ||
|
||
| }); | ||
|
||
|
|
||
| await fetchGitHub(`${API_BASE}/issues/${issueNumber}/comments`, { | ||
| method: "POST", | ||
| body: JSON.stringify({ | ||
| body: reason, | ||
| }), | ||
| }); | ||
| } | ||
|
|
||
| async function hasActivityAfterComment( | ||
| issue: GitHubIssue, | ||
| botCommentDate: Date, | ||
| ): Promise<boolean> { | ||
| const comments = await getIssueComments(issue.number); | ||
|
|
||
| const laterComments = comments.filter((comment) => { | ||
| if (comment.user?.login.endsWith("[bot]")) return false; | ||
| const commentDate = new Date(comment.created_at); | ||
| return commentDate > botCommentDate; | ||
| }); | ||
|
|
||
| return laterComments.length > 0; | ||
| } | ||
|
Redundant API calls fetch same comments twice
Low Severity |
||
|
|
||
| async function hasCreatorThumbsDown( | ||
| issue: GitHubIssue, | ||
| botComment: GitHubComment, | ||
| ): Promise<boolean> { | ||
| if (!issue.user) { | ||
| return false; | ||
| } | ||
|
|
||
| const reactions = await getCommentReactions(botComment.id); | ||
|
|
||
| return reactions.some( | ||
| (reaction) => | ||
| reaction.content === "-1" && reaction.user?.login === issue.user?.login, | ||
| ); | ||
|
||
| } | ||
|
|
||
| async function main(): Promise<void> { | ||
| consola.info("Starting auto-close duplicates script..."); | ||
| consola.info( | ||
| `Repository: ${GITHUB_REPOSITORY_OWNER}/${GITHUB_REPOSITORY_NAME}`, | ||
| ); | ||
|
|
||
| const issues = await getOpenIssuesOlderThan3Days(); | ||
| consola.info(`Found ${issues.length} open issues older than 3 days`); | ||
|
|
||
| let processedCount = 0; | ||
| let closedCount = 0; | ||
|
|
||
| for (const issue of issues) { | ||
| processedCount++; | ||
| consola.info(`Processing issue #${issue.number}: ${issue.title}`); | ||
|
|
||
| const comments = await getIssueComments(issue.number); | ||
|
|
||
| const botComment = comments.find( | ||
| (comment) => | ||
| comment.user?.login === "github-actions[bot]" && | ||
| comment.body.includes("<!-- ai-duplicate-check -->"), | ||
| ); | ||
|
|
||
| if (!botComment) { | ||
| consola.info(` No duplicate bot comment found, skipping`); | ||
| await new Promise((resolve) => setTimeout(resolve, 1000)); | ||
| continue; | ||
| } | ||
|
|
||
| const botCommentDate = new Date(botComment.created_at); | ||
| const now = new Date(); | ||
| const timeSinceComment = now.getTime() - botCommentDate.getTime(); | ||
|
|
||
| if (timeSinceComment < THREE_DAYS_MS) { | ||
| consola.info(` Bot comment is less than 3 days old, skipping`); | ||
| await new Promise((resolve) => setTimeout(resolve, 1000)); | ||
| continue; | ||
| } | ||
|
|
||
| const hasActivity = await hasActivityAfterComment(issue, botCommentDate); | ||
| if (hasActivity) { | ||
| consola.info(` Has activity after bot comment, skipping`); | ||
| await new Promise((resolve) => setTimeout(resolve, 1000)); | ||
| continue; | ||
| } | ||
|
|
||
| const hasThumbsDown = await hasCreatorThumbsDown(issue, botComment); | ||
| if (hasThumbsDown) { | ||
| consola.info(` Creator reacted with thumbs down, skipping`); | ||
| await new Promise((resolve) => setTimeout(resolve, 1000)); | ||
| continue; | ||
| } | ||
|
|
||
| consola.info(` Closing issue #${issue.number} as duplicate`); | ||
| await closeIssue( | ||
| issue.number, | ||
| "This issue has been automatically closed as a duplicate. It was marked as a duplicate over 3 days ago with no further activity. If you believe this was closed in error, please comment and we'll re-evaluate.", | ||
| ); | ||
|
|
||
| closedCount++; | ||
|
|
||
| await new Promise((resolve) => setTimeout(resolve, 1000)); | ||
| } | ||
|
|
||
| consola.info("\n=== Summary ==="); | ||
| consola.info(`Processed issues: ${processedCount}`); | ||
| consola.info(`Closed issues: ${closedCount}`); | ||
| } | ||
|
|
||
| try { | ||
| await main(); | ||
| } catch (error) { | ||
| consola.error("Error running auto-close script:", error); | ||
| process.exit(1); | ||
| } | ||
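
The skip conditions in `main()` can also be expressed as one pure function over data that has already been fetched, which avoids requesting the same comments twice and makes the rules easy to unit-test. This is a sketch under assumptions, not the PR's implementation: `decide`, `findBotComment`, `Decision`, and the `*Like` interfaces are hypothetical names, and the interfaces only cover the fields used here.

```typescript
// Minimal shapes for the fields the decision needs (hypothetical; the real
// types are imported from ./lib/github.js in the script above).
interface UserLike { login: string }
interface CommentLike { id: number; body: string; created_at: string; user: UserLike | null }
interface ReactionLike { content: string; user: UserLike | null }

type Decision =
  | "skip:no-bot-comment"
  | "skip:too-recent"
  | "skip:has-activity"
  | "skip:creator-thumbs-down"
  | "close";

const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;
const MARKER = "<!-- ai-duplicate-check -->";

function findBotComment(comments: CommentLike[]): CommentLike | undefined {
  return comments.find(
    (c) => c.user?.login === "github-actions[bot]" && c.body.includes(MARKER),
  );
}

// Applies the same checks, in the same order, as the loop body in main().
// `reactions` are the reactions on the bot comment, fetched by the caller.
function decide(
  authorLogin: string | null,
  comments: CommentLike[],
  reactions: ReactionLike[],
  now: Date,
): Decision {
  const botComment = findBotComment(comments);
  if (!botComment) return "skip:no-bot-comment";

  const botTime = new Date(botComment.created_at).getTime();
  if (now.getTime() - botTime < THREE_DAYS_MS) return "skip:too-recent";

  // Any later comment from a non-bot account counts as activity.
  const hasLaterHumanComment = comments.some((c) => {
    if (c.user?.login.endsWith("[bot]")) return false;
    return new Date(c.created_at).getTime() > botTime;
  });
  if (hasLaterHumanComment) return "skip:has-activity";

  const creatorThumbsDown =
    authorLogin !== null &&
    reactions.some((r) => r.content === "-1" && r.user?.login === authorLogin);
  if (creatorThumbsDown) return "skip:creator-thumbs-down";

  return "close";
}
```

With this split, the calling loop would fetch the comments once, fetch reactions only when a bot comment exists, and call `closeIssue` only when the result is `"close"`.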


Hardcoded repository breaks portability to forks
High Severity
The AI prompt hardcodes the repository name as `acacode/swagger-typescript-api`, while the workflows and scripts use dynamic repository references via `github.repository_owner` and `github.event.repository.name`. If this workflow runs in a fork or a different repository, the AI searches for duplicates in the original repo, but comments are posted to the current repo with links constructed from `$GITHUB_REPOSITORY`. As a result, issue numbers from one repo are presented as duplicates in another, with broken or misleading links.
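
One possible direction is to treat the repository as a second template variable next to `issue_number`. The sketch below only illustrates the substitution idea; `renderPrompt` and the `repository` variable are hypothetical and are not claimed to reflect how `actions/ai-inference` processes its `input` values.

```typescript
// Hypothetical helper: replaces {{name}} placeholders with values from `vars`
// and fails loudly when a placeholder has no value, so that a missing
// repository can never silently fall back to a hardcoded one.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, key)) {
      throw new Error(`Missing template variable: ${key}`);
    }
    return vars[key] as string;
  });
}

// Example: the repository slug would come from the workflow context
// (for instance the owner and name values already passed to the scripts).
const rendered = renderPrompt(
  "Find up to 3 likely duplicate issues for issue #{{issue_number}} in the {{repository}} repository.",
  { issue_number: "1613", repository: "octo-org/some-fork" },
);
console.log(rendered);
```

Here `octo-org/some-fork` is a made-up slug used purely for illustration.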