Commit 737c0ae

Add AI-powered duplicate issue detection system
Add a three-part system for detecting and managing duplicate GitHub issues:

1. Detection workflow (dedupe-issues.yml): Triggers on new issues or manual dispatch. Uses actions/ai-inference with GitHub MCP to find up to 3 duplicates via structured JSON Schema output, then posts a comment via comment-on-duplicates.sh with a 3-day grace period.
2. Auto-close workflow (auto-close-duplicates.yml): Runs daily to close issues that were flagged as duplicates over 3 days ago with no human activity or author opt-out (👎 reaction).
3. Backfill workflow (backfill-duplicate-detection.yml): Manual dispatch to trigger duplicate detection on existing open issues that haven't been checked yet.

Shared GitHub API helpers (pagination, typed interfaces, authentication) live in scripts/lib/github.ts. Bot detection uses an HTML comment sentinel (<!-- ai-duplicate-check -->) instead of fragile login heuristics. All actions are pinned to SHA hashes.
1 parent 92a2d46 commit 737c0ae

8 files changed

Lines changed: 628 additions & 0 deletions

.github/prompts/dedupe.prompt.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+messages:
+  - role: system
+    content: You are a duplicate issue detector. You have access to GitHub MCP tools to read and search issues.
+  - role: user
+    content: |
+      Find up to 3 likely duplicate issues for issue #{{issue_number}} in the acacode/swagger-typescript-api repository.
+
+      To do this, follow these steps precisely:
+
+      1. Read issue #{{issue_number}} including its comments. Check if the issue (a) is closed, (b) does not need to be deduped (e.g. because it is broad product feedback without a specific solution, or positive feedback), or (c) already has a duplicates comment containing `<!-- ai-duplicate-check -->`. If so, do not proceed — return an empty duplicates array.
+      2. Summarize the issue: what is the core problem, symptoms, and affected features.
+      3. Search for duplicates of this issue using diverse keywords and search approaches based on the summary. Try at least 5 different search queries to maximize coverage.
+      4. Filter out false positives that are likely not actually duplicates of the original issue. If there are no duplicates remaining, return an empty duplicates array.
+      5. Return the remaining duplicate issue numbers (up to 3), ranked by confidence (highest first).
+model: openai/gpt-4o
+responseFormat: json_schema
+jsonSchema: |-
+  {
+    "name": "duplicate_detection_result",
+    "strict": true,
+    "schema": {
+      "type": "object",
+      "properties": {
+        "duplicates": {
+          "type": "array",
+          "items": {
+            "type": "integer"
+          },
+          "description": "Issue numbers of potential duplicates, ranked by confidence (highest first). Empty array if no duplicates found or issue should be skipped."
+        }
+      },
+      "additionalProperties": false,
+      "required": ["duplicates"]
+    }
+  }
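
The schema above constrains `duplicates` to an array of integers but sets no upper bound on its length, so the limit of three comes only from the prompt text. A consumer of the response might therefore re-check it defensively. The following is a minimal sketch under that assumption; `parseDuplicates` and its `maxResults` parameter are hypothetical names and are not part of this commit.

```typescript
// Hypothetical helper, not part of the commit: defensively parse the model's
// JSON response and keep at most `maxResults` distinct, positive issue numbers.
function parseDuplicates(raw: string, maxResults = 3): number[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Malformed JSON is treated the same as "no duplicates found".
    return [];
  }

  if (typeof parsed !== "object" || parsed === null) return [];

  const candidate = (parsed as Record<string, unknown>)["duplicates"];
  if (!Array.isArray(candidate)) return [];

  const unique: number[] = [];
  for (const value of candidate) {
    const isIssueNumber =
      typeof value === "number" && Number.isInteger(value) && value > 0;
    // Preserve the model's ranking while dropping repeats and non-integers.
    if (isIssueNumber && unique.indexOf(value) === -1) {
      unique.push(value);
    }
  }

  return unique.slice(0, maxResults);
}
```

Treating malformed input as an empty result mirrors the schema's own convention that an empty array means "nothing to do".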

.github/workflows/auto-close-duplicates.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+name: Auto-Close Duplicate Issues
+
+on:
+  schedule:
+    - cron: 0 9 * * *
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  issues: write
+
+jobs:
+  auto-close:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+
+    steps:
+      - name: Checkout tree
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - name: Set-up Mise
+        uses: jdx/mise-action@6d1e696aa24c1aa1bcc1adea0212707c71ab78a8 # v3.6.1
+        with:
+          cache: false
+      - name: Run auto-close script
+        run: bun run scripts/auto-close-duplicates.ts
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+          GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
+          GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}

.github/workflows/backfill-duplicate-detection.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+name: Backfill Duplicate Detection
+
+on:
+  workflow_dispatch:
+    inputs:
+      days_back:
+        description: Number of days to look back for issues
+        required: false
+        default: "90"
+      dry_run:
+        description: Run in dry-run mode (only log, do not trigger workflows)
+        required: false
+        default: "true"
+        type: choice
+        options:
+          - "true"
+          - "false"
+
+permissions:
+  contents: read
+  issues: read
+  actions: write
+
+jobs:
+  backfill:
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+
+    steps:
+      - name: Checkout tree
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - name: Set-up Mise
+        uses: jdx/mise-action@6d1e696aa24c1aa1bcc1adea0212707c71ab78a8 # v3.6.1
+        with:
+          cache: false
+      - name: Run backfill script
+        run: ./scripts/backfill-duplicate-detection.ts
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+          GITHUB_REPOSITORY_OWNER: ${{ github.repository_owner }}
+          GITHUB_REPOSITORY_NAME: ${{ github.event.repository.name }}
+          DAYS_BACK: ${{ github.event.inputs.days_back }}
+          DRY_RUN: ${{ github.event.inputs.dry_run }}
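
The workflow hands `DAYS_BACK` and `DRY_RUN` to the backfill script as plain strings, and that script is not shown in this part of the diff. As a rough illustration only, the values could be read as below, falling back to the workflow's declared defaults ("90" and "true"). `parseBackfillOptions` and `BackfillOptions` are hypothetical names, and the real script may handle its inputs differently.

```typescript
// Hypothetical sketch, not taken from the commit: interpret the two
// environment variables that the backfill workflow passes to its script.
interface BackfillOptions {
  daysBack: number;
  dryRun: boolean;
}

function parseBackfillOptions(
  env: Record<string, string | undefined>,
): BackfillOptions {
  // Fall back to the workflow's default of 90 days when the value is
  // missing, non-numeric, or not positive.
  const parsedDays = Number.parseInt(env["DAYS_BACK"] ?? "", 10);
  const daysBack =
    Number.isFinite(parsedDays) && parsedDays > 0 ? parsedDays : 90;

  // Only an explicit "false" disables dry-run mode, so an unset or
  // unexpected value keeps the run read-only.
  const dryRun = (env["DRY_RUN"] ?? "true").trim().toLowerCase() !== "false";

  return { daysBack, dryRun };
}
```

In a script this would typically be called as `parseBackfillOptions(process.env)`. Defaulting to dry-run on anything other than an explicit `"false"` keeps a mistyped input from dispatching workflows by accident.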

.github/workflows/dedupe-issues.yml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+name: Issue Dedupe
+
+on:
+  issues:
+    types:
+      - opened
+  workflow_dispatch:
+    inputs:
+      issue_number:
+        description: Issue number to check for duplicates
+        required: true
+        type: number
+
+permissions:
+  contents: read
+  issues: write
+  models: read
+
+jobs:
+  dedupe:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+
+    steps:
+      - name: Checkout tree
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: AI duplicate detection
+        uses: actions/ai-inference@a380166897b5408b8fb7dddd148142794cb5624a # v2.0.6
+        id: ai
+        with:
+          prompt-file: .github/prompts/dedupe.prompt.yml
+          input: |
+            issue_number: ${{ github.event.issue.number || inputs.issue_number }}
+          enable-github-mcp: true
+
+      - name: Post comment if duplicates found
+        run: |
+          DUPLICATES=$(echo "$AI_RESPONSE" | jq -r '.duplicates | map(tostring) | join(" ")')
+
+          if [ -z "$DUPLICATES" ] || [ "$DUPLICATES" = "null" ]; then
+            echo "No duplicates found"
+            exit 0
+          fi
+
+          echo "Duplicates found: $DUPLICATES"
+          ./scripts/comment-on-duplicates.sh --base-issue "$ISSUE_NUMBER" --potential-duplicates $DUPLICATES
+        env:
+          AI_RESPONSE: ${{ steps.ai.outputs.response }}
+          GH_TOKEN: ${{ github.token }}
+          ISSUE_NUMBER: ${{ github.event.issue.number || inputs.issue_number }}
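
The last step delegates to scripts/comment-on-duplicates.sh, which is not shown in this part of the diff. What the rest of the system relies on is that the posted comment carries the `<!-- ai-duplicate-check -->` sentinel, so later runs can recognise it. The sketch below illustrates that idea only: the function names and all of the comment wording are assumptions, not the script's actual output.

```typescript
// Hypothetical sketch, not the commit's comment-on-duplicates.sh:
// shows how a sentinel-tagged comment could be built and recognised.
const DUPLICATE_CHECK_SENTINEL = "<!-- ai-duplicate-check -->";

// True when any existing comment body already carries the sentinel.
function hasDuplicateCheckComment(commentBodies: string[]): boolean {
  return commentBodies.some((body) => body.includes(DUPLICATE_CHECK_SENTINEL));
}

// Builds a comment body listing up to three candidate issues.
// The human-readable wording here is invented for illustration.
function buildDuplicateComment(duplicates: number[]): string {
  const entries = duplicates
    .slice(0, 3)
    .map((issueNumber, index) => `${index + 1}. #${issueNumber}`);

  return [
    DUPLICATE_CHECK_SENTINEL,
    "Possible duplicates of this issue:",
    "",
    ...entries,
    "",
    "This issue may be closed automatically after 3 days unless there is " +
      "further activity or the author reacts to this comment with a thumbs down.",
  ].join("\n");
}
```

Because the sentinel is an HTML comment, it stays invisible in the rendered issue while remaining searchable in the raw comment body.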

scripts/auto-close-duplicates.ts

Lines changed: 167 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,167 @@
+#!/usr/bin/env bun
+
+import { consola } from "consola";
+import type {
+  GitHubComment,
+  GitHubIssue,
+  GitHubReaction,
+} from "./lib/github.js";
+import {
+  API_BASE,
+  fetchAllPages,
+  fetchGitHub,
+  GITHUB_REPOSITORY_NAME,
+  GITHUB_REPOSITORY_OWNER,
+  getIssueComments,
+} from "./lib/github.js";
+
+const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;
+
+async function getOpenIssuesOlderThan3Days(): Promise<GitHubIssue[]> {
+  const threeDaysAgo = new Date(Date.now() - THREE_DAYS_MS);
+  const url = `${API_BASE}/issues?state=open&per_page=100&sort=created&direction=asc`;
+
+  const issues = await fetchAllPages<GitHubIssue>(url);
+
+  return issues.filter((issue) => {
+    if (issue.pull_request) return false;
+    return new Date(issue.created_at) < threeDaysAgo;
+  });
+}
+
+async function getCommentReactions(
+  commentId: number,
+): Promise<GitHubReaction[]> {
+  const url = `${API_BASE}/issues/comments/${commentId}/reactions?per_page=100`;
+  return fetchAllPages<GitHubReaction>(url);
+}
+
+async function closeIssue(issueNumber: number, reason: string): Promise<void> {
+  await fetchGitHub(`${API_BASE}/issues/${issueNumber}/labels`, {
+    method: "POST",
+    body: JSON.stringify({ labels: ["duplicate"] }),
+  });
+
+  await fetchGitHub(`${API_BASE}/issues/${issueNumber}`, {
+    method: "PATCH",
+    body: JSON.stringify({
+      state: "closed",
+      state_reason: "not_planned",
+    }),
+  });
+
+  await fetchGitHub(`${API_BASE}/issues/${issueNumber}/comments`, {
+    method: "POST",
+    body: JSON.stringify({
+      body: reason,
+    }),
+  });
+}
+
+async function hasActivityAfterComment(
+  issue: GitHubIssue,
+  botCommentDate: Date,
+): Promise<boolean> {
+  const comments = await getIssueComments(issue.number);
+
+  const laterComments = comments.filter((comment) => {
+    if (comment.user?.login.endsWith("[bot]")) return false;
+    const commentDate = new Date(comment.created_at);
+    return commentDate > botCommentDate;
+  });
+
+  return laterComments.length > 0;
+}
+
+async function hasCreatorThumbsDown(
+  issue: GitHubIssue,
+  botComment: GitHubComment,
+): Promise<boolean> {
+  if (!issue.user) {
+    return false;
+  }
+
+  const reactions = await getCommentReactions(botComment.id);
+
+  return reactions.some(
+    (reaction) =>
+      reaction.content === "-1" && reaction.user?.login === issue.user?.login,
+  );
+}
+
+async function main(): Promise<void> {
+  consola.info("Starting auto-close duplicates script...");
+  consola.info(
+    `Repository: ${GITHUB_REPOSITORY_OWNER}/${GITHUB_REPOSITORY_NAME}`,
+  );
+
+  const issues = await getOpenIssuesOlderThan3Days();
+  consola.info(`Found ${issues.length} open issues older than 3 days`);
+
+  let processedCount = 0;
+  let closedCount = 0;
+
+  for (const issue of issues) {
+    processedCount++;
+    consola.info(`Processing issue #${issue.number}: ${issue.title}`);
+
+    const comments = await getIssueComments(issue.number);
+
+    const botComment = comments.find(
+      (comment) =>
+        comment.user?.login === "github-actions[bot]" &&
+        comment.body.includes("<!-- ai-duplicate-check -->"),
+    );
+
+    if (!botComment) {
+      consola.info(`  No duplicate bot comment found, skipping`);
+      await new Promise((resolve) => setTimeout(resolve, 1000));
+      continue;
+    }
+
+    const botCommentDate = new Date(botComment.created_at);
+    const now = new Date();
+    const timeSinceComment = now.getTime() - botCommentDate.getTime();
+
+    if (timeSinceComment < THREE_DAYS_MS) {
+      consola.info(`  Bot comment is less than 3 days old, skipping`);
+      await new Promise((resolve) => setTimeout(resolve, 1000));
+      continue;
+    }
+
+    const hasActivity = await hasActivityAfterComment(issue, botCommentDate);
+    if (hasActivity) {
+      consola.info(`  Has activity after bot comment, skipping`);
+      await new Promise((resolve) => setTimeout(resolve, 1000));
+      continue;
+    }
+
+    const hasThumbsDown = await hasCreatorThumbsDown(issue, botComment);
+    if (hasThumbsDown) {
+      consola.info(`  Creator reacted with thumbs down, skipping`);
+      await new Promise((resolve) => setTimeout(resolve, 1000));
+      continue;
+    }
+
+    consola.info(`  Closing issue #${issue.number} as duplicate`);
+    await closeIssue(
+      issue.number,
+      "This issue has been automatically closed as a duplicate. It was marked as a duplicate over 3 days ago with no further activity. If you believe this was closed in error, please comment and we'll re-evaluate.",
+    );
+
+    closedCount++;
+
+    await new Promise((resolve) => setTimeout(resolve, 1000));
+  }
+
+  consola.info("\n=== Summary ===");
+  consola.info(`Processed issues: ${processedCount}`);
+  consola.info(`Closed issues: ${closedCount}`);
+}
+
+try {
+  await main();
+} catch (error) {
+  consola.error("Error running auto-close script:", error);
+  process.exit(1);
+}
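
The loop in `main()` applies four checks in order: a flagged comment exists, the 3-day grace period has elapsed, no human has commented since, and the author has not reacted with a thumbs down. For review or unit testing, that ordering can be restated as a pure function with no network calls. This is an illustrative sketch, not code from the commit: the `SimpleComment`, `SimpleReaction`, `Decision`, and `decideAutoClose` names are hypothetical, and unlike the script it matches the bot comment by sentinel alone rather than also checking the `github-actions[bot]` login.

```typescript
// Hypothetical restatement of the script's decision order as a pure function.
const GRACE_PERIOD_MS = 3 * 24 * 60 * 60 * 1000;
const SENTINEL = "<!-- ai-duplicate-check -->";

interface SimpleComment {
  author: string;
  body: string;
  createdAt: string; // ISO 8601 timestamp
}

interface SimpleReaction {
  author: string;
  content: string; // "-1" is the thumbs-down reaction
}

type Decision =
  | "close"
  | "no-bot-comment"
  | "grace-period"
  | "human-activity"
  | "author-opt-out";

function decideAutoClose(
  issueAuthor: string,
  comments: SimpleComment[],
  reactionsOnBotComment: SimpleReaction[],
  now: Date,
): Decision {
  // Simplification: identify the flagging comment by sentinel only.
  const botComment = comments.find((c) => c.body.includes(SENTINEL));
  if (!botComment) return "no-bot-comment";

  const flaggedAt = new Date(botComment.createdAt).getTime();
  if (now.getTime() - flaggedAt < GRACE_PERIOD_MS) return "grace-period";

  // Any later comment from a non-bot account counts as human activity.
  const humanReplied = comments.some(
    (c) =>
      !c.author.endsWith("[bot]") && new Date(c.createdAt).getTime() > flaggedAt,
  );
  if (humanReplied) return "human-activity";

  // Only the issue author's thumbs down counts as an opt-out.
  const optedOut = reactionsOnBotComment.some(
    (r) => r.content === "-1" && r.author === issueAuthor,
  );
  if (optedOut) return "author-opt-out";

  return "close";
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, makes the grace-period boundary straightforward to exercise in tests.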
