
feat: extend enricher to cover extra fields [CM-1220]#4186

Open
mbani01 wants to merge 3 commits into main from feat/extend_repo_enricher

@mbani01 mbani01 commented Jun 9, 2026

Contributor

This pull request introduces a comprehensive enrichment of repository metadata by adding activity snapshot collection and enhancing security metadata extraction. The main changes include a new migration and schema for storing repository activity snapshots, new logic for fetching and computing repository activity metrics (such as commit and PR/issue statistics and medians), and improvements to how security policy and file presence are detected and stored.

Repository Activity Snapshot Collection and Storage:

  • Added a new SQL migration that adds security_policy_enabled, security_file_enabled, and snapshot_at columns to the repos table and creates a new repo_activity_snapshot table to store detailed activity metrics over a 12-month window.
  • Implemented fetchActivitySnapshot.ts to query GitHub's GraphQL API for commit, PR, and issue activity, including paging and rate limit handling, and aggregate results into a structured snapshot.
  • Added median computation utilities for PR and issue response/merge/close times in computeMedians.ts, used by the snapshot logic.
  • Updated the enrichment loop to fetch and buffer activity snapshots, and to bulk upsert them into the database.
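
As an illustration of the median step described above, here is a minimal sketch. The names `TimedItem`, `median`, and `medianHoursToResolve` are hypothetical and are not taken from computeMedians.ts; the unit (hours) and the choice to skip unresolved items are assumptions as well.

```typescript
// Hypothetical shapes and names; the real computeMedians.ts may differ.
interface TimedItem {
  createdAt: string; // ISO-8601 timestamp
  resolvedAt: string | null; // e.g. mergedAt or closedAt; null while still open
}

/** Median of a numeric list, or null when the list is empty. */
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** Median hours from creation to resolution, skipping unresolved items. */
function medianHoursToResolve(items: TimedItem[]): number | null {
  const hours: number[] = [];
  for (const item of items) {
    if (item.resolvedAt === null) continue;
    const ms = Date.parse(item.resolvedAt) - Date.parse(item.createdAt);
    // Ignore unparsable or negative durations instead of skewing the median.
    if (Number.isFinite(ms) && ms >= 0) hours.push(ms / 3600000);
  }
  return median(hours);
}
```

Returning null for an empty input, rather than 0, keeps "no data in the window" distinguishable from "resolved instantly" when the value is stored.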

Security Metadata Improvements:

  • Enhanced the repository fetch logic to extract isSecurityPolicyEnabled via GraphQL and to independently check for the presence of SECURITY.md in both the root and .github directories using the GitHub Contents API, storing the results in new fields.
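
The SECURITY.md check could look roughly like the sketch below. It assumes the Contents API endpoint `GET /repos/{owner}/{repo}/contents/{path}` answers 200 when the file exists and 404 when it does not. `hasSecurityFile` and `StatusFetcher` are hypothetical names; the HTTP call is injected so that authentication and rate-limit handling, which the PR's fetchLightRepo.ts presumably owns, stay out of the example.

```typescript
// Hypothetical: returns the HTTP status code for a GET on the given URL.
type StatusFetcher = (url: string) => Promise<number>;

// The two locations named in the PR description.
const SECURITY_PATHS = ["SECURITY.md", ".github/SECURITY.md"];

/** True if either candidate path exists according to the Contents API. */
async function hasSecurityFile(
  owner: string,
  repo: string,
  getStatus: StatusFetcher,
): Promise<boolean> {
  for (const path of SECURITY_PATHS) {
    const url = `https://api.github.com/repos/${owner}/${repo}/contents/${path}`;
    const status = await getStatus(url);
    if (status === 200) return true; // file found, no need to probe further
    if (status !== 404) {
      // Anything else (403, 5xx, ...) is not evidence of absence.
      throw new Error(`Unexpected status ${status} for ${url}`);
    }
  }
  return false;
}
```

Throwing on statuses other than 200 and 404 is one possible design choice: it avoids recording security_file_enabled as false when the probe was merely rate limited.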

These changes enable more robust and granular tracking of repository activity and security posture for downstream analytics and reporting.


Note

Medium Risk
Adds heavy GitHub API usage and new persistence paths in the enricher worker; snapshot failures are isolated but rate limits can park installations and slow throughput.

Overview
Extends the packages_worker GitHub repo enricher to persist 12-month activity snapshots and security posture fields alongside existing light metadata.

A migration adds repo_activity_snapshot (commit/PR/issue counts and median response/merge/close times) plus repos.security_policy_enabled, security_file_enabled, and repos.snapshot_at. New fetchActivitySnapshot pulls GitHub GraphQL (summary search + paginated PRs/issues), derives medians via computeMedians, and bulk-upserts through updateRepoActivitySnapshot (which also stamps repos.snapshot_at). The enrichment loop runs snapshots after a successful light fetch, buffers them with repo updates, tolerates snapshot failures without failing the whole repo, and tracks HTTP/rate-limit cost.
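
The paginated PR/issue fetch can be pictured as a cursor loop like the one below. This is a sketch under assumptions, not the code in fetchActivitySnapshot.ts: `Page`, `PageFetcher`, `collectAllPages`, and the `maxPages` cap are hypothetical, and only the `hasNextPage` / `endCursor` fields mirror the `pageInfo` object that GitHub GraphQL connections return.

```typescript
// One page of a GraphQL connection, flattened for the example.
interface Page<T> {
  nodes: T[];
  hasNextPage: boolean;
  endCursor: string | null;
}

// Hypothetical: fetches the page that starts after `cursor` (null = first page).
type PageFetcher<T> = (cursor: string | null) => Promise<Page<T>>;

/** Follows cursors until the last page or until maxPages requests were made. */
async function collectAllPages<T>(
  fetchPage: PageFetcher<T>,
  maxPages = 50,
): Promise<T[]> {
  const items: T[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const result: Page<T> = await fetchPage(cursor);
    items.push(...result.nodes);
    if (!result.hasNextPage || result.endCursor === null) break;
    cursor = result.endCursor;
  }
  return items;
}
```

A hard page cap bounds the API cost per repository, which matters for the rate-limit concern raised in the risk note; very active repositories would then yield a truncated sample.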

Light repo enrichment now records isSecurityPolicyEnabled from GraphQL and probes SECURITY.md / .github/SECURITY.md via the Contents API, persisted in bulkUpdateEnrichedRepos.

Reviewed by Cursor Bugbot for commit 70ee920. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
@mbani01 mbani01 self-assigned this Jun 9, 2026
Copilot AI review requested due to automatic review settings June 9, 2026 16:16
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@mbani01 mbani01 changed the title from "feat: extend enricher to cover extra fields" to "feat: extend enricher to cover extra fields [CM-1220]" Jun 9, 2026
Comment thread services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts
Comment thread services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts
Comment thread services/apps/packages_worker/src/enricher/fetchLightRepo.ts Outdated
Comment thread services/apps/packages_worker/src/enricher/runEnrichmentLoop.ts

Copilot AI left a comment

Pull request overview

This PR extends the packages_worker GitHub repo enricher to collect and persist richer repository metadata: (1) repository activity snapshots over a 12‑month window (commit/PR/issue counts + median timing metrics) and (2) improved security posture signals (security policy + SECURITY.md presence).

Changes:

  • Adds a new DB migration introducing repo_activity_snapshot plus new security-related columns on repos.
  • Introduces activity snapshot fetching via GitHub GraphQL, including PR/issue paging and median computations.
  • Updates the enrichment loop and DB write path to buffer and bulk upsert snapshots alongside existing “light repo” enrichment.
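
To make the buffered bulk upsert concrete, the sketch below builds a single multi-row statement from a buffer of snapshots. Everything here is an assumption for illustration: PostgreSQL-style `$n` placeholders and `ON CONFLICT`, a conflict key of `repo_id`, and the three columns shown. The actual repo_activity_snapshot schema and the code in updateRepoActivitySnapshot.ts may differ.

```typescript
// Hypothetical row shape; the real table has more columns.
interface SnapshotRow {
  repoId: string;
  commitCount: number;
  medianPrMergeHours: number | null;
}

interface SqlStatement {
  text: string;
  values: Array<string | number | null>;
}

/** Builds one INSERT ... ON CONFLICT statement for all buffered rows. */
function buildSnapshotUpsert(rows: SnapshotRow[]): SqlStatement | null {
  if (rows.length === 0) return null; // nothing buffered, nothing to write
  const values: Array<string | number | null> = [];
  const tuples = rows.map((row, i) => {
    values.push(row.repoId, row.commitCount, row.medianPrMergeHours);
    const base = i * 3; // three placeholders per row
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });
  const text =
    "INSERT INTO repo_activity_snapshot " +
    "(repo_id, commit_count, median_pr_merge_hours) " +
    `VALUES ${tuples.join(", ")} ` +
    "ON CONFLICT (repo_id) DO UPDATE SET " +
    "commit_count = EXCLUDED.commit_count, " +
    "median_pr_merge_hours = EXCLUDED.median_pr_merge_hours";
  return { text, values };
}
```

Writing the whole buffer in one statement trades a larger query for fewer round trips; a caller would flush whenever the buffer reaches a chosen size and once more at the end of the loop.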

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| services/apps/packages_worker/src/enricher/updateRepoActivitySnapshot.ts | Adds bulk upsert into repo_activity_snapshot. |
| services/apps/packages_worker/src/enricher/updateEnrichedRepos.ts | Persists new security_policy_enabled / security_file_enabled repo fields. |
| services/apps/packages_worker/src/enricher/types.ts | Adds RepoActivitySnapshot type and new fields on LightRepoResult. |
| services/apps/packages_worker/src/enricher/runEnrichmentLoop.ts | Fetches activity snapshots per repo and buffers snapshot writes. |
| services/apps/packages_worker/src/enricher/fetchLightRepo.ts | Enhances repo fetch to include isSecurityPolicyEnabled + checks for SECURITY.md. |
| services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts | New snapshot collection implementation (summary query + paging + medians). |
| services/apps/packages_worker/src/enricher/computeMedians.ts | New utilities to compute median response/merge/close timings. |
| backend/src/osspckgs/migrations/V1780996561__repo_activity_snapshot.sql | Adds schema changes for snapshots + new security columns. |

Comment thread services/apps/packages_worker/src/enricher/fetchLightRepo.ts Outdated
Comment thread services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts
Comment thread services/apps/packages_worker/src/enricher/runEnrichmentLoop.ts
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 93795a5.

Comment thread services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts
Comment thread services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts
@mbani01 mbani01 requested a review from themarolt June 9, 2026 17:04

3 participants