feat: extend enricher to cover extra fields [CM-1220]#4186
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Pull request overview
This PR extends the packages_worker GitHub repo enricher to collect and persist richer repository metadata: (1) repository activity snapshots over a 12‑month window (commit/PR/issue counts + median timing metrics) and (2) improved security posture signals (security policy + SECURITY.md presence).
Changes:
- Adds a new DB migration introducing `repo_activity_snapshot` plus new security-related columns on `repos`.
- Introduces activity snapshot fetching via GitHub GraphQL, including PR/issue paging and median computations.
- Updates the enrichment loop and DB write path to buffer and bulk upsert snapshots alongside existing “light repo” enrichment.
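The median computations mentioned above could look roughly like the following. This is a minimal sketch, not the PR's actual `computeMedians.ts`: the helper names `median` and `medianHoursToMerge` and the `PullRequestTimes` shape are assumptions made for illustration, and the choice of hours as the unit is also assumed.

```typescript
// Hypothetical shape: only the two timestamps needed for a merge-time metric.
interface PullRequestTimes {
  createdAt: string; // ISO 8601 timestamp
  mergedAt: string | null; // null when the PR was never merged
}

// Median of a list of numbers; returns null for an empty list so callers
// can store "no data" instead of a misleading zero.
function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Median time from PR creation to merge, in hours (assumed unit).
// Unmerged PRs are ignored.
function medianHoursToMerge(prs: PullRequestTimes[]): number | null {
  const msPerHour = 60 * 60 * 1000;
  const hours = prs
    .filter(
      (pr): pr is PullRequestTimes & { mergedAt: string } =>
        pr.mergedAt !== null,
    )
    .map((pr) => (Date.parse(pr.mergedAt) - Date.parse(pr.createdAt)) / msPerHour);
  return median(hours);
}
```

Returning `null` rather than `0` for an empty sample keeps "no merged PRs in the window" distinguishable from "merged instantly" when the value is persisted.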
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/apps/packages_worker/src/enricher/updateRepoActivitySnapshot.ts | Adds bulk upsert into repo_activity_snapshot. |
| services/apps/packages_worker/src/enricher/updateEnrichedRepos.ts | Persists new security_policy_enabled / security_file_enabled repo fields. |
| services/apps/packages_worker/src/enricher/types.ts | Adds RepoActivitySnapshot type and new fields on LightRepoResult. |
| services/apps/packages_worker/src/enricher/runEnrichmentLoop.ts | Fetches activity snapshots per repo and buffers snapshot writes. |
| services/apps/packages_worker/src/enricher/fetchLightRepo.ts | Enhances repo fetch to include isSecurityPolicyEnabled + checks for SECURITY.md. |
| services/apps/packages_worker/src/enricher/fetchActivitySnapshot.ts | New snapshot collection implementation (summary query + paging + medians). |
| services/apps/packages_worker/src/enricher/computeMedians.ts | New utilities to compute median response/merge/close timings. |
| backend/src/osspckgs/migrations/V1780996561__repo_activity_snapshot.sql | Adds schema changes for snapshots + new security columns. |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 93795a5.

This pull request introduces a comprehensive enrichment of repository metadata by adding activity snapshot collection and enhancing security metadata extraction. The main changes include a new migration and schema for storing repository activity snapshots, new logic for fetching and computing repository activity metrics (such as commit and PR/issue statistics and medians), and improvements to how security policy and file presence are detected and stored.
Repository Activity Snapshot Collection and Storage:
- Extended the `repos` table to include `security_policy_enabled`, `security_file_enabled`, and `snapshot_at` columns, and created a new `repo_activity_snapshot` table to store detailed activity metrics over a 12-month window.
- Added `fetchActivitySnapshot.ts` to query GitHub's GraphQL API for commit, PR, and issue activity, including paging and rate limit handling, and aggregate results into a structured snapshot.
- Added median computation utilities in `computeMedians.ts`, used by the snapshot logic.

Security Metadata Improvements:
- Updated the light repo fetch to read `isSecurityPolicyEnabled` via GraphQL and to independently check for the presence of `SECURITY.md` in both the root and `.github` directories using the GitHub Contents API, storing the results in new fields.

These changes enable more robust and granular tracking of repository activity and security posture for downstream analytics and reporting.
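The `SECURITY.md` probe described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `fetchLightRepo.ts`: the function name `hasSecurityFile` and the injected `HttpGet` type are hypothetical, and treating any status other than 200 or 404 as an error is one possible policy. The only GitHub behavior relied on is that the Contents API endpoint `GET /repos/{owner}/{repo}/contents/{path}` answers 200 for an existing path and 404 for a missing one.

```typescript
// Hypothetical minimal HTTP abstraction so the probe can be tested without
// network access; a real caller would wrap an authenticated client.
type HttpGet = (url: string) => Promise<{ status: number }>;

// Locations checked, in order: repository root, then the .github directory.
const SECURITY_PATHS = ["SECURITY.md", ".github/SECURITY.md"];

async function hasSecurityFile(
  owner: string,
  repo: string,
  httpGet: HttpGet,
): Promise<boolean> {
  for (const path of SECURITY_PATHS) {
    const url = `https://api.github.com/repos/${owner}/${repo}/contents/${path}`;
    const res = await httpGet(url);
    if (res.status === 200) return true; // file exists, stop probing
    if (res.status !== 404) {
      // Assumed policy: surface rate limits and server errors to the caller
      // instead of recording a false "no security file".
      throw new Error(`Unexpected status ${res.status} for ${url}`);
    }
  }
  return false;
}
```

Stopping at the first hit keeps the cost at one request for repositories that keep the file in the root, and two requests otherwise.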
Note
Medium Risk
Adds heavy GitHub API usage and new persistence paths in the enricher worker; snapshot failures are isolated but rate limits can park installations and slow throughput.
Overview
Extends the packages_worker GitHub repo enricher to persist 12-month activity snapshots and security posture fields alongside existing light metadata.
A migration adds `repo_activity_snapshot` (commit/PR/issue counts and median response/merge/close times) plus `repos.security_policy_enabled`, `security_file_enabled`, and `repos.snapshot_at`. New `fetchActivitySnapshot` pulls GitHub GraphQL (summary search + paginated PRs/issues), derives medians via `computeMedians`, and bulk-upserts through `updateRepoActivitySnapshot` (which also stamps `repos.snapshot_at`). The enrichment loop runs snapshots after a successful light fetch, buffers them with repo updates, tolerates snapshot failures without failing the whole repo, and tracks HTTP/rate-limit cost.

Light repo enrichment now records `isSecurityPolicyEnabled` from GraphQL and probes `SECURITY.md` / `.github/SECURITY.md` via the Contents API, persisted in `bulkUpdateEnrichedRepos`.

Reviewed by Cursor Bugbot for commit 70ee920. Bugbot is set up for automated code reviews on this repo.
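The loop behavior described in the overview (snapshot only after a successful light fetch, buffered writes, snapshot failures isolated from the repo update) might be structured along these lines. All names here (`enrichBatch`, `Deps`, `LightRepo`, `Snapshot`, `flushSize`) are hypothetical and the types are heavily simplified; skipping a repo whose light fetch fails is an assumption, and HTTP/rate-limit accounting is omitted.

```typescript
// Hypothetical, simplified records; the PR's real types carry more fields.
interface LightRepo { id: string }
interface Snapshot { repoId: string; commits: number }

// Dependencies are injected so the control flow can be exercised in isolation.
interface Deps {
  fetchLight: (id: string) => Promise<LightRepo>;
  fetchSnapshot: (id: string) => Promise<Snapshot>;
  flush: (repos: LightRepo[], snapshots: Snapshot[]) => Promise<void>;
}

async function enrichBatch(
  repoIds: string[],
  deps: Deps,
  flushSize = 2,
): Promise<{ snapshotFailures: number }> {
  let repoBuffer: LightRepo[] = [];
  let snapshotBuffer: Snapshot[] = [];
  let snapshotFailures = 0;

  for (const id of repoIds) {
    let light: LightRepo;
    try {
      light = await deps.fetchLight(id);
    } catch {
      continue; // assumed: no snapshot attempt without a light result
    }
    repoBuffer.push(light);

    try {
      snapshotBuffer.push(await deps.fetchSnapshot(id));
    } catch {
      snapshotFailures++; // the repo update is still written
    }

    if (repoBuffer.length >= flushSize) {
      await deps.flush(repoBuffer, snapshotBuffer);
      repoBuffer = [];
      snapshotBuffer = [];
    }
  }

  // Write whatever is left over after the last full buffer.
  if (repoBuffer.length > 0) await deps.flush(repoBuffer, snapshotBuffer);
  return { snapshotFailures };
}
```

Flushing both buffers together keeps repo rows and their snapshots in the same write cycle, while the separate `try` around the snapshot fetch is what keeps a snapshot error from discarding the light metadata already collected.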