Pulldasher records a fair amount of issue metadata in the issues table (title, assignee, status, milestone, created/closed dates) but not a few things I'd like for reporting on GitHub stats: who opened the issue, why it was closed, and when it was assigned to its current assignee.
- Author isn't stored at all. The
pulls table already keeps the PR author as owner, but issues have no equivalent.
- Closed reason isn't stored. GitHub distinguishes
completed from not_planned (the "won't do" case) via the issue's state_reason, and that's useful signal we're dropping.
- Date assigned isn't stored. We have the current
assignee but not when they were assigned, so we can't measure how long issues sit before someone picks them up.
Do This
Capture these on issue ingest (webhook + bulk refresh) and backfill history:
Background
author and state_reason are plain fields on the GitHub issue object, so they flow straight through Issue.getFromGH. date_assigned is trickier: the issue object carries no assignment timestamp, so we have to read it from the issue's events (the most recent assigned event for the current assignee). Pulldasher already fetches issue events for label history during a full refresh, so it's available there.
One consequence: today most issue webhooks take a lightweight path that upserts the webhook body directly and never looks at events. To get date_assigned (and to keep label attribution correct), I think we should route issue webhooks through the same full refresh that opened already uses. That's an extra events API call per issue webhook, which seems fine at our issue volume.
The schema change is three columns on issues. Following the pattern from the mergeable column (pulldasher #356 plus the metrics migration in ifixit #46835), the pulldasher repo carries the migration and schema.sql update for posterity, and the actual ALTER runs against the metrics database through an ifixit migration.
Pulldasher records a fair amount of issue metadata in the
issuestable (title, assignee, status, milestone, created/closed dates) but not a few things I'd like for reporting on GitHub stats: who opened the issue, why it was closed, and when it was assigned to its current assignee.pullstable already keeps the PR author asowner, but issues have no equivalent.completedfromnot_planned(the "won't do" case) via the issue'sstate_reason, and that's useful signal we're dropping.assigneebut not when they were assigned, so we can't measure how long issues sit before someone picks them up.Do This
Capture these on issue ingest (webhook + bulk refresh) and backfill history:
authorfrom the issue's GitHubuser.loginstate_reason(completed/not_planned/reopened/ null)date_assigned, the time the current assignee was assignedBackground
authorandstate_reasonare plain fields on the GitHub issue object, so they flow straight throughIssue.getFromGH.date_assignedis trickier: the issue object carries no assignment timestamp, so we have to read it from the issue's events (the most recentassignedevent for the current assignee). Pulldasher already fetches issue events for label history during a full refresh, so it's available there.One consequence: today most issue webhooks take a lightweight path that upserts the webhook body directly and never looks at events. To get
date_assigned(and to keep label attribution correct), I think we should route issue webhooks through the same full refresh thatopenedalready uses. That's an extra events API call per issue webhook, which seems fine at our issue volume.The schema change is three columns on
issues. Following the pattern from themergeablecolumn (pulldasher #356 plus the metrics migration in ifixit #46835), the pulldasher repo carries the migration andschema.sqlupdate for posterity, and the actualALTERruns against themetricsdatabase through an ifixit migration.