Postmortem: Workshop Invitation Batch Failures (March–April 2026) #2559
mroderick started this conversation in Maintenance
I suspect that there might still be a gremlin in the system. For the upcoming workshop in Berlin, I want to compare the invitations in the database against the chapter members, to double check that, even with the invitation log bugs we experienced, the system still can send, and has sent, invitations to everyone eligible.
Further evidence: https://app.rollbar.com/a/codebar-production/fix/item/codebar-production/674#detail
I'll take a closer look over the weekend.
Postmortem: Workshop Invitation Batch Failures (March–April 2026)
Date: 2026-04-10
Status: Resolved
Severity: High
Author: Morgan Roderick
Executive Summary
Between 2026-03-31 and 2026-04-09, workshop invitation batches failed repeatedly with the error "Member has already been taken." These failures prevented thousands of codebar members across 46 chapters from receiving workshop invitations.
The root cause traced back to a refactoring in May 2018 that accidentally removed deduplication logic from a database query. When the invitation logging system launched in April 2026, it surfaced a bug that had existed silently for seven years. Additional contributing issues were discovered during investigation, and all were addressed across five pull requests.
All three bugs have been resolved. The system now reliably sends invitations across all chapters.
Why This Went Unreported for Seven Years
This section explains why a bug introduced in May 2018 remained hidden until April 2026.
The Bug Chain
The "Member has already been taken" error requires two separate bugs to interact:
Missing `.distinct` in queries produces duplicate member rows
Either bug alone does not produce the error:
`find_or_create_by` handles duplicates
Why Single-Audience Sends Didn't Trigger the Bug
When sending invitations to students only:
`find_or_create_by` → finds existing invitation, returns it
When sending to everyone (both audiences):
Additional Factor: UI Workflow
The admin interface at `app/views/admin/workshops/send_invites.html.haml` explicitly encourages sending to one audience at a time:
%p If you click "Students" now, you can come back and send them to the coaches at a separate time, and vice versa.
The "Everybody" button exists but sees less use than the individual audience buttons. This reduced the frequency of the trigger condition.
Timeline
January 2017: original code uses `.uniq` for deduplication (commit 448acc7c)
May 2018: refactoring omits `.uniq`, introducing the dormant bug (commit d7a463da)
PR #2551 adds `.distinct` to fix duplicate member processing
Root Cause Analysis
Three distinct bugs produced the same error message. This section explains each.
Bug 1: Missing Query Deduplication
Affected file: `app/models/invitation_manager.rb`
The `chapter_students` and `chapter_coaches` methods query the database using a JOIN on the subscriptions table. When a member subscribes to multiple groups within a chapter, the JOIN produces duplicate rows. In the London chapter, 336 members had this issue.
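The duplicate rows are easy to reproduce outside the database. The sketch below is plain Ruby, not the project's query: `Member`, `Subscription`, `joined_members`, and the sample data are hypothetical stand-ins that mimic an inner join returning one row per matching subscription.

```ruby
# Plain-Ruby stand-in for the JOIN; no database or Rails involved.
# All names and sample values here are hypothetical.
Member = Struct.new(:id, :name)
Subscription = Struct.new(:member_id, :group_id)

SAMPLE_MEMBERS = [Member.new(1, "Ada"), Member.new(2, "Grace")].freeze

# Member 1 is subscribed to two groups in the same chapter.
SAMPLE_SUBSCRIPTIONS = [
  Subscription.new(1, 10),
  Subscription.new(1, 11),
  Subscription.new(2, 10)
].freeze

# Emulates an inner join: one output row per matching subscription,
# so a member with two subscriptions appears twice.
def joined_members(members, subscriptions)
  subscriptions.flat_map do |sub|
    members.select { |member| member.id == sub.member_id }
  end
end

joined = joined_members(SAMPLE_MEMBERS, SAMPLE_SUBSCRIPTIONS)
puts joined.map(&:id).inspect       # [1, 1, 2]
puts joined.uniq.map(&:id).inspect  # [1, 2]
```

Deduplicating the result, as `.uniq` does here and `.distinct` does at the query level, means each member is processed once.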
The original code in January 2017 used `.uniq` to deduplicate. The May 2018 refactoring extracted this to a scope but omitted the deduplication.
This bug lay dormant for seven years because invitation batches typically targeted single audiences ("students" OR "coaches"). The bug only surfaced when sending to "everyone" (both audiences), which processes more members and triggers the duplicate rows.
Fix applied: PR #2551 adds `.distinct` to the query methods.
Bug 2: Subscription Race Conditions
Affected table: `subscriptions`
The model validation `validates :group, uniqueness: { scope: :member_id }` prevents duplicate subscriptions at the application level. However, it cannot prevent race conditions where two concurrent requests both pass validation before either saves.
Analysis of production data revealed 25 unique member/group combinations with duplicate subscriptions created within 3 milliseconds to 2.5 seconds of each other. These spanned 2015 to 2023.
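The check-then-write gap can be illustrated without threads by interleaving two requests by hand. This is a minimal sketch under stated assumptions: `SubscriptionTable` is a hypothetical in-memory stand-in, not the real table or model, and the `unique_index:` flag only imitates what a database-level unique index would enforce.

```ruby
# Hypothetical in-memory stand-in for the subscriptions table.
class SubscriptionTable
  DuplicateKey = Class.new(StandardError)

  attr_reader :rows

  def initialize(unique_index: false)
    @rows = []
    @unique_index = unique_index
  end

  # Application-level check, like a uniqueness validation:
  # a read that happens before the write.
  def valid?(member_id, group_id)
    !@rows.include?([member_id, group_id])
  end

  # The write itself. With the index enabled, duplicates are rejected
  # here regardless of what an earlier check reported.
  def insert(member_id, group_id)
    key = [member_id, group_id]
    raise DuplicateKey, key.inspect if @unique_index && @rows.include?(key)

    @rows << key
  end
end

# Two requests interleaved by hand: both validate before either inserts.
def simulate_double_submit(table)
  first_ok = table.valid?(7, 3)
  second_ok = table.valid?(7, 3)
  table.insert(7, 3) if first_ok
  begin
    table.insert(7, 3) if second_ok
  rescue SubscriptionTable::DuplicateKey
    # The second write is rejected by the simulated index.
  end
  table.rows.count([7, 3])
end

without_index = simulate_double_submit(SubscriptionTable.new)
with_index = simulate_double_submit(SubscriptionTable.new(unique_index: true))
puts without_index # 2
puts with_index    # 1
```

Both requests pass validation in either case; only the write-time constraint stops the second row.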
Fix applied: PR #2553 adds a database-level unique index on `(member_id, group_id)` and cleans up 31 existing duplicate records.
Bug 3: Cross-Status Retry in InvitationLogger
Affected file: `app/services/invitation_logger.rb`
The `InvitationLogEntry` model validates uniqueness on `(member_id, invitation_type, invitation_id)`. Status is NOT part of the constraint, so a member cannot have both a "success" entry and a "failure" entry for the same invitation: a second entry that differs only in status violates the constraint.
The original retry logic used:
When an entry existed with status `failed` but the code requested status `success`, the query returned the existing (wrong-status) entry. The subsequent save attempt triggered the uniqueness validation error.
Fix applied: PRs #2556 and #2558 refactor the logger to check for existing entries without including status in the query.
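A status-agnostic lookup can be sketched in plain Ruby. Everything below is hypothetical: `InMemoryInvitationLog` and `LogEntry` are illustrative stand-ins rather than the project's `InvitationLogger`, the `"WorkshopInvitation"` value is invented, and the explicit status update on retry is an assumption about how a caller might handle an existing entry.

```ruby
# Hypothetical in-memory log; not the project's InvitationLogger.
LogEntry = Struct.new(:member_id, :invitation_type, :invitation_id, :status)

class InMemoryInvitationLog
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Looks up by the uniqueness key only; status is deliberately left out.
  # The block runs only when a new entry is created.
  def find_or_create_by(member_id:, invitation_type:, invitation_id:)
    key = [member_id, invitation_type, invitation_id]
    existing = @entries.find do |entry|
      [entry.member_id, entry.invitation_type, entry.invitation_id] == key
    end
    return existing if existing

    entry = LogEntry.new(member_id, invitation_type, invitation_id, nil)
    yield entry if block_given?
    @entries << entry
    entry
  end
end

log = InMemoryInvitationLog.new
lookup = { member_id: 42, invitation_type: "WorkshopInvitation", invitation_id: 9 }

first_attempt = log.find_or_create_by(**lookup) { |entry| entry.status = "failed" }
retry_attempt = log.find_or_create_by(**lookup) { |entry| entry.status = "success" }
retry_attempt.status = "success" # assumed: the caller updates the existing entry

puts log.entries.size                     # 1
puts first_attempt.equal?(retry_attempt)  # true
```

Because the lookup key matches the uniqueness constraint, a retry finds the earlier entry instead of attempting a second insert.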
Community review (olleolleolle) suggested using `find_or_create_by` with a block, which was implemented in the final version for cleaner, more idiomatic Rails code.
Impact
Chapters Affected
46 of 54 chapters (85%) had members with duplicate subscriptions.
Failed Batches Observed
In each failed batch, the remaining members never received invitations.
What Went Well
Systematic root cause isolation — The team methodically separated the three bugs despite identical error messages.
Community contributions — Review feedback improved the final fix, demonstrating the value of open-source collaboration.
Production data analysis — Queries against production identified the full scope: 1,200 affected members across 46 chapters.
Complementary fixes — Each PR addressed a distinct aspect, ensuring the solution is robust across different scenarios.
Prevention
Query Deduplication Tests
Add test coverage for methods that return members from JOIN queries. Ensure tests cover the case where a member appears in multiple groups.
Action: Review existing specs for `chapter_students` and `chapter_coaches`, and add edge case coverage.
Database-Level Uniqueness
All polymorphic associations with user-facing uniqueness constraints should have database-level unique indexes.
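One way to find such gaps is to compare the column sets that models validate as unique with the column sets covered by unique indexes. The sketch below works on plain hashes with invented sample data; the `widgets` table is hypothetical, and how the two inputs would be collected from a real application is left open.

```ruby
# Invented sample inputs: table name => list of column sets.
validated_unique = {
  "subscriptions" => [%w[group_id member_id]],
  "widgets" => [%w[name]]
}
unique_indexes = {
  "subscriptions" => [],
  "widgets" => [%w[name]]
}

# Returns [table, columns] pairs that are validated as unique but have
# no unique index on the same set of columns (column order is ignored).
def missing_unique_indexes(validations, indexes)
  validations.flat_map do |table, column_sets|
    indexed = indexes.fetch(table, []).map(&:sort)
    column_sets
      .reject { |columns| indexed.include?(columns.sort) }
      .map { |columns| [table, columns] }
  end
end

gaps = missing_unique_indexes(validated_unique, unique_indexes)
gaps.each do |table, columns|
  puts "#{table}: no unique index on (#{columns.join(', ')})"
end
```

With this sample data the only gap reported is `subscriptions`, on `(group_id, member_id)`.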
Action: Audit models for `validates :uniqueness` without corresponding database indexes.
Post-Deploy Verification
Include verification queries in deployment runbooks to confirm expected state after schema or model changes.
Example query:
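A check of this kind can be sketched in plain Ruby over in-memory rows rather than SQL. The helper name `duplicate_pairs` and the sample rows are invented for illustration; the logic corresponds roughly to grouping by `(member_id, group_id)` and keeping groups with more than one row.

```ruby
# Returns each [member_id, group_id] pair that occurs more than once,
# mapped to its number of occurrences.
def duplicate_pairs(subscription_rows)
  subscription_rows.tally.select { |_pair, count| count > 1 }
end

# Invented sample rows: [member_id, group_id].
before_cleanup = [[1, 10], [2, 10], [1, 10], [3, 11]]
after_cleanup = before_cleanup.uniq

puts duplicate_pairs(before_cleanup).keys.inspect # [[1, 10]]
puts duplicate_pairs(after_cleanup).empty?        # true
```

After a cleanup and a unique index, the expected state is an empty result.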
Acknowledgments
Thank you to all contributors who helped diagnose and fix these issues:
Related Pull Requests