Issue
When a compute_worker task is interrupted after the work has started but before the message is acked (worker crash, OOM kill, soft time-out, container restart, network partition with the broker), RabbitMQ requeues the same compute_worker_run payload. A subsequent worker pulls it from the broker and runs the full lifecycle a second time end-to-end.
Observed concrete symptoms:
- The submission row's scoring_worker_hostname gets overwritten with the second worker's hostname, destroying the audit trail of who actually ran the job.
- The submission status cycles backwards: a row already in Finished is PATCHed back through Running → Scoring → Finished, breaking downstream signals (notifications, leaderboard recomputation).
- upload_submission_scores is called a second time and inserts a duplicate SubmissionScore row per leaderboard column. The aggregator Submission.calculate_scores() then crashes with MultipleObjectsReturned on submission.scores.get(column=col), and any subsequent score read can return either of the duplicates non-deterministically.
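
The duplicate-score symptom can be reproduced in isolation. The following is a minimal sketch, not the project's code: `ScoreTable`, `run_task` and `deliver_twice` are hypothetical stand-ins for the SubmissionScore table, the worker lifecycle and a broker redelivery, and the local `MultipleObjectsReturned` class only mimics Django's exception of the same name. It assumes the first run completes its writes before it is interrupted.

```python
from dataclasses import dataclass, field


class MultipleObjectsReturned(Exception):
    """Local stand-in for Django's exception of the same name."""


@dataclass
class ScoreTable:
    """Hypothetical in-memory stand-in for the SubmissionScore table."""
    rows: list = field(default_factory=list)  # (submission_id, column, value)

    def insert(self, submission_id, column, value):
        # Unconditional insert, as a non-idempotent upload would do.
        self.rows.append((submission_id, column, value))

    def upsert(self, submission_id, column, value):
        # Keyed write: replace any existing row for (submission_id, column).
        key = (submission_id, column)
        self.rows = [r for r in self.rows if r[:2] != key]
        self.rows.append((submission_id, column, value))

    def get(self, submission_id, column):
        # Mimics QuerySet.get(): exactly one match, otherwise an exception.
        matches = [r for r in self.rows if r[:2] == (submission_id, column)]
        if len(matches) > 1:
            raise MultipleObjectsReturned(f"{len(matches)} rows for {column!r}")
        if not matches:
            raise LookupError(f"no row for {column!r}")
        return matches[0][2]


def run_task(table, write, payload):
    """One full task lifecycle; `write` selects the write strategy."""
    for column, value in payload["scores"].items():
        write(table, payload["submission_id"], column, value)


def deliver_twice(write):
    """Simulate a requeue: the same payload is processed by two workers."""
    table = ScoreTable()
    payload = {"submission_id": 1, "scores": {"accuracy": 0.9}}
    run_task(table, write, payload)  # first worker, interrupted before the ack
    run_task(table, write, payload)  # second worker, same redelivered payload
    return table
```

With `deliver_twice(ScoreTable.insert)` the table ends up with two rows for the same column and `get` raises, matching the crash described above. With `deliver_twice(ScoreTable.upsert)` the second run overwrites the first and `get` succeeds. The keyed write is shown only as a contrast; it does not address the hostname overwrite or the status regression.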