Description
Bug
Computed.populate(*restrictions, reserve_jobs=True) ignores the *restrictions argument. It processes ALL pending jobs in the jobs table instead of only those matching the restriction.
How we found it
We have a Registration table with ~730 pending keys. We wanted to populate only a subset (5 keys) matching a filter:
restriction = ProcessScan & "pipeline_preset IS NOT NULL"
# Correctly shows 5 pending:
pending = restriction - Registration
print(len(pending)) # 5
# But this processes all 730 pending keys, not just 5:
Registration.populate(restriction, reserve_jobs=True, display_progress=True)
The progress bar shows 0/730, and it starts computing keys that don't match the restriction at all.
Diagnosis
In autopopulate.py, populate() delegates to _populate_distributed() when reserve_jobs=True. Looking at that method (~line 460):
def _populate_distributed(self, *restrictions, ...):
    ...
    if refresh:
        self.jobs.refresh(*restrictions, priority=priority, delay=-1)  # restrictions used here
    # But here, restrictions are NOT applied:
    pending_query = self.jobs.pending & "scheduled_time <= CURRENT_TIMESTAMP(3)"
    keys = pending_query.keys(order_by="priority ASC, scheduled_time ASC", limit=max_calls)
The *restrictions are passed to self.jobs.refresh() (which creates/updates job entries), but when fetching pending keys to process (line ~499), it queries self.jobs.pending without any restriction filter. So it picks up every pending job in the table regardless of what was passed to populate().
In contrast, _populate_direct() (used when reserve_jobs=False) correctly computes (key_source & restrictions) - target, so the restriction works as expected there.
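The difference between the two code paths can be illustrated with a toy model that uses plain Python sets in place of DataJoint queries. Everything in it (the integer keys, `select_direct`, `select_distributed`) is hypothetical and only mimics the selection logic described above; it is not DataJoint code.

```python
# Toy model: integers stand in for primary keys, sets stand in for queries.
key_source = set(range(730))          # all keys that could be populated
target = set()                        # nothing has been computed yet
restriction = {3, 17, 42, 100, 256}   # the 5 keys we actually want


def select_direct(key_source, target, restriction):
    """Mimics (key_source & restrictions) - target."""
    return (key_source & restriction) - target


def select_distributed(jobs_pending, restriction):
    """Mimics the reported behavior: the restriction is accepted but never applied."""
    return set(jobs_pending)


# Assumption: the jobs table already holds an entry for every pending key.
jobs_pending = key_source - target

direct_keys = select_direct(key_source, target, restriction)
distributed_keys = select_distributed(jobs_pending, restriction)

print(len(direct_keys))       # 5
print(len(distributed_keys))  # 730
```

The two counts correspond to the 5 keys reported by the manual check and the 0/730 shown by the progress bar.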
Workaround
Use reserve_jobs=False:
Registration.populate(restriction, reserve_jobs=False)  # works correctly
Expected behavior
populate(*restrictions, reserve_jobs=True) should only process jobs matching the restrictions, consistent with reserve_jobs=False behavior.
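As a rough sketch of that expectation, the pending-job selection could filter by the restriction before ordering and limiting. The example below is a self-contained model using plain dicts; `select_pending`, the `status` field, and the `allowed_keys` set are hypothetical names for illustration and do not reflect DataJoint's actual job schema or API.

```python
from datetime import datetime, timedelta


def select_pending(jobs, allowed_keys, now, max_calls=None):
    """Return keys of due pending jobs that match the restriction,
    ordered by priority and then by scheduled time (both ascending)."""
    due = [
        job for job in jobs
        if job["status"] == "pending"
        and job["scheduled_time"] <= now
        and job["key"] in allowed_keys      # the step the report says is missing
    ]
    due.sort(key=lambda job: (job["priority"], job["scheduled_time"]))
    keys = [job["key"] for job in due]
    return keys if max_calls is None else keys[:max_calls]


now = datetime(2024, 1, 1, 12, 0, 0)
# Ten pending jobs with equal priority; a higher key means an earlier scheduled time.
jobs = [
    {"key": k, "status": "pending", "priority": 5,
     "scheduled_time": now - timedelta(minutes=k)}
    for k in range(10)
]

selected = select_pending(jobs, allowed_keys={2, 7}, now=now)
print(selected)  # [7, 2]
```

Only the two matching keys are returned, even though all ten jobs are pending and due.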
Version
DataJoint 2.1.1