Commit 51d2eea
feat(arclight#29): Refactor run orchestration for threaded and single-scope runs
Restructured the pipeline so that collections and creators run independently,
each with its own timestamp and proper cleanup, and so that full runs execute
both in parallel via ThreadPoolExecutor.
Changes:
- Split last_updated into last_updated_collections and last_updated_creators
- Extract run_collections() and run_creators() from monolithic run()
- Add run_all() that orchestrates both via ThreadPoolExecutor
- Scope Solr cleanup to record type using is_creator flag
- Update process_deleted_records() to accept scope parameter
- Move update_repositories() into run_all() (only runs for full updates)
- Fix timestamp comparisons to use min() where needed
- Add directory creation safeguards (os.makedirs with exist_ok)
- Change is_creator from string 'true' to boolean true
- Add proper exception handling in parallel execution
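The orchestration described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the bodies of `run_collections()` and `run_creators()` are hypothetical stand-ins, and the `state` dict, the `work_dir` argument, and the timestamp values are assumptions made only so that the example runs on its own.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_collections(state):
    # Hypothetical stand-in: the real function would process collections.
    state["last_updated_collections"] = 100  # placeholder timestamp
    return "collections"


def run_creators(state):
    # Hypothetical stand-in: the real function would process creators.
    state["last_updated_creators"] = 200  # placeholder timestamp
    return "creators"


def run_all(state, work_dir):
    # Directory creation safeguard: no error if the directory already exists.
    os.makedirs(work_dir, exist_ok=True)

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Map each future back to its scope so failures can be attributed.
        futures = {
            pool.submit(run_collections, state): "collections",
            pool.submit(run_creators, state): "creators",
        }
        for future in as_completed(futures):
            scope = futures[future]
            try:
                results[scope] = future.result()
            except Exception as exc:
                # A failure in one pipeline does not discard the other's result.
                errors[scope] = exc
    return results, errors


state = {}
work_dir = os.path.join(tempfile.gettempdir(), "arclight_sketch")
results, errors = run_all(state, work_dir)
```

Collecting exceptions per scope, rather than letting the first one propagate, lets the caller see which pipeline failed while still keeping the other pipeline's result and its separately tracked timestamp.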
Benefits:
- Collections and creators can be rebuilt independently (--collections-only, --agents-only)
- Full runs execute both pipelines in parallel (faster)
- Each record type maintains its own timestamp state
- Solr cleanup is scoped to avoid deleting unrelated records
1 parent 5952798 commit 51d2eea
3 files changed
Lines changed: 191 additions & 90 deletions