Archive pipeline runs and artifacts to HuggingFace (#642)
Publish full run records (metadata, diagnostics, intermediate build artifacts) to a dedicated HF model repo (`PolicyEngine/policyengine-us-data-pipeline`) so run history survives Modal volume deletion. All existing uploads to the main data repo are unchanged.

- Add `upload_to_pipeline_repo()` utility with retry and batching
- Mirror meta.json, diagnostics, and validation files on every write
- Archive Step 1 intermediate artifacts (acs, puf, extended_cps, etc.)
- Archive Step 2 calibration_package_meta.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
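The "mirror on every write" pattern from the commit above can be sketched as a write-then-mirror helper. This is a hypothetical illustration, not the PR's actual code: the function name matches `write_run_meta`, but the `mirror_fn` parameter and signature are assumptions, with the real HF upload abstracted behind a callable.

```python
import json
from pathlib import Path
from typing import Callable


def write_run_meta(
    run_dir: Path,
    meta: dict,
    mirror_fn: Callable[[Path], None],
    mirror: bool = True,
) -> Path:
    """Write meta.json for a run, then optionally mirror the written file.

    Failure handlers would pass mirror=False so that recording an error can
    never hang on the same network problem that caused the error.
    """
    path = run_dir / "meta.json"
    path.write_text(json.dumps(meta, indent=2))
    if mirror:
        # Mirror the file we just wrote; the callable stands in for the
        # actual upload to the pipeline archive repo.
        mirror_fn(path)
    return path
```

The key design point is that the local write always happens first, so the volume copy is never blocked by archive availability.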
- Add 10-minute subprocess timeout to `_mirror_to_pipeline_repo`
- Pass data via env vars instead of f-string code injection
- Narrow exception handling to SubprocessError/OSError
- Add `mirror=False` param to `write_run_meta` for error handlers
- Validate `HUGGING_FACE_TOKEN` before attempting uploads
- Extract `_batched_hf_upload` shared helper (dedup ~30 lines)
- Consolidate `_archive_build`/`package_artifacts` into `_archive_artifacts`
- Fix TOCTOU race between `exists()` and `stat()` with try/except
- Build validation mirror list inline instead of re-probing filesystem
- Add proper type hints to new functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
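The subprocess hardening described above (hard timeout, env-var data passing, narrow exception handling) can be illustrated with a minimal sketch. The function name echoes `_mirror_to_pipeline_repo`, but everything here is an assumption: the real upload work is replaced by a placeholder child script that just reads its environment.

```python
import os
import subprocess
import sys


def mirror_to_pipeline_repo(
    run_id: str, file_path: str, timeout_s: int = 600
) -> bool:
    """Best-effort mirror to the archive repo in a child process.

    The child receives its inputs via environment variables rather than
    values interpolated into its source, so a hostile path cannot inject
    code. A hard timeout bounds hangs, and only subprocess/OS errors are
    swallowed, keeping the mirror non-fatal without masking real bugs.
    """
    env = dict(os.environ, MIRROR_RUN_ID=run_id, MIRROR_FILE=file_path)
    # Placeholder child script; a real version would perform the HF upload.
    child_src = (
        "import os; "
        "print(os.environ['MIRROR_RUN_ID'], os.environ['MIRROR_FILE'])"
    )
    try:
        subprocess.run(
            [sys.executable, "-c", child_src],
            env=env,
            timeout=timeout_s,  # the PR describes a 10-minute cap
            check=True,
        )
        return True
    except (subprocess.SubprocessError, OSError):
        return False
```

Returning `False` instead of raising is what makes the mirror non-fatal: a failed archive upload never aborts the pipeline run that produced the data.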
## Summary
Publish full run records (metadata, diagnostics, intermediate build artifacts) to a dedicated HF model repo (`PolicyEngine/policyengine-us-data-pipeline`) so run history survives Modal volume deletion. All existing uploads to `policyengine/policyengine-us-data` are unchanged.

Fixes #641
## Changes
**`policyengine_us_data/utils/data_upload.py`**
- Extract `_batched_hf_upload()` shared helper from `upload_to_staging_hf` (deduplicates ~30 lines of batch/retry logic)
- Add `upload_to_pipeline_repo()` thin wrapper targeting `PolicyEngine/policyengine-us-data-pipeline`
- Validate `HUGGING_FACE_TOKEN` before attempting uploads (fail-fast instead of a misleading 401 after retries)

**`modal_app/pipeline.py`**
- `_mirror_to_pipeline_repo()` — non-fatal subprocess wrapper with a 10-minute timeout, env-var data passing (no f-string injection), and narrow exception handling
- `_archive_artifacts()` — unified helper for archiving named artifacts from the pipeline volume
- `write_run_meta()` gains a `mirror` param (set to `False` in the failure handler to prevent hanging on network errors)
- `archive_diagnostics()` mirrors diagnostic files to the pipeline repo after volume commit
- `_write_validation_diagnostics()` builds the mirror list inline as files are written (no re-probing of the filesystem)
- Archive Step 1 intermediates (`STEP1_ARTIFACTS`) and Step 2 metadata (`STEP2_ARTIFACTS`) after each step completes

**`changelog.d/pipeline-hf-archival.added.md`** — changelog entry

## Test plan
- Create the `PolicyEngine/policyengine-us-data-pipeline` model repo on HuggingFace (one-time)
- Run `make pipeline` on Modal with a small epoch count
- Verify `{run_id}/meta.json`, `{run_id}/diagnostics/*`, and `{run_id}/artifacts/*` appear in the pipeline repo
- Verify existing uploads to `policyengine/policyengine-us-data` are unchanged

🤖 Generated with Claude Code
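As a closing illustration of the batch/retry logic that `_batched_hf_upload()` deduplicates, here is a hedged sketch. The name `batched_upload_with_retry`, its parameters, and the exception choice are all assumptions; the actual HF upload call is abstracted behind the `upload_batch` callable. The token fail-fast check mirrors the PR's `HUGGING_FACE_TOKEN` validation.

```python
import os
import time
from typing import Callable, Sequence


def batched_upload_with_retry(
    paths: Sequence[str],
    upload_batch: Callable[[Sequence[str]], None],
    batch_size: int = 50,
    retries: int = 3,
    backoff_s: float = 1.0,
) -> None:
    """Upload files in fixed-size batches, retrying each batch with
    exponential backoff. The real HF call is hidden behind upload_batch."""
    if not os.environ.get("HUGGING_FACE_TOKEN"):
        # Fail fast: without a token, every batch would 401 after retries.
        raise RuntimeError("HUGGING_FACE_TOKEN is not set")
    for start in range(0, len(paths), batch_size):
        batch = paths[start : start + batch_size]
        for attempt in range(retries):
            try:
                upload_batch(batch)
                break  # this batch succeeded; move to the next one
            except OSError:
                if attempt == retries - 1:
                    raise  # retries exhausted for this batch
                time.sleep(backoff_s * (2**attempt))
```

Retrying per batch rather than per run means one transient network error only replays a small slice of files, not the entire artifact set.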