
Restructuring: top-level layout + workflow/papers/scratch division#197

Draft
cailmdaley wants to merge 112 commits into develop from cleanup/restructuring

cailmdaley commented Jun 5, 2026

Restructuring: top-level layout + workflow / papers / scratch split

Reorganizes sp_validation around one principle — the things you run live at the top — with a clean split between analysis, paper figures, and personal scratch, plus a modular Snakemake workflow built for more than one person. Base branch: develop (untouched).

Layout

sp_validation/
├── src/sp_validation/   library code (incl. the glass_mock core)
├── cosmo_val/           shear / cosmology validation — code + config
├── cosmo_inference/     inference — code + config (cosmosis / cosmocov)
├── workflow/            ALL analysis — modular Snakemake, multi-person → results/
├── papers/              final-figure assembly only — bmodes, catalog, cosmo_val, harmonic
├── scripts/             reduction scripts (+ examples/, glass_mock/)
├── scratch/             per-person ad hoc work, tracked (cdaley/, guerrini/)
├── results/             analysis products + diagnostic plots (contents gitignored)
└── docs/  config/  + tests under src/sp_validation/tests/

Previously cosmo_val was buried inside notebooks/ while cosmo_inference/ sat at the top, so you had to hunt for where each piece lived. Now the things you actually run sit side by side, sharing src/ underneath.

Division of labor

The boundary is the inputs to a paper figure: everything up to that point is analysis, the figure itself is presentation.

  • workflow/ — all analysis. Generic, reusable, modular. Produces analysis products and diagnostic plots into results/. The bulk of the work lives here.
  • papers/{name}/ — final-figure assembly only. Figure PDFs, colour, layout. Tied to one paper; may never touch Snakemake.
  • scratch/{person}/ — personal, ad hoc, tracked. One-off experiments and custom workflows; tracked because seeing each other's scratch is useful.

Modular workflow

Nothing here is computed once — the catalog changed ~20× in the first release, and every paper varies the data vector, covariance, and inference. So the workflow is parameterized: the rules are shared, the config changes per run. Snakemake's module directive imports the rules under your own config and an output prefix, with per-rule override:

```
module analysis:
    snakefile: "../../workflow/Snakefile"
    config:    config           # this run's catalog, cuts, blind
    prefix:    "results/bmodes"  # products land here — no clobbering
use rule * from analysis
```

Each run namespaces under results/{run}/; a --dry-run on every composition guards against silent breakage as the structure grows.
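The dry-run guard can be sketched as a small helper that finds every composition and asks Snakemake to build its DAG without running jobs. This is a minimal sketch that assumes each composition lives at papers/{name}/Snakefile and that the `snakemake` CLI is on PATH. `find_compositions`, `dry_run_command`, and `dry_run_all` are hypothetical names, not the repository's actual guard.

```python
import subprocess
from pathlib import Path


def find_compositions(repo_root):
    """Return every papers/*/Snakefile (assumed location) that composes the workflow."""
    return sorted(Path(repo_root).glob("papers/*/Snakefile"))


def dry_run_command(snakefile):
    """Build a Snakemake dry-run command for one composition."""
    # --dry-run (-n) builds the DAG and checks inputs without executing any job.
    return ["snakemake", "--snakefile", str(snakefile), "--dry-run"]


def dry_run_all(repo_root):
    """Dry-run every composition from the repo root; return the ones that fail."""
    failures = []
    for snakefile in find_compositions(repo_root):
        result = subprocess.run(
            dry_run_command(snakefile), cwd=repo_root, capture_output=True, text=True
        )
        if result.returncode != 0:
            # Keep stderr so the failing rule or missing input is visible in CI.
            failures.append((snakefile, result.stderr))
    return failures
```

A test could then assert `dry_run_all(repo_root) == []`, so a composition whose DAG stops building fails loudly instead of silently.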

Notebooks

The analysis tree is now notebook-free — the top-level notebooks/ directory is gone, and every notebook was moved to a proper home, converted, or deleted:

  • Reusable code → library / scripts. Reduction-notebook logic lives in src/sp_validation/; runnable workflows in scripts/examples/ (e.g. extract_info.py, calibrate_comprehensive_cat.py, leakage_minimal.py).
  • Tutorial → docs. tutorial_UNIONS_SP_v1.0 is now the live Sphinx page "Using the weak-lensing catalogues".
  • Paper-plot notebooks → papers/. The harmonic plot set and the catalog check_gaia plot moved under papers/{harmonic,catalog}/ (kept as notebooks — final-figure assembly).
  • cosmo_val/ resolved. Generic helpers lifted into src/sp_validation/basic.py; the five working notebooks converted to jupytext percent-light scripts under scratch/guerrini/. cosmo_val/ now holds only code + config + README.
  • Obsolete deleted. The exploratory reduction notebooks, glass_mock/validate_glass_mock, and defunct/ (quarantined since 2024) — all recoverable from develop history.
  • Untouched: cosmo_inference/ keeps its own working notebooks in place.
  • Discipline via tooling. nbstripout strips notebook outputs on commit, plus a large-file pre-commit hook (pre-commit install; see CONTRIBUTING).
Full per-notebook ledger (every .ipynb moved or deleted vs develop)

Moved → papers/harmonic/ (from cosmo_inference/notebooks/2D_harmonic_space_cosmic_shear_plots/, preserved as notebooks):
2025_09_26_plot_contours + its variants (_NL_modelling, _blind, _cl_vs_xi, _covariance, _glass_mock, _iNKA_vs_OneCov, _leakage, _scale_cut, _small_vs_large_scales, _weak_lensing), 2025_10_28_plot_whisker, S8_whisker

Moved → papers/catalog/ (from notebooks/cosmo_val/catalog_paper_plot/):
check_gaia

Promoted → docs (from notebooks/analyse_shear_cat/):
tutorial_UNIONS_SP_v1.0 → docs/source/using_the_catalogues.md

Converted → scripts in scratch/guerrini/ (from notebooks/cosmo_val/; reusable helpers chi2_and_pte, corr_from_cov, cov_from_one_covariance lifted to src/sp_validation/basic.py):
compute_pte_cell.py, one_covariance.py, plot_comparison.py, get_prior_leakage.py, exploration.py (+ its namaster_utils.py helper)

Deleted (recoverable from develop history):

  • notebooks/: main_set_up, metacal_global, metacal_local, psf_leakage, maps, maps_local, match_stats, correlation, cosmology, write_cat, frac_error_local_calib, analyse_matched_stars_UNIONS_HSC, analyse_shear_cat/m2_SP_LF_alpha
  • glass_mock/: validate_glass_mock
  • defunct/: TD_WL_cycle2_2021, validation_local_cal

glass_mock

The generation core folded into src/sp_validation/glass_mock.py; the runner scripts moved to scripts/glass_mock/. The top-level glass_mock/ directory is gone.

Still open

Some scratch/ material may still deserve promotion into src/ or workflow/ and was intentionally left untouched in this PR — most notably scratch/guerrini/namaster_utils.py, a cohesive ~200-line NaMaster covariance toolkit that overlaps cosmo_val/harmonic_covariance_gaussian_sims.py. Consolidating the two (and adding tests) is a covariance-API decision best left to Sacha rather than forked here.

Safety net

A back-pressure guard suite holds structural invariants as files move — imports + standalone-scripts/ resolution, snakemake -n dry-runs, config-path existence, symlink integrity, and a dangling-reference / move-map guard. Sacha's develop foundation (paper plots, harmonic configs, library changes) is folded in, with the cosmology.py mnu KeyError fixed.
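As an example of one invariant in this suite, a symlink-integrity check only needs to walk the tree and flag links whose target has disappeared. The sketch below is illustrative. The function name `broken_symlinks` and the choice to skip `.git` are assumptions, not the suite's actual code.

```python
import os
from pathlib import Path


def broken_symlinks(root, skip_dirs=(".git",)):
    """Return every symlink under root whose target no longer exists."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories we never want to scan (modifying dirnames in place).
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # is_symlink() inspects the link itself; exists() follows it to the target.
            if path.is_symlink() and not path.exists():
                broken.append(path)
    return sorted(broken)
```

A guard test then asserts `broken_symlinks(repo_root) == []`, so a move that strands a link fails immediately.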

— Claude on behalf of Cail

sachaguer and others added 30 commits March 3, 2026 15:55
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fold Sacha's pending foundation (PR #192 head, sachaguer:develop @ c22f075)
onto current develop so the restructuring builds on his foundation without
racing his merge gesture (Cail's direction, 2026-06-05).

.gitignore conflict resolved in favour of develop: kept the .felt tracking
block, rejected Sacha's broad cluster bans (*.png *.sh *.fits *.out *.err) —
those get narrowed during the restructuring gitignore pass, not adopted
wholesale. cosmo_val.py / cat_config.yaml auto-merged cleanly (origin's
docstring-RST polish + Sacha's functional changes did not collide).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cosmology.py get_cosmo read planck_defaults["mnu"] but the dict never
defined the key, so every bare get_cosmo() call (no ccl_params, no mnu
arg) raised KeyError: 'mnu'. Add "mnu": PLANCK18["m_nu"] (0.06 eV).

Verified: test_cosmology.py 26/26 pass (was immediate KeyError before).
This is the one blocker that kept Sacha's foundation from running clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
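The failure mode is a fallback chain whose last step reads a key that the defaults dict never defined. The sketch below shows that pattern and the fix. `resolve_mnu` and the `"m_nu"` key inside `ccl_params` are hypothetical. Only the 0.06 eV value and the `"mnu"` defaults key come from the commit above, and the other Planck parameters are omitted.

```python
PLANCK18_M_NU = 0.06  # eV, the value quoted in the commit message

planck_defaults = {
    # ... other Planck 2018 parameters omitted in this sketch ...
    "mnu": PLANCK18_M_NU,  # before the fix this key was missing -> KeyError: 'mnu'
}


def resolve_mnu(mnu=None, ccl_params=None):
    """Pick the neutrino mass: explicit argument, then ccl_params, then the default."""
    if mnu is not None:
        return mnu
    if ccl_params and "m_nu" in ccl_params:
        return ccl_params["m_nu"]
    # A bare call (no mnu, no ccl_params) lands here, so the default key must exist.
    return planck_defaults["mnu"]
```

With the key present, a bare `resolve_mnu()` returns 0.06 instead of raising, which is the behaviour the restored `test_cosmology.py` run exercises.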
docs/source/sp_validation.*.rst are regenerated on every docs build by
sphinx-apidoc (deploy-docs.yml: `sphinx-apidoc -feTMo docs/source
src/sp_validation`), matching the already-ignored fortuna.*/scripts.*
stubs — they should never be committed.

uv.lock: the container is the canonical runtime (CLAUDE.md) and the lockfile
has never been tracked, so ignore it rather than make an unowned
pinned-dependency commitment. Tracking it is a one-line flip if we decide to pin.

Establishes a clean base for the restructuring branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sacha's branch removed the cosmosis_pipeline_glass_mock_0*.ini and
_v0*.ini ignore patterns, which un-ignored ~700 generated glass-mock
pipeline configs in cosmo_inference/cosmosis_config/. Restore the two
specific patterns (not broad bans) so the tree returns to develop's
clean state. These are generated artifacts, never tracked.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cailmdaley and others added 27 commits June 13, 2026 15:56
Purge foreign paths from the dormant inference subsystem: the output root now
derives from COSMO_INFERENCE (this repo's cosmo_inference, via common.py) and
the shell rundir from a COSMO_INFERENCE_RUNDIR constant, instead of
/home/guerrini and the deprecated pure_eb symlink; the external chain/mock dirs
(/n09data/guerrini/*) become config keys (inference.chains_dir,
.glass_mock_data_dir, .glass_mock_chains_dir). grep confirms zero
guerrini/pure_eb/n09data paths remain in inference.smk.

Resolve the pseudo-Cℓ filename schism (consumer adopts the producer's tagged
name): pseudo_cl_assets() now requests
pseudo_cl_{version}_blind={blind}_{binning}_nbins={nbins}.fits and the matching
pseudo_cl_cov_ name, sourcing blind/binning/nbins from a new harmonic.fiducial
config block (A/powspace/32); PSEUDO_CL_DIR fixed COSMO_VAL.parent -> COSMO_VAL
so the DAG edge to the producer forms. Verified on disk that the producer's
tagged files exist at exactly these names (the old bare name does not). Keys
added to both papers/bmodes and papers/cosmo_val configs (both load the workflow).

DAG verified: snakemake --list + inference_fiducial -n build cleanly (only the
expected dormant cov_tau MissingInput); dry-run guard passes. Scope: data
products only, no cosmosis chains. The FITS-CONTENT schema reconciliation
(reader vs producer HDUs) remains for when the subsystem is revived.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
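The consumer-side naming fix comes down to building one tag from the `harmonic.fiducial` block and reusing it for both files. This is a minimal sketch that assumes the covariance file shares the same tag after a `pseudo_cl_cov_` prefix. `pseudo_cl_filenames` is a hypothetical helper, not the workflow's `pseudo_cl_assets()`, and the version string in the usage note is illustrative.

```python
def pseudo_cl_filenames(version, fiducial):
    """Build tagged pseudo-Cl data and covariance filenames (hypothetical helper)."""
    tag = (
        f"{version}_blind={fiducial['blind']}"
        f"_{fiducial['binning']}_nbins={fiducial['nbins']}"
    )
    return f"pseudo_cl_{tag}.fits", f"pseudo_cl_cov_{tag}.fits"


# Mirrors the harmonic.fiducial config block described above (A / powspace / 32).
HARMONIC_FIDUCIAL = {"blind": "A", "binning": "powspace", "nbins": 32}
```

For example, `pseudo_cl_filenames("SP_v1.4.6.3", HARMONIC_FIDUCIAL)` returns `("pseudo_cl_SP_v1.4.6.3_blind=A_powspace_nbins=32.fits", "pseudo_cl_cov_SP_v1.4.6.3_blind=A_powspace_nbins=32.fits")`. Because producer and consumer derive the name from the same config keys, Snakemake can match the file and form the DAG edge.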
Integration test for cosmo_inference/scripts/cosmosis_fitting.py (numpy +
astropy only, fully exercisable in-container). Pins the HDU set for both the
plain-xi and --use-rho-tau paths, the xi+ then xi- data-vector ordering, and
the blocked-covariance offsets (STRT_0..3 = 0, N, 2N, 3N with the tau block
truncated 3N->2N). Teeth confirmed via mutation testing (wrong offset/order/
truncation all turn it red). 9 tests, green in-container. Covers the riskiest,
least-tested code in the inference data-product path; the harmonic-Cℓ
augmentation path is left for a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
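The two invariants the test pins, data-vector ordering and block start offsets, can be sketched in a few lines. Our understanding is that `STRT_i` follows the CosmoSIS 2-point FITS convention of recording where each block starts in the stacked covariance. The helper names below are hypothetical, and the sketch does not reproduce the tau-block truncation handled by `cosmosis_fitting.py`.

```python
def stack_data_vector(xi_plus, xi_minus):
    """Order the data vector as all xi+ entries followed by all xi- entries."""
    return list(xi_plus) + list(xi_minus)


def covariance_offsets(block_lengths):
    """Start index of each block in a block-stacked covariance, keyed STRT_i."""
    offsets, start = {}, 0
    for i, length in enumerate(block_lengths):
        offsets[f"STRT_{i}"] = start
        start += length  # the next block begins where this one ends
    return offsets
```

For four blocks of N entries each, `covariance_offsets([N] * 4)` gives 0, N, 2N, 3N, the offsets the test asserts. A wrong block length shifts every later offset, which is why mutating it turns the test red.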
…t guard

git rm --cached the 27 catalog-paper plot PDFs (~13M) under papers/catalog/plots
(no .tex in this repo references them — the paper TeX lives in the docs/ repo —
so they are regenerable script/notebook outputs) and the 2 tracked glass_mock
cosmosis_config .ini files (already matched by .gitignore but committed before
the pattern existed). Both now gitignored so they won't reappear.

Add test_no_stray_outputs.py: a structural guard asserting no tracked image/data
outputs (png/pdf/fits/npy/...) live under any */scripts/ dir — back-pressure
against the regression that once committed a 761 KB PNG beside a script.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
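A guard like `test_no_stray_outputs.py` can be sketched by listing tracked files with `git ls-files` and flagging output-like suffixes anywhere below a `scripts/` directory. This sketch covers only the four suffixes named above, so extend the set as needed. The function names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath

# Suffixes named in the commit message; the real guard covers more ("...").
OUTPUT_SUFFIXES = {".png", ".pdf", ".fits", ".npy"}


def stray_outputs(tracked_paths):
    """Return tracked paths that look like outputs and sit under any scripts/ dir."""
    offenders = []
    for p in tracked_paths:
        path = PurePosixPath(p)  # git always reports forward-slash paths
        if path.suffix.lower() in OUTPUT_SUFFIXES and "scripts" in path.parts[:-1]:
            offenders.append(p)
    return offenders


def tracked_files(repo_root="."):
    """List files tracked by git in repo_root."""
    out = subprocess.run(
        ["git", "ls-files"], cwd=repo_root, capture_output=True, text=True, check=True
    )
    return out.stdout.splitlines()
```

A pytest function would then assert `stray_outputs(tracked_files()) == []`. Splitting the git call from the check keeps the rule itself testable without a repository.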
…_mock

Tracks the latent bug surfaced by adding GLASS to the container: glass_mock.py's
map path needs a cosmology with comoving_distance that the installed glass+
cosmology pair doesn't provide. Map test xfail'd; fix = pin compatible versions,
verify in the fresh sandbox, drop the xfail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the deprecated pure_eb/ compat-symlink prefix from COSMO_VAL,
COSMO_INFERENCE, CAT_CONFIG, and CV_RUNDIR. Behavior-identical (same
inode; pure_eb -> analyses/shear_2d/bmodes_2d -> code/sp_validation),
but no longer depends on a symlink slated for removal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolves the glass_mock map-path AttributeError (Cosmology.from_camb ->
CambCosmology lacks comoving_distance). glass 2025.1 is the unique version
with the flat API the map path uses AND the legacy cosmo.dc/xm/ef interface
that cosmology 2022.10.9 (its newest release) provides; matter_cls lives in
the separate glass.ext.camb package. Verified under uv: the full map path
runs and is seed-deterministic. Drop the test_glass_mock xfail once the
container is rebuilt with the [glass] extra (gated on the sandbox swap).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase 1 of the lightcone reproduction: root unions analysis + 3 paper
sub-analyses (II bmodes/Daley, III cosmic_shear_2d/Goh, IV harmonic/Guerrini),
shared decisions hoisted to root, stop-at-inference. Decisions + claimed
findings extracted from docs/unions_release TeX and adversarially verified;
astra validate passes. Lives in scratch (Cail's), not papers/. See fiber
unions-astra-reproduction; Phase 2 (review) next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…reads

Record on cosmo-val-workflow that the canonical validation set
(SP_v1.4.6.3 +-leak_corr, npatch=100) is confirmed and include_pseudo_cl
is on by default. File notebooks-cleanup, glass-mock-migration, and
restructuring-docs under sp-validation-restructuring.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
Add short, structural READMEs to workflow/, papers/, cosmo_val/,
scripts/, and results/, each explaining what belongs there and the
boundary with its neighbors. The organizing idea is the inputs to a
paper figure: analysis lives in workflow/, presentation in papers/.
Allow results/README.md past the results/ gitignore.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
Add a Repository layout section to the top-level README and a matching
repository_structure narrative page to the Sphinx docs, both carrying
the target tree and the analysis-vs-presentation division of labor.
Wire the new page into toc.rst (Getting Started) and link it from the
index landing page.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
Remove all .ipynb exploratory notebooks (analysis precursors superseded by
sp_validation modules and scripts/, preserved in git history), plus
tests_bump.py (scratch) and misc.ipybn (junk, typo extension).
Lift the reusable functions out of the top-level glass_mock/ runner scripts
into src/sp_validation/glass_mock.py, alongside the existing generation core:

- downgrade_mask, ia_convergence + growth_factor (from make_unions_glass_sim)
- create_mask_from_catalogue (from create_mask)
- compute_two_point_xi / _cl / _cl_map, get_n_gal_map, TREECORR_CONFIG
  (from compute_two_point_stats_glass)
- compute_leakage_harmony (from compute_leakage_harmony)
- powspace_bins: factor out the NaMaster square-root bandpower binning that
  the harmonic stats and leakage paths duplicated

Heavy deps (healpy/pymaster/treecorr/scipy/astropy) stay lazily imported so
the module and its import guard resolve without the full GLASS stack.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
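Square-root bandpower binning spaces the bin edges uniformly in the square root of the multipole, which gives narrow bins at low ℓ and wider ones at high ℓ. The sketch below illustrates the idea and assumes integer edges and a `(lmin, lmax, nbins)` interface. It is not the signature of the `powspace_bins` added to `sp_validation.glass_mock`.

```python
import numpy as np


def powspace_bins(lmin, lmax, nbins, power=0.5):
    """Bandpower edges evenly spaced in ell**power (power=0.5: square-root binning)."""
    # Space edges uniformly in ell**power, then map back to multipole space.
    edges = np.linspace(lmin**power, lmax**power, nbins + 1) ** (1.0 / power)
    # Round to integer multipoles; drop duplicates that can appear at very low ell.
    edges = np.unique(np.round(edges).astype(int))
    # Return (first ell, exclusive last ell) for each bin.
    return edges[:-1], edges[1:]
```

Arrays in this `(ell_ini, ell_end)` form are the kind NaMaster's `NmtBin.from_edges` accepts. Factoring the rule out once means the harmonic-statistics and leakage paths cannot drift apart.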
… notebook

Per the restructuring principle (library in src/, runners in scripts/), move
the four top-level glass_mock/ CLI scripts to scripts/glass_mock/ as thin
wrappers that import their logic from sp_validation.glass_mock:

- make_unions_glass_sim.py: keeps arg parsing, mask/sampling/FITS I/O; the
  Sky helper methods (downgrade_mask, IA/growth math) now call the library.
- create_mask.py, compute_two_point_stats_glass.py, compute_leakage_harmony.py:
  reduced to argparse + I/O around the library functions.

Delete the exploratory validate_glass_mock.ipynb (hardcoded personal paths,
research scratch) — it stays in git history. The top-level glass_mock/
directory no longer exists.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
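The "thin wrapper" shape is argument parsing and I/O around a single library call. In this sketch `build_mask` is a hypothetical stand-in for a function imported from `sp_validation.glass_mock`, and the `--nside`/`--output` flags are illustrative. The placeholder returns a full-sky mask of 12·nside² HEALPix pixels.

```python
import argparse
from pathlib import Path


def build_mask(nside):
    """Hypothetical stand-in for a library function: a full-sky mask of ones."""
    return [1] * (12 * nside * nside)  # HEALPix has 12 * nside**2 pixels


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Thin CLI around a library function.")
    parser.add_argument("--nside", type=int, default=64)
    parser.add_argument("--output", type=Path, required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    mask = build_mask(args.nside)  # all logic lives in the library
    args.output.write_text("\n".join(map(str, mask)))  # the wrapper only does I/O
    return 0
```

A real script would end with the usual `if __name__ == "__main__": raise SystemExit(main())` guard. Accepting `argv` keeps the wrapper testable without spawning a process.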
- glass_mock.py module docstring now points at scripts/glass_mock/ and notes
  the post-processing helpers it collects.
- test_dangling_move_references.py: correct the now-stale comment that claimed
  the top-level glass_mock/ directory survives; it has been fully removed, but
  the bare string "glass_mock" stays live (workflow, glass_mocks config,
  results/ paths, mock filenames), so it is intentionally not registered as a
  retired path.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
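A move-map guard records retired paths and their new homes, then flags any file that still mentions an old path. This sketch takes file contents as a dict so the rule is easy to test. The one entry comes from the params.py move described in this PR. `dangling_references` is a hypothetical name, and "glass_mock" is deliberately not registered because the bare string is still live.

```python
# Retired path -> new home. "glass_mock" is intentionally absent (still a live string).
MOVE_MAP = {"notebooks/params.py": "scripts/examples/params.py"}


def dangling_references(files, move_map=MOVE_MAP):
    """Return (file, retired_path, new_path) for every stale reference found."""
    hits = []
    for name, text in sorted(files.items()):
        for old, new in move_map.items():
            if old in text:
                hits.append((name, old, new))
    return hits
```

Returning the new path alongside each hit means the failure message already says what the reference should be repointed to.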
Relocate demo, catalogue-reduction, and plotting scripts (jupytext/percent
format) plus the params.py config template and des_y3 example from notebooks/
to scripts/examples/. notebooks/ is now removed entirely.

- Group all demos and reduction scripts under scripts/examples/ to keep the
  doc-scanned top-level scripts/ limited to clean CLI tools.
- Fix two paste-corruption syntax errors in demo_calibrate_minimal_cat.py so
  the file parses (matched unpacking/signature to calibrate_comprehensive_cat).
- Apply ruff safe autofixes (whitespace, unused imports, import sorting,
  f-strings) to the moved files.
Background agents create transient isolated worktrees under
.claude/worktrees/; they must never be tracked.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
Repoint params.py config references (CLAUDE.md, quickstart.rst,
run_validation.md, post_processing.md, prepare_patch_for_spval.sh) at
scripts/examples/params.py, and the extract_information.* reference at the
moved scripts/examples/extract_info.py. Register notebooks/params.py ->
scripts/examples/params.py in the dangling-move-references guard.
Convert the user-facing tutorial_UNIONS_SP_v1.0 notebook (removed in the
notebooks/ cleanup) into a Sphinx User Guide page rather than deleting it.
Updated for the HDF5 catalogue format (>= v1.4.1) and cross-referenced to
sp_validation.calibration.get_calibrated_m_c / get_calibrate_e_from_cat,
which now automate the hand-rolled metacalibration steps.

https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
The cosmo_val/ module was promoted from notebooks/cosmo_val/ during phase-2
with its notebooks riding along untouched. Resolve them per "reasonably
reusable code -> library, the rest -> scratch scripts":

- Lift three generic helpers into src/sp_validation/basic.py:
  chi2_and_pte, corr_from_cov, cov_from_one_covariance.
- Convert the five notebooks to jupytext percent-light .py under
  scratch/guerrini/ (magics guarded like run_cosmo_val.py, hardcoded
  paths preserved verbatim). compute_pte_cell and one_covariance now
  import the lifted helpers and SquareRootScale from sp_validation.rho_tau.
- Move the investigation namaster/ bundle (utils.py -> namaster_utils.py,
  exploration.ipynb -> exploration.py) and repair a latent broken import
  (sp_validation.utils_cosmo_val -> rho_tau) the phase-2 move missed.

cosmo_val/ is now notebook-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
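Two of the lifted helpers are standard linear algebra. χ² = rᵀC⁻¹r for residual r, with the PTE as the χ² survival function. The correlation matrix is C / (σσᵀ), with σ the square root of the diagonal. The sketch below shows these textbook definitions. The signatures are assumptions and may differ from `chi2_and_pte` and `corr_from_cov` in `src/sp_validation/basic.py`, and the default of one degree of freedom per data point assumes no fitted parameters.

```python
import numpy as np
from scipy import stats


def chi2_and_pte(data, model, cov, n_fit=0):
    """chi^2 of data vs model under cov, and its probability to exceed."""
    resid = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    # Solve C x = r rather than inverting C explicitly (more stable).
    chi2 = float(resid @ np.linalg.solve(cov, resid))
    dof = resid.size - n_fit
    pte = float(stats.chi2.sf(chi2, dof))  # survival function = 1 - CDF
    return chi2, pte


def corr_from_cov(cov):
    """Normalise a covariance matrix to a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)
```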
Conversion shipped in e3147b5; record outcome, closed status, and handoff.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ
Four .tldr/status + .tldrignore files (auto-generated by the TLDR tool,
status 'stopped') were committed by accident during the restructuring.
Remove them and add the patterns to .gitignore so they don't return.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011QuJMSPvnpsBkr7PBZVcPQ

3 participants