Add create_obs_over_exp_cooler() and CLI tool (sandbox) #564

Open
Phlya wants to merge 3 commits into master from sandbox/obs-over-exp-cooler

Phlya (Member) commented Mar 19, 2026

Summary

Add a function and standalone CLI tool that divides a cooler's contact matrix by expected values and saves the O/E (observed/expected) ratios to a new cooler file.

New files

cooltools/sandbox/obs_over_exp_cooler.py (extended)

  • create_obs_over_exp_cooler(): Streams pixel chunks from an input cooler, divides balanced values by cis/trans expected, and writes a new .cool file via cooler.create_cooler().
  • Supports pre-computed or on-the-fly expected computation
  • Cis-only mode (expected_trans_df=False): trans pixels are written as NaN
  • Smoothed expected columns with automatic trans fallback to base column
  • Configurable chunksize for memory-efficient streaming
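
The per-chunk division described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name obs_over_exp_chunk and the bin_region lookup array are hypothetical, the expected tables are assumed to use the region1/region2/dist layout with a balanced.avg value column, and the cis distance is assumed to be simply bin2_id - bin1_id.

```python
import numpy as np
import pandas as pd


def obs_over_exp_chunk(pixels, bin_region, cis_exp, trans_exp=None,
                       obs_col="balanced", exp_col="balanced.avg"):
    """Divide one chunk of pixels by expected (hypothetical helper).

    pixels     : DataFrame with bin1_id, bin2_id and an observed value column
    bin_region : array mapping bin id -> region name (assumed lookup)
    cis_exp    : DataFrame with region1, region2, dist and exp_col
    trans_exp  : DataFrame with region1, region2 and exp_col, or None (cis-only)
    """
    df = pixels[["bin1_id", "bin2_id", obs_col]].copy()
    df["region1"] = bin_region[df["bin1_id"].to_numpy()]
    df["region2"] = bin_region[df["bin2_id"].to_numpy()]
    df["dist"] = df["bin2_id"] - df["bin1_id"]
    is_cis = (df["region1"] == df["region2"]).to_numpy()

    # Left merges keep the pixel order of the chunk.
    cis = df.merge(cis_exp[["region1", "region2", "dist", exp_col]],
                   on=["region1", "region2", "dist"], how="left")[exp_col]
    # Trans pixels stay NaN unless a trans expected table is supplied.
    expected = np.where(is_cis, cis.to_numpy(), np.nan)
    if trans_exp is not None:
        trans = df.merge(trans_exp[["region1", "region2", exp_col]],
                         on=["region1", "region2"], how="left")[exp_col]
        expected = np.where(is_cis, expected, trans.to_numpy())

    df["obs_exp"] = df[obs_col].to_numpy() / expected
    return df[["bin1_id", "bin2_id", "obs_exp"]]


# Toy data: bins 0 and 1 belong to chr1, bin 2 to chr2.
bin_region = np.array(["chr1", "chr1", "chr2"])
pixels = pd.DataFrame({
    "bin1_id": [0, 0, 0, 2],
    "bin2_id": [0, 1, 2, 2],
    "balanced": [4.0, 1.0, 0.5, 2.0],
})
cis_exp = pd.DataFrame({
    "region1": ["chr1", "chr1", "chr2"],
    "region2": ["chr1", "chr1", "chr2"],
    "dist": [0, 1, 0],
    "balanced.avg": [2.0, 0.5, 4.0],
})
trans_exp = pd.DataFrame({
    "region1": ["chr1"], "region2": ["chr2"], "balanced.avg": [0.25],
})

oe_full = obs_over_exp_chunk(pixels, bin_region, cis_exp, trans_exp)
oe_cis_only = obs_over_exp_chunk(pixels, bin_region, cis_exp, None)
```

In the cis-only call the single trans pixel (bin 0 vs. bin 2) comes out as NaN, which mirrors the expected_trans_df=False behaviour listed above.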

cooltools/sandbox/cli_obs_over_exp.py (new)

Standalone Click CLI wrapping the function. Key options:

  • --cis-expected / --trans-expected: Paths to pre-computed TSV files
  • --no-trans: Skip trans O/E computation
  • --smooth / --aggregate-smoothed: Smoothing when computing on the fly
  • --view: BED file with genomic regions
  • Auto-adjusts expected_value_col when smoothing is requested on the fly

Can be run as: python -m cooltools.sandbox.cli_obs_over_exp --help
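
As a rough illustration of the expected_value_col auto-adjustment, the logic could look something like the sketch below. The helper name resolve_expected_value_col and its arguments are hypothetical and not taken from the CLI code; the sketch only assumes that smoothed columns are named by appending ".smoothed" or ".smoothed.agg" to the base column, as in the usage example's balanced.avg.smoothed.agg.

```python
def resolve_expected_value_col(expected_value_col, smooth=False,
                               aggregate_smoothed=False, precomputed=False):
    """Pick the expected column to divide by (hypothetical helper).

    Assumption: when expected is computed on the fly with smoothing,
    the smoothed column is the base name plus a suffix.
    """
    # Pre-computed tables are used as given; nothing to adjust.
    if precomputed or not smooth:
        return expected_value_col
    # The user already asked for a smoothed column explicitly.
    if expected_value_col.endswith((".smoothed", ".smoothed.agg")):
        return expected_value_col
    suffix = ".smoothed.agg" if aggregate_smoothed else ".smoothed"
    return expected_value_col + suffix


col_plain = resolve_expected_value_col("balanced.avg")
col_smooth = resolve_expected_value_col("balanced.avg", smooth=True)
col_agg = resolve_expected_value_col("balanced.avg", smooth=True,
                                     aggregate_smoothed=True)
```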

tests/test_obs_over_exp_cooler.py (new)

16 tests organized in 3 classes:

  • TestCreateObsOverExpCooler (9 tests): API-level tests covering precomputed expected, cis-only, smoothed columns, on-the-fly computation, no-view fallback, trans fallback column, invalid column error, no-weights in output, and O/E values near 1
  • TestObsOverExpCLI (6 tests): CLI tests covering precomputed expected, cis-only with --no-trans, on-the-fly computation, --smooth auto-adjustment, --help, and missing output error
  • TestAPIvsCLIConsistency (1 test): Pixel-by-pixel comparison of API vs CLI output

Uses the fast 10 Mb-resolution test cooler (CN.mm9.10000kb.cool, ~38K pixels) with module-scoped pytest fixtures for shared expected computation. Full suite runs in ~14 seconds.
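
The "O/E values near 1" check relies on a simple property: if expected is the per-diagonal mean of the observed matrix, each diagonal of the O/E matrix averages to 1. The sketch below demonstrates this on a synthetic NumPy matrix; it is an illustration of the idea, not the test in this PR, and the toy matrix and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Symmetric toy "contact matrix" with distance decay plus noise.
i, j = np.indices((n, n))
dist = np.abs(i - j)
noise = rng.uniform(0.5, 1.5, size=(n, n))
obs = (noise + noise.T) / 2 / (1.0 + dist)

# Expected as the mean of each diagonal, analogous to a cis expected.
expected_by_dist = np.array([np.diagonal(obs, k).mean() for k in range(n)])

# Divide every pixel by the expected value at its diagonal.
oe = obs / expected_by_dist[dist]

# Each diagonal of O/E should now average to 1 (up to rounding).
diag_means = np.array([np.diagonal(oe, k).mean() for k in range(n)])
```

On real data the agreement is only approximate, since expected may be smoothed or computed per region.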

Usage example

from cooltools.sandbox.obs_over_exp_cooler import create_obs_over_exp_cooler
import cooler

# cis_expected, trans_expected, and view_df are pre-computed DataFrames
# (for example, from cooltools.expected_cis / cooltools.expected_trans).
clr = cooler.Cooler('data.cool')
create_obs_over_exp_cooler(
    clr, 'data_oe.cool',
    expected_cis_df=cis_expected,
    expected_trans_df=trans_expected,
    view_df=view_df,
    expected_value_col='balanced.avg.smoothed.agg',
)

Or via CLI:

python -m cooltools.sandbox.cli_obs_over_exp data.cool \
    --cis-expected cis_exp.tsv \
    --trans-expected trans_exp.tsv \
    --expected-value-col balanced.avg.smoothed.agg \
    -o data_oe.cool

Phlya added 2 commits March 19, 2026 11:53
Add a function and CLI tool that divides a cooler's contact matrix by
expected and saves the O/E ratios to a new cooler file.

New files:
- cooltools/sandbox/obs_over_exp_cooler.py: Extended with
  create_obs_over_exp_cooler() that streams pixel chunks, divides by
  cis/trans expected, and writes a new .cool file via cooler.create_cooler().
  Supports pre-computed or on-the-fly expected, smoothing, cis-only mode
  (expected_trans_df=False), and trans column fallback.
- cooltools/sandbox/cli_obs_over_exp.py: Standalone Click CLI wrapping the
  function. Options include --cis-expected, --trans-expected, --no-trans,
  --smooth, --aggregate-smoothed, --view, and auto-adjustment of
  expected_value_col when smoothing is computed on the fly.
- tests/test_obs_over_exp_cooler.py: 16 tests covering the Python API,
  CLI, and API-vs-CLI consistency. Uses the fast 10 Mb-resolution test
  cooler with module-scoped fixtures for shared expected computation.
Phlya (Member, Author) commented Mar 19, 2026

The old tests are failing, not the new ones. Locally everything passes for me, so I'm not sure what's going on.

- Add logging at function start showing cooler info, output path, and key params
- Add nproc parameter for parallel O/E computation via multiprocess.Pool.imap
- Add output_dtype parameter (float32/float64) to control storage precision
- Add mode parameter (a/w) for HDF5 write mode (default 'a' for mcool append)
- Update CLI with --output-dtype and --mode options
- Add tests: test_output_dtype (parametrized), test_parallel_matches_sequential
- Fix float dtype assertions to use np.issubdtype
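
The idea behind test_parallel_matches_sequential and the np.issubdtype assertions can be sketched as below. This is an illustration under assumptions: oe_chunk and compute_oe are hypothetical names, and the standard-library ThreadPool stands in for multiprocess.Pool because it exposes the same order-preserving imap interface.

```python
from multiprocessing.pool import ThreadPool

import numpy as np


def oe_chunk(job):
    """Divide one chunk of observed values by expected and cast the result."""
    obs, exp, dtype = job
    return (obs / exp).astype(dtype)


def compute_oe(chunks, nproc=1, output_dtype="float32"):
    """Hypothetical driver: map oe_chunk over chunks, optionally in parallel."""
    jobs = [(obs, exp, output_dtype) for obs, exp in chunks]
    if nproc > 1:
        # imap yields results in input order, so chunks stay aligned.
        with ThreadPool(nproc) as pool:
            results = list(pool.imap(oe_chunk, jobs))
    else:
        results = [oe_chunk(job) for job in jobs]
    return np.concatenate(results)


rng = np.random.default_rng(1)
chunks = [(rng.uniform(0.0, 1.0, 1000), rng.uniform(0.5, 2.0, 1000))
          for _ in range(4)]

sequential = compute_oe(chunks, nproc=1, output_dtype="float32")
parallel = compute_oe(chunks, nproc=3, output_dtype="float32")
as_float64 = compute_oe(chunks, nproc=1, output_dtype="float64")

# Accepts float32 and float64 alike, unlike an exact dtype comparison.
is_float = np.issubdtype(sequential.dtype, np.floating)
```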