Add create_obs_over_exp_cooler() and CLI tool (sandbox) #564
Open
Add a function and CLI tool that divides a cooler's contact matrix by expected and saves the O/E ratios to a new cooler file.
New files:
- cooltools/sandbox/obs_over_exp_cooler.py: Extended with create_obs_over_exp_cooler() that streams pixel chunks, divides by cis/trans expected, and writes a new .cool file via cooler.create_cooler(). Supports pre-computed or on-the-fly expected, smoothing, cis-only mode (expected_trans_df=False), and trans column fallback.
- cooltools/sandbox/cli_obs_over_exp.py: Standalone Click CLI wrapping the function. Options include --cis-expected, --trans-expected, --no-trans, --smooth, --aggregate-smoothed, --view, and auto-adjustment of expected_value_col when smoothing is computed on the fly.
- tests/test_obs_over_exp_cooler.py: 16 tests covering the Python API, CLI, and API-vs-CLI consistency. Uses the fast 10 Mb-resolution test cooler with module-scoped fixtures for shared expected computation.
The old tests are failing, not the new ones... And locally everything passes for me, so I'm not sure what's going on.
- Add logging at function start showing cooler info, output path, and key params
- Add nproc parameter for parallel O/E computation via multiprocess.Pool.imap
- Add output_dtype parameter (float32/float64) to control storage precision
- Add mode parameter (a/w) for HDF5 write mode (default 'a' for mcool append)
- Update CLI with --output-dtype and --mode options
- Add tests: test_output_dtype (parametrized), test_parallel_matches_sequential
- Fix float dtype assertions to use np.issubdtype
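For context on the last item, here is a minimal sketch of why np.issubdtype is the safer check once output_dtype can be float32 or float64. The helper name cast_values is hypothetical and is not taken from this PR; only NumPy calls are used.

```python
import numpy as np


def cast_values(values, output_dtype="float32"):
    """Cast O/E values to the requested float dtype (hypothetical helper)."""
    dtype = np.dtype(output_dtype)
    # Accept any floating dtype; reject integers and other kinds.
    if not np.issubdtype(dtype, np.floating):
        raise ValueError(f"output_dtype must be a float type, got {dtype}")
    return np.asarray(values).astype(dtype)


oe32 = cast_values([0.5, 1.0, 2.0], "float32")
oe64 = cast_values([0.5, 1.0, 2.0], "float64")

# An equality check against float64 only passes for one of the two dtypes...
only_64_matches = (oe32.dtype == np.float64, oe64.dtype == np.float64)

# ...whereas np.issubdtype accepts both, which is what a test parametrized
# over output_dtype needs.
both_are_float = (
    np.issubdtype(oe32.dtype, np.floating),
    np.issubdtype(oe64.dtype, np.floating),
)
```

An assertion written as `dtype == np.float64` would break as soon as float32 output is requested, which is presumably what the fix addresses.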
Summary
Add a function and standalone CLI tool that divides a cooler's contact matrix by expected values and saves the O/E (observed/expected) ratios to a new cooler file.
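To illustrate the core operation, here is a rough sketch of dividing one chunk of pixels by cis and trans expected using pandas. It is not the code from this PR: the helper name obs_over_exp_chunk, the output column name oe, and the toy data are assumptions, and the cis expected table is assumed to use whole chromosomes as regions (region1 equal to region2).

```python
import numpy as np
import pandas as pd


def obs_over_exp_chunk(pixels, bin_chroms, cis_exp, trans_exp=None,
                       value_col="balanced.avg"):
    """Divide balanced pixel values by expected (hypothetical helper).

    pixels     : DataFrame with bin1_id, bin2_id, balanced
    bin_chroms : object array mapping bin id -> chromosome name
    cis_exp    : DataFrame with region1, dist, <value_col>
    trans_exp  : DataFrame with region1, region2, <value_col>, or None
                 for cis-only mode (trans pixels become NaN)
    """
    chrom1 = bin_chroms[pixels["bin1_id"].to_numpy()]
    chrom2 = bin_chroms[pixels["bin2_id"].to_numpy()]
    # Bins are numbered consecutively, so for cis pixels the difference
    # of bin ids is the diagonal offset.
    dist = (pixels["bin2_id"] - pixels["bin1_id"]).to_numpy()
    is_cis = chrom1 == chrom2

    expected = np.full(len(pixels), np.nan)

    # Cis: look up expected by (chromosome, diagonal).
    cis_lookup = cis_exp.set_index(["region1", "dist"])[value_col]
    cis_keys = pd.MultiIndex.from_arrays([chrom1[is_cis], dist[is_cis]])
    expected[is_cis] = cis_lookup.reindex(cis_keys).to_numpy()

    # Trans: look up expected by chromosome pair, unless cis-only.
    if trans_exp is not None:
        trans_lookup = trans_exp.set_index(["region1", "region2"])[value_col]
        trans_keys = pd.MultiIndex.from_arrays(
            [chrom1[~is_cis], chrom2[~is_cis]]
        )
        expected[~is_cis] = trans_lookup.reindex(trans_keys).to_numpy()

    out = pixels[["bin1_id", "bin2_id"]].copy()
    out["oe"] = pixels["balanced"].to_numpy() / expected
    return out


# Toy data: chr1 has bins 0-2, chr2 has bins 3-4.
bin_chroms = np.array(["chr1", "chr1", "chr1", "chr2", "chr2"], dtype=object)
pixels = pd.DataFrame({
    "bin1_id": [0, 0, 1, 0, 3],
    "bin2_id": [0, 1, 2, 3, 4],
    "balanced": [1.0, 0.5, 0.25, 0.1, 0.4],
})
cis_exp = pd.DataFrame({
    "region1": ["chr1", "chr1", "chr1", "chr2", "chr2"],
    "dist": [0, 1, 2, 0, 1],
    "balanced.avg": [2.0, 0.5, 0.1, 2.0, 0.8],
})
trans_exp = pd.DataFrame({
    "region1": ["chr1"], "region2": ["chr2"], "balanced.avg": [0.05],
})

oe_full = obs_over_exp_chunk(pixels, bin_chroms, cis_exp, trans_exp)
oe_cis_only = obs_over_exp_chunk(pixels, bin_chroms, cis_exp, None)
```

In a streaming setting, a function like this would be applied to each pixel chunk in turn, so memory use depends on the chunk size rather than on the size of the whole matrix.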
New files
- cooltools/sandbox/obs_over_exp_cooler.py (extended)
  create_obs_over_exp_cooler(): Streams pixel chunks from an input cooler, divides balanced values by cis/trans expected, and writes a new .cool file via cooler.create_cooler().
  - Cis-only mode (expected_trans_df=False): trans pixels are written as NaN
  - chunksize for memory-efficient streaming
- cooltools/sandbox/cli_obs_over_exp.py (new)
  Standalone Click CLI wrapping the function. Key options:
  - --cis-expected / --trans-expected: Paths to pre-computed TSV files
  - --no-trans: Skip trans O/E computation
  - --smooth / --aggregate-smoothed: Smoothing when computing on the fly
  - --view: BED file with genomic regions
  - Auto-adjustment of expected_value_col when smoothing is requested on the fly
  Can be run as:
    python -m cooltools.sandbox.cli_obs_over_exp --help
- tests/test_obs_over_exp_cooler.py (new)
  16 tests organized in 3 classes, covering the Python API, the CLI, and API-vs-CLI consistency. CLI coverage includes --no-trans, on-the-fly computation, --smooth auto-adjustment, --help, and the missing-output error.
  Uses the fast 10 Mb-resolution test cooler (CN.mm9.10000kb.cool, ~38K pixels) with module-scoped pytest fixtures for shared expected computation. Full suite runs in ~14 seconds.
Usage example
Or via CLI:
python -m cooltools.sandbox.cli_obs_over_exp data.cool \
    --cis-expected cis_exp.tsv \
    --trans-expected trans_exp.tsv \
    --expected-value-col balanced.avg.smoothed.agg \
    -o data_oe.cool
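The auto-adjustment of expected_value_col described above could work roughly as in the sketch below. The helper resolve_expected_value_col and its exact rules are assumptions for illustration; the PR's actual logic may differ. The column names follow the balanced.avg.smoothed.agg pattern shown in the CLI example.

```python
def resolve_expected_value_col(value_col="balanced.avg",
                               smooth=False,
                               aggregate_smoothed=False):
    """Pick the expected column to divide by (hypothetical helper).

    Assumes smoothing adds a '.smoothed' suffix to the column name and
    aggregation adds a further '.agg' suffix.
    """
    if not smooth:
        # No on-the-fly smoothing: use the column exactly as requested.
        return value_col
    if value_col.endswith((".smoothed", ".smoothed.agg")):
        # The caller already named a smoothed column; leave it alone.
        return value_col
    resolved = value_col + ".smoothed"
    if aggregate_smoothed:
        resolved += ".agg"
    return resolved


plain = resolve_expected_value_col()
smoothed = resolve_expected_value_col(smooth=True)
aggregated = resolve_expected_value_col(smooth=True, aggregate_smoothed=True)
```

With a rule like this, passing --smooth on the command line would not also require spelling out the smoothed column name by hand.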