Pull request overview
This PR updates the data download/preprocessing utilities and associated notebooks to support downloading and annotating CPJUMP experimental + MOA metadata, while removing unused helper code in preparation for splitting notebooks into a separate analysis repository.
Changes:
- Expanded utils/io_utils.py with improved module docs and a new helper to load + concatenate profile parquet files.
- Simplified/cleaned utils/data_utils.py (removing unused signature-grouping helpers) and added feature-modality utilities (split_data, remove_feature_prefixes).
- Updated notebooks/0.download-data/* notebooks (and nbconverted scripts) to use a new dl-configs.yaml and to generate/consume compound+MOA metadata.
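To make the feature-modality utilities mentioned above more concrete, here is a minimal sketch of what grouping columns by prefix and stripping prefixes might look like. It is not the PR's `split_data` or `remove_feature_prefixes`: the function names, the `COMPARTMENTS` tuple, and the assumption that columns follow CellProfiler-style naming (`Cells_`, `Cytoplasm_`, `Nuclei_`, `Metadata_`) are all illustrative.

```python
# Hypothetical sketch only: these helpers are NOT the PR's split_data /
# remove_feature_prefixes. They assume CellProfiler-style column names such as
# "Nuclei_AreaShape_Area" and "Metadata_Plate".

COMPARTMENTS = ("Cells", "Cytoplasm", "Nuclei")  # assumed feature prefixes
METADATA_PREFIX = "Metadata_"  # assumed metadata prefix


def group_columns_by_prefix(columns):
    """Group column names into metadata, one bucket per compartment, and other."""
    groups = {"metadata": [], **{c: [] for c in COMPARTMENTS}, "other": []}
    for col in columns:
        if col.startswith(METADATA_PREFIX):
            groups["metadata"].append(col)
            continue
        prefix = col.split("_", 1)[0]
        groups[prefix if prefix in COMPARTMENTS else "other"].append(col)
    return groups


def strip_compartment_prefix(column):
    """Drop a leading compartment prefix; leave any other column unchanged."""
    prefix, sep, rest = column.partition("_")
    return rest if sep and prefix in COMPARTMENTS else column


columns = [
    "Metadata_Plate",
    "Cells_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
]
groups = group_columns_by_prefix(columns)
stripped = [strip_compartment_prefix(c) for c in groups["Nuclei"]]
```

Working on plain column-name lists keeps the sketch independent of any DataFrame library; the real utilities may well operate on DataFrames directly.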
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| utils/validator.py | Removes an unused clustering param-grid validator module. |
| utils/io_utils.py | Adds module docs + load_and_concat_profiles; adjusts formatting/error messaging. |
| utils/data_utils.py | Cleans up unused functions; improves docs; adds modality/prefix helpers. |
| notebooks/0.download-data/nbconverted/1.download-data.py | Uses dl-configs.yaml; adds CPJUMP compound+MOA merge/export steps; doc edits. |
| notebooks/0.download-data/nbconverted/2.preprocessing.py | Switches MOA annotation to use generated compound metadata TSV; doc edits. |
| notebooks/0.download-data/nbconverted/3.subset-jump-controls.py | Updates paths/filenames for control subsets. |
| notebooks/0.download-data/dl-configs.yaml | Adds dedicated download configuration for the download notebook(s). |
| notebooks/0.download-data/1.download-data.ipynb | Notebook equivalent of the nbconverted updates + new compound/MOA section. |
| notebooks/0.download-data/2.preprocessing.ipynb | Notebook equivalent of preprocessing updates (compound metadata TSV usage). |
| notebooks/0.download-data/3.subset-jump-controls.ipynb | Notebook equivalent of control-subsetting path/filename updates. |
| .pre-commit-config.yaml | Bumps ruff-pre-commit revision. |
Comments suppressed due to low confidence (2)
notebooks/0.download-data/nbconverted/3.subset-jump-controls.py:122
- This notebook header says it subsets controls from the CPJUMP1 CRISPR dataset, but the code now loads cpjump1_compound_concat_profiles.parquet and writes cpjump1_compound_negcon_... outputs. Update the top-level notebook description (and any related variable names/text) to match the compound dataset being processed to avoid confusion for readers.
cpjump1_data_path = (
profiles_dir / "cpjump1" / "cpjump1_compound_concat_profiles.parquet"
).resolve(strict=True)
notebooks/0.download-data/3.subset-jump-controls.ipynb:151
- The notebook introduction describes subsetting controls from the CPJUMP1 CRISPR dataset, but this code cell is now pointing at cpjump1_compound_concat_profiles.parquet. Please update the introductory text to reflect the compound dataset (or adjust the code back to CRISPR) so the narrative matches the executed workflow.
"# setting directory where all the single-cell profiles are stored\n",
"data_dir = pathlib.Path.cwd() / \"data\"\n",
"profiles_dir = (data_dir / \"sc-profiles\").resolve(strict=True)\n",
"\n",
"cpjump1_data_path = (\n",
" profiles_dir / \"cpjump1\" / \"cpjump1_compound_concat_profiles.parquet\"\n",
").resolve(strict=True)\n",
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Given the number of changes made throughout the analysis notebooks, this PR updates the downloads module to include functions for downloading the CPJUMP experimental and MOA data, along with several improvements to the module documentation.
We have also removed functions that are no longer used in the notebooks and updated the documentation accordingly.
These changes are part of the preparation for separating the notebooks into a dedicated analysis repository while transitioning this repository into a focused software package.