Update downloads module #80

Open
axiomcura wants to merge 17 commits into WayScience:main from axiomcura:update-download-module
Conversation

@axiomcura
Member

Given the number of changes made throughout the analysis notebooks, this PR updates the downloads module to include functions for downloading the CPJUMP experimental and MOA data, along with several improvements to the module documentation.

We have also removed functions that are no longer used in the notebooks and updated the documentation accordingly.

These changes are part of the preparation for separating the notebooks into a dedicated analysis repository while transitioning this repository into a focused software package.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the data download/preprocessing utilities and associated notebooks to support downloading and annotating CPJUMP experimental + MOA metadata, while removing unused helper code in preparation for splitting notebooks into a separate analysis repository.

Changes:

  • Expanded utils/io_utils.py with improved module docs and a new helper to load + concatenate profile parquet files.
  • Simplified/cleaned utils/data_utils.py (removing unused signature-grouping helpers) and added feature-modality utilities (split_data, remove_feature_prefixes).
  • Updated notebooks/0.download-data/* notebooks (and nbconverted scripts) to use a new dl-configs.yaml and to generate/consume compound+MOA metadata.
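
As a rough illustration of the kind of feature-modality handling the second bullet describes, the sketch below separates metadata columns from feature columns and strips a compartment prefix from a feature name. It is not the PR's implementation: the function names `split_columns` and `strip_compartment_prefix`, the `Metadata_` prefix, and the compartment prefixes (`Cells_`, `Cytoplasm_`, `Nuclei_`) are all assumptions made for this example, and the actual `split_data` and `remove_feature_prefixes` helpers in `utils/data_utils.py` may behave differently.

```python
from typing import Iterable, List, Tuple

# Assumed prefixes for illustration only; the module's real conventions may differ.
METADATA_PREFIX = "Metadata_"
COMPARTMENT_PREFIXES = ("Cells_", "Cytoplasm_", "Nuclei_")


def split_columns(columns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split column names into (metadata, features) by the metadata prefix."""
    metadata: List[str] = []
    features: List[str] = []
    for name in columns:
        if name.startswith(METADATA_PREFIX):
            metadata.append(name)
        else:
            features.append(name)
    return metadata, features


def strip_compartment_prefix(column: str) -> str:
    """Remove a leading compartment prefix, leaving other names unchanged."""
    for prefix in COMPARTMENT_PREFIXES:
        if column.startswith(prefix):
            return column[len(prefix):]
    return column


columns = ["Metadata_Plate", "Cells_AreaShape_Area", "Nuclei_Intensity_Mean"]
meta_cols, feature_cols = split_columns(columns)
stripped = [strip_compartment_prefix(c) for c in feature_cols]
```

Keeping the prefixes in module-level constants makes the naming convention easy to audit in one place, which matters if different datasets use different compartment names.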

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| utils/validator.py | Removes an unused clustering param-grid validator module. |
| utils/io_utils.py | Adds module docs + load_and_concat_profiles; adjusts formatting/error messaging. |
| utils/data_utils.py | Cleans up unused functions; improves docs; adds modality/prefix helpers. |
| notebooks/0.download-data/nbconverted/1.download-data.py | Uses dl-configs.yaml; adds CPJUMP compound+MOA merge/export steps; doc edits. |
| notebooks/0.download-data/nbconverted/2.preprocessing.py | Switches MOA annotation to use generated compound metadata TSV; doc edits. |
| notebooks/0.download-data/nbconverted/3.subset-jump-controls.py | Updates paths/filenames for control subsets. |
| notebooks/0.download-data/dl-configs.yaml | Adds dedicated download configuration for the download notebook(s). |
| notebooks/0.download-data/1.download-data.ipynb | Notebook equivalent of the nbconverted updates + new compound/MOA section. |
| notebooks/0.download-data/2.preprocessing.ipynb | Notebook equivalent of preprocessing updates (compound metadata TSV usage). |
| notebooks/0.download-data/3.subset-jump-controls.ipynb | Notebook equivalent of control-subsetting path/filename updates. |
| .pre-commit-config.yaml | Bumps ruff-pre-commit revision. |

Comments suppressed due to low confidence (2)

notebooks/0.download-data/nbconverted/3.subset-jump-controls.py:122

  • This notebook header says it subsets controls from the CPJUMP1 CRISPR dataset, but the code now loads cpjump1_compound_concat_profiles.parquet and writes cpjump1_compound_negcon_... outputs. Update the top-level notebook description (and any related variable names/text) to match the compound dataset being processed to avoid confusion for readers.
cpjump1_data_path = (
    profiles_dir / "cpjump1" / "cpjump1_compound_concat_profiles.parquet"
).resolve(strict=True)
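
For context on the snippet above: `Path.resolve(strict=True)` raises `FileNotFoundError` when the target does not exist, so switching the filename to the compound dataset means the notebook fails at this line unless that parquet file has already been produced. A minimal sketch of that behavior follows; the helper name `resolve_required` and the file names are hypothetical and only serve the example.

```python
import pathlib
import tempfile


def resolve_required(path: pathlib.Path) -> pathlib.Path:
    """Return the absolute path, raising FileNotFoundError if it is missing."""
    return pathlib.Path(path).resolve(strict=True)


with tempfile.TemporaryDirectory() as tmp:
    # A file that exists resolves to an absolute path.
    present = pathlib.Path(tmp) / "present_profiles.parquet"
    present.touch()
    resolved = resolve_required(present)
    resolved_is_absolute = resolved.is_absolute()

    # A file that does not exist raises immediately instead of failing later.
    missing = pathlib.Path(tmp) / "missing_profiles.parquet"
    try:
        resolve_required(missing)
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```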

notebooks/0.download-data/3.subset-jump-controls.ipynb:151

  • The notebook introduction describes subsetting controls from the CPJUMP1 CRISPR dataset, but this code cell is now pointing at cpjump1_compound_concat_profiles.parquet. Please update the introductory text to reflect the compound dataset (or adjust the code back to CRISPR) so the narrative matches the executed workflow.
    "# setting directory where all the single-cell profiles are stored\n",
    "data_dir = pathlib.Path.cwd() / \"data\"\n",
    "profiles_dir = (data_dir / \"sc-profiles\").resolve(strict=True)\n",
    "\n",
    "cpjump1_data_path = (\n",
    "    profiles_dir / \"cpjump1\" / \"cpjump1_compound_concat_profiles.parquet\"\n",
    ").resolve(strict=True)\n",

axiomcura and others added 9 commits March 7, 2026 22:53
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@axiomcura axiomcura assigned wli51 and unassigned wli51 Mar 20, 2026
@axiomcura axiomcura requested a review from wli51 March 20, 2026 15:32
3 participants