Remove Anomalyze and update READMEs by MattsonCam · Pull Request #67 · WayScience/JUMP-single-cell

MattsonCam · 2026-05-20T17:19:44Z

This PR updates and simplifies the project documentation so it better matches the current workflow and is easier for new contributors to follow. It also removes the anomalyze code and outdated references, and adds clearer guidance on how to run key analysis steps.


- update root README to reflect current module structure and step-based `just` usage
- clarify that `2.evaluate_data` uses active class-balanced logistic regression workflows
- clean up and align module READMEs for `0.download_data`, `1.process_data`, `2.evaluate_data`, and `3.analyze_data`
- add missing README coverage for `0.5.quality_control`, `4.validate_data`, and `4.visualize_data`
- remove stale/anachronistic documentation language and verify no anomalyze references remain in markdown docs

review-notebook-app · 2026-05-20T17:19:50Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

jenna-tomkinson

Nice simple PR! I am requesting some documentation changes, as I found some discrepancies with where single-cell QC is performed. Happy to chat if you have any questions!

jenna-tomkinson · 2026-05-20T21:24:46Z


-In this module, we perform multiple analyses on the predicted probability data to validate the phenotypic predictions for each treatment (e.g., compound, CRISPR, or ORF).
-To compare treatments and the negative control groups, we perform KS tests.
+In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).


Suggested change

In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).

In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments per dataset (e.g., compound, CRISPR, and ORF).

Wouldn't this be more accurate?

What do you mean by "per dataset" here? Aren't we only using the JUMP pilot dataset (cpg0000)?

This was an assumption, I believe that compound, CRISPR and ORF are individual "datasets" or treated as such. Feel free to ignore or change the wording!

jenna-tomkinson · 2026-05-20T21:36:20Z

+| :--- | :--- | :--- |
+| [0.download_data](./0.download_data/) | Download JUMP-Target SQLite files and process them with [CytoTable](https://github.com/cytomining/CytoTable) | Downloads CellProfiler SQLite outputs for 51 plates from AWS and processes them into Parquet files that combine compartment and image metadata in one table. |
+| [0.5.quality_control](./0.5.quality_control/) | Perform quality control on downloaded and processed data | Runs quality control scripts used between download and feature processing stages. |
+| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |


Quality control is performed here, from what I can see.

Are you saying that the module should be removed from here? If so, I think I already removed it.

Suggested change

| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |

| [1.process_data](./1.process_data/) | Process SQLite files | Uses CytoTable on SQLite outputs to merge single cells, coSMicQC for single-cell filtering, and pycytominer to normalize features, and produce downstream-ready data. |

I am recommending changing the language here to be more explicit.

MattsonCam · 2026-05-21T17:21:58Z

Thanks for the first review, @jenna-tomkinson! I am requesting another review.

jenna-tomkinson

LGTM! Thanks for addressing earlier comments, I have only left one more and replied to previous comments.

jenna-tomkinson · 2026-05-26T16:23:09Z


-In this module, we perform multiple analyses on the predicted probability data to validate the phenotypic predictions for each treatment (e.g., compound, CRISPR, or ORF).
-To compare treatments and the negative control groups, we perform KS tests.
+In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).


This was an assumption, I believe that compound, CRISPR and ORF are individual "datasets" or treated as such. Feel free to ignore or change the wording!

jenna-tomkinson · 2026-05-26T16:24:29Z

+| :--- | :--- | :--- |
+| [0.download_data](./0.download_data/) | Download JUMP-Target SQLite files and process them with [CytoTable](https://github.com/cytomining/CytoTable) | Downloads CellProfiler SQLite outputs for 51 plates from AWS and processes them into Parquet files that combine compartment and image metadata in one table. |
+| [0.5.quality_control](./0.5.quality_control/) | Perform quality control on downloaded and processed data | Runs quality control scripts used between download and feature processing stages. |
+| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |


Suggested change

| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |

| [1.process_data](./1.process_data/) | Process SQLite files | Uses CytoTable on SQLite outputs to merge single cells, coSMicQC for single-cell filtering, and pycytominer to normalize features, and produce downstream-ready data. |

I am recommending changing the language here to be more explicit.

jenna-tomkinson · 2026-05-26T16:25:34Z

+| [4.validate_data](./4.validate_data/) | Validate phenotype-level findings | Contains validation scripts for compound and phenotype-level enrichment analyses. |
+| [4.visualize_data](./4.visualize_data/) | Visualize analysis outputs | Contains visualization scripts for downstream interpretation and reporting. |


Why are these both numbered 4? I recommend changing the numbers, either in this PR or a future one, because at first glance I thought this was a copy-paste error.

Cameron Mattson added 2 commits May 20, 2026 10:59

Removed anomalyze portion from this repo, because it is in another repo

4804c33

MattsonCam requested a review from jenna-tomkinson May 20, 2026 17:34

jenna-tomkinson requested changes May 20, 2026


MattsonCam requested a review from jenna-tomkinson May 21, 2026 17:22

Addressed review comments about READMEs

79173ac

jenna-tomkinson approved these changes May 26, 2026




2 participants
