Remove Anomalyze and update READMEs #67

Open
MattsonCam wants to merge 3 commits into main from remove_anomalyze

Conversation

MattsonCam (Member) commented May 20, 2026

This PR updates and simplifies the project documentation so it better matches the current workflow and is easier for new contributors to follow. It also removes Anomalyze code and outdated references, and adds clearer guidance on how to run key analysis steps.

Cameron Mattson added 2 commits May 20, 2026 10:59
- update root README to reflect current module structure and step-based `just` usage
- clarify that `2.evaluate_data` uses active class-balanced logistic regression workflows
- clean up and align module READMEs for `0.download_data`, `1.process_data`,
  `2.evaluate_data`, and `3.analyze_data`
- add missing README coverage for `0.5.quality_control`, `4.validate_data`,
  and `4.visualize_data`
- remove stale/anachronistic documentation language and verify no anomalyze
  references remain in markdown docs
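
The last commit item describes checking that no anomalyze references remain in the markdown docs. A minimal sketch of how such a check could be scripted is shown here; the function name `find_term_in_markdown` and its return shape are hypothetical and are not taken from this repository.

```python
from pathlib import Path


def find_term_in_markdown(root, term="anomalyze"):
    """Return (path, line_number, line) for each markdown line containing term.

    Hypothetical helper for illustration only; matching is case-insensitive.
    """
    hits = []
    needle = term.lower()
    # Walk every .md file under root, in a stable order.
    for md_path in sorted(Path(root).rglob("*.md")):
        text = md_path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(md_path), line_number, line.strip()))
    return hits
```

Calling `find_term_in_markdown(".")` from the repository root would return an empty list when no markdown file mentions the term.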

@review-notebook-app

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

jenna-tomkinson (Member) left a comment

Nice, simple PR! I am requesting some documentation changes, as I found some discrepancies in where single-cell QC is performed. Happy to chat if you have any questions!

Comment thread 0.download_data/README.md Outdated
Comment thread 1.process_data/README.md Outdated
Comment thread 1.process_data/README.md
Comment thread 1.process_data/README.md Outdated
Comment thread 1.process_data/README.md Outdated
Comment thread 3.analyze_data/README.md

In this module, we perform multiple analyses on the predicted probability data to validate the phenotypic predictions for each treatment (e.g., compound, CRISPR, or ORF).
To compare treatments and the negative control groups, we perform KS tests.
In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).
Member

Suggested change
In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).
In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments per dataset (e.g., compound, CRISPR, and ORF).

Wouldn't this be more accurate?

Member Author

What do you mean by "per dataset" here? Aren't we only using the JUMP pilot dataset (cpg0000)?

Member

This was an assumption; I believe that compound, CRISPR, and ORF are individual "datasets" or are treated as such. Feel free to ignore or change the wording!
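
For readers unfamiliar with the KS tests named in the earlier README wording quoted in this thread, here is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic in plain Python. The probability values are made up for illustration, and the function name `ks_statistic` is hypothetical; the PR does not show how the project computes the test, and a library routine such as `scipy.stats.ks_2samp` (which also reports a p-value) would be the more typical choice.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Made-up predicted probabilities for one phenotype (illustration only).
treatment_probs = [0.81, 0.77, 0.92, 0.65, 0.88]
negative_control_probs = [0.12, 0.30, 0.25, 0.41, 0.18]

print(ks_statistic(treatment_probs, negative_control_probs))  # prints 1.0
```

A statistic near 1 means the two distributions barely overlap, while a value near 0 means the treatment looks like the negative control for that phenotype.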

Comment thread 3.analyze_data/README.md Outdated
Comment thread 3.analyze_data/README.md Outdated
Comment thread README.md Outdated
Comment thread README.md
| :--- | :--- | :--- |
| [0.download_data](./0.download_data/) | Download JUMP-Target SQLite files and process them with [CytoTable](https://github.com/cytomining/CytoTable) | Downloads CellProfiler SQLite outputs for 51 plates from AWS and processes them into Parquet files that combine compartment and image metadata in one table. |
| [0.5.quality_control](./0.5.quality_control/) | Perform quality control on downloaded and processed data | Runs quality control scripts used between download and feature processing stages. |
| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |
Member

Quality control is performed here, from what I can see.

Member Author

Are you saying that the module should be removed from here? If so, I think I already removed it.

Member

Suggested change
| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |
| [1.process_data](./1.process_data/) | Process SQLite files | Uses CytoTable on SQLite outputs to merge single cells, coSMicQC for single-cell filtering, and pycytominer to normalize features and produce downstream-ready data. |

I am recommending changing the language here to be more explicit.

MattsonCam (Member, Author)

Thanks for the first review, @jenna-tomkinson! I am requesting another review.

jenna-tomkinson (Member) left a comment

LGTM! Thanks for addressing the earlier comments. I have left only one more and replied to the previous ones.

Comment thread README.md
Comment on lines +39 to +40
| [4.validate_data](./4.validate_data/) | Validate phenotype-level findings | Contains validation scripts for compound and phenotype-level enrichment analyses. |
| [4.visualize_data](./4.visualize_data/) | Visualize analysis outputs | Contains visualization scripts for downstream interpretation and reporting. |
Member

Why are these both numbered with 4? I recommend changing the numbers, in this PR or a future one, because at first glance I thought this was a copy-paste error.
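
To illustrate the numbering concern, here is a minimal sketch that flags module directories sharing the same numeric step prefix. The module names are the ones listed in this PR; the helper `duplicate_step_prefixes` and its prefix pattern are hypothetical and not part of the repository.

```python
import re
from collections import defaultdict


def duplicate_step_prefixes(module_names):
    """Group module names by numeric prefix and keep prefixes used more than once."""
    by_prefix = defaultdict(list)
    for name in module_names:
        # Accept prefixes such as "0", "4", or "0.5" followed by a dot.
        match = re.match(r"^(\d+(?:\.\d+)?)\.", name)
        if match:
            by_prefix[match.group(1)].append(name)
    return {prefix: names for prefix, names in by_prefix.items() if len(names) > 1}


modules = [
    "0.download_data",
    "0.5.quality_control",
    "1.process_data",
    "2.evaluate_data",
    "3.analyze_data",
    "4.validate_data",
    "4.visualize_data",
]

print(duplicate_step_prefixes(modules))
# prints {'4': ['4.validate_data', '4.visualize_data']}
```

A check like this could run in CI if the shared prefix turns out to be unintentional; if the two step-4 modules are meant to be parallel, it simply documents that choice.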

2 participants