Remove Anomalyze and update READMEs #67
it is in another repo
- update root README to reflect current module structure and step-based `just` usage
- clarify that `2.evaluate_data` uses active class-balanced logistic regression workflows
- clean up and align module READMEs for `0.download_data`, `1.process_data`, `2.evaluate_data`, and `3.analyze_data`
- add missing README coverage for `0.5.quality_control`, `4.validate_data`, and `4.visualize_data`
- remove stale/anachronistic documentation language and verify no anomalyze references remain in markdown docs
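
For anyone who wants to repeat the last check locally, here is a minimal sketch of a scan for leftover references in Markdown files. It is not part of this PR; the function name `find_term_in_markdown` and the default search term are assumptions made for illustration, and it uses only the Python standard library.

```python
from pathlib import Path


def find_term_in_markdown(root, term="anomalyze"):
    """Return (path, line_number, line) for every Markdown line containing `term`.

    The match is case-insensitive and only files ending in .md are scanned.
    """
    hits = []
    needle = term.lower()
    for md_file in sorted(Path(root).rglob("*.md")):
        # Replace undecodable bytes so one odd file does not abort the scan.
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((md_file, line_number, line.strip()))
    return hits
```

Calling `find_term_in_markdown(".")` from the repository root returns an empty list when no references remain, which makes it easy to wrap in an assertion or a CI step.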
jenna-tomkinson
left a comment
Nice simple PR! I am requesting some documentation changes, as I found some discrepancies with where single-cell QC is performed. Happy to chat if you have any questions!
In this module, we perform multiple analyses on the predicted probability data to validate the phenotypic predictions for each treatment (e.g., compound, CRISPR, or ORF).
To compare treatments and the negative control groups, we perform KS tests.
In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).
In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments (for example, compound, CRISPR, and ORF).
In this module, we perform analyses on predicted probability data to evaluate phenotypic behavior across treatments per dataset (e.g., compound, CRISPR, and ORF).
Wouldn't this be more accurate?
What do you mean by "per dataset" here? Aren't we only using the JUMP pilot dataset (cpg0000)?
This was an assumption; I believe that compound, CRISPR, and ORF are individual "datasets" or are treated as such. Feel free to ignore or change the wording!
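
For context on the KS comparison mentioned in the README text under review, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the empirical CDFs of two samples. This is an illustration only: the function name `ks_statistic` and the sample values are hypothetical, it does not compute a p-value, and the module itself may well rely on a library routine instead (SciPy provides `scipy.stats.ks_2samp`).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Empirical CDFs only change at observed values, so checking those is enough.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)  # fraction of a that is <= value
        cdf_b = bisect_right(b, value) / len(b)  # fraction of b that is <= value
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical predicted probabilities for one treatment and the negative control.
treatment_probabilities = [0.80, 0.85, 0.90]
control_probabilities = [0.10, 0.15, 0.20]
statistic = ks_statistic(treatment_probabilities, control_probabilities)
```

A statistic near 0 means the treatment and control probability distributions overlap almost completely, while a value near 1 means they are almost fully separated.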
| :--- | :--- | :--- |
| [0.download_data](./0.download_data/) | Download JUMP-Target SQLite files and process them with [CytoTable](https://github.com/cytomining/CytoTable) | Downloads CellProfiler SQLite outputs for 51 plates from AWS and processes them into Parquet files that combine compartment and image metadata in one table. |
| [0.5.quality_control](./0.5.quality_control/) | Perform quality control on downloaded and processed data | Runs quality control scripts used between download and feature processing stages. |
| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |
Quality control is performed here, from what I can see.
Are you saying that the module should be removed from here? If so, I think I already removed it.
| [1.process_data](./1.process_data/) | Process SQLite files | Uses pycytominer on SQLite outputs to merge single cells, normalize features, and produce downstream-ready data. |
| [1.process_data](./1.process_data/) | Process SQLite files | Uses CytoTable on SQLite outputs to merge single cells, coSMicQC for single-cell filtering, and pycytominer to normalize features, and produce downstream-ready data. |
I am recommending changing the language here to be more explicit.
Thanks for the first review, @jenna-tomkinson! I am requesting another review.
jenna-tomkinson
left a comment
LGTM! Thanks for addressing the earlier comments. I have left only one more and replied to the previous ones.
| [4.validate_data](./4.validate_data/) | Validate phenotype-level findings | Contains validation scripts for compound and phenotype-level enrichment analyses. |
| [4.visualize_data](./4.visualize_data/) | Visualize analysis outputs | Contains visualization scripts for downstream interpretation and reporting. |
Why are these both numbered with 4? I recommend changing the numbers, in this PR or a future one, because at first glance I thought this was a copy-paste error.
This PR updates and simplifies the project documentation so that it better matches the current workflow and is easier for new contributors to follow. It also removes Anomalyze code and outdated references, and adds clearer guidance on how to run key analysis steps.