WayScience · MattsonCam · May 20, 2026 · May 20, 2026 · May 21, 2026
diff --git a/0.download_data/README.md b/0.download_data/README.md
@@ -1,11 +1,13 @@
-# Download JUMP-Target SQLite files from AWS
+# Download JUMP-Target SQLite Files from AWS
 
-In this module, we download the SQLite files from [AWS](https://cellpainting-gallery.s3.amazonaws.com/index.html#cpg0000-jump-pilot/source_4/workspace/backend/2020_11_04_CPJUMP1/) with [aws-cli](https://github.com/aws/aws-cli) on Aug 10, 2023 using instructions provided from [JUMP Cell Painting Datasets](https://github.com/jump-cellpainting/datasets).
-There are 51 plates from the pilot dataset (cpg0000), totalling 1.1 TB of storage from the SQLite files.
+In this module, we download SQLite files from [AWS](https://cellpainting-gallery.s3.amazonaws.com/index.html#cpg0000-jump-pilot/source_4/workspace/backend/2020_11_04_CPJUMP1/) with [aws-cli](https://github.com/aws/aws-cli).
+There are 51 plates from the pilot dataset (`cpg0000`), totaling about 1.1 TB of SQLite files.
 
-Firstly, we generate a manifest file in the [data folder](./data/) called [jump_dataset_location_manifest.csv](./data/jump_dataset_location_manifest.csv).
-Afterwards, we process each plate using [CytoTable](https://github.com/cytomining/CytoTable).
+First, we generate a manifest file in the [data folder](./data/) called [jump_dataset_location_manifest.csv](./data/jump_dataset_location_manifest.csv).
+Afterward, we process each plate using [CytoTable](https://github.com/cytomining/CytoTable).
 
-Optionally, to download only the SQLite plates, please use the [download_from_aws.sh](./download_from_aws.sh) file, which contains the bash script that will download the files from the paths in the manifest.
+The module entrypoint is [run.sh](./run.sh), which generates the manifest, processes each plate with CytoTable, and executes the image-download notebook.
 
-Please see the notes from the main [`README.md` on processing this step](../README.md#running-code-from-this-project).
+Optionally, to download only the SQLite plates, use [download_from_aws.sh](./download_from_aws.sh), which downloads the files from the paths listed in the manifest.
+
+See the main [`README.md` section on running code](../README.md#running-code-from-this-project) for step-level execution details.
diff --git a/1.process_data/README.md b/1.process_data/README.md
@@ -1,21 +1,22 @@
-# Merge, normalize, feature select, and aggregate single cells with pycytominer
+# Quality Control, Merge, Normalize, Feature Select, and Aggregate Single Cells
 
-In this module, we perform four preprocessing steps on the SQLite files using [pycytominer](https://github.com/cytomining/pycytominer/tree/main):
+In this module, we perform five preprocessing steps on single-cell data generated from CytoTable outputs, using [pycytominer](https://github.com/cytomining/pycytominer/tree/main) for normalization, feature selection, and aggregation:
 
-1. Merge and annotate single cells from the SQLite file using the [pycytominer SingleCell class](https://github.com/cytomining/pycytominer/blob/main/pycytominer/cyto_utils/cells.py)
-2. [Normalize](https://github.com/cytomining/pycytominer/blob/main/pycytominer/normalize.py) the single cells using the negative controls (e.g., DMSO for compound treatment, no-target or target intergenic region sgRNAs for crispr treatment, and genes with weak signatures in orf treatment) as reference for the standard scalar method per plate.
-3. [Feature Select](https://github.com/cytomining/pycytominer/blob/main/pycytominer/feature_select.py) the single cell plate morphology data per plate by variance thresholding, correlation thresholding, and by filtering columns containing NaNs and columns specified in the blocklist.
-4. [Aggregate](https://github.com/cytomining/pycytominer/blob/main/pycytominer/feature_select.py) both the normalized and feature selected single-cell morphology data to the well level.
+1. Perform single-cell quality control (QC) to remove low-quality cells, after CytoTable processing and before merging and annotation.
+2. Merge and annotate single-cell profiles from CytoTable outputs for downstream normalization and feature selection.
+3. [Normalize](https://github.com/cytomining/pycytominer/blob/main/pycytominer/normalize.py) single cells per plate by standard scaling, using negative controls (for example, DMSO for compounds, non-targeting or intergenic-targeting sgRNAs for CRISPR, and weak-signature genes for ORF) as the reference population.
+4. [Feature select](https://github.com/cytomining/pycytominer/blob/main/pycytominer/feature_select.py) single-cell morphology data per plate using variance thresholding, correlation thresholding, and removal of columns that contain NaNs or are listed in the blocklist.
+5. Aggregate both normalized and feature-selected single-cell morphology data to the well level.
 
-## Run merging and normalization notebook
+## Run Single-cell Processing Pipeline
 
-To process the data, run the [process_data.sh](./process_data.sh) file which will convert the notebook into a python file and run it from terminal.
+To process the data, run [process_data.sh](./process_data.sh), which converts the notebooks to Python scripts in `nbconverted/` and runs single-cell QC, merging and annotation, normalization, and feature selection. Then run [aggregate_sc_data.sh](./aggregate_sc_data.sh) to aggregate the data to the well level.
 
 ```bash
 # Make sure you are in the 1.process_data directory
 cd 1.process_data
-# Process the data with steps 1-3
+# Process the data with steps 1-4
 ./process_data.sh
-# Process the data with step 4
+# Process the data with step 5
 ./aggregate_sc_data.sh
 ```
diff --git a/2.evaluate_data/README.md b/2.evaluate_data/README.md
@@ -1,18 +1,19 @@
-# Apply phenotypic profiling model to JUMP data
+# Apply phenotypic profiling models to JUMP data
 
-In this module, we generate single-cell probabilities for each of the 15 phenotypic classes by applying the [phenotypic profiling model](https://github.com/WayScience/phenotypic_profiling_model).
-There are two model types, final and shuffled baseline.
-The shuffled baseline model trains using randomly shuffled single-cell features. 
-We output one file for all plates that contains phenotypic probabilities and relevant metadata for all of the single-cells.
-The files we output are in `parquet` format.
+This module generates single-cell probabilities for 15 phenotypic classes using trained morphology models.
+The current workflow runs predictions with a class-balanced logistic regression model, along with the related per-plate processing.
 
-## Run the prediction notebook
+Outputs are written as `parquet` files with phenotypic probabilities and relevant metadata.
 
-To generate the probabilities for each single cell, run the [evaluate_data.sh](./evaluate_data.sh) file which will convert the notebook into a python file and run it from terminal.
+## Run predictions
+
+To generate probabilities, run [evaluate_data.sh](./evaluate_data.sh):
 
 ```bash
-# Make sure you are in the 2.evaluate_data directory
+# make sure you are in the 2.evaluate_data directory
 cd 2.evaluate_data
-# Run the notebook as a python script
+# run prediction pipeline
 source evaluate_data.sh
 ```
+
+`evaluate_data.sh` executes the active prediction workflow in `class_balanced_log_reg_areashape_predict_sc_probabilities/`.
diff --git a/2.evaluate_data/compute_sc_anomalyze_data/compute_aggregate_treatment_anomaly_data.ipynb b/2.evaluate_data/compute_sc_anomalyze_data/compute_aggregate_treatment_anomaly_data.ipynb