z-aggregate

z-aggregate is a command-line tool for estimating transcription factor activities from transcriptomic profiles. It combines an expression matrix with a prior regulatory network and reports transcription factor activity scores and associated p-values for each cell or sample.

The method is intended for single-cell or bulk transcriptomic data where rows are observations and columns are genes.

Installation

Clone the repository:

git clone https://github.com/PathwayAndDataAnalysis/z-aggregate
cd z-aggregate

Install uv if it is not already available. On macOS or Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows PowerShell:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install the project dependencies:

uv sync

Quick Start

Run z-aggregate with an expression dataset, a prior network, and an output directory:

uv run z-aggregate \
  --dataset ./dataset/example.h5ad \
  --prior-type collectri \
  --output ./results

A fuller example that also sets the weighting strategy, the output format, and verbose logging:

uv run z-aggregate \
  --dataset ./dataset/TianKampmann2021_CRISPRi.h5ad \
  --prior-type collectri \
  --output ./results \
  --weight-type Uniform \
  --output-format both \
  --verbose

The dataset used above is part of the scPerturb collection and can be downloaded from Zenodo:

mkdir -p dataset
wget "https://zenodo.org/records/13350497/files/TianKampmann2021_CRISPRi.h5ad?download=1" \
  -O dataset/TianKampmann2021_CRISPRi.h5ad

Command-Line Options

Option	Required	Default	Description
`-ds`, `--dataset`	Yes	-	Path to the expression dataset. Supported formats are `.h5ad`, `.csv`, `.tsv`, and `.txt`.
`-p`, `--prior-type`	Yes	-	Prior network to use. Use a named prior such as `causalpath-priors`, `collectri`, `dorothea`, or `ensemble-priors`, or provide a custom file path.
`-o`, `--output`	Yes	-	Directory where output files will be written.
`--min-targets`	No	`5`	Minimum number of target genes required for a transcription factor to be included.
`--preprocess`	No	Enabled	Apply standard preprocessing before running the method.
`--no-preprocess`	No	-	Skip preprocessing. Use this when the input data are already filtered and normalized.
`--min-genes`	No	`1000`	Minimum number of genes required per cell during preprocessing.
`--min-cells`	No	`10`	Minimum number of cells required per gene during preprocessing.
`--max-mt-pct`	No	`20.0`	Maximum mitochondrial read percentage allowed during preprocessing.
`--weight-type`	No	`Uniform`	Weighting strategy for prior-network edges.
`--output-format`	No	`both`	Output format: `tsv`, `csv`, `h5ad`, or `both`. With `both`, table files and an AnnData file are written.
`-v`, `--verbose`	No	Disabled	Print more detailed log messages.
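
When the tool is run over several datasets, it can be convenient to assemble the command line in a script. The sketch below builds an argument list for `uv run z-aggregate` using only flags from the table above. The helper name `build_command` and the example paths are hypothetical, and the command is only printed here, not executed.

```python
import shlex


def build_command(dataset, prior_type, output, **options):
    """Build an argument list for `uv run z-aggregate` (hypothetical helper).

    Keyword options map to long flags from the options table, for example
    min_targets=10 becomes `--min-targets 10`. A value of True emits a
    bare switch; False and None are skipped.
    """
    cmd = [
        "uv", "run", "z-aggregate",
        "--dataset", str(dataset),
        "--prior-type", str(prior_type),
        "--output", str(output),
    ]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)  # boolean switch such as --verbose
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])
    return cmd


cmd = build_command(
    "./dataset/example.h5ad",  # illustrative path
    "collectri",
    "./results",
    min_targets=10,
    weight_type="Specificity",
    no_preprocess=True,
    verbose=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.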

Input Data

Expression Dataset

The expression dataset is passed with --dataset.

Supported formats:

.h5ad: AnnData object.
.csv: comma-separated matrix.
.tsv or .txt: tab-separated matrix.

For text files, the matrix should be organized as observations by genes: the first column should contain cell or sample identifiers, and each remaining column should hold the values for one gene.
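
As an illustration of this layout, the following sketch writes a tiny observations-by-genes matrix to a tab-separated file using only the Python standard library, then reads it back. The file name, identifiers, gene names, and counts are made up. The header row of gene names and the `cell_id` label for the first column are assumptions; the description above only specifies that the first column holds identifiers.

```python
import csv
import tempfile
from pathlib import Path

# Made-up toy data: two cells (rows) by three genes (columns).
genes = ["GATA1", "TP53", "MYC"]
counts = {"cell_1": [0, 12, 3], "cell_2": [5, 0, 7]}

out_path = Path(tempfile.mkdtemp()) / "toy_expression.tsv"
with out_path.open("w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["cell_id", *genes])  # assumed header row
    for cell_id, values in counts.items():
        writer.writerow([cell_id, *values])  # identifier first, then genes

# Read the file back to confirm the shape: one row per observation.
with out_path.open(newline="") as handle:
    rows = list(csv.reader(handle, delimiter="\t"))
header, body = rows[0], rows[1:]
print(header)
print(len(body), "observations,", len(header) - 1, "genes")
```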

Prior Network

The prior network is passed with --prior-type.

You may use a named prior network:

causalpath-priors
collectri
dorothea
ensemble-priors

You may also provide a path to a custom .csv, .tsv, or .txt file.

A prior network must contain transcription factor-target relationships. The standard columns are:

Column	Meaning
`source`	Transcription factor or regulator.
`interaction`	Direction of regulation. Positive values indicate activation; negative values indicate inhibition.
`target`	Target gene.
`weight`	Optional edge weight. Used when `--weight-type Existing` is selected.

Common alternative column names such as tf, regulator, gene, target_gene, mor, mode, direction, effect, and sign are also accepted.

Interaction values may be numeric, or may use terms such as upregulates-expression, downregulates-expression, activation, and inhibition.
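
To make the expected layout concrete, the sketch below parses a small made-up prior network in the standard column layout and reduces each interaction value to a sign. The helper `interaction_sign` is hypothetical and is not the tool's own parser; it only covers the terms listed above plus numeric values, and rejecting a value of 0 is a choice made for this example.

```python
import csv
import io

# Illustrative prior network in the standard column layout (made-up edges).
PRIOR_TSV = (
    "source\tinteraction\ttarget\tweight\n"
    "TF_A\tupregulates-expression\tGENE_1\t0.8\n"
    "TF_A\t-1\tGENE_2\t0.5\n"
    "TF_B\tinhibition\tGENE_1\t1.0\n"
)

POSITIVE_TERMS = {"upregulates-expression", "activation"}
NEGATIVE_TERMS = {"downregulates-expression", "inhibition"}


def interaction_sign(value):
    """Map an interaction value to +1 (activation) or -1 (inhibition)."""
    text = str(value).strip().lower()
    if text in POSITIVE_TERMS:
        return 1
    if text in NEGATIVE_TERMS:
        return -1
    number = float(text)  # raises ValueError for unrecognized terms
    if number == 0:
        raise ValueError("an interaction value of 0 has no direction")
    return 1 if number > 0 else -1


edges = [
    (row["source"], row["target"], interaction_sign(row["interaction"]))
    for row in csv.DictReader(io.StringIO(PRIOR_TSV), delimiter="\t")
]
print(edges)
```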

Preprocessing

Preprocessing is enabled by default. It performs:

Cell filtering using --min-genes.
Gene filtering using --min-cells.
Mitochondrial-content filtering using --max-mt-pct.
Library-size normalization to a target sum of 10,000.
Log transformation.

Use --no-preprocess when the dataset has already been quality controlled, normalized, and transformed.
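
The steps listed above can be sketched in plain Python on a toy count table. This is not the tool's implementation: it assumes that a gene counts as detected when its count is greater than zero, that mitochondrial genes are identified by an `MT-` name prefix, and that the log transformation is log(1 + x). The function name `preprocess` and the toy data are made up, and the thresholds are lowered in the call so that the small example is not filtered away entirely.

```python
import math


def preprocess(counts, genes, min_genes=1000, min_cells=10, max_mt_pct=20.0):
    """Toy version of the listed steps on a dict of cell -> list of counts."""
    # Assumption: mitochondrial genes carry an "MT-" prefix.
    mt_idx = [i for i, g in enumerate(genes) if g.upper().startswith("MT-")]

    # 1. Keep cells with at least `min_genes` detected (nonzero) genes.
    cells = {c: v for c, v in counts.items()
             if sum(x > 0 for x in v) >= min_genes}

    # 2. Keep genes detected in at least `min_cells` of the remaining cells.
    keep = [i for i in range(len(genes))
            if sum(v[i] > 0 for v in cells.values()) >= min_cells]

    # 3. Drop cells whose mitochondrial share of counts exceeds `max_mt_pct`.
    def mt_pct(v):
        total = sum(v)
        return 100.0 * sum(v[i] for i in mt_idx) / total if total else 0.0

    cells = {c: v for c, v in cells.items() if mt_pct(v) <= max_mt_pct}

    # 4-5. Scale each cell to a total of 10,000, then apply log(1 + x).
    result = {}
    for c, v in cells.items():
        kept = [v[i] for i in keep]
        total = sum(kept)
        scale = 1e4 / total if total else 0.0
        result[c] = [math.log1p(x * scale) for x in kept]
    return [genes[i] for i in keep], result


genes = ["GENE_A", "GENE_B", "MT-CO1"]
counts = {
    "cell_1": [10, 5, 1],  # passes every filter
    "cell_2": [4, 0, 0],   # too few detected genes
    "cell_3": [1, 1, 8],   # mitochondrial share too high
    "cell_4": [3, 6, 0],   # passes every filter
}
kept_genes, normalized = preprocess(counts, genes, min_genes=2, min_cells=2)
print(kept_genes, sorted(normalized))
```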

Weighting Strategies

Choose the edge-weighting method with --weight-type.

Value	Description
`Uniform`	Uses the sign of the prior interaction as the edge weight.
`Correlation`	Uses Spearman correlation between transcription factor expression and target-gene expression.
`Specificity`	Weights each target by `1 / number of TFs regulating that target`.
`NonzeroRate`	Weights each target by its detection rate in the dataset.
`Existing`	Uses the `weight` column from the prior network, if present.
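
As a rough illustration of three of these strategies, the sketch below computes `Uniform`, `Specificity`, and `NonzeroRate` weights for a made-up network and expression table. It follows the table descriptions only and is not the tool's code. In particular, it assumes that "detection rate" means the fraction of cells with a nonzero value, and it leaves open whether the tool combines these weights with the interaction sign.

```python
from collections import defaultdict

# Made-up signed edges: (TF, target, sign).
edges = [
    ("TF_A", "GENE_1", 1),
    ("TF_A", "GENE_2", -1),
    ("TF_B", "GENE_1", -1),
    ("TF_C", "GENE_1", 1),
]

# Made-up expression values: gene -> one value per cell.
expression = {
    "GENE_1": [0.0, 1.2, 0.0, 3.1],
    "GENE_2": [2.0, 0.5, 0.7, 1.1],
}

# Uniform: the sign of the prior interaction is the edge weight.
uniform = {(tf, tgt): float(sign) for tf, tgt, sign in edges}

# Specificity: 1 / number of TFs regulating the target.
regulators = defaultdict(set)
for tf, tgt, _ in edges:
    regulators[tgt].add(tf)
specificity = {(tf, tgt): 1.0 / len(regulators[tgt]) for tf, tgt, _ in edges}

# NonzeroRate: fraction of cells in which the target is detected (assumed > 0).
detection = {gene: sum(v > 0 for v in values) / len(values)
             for gene, values in expression.items()}
nonzero_rate = {(tf, tgt): detection[tgt] for tf, tgt, _ in edges}

for edge in uniform:
    print(edge, uniform[edge], specificity[edge], nonzero_rate[edge])
```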

Output Files

Output files are written to the directory given by --output.

For table output, z-aggregate writes:

<dataset>_<prior>_z_agg_scores.<format>
<dataset>_<prior>_z_agg_pvalues.<format>

For AnnData output, it writes:

<dataset>_z_aggregate_results.h5ad

The AnnData output contains the activity scores in .obsm["z_aggregate_scores"] and p-values in .obsm["z_aggregate_pvalues"].
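
For scripting against these outputs, the naming patterns above can be turned into paths as in the sketch below. The helper `expected_outputs` is hypothetical. It assumes that `<dataset>` is the input file name without its extension and that `<prior>` is the name passed to --prior-type; neither is confirmed here, so check the names against an actual run.

```python
from pathlib import Path


def expected_outputs(output_dir, dataset_path, prior, table_format="tsv"):
    """Return output paths implied by the documented naming patterns.

    Assumption: <dataset> is the stem of the input file and <prior> is the
    value given to --prior-type.
    """
    out = Path(output_dir)
    dataset = Path(dataset_path).stem
    return {
        "scores": out / f"{dataset}_{prior}_z_agg_scores.{table_format}",
        "pvalues": out / f"{dataset}_{prior}_z_agg_pvalues.{table_format}",
        "anndata": out / f"{dataset}_z_aggregate_results.h5ad",
    }


paths = expected_outputs("./results", "./dataset/example.h5ad", "collectri")
for kind, path in paths.items():
    print(kind, path)
```

The AnnData file can then be opened with `anndata.read_h5ad(paths["anndata"])`, and the two matrices are available under the `.obsm` keys named above.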

Reproducing Paper Results

Instructions for reproducing the paper results, including the main reproduction notebooks, are provided in reproduce/README.md.
