Protein-network proximity analysis for Parkinson's disease candidate prioritization using STRING and GTEx substantia nigra expression.
- A canonical, documented pipeline instead of multiple competing script paths.
- Offline seed mapping from local STRING alias files.
- A tracked analysis rerun path that works from the files stored in Git.
- A full rebuild path for the main network preprocessing if you also provide the raw STRING detailed network download locally.
- SHA256 checksums for the tracked reproducibility-critical inputs and outputs.
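The checksum idea in the last bullet can be sketched as follows. This is a minimal illustration, not the code in scripts/verify_repository.py: the helper names `sha256_of_file` and `find_mismatches` and the manifest shape (a dict mapping relative path to hex digest) are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for large network files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing or whose digest differs.

    `expected` is a hypothetical manifest: relative path -> hex digest.
    """
    mismatches = []
    for rel_path, expected_digest in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of_file(target) != expected_digest:
            mismatches.append(rel_path)
    return mismatches
```

An empty list from `find_mismatches` means every listed file is present and byte-identical to the recorded version.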
conda env create -f environment.yml
conda activate pd-netprox
python scripts/verify_repository.py
python scripts/run_pipeline.py

If you also have the raw STRING detailed network file locally, the pipeline rebuilds data/string/ppi_edges.csv. Otherwise, it reuses the tracked data/string/ppi_edges.csv.
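The rebuild-or-reuse behavior described above amounts to a small file-existence check. The sketch below only illustrates that decision; the function name `choose_edge_source` is hypothetical, and the raw network path is left as a parameter because its file name is not stated here.

```python
from pathlib import Path


def choose_edge_source(raw_network: Path, tracked_edges: Path) -> str:
    """Decide how the edge list is obtained.

    Returns "rebuild" when the raw STRING detailed network is present,
    "reuse" when only the tracked edge list is available.
    """
    if raw_network.is_file():
        return "rebuild"
    if tracked_edges.is_file():
        return "reuse"
    # Neither input exists, so the pipeline has nothing to work from.
    raise FileNotFoundError(
        f"Neither {raw_network} nor {tracked_edges} is available."
    )
```

A caller would pass its own location for the raw download together with `Path("data/string/ppi_edges.csv")` for the tracked file.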
To refresh the checksum file after a fresh run:
python scripts/run_pipeline.py --update-checksums

Canonical pipeline scripts:
- scripts/extract_gtex_sn.py
- scripts/map_seeds_to_ensp.py
- scripts/process_string.py (if the raw STRING detailed network is available)
- scripts/build_graph.py
- scripts/compute_proximity.py
- scripts/prepare_top20.py
- scripts/scatter_plot.py
- scripts/plot_ppi_subgraph.py
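To make the offline seed-mapping step more concrete, here is a rough sketch of resolving gene symbols against a STRING alias table. It is not the code in scripts/map_seeds_to_ensp.py. It assumes the alias file is tab-separated with the protein ID in the first column and the alias in the second, with `#` comment lines; the function names and the rule of accepting only unambiguous matches are choices made for this example.

```python
from collections import defaultdict


def load_alias_index(handle) -> dict[str, set[str]]:
    """Map each alias to the set of STRING protein IDs that carry it.

    `handle` is any iterable of text lines, for example the result of
    gzip.open(path, "rt") on a local alias file.
    """
    index = defaultdict(set)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed rows
        protein_id, alias = parts[0], parts[1]
        index[alias].add(protein_id)
    return index


def map_seeds(symbols, index):
    """Split symbols into uniquely mapped ones and unresolved ones."""
    mapped, unresolved = {}, []
    for symbol in symbols:
        hits = index.get(symbol, set())
        if len(hits) == 1:
            mapped[symbol] = next(iter(hits))
        else:
            # Zero hits or several candidate proteins: leave for manual review.
            unresolved.append(symbol)
    return mapped, unresolved
```

Reporting unresolved symbols separately, rather than picking one candidate silently, keeps ambiguous seeds visible in the mapping table.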
Tracked source or reduced-input files:
- data/expression/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_median_tpm.gct.gz
- data/expression/gtex_sn_expression.tsv
- data/seeds/pd_monogenic.txt
- data/seeds/pd_monogenic_ensp.txt
- data/seeds/pd_monogenic_seed_mapping.tsv
- data/string/9606.protein.aliases.v11.0.txt.gz
- data/string/9606.protein.info.v11.0.txt.gz
- data/string/ppi_edges.csv
Tracked canonical outputs:
- results/ranked_candidates.tsv
- results/top20_candidates.tsv
- results/top20_genes_for_enrichment.txt
- figures/proximity_vs_tpm_corr.png
- figures/ppi_subgraph_top20.png
Not tracked in normal Git history:
- The raw STRING detailed interaction file because it exceeds GitHub's normal file-size limits.
- Local-only large or unused files such as uncompressed alias tables, intermediate graph pickles, and archive material in data/NOT USED/.
Environment files:
- environment.yml: Small runtime environment for the canonical pipeline.
- environment.exact.yml: Full export of the Windows conda environment used for this repository snapshot.
- requirements.exact.txt: Exact pip freeze --all output from the same environment.
Repository validation:
python scripts/verify_repository.py

Full-rebuild validation, which also requires the raw STRING detailed network to be present:
python scripts/verify_repository.py --full-rebuild

Repository layout:
- scripts/: Canonical pipeline scripts only.
- scripts/legacy/: Archived exploratory or superseded scripts kept for provenance.
- data/: Tracked inputs, reduced inputs, and documented optional raw inputs.
- results/: Canonical tabular outputs tracked in Git.
- figures/: Canonical figures tracked in Git.
- paper/: Writing and manuscript material.
Reach out with questions