This repository hosts a comprehensive computational framework for analyzing multi-omics data from Kidney Renal Clear Cell Carcinoma (KIRC) and Lung Squamous Cell Carcinoma (LUSC).
The project integrates clinical data, gene expression (GE), copy number variation (CNV), and protein levels measured by reverse phase protein arrays (RPPA) to build predictive survival models and identify key genomic biomarkers using statistical and machine learning techniques.
- Objective: Predict patient survival outcomes based on integrated multi-omics features.
- Methods:
  - Preprocessing: Imputation and normalization of clinical and genomic data.
  - Feature Selection: Using LASSO (L1 Regularization) to identify the most relevant prognostic markers.
  - Modeling: Building Cox Proportional Hazards models and Random Survival Forests.
- Metric: Evaluated using the Concordance Index (C-Index).
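
To make the evaluation metric concrete, here is a minimal sketch of Harrell's Concordance Index in plain Python. It is not taken from the repository: the function name `concordance_index` and the toy inputs are assumptions for illustration, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the earlier
        # patient was censored, since their true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
c_index = concordance_index(
    times=[5, 8, 12, 20],
    events=[1, 0, 1, 1],
    risk_scores=[0.9, 0.4, 0.1, 0.6],
)
print(c_index)  # 0.75: three of the four comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the same function can be used to compare Cox and Random Survival Forest risk scores on a held-out set.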
- Objective: Identify significantly up-regulated and down-regulated genes between tumor and normal tissues.
- Visualization: Generated Volcano Plots to visualize statistical significance ($-\log_{10} P$) vs. magnitude of change ($\log_2 \text{Fold Change}$).
- Top Hits: Extracted top 5 genes based on Fold Change and P-value for biological interpretation.
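
The volcano plot coordinates and the top-hit ranking can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/plotting/volcano_plot.py`: the function names, the gene labels, the input layout (mean tumor expression, mean normal expression, p-value), and the thresholds `fc_threshold=1.0` and `p_threshold=0.05` are all hypothetical.

```python
import math


def volcano_points(results, fc_threshold=1.0, p_threshold=0.05):
    """Compute volcano plot coordinates and a regulation label per gene.

    results -- dict mapping gene -> (mean_tumor, mean_normal, p_value);
               both means must be positive for the log ratio to exist.
    """
    points = []
    for gene, (tumor, normal, p_value) in results.items():
        log2_fc = math.log2(tumor / normal)   # x-axis: magnitude of change
        neg_log10_p = -math.log10(p_value)    # y-axis: significance
        if p_value < p_threshold and log2_fc >= fc_threshold:
            label = "up"
        elif p_value < p_threshold and log2_fc <= -fc_threshold:
            label = "down"
        else:
            label = "ns"                      # not significant
        points.append((gene, log2_fc, neg_log10_p, label))
    return points


def top_hits(points, n=5):
    """Rank significant genes by |log2 fold change|, then by significance."""
    significant = [pt for pt in points if pt[3] != "ns"]
    return sorted(significant, key=lambda pt: (-abs(pt[1]), -pt[2]))[:n]


# Hypothetical input values, for illustration only.
example = {
    "GENE_A": (8.0, 2.0, 0.001),  # 4x higher in tumor
    "GENE_B": (1.0, 4.0, 0.01),   # 4x lower in tumor
    "GENE_C": (3.0, 2.0, 0.2),    # small change, not significant
}
points = volcano_points(example)
hits = top_hits(points)
for gene, log2_fc, neg_log10_p, label in hits:
    print(gene, round(log2_fc, 2), round(neg_log10_p, 2), label)
```

The `(log2_fc, neg_log10_p)` pairs can be passed directly to any scatter plot function, with the label used to color up-regulated and down-regulated genes.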
- Objective: Explore linear relationships between specific gene pairs and clinical variables.
- Analysis: Simple and Multiple Linear Regression to understand gene-gene interactions.
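
As a rough illustration of the simple regression step, the sketch below fits ordinary least squares for one predictor in plain Python. The function name and the expression values are hypothetical, and the repository may well use a statistics library instead.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical expression levels of two genes across four samples.
gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [2.1, 3.9, 6.2, 7.8]
slope, intercept, r_squared = fit_simple_linear_regression(gene_x, gene_y)
print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

Multiple linear regression extends the same idea to several predictors (for example, one gene plus clinical variables), which is usually solved with a matrix-based least squares routine rather than by hand.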
- Clone the repository:
git clone https://github.com/mariamashraf731/Multi-Omics-Cancer-Survival.git
- Install Requirements:
pip install -r requirements.txt
- Run Survival Pipeline:
python src/survival/train_model.py
- Generate Plots:
python src/plotting/volcano_plot.py
For detailed methodology and biological interpretation, refer to the Final Project Report.