RWE Health Analytics Platform

A modular analytics framework for working with Real-World Evidence (RWE) and electronic health records (EHR).
The project is structured as a realistic healthcare data pipeline: from synthetic data generation to ingestion, cohort construction, exploratory analysis, and the foundation for downstream modeling (risk prediction, survival analysis, and causal inference).

Business / Clinical Objective

This project simulates a realistic Real-World Evidence (RWE) workflow using synthetic EHR data to demonstrate how patient-level healthcare data can be transformed into clinically meaningful insights and modelling-ready datasets. It is designed to reflect use cases relevant to HEOR, clinical research, digital health, and observational analytics.

Use Cases / Why it matters

Cohort construction for observational studies

Risk stratification and outcome prediction

Survival analysis for time-to-event endpoints

Treatment pattern and medication utilization analysis

Causal inference and comparative effectiveness research

Synthetic data prototyping for privacy-preserving healthcare analytics

Potential Digital Health Applications

The analytical workflows implemented in this project are relevant for several digital health and precision medicine use cases, including:

• digital biomarker discovery
• remote patient monitoring
• clinical risk stratification
• early disease detection
• decision support systems

These approaches are particularly relevant for chronic disease management contexts such as diabetes, cardiovascular disease and metabolic disorders.

1. Overview

This repository provides a reproducible environment for analysing longitudinal, patient-level healthcare data.
It follows principles commonly used in pharmaceutical R&D, HEOR, and clinical analytics:

Structured data ingestion and validation
Cohort definition logic
Domain-aware exploratory analysis
Modular organisation under src/
Reproducible workflows using notebooks

The project currently uses synthetic EHR-style data aligned with realistic clinical patterns, but the architecture supports real datasets with minimal adaptation.

2. Tech Stack

Core Languages & Tools

Machine Learning & Modeling

Causal Inference & Survival Analysis

Visualization

Apps, Data & Infra

3. Project Structure

rwe-health-analytics/
│
├── data/
│   ├── raw/                     # Synthetic EHR datasets
│   └── processed/               # Cleaned datasets (future)
│
├── notebooks/
│   └── 01_exploratory_analysis.ipynb
│
├── src/
│   └── rwe_health_analytics/
│       ├── data/
│       │   ├── data_loader.py
│       │   └── data_generation/
│       │       └── synthetic_data_generator.py
│       ├── models/              # Survival / ML / causal (future)
│       ├── evaluation/          # Metrics and validation (future)
│       └── visualization/       # Plotting utilities (future)
│
├── tests/                       # Test suite
├── docs/                        # Methodology and technical notes
├── requirements.txt
└── setup.py

4. Data Flow

The system follows a structured, modular lifecycle:

1. Data Generation

Synthetic patient-level data are created using clinically-inspired rules and realistic statistical distributions.

2. Data Ingestion

HealthcareDataLoader reads and validates the raw tables, offering:

loading by domain (patients, diagnoses, labs…)
patient-level queries
data quality checks
basic cohort creation logic
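
As a rough illustration of the loading and data quality checks listed above, the sketch below uses only the Python standard library. The function names `load_table` and `quality_report`, the column names, and the inline sample rows are hypothetical; they are assumptions for illustration and do not reflect the actual HealthcareDataLoader API.

```python
import csv
import io

# Assumed schema for a demographics table (hypothetical column names).
REQUIRED_COLUMNS = {"patient_id", "age", "sex"}


def load_table(handle, required_columns):
    """Read a CSV file object and fail early if expected columns are absent."""
    reader = csv.DictReader(handle)
    missing = set(required_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def quality_report(rows, key="patient_id"):
    """Summarise row count, repeated identifiers, and empty cells."""
    ids = [row[key] for row in rows]
    return {
        "n_rows": len(rows),
        "n_duplicate_ids": len(ids) - len(set(ids)),
        "n_missing_values": sum(
            1 for row in rows for value in row.values() if value in ("", None)
        ),
    }


# Inline sample standing in for a raw file such as data/raw/demographics.csv.
sample = io.StringIO("patient_id,age,sex\nP001,64,F\nP002,,M\nP002,71,M\n")
rows = load_table(sample, REQUIRED_COLUMNS)
report = quality_report(rows)
print(report)
```

Validating the schema before any analysis keeps downstream notebooks from silently working on incomplete tables.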

3. Exploratory Analysis

Jupyter notebooks provide:

demographic summaries
comorbidity patterns
laboratory distributions
medication behaviour
longitudinal event timelines
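
To make the comorbidity summaries above more concrete, here is a minimal sketch that counts distinct diagnosis codes per patient and tallies which codes co-occur. The field names (`patient_id`, `code`), the helper names, and the sample rows are assumptions for illustration; the notebook may compute these differently.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical diagnosis rows; repeated codes for one patient are expected.
diagnoses = [
    {"patient_id": "P001", "code": "E11"},
    {"patient_id": "P001", "code": "I10"},
    {"patient_id": "P001", "code": "E11"},
    {"patient_id": "P002", "code": "I10"},
    {"patient_id": "P003", "code": "E11"},
    {"patient_id": "P003", "code": "I10"},
    {"patient_id": "P003", "code": "E78"},
]


def codes_by_patient(diagnosis_rows):
    """Group the distinct diagnosis codes recorded for each patient."""
    grouped = defaultdict(set)
    for row in diagnosis_rows:
        grouped[row["patient_id"]].add(row["code"])
    return grouped


def comorbidity_burden(diagnosis_rows):
    """Number of distinct codes per patient."""
    return {pid: len(codes) for pid, codes in codes_by_patient(diagnosis_rows).items()}


def cooccurrence_counts(diagnosis_rows):
    """Count how many patients carry each unordered pair of codes."""
    pairs = Counter()
    for codes in codes_by_patient(diagnosis_rows).values():
        pairs.update(combinations(sorted(codes), 2))
    return pairs


burden = comorbidity_burden(diagnoses)
pairs = cooccurrence_counts(diagnoses)
print(burden)
print(pairs.most_common(3))
```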

4. Modelling Layer (future)

The architecture is prepared for:

survival analysis
risk prediction
clustering
causal inference pipelines

This layering keeps data, logic, and analysis clearly separated.
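
The modelling layer is not implemented yet, so the following is only a sketch of the kind of computation a future survival module might contain: a Kaplan-Meier estimator written in plain Python. The function name `kaplan_meier` and the toy follow-up data are hypothetical; a real implementation would more likely rely on a dedicated survival analysis library.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event was observed at that time, 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == current_time:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the conditional survival here.
            survival *= 1 - n_events / n_at_risk
            curve.append((current_time, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data: follow-up in months, with two censored patients.
durations = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for time_point, probability in curve:
    print(time_point, round(probability, 3))
```

Censored patients leave the risk set without lowering the curve, which is why only event times produce a step.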

5. Installation

git clone https://github.com/Finarosalina/rwe-health-analytics.git
cd rwe-health-analytics

Create environment:

python -m venv venv

Activate:

Windows

venv\Scripts\activate

macOS/Linux

source venv/bin/activate

Install dependencies:

pip install -r requirements.txt
pip install -e .

6. Synthetic Data Generation

Run:

python src/rwe_health_analytics/data/data_generation/synthetic_data_generator.py

This generates:

data/raw/
    demographics.csv
    diagnoses.csv
    lab_results.csv
    medications.csv
    outcomes.csv

Dataset Description

Dataset	Description
Demographics	Age, sex, race, BMI, smoking, insurance
Diagnoses	ICD-like codes, visit types, chronicity
Laboratories	Lab measures with normal/abnormal ranges
Medications	Prescriptions, dose, duration, specialty
Outcomes	Hospitalizations, ER visits, mortality

Data are generated using:

normal, log-normal, Poisson, and multinomial distributions
disease-driven physiological changes
comorbidity-based risk modeling
temporal sequencing for visits, prescriptions, labs and outcomes

This enables EDA, ML, causal inference and survival analysis without PHI/PII.
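
The sketch below illustrates how these ingredients could combine for a single patient record, using only the standard library. It is not the code in synthetic_data_generator.py: the function names, the diabetes flag, and every numeric parameter (prevalence, means, shifts, rates) are illustrative assumptions.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_patient(rng, patient_id):
    """Build one synthetic patient; all parameters are assumed values."""
    has_diabetes = rng.random() < 0.15  # assumed prevalence
    # Normal distribution for age, clipped to an adult range.
    age = round(min(max(rng.gauss(55, 15), 18), 95))
    # Categorical draw for sex.
    sex = rng.choices(["F", "M"], weights=[0.52, 0.48])[0]
    # Log-normal lab value, shifted upward when the disease is present.
    hba1c = rng.lognormvariate(math.log(5.4), 0.08)
    if has_diabetes:
        hba1c += 2.0
    # Comorbidity-based risk: a higher expected visit count with disease.
    visit_rate = 2.0 + (1.5 if has_diabetes else 0.0)
    return {
        "patient_id": patient_id,
        "age": age,
        "sex": sex,
        "has_diabetes": has_diabetes,
        "hba1c": round(hba1c, 1),
        "n_visits": sample_poisson(rng, visit_rate),
    }


def generate_cohort(n_patients, seed):
    """Generate a reproducible list of synthetic patients."""
    rng = random.Random(seed)
    return [generate_patient(rng, f"P{i:04d}") for i in range(1, n_patients + 1)]


for patient in generate_cohort(3, seed=0):
    print(patient)
```

Seeding a dedicated `random.Random` instance keeps each run reproducible without touching global random state.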

7. Exploratory Analysis

Launch notebooks:

jupyter notebook

Notebook included:

notebooks/01_exploratory_analysis.ipynb

The analysis covers:

demographic distributions
comorbidity burden
diagnosis patterns
medication dynamics
laboratory value distributions
outcome timelines

8. Extensibility

The project is structured as a Python package to support clean expansion.

Future modules may include:

Cox PH models
Random Survival Forests
Gradient-boosted risk models
Propensity score pipelines
Streamlit dashboard
Docker deployment

9. Testing

pytest
pytest --cov=src

10. License

MIT License.

11. Author

Maria Pais Fajín
GitHub: https://github.com/Finarosalina
LinkedIn: https://linkedin.com/in/maria-pais-fajin
