A computational neuroscience pipeline for investigating the relationship between brain activity and semantic features of visual stimuli using Representational Similarity Analysis (RSA) on the Natural Scenes Dataset (NSD).
This project analyzes how the brain represents visual information by comparing neural responses (fMRI data) with high-level semantic features extracted by Vision-Language Models (VLMs). The pipeline uses RSA to quantify similarities between brain representations and semantic feature spaces across different Regions of Interest (ROIs).
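As a rough illustration of the RSA comparison described above, the sketch below builds two RDMs from synthetic data and correlates their upper triangles. The function names `compute_rdm` and `compare_rdms`, the correlation-distance metric, and the Spearman comparison are illustrative assumptions, not necessarily what `rsa_analysis.py` does.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr


def compute_rdm(patterns: np.ndarray) -> np.ndarray:
    """Return a stimuli x stimuli RDM using correlation distance (1 - Pearson r)."""
    return squareform(pdist(patterns, metric="correlation"))


def compare_rdms(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    upper = np.triu_indices_from(rdm_a, k=1)  # skip the zero diagonal
    rho, _ = spearmanr(rdm_a[upper], rdm_b[upper])
    return float(rho)


# Synthetic stand-ins: rows are stimuli, columns are voxels or feature dimensions.
rng = np.random.default_rng(0)
n_stimuli = 20
brain_patterns = rng.normal(size=(n_stimuli, 100))   # hypothetical ROI betas
feature_vectors = rng.normal(size=(n_stimuli, 16))   # hypothetical VLM features

brain_rdm = compute_rdm(brain_patterns)
feature_rdm = compute_rdm(feature_vectors)

rsa_score = compare_rdms(brain_rdm, feature_rdm)   # brain vs. features
self_score = compare_rdms(brain_rdm, brain_rdm)    # sanity check against itself
```

Only the upper triangle is compared because an RDM is symmetric with a zero diagonal, so the remaining entries carry no extra information.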
- Modular Design: Each analysis step is handled by dedicated Python modules
- GPU Acceleration: Leverages NVIDIA RAPIDS (cuML, cuPy) and mdscuda for high-performance computing
- Comprehensive Analysis: Covers both object-level and scene-level feature processing
- Automated Visualization: Generates structured plots and quantitative metrics
- Reproducible Environment: Includes Conda environment specification
- RDM Calculation: Processes fMRI beta-series data to compute Representational Dissimilarity Matrices (RDMs) for various ROIs
- VLM Feature Processing: Loads and structures high-level semantic features extracted from stimulus images
- Alignment & Visualization: Aligns brain data with VLM features using dimensionality reduction (MDS, t-SNE)
- Cluster Quality Analysis: Quantifies feature clustering quality using silhouette scores and other metrics
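To make the cluster-quality step concrete, here is a minimal sketch that scores how well a set of feature labels separates stimuli in a precomputed dissimilarity matrix. It uses scikit-learn's CPU `silhouette_score` on synthetic data; the pipeline itself may compute this differently (for example on the GPU), and the labels and data here are made up for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two well-separated synthetic groups of stimuli (10 each, 5 dimensions).
group_a = rng.normal(loc=0.0, scale=0.1, size=(10, 5))
group_b = rng.normal(loc=5.0, scale=0.1, size=(10, 5))
points = np.vstack([group_a, group_b])
labels = np.array([0] * 10 + [1] * 10)  # hypothetical feature labels

# Pairwise Euclidean distances serve as a stand-in for an RDM.
diff = points[:, None, :] - points[None, :, :]
rdm = np.sqrt((diff ** 2).sum(axis=-1))

# metric="precomputed" tells scikit-learn that rdm already holds distances.
score = silhouette_score(rdm, labels, metric="precomputed")
```

Scores near 1 indicate that stimuli sharing a label lie close together relative to stimuli with other labels; scores near 0 or below indicate little or no cluster structure for that feature.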
- main.py - Main pipeline orchestrator
- config.py - Central configuration for paths, ROI definitions, and parameters
- data_loader.py - Handles fMRI beta data and stimulus information loading
- rsa_analysis.py - Computes Representational Dissimilarity Matrices
- feature_processing.py - Processes VLM feature data from JSON outputs
- dim_reduction_and_viz.py - Dimensionality reduction and visualization functions
- clustering_analysis.py - Clustering algorithms and quality metrics
- roi_utils.py - ROI mask handling utilities

- main_analysis.py - Extended analysis pipeline
- main_numerical_plots.py - Numerical feature visualization
- rsa_model_analysis.py - RSA correlation analysis and plotting

- run_hpc.sh - HPC job submission script
- analysis_sub.sh - Analysis job submission
- calculate_embeddings.sh - Embedding calculation script

- nsd_access/ - Natural Scenes Dataset access utilities
- utils/ - General utility functions for data processing
# Create the environment
conda env create -f environment.yml
# Activate the environment
conda activate rsa-vlm-env

Edit config.py to set up your data paths:
- NSD_DATA_ROOT: Path to NSD data directory
- VLM_FEATURES_PATH: Path to VLM feature JSON files
- RDM_OUTPUT_PATH: Output directory for analysis results
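The path settings could be laid out in config.py roughly as follows. The three variable names come from this README; the placeholder values and the `missing_paths` helper are hypothetical and only illustrate one way to catch a misconfigured path early.

```python
from pathlib import Path

# Placeholder locations (assumptions): replace with the paths on your system.
NSD_DATA_ROOT = Path("/path/to/nsd")
VLM_FEATURES_PATH = Path("/path/to/vlm_features")
RDM_OUTPUT_PATH = Path("output")


def missing_paths() -> list:
    """Return the names of configured input paths that do not exist on disk."""
    required = {
        "NSD_DATA_ROOT": NSD_DATA_ROOT,
        "VLM_FEATURES_PATH": VLM_FEATURES_PATH,
    }
    return [name for name, path in required.items() if not path.exists()]
```

Calling `missing_paths()` before a long run gives a quick list of inputs that still need to be set; the output directory is left out of the check because it can be created on demand.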
- Natural Scenes Dataset (NSD) - specifically nsddata and nsddata_betas
- VLM feature files in JSON format (e.g., subj01_complete_features.json)
# Run analysis for specific subject and ROI
python main.py <subject_id> <roi_name>
# Example
python main.py 1 "Primary_Visual_Cortex_V1(EarlyVisualCortex)"

# Run comprehensive analysis
python main_analysis.py
# Generate numerical plots
python main_numerical_plots.py
# Perform RSA model analysis
python rsa_model_analysis.py

# Submit job to HPC scheduler
sbatch run_hpc.sh

The pipeline generates organized outputs in the RDM_OUTPUT_PATH directory:
output/
├── subj01/
│   ├── <ROI_NAME>_RDM.npy      # Pre-calculated RDMs
│   ├── embeddings/
│   │   ├── mds/                # MDS embeddings
│   │   └── tsne/               # t-SNE embeddings
│   ├── scene_plots/            # Scene-level feature plots
│   │   ├── <ROI_NAME>/
│   │   │   ├── mds/
│   │   │   └── tsne/
│   └── object_plots/           # Object-level feature plots
│       └── <ROI_NAME>/
│           ├── <criterion>/
│           │   ├── mds/
│           │   └── tsne/
└── logs/                       # Analysis logs
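The layout above maps naturally onto a couple of small path helpers. The sketch below is illustrative only: `rdm_path` and `plot_dir` are hypothetical names, and the two-digit `subjNN` folder naming is inferred from the `subj01` example.

```python
from pathlib import Path
from typing import Optional


def rdm_path(output_root: Path, subject_id: int, roi_name: str) -> Path:
    """Location of the saved RDM for one subject and ROI."""
    return output_root / f"subj{subject_id:02d}" / f"{roi_name}_RDM.npy"


def plot_dir(
    output_root: Path,
    subject_id: int,
    roi_name: str,
    method: str,
    criterion: Optional[str] = None,
) -> Path:
    """Plot folder for a reduction method ('mds' or 'tsne').

    Scene-level plots have no criterion; object-level plots add one more
    directory level for the object selection criterion.
    """
    subject_dir = output_root / f"subj{subject_id:02d}"
    if criterion is None:
        return subject_dir / "scene_plots" / roi_name / method
    return subject_dir / "object_plots" / roi_name / criterion / method


example_rdm = rdm_path(Path("output"), 1, "V1")
example_scene = plot_dir(Path("output"), 1, "V1", "mds")
example_object = plot_dir(Path("output"), 1, "V1", "tsne", criterion="largest_area")
```

The ROI name `V1` and the criterion `largest_area` are placeholder values; the real identifiers are whatever config.py defines.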
The pipeline analyzes multiple brain regions organized into functional groups:
- Early Visual Cortex: V1, V2, V3, V4
- Dorsal Stream: V3A, V3B, V6, V7, V6A, IPS1, FEF, LIPd, AIP
- MT+ Complex: V3CD, MST, MT/V5, V4t, FST
- Ventral Stream: V8, VVC, PIT, FFC, VMV1-3, LOC1-3, PH, RSC, PHA1-3
- Scene-level features: Overall image properties (aesthetic quality, scene type, etc.)
- Object-level features: Individual object properties (material, size, function, etc.)
- Saliency criteria: Different methods for object selection (largest area, highest saliency, etc.)
- Representational Similarity Analysis (RSA): Compares neural and feature similarity structures
- Dimensionality Reduction: MDS and t-SNE for visualization
- Clustering Analysis: Silhouette scores and other quality metrics
- Statistical Validation: Comprehensive statistical testing of results
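For readers unfamiliar with MDS, the following sketch shows the idea of turning an RDM into low-dimensional coordinates. It implements classical (Torgerson) MDS in plain NumPy as a CPU stand-in; the pipeline uses mdscuda and RAPIDS for this step, and those libraries may use a different MDS variant. The function name `classical_mds` and the data are illustrative.

```python
import numpy as np


def classical_mds(rdm: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed items so that Euclidean distances approximate the RDM entries."""
    n = rdm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to obtain a Gram (inner product) matrix.
    gram = -0.5 * centering @ (rdm ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip tiny negatives caused by rounding.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Synthetic check: distances between 2-D points should be recovered exactly.
rng = np.random.default_rng(0)
points = rng.normal(size=(15, 2))
diff = points[:, None, :] - points[None, :, :]
rdm = np.sqrt((diff ** 2).sum(axis=-1))

embedding = classical_mds(rdm, n_components=2)
emb_diff = embedding[:, None, :] - embedding[None, :, :]
recovered = np.sqrt((emb_diff ** 2).sum(axis=-1))
```

The embedding is only defined up to rotation and reflection, so the check compares pairwise distances rather than raw coordinates.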
- Python 3.10
- CUDA Toolkit 11.8
- RAPIDS cuML and cuPy for GPU acceleration
- NumPy, SciPy, Pandas
- Scikit-learn
- Matplotlib for visualization
- Nibabel for neuroimaging data
- nsd-access for Natural Scenes Dataset
- mdscuda for GPU-accelerated MDS
Key configuration options in config.py:
- N_SUBJECTS: Number of subjects to process (default: 8)
- OBJECT_SELECTION_CRITERION: Method for object selection
- ROI_GROUPS and ROI_GROUP_NAMES: Brain region definitions
- LOG_LEVEL: Logging verbosity
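A hedged sketch of how these options might look in config.py is shown below. Only the option names and the default of 8 subjects come from this README; the criterion value, the group keys, the abbreviated ROI lists, and the `group_of` helper are assumptions made for illustration.

```python
import logging
from typing import Optional

N_SUBJECTS = 8                                # default stated in this README
OBJECT_SELECTION_CRITERION = "largest_area"   # hypothetical value
LOG_LEVEL = logging.INFO                      # hypothetical default

# Abbreviated example with two of the functional groups; keys are made up.
ROI_GROUPS = {
    "early_visual": ["V1", "V2", "V3", "V4"],
    "mt_complex": ["V3CD", "MST", "MT/V5", "V4t", "FST"],
}
ROI_GROUP_NAMES = {
    "early_visual": "Early Visual Cortex",
    "mt_complex": "MT+ Complex",
}


def group_of(roi: str) -> Optional[str]:
    """Return the key of the group containing the ROI, or None if unknown."""
    for group, rois in ROI_GROUPS.items():
        if roi in rois:
            return group
    return None
```

Keeping display names in a separate mapping lets plotting code label figures without changing the keys used for lookups.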
If you use this pipeline in your research, please cite the relevant papers for:
- Natural Scenes Dataset (NSD)
- Vision-Language Model features
- Representational Similarity Analysis methods
This project is released under standard academic use terms. Please see individual component licenses for specific restrictions.