saleeta/VIsual-stream-interpretation

Brain Representational Analysis Pipeline

A computational neuroscience pipeline for investigating the relationship between brain activity and semantic features of visual stimuli using Representational Similarity Analysis (RSA) on the Natural Scenes Dataset (NSD).

Overview

This project analyzes how the brain represents visual information by comparing neural responses (fMRI data) with high-level semantic features extracted by Vision-Language Models (VLMs). The pipeline uses RSA to quantify similarities between brain representations and semantic feature spaces across different Regions of Interest (ROIs).
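As a rough illustration of the RSA comparison described above, the sketch below builds two RDMs and correlates their upper triangles. It is not the code in rsa_analysis.py: the correlation-distance metric, the Spearman comparison, the function names `compute_rdm` and `compare_rdms`, and the random input arrays are all assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr


def compute_rdm(patterns: np.ndarray) -> np.ndarray:
    """Return an (n_stimuli, n_stimuli) RDM using correlation distance (1 - Pearson r).

    The distance metric is an assumption; the pipeline may use a different one.
    """
    return squareform(pdist(patterns, metric="correlation"))


def compare_rdms(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)  # skip the diagonal and mirrored entries
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return float(rho)


# Hypothetical stand-ins for real data: 20 stimuli, 100 voxels, 16 feature dimensions.
rng = np.random.default_rng(0)
brain_patterns = rng.standard_normal((20, 100))
model_features = rng.standard_normal((20, 16))

brain_rdm = compute_rdm(brain_patterns)
model_rdm = compute_rdm(model_features)
rsa_score = compare_rdms(brain_rdm, model_rdm)
```

Only the upper triangle is compared because an RDM is symmetric with a zero diagonal, so the remaining entries carry no extra information.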

Key Features

  • Modular Design: Each analysis step is handled by a dedicated Python module
  • GPU Acceleration: Leverages NVIDIA RAPIDS cuML, CuPy, and mdscuda for high-performance computing
  • Comprehensive Analysis: Covers both object-level and scene-level feature processing
  • Automated Visualization: Generates structured plots and quantitative metrics
  • Reproducible Environment: Includes Conda environment specification

Pipeline Workflow

  1. RDM Calculation: Processes fMRI beta-series data to compute Representational Dissimilarity Matrices (RDMs) for various ROIs
  2. VLM Feature Processing: Loads and structures high-level semantic features extracted from stimulus images
  3. Alignment & Visualization: Aligns brain data with VLM features using dimensionality reduction (MDS, t-SNE)
  4. Cluster Quality Analysis: Quantifies feature clustering quality using silhouette scores and other metrics
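To make step 3 more concrete, here is a minimal NumPy sketch of classical (Torgerson) MDS applied to a precomputed dissimilarity matrix. This is a swapped-in technique for illustration only: the pipeline itself uses mdscuda and t-SNE, and the function name `classical_mds` and the toy input are assumptions.

```python
import numpy as np


def classical_mds(rdm: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed a dissimilarity matrix in n_components dimensions (classical MDS)."""
    n = rdm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n        # double-centering matrix
    gram = -0.5 * centering @ (rdm ** 2) @ centering   # implied inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest ones
    top_vals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


# Toy check: Euclidean distances between random 2D points (hypothetical data).
rng = np.random.default_rng(0)
points = rng.standard_normal((10, 2))
diff = points[:, None, :] - points[None, :, :]
toy_rdm = np.sqrt((diff ** 2).sum(axis=-1))

embedding = classical_mds(toy_rdm, n_components=2)
```

For Euclidean input like this toy matrix, the embedding reproduces the original pairwise distances up to rotation and reflection. Neural RDMs are generally not Euclidean, so a low-dimensional embedding of real data is only an approximation.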

File Structure

Core Pipeline Files

  • main.py - Main pipeline orchestrator
  • config.py - Central configuration for paths, ROI definitions, and parameters
  • data_loader.py - Handles fMRI beta data and stimulus information loading
  • rsa_analysis.py - Computes Representational Dissimilarity Matrices
  • feature_processing.py - Processes VLM feature data from JSON outputs
  • dim_reduction_and_viz.py - Dimensionality reduction and visualization functions
  • clustering_analysis.py - Clustering algorithms and quality metrics
  • roi_utils.py - ROI mask handling utilities

Analysis Scripts

  • main_analysis.py - Extended analysis pipeline
  • main_numerical_plots.py - Numerical feature visualization
  • rsa_model_analysis.py - RSA correlation analysis and plotting

Utility Scripts

  • run_hpc.sh - HPC job submission script
  • analysis_sub.sh - Analysis job submission
  • calculate_embeddings.sh - Embedding calculation script

Modules

  • nsd_access/ - Natural Scenes Dataset access utilities
  • utils/ - General utility functions for data processing

Setup and Installation

1. Create Conda Environment

# Create the environment
conda env create -f environment.yml

# Activate the environment
conda activate rsa-vlm-env

2. Configure Paths

Edit config.py to set up your data paths:

  • NSD_DATA_ROOT: Path to NSD data directory
  • VLM_FEATURES_PATH: Path to VLM feature JSON files
  • RDM_OUTPUT_PATH: Output directory for analysis results
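A sketch of what these settings might look like, assuming config.py stores them as `pathlib.Path` objects (the actual file may use plain strings). The three path values are placeholders, and `missing_paths` is a hypothetical helper for checking the configuration before a long run, not part of the repository.

```python
from pathlib import Path

# Placeholder values; replace them with the locations on your system.
NSD_DATA_ROOT = Path("/path/to/nsd")
VLM_FEATURES_PATH = Path("/path/to/vlm_features")
RDM_OUTPUT_PATH = Path("/path/to/output")


def missing_paths(paths: dict[str, Path]) -> list[str]:
    """Return the names of configured input paths that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]


# Only the inputs are checked; the output directory can be created on demand.
missing = missing_paths({
    "NSD_DATA_ROOT": NSD_DATA_ROOT,
    "VLM_FEATURES_PATH": VLM_FEATURES_PATH,
})
if missing:
    print("Missing input paths:", ", ".join(missing))
```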

3. Data Requirements

  • Natural Scenes Dataset (NSD) - specifically nsddata and nsddata_betas
  • VLM feature files in JSON format (e.g., subj01_complete_features.json)

Usage

Basic Pipeline Execution

# Run analysis for specific subject and ROI
python main.py <subject_id> <roi_name>

# Example
python main.py 1 "Primary_Visual_Cortex_V1(EarlyVisualCortex)"

Extended Analysis

# Run comprehensive analysis
python main_analysis.py

# Generate numerical plots
python main_numerical_plots.py

# Perform RSA model analysis
python rsa_model_analysis.py

HPC Execution

# Submit job to HPC scheduler
sbatch run_hpc.sh

Output Structure

The pipeline generates organized outputs in the RDM_OUTPUT_PATH directory:

output/
├── subj01/
│   ├── <ROI_NAME>_RDM.npy          # Pre-calculated RDMs
│   ├── embeddings/
│   │   ├── mds/                    # MDS embeddings
│   │   └── tsne/                   # t-SNE embeddings
│   ├── scene_plots/                # Scene-level feature plots
│   │   ├── <ROI_NAME>/
│   │   │   ├── mds/
│   │   │   └── tsne/
│   └── object_plots/               # Object-level feature plots
│       └── <ROI_NAME>/
│           ├── <criterion>/
│           │   ├── mds/
│           │   └── tsne/
└── logs/                           # Analysis logs
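The layout above can be expressed as a few path helpers. This is a sketch under the assumption that subject directories are zero-padded to two digits, as `subj01` suggests; the function names are hypothetical and do not come from the repository.

```python
from pathlib import Path


def subject_dir_name(subject_id: int) -> str:
    """Map a numeric subject ID to its directory name, e.g. 1 -> 'subj01'."""
    return f"subj{subject_id:02d}"


def rdm_path(output_root: Path, subject_id: int, roi_name: str) -> Path:
    """Location of the saved RDM for one subject and ROI."""
    return output_root / subject_dir_name(subject_id) / f"{roi_name}_RDM.npy"


def object_plot_dir(output_root: Path, subject_id: int, roi_name: str,
                    criterion: str, method: str) -> Path:
    """Directory for object-level plots; method is 'mds' or 'tsne'."""
    return (output_root / subject_dir_name(subject_id) / "object_plots"
            / roi_name / criterion / method)


root = Path("output")
example_rdm = rdm_path(root, 1, "<ROI_NAME>")
example_plots = object_plot_dir(root, 1, "<ROI_NAME>", "<criterion>", "mds")
```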

Key Components

ROI Definitions

The pipeline analyzes multiple brain regions organized into functional groups:

  • Early Visual Cortex: V1, V2, V3, V4
  • Dorsal Stream: V3A, V3B, V6, V7, V6A, IPS1, FEF, LIPd, AIP
  • MT+ Complex: V3CD, MST, MT/V5, V4t, FST
  • Ventral Stream: V8, VVC, PIT, FFC, VMV1-3, LOC1-3, PH, RSC, PHA1-3

Feature Types

  • Scene-level features: Overall image properties (aesthetic quality, scene type, etc.)
  • Object-level features: Individual object properties (material, size, function, etc.)
  • Saliency criteria: Different methods for object selection (largest area, highest saliency, etc.)

Analysis Methods

  • Representational Similarity Analysis (RSA): Compares neural and feature similarity structures
  • Dimensionality Reduction: MDS and t-SNE for visualization
  • Clustering Analysis: Silhouette scores and other quality metrics
  • Statistical Validation: Comprehensive statistical testing of results
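As an illustration of the clustering metric, a silhouette score can be computed directly on a precomputed dissimilarity matrix with scikit-learn. The data and labels below are synthetic, and scoring the RDM directly (rather than a low-dimensional embedding) is an assumption; clustering_analysis.py may do this differently.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Synthetic example: two well-separated groups of 15 "stimuli" each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(15, 5))
group_b = rng.normal(loc=3.0, scale=0.1, size=(15, 5))
patterns = np.vstack([group_a, group_b])
labels = np.array([0] * 15 + [1] * 15)   # hypothetical feature labels, e.g. scene type

# Euclidean dissimilarity matrix standing in for an RDM.
diff = patterns[:, None, :] - patterns[None, :, :]
rdm = np.sqrt((diff ** 2).sum(axis=-1))

# metric="precomputed" tells scikit-learn the input is already a distance matrix.
score = silhouette_score(rdm, labels, metric="precomputed")
```

Scores near 1 mean stimuli sharing a label sit close together and far from other labels. Scores near 0 or below mean the labels do not correspond to separable clusters in that representational space.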

Dependencies

Core Requirements

  • Python 3.10
  • CUDA Toolkit 11.8
  • RAPIDS cuML and CuPy for GPU acceleration

Scientific Computing

  • NumPy, SciPy, Pandas
  • Scikit-learn
  • Matplotlib for visualization
  • Nibabel for neuroimaging data

Specialized

  • nsd-access for Natural Scenes Dataset
  • mdscuda for GPU-accelerated MDS

Configuration

Key configuration options in config.py:

  • N_SUBJECTS: Number of subjects to process (default: 8)
  • OBJECT_SELECTION_CRITERION: Method for object selection
  • ROI_GROUPS and ROI_GROUP_NAMES: Brain region definitions
  • LOG_LEVEL: Logging verbosity

Citation

If you use this pipeline in your research, please cite the relevant papers for:

  • Natural Scenes Dataset (NSD)
  • Vision-Language Model features
  • Representational Similarity Analysis methods

License

This project is released under standard academic use terms. Please see individual component licenses for specific restrictions.

About

This computational neuroscience pipeline leverages GPU acceleration (via RAPIDS) to perform RSA on NSD fMRI beta-series data across various ROIs. It seeks to quantify the alignment between neural RDMs and semantic feature RDMs derived from VLMs, offering reproducible insights into the neural coding of scene-level and object-level semantics.
