V-EEEV Nat Hist RNA-seq Pipeline

Wrapper repo for running nf-core/rnaseq (3.23.0) on the V-EEEV natural history datasets on HPC. The workflow is used to analyze RNA-seq data from alphavirus infection studies.
It supports mixed host/viral transcriptome analysis across two host species (mouse and rat), with automated dataset staging and reference preparation.

It does four things:

- stages a mixed FASTQ delivery into dataset-specific folders
- builds combined host + virus references
- generates nf-core samplesheets
- launches nf-core/rnaseq

Expected Input

FASTQs:

Input FASTQs are paired-end and named like SAMPLE_R1_001.fastq.gz and SAMPLE_R2_001.fastq.gz.
If starting from the mixed V-EEEV Nat Hist delivery, stage it with bin/stage_nat_hist_inputs.py.
That script classifies samples into:
- mouse_veev
- mouse_eeev
- rat_veev
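The naming convention above is enough to check that every sample has both mates before staging. The following is a minimal sketch of that check, not the logic of bin/stage_nat_hist_inputs.py; the function name `pair_fastqs` is hypothetical, and it does not attempt the dataset classification, whose rules are not described here.

```python
import re
from collections import defaultdict

# Naming convention from this README:
# SAMPLE_R1_001.fastq.gz and SAMPLE_R2_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group FASTQ file names by sample.

    Returns (pairs, problems): pairs maps sample -> (R1, R2), and problems
    lists names that do not match the convention or samples missing a mate.
    """
    by_sample = defaultdict(dict)
    problems = []
    for name in sorted(filenames):
        match = FASTQ_RE.match(name)
        if match is None:
            problems.append(f"unrecognized name: {name}")
            continue
        by_sample[match["sample"]]["R" + match["read"]] = name

    pairs = {}
    for sample, reads in by_sample.items():
        if set(reads) == {"R1", "R2"}:
            pairs[sample] = (reads["R1"], reads["R2"])
        else:
            problems.append(f"missing mate for sample: {sample}")
    return pairs, problems
```

Passing the names from a delivery folder (for example via `os.listdir`) gives a quick list of anything that would not form a complete pair.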

References:

references/mouse/: one mouse FASTA and one mouse GTF
references/rat/: one rat FASTA and one rat GTF
references/VEEV/: virus.fa and virus.gtf
references/EEEV/: virus.fa and virus.gtf
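A layout like this can be sanity-checked before a run. The sketch below is illustrative only and is not part of the repo: the function names are hypothetical, and the accepted host file suffixes are an assumption, since the README only requires "one FASTA and one GTF" per host folder.

```python
from pathlib import Path

# Assumed suffixes for host files; adjust to match the downloaded references.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna", ".fa.gz", ".fasta.gz", ".fna.gz")
GTF_SUFFIXES = (".gtf", ".gtf.gz")


def check_host_dir(ref_dir):
    """Return a list of problems for a host reference folder (empty if OK)."""
    ref_dir = Path(ref_dir)
    if not ref_dir.is_dir():
        return [f"{ref_dir}: not a directory"]
    names = [p.name for p in ref_dir.iterdir() if p.is_file()]
    fastas = [n for n in names if n.endswith(FASTA_SUFFIXES)]
    gtfs = [n for n in names if n.endswith(GTF_SUFFIXES)]
    problems = []
    if len(fastas) != 1:
        problems.append(f"{ref_dir}: expected 1 FASTA, found {len(fastas)}")
    if len(gtfs) != 1:
        problems.append(f"{ref_dir}: expected 1 GTF, found {len(gtfs)}")
    return problems


def check_virus_dir(ref_dir):
    """Virus folders use the fixed names virus.fa and virus.gtf."""
    ref_dir = Path(ref_dir)
    return [
        f"{ref_dir}: missing {name}"
        for name in ("virus.fa", "virus.gtf")
        if not (ref_dir / name).is_file()
    ]
```

Calling `check_host_dir("references/mouse")` and `check_virus_dir("references/VEEV")` from the repo root returns an empty list when the folder matches the layout above.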

Reference folders are split by role:

- references/ holds the active pipeline-facing references used by runs
- viral_reference_work/ holds viral source files and derived work products such as raw source annotations, curation tables, and polish outputs

The repo already includes curated viral references in references/VEEV/virus.fa, references/VEEV/virus.gtf, references/EEEV/virus.fa, and references/EEEV/virus.gtf.

Basic Workflow

Stage the mixed delivery:

python3 bin/stage_nat_hist_inputs.py "/path/to/V-EEEV Nat Hist"

Download host references if needed:

bash bin/download_host_references.sh all

Edit cluster/runtime settings:

nano settings.env

Important setting:

Set ISAAC_ACCOUNT to your real account in place of the placeholder ACF-UTKXXXX.
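A forgotten placeholder is easy to catch before submitting. This is a small sketch that assumes settings.env consists of plain KEY=VALUE lines (optionally prefixed with `export`); the helper names are hypothetical and the real file may use shell features this parser ignores.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding quotes; no variable expansion is attempted.
        settings[key] = value.strip().strip("\"'")
    return settings


def account_is_set(settings, placeholder="ACF-UTKXXXX"):
    """True if ISAAC_ACCOUNT is present and is not the README placeholder."""
    account = settings.get("ISAAC_ACCOUNT", "")
    return bool(account) and account != placeholder
```

For example, `account_is_set(parse_env(open("settings.env").read()))` returns False while the placeholder is still in the file.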

Known Working ISAAC Runtime

Do not rely on the cluster default nextflow module alone. On ISAAC it can resolve to an old 20.04.1 launcher, which is too old for nf-core/rnaseq 3.23.0.

Known-good bootstrap:

mkdir -p "$HOME/bin"
cd "$HOME/bin"
curl -s https://get.nextflow.io | bash
chmod +x nextflow

module purge
module load openjdk/17.0.0_35
export PATH="$HOME/bin:$PATH"
export NXF_VER=25.04.3
export SKIP_MODULE_LOAD=1

which nextflow
java -version
nextflow -version

The wrapper checks both runtimes (Java and Nextflow) before launch and prints the detected java and nextflow paths and versions. The corresponding defaults in settings.env are:

JAVA_MODULE=openjdk/17.0.0_35
NEXTFLOW_BIN_DIR=$HOME/bin
NXF_VER=25.04.3
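To illustrate why the 20.04.1 launcher fails the check while 25.04.3 passes, here is a hedged sketch of a numeric version comparison. It is not the wrapper's own check; `parse_version` and `meets_minimum` are hypothetical names, and the parser simply takes the first dotted number it finds in the text.

```python
import re


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare versions numerically, component by component."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    # Minimums taken from the Requirements section of this README.
    print(meets_minimum("20.04.1", "25.04.3"))
    print(meets_minimum("openjdk/17.0.0_35", "17"))
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would rank "9.0" above "25.04.3".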

Preflight check a dataset:

PREFLIGHT_ONLY=1 bash submit_rnaseq.sh mouse_veev

Run a dataset:

sbatch submit_rnaseq.sh mouse_veev
sbatch submit_rnaseq.sh mouse_eeev
sbatch submit_rnaseq.sh rat_veev

What `submit_rnaseq.sh` Does

For the selected dataset, the launcher:

- checks inputs and references
- builds references/build/<dataset>/combined.fa
- builds references/build/<dataset>/combined.gtf
- writes metadata/<dataset>_samplesheet.csv
- runs nextflow run nf-core/rnaseq -r 3.23.0
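As a rough illustration of the combined.fa and combined.gtf steps in the list above, a combined reference can be produced by concatenating the host file and the virus file. This sketch assumes plain concatenation of uncompressed text files and that host and virus sequence names do not collide; bin/build_combined_reference.sh may do more than this, and the function names here are hypothetical.

```python
from pathlib import Path


def concat_files(sources, dest):
    """Stream each source into dest, making sure every file ends in a newline."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with dest.open("w") as out:
        for src in sources:
            last = "\n"
            with open(src) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Without this, a header could be glued onto the previous record.
            if not last.endswith("\n"):
                out.write("\n")


def fasta_names(path):
    """Return the sequence names (first word of each '>' header line)."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names
```

After building, comparing `len(fasta_names(path))` with `len(set(fasta_names(path)))` is a cheap way to confirm that no sequence name appears twice in the combined FASTA.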

The nf-core run uses:

--aligner star_salmon
-profile "$NFCORE_PROFILE"
-c nextflow.config
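Putting the pieces together, the launch command can be thought of as an argument list assembled from the dataset name and a few settings. The sketch below is an assumption about how such a command might look, not a copy of submit_rnaseq.sh: `--input`, `--outdir`, `--fasta`, `--gtf`, and `-work-dir` are standard nf-core/rnaseq and Nextflow options, but whether the wrapper passes exactly these is not stated here, and `build_command` is a hypothetical name.

```python
import shlex


def build_command(dataset, profile, results_base, work_root):
    """Assemble an nf-core/rnaseq command line as a list of arguments."""
    build_dir = f"references/build/{dataset}"
    return [
        "nextflow", "run", "nf-core/rnaseq", "-r", "3.23.0",
        "-profile", profile,
        "-c", "nextflow.config",
        "-work-dir", f"{work_root}/{dataset}",
        "--input", f"metadata/{dataset}_samplesheet.csv",
        "--outdir", f"{results_base}/{dataset}",
        "--fasta", f"{build_dir}/combined.fa",
        "--gtf", f"{build_dir}/combined.gtf",
        "--aligner", "star_salmon",
    ]


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    cmd = build_command("mouse_veev", "singularity", "/scratch/results", "/scratch/work")
    print(shlex.join(cmd))
```

Building the command as a list and quoting it only for display avoids word-splitting problems when a path contains spaces, such as the "V-EEEV Nat Hist" delivery folder.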

Useful Commands

Make a samplesheet manually:

bash bin/make_samplesheet.sh inputs/mouse_veev metadata/mouse_veev_samplesheet.csv
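For reference, an nf-core/rnaseq samplesheet is a CSV with the columns sample, fastq_1, fastq_2, and strandedness. The following is a minimal sketch of generating one from a staged folder, not the contents of bin/make_samplesheet.sh; the function name is hypothetical, and using "auto" for strandedness is an assumption about what this project wants.

```python
import csv
import io
from pathlib import Path

R1_SUFFIX = "_R1_001.fastq.gz"
R2_SUFFIX = "_R2_001.fastq.gz"


def samplesheet_csv(fastq_dir, strandedness="auto"):
    """Return samplesheet text with one row per R1/R2 pair in fastq_dir."""
    fastq_dir = Path(fastq_dir)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for r1 in sorted(fastq_dir.glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if not r2.is_file():
            raise FileNotFoundError(f"missing R2 mate for {r1.name}")
        # Absolute paths keep the sheet valid regardless of the launch directory.
        writer.writerow([sample, str(r1.resolve()), str(r2.resolve()), strandedness])
    return buf.getvalue()
```

Writing the result of `samplesheet_csv("inputs/mouse_veev")` to metadata/mouse_veev_samplesheet.csv would mirror the manual command above.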

Build a combined reference manually:

bash bin/build_combined_reference.sh mouse_veev

Print the exact Nextflow command without running:

DRY_RUN=1 SKIP_MODULE_LOAD=1 bash submit_rnaseq.sh mouse_veev

Run a smoke test:

SMOKE_TEST=1 PREFLIGHT_ONLY=1 bash submit_rnaseq.sh mouse_veev
SMOKE_TEST=1 sbatch submit_rnaseq.sh mouse_veev

Pre-cache nf-core containers:

bash bin/precache_nfcore_containers.sh

Output Locations

Main outputs go to scratch-backed paths from settings.env:

RESULTS_BASE/<dataset>
WORK_ROOT/<dataset>
CONTAINER_CACHE

Smoke test outputs go to:

RESULTS_BASE_SMOKE/<dataset>
WORK_ROOT_SMOKE/<dataset>
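The mapping from settings to per-dataset directories can be summarized in a few lines. This is only a sketch of the naming scheme described above, with a hypothetical function name and a plain dict standing in for the values read from settings.env.

```python
def output_paths(settings, dataset, smoke_test=False):
    """Resolve results and work directories for a dataset.

    Smoke tests use the *_SMOKE variants of the same settings keys.
    """
    suffix = "_SMOKE" if smoke_test else ""
    return {
        "results": f"{settings['RESULTS_BASE' + suffix]}/{dataset}",
        "work": f"{settings['WORK_ROOT' + suffix]}/{dataset}",
    }
```

Keeping smoke-test outputs under separate base paths means a test run never overwrites or resumes from the work directory of a full run.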

Requirements

Slurm
Java >= 17
Nextflow >= 25.04.3
Singularity or Apptainer
Python 3

This repo is meant as a practical run wrapper, not a general-purpose distributed pipeline package.
