Wrapper repo for running nf-core/rnaseq (3.23.0) on the V-EEEV natural history datasets on HPC.
RNA-seq workflow used for analysis of alphavirus infection studies.
Supports mixed host/viral transcriptome analysis across multiple species (mouse and rat) with automated dataset staging and reference preparation.
It does four things:
- stages a mixed FASTQ delivery into dataset-specific folders
- builds combined host + virus references
- generates nf-core samplesheets
- launches nf-core/rnaseq
FASTQs:
- Input FASTQs are paired-end and named like SAMPLE_R1_001.fastq.gz and SAMPLE_R2_001.fastq.gz
- If starting from the mixed V-EEEV Nat Hist delivery, use bin/stage_nat_hist_inputs.py
- That script classifies samples into:
  - mouse_veev
  - mouse_eeev
  - rat_veev
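As an illustration of the naming convention above, here is a minimal Python sketch that pairs R1 and R2 files by sample name. It is not the logic of bin/stage_nat_hist_inputs.py, whose classification rules are not described here; the function name `pair_fastqs` and the regular expression are assumptions based only on the example file names.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the example names in this README:
# SAMPLE_R1_001.fastq.gz and SAMPLE_R2_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group FASTQ file names into {sample: {"R1": name, "R2": name}}.

    Files that do not match the pattern are ignored.
    Raises ValueError if any sample is missing one mate.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            continue  # not a FASTQ in the expected naming scheme
        pairs[match["sample"]]["R" + match["read"]] = name

    incomplete = sorted(
        sample for sample, reads in pairs.items() if set(reads) != {"R1", "R2"}
    )
    if incomplete:
        raise ValueError(f"samples missing a mate: {incomplete}")
    return dict(pairs)
```

Failing loudly on an unpaired sample is a design choice for this sketch: a missing mate is easier to fix before staging than after a run has started.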
References:
- references/mouse/: one mouse FASTA and one mouse GTF
- references/rat/: one rat FASTA and one rat GTF
- references/VEEV/: virus.fa and virus.gtf
- references/EEEV/: virus.fa and virus.gtf
Reference folders are split by role:
- references/ holds the active pipeline-facing references used by runs
- viral_reference_work/ holds viral source files and derived work products such as raw source annotations, curation tables, and polish outputs
The repo already includes curated viral references in references/VEEV/virus.fa, references/VEEV/virus.gtf, references/EEEV/virus.fa, and references/EEEV/virus.gtf.
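The combined host + virus references are built by bin/build_combined_reference.sh, whose internals are not shown here. As a rough idea of what such a step involves, the sketch below concatenates a host file and a virus file in order. The helper name `concat_files` is hypothetical, and the sketch assumes uncompressed inputs and that virus sequence names do not collide with host sequence names.

```python
from pathlib import Path


def concat_files(parts, dest):
    """Concatenate files in order into dest, streaming in 1 MiB chunks.

    A newline is added after any part that does not end with one, so the
    first record of the next part never gets glued to the previous line.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with dest.open("wb") as out:
        for part in parts:
            last = b"\n"  # an empty part needs no extra newline
            with open(part, "rb") as src:
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")
    return dest
```

The same helper would apply to the FASTA pair and to the GTF pair, for example a host FASTA plus references/VEEV/virus.fa. Host file names depend on what was downloaded, so none are assumed here.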
Stage the mixed delivery:

    python3 bin/stage_nat_hist_inputs.py "/path/to/V-EEEV Nat Hist"

Download host references if needed:

    bash bin/download_host_references.sh all

Edit cluster/runtime settings:

    nano settings.env

Important setting:
- set ISAAC_ACCOUNT to the real account instead of ACF-UTKXXXX
Do not rely on the cluster default nextflow module alone. On ISAAC it can
resolve to an old 20.04.1 launcher, which is too old for nf-core/rnaseq 3.23.0.
Known-good bootstrap:

    mkdir -p "$HOME/bin"
    cd "$HOME/bin"
    curl -s https://get.nextflow.io | bash
    chmod +x nextflow
    module purge
    module load openjdk/17.0.0_35
    export PATH="$HOME/bin:$PATH"
    export NXF_VER=25.04.3
    export SKIP_MODULE_LOAD=1
    which nextflow
    java -version
    nextflow -version

The wrapper now checks both runtimes before launch and prints the detected java and nextflow paths and versions. The corresponding defaults in settings.env are:
- JAVA_MODULE=openjdk/17.0.0_35
- NEXTFLOW_BIN_DIR=$HOME/bin
- NXF_VER=25.04.3
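To make the version requirement concrete, here is a small Python sketch of a numeric version comparison. It is not the wrapper's actual check; the function names are hypothetical, and the sketch simply takes the first dotted number it finds in a string.

```python
import re


def parse_version(text):
    """Return the first dotted number in text as a tuple of ints.

    For example, "25.04.3" becomes (25, 4, 3).
    """
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _pad(version, width):
    """Right-pad a version tuple with zeros so tuples compare fairly."""
    return version + (0,) * (width - len(version))


def meets_minimum(found, minimum):
    """True if the version in `found` is at least the version in `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)
```

Comparing integer tuples rather than strings matters here: as text, "9.0" sorts after "25.04.3", while as numbers it does not. With this sketch, `meets_minimum("20.04.1", "25.04.3")` is false, which matches the warning above about the old launcher.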
Preflight check a dataset:

    PREFLIGHT_ONLY=1 bash submit_rnaseq.sh mouse_veev

Run a dataset:

    sbatch submit_rnaseq.sh mouse_veev
    sbatch submit_rnaseq.sh mouse_eeev
    sbatch submit_rnaseq.sh rat_veev

For the selected dataset, the launcher:
- checks inputs and references
- builds references/build/<dataset>/combined.fa
- builds references/build/<dataset>/combined.gtf
- writes metadata/<dataset>_samplesheet.csv
- runs nextflow run nf-core/rnaseq -r 3.23.0
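For orientation, the samplesheet that nf-core/rnaseq reads is a CSV with the columns sample, fastq_1, fastq_2, and strandedness. The Python sketch below writes one; it is not the code in bin/make_samplesheet.sh, the function name is hypothetical, and using "auto" for strandedness is an assumption about how this repo fills that column.

```python
import csv
from pathlib import Path


def write_samplesheet(pairs, dest, strandedness="auto"):
    """Write an nf-core/rnaseq style samplesheet.

    `pairs` maps a sample name to a (R1 path, R2 path) tuple.
    Rows are written in sorted sample order so the output is reproducible.
    """
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with dest.open("w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
        for sample in sorted(pairs):
            r1, r2 = pairs[sample]
            writer.writerow([sample, r1, r2, strandedness])
    return dest
```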
The nf-core run uses:
- --aligner star_salmon
- -profile "$NFCORE_PROFILE"
- -c nextflow.config
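Putting the pieces together, the command line could look roughly like the one assembled below. Only -r 3.23.0, --aligner star_salmon, -profile, and -c nextflow.config come from this README; passing --input, --outdir, --fasta, and --gtf is an assumption about how the wrapper hands the samplesheet and combined references to nf-core/rnaseq, and the function names are hypothetical. Use DRY_RUN=1 to see the real command.

```python
import shlex


def build_command(samplesheet, outdir, fasta, gtf, profile, revision="3.23.0"):
    """Return the Nextflow invocation as an argument list (nothing is run)."""
    return [
        "nextflow", "run", "nf-core/rnaseq",
        "-r", revision,
        "-profile", profile,
        "-c", "nextflow.config",
        # Assumed parameters: standard nf-core/rnaseq options, but whether
        # submit_rnaseq.sh passes exactly these is not stated in this README.
        "--input", samplesheet,
        "--outdir", outdir,
        "--fasta", fasta,
        "--gtf", gtf,
        "--aligner", "star_salmon",
    ]


def render_command(args):
    """Quote an argument list so it can be pasted into a shell."""
    return shlex.join(args)
```

Keeping the command as a list until the last moment avoids quoting problems with paths that contain spaces, such as the "V-EEEV Nat Hist" delivery folder.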
Make a samplesheet manually:

    bash bin/make_samplesheet.sh inputs/mouse_veev metadata/mouse_veev_samplesheet.csv

Build a combined reference manually:

    bash bin/build_combined_reference.sh mouse_veev

Print the exact Nextflow command without running:

    DRY_RUN=1 SKIP_MODULE_LOAD=1 bash submit_rnaseq.sh mouse_veev

Run a smoke test:

    SMOKE_TEST=1 PREFLIGHT_ONLY=1 bash submit_rnaseq.sh mouse_veev
    SMOKE_TEST=1 sbatch submit_rnaseq.sh mouse_veev

Pre-cache nf-core containers:

    bash bin/precache_nfcore_containers.sh

Main outputs go to scratch-backed paths from settings.env:
- RESULTS_BASE/<dataset>
- WORK_ROOT/<dataset>
- CONTAINER_CACHE

Smoke test outputs go to:
- RESULTS_BASE_SMOKE/<dataset>
- WORK_ROOT_SMOKE/<dataset>
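The split between main and smoke test outputs can be pictured with the short Python sketch below. The setting names come from the lists above; the function name and the example paths are hypothetical, and the real values live in settings.env.

```python
def output_dirs(settings, dataset, smoke_test=False):
    """Pick results and work directories for a dataset.

    `settings` is a dict holding the values from settings.env. Smoke tests
    read the *_SMOKE variants so they never write into real results.
    """
    suffix = "_SMOKE" if smoke_test else ""
    return {
        "results": f"{settings['RESULTS_BASE' + suffix]}/{dataset}",
        "work": f"{settings['WORK_ROOT' + suffix]}/{dataset}",
    }


# Hypothetical values, for illustration only.
example_settings = {
    "RESULTS_BASE": "/scratch/results",
    "WORK_ROOT": "/scratch/work",
    "RESULTS_BASE_SMOKE": "/scratch/results_smoke",
    "WORK_ROOT_SMOKE": "/scratch/work_smoke",
}
```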
Requirements:
- Slurm
- Java >= 17
- Nextflow >= 25.04.3
- Singularity or Apptainer
- Python 3
This repo is meant as a practical run wrapper, not a general-purpose distributed pipeline package.