
What is in a pipeline

zburkett edited this page Jun 2, 2022 · 3 revisions

What is Nextflow

Nextflow is the software that coordinates the pipeline and all calls to the Docker container, allowing tasks to be scheduled and executed asynchronously. See the Nextflow docs for more.

How to test if the pipeline is working

The README provides some basic run commands. For a basic test using our files, you will need to:

  1. Get the downsampled FASTQ files for hg38 from Dropbox. It's best to put these in a folder / directory where you can find them.
  2. Download the hg38 genome from Dropbox, following the instructions in the README:
       mkdir -p ./ref_data/genome-annotations
       cd ./ref_data/genome-annotations
       wget -O hg38.tar.gz "https://www.dropbox.com/s/hm6kyp70dtbqovr/hg38.tar.gz?dl=0"
       tar xvzf hg38.tar.gz
  3. Return to the directory you started from (cd ../..) and run the basic command with the data you have:
       nextflow run Sequoia_express_toolkit/main.nf --outDir ./output/ --reads '~/data/' --genome hg38 --genomes_base ./ref_data/genome-annotations -profile docker
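
Before launching the test run, it can help to confirm that the reads folder and the genome folder actually contain something. The following is a minimal sketch of such a check, not part of the toolkit: the function name check_inputs is hypothetical, and it assumes FASTQ files end in .fastq, .fq, .fastq.gz, or .fq.gz.

```shell
# Pre-flight check (illustrative sketch, not part of the toolkit).
# Usage: check_inputs READS_DIR GENOMES_BASE
# Returns 0 if both inputs look usable, 1 otherwise.
check_inputs() {
    local reads_dir="$1" genomes_base="$2" status=0

    # The reads directory should contain at least one FASTQ file.
    # Assumption: files end in .fastq, .fq, .fastq.gz, or .fq.gz.
    if ! find "$reads_dir" -maxdepth 1 -type f \
            \( -name '*.fastq' -o -name '*.fq' \
               -o -name '*.fastq.gz' -o -name '*.fq.gz' \) \
            2>/dev/null | grep -q .; then
        echo "No FASTQ files found in: $reads_dir" >&2
        status=1
    fi

    # The genome directory should exist and be non-empty after extraction.
    if [ ! -d "$genomes_base" ] || [ -z "$(ls -A "$genomes_base" 2>/dev/null)" ]; then
        echo "Genome directory missing or empty: $genomes_base" >&2
        status=1
    fi

    return "$status"
}
```

For example, running check_inputs "$HOME/data" ./ref_data/genome-annotations from the directory where you plan to start the pipeline reports any missing input on stderr and returns a non-zero status, so it can be chained in front of the nextflow run command with &&.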

Containerization

The current Toolkit setup is designed to be used on the command line with a nextflow run command. For users who do not want to install Nextflow, the repository includes a set of files that further containerize the pipeline so that it can be executed using only docker run commands with mapping. See the containerization folder for more information.
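
To illustrate what "mapping" means here, the sketch below only prints a docker run command that maps host directories into a container with -v. It is a rough illustration under stated assumptions: the function name docker_run_cmd and the container-side paths /data, /genomes, and /output are hypothetical, the default image name is simply the one used in the Singularity section of this page, and any arguments the containerized pipeline expects after the image name are not shown. The containerization folder remains the authoritative source for the real commands.

```shell
# Print (do not execute) a docker run command with host-to-container mounts.
# Assumptions: /data, /genomes and /output are hypothetical container paths;
# the image actually used by the containerization folder may differ.
# Host paths passed to -v should be absolute.
docker_run_cmd() {
    local reads_dir="$1" genomes_base="$2" out_dir="$3"
    local image="${4:-bioradbdg/sequoia-express:latest}"

    # Inputs are mounted read-only (:ro); the output directory stays writable.
    printf 'docker run --rm -v %s:/data:ro -v %s:/genomes:ro -v %s:/output %s\n' \
        "$reads_dir" "$genomes_base" "$out_dir" "$image"
}
```

Printing the command rather than running it makes it easy to inspect the mounts first, for example with docker_run_cmd "$HOME/data" "$PWD/ref_data/genome-annotations" "$PWD/output".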

Docker -> Singularity

If the system of choice uses Singularity, use the following command to create a Singularity image from the Docker container provided on Docker Hub.

singularity pull sequoia-express.sif docker://bioradbdg/sequoia-express:latest 
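
On shared systems it may be convenient to skip the pull when the image file is already present. A minimal sketch, assuming a hypothetical helper named pull_if_missing that wraps the same singularity pull command shown above:

```shell
# Pull the Singularity image only if the .sif file does not exist yet.
# Usage: pull_if_missing SIF_FILE DOCKER_URI
# (pull_if_missing is an illustrative helper, not part of the toolkit.)
pull_if_missing() {
    local sif="$1" uri="$2"

    if [ -f "$sif" ]; then
        echo "Using existing image: $sif"
    else
        singularity pull "$sif" "$uri"
    fi
}
```

It would be called as pull_if_missing sequoia-express.sif docker://bioradbdg/sequoia-express:latest. Note that this reuses whatever image is on disk, so delete the .sif file first if you want to pick up a newer latest tag.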
