
Input Files

Stephen Bolaris edited this page May 31, 2022 · 14 revisions

Input Files FAQ

File Names and Inputs

Do my file names matter?

Yes. At a minimum, each file name should contain R1 or R2 (or just R1 for single-end sequencing).
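As a rough pre-flight check for paired-end runs, the sketch below lists any R1 file in a directory that has no matching R2 file. It assumes the read tag appears as _R1 / _R2 in the file name and that files end in .fastq or .fastq.gz; the function name check_pairs is hypothetical and not part of the toolkit. For single-end runs a missing R2 is expected, so the check does not apply.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report R1 files that have no R2 mate in the same directory.
# Assumes names contain _R1 / _R2 and end in .fastq or .fastq.gz.
check_pairs() {
    local dir="$1" missing=0 r1 base mate
    for r1 in "$dir"/*_R1*.fastq "$dir"/*_R1*.fastq.gz; do
        [ -e "$r1" ] || continue              # skip patterns that matched nothing
        base=$(basename "$r1")
        # Build the expected mate name by swapping the first _R1 for _R2.
        mate=$(printf '%s\n' "$base" | sed 's/_R1/_R2/')
        if [ ! -e "$dir/$mate" ]; then
            echo "No R2 mate found for $base"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (path is a placeholder):
# check_pairs ~/path/to/my/files/
```

The function returns a non-zero status when at least one mate is missing, so it can gate a later command in a script.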

How does batch processing work, and is it different from Sequoia Complete?

Yes, currently it is different. The toolkit for Sequoia Express takes a directory and processes all of the fastq files in it at the same time, rather than one at a time. This lets you run all of your files in parallel with one command instead of calling multiple instances of the toolkit.

Previously: --reads ~/path/to/my/files/{R1,R2}
Current: --reads ~/path/to/my/files/

Do I need to concatenate my files?

Yes. On your local machine you will need to concatenate the files yourself; if you upload your fastq files to the SeqSense web application, this is done for you. This only applies if the same sample was run on multiple lanes. If your files follow the standard Illumina naming format, a simple one-line bash command will do the concatenation.
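One way to do the concatenation is sketched below. It assumes the standard Illumina pattern <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz and that every sample has an L001 file; the function name merge_lanes, the output naming, and the paths are illustrative only. Gzip files can be joined with plain cat, because a concatenation of gzip streams is itself a valid gzip file.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: merge per-lane FASTQ files into one file per sample and read.
# Assumes Illumina naming: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
merge_lanes() {
    local in_dir="$1" out_dir="$2" f base prefix read_tag
    mkdir -p "$out_dir"
    # Use the lane 1 files to discover each sample/read combination.
    for f in "$in_dir"/*_L001_R[12]_001.fastq.gz; do
        [ -e "$f" ] || continue
        base=$(basename "$f")
        prefix="${base%%_L001_*}"        # e.g. sampleA_S1
        read_tag="${base#*_L001_}"       # e.g. R1_001.fastq.gz
        read_tag="${read_tag%%_*}"       # e.g. R1
        # The glob expands in sorted order, so lanes are joined L001, L002, ...
        cat "$in_dir/${prefix}"_L[0-9][0-9][0-9]_"${read_tag}"_001.fastq.gz \
            > "$out_dir/${prefix}_${read_tag}.fastq.gz"
    done
}

# Example call (paths are placeholders):
# merge_lanes ~/path/to/my/lanes ~/path/to/my/files
```

The merged files keep R1 or R2 in their names, so the output directory can be passed to --reads as described above.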

Inputs at runtime

How many files can I run at one time?

We have not yet found an upper limit. So far we have run as many as 35 paired-end files at the same time without issue. Be aware that in the PDF batch report the table may become hard to read when there are many files; the HTML and CSV batch reports do not have this issue.

Are my input files from Sequoia Complete or Sequoia Express? What happens if I run the wrong files?

The main difference between the two is the R2 file: Sequoia Express has a full R2 read, while Sequoia Complete has only the UMI in R2. If you run the default PE (paired-end) mode on the wrong files, you will likely get an error when the alignment begins. If you run only the R1, it will work for both, but you need to set the seqType option to SE. You can check what data is in the R2 file yourself with the command zcat your_file_R2.fastq.gz | head, or cat your_file_R2.fastq for an uncompressed file.
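Rather than reading the sequences by eye, you can print the lengths of the first few R2 reads. The sketch below is one way to do that; the function name r2_read_lengths and its default of five reads are assumptions for illustration. Short, uniform lengths would suggest a UMI-only R2, while lengths close to your R1 read length would suggest a full R2 read; the exact UMI length is not stated here, so compare against your own R1.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the sequence length of the first N reads in a FASTQ file.
# Usage: r2_read_lengths <file> [N]   (N defaults to 5)
r2_read_lengths() {
    local file="$1" n="${2:-5}"
    case "$file" in
        *.gz) gzip -dc "$file" ;;     # compressed input
        *)    cat "$file" ;;          # plain-text input
    esac | awk -v n="$n" '
        # In a 4-line FASTQ record the sequence is the second line.
        NR % 4 == 2 { print length($0); if (++seen >= n) exit }
    '
}

# Example call (file name is a placeholder):
# r2_read_lengths your_file_R2.fastq.gz 10
```

The sketch assumes standard four-line FASTQ records with no wrapped sequence lines.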
