SAM_Processing

Basic Usage

The SAM_Processing handler sorts, de-duplicates, and adds read groups to the SAM files produced by Read_Mapping, converting them into finished BAM files. The handler uses either Picard or SAMtools (user choice) to process the SAM files. It also generates alignment statistics before and after processing using the flagstat function of SAMtools. To run SAM_Processing, all common variables and handler-specific variables must be defined within the configuration file. Once the variables have been defined, SAM_Processing can be submitted to a job scheduler with the following command:

sequence_handling SAM_Processing Config

Where Config is the full file path to the configuration file.

Handler-Specific Variables

The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.

Variable	Function	Method
METHOD	Which program should be used to process the SAM files. Choose from `'picard'` (recommended) or `'samtools'`.	Picard and SAMtools
SP_QSUB	QSub settings for batch submission. Recommended settings are "mem=12gb,nodes=1:ppn=8,walltime=24:00:00".	Picard and SAMtools
MAPPED_DIRECTORY	The full file path to the directory containing the read-mapped samples. If using Read_Mapping then leave as `"${OUT_DIR}/Read_Mapping"`.	Picard and SAMtools
PICARD_JAR	The full file path for the Picard jar file.	Picard
MAX_MEM	The maximum amount of memory that can be used, formatted like 15g.	Picard
MAX_FILES	The maximum number of file handles that can be used. For UNIX systems, the per-process maximum number of files that can be open may be found with `ulimit -n`. Set slightly under this value.	Picard
TMP	An optional variable that tells Picard where to store temporary files. Only use if you've had issues running out of temp space. Otherwise, leave blank.	Picard
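As a rough illustration of the MAX_FILES advice above, the following sketch reads the per-process limit with `ulimit -n` and derives a value slightly below it. The 5% margin and the fallback of 1000 are assumptions chosen for illustration, not values recommended by sequence_handling.

```shell
# Derive a MAX_FILES value slightly under the per-process open-file limit.
limit=$(ulimit -n)
if [ "$limit" = "unlimited" ]; then
    # Hypothetical fallback used when the system reports no limit.
    MAX_FILES=1000
else
    # Hypothetical 5% safety margin below the reported limit.
    MAX_FILES=$(( limit * 95 / 100 ))
fi
echo "MAX_FILES=${MAX_FILES}"
```

Run this on the same kind of node that will execute the job, since the limit may differ between login and compute nodes.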

Note: if you're using SAMtools to process the SAM files (METHOD=samtools), then the last four variables may be left blank since they are only used for processing with Picard.
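For orientation, here is a sketch of how the handler-specific variables might look when processing with Picard. It assumes that Config uses shell-style variable assignments, as the `"${OUT_DIR}/Read_Mapping"` default suggests. The output directory, the Picard jar path, and the MAX_FILES value are placeholders, not real defaults.

```shell
# OUT_DIR is a common variable normally defined elsewhere in Config.
# A placeholder value is shown here only so the fragment stands alone.
OUT_DIR="/path/to/output"

# Handler-specific variables for SAM_Processing
METHOD='picard'                                      # or 'samtools'
SP_QSUB="mem=12gb,nodes=1:ppn=8,walltime=24:00:00"   # recommended settings
MAPPED_DIRECTORY="${OUT_DIR}/Read_Mapping"           # default when using Read_Mapping

# Picard-only variables (may be left blank when METHOD is 'samtools')
PICARD_JAR="/path/to/picard.jar"                     # placeholder path
MAX_MEM='15g'
MAX_FILES=1000                                       # placeholder; set slightly under `ulimit -n`
TMP=""                                               # optional; blank unless temp space runs out
```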

Output

SAM_Processing creates sorted, deduplicated BAM files with read groups added. After the job has run, a list of these finished BAM files will be generated at

${OUT_DIR}/SAM_Processing/${PROJECT}_Processed_BAM.txt

where ${OUT_DIR} and ${PROJECT} are specified in the configuration file.
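Before passing the list to a downstream handler, it can be useful to confirm that every BAM file it names is present. The sketch below assumes the list holds one file path per line, which this page does not state. `missing_bams` is a hypothetical helper, not part of sequence_handling.

```shell
# Print every path in a BAM list that does not exist on disk.
# Assumes one file path per line; blank lines are skipped.
missing_bams() {
    list="$1"
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r bam || [ -n "$bam" ]; do
        [ -n "$bam" ] || continue
        [ -f "$bam" ] || printf '%s\n' "$bam"
    done < "$list"
}
```

It could be called as `missing_bams "${OUT_DIR}/SAM_Processing/${PROJECT}_Processed_BAM.txt"`; empty output means every listed file was found.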

Processing with SAMtools (not Picard) requires a reference genome. If your reference genome is not indexed, SAM_Processing generates an index file in the same directory as the reference genome, so make sure you have write permissions for that directory. After indexing, SAM_Processing exits, so you will need to run SAM_Processing a second time to process the SAM files.
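To find out ahead of time whether the first run will only index and exit, a check along these lines may help. It assumes the index in question is the `.fai` file that `samtools faidx` writes next to the FASTA. `reference_index_status` is a hypothetical helper, and the reference path is whatever your Config points to.

```shell
# Report whether a reference FASTA already has a .fai index beside it and,
# if not, whether its directory is writable so an index can be created there.
reference_index_status() {
    ref="$1"
    if [ -f "${ref}.fai" ]; then
        echo "indexed"
    elif [ -w "$(dirname "$ref")" ]; then
        echo "not indexed; directory writable"
    else
        echo "not indexed; directory not writable"
    fi
}
```

A result of "not indexed; directory not writable" indicates that the permissions need fixing before SAM_Processing can create the index.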

After running SAM_Processing, there are two options for further processing.

Coverage_Mapping can be used to generate a coverage map for each BAM file.
Indel_Realignment can be used to realign reads near insertions and deletions.

Dependencies

SAM_Processing depends on either Picard (which requires Java 1.8) or SAMtools for processing, and on SAMtools 1.3.1 for generating the alignment statistics. In addition, PBS and GNU Parallel are required for basic operation.
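A quick way to see which command-line dependencies are missing from the current environment is sketched below. `missing_tools` is a hypothetical helper. The command names `samtools`, `java`, `parallel`, and `qsub` are the usual executables for SAMtools, Java, GNU Parallel, and PBS, but your cluster may expose them through modules instead. This checks only that the commands exist, not their versions.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools samtools java parallel qsub` prints nothing when all four commands are available. Picard itself is a jar file, so check it separately with `[ -f "$PICARD_JAR" ]`.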
