Damage-Seq Pipeline

This repository contains the full pipeline for processing and quality-controlling Damage-seq data. It includes trimming, alignment, filtering, deduplication, k-mer QC, and dipyrimidine filtering steps.

Directory Structure

damage-seq-pipeline/
├── src/
│   ├── 1_mycutadapt.py               # Adapter trimming
│   ├── 2_humanbowtie2pairend.py      # Alignment to genome using Bowtie2
│   ├── 3_samtobam_pair.py            # Convert SAM to BAM, paired-end filtering
│   ├── 4_ParseSort.py                # Parse BAM and sort BED
│   ├── 5_Deduplicate.py              # Deduplicate reads
│   ├── 6_sortbamtobedpetobed.sh      # Sort and convert BAM to BED
│   ├── 7_bedto10bed.sh               # Convert BED to 10nt windows
│   ├── 8_Sort_bed_LL.py              # Final sort of 10nt BED
│   ├── 9_fastatobed.py               # Convert filtered FASTA to BED
│   ├── 10_fa2kmerAbundanceTable.py   # Generate k-mer frequency table
│   ├── 11_dipyrimidine_filter.py     # Filter dipyrimidine-containing reads
│   └── 12_fasta2bed.py               # Convert filtered FASTA to BED
├── qc/
│   ├── dipyrimidine_qc_plot.R        # R script for dipyrimidine bar plots
│   ├── sequence.py                   # Dinucleotide sequence QC
│   └── fasta.py                      # K-mer frequency analysis
├── archive/
│   └── (legacy or unused scripts)

Requirements

  • Python 3
  • R with ggplot2, wesanderson, ggthemes, tidyr, dplyr
  • Samtools
  • Bedtools
  • Cutadapt
  • Bowtie2
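
Before submitting jobs, it can help to confirm that the command-line tools above are visible on PATH. The following is a minimal sketch, not part of the repository; the executable names (cutadapt, bowtie2, samtools, bedtools, Rscript) are assumed to be the tools' usual command names and may differ on your cluster.

```python
import shutil

# Executable names are assumptions based on each tool's usual command name.
REQUIRED_TOOLS = ["cutadapt", "bowtie2", "samtools", "bedtools", "Rscript"]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All command-line dependencies found.")
```

Run it after loading modules or activating the conda environment. It only checks that the executables exist; it does not check versions or R packages.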

Run Instructions

  1. Adapter trimming: src/1_mycutadapt.py
  2. Align with Bowtie2: src/2_humanbowtie2pairend.py
  3. Convert to BAM and filter pairs: src/3_samtobam_pair.py
  4. Sort BAM to BED: src/4_ParseSort.py and src/6_sortbamtobedpetobed.sh
  5. Deduplicate: src/5_Deduplicate.py
  6. Create 10nt fragments: src/7_bedto10bed.sh
  7. Sort 10nt BEDs: src/8_Sort_bed_LL.py
  8. Get FASTA and filter dipyrimidines: src/9_fastatobed.py, src/11_dipyrimidine_filter.py
  9. Generate QC plots: qc/dipyrimidine_qc_plot.R
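
To illustrate the idea behind the dipyrimidine filtering in step 8, here is a minimal sketch that keeps FASTA records with a dipyrimidine (TT, TC, CT, or CC) at a chosen position. It is not the code in src/11_dipyrimidine_filter.py. The function names and the 0-based `start` offset are hypothetical, and the position to check depends on how the 10nt windows are built.

```python
DIPYRIMIDINES = {"TT", "TC", "CT", "CC"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def has_dipyrimidine_at(sequence, start):
    """True if the two bases at 0-based offset `start` form a dipyrimidine."""
    return sequence[start:start + 2].upper() in DIPYRIMIDINES


def filter_dipyrimidines(records, start):
    """Keep only records whose sequence has a dipyrimidine at `start`."""
    return [(h, s) for h, s in records if has_dipyrimidine_at(s, start)]


# Toy input; start=4 is an assumed offset chosen only for this example.
example = [">r1", "AAAATTAAAA", ">r2", "AAAAAGAAAA", ">r3", "ggggctgggg"]
kept = filter_dipyrimidines(read_fasta(example), start=4)
```

Sequences shorter than `start + 2` are dropped, and lowercase (soft-masked) bases are treated the same as uppercase.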

Notes

  • Ensure all dependencies are loaded before running (e.g., via modules or conda).
  • Input data should be gzipped paired-end FASTQ files.
  • All scripts are configured for SLURM submission with sbatch.
  • The filtered.bed files from two biological replicates can be merged and placed in a new folder, "merged_bed".
  • For XR-seq around histone markers, stranded XR-seq merged BED files are generated by strandedmergedbedfiles.sh.
  • Simulations are applied to merged.bed.
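
One possible way to merge the filtered.bed files of two replicates is sketched below. It assumes that "merging" means concatenating the records and sorting them by chromosome, start, and end; the repository may do this differently. The function names and input paths are hypothetical.

```python
from pathlib import Path


def parse_bed_line(line):
    """Split a BED line and return (chrom, start, end, all_fields)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), int(fields[2]), fields


def merge_bed_files(input_paths, output_path):
    """Concatenate BED files, sort by (chrom, start, end), write the result.

    Returns the number of records written.
    """
    records = []
    for path in input_paths:
        with open(path) as handle:
            for line in handle:
                # Skip blank lines and common BED header lines.
                if line.strip() and not line.startswith(("#", "track", "browser")):
                    records.append(parse_bed_line(line))
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    output_path = Path(output_path)
    # Create the target folder (e.g. merged_bed/) if it does not exist yet.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as out:
        for *_, fields in records:
            out.write("\t".join(fields) + "\n")
    return len(records)
```

For example, `merge_bed_files(["rep1/filtered.bed", "rep2/filtered.bed"], "merged_bed/merged.bed")` would write a single sorted file. Chromosomes are sorted lexicographically here, so chr10 comes before chr2; adjust the sort key if downstream tools expect a different order.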

Author

Cansu Kose, 2025 – UNC Chapel Hill | Sancar Lab