This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
TagDust2 is a bioinformatics tool for processing next-generation sequencing (NGS) data. It extracts and labels sequences containing adapter, linker, barcode, and fingerprint sequences using Hidden Markov Models (HMMs). The tool can demultiplex reads, automatically detect library preparation methods, and filter out contaminants.
This project uses GNU Autotools for configuration and building:
# Initial setup (run once after cloning)
./autogen.sh
./configure
# Build the project
make
# Run tests
make check
# Install system-wide
make install
# Enable debugging
./configure --enable-debugging
# Enable valgrind tests
./configure --enable-valgrind-tests
# Build with specific compiler flags
./configure CFLAGS="-O2 -Wall -std=gnu99"
- Main test suite: make check
- Individual test programs are built in src/: tagdustiotest, tagdust_rtest, simreads_rtest, evalres_rtest
- Development tests in dev/: sanity_test.sh, casava_test.sh, bar_read_test.sh
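When only the development scripts need re-running, a small helper can loop over them and summarise the outcome. The following is a minimal sketch, not part of the repository: the function name run_dev_tests and the TEST_SHELL override are hypothetical, and it assumes each script reports failure through a non-zero exit status and can be started from inside its own directory. make check remains the supported entry point.

```shell
# Run each *.sh script in a directory and summarise the results.
# Assumptions: a script signals failure with a non-zero exit status and
# can be started from inside its own directory. TEST_SHELL is a
# hypothetical override for the interpreter (default: sh).
run_dev_tests() {
    test_dir=${1:-dev}
    failed=0
    for script in "$test_dir"/*.sh; do
        [ -f "$script" ] || continue   # the glob matched nothing
        name=$(basename "$script")
        if (cd "$test_dir" && "${TEST_SHELL:-sh}" "./$name") >/dev/null 2>&1; then
            echo "PASS $name"
        else
            echo "FAIL $name"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]   # non-zero status if any script failed
}
```

Calling run_dev_tests dev from the repository root would print one PASS or FAIL line per script. If the scripts rely on bash features, setting TEST_SHELL=bash is one way to adapt the sketch.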
- main.c: Main tagdust program entry point
- barcode_hmm.c/h: HMM construction, training, and sequence matching
- interface.c/h: Command-line argument parsing and program initialization
- io.c/h: File I/O operations for FASTQ/FASTA formats
- nuc_code.c/h: Nucleotide encoding and sequence operations
- misc.c/h: Utility functions and data structures
- kslib.c/h: Sequence parsing and utility library
- tagdust: Main demultiplexing tool
- simreads: Read simulation utility
- evalres: Results evaluation tool
- merge: Sequence merging utility
- rename_qiime: QIIME format renaming tool
- HMM structures for sequence architecture modeling
- Sequence data structures with quality scores
- Barcode matching and scoring systems
Architecture files define the expected structure of reads:
- Located in the dev/ and casava_demo/ directories
- Examples: casava_arch.txt, various EDITTAG_* files
- Format: Specify barcodes, read segments, and expected patterns
- dev/: Contains test FASTQ files and expected output files
- benchmark/: Performance testing scripts and data
- Gold standard files ending in _gold.txt for validation
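Validation against a gold standard file amounts to a byte-for-byte comparison. The sketch below illustrates the idea under a stated assumption: an output named NAME.txt is paired with NAME_gold.txt in the same directory. That pairing and the function name check_against_gold are hypothetical, and the actual naming used in dev/ may differ.

```shell
# Compare a result file with its gold-standard counterpart.
# Assumption: for an output "NAME.txt" the gold file is "NAME_gold.txt"
# in the same directory; the real pairing in dev/ may differ.
check_against_gold() {
    result=$1
    gold="${result%.txt}_gold.txt"
    if [ ! -f "$gold" ]; then
        echo "MISSING $gold"
        return 2
    fi
    if cmp -s "$result" "$gold"; then
        echo "MATCH $result"
    else
        echo "DIFF $result"
        return 1
    fi
}
```

The function returns 0 on a match, 1 on a difference, and 2 when no gold file exists, so it can be used directly in an if statement or a test script.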
- Modify relevant source files in src/
- Update src/Makefile.am if adding new source files
- Run make to build
- Run make check to ensure tests pass
- Add new tests in dev/ if needed
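The build and test steps of this workflow can be chained so that a failed build never reaches the test stage. This is a minimal sketch rather than a script shipped with the project: the function name build_and_check is hypothetical, and it assumes it is run from a directory that has already been configured.

```shell
# Rebuild and run the test suite, stopping at the first failing step.
# MAKE may be overridden (for example MAKE=gmake); "make" is the default.
build_and_check() {
    make_cmd=${MAKE:-make}
    "$make_cmd" || { echo "build failed" >&2; return 1; }
    "$make_cmd" check || { echo "tests failed" >&2; return 1; }
    echo "build and tests OK"
}
```

Running build_and_check after editing files in src/ gives a single pass or fail signal before new tests are added.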
- Configure with --enable-debugging for debug builds
- Use --enable-debugging=1|2|3 for different verbosity levels
- Debug macros available via debug_print() in tagdust2.h
- Custom memory allocation macros in malloc_macro.h
- SIMD-aligned memory allocation support
- Memory debugging available with debug builds
To reproduce the benchmarks from the TagDust2 paper:
cd reproducibility/scripts
make -f run.mk benchmark
This requires a 64-bit system and may take considerable time.