CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

TagDust2 is a bioinformatics tool for processing next-generation sequencing (NGS) data. It extracts and labels sequences containing adapter, linker, barcode, and fingerprint sequences using Hidden Markov Models (HMMs). The tool can demultiplex reads, automatically detect library preparation methods, and filter out contaminants.

Build System

This project uses GNU Autotools for configuration and building:

Essential Build Commands

# Initial setup (run once after cloning)
./autogen.sh
./configure

# Build the project
make

# Run tests
make check

# Install system-wide
make install

Development Build Options

# Enable debugging
./configure --enable-debugging

# Enable valgrind tests
./configure --enable-valgrind-tests

# Build with specific compiler flags
./configure CFLAGS="-O2 -Wall -std=gnu99"

Testing

Main test suite: make check
Individual test programs are built in src/: tagdustiotest, tagdust_rtest, simreads_rtest, evalres_rtest
Development tests in dev/: sanity_test.sh, casava_test.sh, bar_read_test.sh

Architecture

Core Components

main.c: Main tagdust program entry point
barcode_hmm.c/h: HMM construction, training, and sequence matching
interface.c/h: Command-line argument parsing and program initialization
io.c/h: File I/O operations for FASTQ/FASTA formats
nuc_code.c/h: Nucleotide encoding and sequence operations
misc.c/h: Utility functions and data structures
kslib.c/h: Sequence parsing and utility library

Executables Built

tagdust: Main demultiplexing tool
simreads: Read simulation utility
evalres: Results evaluation tool
merge: Sequence merging utility
rename_qiime: QIIME format renaming tool

Key Data Structures

HMM structures for sequence architecture modeling
Sequence data structures with quality scores
Barcode matching and scoring systems

Configuration Files

Architecture Files

Architecture files define the expected structure of reads:

Located in dev/ and casava_demo/ directories
Examples: casava_arch.txt, various EDITTAG_* files
Format: Specify barcodes, read segments, and expected patterns

Test Data

dev/: Contains test FASTQ files and expected output files
benchmark/: Performance testing scripts and data
Gold standard files ending in _gold.txt for validation

Development Workflow

Adding New Features

Modify relevant source files in src/
Update src/Makefile.am if adding new source files
Run make to build
Run make check to ensure tests pass
Add new tests in dev/ if needed

Debugging

Configure with --enable-debugging for debug builds
Use --enable-debugging=1|2|3 for different verbosity levels
Debug macros available via debug_print() in tagdust2.h

Memory Management

Custom memory allocation macros in malloc_macro.h
SIMD-aligned memory allocation support
Memory debugging available with debug builds

Paper Reproducibility

To reproduce the benchmarks from the TagDust2 paper:

cd reproducibility/scripts
make -f run.mk benchmark

This requires 64-bit system and may take considerable time.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLAUDE.md

Project Overview

Build System

Essential Build Commands

Development Build Options

Testing

Architecture

Core Components

Executables Built

Key Data Structures

Configuration Files

Architecture Files

Test Data

Development Workflow

Adding New Features

Debugging

Memory Management

Paper Reproducibility

FilesExpand file tree

CLAUDE.md

Latest commit

History

CLAUDE.md

File metadata and controls

CLAUDE.md

Project Overview

Build System

Essential Build Commands

Development Build Options

Testing

Architecture

Core Components

Executables Built

Key Data Structures

Configuration Files

Architecture Files

Test Data

Development Workflow

Adding New Features

Debugging

Memory Management

Paper Reproducibility