All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Summary output now lists identified taxon IDs in `taxons_identified` (replacing `total_taxon_count`)
- Summary output now includes zero entries for identified taxa with no direct assignments in `reads_extracted_per_taxon`
- Attempt to detect a header line in Kraken report files and skip it. A line is treated as a header if it has 6 fields that are all strings (more precisely, if every field returns an error when parsed as an int/float). A warning message is printed when a header is skipped.
- `--no-header-detect` flag to force parsing Kraken reports from the first line without auto-skipping headers, disabling the new header detection behaviour described above
- Kraken report/output parsing errors now include the line number and the line itself for easier debugging.
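
The header detection described above can be sketched roughly as follows. This is a minimal illustration, not kractor's actual code: the function names (`is_non_numeric`, `looks_like_header`, `report_lines`) are hypothetical, and splitting on tabs and checking only the first line are assumptions about how the check is applied.

```rust
/// Returns true if `field` parses as neither an integer nor a float.
fn is_non_numeric(field: &str) -> bool {
    let trimmed = field.trim();
    trimmed.parse::<i64>().is_err() && trimmed.parse::<f64>().is_err()
}

/// Heuristic header check: exactly six tab-separated fields, none of them numeric.
/// (Assumption: fields are tab-separated, as in standard Kraken reports.)
fn looks_like_header(line: &str) -> bool {
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let fields: Vec<&str> = line.split('\t').collect();
    fields.len() == 6 && fields.iter().all(|f| is_non_numeric(f))
}

/// Returns the lines to parse. When `detect_header` is true and the first line
/// looks like a header, that line is skipped and a warning goes to stderr.
/// Passing `detect_header = false` mirrors the effect of `--no-header-detect`.
fn report_lines(text: &str, detect_header: bool) -> Vec<&str> {
    let mut lines: Vec<&str> = text.lines().collect();
    if detect_header && lines.first().map_or(false, |l| looks_like_header(l)) {
        eprintln!("warning: first line looks like a header, skipping it");
        lines.remove(0);
    }
    lines
}
```

In this sketch a regular report line is never mistaken for a header, because at least one of its fields (for example the taxon ID) parses as a number.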
- Summary output now lists identified taxon IDs in `taxons_identified` (replacing `total_taxon_count`) and includes zero entries for identified taxa with no direct assignments in `reads_extracted_per_taxon`.
- Prevent a panic when writing to output paths without an extension.
- Able to specify taxon IDs that are not present without kractor stopping. These are instead logged to stderr with a warning. This may be useful when running kractor in a wrapper script over several fastq files to extract the same set of taxon IDs from all of them, regardless of whether they are present.
- Include a new field `missing_taxon_ids` in the summary output.
- Under the hood refactoring, introducing structs for the processed kraken outputs and processed kraken trees to simplify the returned data.
- Unclassified reads being skipped in the tree building stage, meaning they were unable to be extracted
- Added a `reads_extracted_per_taxon` field to the summary report (#28)
- Added a `proportion_extracted` field to the summary report (#28)
- Added the version to the summary report (#28)
- Added an output format (`fasta` or `fastq`) field to the summary report (#28)
- Added a `--verbose` flag (in addition to the existing `-v`)
- Removed `-O` for compression type, now uses `--compression-format` for clarity.
- Removed `-l` for compression level, now uses `--compression-level` for clarity.
- Renamed `--json-report` to `--summary`
- Improved the JSON report format to make it easier to read by removing the `Paired` and `Single` fields and instead having simple `total_reads_in` and `total_reads_out` fields.
- Removed duplicate log message for taxon IDs identified
- Clippy warnings
- Create subdirectories specified in output path if they don't exist (#24)
- Add output validation to prevent overwriting existing files (#25)
- Better error handling with color-eyre (#16)
- Proper panic handling (#16)
- Tests for most functions
- JSON report with accurate read count information (#15)
- Keep fastq parsing in bytes instead of converting to String (#17)
- Optimized functions to take `&str` instead of `String` (#21)
- Migrated to crossbeam scoped channels and refactored threading code
- Refactored JSON output and removed lazy_static dependency
- Root node no longer added to tree twice (#22)
- JSON report included in stdout upon successful completion (can be disabled with `--no-json`)
- Project renamed
- Support for paired-end files
- Output FASTA file with `--output-fasta` option
- Major refactor to use Noodles for fastq parsing
- Switched to Niffler for compression handling
- Streamlined arguments related to compression types/level
- Logging functionality
- Code optimizations
- `--no-compress` flag to output standard plaintext fastq files
- `--exclude` option to exclude specified reads (works with `--children` and `--parents`)
- Internal documentation (docstrings)
- Increased verbosity of user outputs
- Reduced memory usage
- Automatic detection and handling of gz and plain files
- `--compression` argument to select compression type
- `--children` and `--parents` options to save children and parents based on kraken report
- Integrated zlib-ng for faster gzip handling
- Initial release