- `bit-itol` with subcommands for:
  - `binary-dataset`
  - `colorstrip`
  - `map`
  - `text-dataset`
- `bit-gen-reads` fragment sizes and long-read read lengths are now pulled from a normal distribution rather than a uniform one
- `bit-get-mapped-reads-pid` has been renamed to just `bit-mapped-reads-pid`
- `bit-colnames` can also just accept stdin now
- `bit-gen-kraken2-tax-plots` changed to `bit-kraken2-tax-plots`
- `bit-kraken2-to-taxon-summaries` changed to `bit-kraken2-tax-summary`
- `bit-combine-kraken2-taxon-summaries`
  - changed to `bit-kraken2-combine-tax-summaries`
  - `-n, --sample-names` now taken as a space-delimited list instead of a comma-delimited list (to match how input files are taken)
- `bit-kraken2-to-taxon-summaries` has been replaced with `bit-kraken2-tax-summary`
- `bit-kraken2-combine-tax-summaries` removed, `bit-kraken2-tax-summary` automatically does this now, and can take multiple input reports
- `bit-combine-bracken-and-add-lineage` removed, bracken reports are also done by `bit-kraken2-tax-summary` now
  - as a result of this, like the lineage-building from kraken2 reports, this builds them based on what's in the report rather than based on taxids and new lookups. This ensures the output lineages will match the state of the taxonomy when kraken2/bracken was run (as it's only pulling from the report itself)
- `bit-gen-iToL-binary-dataset` replaced with `bit-itol binary-dataset`
- `bit-gen-iToL-colorstrip` replaced with `bit-itol colorstrip`
- `bit-gen-iToL-map` replaced with `bit-itol map`
- `bit-gen-iToL-text-dataset` replaced with `bit-itol text-dataset`
- removed `bit-reorder-fasta`, that is now stored as a gist here: https://gist.github.com/AstrobioMike/6b91769ad13305ebd4779873afa9aa1f
- removed `bit-prot-acc-to-taxid`, that is now stored as a gist here: https://gist.github.com/AstrobioMike/2cc5fd147aa28c2b793c7c664502734d
- setup for argcomplete during conda install for commands with subcommands
- added flag `--skip-read-pids` to `bit-cov-stats` so user can save time if they don't want that info
- added coverage options to `bit-gen-reads` for easier generation of desired coverages than the read-proportions method allowed
- removed `bit-gff-to-anvio`, that is now just stored as a gist here: https://gist.github.com/AstrobioMike/45e10adb3eaceb338a7eb10f49038355
- removed `bit-split-multifasta`, that is now just stored as a gist here: https://gist.github.com/AstrobioMike/28cae086241bc0d68e05c215acd91a69
- `bit-get-cov-stats`
  - now just `bit-cov-stats` (due to popular demand)
  - output *-per-ref.tsv now includes "length" and "num_contigs" columns
- `bit-cov-analyzer`
  - added "contig_length" column to output tsvs
- `bit-gen-reads` `-c` now corresponds to `--coverage` and `-C` is for `--circularize`
- `bit-gen-reads`
  - if generating paired-end reads with a fragment size shorter than the read size, quality scores of the appropriate length are now output
- `bit-mutate-seqs` now has a tunable parameter for transitions/transversions
- `bit-extract-seqs`
  - added `by-headers` subcommand (this replaces `bit-parse-fasta-by-headers`)
  - `by-coords` subcommand replaces `bit-extract-seqs-by-coords`
- added `bit-genbank` which has subcommands for all genbank helpers
  - `to-fasta` replaces `bit-genbank-to-fasta`
  - `to-AA-seqs` replaces `bit-genbank-to-AA-seqs`
  - `to-cds-seqs`
  - `to-cds-tsv` replaces `bit-genbank-to-cds-table`
- removed `bit-figshare-upload`, that is now just stored as a gist here: https://gist.github.com/AstrobioMike/9f86931747357ae8949f715145c5eec4
- removed `bit-locus-clean-slate`, that is now just stored as a gist here: https://gist.github.com/AstrobioMike/17324583568b051cd3b190cf4767b524
- moved many imports from the top of cli modules to the point where they are needed, to increase snappiness when just doing things like trying to view the help menu
- `bit-summarize-column` no longer has an `-i` flag, it expects the input table/file to be given as a positional argument
- `bit-count-bases` can now take STDIN
- `bit-normalize-table` modularized and tests added
- `bit-parse-fasta-by-headers` has been removed and replaced with `bit-extract-seqs by-headers`
- `bit-extract-seqs-by-coords` has been removed and replaced with `bit-extract-seqs by-coords`
- `bit-filter-seqs-by-length` has been renamed to `bit-filter-fasta-by-length`
- `bit-genbank-to-fasta` has been removed and replaced with `bit-genbank to-fasta`
- `bit-genbank-to-AA-seqs` has been removed and replaced with `bit-genbank to-AA-seqs`
- `bit-genbank-to-cds-table` has been removed and replaced with `bit-genbank to-cds-tsv`
- `bit-genbank-to-clean-slate` has been removed and replaced with `bit-genbank clean-slate`
- moved `bit-GL-combine-contig-tax-tables` and `bit-GL-combine-KO-and-tax-tables` from the primary bit package into the metagenomics-wf scripts
- `bit-get-test-data metagenome` pulls a smaller, super simple, single-sample dataset I'm hosting on github now rather than figshare
- to `bit-cov-analyzer`
  - progress updates while running
  - zero-coverage region outputs now also generated
  - output region tsvs now have a "low_complexity" column that holds True or False
    - this is based on:
      - low_complexity = True if: unique 3-mers / all-possible-3mers <= 0.4
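
The low-complexity rule above can be illustrated with a minimal Python sketch. This is not the code `bit-cov-analyzer` uses; the function name `is_low_complexity` and its parameters are hypothetical, and it assumes "all-possible-3mers" means the 4^3 = 64 possible DNA 3-mers over A, C, G, and T.

```python
def is_low_complexity(seq, k=3, threshold=0.4):
    """Return True if (unique k-mers) / (all possible k-mers) <= threshold.

    Assumption: the denominator is 4**k (64 for 3-mers), and k-mers
    containing anything other than A, C, G, or T are ignored.
    """
    seq = seq.upper()
    possible = 4 ** k
    # collect every distinct k-mer observed in the sequence
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    # keep only k-mers made up entirely of unambiguous bases
    observed = {kmer for kmer in observed if set(kmer) <= set("ACGT")}
    return len(observed) / possible <= threshold


# a homopolymer run has 1 unique 3-mer out of 64, so it is flagged
print(is_low_complexity("A" * 100))
```

Note that under these assumptions a sequence shorter than `k` has no k-mers at all and is therefore also reported as low complexity.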
- `bit-extract-seqs`
  - enables pulling out target seqs from a fasta by bed file or specified primers via subcommands
- to `bit-gen-reads`
  - `--fragment-size-range` option added, defaults to 10% of fragment size
- `bit-cov-analyzer`
  - `-s | --sliding-window-size` changed to `-w | --window-size`, and `-S | --step-size` changed to `-s | --step-size` (lower-case)
  - default window size changed from 50 to 100, and default step size changed from 10 to 20
  - drastic improvements to efficiency when working with large genomes (e.g., 3GB)
  - histogram of coverages no longer plotted by default, only done now when adding the `--write-window-stats` flag
  - no longer produces window-coverage-overview.txt as all of that info is captured within window-coverage-overview.tsv
- `bit-get-mapped-reads-pid`
  - minor improvements to efficiency
- `bit-get-cov-stats`
  - improvements to efficiency
  - now also reports median percent id of mapped reads per ref and per contig (when provided an input bam file)
- `bit-summarize-assembly`
  - adds commas when printing stats to terminal for readability
- `bit-extract-seqs-by-coords` is now combined into `bit-extract-seqs`
- `bit-gen-reads`
  - now has a `--fragment-size-range` that defaults to 10% of fragment size
  - by default will not include regions with Ns in generated reads, add `--include-Ns` to allow that
- `bit-assemble`
  - the threads parameter is now passed to bbnorm and fastp (if run) in addition to the assemblers
- `bit-gen-reads --type long` will no longer preferentially start reads at position 0 if the requested read size is larger than the contig; now it will start randomly and just produce a read that ends where the contig ends (unless `--circularize` is added)
- `bit-cov-analyzer`
  - no longer writes out individual window stats by default (to save spacetime), it needs to be turned on with `--write-window-stats` now if wanted
- `bit-gen-reads` previously may have by chance created reads with identical headers (since only coordinates were being added), now there is also a counter to prevent this
- to `bit-gen-reads`
  - added single-end and long-read capabilities (through `--type` argument now, see Changed below)
  - single can be used up to any size, but if specifying `--long`, it will also generate reads with lengths spanning a range around the specified read size (50% by default)
- to `bit-calc-variation-in-msa`
  - 3Di as an option for `--type`
- `bit-gen-reads`
  - now has `--type` flag for paired-end, single-end, or long (paired-end still by default)
  - did more work than it's worth to ensure the exact number of requested reads are always returned
- `bit-calc-variation-in-msa` `--gaps-treatment` changed to "include" by default
- `bit-add-insertion`
- `bit-update-ncbi-taxonomy` replaced with `get-ncbi-tax-data` (prior still retained for now)
- dropped `bit-calc`
  - if you are the one other person that ever used this and you want it back, you can add this to your ~/.bashrc: `bit-calc () { awk "BEGIN { print $1 }"; }` :)
- modified `bit-colnames` to try to autodetect delimiter
- added back in setup.py glob portion needed for scripts not fully integrated into python-packaging yet
- `bit-get-cov-stats`
  - the `--include-non-primary` flag now, in addition to calculating percent ID including supplemental and secondary alignments, also runs mosdepth with `--flag 1540`
- `bit-dl-ncbi-assemblies`
  - in python now instead of bash (i hope this doesn't hinder performance too much...)
  - default concurrent downloads is 10 now instead of 1
  - default format is fasta now instead of gbk
  - downloads only happen in http now, no more ftp, so the `-P` flag to specify http has been removed
  - added optional output dir
- no longer keeping stubs in scripts/, instead keeping a ton of entry points in pyproject.toml
- `bit-filter-seqs-by-length` renamed to `bit-filter-fasta-by-length` to be more specific (prior retained for now)
- `bit-get-cov-stats` by default now produces per-contig level info also (can be shut off with `--skip-per-contig`)
- `bit-get-cov-stats`
  - the original ref-based output file is now called <output-prefix>-per-ref.tsv (changed from <output-prefix>.tsv)
  - outputs include median coverage in addition to mean
  - for speed (and consistency with expectations of known most-frequent users), when `bit-get-cov-stats` runs mosdepth, it uses the `-x | --fast-mode` flag now
  - added progress bar when parsing coverage info
- `bit-assemble`
  - re-arranging of help menu
  - memory setting now passable to spades too
- `report_message` function from modules.general slightly altered
  - this is more a note to myself for if/when i see weird things in terminal-printing format show up later
- general help-menu formatting
- `--circularize` option added to `bit-gen-reads`
- `bit-get-cov-stats` now also reports mean percent ID of mapped reads for each input reference when the input includes a bam file (leveraging `bit-get-mapped-reads-pid`)
- `bit-get-mapped-reads-pid` to pull out percent-identity information of mapped reads from an input bam
  - for each mapped read:
    - calculated percent ID = (full_aligned_length - NM) / full_aligned_length * 100
      - where full_aligned_length = Matches + Mismatches + Insertions + Deletions
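
As a worked illustration of the percent-ID formula above, here is a minimal Python sketch that applies it to a CIGAR string and an NM value. It is not the `bit-get-mapped-reads-pid` implementation; the function name `percent_identity` is hypothetical, and it assumes the alignment is described by a standard SAM CIGAR string where `M`, `=`, and `X` cover matches and mismatches, `I` covers insertions, and `D` covers deletions.

```python
import re

# one CIGAR operation: a length followed by an operation code
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar, nm):
    """Percent ID = (full_aligned_length - NM) / full_aligned_length * 100.

    full_aligned_length sums the M/=/X (matches + mismatches), I (insertions),
    and D (deletions) operations; clipping (S/H), skips (N), and padding (P)
    are left out. Returns None when nothing is aligned (e.g. CIGAR of "*").
    """
    full_aligned_length = sum(
        int(length) for length, op in CIGAR_RE.findall(cigar) if op in "MID=X"
    )
    if full_aligned_length == 0:
        return None
    return (full_aligned_length - nm) / full_aligned_length * 100


# 90 aligned columns + 5 inserted + 5 deleted bases, NM = 10 (the indels only)
print(percent_identity("90M5I5D", 10))
```

For example, a read with CIGAR `10S50M` and NM of 5 has a full aligned length of 50 (the soft-clipped bases are not counted), giving 45 / 50 * 100.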
- added a 'genome' option to `bit-get-test-data` that pulls an E. coli genome
- modularized and added tests for `bit-get-workflow`, `bit-get-test-data`, `bit-dedupe-fasta-headers`, `bit-fasta-to-genbank`, and `bit-fasta-to-bed`
- added more tests to `bit-gen-reads`
- improved coverage on some other modules with more unit tests
- moved more setup info into pyproject.toml, but retained minimal setup.py to be able to glob because bit has a lot of separate scripts/entry points
- `bit-count-bases-per-seq` has been removed with its function combined into `bit-count-bases`
  - if input fasta has one sequence, it prints the length to the terminal; if it has 2 or more, it will print out summary stats; in either case, if an output file is specified, the program will additionally write lengths of all sequences to that file
  - modularized, test added
- modularized and added unit tests for `bit-lineage-to-tsv`
- `bit-mutate-seqs`
  - `--seed-for-randomization` long-parameter shortened to just `--seed`
  - modularized and test added
- `bit-assemble` now properly filters spades-assembled contigs based on user-specified min-contig length
- updates to `bit-get-cov-stats`
  - can start from bam file now in addition to mosdepth per-base.bed.gz (will generate the mosdepth output if starting from bam)
  - modularized, integration test added
- updates to `bit-check-for-fastq-dup-headers`
  - autodetect gzipped or not
  - modularized, test added
- more test coverage of `bit-ez-screen`
- unit tests for `bit-gen-kraken2-tax-plots`, `bit-kraken2-to-taxon-summaries`, and `bit-calc-variation-in-msa`
- integration test for `bit-cov-analyzer`
- modularized `bit-calc-variation-in-msa`
- updates to `bit-gen-kraken2-tax-plots`
  - modularized
  - appropriately adds domain letter to plots from GTDB tax kraken2 reports now
- updates to `bit-kraken2-to-taxon-summaries`
  - modularized
  - no longer takes the larger kraken.out file, it now works off of the kraken.report
  - no longer works based on taxid lookup, it now works based on the taxonomy in the kraken report
    - this means it will exactly match the taxonomy used in the kraken2 db, and not swap anything if taxids or rank names changed
    - this also means it now works with GTDB-kraken2-db produced reports (which previously would not work with the standard taxid lookup method)
- modularized `bit-filter-seqs-by-length` and added a test for it
  - also changed the name to `bit-filter-fasta-by-length`, though i'm retaining a stub for the old name so it still works when called that way too
- modularized `bit-summarize-column` and added tests
- added a temp fix for the latest megahit osx-64 build not working (see smk/envs/assemble-osx-64.yaml; if that file is gone, the fix was no longer needed and has since been removed)
- fix to `bit-summarize-column` when standard input is only one column (was erroring out before)
- added `bit-assemble`
  - command-line wrapper for an assembly workflow with optional qc and digital normalization
- added more integration tests
- modularized `bit-gen-reads`
- modifications to `bit-cov-analyzer`
  - default `--min-region-length` set to 500 to help reduce overwhelming output and focus on larger regions
  - added a column for "zero_cov_bases" to output low- and high-coverage region tsvs
- modifications to `bit-cov-analyzer`
  - added N filtering
    - if an identified low-coverage region is more than 50% Ns, it won't be reported in the output low-coverage table
  - added `--min-region-length` parameter, though the default is 0 (so smallest window size)
  - modularized this script
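
The N-filtering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script's actual code: the function name `keep_low_cov_region` and the `max_n_fraction` parameter are hypothetical, and the region is assumed to be available as a plain sequence string.

```python
def keep_low_cov_region(region_seq, max_n_fraction=0.5):
    """Return True if a low-coverage region should still be reported.

    A region is dropped only when MORE than max_n_fraction of its bases
    are N, so a region that is exactly 50% Ns is kept. Assumption: an
    empty sequence has nothing to filter on and is kept.
    """
    if not region_seq:
        return True
    n_fraction = region_seq.upper().count("N") / len(region_seq)
    return n_fraction <= max_n_fraction


# 4 of 6 bases are N (about 67%), so this region would not be reported
print(keep_low_cov_region("NNNNAC"))
```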
- adding seed option to `bit-gen-reads`
- added a reads mode to `bit-ez-screen`
- `bit-ez-screen` now has subcommands for assembly vs reads modes
- fixed read-header formatting for paired read from `bit-gen-reads`
- fixed conda recipe meta.yaml and update-conda-package.sh
- started making bit a python package (to facilitate sharing functions, formatting, etc. over time and moving forward)
- added conda recipe here instead of being maintained elsewhere
- fixed `bit-get-workflow` to be able to pull from all prior released workflows instead of just recent ones
Previous version changes are only tracked on the releases page.