All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
This patch release of wf-transcriptomes updates internal workflow naming, and does not affect any workflow outputs.
- Removed workflow suffix from workflow title. This has no effect on the workflow.
- Updated to wf-template v5.6.2, changing:
- Reduce verbosity of debug logging from fastcat which can occasionally occlude errors found in FASTQ files during ingress.
- Log banner art to say "EPI2ME" instead of "EPI2ME Labs" to match current branding. This has no effect on the workflow outputs.
- pre-commit configuration to resolve an internal dependency problem with flake8. This has no effect on the workflow.
- Stringtie updated to v2.2.3, which fixes stalling at the transcriptome assembly step.
- Gffcompare updated to v0.12.6, which fixes an issue where ref_gene_id was assigned a nan value.
- Updated to wf-template v5.6.2, fixing:
- Sequence summary read length N50 incorrectly displayed minimum read length, it now correctly shows the N50.
- Sequence summary component alignment and coverage plots failed to plot under some conditions.
- Error in `deAnalysis` process - `mode(counts) %in% "numeric" is not TRUE` - caused by hyphens in sample sheet aliases.
- Error in `deAnalysis` process - `values in 'transcripts$tx_strand' must be "+" or "-".` - The workflow will now filter out any unstranded annotations from downstream analysis and log a warning.
- Missing `results_dexseq.tsv` file when `--de_analysis` enabled.
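The unstranded-annotation filtering mentioned above can be pictured with a minimal Python sketch. This is an illustration only and not the workflow's actual code: the function name `filter_unstranded` and the logger name are hypothetical, and the only format fact relied on is that GTF and GFF3 store the strand in the seventh tab-separated column.

```python
import logging

logger = logging.getLogger("annotation_filter")  # hypothetical logger name


def filter_unstranded(lines):
    """Keep annotation lines whose strand column is '+' or '-'.

    Hypothetical helper: it only illustrates the idea of dropping
    unstranded features and logging a warning about it.
    """
    kept = []
    dropped = 0
    for line in lines:
        # Comment and blank lines carry no feature, so pass them through.
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the strand. Lines with fewer than
        # seven columns are treated as unusable and dropped as well.
        if len(fields) >= 7 and fields[6] in ("+", "-"):
            kept.append(line)
        else:
            dropped += 1
    if dropped:
        logger.warning("Filtered out %d unstranded annotation(s).", dropped)
    return kept
```

Filtering before the differential expression step means a strand check like the one quoted in the error message would not be triggered by such records.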
- `split_bam` and `build_minimap_index_transcriptome` process memory allocation increased.
- Updated recommended memory requirement.
- Updated project description.
- A common user issue is providing a ref_annotation and ref_genome parameter that have mismatched reference IDs, which causes the DE_analysis to fail. The workflow will now do an upfront check and give an error message if no overlap is found or a warning if some IDs are present in one file but not in the other.
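As a rough illustration of the upfront reference ID check described above, the following Python sketch compares sequence IDs from a FASTA file with those in the first column of a GTF/GFF file. It is an assumption-laden sketch, not the workflow's implementation: the function names `fasta_ids`, `annotation_seq_ids` and `check_reference_ids` are hypothetical, as are the message texts.

```python
def fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def annotation_seq_ids(lines):
    """Collect sequence IDs (first column) from GTF/GFF lines."""
    return {
        line.split("\t")[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def check_reference_ids(genome_ids, annotation_ids):
    """Raise if no IDs overlap; return warning strings for partial overlap."""
    genome_ids, annotation_ids = set(genome_ids), set(annotation_ids)
    if not genome_ids & annotation_ids:
        # No shared IDs at all: downstream analysis could not succeed.
        raise ValueError(
            "No reference IDs are shared between the genome and the annotation."
        )
    warnings = []
    only_genome = genome_ids - annotation_ids
    only_annotation = annotation_ids - genome_ids
    if only_genome:
        warnings.append(
            f"{len(only_genome)} ID(s) found only in the genome: "
            f"{sorted(only_genome)}"
        )
    if only_annotation:
        warnings.append(
            f"{len(only_annotation)} ID(s) found only in the annotation: "
            f"{sorted(only_annotation)}"
        )
    return warnings
```

A typical mismatch this kind of check would catch is a genome using `chr1` while the annotation uses `1`, which yields no overlap at all.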
- Reconciled workflow with wf-template v5.5.0.
- Sort the columns and rows of the gene and transcript count files.
- DE_analysis alignment summary stats table no longer includes MAPQ or quality scores. MAPQ is not relevant for transcript alignment and quality scores are already available in the read summary section of the report.
- `all_gene_counts.tsv` contained the DE counts results.
- Reduced memory usage of the report workflow process.
- Output BAM alignments in all cases unless the workflow is run with `transcriptome_source` set to `precomputed`.
- Corrected the demo command in the `README.md`.
- The merged transcriptome generated for differential expression analysis now only contains the exons and not the full genomic sequence.
- Output the gene name annotated differential expression analysis count files only.
- Only use full length reads in the differential expression analysis.
- `merge_gff_compare` failing with empty GFF files.
- v1.5.0 bug: access to an undefined channel output when using a precomputed transcriptome.
- Bug where incorrect gene_id assigned in the DE tables.
- Workflow report updated to use `ezcharts`.
- Exons per isoforms histogram reporting incorrect numbers.
- Output the `results_dexseq.tsv` file when `--de_analysis` enabled.
- per-class gffcompare tracking files as there exists a combined tracking file.
- `--igv` parameter (default: false) for outputting IGV config allowing visualisation of read alignments in the EPI2ME App.
- If required for IGV, reference indexes are output in to a `igv_reference` directory
- BAMS are output in to a BAMS directory.
- Reconcile with template 5.2.6.
- Fusion detection subworkflow, as the functionality is not robust enough for general use at this time.
- Updated pychopper to 2.7.10
- new `cdna_kit` options: PCS114 and PCB111/114
- Increase some memory and CPU allocations.
- Workflow now accepts BAM or FASTQ files as input (using the --bam or --fastq parameters, respectively).
- MA plot in the `results_dge.pdf` has been updated to match the MA plot in the report.
- Error message when running in `de_analysis` mode and `ref_annotation` input file contains unstranded annotations.
- Improved handling of different annotation file types (eg. `.gtf/.gff/.gff3`) in `de_analysis` mode.
- Improved handling of annotation files that do not contain version numbers in transcript_id (such as gtf's from Ensembl).
- Differential expression failing with 10 or more samples.
- Regression causing the DE analysis numeric parameters to not be evaluated correctly.
- Improve documentation around filtering of transcripts done before DTU analysis.
- Renamed files:
  - `de_analysis/all_counts_filtered.tsv` to `de_analysis/filtered_transcript_counts_with_genes.tsv`
  - `de_analysis/de_tpm_transcript_counts.tsv` to `de_analysis/unfiltered_tpm_transcript_counts.tsv`
- Minimum memory requirements to 32 GB.
- Published isoforms table to output directory.
- Output additional `de_analysis/cpm_gene_counts.tsv` with counts per million gene counts.
- Output additional `de_analysis/unfiltered_transcript_counts_with_genes.tsv` with unfiltered transcript counts with associated gene IDs.
- Add gene name column to the de_analysis counts TSV files.
- Mapping stage using a single thread only.
- More memory assigned to the fusion detection process.
- When no `--ref_annotation` is provided the workflow will still run but the output transcripts will not be annotated. However `--de_analysis` mode still requires a `--ref_annotation`.
- Published minimap2 and pychopper results to output directory.
- Two extra pychopper parameters `--cdna_kit` and `--pychopper_backend`. `--pychopper_options` is still available to define any other options.
- Memory requirements for each process.
- Documentation.
- When Jaffa is run only output one report.
- Sample sheet must include a `control` type to indicate which samples are the reference for the differential expression pipeline.
- Default local executor CPU and RAM limits.
- Updated docker container with Pychopper to support LSK114.
- Remove dead links from README
- Denovo `--transcriptome_source` option.
- Handling for input reference transcriptome headers that contain `|`
- Improve differential expression outputs.
- Include transcript and gene count tables in DE_final folder.
- If differential expression subworkflow is used a non redundant transcriptome will be output which includes novel transcripts.
- Added wording to the report about how to identify novel transcripts in the DE tables.
- Nextflow minimum required version to 23.04.2
- `--minimap_index_opts` parameter has been changed to `minimap2_index_opts` for consistency.
- An additional gene name column to the differential gene expression results. This is especially handy for transcriptomes where the gene ID is not the same as gene name (e.g. Ensembl).
- Wording to the report about how to identify novel transcripts in the DE tables.
- Any sample aliases that contain spaces will be replaced with underscores.
- Updated documentation to explain we only support Ensembl, NCBI and ENCODE annotation file types.
- Documentation parameter examples corrected.
- Handling for annotation files that use gene as gene_id attribute.
- Handling for Ensembl annotation files.
- GitHub issue templates
- Condition sheet is no longer required. The sample sheet is now used to indicate condition instead.
- For differential expression, the sample sheet must have a `condition` column to indicate which condition group each sample in the sample sheet belongs to.
- Values for the condition may be any two distinct strings, for example: treated/untreated; sample/control etc.
- Remove default of null for `--ref_transcriptome`.
- Read mapping summary table in the report has correct sample_ids.
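To make the sample sheet requirement above concrete, here is a small Python sketch that groups samples by a `condition` column and checks that exactly two distinct condition values are present. It is only a sketch under assumptions: the `barcode` and `alias` column names and the function name `validate_conditions` are illustrative and are not taken from the workflow's code.

```python
import csv
import io


def validate_conditions(sample_sheet_text):
    """Group sample aliases by condition and require exactly two groups.

    Hypothetical validator: assumes a CSV sample sheet with 'alias' and
    'condition' columns.
    """
    reader = csv.DictReader(io.StringIO(sample_sheet_text))
    columns = reader.fieldnames or []
    for required in ("alias", "condition"):
        if required not in columns:
            raise ValueError(f"Sample sheet needs a '{required}' column.")
    groups = {}
    for row in reader:
        groups.setdefault(row["condition"], []).append(row["alias"])
    if len(groups) != 2:
        raise ValueError(
            f"Expected two distinct conditions, found {len(groups)}: "
            f"{sorted(groups)}"
        )
    return groups


# Example sheet using the treated/untreated naming from the entry above.
example_sheet = (
    "barcode,alias,condition\n"
    "barcode01,sample1,treated\n"
    "barcode02,sample2,untreated\n"
    "barcode03,sample3,treated\n"
)
print(validate_conditions(example_sheet))
```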
- Handling for GFF3 reference_annotation file type.
- Warning for the `--transcriptome_source` denovo pipeline option.
- Enum choices are enumerated in the `--help` output
- Enum choices are enumerated as part of the error message when a user has selected an invalid choice
- Bumped minimum required Nextflow version to 22.10.8
- Replaced `--threads` option in fastqingress with hardcoded values to remove warning about undefined `param.threads`
- Fix for the `--transcriptome_source` denovo pipeline option.
- Handling for GFF3 reference_annotation file type.
- Handling gzip input reference and annotation parameters.
- Handling for NCBI gtfs that contain some empty transcript ID fields.
- LICENSE to Oxford Nanopore Technologies PLC. Public License Version 1.0.
- Configuration for running demo data in AWS
- Condition sheet parameter description fixed to CSV
- Update fastqingress
- Simplify JAFFAL docs
- Description in manifest
- `-profile conda` is no longer supported, users should use `-profile standard` (Docker) or `-profile singularity` instead
- `nextflow run epi2me-labs/wf-transcriptomes --version` will now print the workflow version number and exit
- Use parameter `--transcriptome-source` to define precalculated, reference-based or denovo
- Removed sanitize option
- Reduce size of differential expression data.
- Improved DE explanation in docs
- Option to turn off transcript assembly steps with param transcript_assembly
- Fix JAFFAL terminating workflow when no fusions found.
- Error if condition sheet and sample sheet don't match.
- Failed to plot DE graphs when one of data sets is 0 length.
- Differential transcript and gene expression subworkflow
- JAFFAL fusion detection subworkflow
- Args parser for fastqingress
- Set out_dir option type to ensure output is written to correct directory on Windows
- Skip unnecessary conversion to fasta from fastq
- Fastqingress metadata map
- Changed workflow name to wf-transcriptomes
- Better help text on cli
- Use EPI2ME Labs-maintained version of pychopper
- direct_rna option
- Some extra error handling
- Minor report display improvements
- Incorrect numbers of transcripts caused by merging gff files with the same gene and transcript ids
- Error handling in de novo pipeline. Skip clusters in build_backbones that cause an isONclust2 error
- Several small fixes in report plotting
- Added the denovo pipeline
- Updates to the report plots
- First release
- Initial port of Snakemake WF from https://github.com/nanoporetech/pipeline-nanopore-ref-isoforms