From 31ffe7a0bcf6c8d59512e91573c097ada94b7ba6 Mon Sep 17 00:00:00 2001 From: Susanna Kiwala Date: Thu, 4 Dec 2025 14:48:43 -0600 Subject: [PATCH 1/4] Second attempt at updating the course to pVACtools 6.0 --- 01-intro.Rmd | 6 +- 02-prerequisites.Rmd | 26 ++++- 03-running_pvactools.Rmd | 190 ++++++++++++++++++++++++++++++++--- 04-outputs.Rmd | 208 ++++++++++++++++++++++++++++++--------- 05-pvacview_tour.Rmd | 95 ++++++++++-------- 5 files changed, 421 insertions(+), 104 deletions(-) diff --git a/01-intro.Rmd b/01-intro.Rmd index b9eec92..9fa0f96 100644 --- a/01-intro.Rmd +++ b/01-intro.Rmd @@ -5,7 +5,7 @@ ottrpal::set_knitr_image_path() # Introduction -This course has been developed recently (Summer 2023). We welcome any feedback at help@pvactools.org or by submission of [GitHub issues](https://github.com/griffithlab/pVACtools_Intro_Course/issues). +This course was developed in Summer 2023 and updated in Fall 2025. We welcome any feedback at help@pvactools.org or by submission of [GitHub issues](https://github.com/griffithlab/pVACtools_Intro_Course/issues). ## Motivation @@ -19,12 +19,12 @@ allele expression, peptide binding affinities, and determination of whether a mu users to efficiently generate, review, and interpret results, selecting candidate peptides for individual experiments or patient vaccine designs. Additional modules support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. 
All -of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq, pVACfuse, and pVACbind), +of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq, pVACfuse, pVACsplice, and pVACbind), prioritization, and selection using a graphical Web-based interface (pVACview), and design of DNA vector–based vaccines (pVACvector) and synthetic long peptide vaccines. pVACtools is available at [http://www.pvactools.org](http://www.pvactools.org). ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "pVACtools is a cancer immunotherapy tools suite"} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g22b1533a196_0_0") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a37485c18b_1_0") ``` ## Background diff --git a/02-prerequisites.Rmd b/02-prerequisites.Rmd index a827e43..56c2c79 100644 --- a/02-prerequisites.Rmd +++ b/02-prerequisites.Rmd @@ -63,6 +63,9 @@ install.packages("RCurl", dependencies=TRUE) install.packages("curl", dependencies=TRUE) install.packages("string", dependencies=TRUE) install.packages("shinycssloaders", dependencies=TRUE) +install.packages("plotly", dependencies=TRUE) +install.packages("shinyWidgets", dependencies=TRUE) +install.packages("colourpicker", dependencies=TRUE) ``` ## Data @@ -97,6 +100,10 @@ For pVACfuse: - `star-fusion.fusion_predictions.tsv`: A STARFusion prediction file with fusion read support and expression information. 
+For pVACsplice: + +- `HCC1395.splice_junctions.tsv`: A RegTools junctions output TSV file + General: - `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use @@ -109,7 +116,24 @@ wget https://raw.githubusercontent.com/griffithlab/pVACtools_Intro_Course/main/H unzip HCC1395_inputs.zip ``` +Additionally, to run pVACsplice, a set of reference files are required that +are too large to include in the HCC1395_inputs.zip archive. To download these +manually please run the following commands: + +```{r, engine = 'bash', eval = FALSE} +cd HCC1395_inputs +wget https://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/Homo_sapiens.GRCh38.105.chr.gtf.gz +wget http://genomedata.org/pmbio-workshop/references/genome/all/ref_genome.tar +tar -xf ref_genome.tar ref_genome.fa.gz ref_genome.fa.fai +gunzip ref_genome.fa.gz +rm -rf ref_genome.tar +``` + +This will add the following reference files + +- `ref_genome.fa` and `.fai`: A reference DNA FASTA file and index +- `Homo_sapiens.GRCh38.105.chr.gtf.gz`: A reference GTF file + This course will not cover the required pre-processing steps for the pVACtools input data but extensive instructions on how to prepare your own data for use with pVACtools can be found at [pvactools.org](http://www.pvactools.org). - diff --git a/03-running_pvactools.Rmd b/03-running_pvactools.Rmd index 190feb8..b4c148f 100644 --- a/03-running_pvactools.Rmd +++ b/03-running_pvactools.Rmd @@ -23,11 +23,11 @@ mkdir pVACtools_outputs docker run \ -v ${PWD}/HCC1395_inputs:/HCC1395_inputs \ -v ${PWD}/pVACtools_outputs:/pVACtools_outputs \ --it griffithlab/pvactools:4.0.0 \ +-it griffithlab/pvactools:6.0.3 \ /bin/bash ``` -This will pull the 4.0.0 version of the griffithlab/pvactools Docker image and +This will pull the 6.0.3 version of the griffithlab/pvactools Docker image and start an interactive session (`-it`) of that Docker image using the bash shell (`/bin/bash`). 
The `-v ${PWD}/HCC1395_inputs:/HCC1395_inputs` part of the command will mount the @@ -65,12 +65,12 @@ following order: alleles can be specified using a comma-separated list. These should be the HLA alleles of your patient/sample. You might have clinical typing information for your patient. If not, you will need to computationally predict the patient's - HLA type using software such as OptiType. The the HLA allele names should - be in the following format: `HLA-A*02:01`. + HLA type using software such as OptiType. The HLA allele names should + be in the following format: `HLA-A*02:01`. - `prediction_algorithms`: The epitope prediction algorithms to use. Multiple prediction algorithms can be specified, separated by spaces. Use `all` to run all available prediction algorithms. pVACseq will automatically determine - which algorithms are valid for each HLA allele. + which algorithms are valid for each HLA allele. - `output_dir`: The directory for writing all result files. ### Optional Parameters for pVACseq @@ -123,8 +123,14 @@ your run. Here are a list of parameters we generally recommend: - `--percentile-threshold`: When considering the peptide-MHC binding affinity for filtering and prioritizing neoantigen candidates, by default only the IC50 value is being used. Setting this parameter will additionally also filter - on the predicted percentile. We recommend a value of 0.01 (1%) for this + on the predicted percentile. We recommend a value of 2 (2%) for this threshold. +- `--percentile-threshold-strategy`: When running pVACseq with a + `--percentile-threshold` set, this parameter will influence how both the + IC50 cutoff and the percentile cutoff are applied. The default, + `conservative`, will require a candidate to pass both the binding and the + percentile threshold, while the `exploratory` option will require a candidate + to only pass either the binding or the percentile threshold. 
Additionally there are a number of parameters that might be useful depending on your specific analysis needs: @@ -141,6 +147,18 @@ on your specific analysis needs: unstable. This parameter allows users to set their own rules as to which peptides are considered problematic and peptides meeting those rules will be marked in the pVACseq results and deprioritized. +- `--transcript-prioritization-strategy` and + `--maximum-transcript-support-level`: Generally, multiple transcripts of a + gene may code for a neoantigen candidate. When picking the best transcript + coding for a candidate, the transcript prioritization strategy controls + which factors to consider. By default the MANE Select status, the canonical + status, and the Transcript Support Level (TSL) are all considered and any + transcript meeting at least one of the specified factors will be considered + as the best transcript. However, a more stringent approach might be desired, + in which case the strategy could be adjusted to, for example, only consider + the MANE Select status or the canonical status of a transcript. The maximum + transcript support level parameter controls the TSL cutoff when considering + TSL as a factor. - `--threads`: This argument will allow pVACseq to run in multi-processing mode. - `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACseq. @@ -191,7 +209,7 @@ all \ --iedb-install-directory /opt/iedb \ --pass-only \ --allele-specific-binding-thresholds \ ---percentile-threshold 0.01 \ +--percentile-threshold 2 \ --allele-specific-anchors \ --run-reference-proteome-similarity \ --peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \ @@ -203,7 +221,7 @@ all \ ## Running pVACfuse -pVACfuse is run to in order to predict neoantigens from fusion events. The +pVACfuse is run in order to predict neoantigens from fusion events. The pipeline uses annotated fusion calls from either AGFusion or Arriba for this purpose. 
These annotators already include the fusion peptide sequence in their outputs which pVACfuse uses to extract neoantigens around the fusion position. @@ -231,7 +249,7 @@ following order: ### Optional Parameters for pVACfuse -In addition to the required parameters, the `pvacseq run` command also offers +In addition to the required parameters, the `pvacfuse run` command also offers optional arguments to fine-tune your run. You will find a lot of overlap between pVACfuse and pVACseq parameters and the same general considerations usually apply. Here are a list of parameters we generally recommend: @@ -259,8 +277,14 @@ usually apply. Here are a list of parameters we generally recommend: - `--percentile-threshold`: When considering the peptide-MHC binding affinity for filtering and prioritizing neoantigen candidates, by default only the IC50 value is being used. Setting this parameter will additionally also filter - on the predicted percentile. We recommend a value of 0.01 (1%) for this + on the predicted percentile. We recommend a value of 2 (2%) for this threshold. +- `--percentile-threshold-strategy`: When running pVACfuse with a + `--percentile-threshold` set, this parameter will influence how both the + IC50 cutoff and the percentile cutoff are applied. The default, + `conservative`, will require a candidate to pass both the binding and the + percentile threshold, while the `exploratory` option will require a candidate + to only pass either the binding or the percentile threshold. Additionally there are a number of parameters that might be useful depending on your specific analysis needs: @@ -273,7 +297,7 @@ on your specific analysis needs: Cysteine is commonly considered problematic as it makes the peptide unstable. This parameter allows users to set their own rules as to which peptides are considered problematic and peptides meeting those rules will be marked in the - pVACseq results and deprioritized. + pVACfuse results and deprioritized. 
- `--threads`: This argument will allow pVACfuse to run in multi-processing mode. - `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACfuse. @@ -294,7 +318,7 @@ identified DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the patient's class II a For pVACfuse the sample name is not used for any parsing so it doesn't need to match any specific information in the AGFusion results. It is only used for naming result files. For consistency we will use the same `HCC1395_TUMOR_DNA` -sample name we used in pVACfuse. +sample name we used in pVACseq. For our test run, please execute the `pvacfuse run` command below. The prediction run might take a while but pVACfuse will output progress messages as @@ -309,7 +333,147 @@ all \ /pVACtools_outputs/pvacfuse_predictions \ --iedb-install-directory /opt/iedb \ --allele-specific-binding-thresholds \ ---percentile-threshold 0.01 \ +--percentile-threshold 2 \ +--run-reference-proteome-similarity \ +--peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \ +--problematic-amino-acids C \ +--downstream-sequence-length 100 \ +--n-threads 8 \ +--keep-tmp-files +``` + +## Running pVACsplice + +pVACsplice is run in order to predict neoantigens from tumor-specific alternative +splicing patterns. The pipeline uses splice site variants predicted by RegTools +for this purpose. The RegTools output is used by pVACsplice in combination with a +GTF file to construct peptide sequences for the alternative splicing patterns and +extract neoantigens around the splice site. + +The pVACsplice pipeline is run using the `pvacsplice run` command. + +### Required Parameters for pVACsplice + +The `pvacsplice run` command takes a number of required parameters in the +following order: + +- `input_file`: A RegTools junctions output TSV file. +- `sample_name`: The name of the tumor sample being processed. +- `allele(s)`: The name of the HLA allele to use for epitope prediction. 
Multiple + alleles can be specified using a comma-separated list. These should be the + HLA alleles of your patient. You might have clinical typing information for + your patient. If not, you will need to computationally predict the patient's + HLA type using software such as OptiType. +- `prediction_algorithms`: The epitope prediction algorithms to use. Multiple + prediction algorithms can be specified, separated by spaces. Use `all` to + run all available prediction algorithms. +- `output_dir`: The directory for writing all result files. +- `annotated_vcf`: A VEP-annotated single- or multi-sample VCF containing + genotype and transcript information. This is generally the same input VCF + used for pVACseq. +- `ref_fasta`: A reference DNA FASTA file. +- `gtf_file`: A reference GTF file. + +### Optional Parameters for pVACsplice + +In addition to the required parameters, the `pvacsplice run` command also offers +optional arguments to fine-tune your run. You will find a lot of overlap +between pVACsplice, pVACfuse, and pVACseq parameters, and the same general considerations +usually apply. Here is a list of parameters we generally recommend: + +- `--iedb-install-directory`: For speed and reliability, we generally recommend + that users use a standalone installation of the IEDB software. The pVACtools + Docker containers already come with this software pre-installed in the + `/opt/iedb` directory. +- `--allele-specific-binding-thresholds`: When filtering and tiering + neoantigen candidates, one main criterion is the predicted peptide-MHC + binding affinity. By default, pVACsplice uses a cutoff of <500 nmol IC50. + However, for some HLA alleles, other cutoffs are more appropriate depending + on the distribution of binding affinities across peptides. Setting + this flag enables allele-specific binding cutoffs as recommended by + [IEDB](https://help.iedb.org/hc/en-us/articles/114094152371-What-thresholds-cut-offs-should-I-use-for-MHC-class-I-and-II-binding-predictions). 
+- `--run-reference-proteome-similarity`: One consideration when selecting + neoantigen candidates is that the neoantigen should not occur natively in + the patient's proteome. When this flag is set, pVACsplice will search for each + neoantigen candidate in the reference proteome and report any hits found. + By default this is done using BLASTp, but we recommend using a proteome FASTA + file via the `--peptide-fasta` parameter to speed up this step. +- `--percentile-threshold`: When considering the peptide-MHC binding affinity + for filtering and prioritizing neoantigen candidates, by default only the + IC50 value is being used. Setting this parameter will additionally filter + on the predicted percentile. We recommend a value of 2 (2%) for this + threshold. +- `--percentile-threshold-strategy`: When running pVACsplice with a + `--percentile-threshold` set, this parameter will influence how both the + IC50 cutoff and the percentile cutoff are applied. The default, + `conservative`, will require a candidate to pass both the binding and the + percentile threshold, while the `exploratory` option will require a candidate + to only pass either the binding or the percentile threshold. + +Additionally there are a number of parameters that might be useful depending +on your specific analysis needs: + +- `--class-i-epitope-length` and `--class-ii-epitope-length`: By default 8, + 9, 10, 11 and 12, 13, 14, 15, 16, 17, 18 are set for these parameters, + respectively, but different lengths might be desired. +- `--problematic-amino-acids`: Some vaccine manufacturers will consider certain amino + acids in the neoantigen candidates difficult to manufacture. For example, a + Cysteine is commonly considered problematic as it makes the peptide + unstable. This parameter allows users to set their own rules as to which + peptides are considered problematic and peptides meeting those rules will be marked in the + pVACsplice results and deprioritized. 
+- `--transcript-prioritization-strategy` and + `--maximum-transcript-support-level`: Generally, multiple transcripts of a + gene may code for a neoantigen candidate. When picking the best transcript + coding for a candidate, the transcript prioritization strategy controls + which factors to consider. By default the MANE Select status, the canonical + status, and the Transcript Support Level (TSL) are all considered and any + transcript meeting at least one of the specified factors will be considered + as the best transcript. However, a more stringent approach might be desired, + in which case the strategy could be adjusted to, for example, only consider + the MANE Select status or the canonical status of a transcript. The maximum + transcript support level parameter controls the TSL cutoff when considering + TSL as a factor. +- `--threads`: This argument will allow pVACsplice to run in multi-processing + mode. +- `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACsplice. +- `--downstream-sequence-length`: For frameshifts, the downstream + sequence can potentially be very long, which can be computationally + expensive. This parameter limits how many amino acids of the downstream + sequence are included in the prediction. We often set a limit of `100`. + +### pVACsplice Command + +Given the considerations outlined above, let's run pVACsplice on our sample data. + +As with pVACseq and pVACfuse, we can use the `optitype_normal_result.tsv` file to identify the patient's +class I HLA alleles. These are HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02. +We also have clinical typing information that confirms these class I alleles as well as +identified DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles. + +As with pVACseq, the sample name needs to match the tumor sample ID in the input +VCF #CHROM header. 
Because the input VCF used in pVACsplice is the same as the +one used in pVACseq, we will use the same `HCC1395_TUMOR_DNA` sample name. + +For our test run, please execute the `pvacsplice run` command below. The +prediction run might take a while but pVACsplice will output progress messages as +it runs through the pipeline. + + +```{r, engine = 'bash', eval = FALSE} +pvacsplice run \ +/HCC1395_inputs/HCC1395.splice_junctions.tsv \ +HCC1395_TUMOR_DNA \ +HLA-A*29:02,HLA-B*45:01,HLA-B*82:02,HLA-C*06:02,DQA1*03:03,DQB1*03:02,DRB1*04:05 \ +all \ +/pVACtools_outputs/pvacsplice_predictions \ +/HCC1395_inputs/annotated.expression.vcf.gz \ +/HCC1395_inputs/ref_genome.fa \ +/HCC1395_inputs/Homo_sapiens.GRCh38.105.chr.gtf.gz \ +--normal-sample-name HCC1395_NORMAL_DNA \ +--iedb-install-directory /opt/iedb \ +--allele-specific-binding-thresholds \ +--percentile-threshold 2 \ --run-reference-proteome-similarity \ --peptide-fasta /HCC1395_inputs/Homo_sapiens.GRCh38.pep.all.fa.gz \ --problematic-amino-acids C \ diff --git a/04-outputs.Rmd b/04-outputs.Rmd index e68c9ea..5788074 100644 --- a/04-outputs.Rmd +++ b/04-outputs.Rmd @@ -14,7 +14,7 @@ This chapter will cover: ## pVACtools Output Files -Both pVACseq and pVACfuse produce three main output files: +pVACseq, pVACfuse, and pVACsplice all produce three main output files: - The `all_epitopes.tsv` file is a TSV file with all predicted neoantigens and all information obtained during the run. @@ -23,13 +23,13 @@ Both pVACseq and pVACfuse produce three main output files: the user during the run. The filters will be further explained in subsequent sections. - The `aggregated.tsv` is a condensed output file that contains only the - information most pertinent to interpret the results. It has contains only + information most pertinent to interpret the results. It contains only the best neoantigen candidate for each variant. Our heuristic for determining the best neoantigen is described in subsequent sections of this course. 
-There are also a number of a secondary output files produced by pVACseq and -pVACfuse. The most important are: +There are also a number of secondary output files produced by pVACseq, +pVACfuse, and pVACsplice. The most important are: - `aggregated.metrics.json`: The file is only produced by pVACseq. It contains metadata needed for visualizing your results in pVACview. @@ -54,9 +54,9 @@ methods on the calls for each neoantigen candidate and HLA allele combination: (1) pVACtools calculates the median IC50 binding affinity for all selected prediction algorithms (reported in the `Median [MT] IC50 Score` column), and (2) pVACtools selects the IC50 binding affinity prediction with the lowest value (reported in the -`Best [MT] IC50 Score)` column. By default, +`Best [MT] IC50 Score` column). By default, the binding filter is applied to the median IC50 score unless -users set the `--top-score-metric` parameters to `lowest`. +users set the `--top-score-metric` parameter to `lowest`. The binding filter discards candidates where the binding affinity is above the `--binding-threshold` (default: 500). However, users may set the @@ -85,6 +85,12 @@ enabled additional filtering on related metrics: percentile scores for the range of scores reported by the prediction algorithms chosen by the user and which on is used for filtering is again controlled by the `--top-score-metric` parameter. +- `--percentile-threshold-strategy`: When running pVACtools with a + `--percentile-threshold` set, this parameter will influence how both the + binding cutoff and the percentile cutoff are applied. The default, + `conservative`, will require a candidate to pass both the binding and the + percentile threshold while the `exploratory` option will require a candidate + to only pass either the binding or the percentile threshold. ### Coverage Filter @@ -93,19 +99,19 @@ enough read support or expression. 
This ensures that the remaining variants are not just artifacts and that the genes are actually expressed in the patient's RNA. -For pVACseq, this generally relies on your VCF being annotated with coverage +For both pVACseq and pVACsplice, this generally relies on your VCF being annotated with coverage and expression data. In our example, the VCF has already been annotated with this data. For more information about how to add [coverage](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html) and [expression data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html) to your own VCFs, please see our docs. Additionally, filtering on the normal DNA depth and variant allele frequency (VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample -to be identifies in your pVACseq run using the `--normal-sample-name` +to be identified in your pVACseq/pVACsplice run using the `--normal-sample-name` parameter. If a coverage metric doesn't apply because the underlying data is not available, `NA` is reported by pVACtools. By default, the filter will skip evaluating a coverage criteria when a neoantigen's value for it is `NA`. -The following thresholds are applied in pVACseq by this filter: +The following thresholds are applied in pVACseq and pVACsplice by this filter: - `--normal-cov`: Normal coverage cutoff. Minimum number of required reads in the normal DNA (default: 5). - `--tdna-cov`: Tumor DNA coverage cutoff. Minimum number of required reads in the tumor DNA (default: 10). @@ -119,7 +125,9 @@ For pVACfuse, this filter evaluates a fusion variant's fusion read support and f Arriba natively outputs a number of read metrics. These are the number of supporting split fragments with an anchor in gene1 or gene2, respectively, as well as the number of pairs (fragments) of discordant mates supporting the fusion (a.k.a. spanning reads or bridge reads). 
The sum of these three values is -reported as Read Support in pVACfuse. The fusion transcript expression is +reported as Read Support in pVACfuse. For AGFusion, the read support is parsed +from the `--starfusion-file` as the sum of the JunctionReadCount and +SpanningFragCount. For both types of input, the fusion transcript expression is parsed from the `--starfusion-file`, when provided. This is reported as FFPM (fusion fragments per million total reads). @@ -128,18 +136,24 @@ The following thresholds are applied in pVACfuse by this filter: - `--read-support`: Read Support cutoff. Sites above this cutoff will be considered (default: 5). - `--expn-val`: Expression cutoff. Sites above this cutoff will be considered (default: 0.1). -### Transcript Support Level Filter +### Transcript Filter -The Transcript Support Level (TSL) Filter removes neoantigen candidates for -transcripts with a high TSL, as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl). -The cutoff for this filter is set by the `--maximum-transcript-support-level` -parameter. Transcripts with a TSL of NA will always be filtered out. +The Transcript Filter removes neoantigens resulting from transcripts that are +considered poor candidates. To determine whether a transcript is poor, the +`--transcript-prioritization-strategy` parameter is used. This parameter +defines a list of criteria to consider. These are: -Annotation with TSL values through VEP is only available for GRCh38. For other -species and older builds, a value of "Not Supported" is written to the report -and the TSL filter will skip those variants. +- `mane_select`: MANE Select status of the transcript +- `canonical`: Canonical status of the transcripts +- `tsl`: Whether or not the Transcript Support Level (TSL) meets the + `--maximum-transcript-support-level`, as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl). 
-This filter is currently only run by pVACseq. +Users may specify one or more of these criteria as their strategy. A transcript +meeting at least one of the specified criteria is considered a +good transcript and passes the transcript filter. Only transcripts meeting none of the +specified criteria will be filtered out. + +This filter is currently only run by pVACseq and pVACsplice. ### Top Score Filter @@ -151,14 +165,18 @@ variant we first group the transcripts into sets where all transcripts in a set code for the same set of neoantigen candidates. For each transcript set we then determine the best neoantigen candidate as follows: -- Pick all neoantigens with a variant transcript that have a protein_coding Biotype -- Of the remaining candidates, pick the ones with a variant transcript having a - TSL less then the `--maximum-transcript-support-level`. +- Pick all neoantigens with a variant transcript that doesn't have a Transcript CDS Flag +- Of the remaining candidates, pick the ones with a variant transcript that have a protein_coding Biotype +- Of the remaining candidates, pick the ones with a passing transcript according + to the selected `--transcript-prioritization-strategy` and `--maximum-transcript-support-level`. - Of the remaining candidates, pick the entries with no Problematic Positions. - Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in more detail further below). -- Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best - depending on the `--top-score-metric`), lowest TSL, and longest transcript. +- Sort the remaining candidates on the lowest MT IC50 Score or Percentile (Median or Best + depending on the `--top-score-metric`, Score or Percentile depending on the + `--top-score-metric2`), the transcript's MANE Select Status, the + transcript's Canonical status, the transcript's TSL, the transcript length, + and the transcript expression. Select the highest sorted entry. 
This filter then reports the best neoantigen candidate for each transcript set. @@ -169,6 +187,23 @@ for each transcript set is determined by picking the candidate with the lowest MT IC50 Score (Median or Best depending on the `--top-score-metric`) and the highest fusion transcript expression. +For pVACsplice, the neoantigen candidates are grouped into sets with the same splice +site Junction. From there, the best neoantigen candidate for each set is +determined very similarly to pVACseq: + +- Pick all neoantigens with a variant transcript that doesn't have a Transcript CDS Flag +- Of the remaining candidates, pick the ones with a variant transcript that have a protein_coding Biotype +- Of the remaining candidates, pick the ones with a passing transcript according + to the selected `--transcript-prioritization-strategy` and `--maximum-transcript-support-level`. +- Of the remaining candidates, pick the entries with no Problematic Positions. +- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in + more detail further below). +- Sort the remaining candidates on the lowest MT IC50 Score or Percentile (Median or Best + depending on the `--top-score-metric`, Score or Percentile depending on the + `--top-score-metric2`), the transcript's MANE Select Status, the + transcript's Canonical status, the transcript's TSL, the transcript WT + Protein Length, and the transcript expression. Select the highest sorted entry. + ## Interpreting the aggregated.tsv File The `aggregated.tsv` is a condensed output file that shows the best neoantigen @@ -179,23 +214,46 @@ a tier based on its suitability for vaccine manufacturing. Only epitopes meeting the `--aggregate-inclusion-threshold` are included in this report (default: 5000). Depending on the value used for the `--top-score-metric`, all neoantigen candidates with a Median or Best MT IC50 Score below the selected `--aggregate-inclusion-threshold` -are included in creating this report. 
+are included in creating this report. If this cutoff yields a lot of candidates, the +included neoantigens are pared to the best `--aggregate-inclusion-count-limit` candidates. ### Determining the Best Transcript and Best Peptide of a Variant In pVACseq, for each variant, all neoantigen candidates meeting the `--aggregate-inclusion-threshold` are evaluated as follows: -- Pick all entries with a variant transcript that have a protein_coding Biotype. -- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`. -- Of the remaining entries, pick the entries with no Problematic Positions. -- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below). -- Of the remaining entries, pick the one with the lowest MT IC50 score( Median or Best - depending on the `--top-score-metric`), lowest Transcript Support Level, and longest transcript. - -In pVACfuse, the neoantigen candidate with the lowest IC50 binding affinity for each variant is selected. +- Pick all neoantigens with a variant transcript that doesn't have a Transcript CDS Flag +- Of the remaining candidates, pick the ones with a variant transcript that have a protein_coding Biotype +- Of the remaining candidates, pick the ones with a passing transcript according + to the selected `--transcript-prioritization-strategy` and `--maximum-transcript-support-level`. +- Of the remaining candidates, pick the entries with no Problematic Positions. +- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in + more detail further below). 
+- Sort the remaining candidates on the lowest MT IC50 Score or Percentile (Median or Best + depending on the `--top-score-metric`, Score or Percentile depending on the + `--top-score-metric2`), the transcript's MANE Select Status, the + transcript's Canonical status, the transcript's TSL, the transcript length, + and the transcript expression. Select the highest sorted entry. + +In pVACfuse, the neoantigen candidate with the lowest IC50 binding affinity for each variant +and highest transcript expression is selected. The value used for the `--top-score-metric` determines whether the lowest or median binding affinity is used for this comparison. +For pVACsplice, the best neoantigen candidate for each variant is determined very similarly to pVACseq: + +- Pick all neoantigens with a variant transcript that doesn't have a Transcript CDS Flag +- Of the remaining candidates, pick the ones with a variant transcript that have a protein_coding Biotype +- Of the remaining candidates, pick the ones with a passing transcript according + to the selected `--transcript-prioritization-strategy` and `--maximum-transcript-support-level`. +- Of the remaining candidates, pick the entries with no Problematic Positions. +- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in + more detail further below). +- Sort the remaining candidates on the lowest MT IC50 Score or Percentile (Median or Best + depending on the `--top-score-metric`, Score or Percentile depending on the + `--top-score-metric2`), the transcript's MANE Select Status, the + transcript's Canonical status, the transcript's TSL, the transcript WT + Protein Length, and the transcript expression. Select the highest sorted entry. + The chosen entry determines the best neoantigen candidate and the best transcript coding for it. 
@@ -212,12 +270,16 @@ The Tiers available in pVACseq are: tabl <- " | Tier | Criteria | |------|----------| -| Pass | Best Peptide passes the binding, expression, tsl, clonal, and anchor criteria | -| Anchor | Best Peptide fails the anchor criteria but passes the binding, expression, tsl, and clonal criteria | -| Subclonal | Best Peptide fails the clonal criteria but passes the binding, tsl, and anchor criteria | -| LowExpr | Best Peptide meets the Low Expression Criteria and passes the binding, tsl, clonal, and anchor criteria | +| Pass | Best Peptide passes the binding, reference match, expression, transcript, clonal, problematic position, and anchor criteria | +| PoorBinder | Best Peptide fails the binding criteria but passes the reference match, expression, transcript, clonal, problematic position, and anchor criteria | +| RefMatch | Best Peptide fails the reference match criteria but passes the binding, expression, transcript, clonal, problematic position, and anchor criteria | +| PoorTranscript | Best Peptide fails the transcript criteria but passes the binding, reference match, expression, clonal, problematic position, and anchor criteria | +| LowExpr | Best Peptide meets the low expression criteria and passes the binding, reference match, transcript, clonal, problematic position, and anchor criteria | +| Anchor | Best Peptide fails the anchor criteria but passes the binding, reference match, expression, transcript, clonal, and problematic position criteria | +| Subclonal | Best Peptide fails the clonal criteria but passes the binding, reference match, expression, transcript, problematic position, and anchor criteria | +| ProbPos | Best Peptide fails the problematic position criteria but passes the binding, reference match, expression, transcript, clonal, and anchor criteria | +| Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria | | NoExpr | Best Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) | -| 
Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria | " cat(tabl) ``` @@ -228,16 +290,19 @@ cat(tabl) tabl <- " | Criteria | Description | Evaluation | |----------|-------------|------------| -| Binding Criteria | Pass if Best Peptide is a strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. | +| Binding Criteria | Pass if Best Peptide is a strong binder | binding score criteria: `IC50 MT < --binding-threshold` (`--allele-specific-binding-thresholds` flag is respected)
percentile score criteria (if the `--percentile-threshold` parameter is set): `%ile MT < --percentile-threshold`
`conservative` `--percentile-threshold-strategy`: needs to pass BOTH the binding score criteria AND the percentile score criteria
`exploratory` `--percentile-threshold-strategy`: needs to pass EITHER the binding score criteria OR the percentile score criteria| | Expression Criteria | Pass if Best Transcript is expressed | Allele Expr > `--trna-vaf` * `--expn-val` | +| Reference Match Criteria | Pass if there are no reference proteome matches | `Ref Match == False` | +| Transcript Criteria | Pass if Best Transcript matches any of the user-specified `--transcript-prioritization-strategy` criteria | `TSL <= --maximum-transcript-support-level` (if strategy includes `tsl`)
`MANE Select == True` (if strategy includes `mane_select`)
`Canonical == True` (if strategy includes `canonical`) | | Low Expression Criteria | Peptide has low expression or no expression but RNA VAF and coverage | (0 < Allele Expr < `--trna-vaf` * `--expn-val`) OR (RNA Expr == 0 AND RNA Depth > `--trna-cov` AND RNA VAF > `--trna-vaf`) | -| TSL Criteria | Pass if Best Transcript has good transcript support level | TSL <= `--maximum-transcript-support-level` | -| Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > `--tumor-purity` / 4 | | Anchor Criteria | Fail if all mutated amino acids of the Best Peptide (Pos) are at an anchor position and the WT peptide has good binding (IC50 WT < `--binding-threshold`). `--allele-specific-binding-thresholds` flag is respected. | +| Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > `--tumor-purity` / 4 | +| Problematic Position Criteria | Best Peptide does not contain a problematic amino acid as defined by the `--problematic-amino-acids` parameter | `Prob Pos == None` " cat(tabl) ``` + #### Tiering in pVACfuse The Tiers available in pVACfuse are: @@ -246,23 +311,70 @@ The Tiers available in pVACfuse are: tabl <- " | Tier | Criteria | |------|---------| -| Pass | Best Peptide passes the binding, read support, and expression criteria | -| LowReadSupport | Best Peptide fails the read support criteria but passes the binding and expression criteria | -| LowExpr | Best Peptide fails the expression criteria but passes the binding and read support criteria | -| Poor | Best Peptide doesn’t fit any of the above tiers, usually if it fails two or more criteria or if it fails the binding criteria | +| Pass | Best Peptide passes the binding, reference match, read support, expression, and problematic position criteria | +| PoorBinder | Best Peptide fails the binding criteria but passes the reference match, read support, expression, and problematic position criteria | +| RefMatch | Best Peptide fails the reference match criteria 
but passes the binding, read support, expression, and problematic position criteria | +| LowReadSupport | Best Peptide fails the read support criteria but passes the binding, reference match, expression, and problematic position criteria | +| LowExpr | Best Peptide fails the expression criteria but passes the binding, reference match, read support, and problematic position criteria | +| ProbPos | Best Peptide fails the problematic position criteria but passes the binding, reference match, read support, and expression | +| Poor | Best Peptide doesn’t fit any of the above tiers, usually if it fails two or more criteria | " cat(tabl) ``` + **Criteria Details** ```{r pvacfuse_tier_criteria, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'} tabl <- " | Criteria | Description | Evaluation | |----------|-------------|------------| -| Binding Criteria | Pass if Best Peptide is strong binder | IC50 MT < `--binding-threshold` and %ile MT < `--percentile-threshold` (if parameter is set). `--allele-specific-binding-thresholds` flag is respected. | -| Read Support Criteria | Pass if the variant has read support | Read Support < `--read-support` | -| Expression Criteria | Pass if Best Transcript is expressed | Expr < `--expn-val` | +| Binding Criteria | Pass if Best Peptide is a strong binder | binding score criteria: `IC50 MT < --binding-threshold` (`--allele-specific-binding-thresholds` flag is respected)
percentile score criteria (if the `--percentile-threshold` parameter is set): `%ile MT < --percentile-threshold`
`conservative` `--percentile-threshold-strategy`: needs to pass BOTH the binding score criteria AND the percentile score criteria
`exploratory` `--percentile-threshold-strategy`: needs to pass EITHER the binding score criteria OR the percentile score criteria| +| Read Support Criteria | Pass if variant has read support | Read Support < `--read-support` | +| Expression Criteria | Pass if Best Transcript is expressed | Expr > `--expn-val` | +| Reference Match Criteria | Pass if there are no reference proteome matches | `Ref Match == False` | +| Problematic Position Criteria | Best Peptide does not contain a problematic amino acid as defined by the `--problematic-amino-acids` parameter | `Prob Pos == None` | +" +cat(tabl) +``` + + +#### Tiering in pVACsplice + +The Tiers available in pVACsplice are: + +```{r pvacsplice_tiers, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'} +tabl <- " + +| Tier | Criteria | +|------|----------| +| Pass | Best Peptide passes the binding, reference match, expression, transcript, clonal, and problematic position criteria | +| PoorBinder | Best Peptide fails the binding criteria but passes the reference match, expression, transcript, clonal, and problematic position criteria | +| RefMatch | Best Peptide fails the reference match criteria but passes the binding, expression, transcript, clonal, and problematic position criteria | +| PoorTranscript | Best Peptide fails the transcript criteria but passes the binding, reference match, expression, clonal, and problematic position criteria | +| LowExpr | Best Peptide meets the low expression criteria and passes the binding, reference match, transcript, clonal, and problematic position criteria | +| Subclonal | Best Peptide fails the clonal criteria but passes the binding, reference match, expression, transcript, and problematic position criteria | +| ProbPos | Best Peptide fails the problematic position criteria but passes the binding, reference match, expression, transcript, and clonal criteria | +| Poor | Best Peptide doesn’t fit in any of the above tiers, usually if it fails two or more criteria | +| NoExpr | Best 
Peptide is not expressed (RNA Expr == 0 or RNA VAF == 0) | +" +cat(tabl) +``` + + +**Criteria Details** + +```{r pvacfuse_tier_criteria, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'} +tabl <- " +| Criteria | Description | Evaluation | +|----------|-------------|------------| +| Binding Criteria | Pass if Best Peptide is a strong binder | binding score criteria: `IC50 MT < --binding-threshold` (`--allele-specific-binding-thresholds` flag is respected)
percentile score criteria (if the `--percentile-threshold` parameter is set): `%ile MT < --percentile-threshold`
`conservative` `--percentile-threshold-strategy`: needs to pass BOTH the binding score criteria AND the percentile score criteria
`exploratory` `--percentile-threshold-strategy`: needs to pass EITHER the binding score criteria OR the percentile score criteria| +| Expression Criteria | Pass if Best Transcript is expressed | Allele Expr > `--trna-vaf` * `--expn-val` | +| Reference Match Criteria | Pass if there are no reference proteome matches | `Ref Match == False` | +| Transcript Criteria | Pass if Best Transcript matches any of the user-specified `--transcript-prioritization-strategy` criteria | `TSL <= --maximum-transcript-support-level` (if strategy includes `tsl`)
`MANE Select == True` (if strategy includes `mane_select`)
`Canonical == True` (if strategy includes `canonical`) | +| Low Expression Criteria | Peptide has low expression or no expression but RNA VAF and coverage | (0 < Allele Expr < `--trna-vaf` * `--expn-val`) OR (RNA Expr == 0 AND RNA Depth > `--trna-cov` AND RNA VAF > `--trna-vaf`) | +| Clonal Criteria | Best Peptide is likely in the founding clone of the tumor | DNA VAF > `--tumor-purity` / 4 | +| Problematic Position Criteria | Best Peptide does not contain a problematic amino acid as defined by the `--problematic-amino-acids` parameter | `Prob Pos == None` " cat(tabl) ``` diff --git a/05-pvacview_tour.Rmd b/05-pvacview_tour.Rmd index 6b56724..c816193 100644 --- a/05-pvacview_tour.Rmd +++ b/05-pvacview_tour.Rmd @@ -80,17 +80,24 @@ The main table in the Aggregate Report of Best Candidates by Variant panel shows the best neoantigen candidate for each variant. It lists the gene and amino acid change of the variant as well as additional information about the best peptide and the best transcript coding for it. These -include, from left to right, the transcript support level, the best-binding HLA +include, from left to right, the transcript's MANE Select status, the transcript's +canonical status, the transcript support level, the best-binding HLA allele, the mutated positions of the best peptide, any positions in the peptide -where the amino acid might be problematic for manufacturing, and the total number +where the amino acid might be problematic for manufacturing, the total number of +neoantigen candidates resulting from this variant that are included with +additional detailed metadata, and the total number of neoantigen candidates passing the binding affinity threshold set by the user. If a gene of interest list was uploaded, variants on those genes have their gene highlighted with a green border. -Next, this table lists the IC50 peptide MHC binding affinity for both the mutant -and the wild type. It also shows the percentile scores of the binding affinity values. 
-For the mutant values, a heatmap coloring is applied to make it easier to visually -identify well-binding peptides. +Next, this table lists the IC50 peptide MHC binding affinity for both the Best Peptide +(MT) and the matched wild type (WT). These values are either a median of all of the +binding predictions made by the predictiors selected in your pVACseq run, or +the lowest binding prediction made. This depends on the value set for the +`--top-score-metric` in your run. The table also shows the median/lowest percentile +scores of all predictors that provide this value, again depending on the +`--top-score-metric`. For the mutant values, a heatmap coloring is applied to make it +easier to visually identify well-binding peptides. The next few columns show the coverage and expression of the best transcript with a bar plot background to represent where specific values fall across the entire @@ -105,21 +112,21 @@ The Ref Match column reflects whether or not the best peptide was found in the reference proteome which is undesired since such peptides are not novel and including them in a vaccine might lead to an auto immune response. -Users are able to set a status for each candidate in the Evaluation column to mark -them as Accept, Reject, or requiring further review. +Users are able to set a status for each candidate by clicking the appropriate +buttons: thumbs-up (Accept), thumbs-down (Reject), or Flag (requires further review). -The Investigate button can be clicked to see more detail for a variant. This +Clicking on the row for a variant will select it. This will update the lower panels with details for the selected variant. 
```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Upon successfully uploading the relevant data files, you can explore the different aspects of your neoantigen candidates."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g2491f283519_0_8") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_0") ``` For candidates not sorted into the Pass tier, red borders visually highlight the attributes failed by the candidate. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Neoantigen candidates are binned into tiers depending on their suitability for vaccine creation."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_8") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_6") ``` ### Variant Information @@ -128,7 +135,7 @@ The Variant Information panel shows more variant-level details of the selected neoantigen candidate. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Variant Information tab shows more details for the selected variant."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_14") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_12") ``` On the left is a three-tab section. The first tab, "Transcript Sets of Selected @@ -176,12 +183,12 @@ neoantigen candidates is desired so this panel makes it easy to review how many neoantigen candidates are still needed. 
```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Peptide Evaluation Overview section shows how many neoantigen candidates have been accepted, rejected, marked for review, or are pending."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_80") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_18") ``` ### Transcript Set Detailed Data -When selecting a transcript set in the Variant Info panel, this panel will +When selecting a transcript set in the Variant Information panel, this panel will show details about the neoantigen candidates the transcripts in the set code for and as well as details on the transcripts themselves. @@ -189,7 +196,10 @@ The Peptide Candidates from Selected Transcript Set tab shows a list of mutant and matched wildtype peptides and their IC50 binding affinity to the patient HLA alleles. Only neoantigen candidates where at least one peptide-MHC binding prediction falls within the `--aggregate-inclusion-threshold` -will be shown in this table. For HLA alleles where the peptide is not +will be shown in this table. However, for variants resulting in a large number +of well-binding candidates (e.g. frameshift variant with a long downstream +sequence), the number of included candidates will be limited to the best +`--aggregate-inclusion-count-limit` candidates. For HLA alleles where the peptide is not well-binding the prediction details will show `X`. This table also shows the mutant position, whether or not the neoantigen candidate has any problematic positions, and whether or not it failed the anchor criteria. This helps in @@ -197,17 +207,33 @@ determining whether a neoantigen candidate was deprioritized when selecting the Best Peptide. The Best Peptide is highlighted in green. 
```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Transcript Set Detailed Data panel shows binding prediction details for the neoantigens the transcripts in the set code for."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_32") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_24") ``` +The Anchor Heatmap tab shows a heatmap overlayed over each included neoantigen +candidate from the selected transcript set. A darker color represents a higher +probability that a position in the peptide is an anchor. Mutated positions are +represented by red letters. More information +about how to interpret the heatmap can be found in the graph on the right of this +panel. + +```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Anchor heatmaps show which positions in a peptide are likely to be anchors."} +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_30") +``` + +This tab also includes a table listing the anchor positions for each allele +and length combination. It also includes a table listing the +per-allele-and-length anchor weights at each position. + The Transcripts in Set tabs shows details of the transcripts in the selected -set such as the transcript Ensembl ID, the transcript expression, the -transcript support level, the biotype, and the transcript length. This -reflects the criteria used in determining the Best Transcript. The Best -Transcript is highlighted in green. +set such as the transcript Ensembl ID, the transcript expression, the +transcript's MANE Select status, the transcript's canonical status, the +transcript support level, the biotype, whether the transcript has any CDS +Flags, and the transcript length. 
This reflects the criteria used in determining +the Best Transcript. The Best Transcript is highlighted in green. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Transcripts in Set tab shows details about the transcripts in the set."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_38") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_36") ``` ### Additional Peptide Information @@ -225,7 +251,7 @@ the median or lowest IC50 binding affinity used elsewhere. A solid line is used to represent the median score. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Additional Peptide Information panel shows more information for the peptide selected in the Transcript Set Detailed Data panel."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_44") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_42") ``` The %ile Plot tab shows a similar violin plot but for the predicted percentile @@ -233,32 +259,21 @@ scores as opposed to the IC50 binding affinity. A solid line is also used here to represent the median score. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The %ile Plot tab shows violin plots of the percentile score predicted by each algorithm."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_50") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_48") ``` The next tab, "Binding Data", shows the IC50 binding affinity and percentile score but in table format. 
```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Binding Data tab shows a table of the IC50 binding affinity and percentile predicted by each algorithm."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_56") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_54") ``` The Elution Table tab shows the predicted elution scores and percentiles, if the appropriate prediction algorithm(s) were chosen. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Elution Table tab shows elution prediction scores and precentiles for the selected peptide."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_62") -``` - -Lastly, the Anchor Heatmap tab shows a heatmap overlayed over each neoantigen -candidate from the selected transcript set. A darker color represents a higher -probability that a position in the peptide is an anchor. Mutated positions are -represented by red letters. More information -about how to interpret the heatmap can be found in the graph on the right of this -panel. - -```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "The Anchor heatmaps show which positions in a peptide are likely to be anchors."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_68") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_60") ``` ## Regenerate Tiers with Custom Parameters @@ -267,14 +282,16 @@ During review of your data it might become apparent that different tiering thresholds would've been more approriate. 
pVACview allows you to re-tier your data with custom parameters by adjusting the sliders and inputs in the "Advanced Options: Regenerate Tiering with different parameters" panel and -pressing the "Recalculate Tiering with new parameters" button. +pressing the "Recalculate Tiering with new parameters" button. The tiering +thresholds currently applied to your data are listed in the "Current Parameters +for Tiering" panel. The parameters that were used in the original pVACseq run can still be viewed in the "Original Parameters for Tiering" panel and the tiers can be reset to those parameters by pressing the "Reset to original parameters" button. ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Users can re-tier the neoantigen candidates by adjusting the tiering thresholds."} -ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_87") +ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_66") ``` ## Adding Comments to Variants @@ -296,6 +313,6 @@ be exported by switching to the Export interface via the sidebar. This will export the Aggregated Report with the updated Evaluation column and comments added. The report can be exported in either TSV or Excel format. 
-```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Users can leave comments on each variant."} +```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Users can export the neoantigen candidate table after review has been completed."} ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g25ad9ce8c9b_0_99") ``` From efe26db94823ba859d9e919f979c7da4e1c9cd5a Mon Sep 17 00:00:00 2001 From: Susanna Kiwala Date: Thu, 4 Dec 2025 14:56:51 -0600 Subject: [PATCH 2/4] Fix table ID --- 04-outputs.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/04-outputs.Rmd b/04-outputs.Rmd index 5788074..e73ab4e 100644 --- a/04-outputs.Rmd +++ b/04-outputs.Rmd @@ -364,7 +364,7 @@ cat(tabl) **Criteria Details** -```{r pvacfuse_tier_criteria, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'} +```{r pvacsplice_tier_criteria, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'} tabl <- " | Criteria | Description | Evaluation | |----------|-------------|------------| From 2cff9af0dc2525bba75ea9e63ddd4da7ecc7cb74 Mon Sep 17 00:00:00 2001 From: Susanna Kiwala Date: Thu, 4 Dec 2025 14:57:08 -0600 Subject: [PATCH 3/4] Fix typos --- 04-outputs.Rmd | 2 +- 05-pvacview_tour.Rmd | 4 ++-- 06-conclusions.Rmd | 2 +- resources/dictionary.txt | 6 ++++++ 4 files changed, 10 insertions(+), 4 deletions(-) diff --git a/04-outputs.Rmd b/04-outputs.Rmd index e73ab4e..10a2431 100644 --- a/04-outputs.Rmd +++ b/04-outputs.Rmd @@ -89,7 +89,7 @@ enabled additional filtering on related metrics: `--percentile-threshold` set, this parameter will influence how both the binding cutoff and the percentile cutoff are applied. 
The default, `conservative`, will require a candidate to pass both the binding and the - percentile threshold while the `exploratory` option will require a candiate + percentile threshold while the `exploratory` option will require a candidate to only pass either the binding or the percentile threshold. ### Coverage Filter diff --git a/05-pvacview_tour.Rmd b/05-pvacview_tour.Rmd index c816193..50eb40a 100644 --- a/05-pvacview_tour.Rmd +++ b/05-pvacview_tour.Rmd @@ -92,7 +92,7 @@ highlighted with a green border. Next, this table lists the IC50 peptide MHC binding affinity for both the Best Peptide (MT) and the matched wild type (WT). These values are either a median of all of the -binding predictions made by the predictiors selected in your pVACseq run, or +binding predictions made by the predictors selected in your pVACseq run, or the lowest binding prediction made. This depends on the value set for the `--top-score-metric` in your run. The table also shows the median/lowest percentile scores of all predictors that provide this value, again depending on the @@ -279,7 +279,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCG ## Regenerate Tiers with Custom Parameters During review of your data it might become apparent that different tiering -thresholds would've been more approriate. pVACview allows you to re-tier your +thresholds would've been more appropriate. pVACview allows you to re-tier your data with custom parameters by adjusting the sliders and inputs in the "Advanced Options: Regenerate Tiering with different parameters" panel and pressing the "Recalculate Tiering with new parameters" button. 
The tiering diff --git a/06-conclusions.Rmd b/06-conclusions.Rmd index d6fddbb..8e51eda 100644 --- a/06-conclusions.Rmd +++ b/06-conclusions.Rmd @@ -15,7 +15,7 @@ This chapter will summarize: ## Key conclusions In this course you will have gained a better understanding of the current best -practices for neoantigen indentification and prioritization. You will have +practices for neoantigen identification and prioritization. You will have learned how to run pVACtools, interpret pVACtools results, and select neoantigen candidates suitable for vaccine manufacturing using pVACview. diff --git a/resources/dictionary.txt b/resources/dictionary.txt index 4ea6712..5f7b3ad 100644 --- a/resources/dictionary.txt +++ b/resources/dictionary.txt @@ -9,6 +9,7 @@ Bloomberg Bookdown bioinformatics biotype +CDS CHROM CLI ClinVar @@ -37,6 +38,7 @@ favicon frameshift fyi GDSCN +GTF GenBank GH GitHub @@ -66,6 +68,7 @@ indel indels inframe itcrtraining +JunctionReadCount json junctional Leanpub @@ -103,17 +106,20 @@ proteomics pVAC pVACfuse pVACseq +pVACsplice pvactools pVACtools pVACvector pVACview pVACviz RefSeq +RegTools reproducibility somatically subclonal summarization STARFusion +SpanningFragCount tbi tiering TSL From 1363de3ca607da9b9f436792a3dd354bb3df8176 Mon Sep 17 00:00:00 2001 From: Susanna Kiwala Date: Fri, 5 Dec 2025 07:08:50 -0600 Subject: [PATCH 4/4] Apply suggestions from code review Co-authored-by: Thomas B. 
Mooney --- 02-prerequisites.Rmd | 2 +- 05-pvacview_tour.Rmd | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/02-prerequisites.Rmd b/02-prerequisites.Rmd index 56c2c79..fe071c9 100644 --- a/02-prerequisites.Rmd +++ b/02-prerequisites.Rmd @@ -129,7 +129,7 @@ gunzip ref_genome.fa.gz rm -rf ref_genome.tar ``` -This will add the following reference files +This will add the following reference files: - `ref_genome.fa` and `.fai`: A reference DNA FASTA file and index - `Homo_sapiens.GRCh38.105.chr.gtf.gz`: A reference GTF file diff --git a/05-pvacview_tour.Rmd b/05-pvacview_tour.Rmd index 50eb40a..e2bddbc 100644 --- a/05-pvacview_tour.Rmd +++ b/05-pvacview_tour.Rmd @@ -113,7 +113,7 @@ reference proteome which is undesired since such peptides are not novel and including them in a vaccine might lead to an auto immune response. Users are able to set a status for each candidate by clicking the appropriate -buttons: thumbs-up (Accept), thumbs-down (Reject), or Flag (requires further review). +buttons: thumbs-up (accept), thumbs-down (reject), or flag (requires further review). Clicking on the row for a variant will select it. This will update the lower panels with details for the selected variant. @@ -210,7 +210,7 @@ Best Peptide. The Best Peptide is highlighted in green. ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a65970eb36_0_24") ``` -The Anchor Heatmap tab shows a heatmap overlayed over each included neoantigen +The Anchor Heatmap tab shows a heatmap overlaying each included neoantigen candidate from the selected transcript set. A darker color represents a higher probability that a position in the peptide is an anchor. Mutated positions are represented by red letters. More information