---
title: "ProteoMaker Benchmarks"
format: html
editor: visual
---
This document describes the benchmarks used in the ProteoMaker pipeline. ProteoMaker is a tool for generating an in silico bottom-up LC-MS dataset from proteoforms. The pipeline simulates the stages of the experimental process, from the generation of ground truth data to mass spectrometry (MS) analysis and statistical testing. The benchmarks are used to compare results and to evaluate the performance of the pipeline. For more information about the project, visit [ProteoMaker on GitHub](https://github.com/computproteomics/ProteoMaker).
## List of Benchmarks
The following values and distributions are collected to assess the quantitative and statistical fidelity of the simulation and benchmarking pipeline.
### Peptidoform Level
- **Total Number of Peptidoforms** (`numPeptides`, label: **#Total peptidoforms (mod+unmod)**): Total count of all peptidoforms (modified and unmodified). Fixed PTMs are not included.
- **Number of Protein Accessions** (`numProteins`, label: **#Protein accessions (all peptidoforms)**): Distinct protein accessions represented by at least one peptidoform.
- **Proportion of Unique Peptidoforms** (`propUniquePep`, label: **% Unique peptidoforms (single protein)**): Fraction of peptidoforms mapping to exactly one protein accession.
- **Total Number of Unique Peptide Sequences** (`uniqueStrippedPep`, label: **#Unique peptide sequences**): Unique peptide sequences after removing PTM annotations.
- **Percentage Missingness** (`percMissingPep`, label: **% Missing peptidoform values**): Percentage of missing values across peptidoform-level intensities.
- **AUC of ROC Curve for Correctly Identified Differentially Regulated Peptidoforms** (`aucDiffRegPeptides.FDR_limma 2 vs 1.AUC`, label: *Peptidoform AUC (truth vs limma FDR)*): Area under the ROC curve for identifying truly regulated peptidoforms using the limma-based FDR estimates.
- **TPR (True Positive Rate)** (`tprPep0.01.FDR_limma 2 vs 1.TPR`, `tprPep0.05.FDR_limma 2 vs 1.TPR`, label: *TPR peptidoforms (FDR \< 0.01/0.05)*): True positive rate at estimated FDR thresholds of 0.01 and 0.05.
- **True FDR for Estimated FDR** (`tFDRPep0.01.FDR_limma 2 vs 1.tFDR`, `tFDRPep0.05.FDR_limma 2 vs 1.tFDR`, label: *True FDR (peptidoforms)*): Actual (true) FDR at estimated FDR levels of 0.01 and 0.05.
- **Miscleavage Count Distribution** (`propMisCleavedPeps`, label: *Miscleavage distribution*): Vector of proportions for peptides grouped by their number of missed cleavages (e.g., fractions for 0, 1, 2, ... missed cleavages).
- **Dynamic Range** (`dynRangePep`, label: *Dynamic range (peptidoforms)*): Log2 range of peptidoform intensities.
- **Mean of Squared Residuals Towards Actual Fold-Changes** (`meanSquareDiffFCPep`, label: *Fold-change error (peptidoforms)*): Mean squared difference between ground-truth and measured log2 fold-changes.
- **Mean of Std. Dev. Within Replicates** (`sdWithinRepsPep`, label: *Replicate SD (peptidoforms)*): Within-group variation of regulated peptidoforms.
- **Skewness** (`skewnessPeps`, label: *Skewness (peptidoforms)*): Asymmetry of peptidoform intensity distribution.
- **Kurtosis** (`kurtosisPeps`, label: *Kurtosis (peptidoforms)*): Peakedness (tailedness) of peptidoform intensity distribution.
- **Standard Deviation** (`sdPeps`, label: *Overall SD (peptidoforms)*): Spread of peptidoform intensity distribution.
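
The TPR and true-FDR entries above compare the set of features called significant at an estimated FDR threshold against the simulated ground truth. The following is a minimal Python sketch of that comparison, not ProteoMaker's own implementation. The function name, the input layout, the treatment of untested features (`None`) as not significant, and the convention of reporting a true FDR of 0 when nothing is called are all assumptions made for illustration.

```python
def tpr_and_true_fdr(estimated_fdr, is_regulated, threshold):
    """Return (TPR, true FDR) for features called significant below `threshold`.

    estimated_fdr: list of floats, or None where a feature was not tested
    is_regulated:  list of bools taken from the simulated ground truth
    """
    # A feature is "called" when it has an estimated FDR below the threshold.
    called = [f is not None and f < threshold for f in estimated_fdr]
    true_pos = sum(c and r for c, r in zip(called, is_regulated))
    false_pos = sum(c and not r for c, r in zip(called, is_regulated))
    n_regulated = sum(is_regulated)
    n_called = true_pos + false_pos
    # Assumption: TPR is taken over all truly regulated features.
    tpr = true_pos / n_regulated if n_regulated else float("nan")
    # Assumption: with no calls at all, the true FDR is reported as 0.
    true_fdr = false_pos / n_called if n_called else 0.0
    return tpr, true_fdr


# Hypothetical toy data: six features, four of them truly regulated.
fdr = [0.001, 0.004, 0.03, 0.2, None, 0.008]
truth = [True, True, True, False, True, False]

tpr_01, tfdr_01 = tpr_and_true_fdr(fdr, truth, 0.01)  # 2 of 4 found, 1 of 3 calls false
tpr_05, tfdr_05 = tpr_and_true_fdr(fdr, truth, 0.05)  # 3 of 4 found, 1 of 4 calls false
```

A true FDR well above the nominal threshold (here 1/3 at a nominal 0.01) would indicate that the estimated FDR values are too optimistic for that data set.
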
### Protein-Group Level
- **Number of Quantified Protein Groups** (`numQuantProtGroups`, label: **#Quantified protein groups**): Count of protein groups that passed summarisation (respecting the configured shared/modified peptide settings) and retain at least one quantified value.
- **Proportion of Single-Protein Groups** (`propUniqueProts`, label: **% Single-protein groups**): Fraction of protein groups with a single protein accession.
- **Percentage Missingness** (`percMissingProt`, label: **% Missing protein-group values**): Missingness of values across protein-group intensities.
- **Mean Number of Peptide Sequences per Protein Group** (`meanPepPerProt`, label: *Peptide sequences per protein group*): Mean count of peptide sequences per quantified protein group.
- **AUC of ROC Curve for Correctly Identified Differentially Regulated Protein Groups** (`aucDiffRegProteins.FDR_PolySTest 2 vs 1.AUC`, label: *Protein-group AUC (truth vs FDR)*): Area under the ROC curve for identifying differentially regulated protein groups using the PolySTest FDR estimates.
- **TPR (True Positive Rate)** (`tprProt0.01.FDR_PolySTest 2 vs 1.TPR`, `tprProt0.05.FDR_PolySTest 2 vs 1.TPR`, label: *TPR protein groups (FDR \< 0.01/0.05)*): True positive rate for regulated protein groups at PolySTest-estimated FDR thresholds of 0.01 and 0.05.
- **True FDR for Estimated FDR** (`tFDRProt0.01.FDR_PolySTest 2 vs 1.tFDR`, `tFDRProt0.05.FDR_PolySTest 2 vs 1.tFDR`, label: *True FDR (protein groups)*): Actual false-discovery rate realised at those PolySTest thresholds.
- **Mean of Squared Residuals Towards Actual Fold-Changes** (`meanSquareDiffFCProt`, label: *Fold-change error (protein groups)*): Average squared error between simulated and measured log2 fold-changes.
- **Dynamic Range** (`dynRangeProt`, label: *Dynamic range (protein groups)*): Log2 range of protein-group intensities.
- **Mean of Std. Dev. Within Replicates** (`sdWithinRepsProt`, label: *Replicate SD (protein groups)*): Within-group variation of regulated protein groups.
- **Proportion of Protein Groups with Miscleaved Peptides** (`propMisCleavedProts`, label: **% Miscleaved protein groups**): Fraction of quantified protein groups that contain at least one peptide with a missed cleavage.
- **Proportion of Regulated Protein Groups with Wrongly Identified Peptides** (`propDiffRegWrongIDProt0.01.FDR_PolySTest 2 vs 1`, `propDiffRegWrongIDProt0.05.FDR_PolySTest 2 vs 1`, label: **% Wrong-ID (protein groups)**): Fraction of regulated protein groups containing wrongly identified peptides at PolySTest FDR thresholds of 0.01 and 0.05.
- **Skewness** (`skewnessProts`, label: *Skewness (protein groups)*): Asymmetry in distribution of protein-group intensities.
- **Kurtosis** (`kurtosisProts`, label: *Kurtosis (protein groups)*): Peakedness of protein-group intensity distribution.
- **Standard Deviation** (`sdProts`, label: *Overall SD (protein groups)*): Overall spread of protein-group intensity values.
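
The AUC entries summarise how well the estimated FDR values rank truly regulated features ahead of unregulated ones. One common way to compute an ROC AUC is the pairwise (Mann-Whitney) formulation sketched below in Python. This is an illustration under stated assumptions rather than the code used by ProteoMaker: the function name is hypothetical, and `1 - FDR` is used as the ranking score so that a lower FDR means a higher score.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties contribute 0.5. Runs in O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: estimated FDR per protein group and ground-truth regulation.
fdr = [0.001, 0.02, 0.3, 0.04, 0.6]
truth = [True, False, True, False, False]

scores = [1.0 - f for f in fdr]  # lower FDR -> higher score
auc = roc_auc(scores, truth)     # 4 of 6 positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of regulated and unregulated features.
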
### Proteoform Level
- **Number of Proteoforms and Mean Proteoforms per Protein** (`numProteoforms`, `meanProteoformsPerProt`, labels: **#Proteoforms**, *Proteoforms per protein*): Total number of distinct proteoforms and their mean count per protein.
- **Number of Modified Peptidoforms** (`numModPeptides`, label: **#Modified peptidoforms**): Total count of peptidoforms carrying simulated PTMs.
- **Proportion of Modified Peptidoforms with Identical Unmodified Form** (`propModAndUnmodPep`, label: **% Modified with unmodified match**): Fraction of modified peptidoforms that have a corresponding unmodified form.
- **AUC for Correctly Regulated Modified Peptidoforms** (`aucDiffRegAdjModPep.FDR_limma 2 vs 1.AUC`, label: *AUC (adj. mod. peptidoforms)*): AUC calculated after adjustment for parent protein abundance (reported only when at least 200 modified peptidoforms have matching protein quantification).
- **TPR** (`tprAdjModPep0.01.FDR_limma 2 vs 1.TPR`, `tprAdjModPep0.05.FDR_limma 2 vs 1.TPR`, label: *TPR (adj. mod. peptidoforms)*): True positive rate for regulated modified peptidoforms at estimated FDRs 0.01 and 0.05.
- **True FDR** (`tFDRAdjModPep0.01.FDR_limma 2 vs 1.tFDR`, `tFDRAdjModPep0.05.FDR_limma 2 vs 1.tFDR`, label: *True FDR (mod. peptidoforms)*): Actual FDR of modified peptidoform results at FDRs 0.01 and 0.05.
- **Share of Wrongly Significant Modified Peptidoforms** (`propDiffRegPepWrong0.01.FDR_PolySTest 2 vs 1`, `propDiffRegPepWrong0.05.FDR_PolySTest 2 vs 1`, label: **% Wrongly significant (mod. peptidoforms)**): Fraction of modified peptidoforms flagged as wrongly significant at FDR thresholds of 0.01 and 0.05, respectively.
- **Proportion of Modified Peptidoforms with Quantified Protein Group** (`percOverlapModPepProt`, label: **% Mod peptidoforms with protein quant**): Fraction of modified peptidoforms with quantifiable protein-group background.
- **Mean of Squared Residuals Towards Actual Fold-Changes** (`meanSquareDiffFCModPep`, label: *Fold-change error (mod. peptidoforms)*): Squared differences between simulated and measured log2 fold-changes of modified peptidoforms.
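
Several proteoform-level metrics are computed after adjusting modified peptidoforms for the abundance of their parent protein. The document does not state how this adjustment is performed. The Python sketch below assumes the simplest variant, subtracting the protein-group log2 intensity from the modified peptidoform log2 intensity sample by sample. All function names and data are hypothetical, and conditions are assumed to be compared as 2 vs 1, matching the metric names.

```python
from statistics import mean


def adjust_for_protein(mod_pep_log2, protein_log2):
    """Subtract protein-group log2 intensities per sample; None where either is missing."""
    return [
        None if p is None or q is None else p - q
        for p, q in zip(mod_pep_log2, protein_log2)
    ]


def log2_fold_change(values, cond_1, cond_2):
    """Mean of condition 2 minus mean of condition 1, ignoring missing values.

    Assumes each condition keeps at least one non-missing value.
    """
    a = [values[i] for i in cond_1 if values[i] is not None]
    b = [values[i] for i in cond_2 if values[i] is not None]
    return mean(b) - mean(a)


# Hypothetical data: two replicates per condition, log2 scale.
mod_pep = [20.0, 20.5, 22.0, 22.5]
protein = [25.0, 25.5, 26.0, 26.5]
cond_1, cond_2 = [0, 1], [2, 3]

adjusted = adjust_for_protein(mod_pep, protein)
raw_fc = log2_fold_change(mod_pep, cond_1, cond_2)        # 2.0
adjusted_fc = log2_fold_change(adjusted, cond_1, cond_2)  # 1.0
```

In this toy case half of the raw fold-change is explained by the protein itself, so only the adjusted value reflects a change specific to the modification.
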
## Benchmark Overview Table
| Benchmarking Metrics | Name | Label (for figures) | Category |
|----|----|----|----|
| Total Number of Peptidoforms | numPeptides | #Total peptidoforms (mod+unmod) | Peptidoform |
| Number of Protein Accessions | numProteins | #Protein accessions (all peptidoforms) | Peptidoform |
| Proportion of Unique Peptidoforms | propUniquePep | \% Unique peptidoforms (single protein) | Peptidoform |
| Unique Peptide Sequences | uniqueStrippedPep | #Unique peptide sequences | Peptidoform |
| Percentage Missingness | percMissingPep | \% Missing peptidoform values | Peptidoform |
| AUC of ROC (Peptidoforms) | aucDiffRegPeptides.FDR_limma 2 vs 1.AUC | Peptidoform AUC (truth vs limma FDR) | Peptidoform |
| TPR at FDR \< 0.01 | tprPep0.01.FDR_limma 2 vs 1.TPR | TPR peptidoforms (FDR \< 0.01) | Peptidoform |
| TPR at FDR \< 0.05 | tprPep0.05.FDR_limma 2 vs 1.TPR | TPR peptidoforms (FDR \< 0.05) | Peptidoform |
| True FDR at 0.01 | tFDRPep0.01.FDR_limma 2 vs 1.tFDR | True FDR (peptidoforms, 0.01) | Peptidoform |
| True FDR at 0.05 | tFDRPep0.05.FDR_limma 2 vs 1.tFDR | True FDR (peptidoforms, 0.05) | Peptidoform |
| Miscleavage Distribution | propMisCleavedPeps.{0,1,...} | Miscleavage distribution (per missed-cleavage count) | Peptidoform |
| Dynamic Range | dynRangePep | Dynamic range (peptidoforms) | Peptidoform |
| Fold-Change Error | meanSquareDiffFCPep | Fold-change error (peptidoforms) | Peptidoform |
| Within-Replicate SD | sdWithinRepsPep | Replicate SD (peptidoforms) | Peptidoform |
| Skewness | skewnessPeps | Skewness (peptidoforms) | Peptidoform |
| Kurtosis | kurtosisPeps | Kurtosis (peptidoforms) | Peptidoform |
| Overall SD | sdPeps | Overall SD (peptidoforms) | Peptidoform |
| Quantified Protein Groups | numQuantProtGroups | #Quantified protein groups | Protein group |
| Single-Protein Groups | propUniqueProts | \% Single-protein groups | Protein group |
| Percentage Missingness | percMissingProt | \% Missing protein-group values | Protein group |
| Mean Peptide Sequences per Protein Group | meanPepPerProt | Peptide sequences per protein group | Protein group |
| AUC of ROC (Protein Groups) | aucDiffRegProteins.FDR_PolySTest 2 vs 1.AUC | Protein-group AUC (truth vs FDR) | Protein group |
| TPR at FDR \< 0.01 | tprProt0.01.FDR_PolySTest 2 vs 1.TPR | TPR protein groups (FDR \< 0.01) | Protein group |
| TPR at FDR \< 0.05 | tprProt0.05.FDR_PolySTest 2 vs 1.TPR | TPR protein groups (FDR \< 0.05) | Protein group |
| True FDR at 0.01 | tFDRProt0.01.FDR_PolySTest 2 vs 1.tFDR | True FDR (protein groups, 0.01) | Protein group |
| True FDR at 0.05 | tFDRProt0.05.FDR_PolySTest 2 vs 1.tFDR | True FDR (protein groups, 0.05) | Protein group |
| Fold-Change Error | meanSquareDiffFCProt | Fold-change error (protein groups) | Protein group |
| Within-Replicate SD | sdWithinRepsProt | Replicate SD (protein groups) | Protein group |
| Miscleaved Protein Groups | propMisCleavedProts | \% Miscleaved protein groups | Protein group |
| Wrong ID Protein Groups at 0.01 | propDiffRegWrongIDProt0.01.FDR_PolySTest 2 vs 1 | \% Wrong-ID (protein groups, 0.01) | Protein group |
| Wrong ID Protein Groups at 0.05 | propDiffRegWrongIDProt0.05.FDR_PolySTest 2 vs 1 | \% Wrong-ID (protein groups, 0.05) | Protein group |
| Skewness | skewnessProts | Skewness (protein groups) | Protein group |
| Kurtosis | kurtosisProts | Kurtosis (protein groups) | Protein group |
| Overall SD | sdProts | Overall SD (protein groups) | Protein group |
| Total Proteoforms | numProteoforms | #Proteoforms | Proteoform |
| Mean Proteoforms per Protein | meanProteoformsPerProt | Proteoforms per protein | Proteoform |
| Number of Modified Peptidoforms | numModPeptides | #Modified peptidoforms | Proteoform |
| Modified with Unmodified Match | propModAndUnmodPep | \% Modified with unmodified match | Proteoform |
| AUC (Adj. Modified Peptidoforms) | aucDiffRegAdjModPep.FDR_limma 2 vs 1.AUC | AUC (adj. mod. peptidoforms) | Proteoform |
| TPR (Adj. Mod Peptidoforms) at 0.01 | tprAdjModPep0.01.FDR_limma 2 vs 1.TPR | TPR (adj. mod. peptidoforms, 0.01) | Proteoform |
| TPR (Adj. Mod Peptidoforms) at 0.05 | tprAdjModPep0.05.FDR_limma 2 vs 1.TPR | TPR (adj. mod. peptidoforms, 0.05) | Proteoform |
| True FDR (Adj. Mod Peptidoforms) at 0.01 | tFDRAdjModPep0.01.FDR_limma 2 vs 1.tFDR | True FDR (mod. peptidoforms, 0.01) | Proteoform |
| True FDR (Adj. Mod Peptidoforms) at 0.05 | tFDRAdjModPep0.05.FDR_limma 2 vs 1.tFDR | True FDR (mod. peptidoforms, 0.05) | Proteoform |
| Wrongly Significant Mod Peptidoforms (0.01) | propDiffRegPepWrong0.01.FDR_PolySTest 2 vs 1 | \% Wrongly significant (mod. peptidoforms, 0.01) | Proteoform |
| Wrongly Significant Mod Peptidoforms (0.05) | propDiffRegPepWrong0.05.FDR_PolySTest 2 vs 1 | \% Wrongly significant (mod. peptidoforms, 0.05) | Proteoform |
| Modified Peptidoforms with Protein Quant | percOverlapModPepProt | \% Mod peptidoforms with protein quant | Proteoform |
| Fold-Change Error (Mod Peptidoforms) | meanSquareDiffFCModPep | Fold-change error (mod. peptidoforms) | Proteoform |
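
Many names in the table combine a metric prefix, an optional FDR threshold, the statistical test with its comparison, and a trailing statistic, for example `tprPep0.01.FDR_limma 2 vs 1.TPR`. Because these names contain dots and spaces, it can help to split them into parts before filtering or plotting. The Python sketch below does this with a regular expression. The decomposition is inferred from the names listed in the table, not taken from the ProteoMaker source, and `BenchmarkKey` and `parse_key` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Pattern inferred from the names in the overview table (an assumption).
_KEY = re.compile(
    r"(?P<metric>[A-Za-z]+?)"                                         # e.g. tprPep
    r"(?P<threshold>0\.\d+)?"                                         # e.g. 0.01
    r"(?:\.FDR_(?P<test>\w+) (?P<cond_b>\d+) vs (?P<cond_a>\d+))?"   # e.g. .FDR_limma 2 vs 1
    r"(?:\.(?P<stat>\w+))?"                                           # e.g. .TPR
)


@dataclass
class BenchmarkKey:
    metric: str
    threshold: Optional[float]
    test: Optional[str]
    comparison: Optional[Tuple[int, int]]
    statistic: Optional[str]


def parse_key(name: str) -> Optional[BenchmarkKey]:
    """Split a benchmark name into its parts; return None if it does not fit the pattern."""
    m = _KEY.fullmatch(name)
    if m is None:
        return None
    threshold = float(m["threshold"]) if m["threshold"] else None
    comparison = (int(m["cond_b"]), int(m["cond_a"])) if m["test"] else None
    return BenchmarkKey(m["metric"], threshold, m["test"], comparison, m["stat"])


tpr_key = parse_key("tprPep0.01.FDR_limma 2 vs 1.TPR")
wrong_id_key = parse_key("propDiffRegWrongIDProt0.05.FDR_PolySTest 2 vs 1")
plain_key = parse_key("numPeptides")
```

For the first name this yields the metric `tprPep`, threshold `0.01`, test `limma`, comparison `(2, 1)` and statistic `TPR`, while a plain name such as `numPeptides` only fills the `metric` field.
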