Systematic comparison of LC gradient lengths and data-dependent acquisition (DDA) Top-N settings using well-characterized reference samples, run on a Thermo Scientific Q Exactive (classic) coupled to an IonOpticks Aurora Ultimate analytical column (25 cm × 75 µm, 1.7 µm particles).
Each sub-project targets a specific reference standard and QC question.
All notebooks share a common helper library and follow the same structure, so
extending the analysis to new raw files requires only dropping files into
data/ and re-running the notebook.
| Directory | Standard | Purpose |
|---|---|---|
| `BSA/` | Bovine serum albumin tryptic digest (Sigma-Aldrich, P02769) | Baseline sensitivity; single-protein sequence coverage as a focused LC-MS health check |
| `HeLa/` | HeLa cell tryptic digest | Complex-matrix benchmarking; whole-proteome identification depth and reproducibility |
| `TMT_Yeast/` | Pierce™ TMT11plex Yeast Digest Standard (BY4741) | End-to-end TMT quantification quality; labeling efficiency, ratio accuracy, and differential abundance of three built-in knockouts |
Method Optimization/
├── README.md
├── helpers/ # Shared R helper library
│ ├── global.r # save_and_show_plot/table, get_method_levels
│ ├── load_data.r # load_psm_data()
│ ├── summarize.r # PSM counts, coverage, CV, peak sampling, …
│ ├── plots.r # plot_metric_distribution, plot_psm_counts, …
│ ├── fwhm.r # get_peak_fwhm(), plot_fwhm_vs_rt()
│ └── spectral_quality.r # get_spectral_quality()
├── BSA/
│ ├── README.md
│ ├── QC_BSA.ipynb
│ ├── data/ # PD .txt exports + Thermo .raw files (git-ignored)
│ ├── results/ # Generated CSV tables
│ └── graphs/ # Generated plots (PNG/PDF)
├── HeLa/
│ ├── README.md
│ ├── QC_HeLa.ipynb
│ ├── data/
│ ├── results/
│ └── graphs/
└── TMT_Yeast/
    ├── README.md
    ├── QC_TMT_Yeast.ipynb
    ├── helpers/
    │   └── tmt_quantification.r # TMT-specific metrics and plots
    ├── data/
    ├── results/
    └── graphs/
All notebooks source the helpers from ../helpers/ (relative to their own
directory). The library provides:
- `load_psm_data(filepaths)`: reads and merges one or more Proteome Discoverer 3.0.1 PSM export `.txt` files, parses LC gradient, MS method, and replicate number from the spectrum file name, and removes duplicate scans. Handles multi-node exports (e.g. fixed + dynamic TMT search) by including `Identifying.Node.No` in the deduplication key.
- `summarize.r`: PSM counts, unique peptide counts, PSM redundancy, charge distribution, RT distribution, sequence coverage (single-protein or proteome-distribution mode), peptide intensity CV, peak sampling, cycle time.
- `plots.r`: `plot_metric_distribution()` (violin + box), bar and density plots for all identification and spectral quality metrics.
- `fwhm.r`: XIC-based chromatographic peak FWHM via `rawrr::readChromatogram()`, with per-file RDS caching.
- `spectral_quality.r`: AGC fill fraction, MS2 TIC, MS2 S/N, and fragment count from Thermo `.raw` files via `rawrr::readSpectrum()`, with per-file RDS caching.
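The role of the search node in the deduplication key can be illustrated with a small base-R sketch. This is not the code in `load_data.r`: the helper name `dedupe_psms()` is hypothetical, the rows are made up, and the column names `Spectrum.File` and `First.Scan` are assumptions about how the Proteome Discoverer export headers appear after import (only `Identifying.Node.No` is named above).

```r
# Toy PSM table (made-up values, for illustration only): scan 1001 is
# reported twice by node 4 (a true duplicate) and once by node 7.
psms <- data.frame(
  Spectrum.File = rep("Yeast_90min_Top10_1.raw", 4),
  First.Scan = c(1001L, 1001L, 1001L, 1002L),
  Identifying.Node.No = c(4L, 4L, 7L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first row for each combination of key columns.
dedupe_psms <- function(df,
                        key = c("Spectrum.File", "First.Scan",
                                "Identifying.Node.No")) {
  df[!duplicated(df[, key, drop = FALSE]), , drop = FALSE]
}

with_node <- dedupe_psms(psms)
without_node <- dedupe_psms(psms, key = c("Spectrum.File", "First.Scan"))

nrow(with_node)     # 3: the node 7 hit for scan 1001 is kept
nrow(without_node)  # 2: the node 7 hit is dropped as if it were a duplicate
```

Leaving the node out of the key would silently discard identifications that come from a second search node for the same scan.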
Spectrum file names must follow this pattern:
<prefix>_<LC_gradient>_<MS_method>_<replicate>.raw
Examples: BSA_90min_Top10_1.raw, HeLa_120min_Top15_3.raw,
Yeast_90min_Top10_2.raw
Method metadata (LC gradient length, MS method, replicate number) are parsed automatically. No manual configuration is required when new files follow this convention.
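As an illustration of the convention, the file name can be split with a single regular expression. The following is a minimal base-R sketch, not the parser used in `load_data.r`; the function name `parse_method_metadata()` and the output column names are hypothetical, and it assumes that the gradient, method, and replicate fields contain no underscores.

```r
# Hypothetical helper: split "<prefix>_<LC_gradient>_<MS_method>_<replicate>.raw"
# into its components. The prefix itself may contain underscores.
parse_method_metadata <- function(spectrum_files) {
  pattern <- "^(.+)_([^_]+)_([^_]+)_([0-9]+)\\.raw$"
  base <- basename(spectrum_files)
  matches <- regmatches(base, regexec(pattern, base))

  # A successful match yields the full string plus four capture groups.
  ok <- lengths(matches) == 5
  if (!all(ok)) {
    stop("File names not matching the convention: ",
         paste(base[!ok], collapse = ", "))
  }

  fields <- do.call(rbind, matches)
  data.frame(
    file = fields[, 1],
    prefix = fields[, 2],
    lc_gradient = fields[, 3],
    ms_method = fields[, 4],
    replicate = as.integer(fields[, 5]),
    stringsAsFactors = FALSE
  )
}

meta <- parse_method_metadata(c("BSA_90min_Top10_1.raw",
                                "HeLa_120min_Top15_3.raw"))
print(meta)
```

Failing loudly on non-conforming names is a design choice of this sketch; it makes a misnamed file visible before it ends up in a plot under the wrong method.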
The quickest way to get a fully configured environment with all required packages, JupyterLab, and .NET pre-installed is to use the accompanying Docker image:
SamThilmany/JupyterLab-with-R_Docker-Environment
| Package | Source | Purpose |
|---|---|---|
| `tidyverse` | CRAN | Data manipulation and ggplot2 plotting |
| `patchwork` | CRAN | Multi-panel figure assembly |
| `scales` | CRAN | Axis label formatting |
| `ggrepel` | CRAN | Non-overlapping text labels (volcano plots) |
| `conflicted` | CRAN | Explicit resolution of namespace conflicts |
| `IRdisplay` | CRAN | Inline HTML display in JupyterLab |
| `rawrr` | Bioconductor | Reading Thermo .raw files |
| `MSstatsTMT` | Bioconductor | TMT protein summarization and group comparison |
Install CRAN packages with install.packages(); install Bioconductor
packages with BiocManager::install().
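For a manual setup, a short check of which packages are still missing can save a round of failed notebook cells. The sketch below is an assumption about how one might do this, not part of the repository; `missing_pkgs()` is a hypothetical helper, and the install calls are left commented out so that running the block changes nothing.

```r
cran_pkgs <- c("tidyverse", "patchwork", "scales", "ggrepel",
               "conflicted", "IRdisplay")
bioc_pkgs <- c("rawrr", "MSstatsTMT")

# Hypothetical helper: return the packages in `pkgs` whose namespace
# cannot be loaded from the current library paths.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1),
                      quietly = TRUE, USE.NAMES = FALSE)
  pkgs[!installed]
}

cran_missing <- missing_pkgs(cran_pkgs)
bioc_missing <- missing_pkgs(bioc_pkgs)

message("Missing CRAN packages: ",
        if (length(cran_missing)) toString(cran_missing) else "none")
message("Missing Bioconductor packages: ",
        if (length(bioc_missing)) toString(bioc_missing) else "none")

# Uncomment to install whatever is missing:
# if (length(cran_missing) > 0) install.packages(cran_missing)
# if (length(bioc_missing) > 0) {
#   if (!requireNamespace("BiocManager", quietly = TRUE)) {
#     install.packages("BiocManager")
#   }
#   BiocManager::install(bioc_missing)
# }
```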
rawrr ≥ 1.15.3 requires the .NET runtime on all platforms (macOS,
Linux, and Windows — Mono was removed in 1.15.3). Follow the rawrr
installation instructions at https://bioconductor.org/packages/rawrr.
- Place Proteome Discoverer PSM export `.txt` files in the relevant `data/` directory.
- Place the corresponding Thermo `.raw` files in the same `data/` directory (they are excluded from version control by `.gitignore`).
- Re-run the notebook; `load_psm_data()` picks up all `.txt` files in `data/` automatically.
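How the export files are discovered is not shown here; one plausible way is a `list.files()` call whose result is passed to `load_psm_data(filepaths)`. The sketch below assumes exactly that. `find_psm_exports()` and the file name `BSA_PSMs.txt` are hypothetical, and the demonstration runs in a temporary directory so that it does not depend on the repository layout.

```r
# Hypothetical helper: list Proteome Discoverer .txt exports in a data
# directory, ignoring the .raw files stored alongside them.
find_psm_exports <- function(data_dir = "data") {
  files <- list.files(data_dir, pattern = "\\.txt$",
                      full.names = TRUE, ignore.case = TRUE)
  if (length(files) == 0) {
    warning("No .txt exports found in ", data_dir)
  }
  sort(files)
}

# Demonstration with empty placeholder files in a fresh temporary directory.
demo_dir <- tempfile("data_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("BSA_PSMs.txt", "BSA_90min_Top10_1.raw")))

psm_files <- find_psm_exports(demo_dir)
basename(psm_files)

# In a notebook, with the helpers sourced from ../helpers/:
# psm_data <- load_psm_data(find_psm_exports("data"))
```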