SamThilmany/Proteomics-Method-Optimization


Proteomics Method Optimization

Systematic comparison of LC gradient lengths and data-dependent acquisition (DDA) Top-N settings using well-characterized reference samples, run on a Thermo Scientific Q Exactive (classic) coupled to an IonOpticks Aurora Ultimate 25 cm × 75 µm, 1.7 µm analytical column.

Each sub-project targets a specific reference standard and QC question. All notebooks share a common helper library and follow the same structure, so extending the analysis to new runs requires only placing the new PSM exports and .raw files in data/ and re-running the notebook.


Sub-projects

Directory | Standard | Purpose
BSA/ | Bovine serum albumin tryptic digest (Sigma-Aldrich, P02769) | Baseline sensitivity; single-protein sequence coverage as a focused LC-MS health check
HeLa/ | HeLa cell tryptic digest | Complex-matrix benchmarking; whole-proteome identification depth and reproducibility
TMT_Yeast/ | Pierce™ TMT11plex Yeast Digest Standard (BY4741) | End-to-end TMT quantification quality; labeling efficiency, ratio accuracy, and differential abundance of three built-in knockouts

Repository structure

Method Optimization/
├── README.md
├── helpers/                        # Shared R helper library
│   ├── global.r                    # save_and_show_plot/table, get_method_levels
│   ├── load_data.r                 # load_psm_data()
│   ├── summarize.r                 # PSM counts, coverage, CV, peak sampling, …
│   ├── plots.r                     # plot_metric_distribution, plot_psm_counts, …
│   ├── fwhm.r                      # get_peak_fwhm(), plot_fwhm_vs_rt()
│   └── spectral_quality.r          # get_spectral_quality()
├── BSA/
│   ├── README.md
│   ├── QC_BSA.ipynb
│   ├── data/                       # PD .txt exports + Thermo .raw files (git-ignored)
│   ├── results/                    # Generated CSV tables
│   └── graphs/                     # Generated plots (PNG/PDF)
├── HeLa/
│   ├── README.md
│   ├── QC_HeLa.ipynb
│   ├── data/
│   ├── results/
│   └── graphs/
└── TMT_Yeast/
    ├── README.md
    ├── QC_TMT_Yeast.ipynb
    ├── helpers/
    │   └── tmt_quantification.r    # TMT-specific metrics and plots
    ├── data/
    ├── results/
    └── graphs/

Shared helper library

All notebooks source the helpers from ../helpers/ (relative to their own directory). The library provides:

  • load_psm_data(filepaths): reads and merges one or more Proteome Discoverer 3.0.1 PSM export .txt files, parses the LC gradient, MS method, and replicate number from the spectrum file name, and removes duplicate scans. Multi-node exports (e.g., a fixed + dynamic TMT search) are handled by including Identifying.Node.No in the deduplication key, so the same scan identified by two search nodes is not collapsed into one PSM.
  • summarize.r: PSM counts, unique peptide counts, PSM redundancy, charge distribution, RT distribution, sequence coverage (single-protein or proteome-distribution mode), peptide intensity CV, peak sampling, and cycle time.
  • plots.r: plot_metric_distribution() (violin + box plots), plus bar and density plots for all identification and spectral quality metrics.
  • fwhm.r: XIC-based chromatographic peak FWHM via rawrr::readChromatogram(), with per-file RDS caching.
  • spectral_quality.r: AGC fill fraction, MS2 TIC, MS2 S/N, and fragment count from Thermo .raw files via rawrr::readSpectrum(), with per-file RDS caching.
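The deduplication step in load_psm_data() can be illustrated with a minimal sketch. It is written in Python for illustration and is not the repository's R code. Only Identifying.Node.No is named in this README; the other key columns (Spectrum.File, First.Scan) and the keep-first rule are assumptions about how a PD export might be keyed after R-style column-name conversion.

```python
from typing import Iterable

# Hypothetical key: only Identifying.Node.No is confirmed by this README;
# Spectrum.File and First.Scan are assumed PD column names after R-style
# make.names() conversion (spaces become dots).
DEDUP_KEY = ("Spectrum.File", "First.Scan", "Identifying.Node.No")


def deduplicate_psms(rows: Iterable[dict], key: tuple = DEDUP_KEY) -> list:
    """Keep the first PSM seen for each key; later rows with the same key are dropped."""
    seen = set()
    unique = []
    for row in rows:
        row_key = tuple(row.get(col) for col in key)
        if row_key not in seen:
            seen.add(row_key)
            unique.append(row)
    return unique


psms = [
    # The same scan appearing twice after merging two exports.
    {"Spectrum.File": "Yeast_90min_Top10_2.raw", "First.Scan": 1042, "Identifying.Node.No": 1},
    {"Spectrum.File": "Yeast_90min_Top10_2.raw", "First.Scan": 1042, "Identifying.Node.No": 1},
    # The same scan identified by a second search node (e.g., a dynamic TMT search).
    {"Spectrum.File": "Yeast_90min_Top10_2.raw", "First.Scan": 1042, "Identifying.Node.No": 2},
]

print(len(deduplicate_psms(psms)))  # 2: one row per search node
print(len(deduplicate_psms(psms, key=("Spectrum.File", "First.Scan"))))  # 1: nodes collapsed
```

Dropping Identifying.Node.No from the key, as in the last line, merges the two search nodes into a single PSM, which is exactly what including the node column is meant to prevent.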

File naming convention

Spectrum file names must follow this pattern:

<prefix>_<LC_gradient>_<MS_method>_<replicate>.raw

Examples: BSA_90min_Top10_1.raw, HeLa_120min_Top15_3.raw, Yeast_90min_Top10_2.raw

Method metadata (LC gradient length, MS method, replicate number) are parsed automatically. No manual configuration is required when new files follow this convention.
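A minimal Python sketch of how such names can be split into metadata is shown below. It is not the pattern used inside load_psm_data(). The names parse_run_name() and RunInfo are hypothetical. The pattern assumes that the last three underscore-separated fields are always gradient, method, and replicate, so the prefix itself may contain underscores.

```python
import re
from typing import NamedTuple, Optional

# Greedy prefix: backtracking leaves exactly three underscore-free fields
# (gradient, method, numeric replicate) before the .raw extension.
FILE_PATTERN = re.compile(
    r"^(?P<prefix>.+)_(?P<gradient>[^_]+)_(?P<method>[^_]+)_(?P<replicate>\d+)\.raw$"
)


class RunInfo(NamedTuple):
    prefix: str
    lc_gradient: str
    ms_method: str
    replicate: int


def parse_run_name(filename: str) -> Optional[RunInfo]:
    """Return run metadata, or None if the name does not follow the convention."""
    match = FILE_PATTERN.match(filename)
    if match is None:
        return None
    return RunInfo(match["prefix"], match["gradient"], match["method"], int(match["replicate"]))


for name in ["BSA_90min_Top10_1.raw", "HeLa_120min_Top15_3.raw", "notes.raw"]:
    print(name, parse_run_name(name))
```

Returning None instead of raising an exception lets a caller list every non-conforming file in one pass rather than stopping at the first one.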


Requirements

Easiest setup — JupyterLab Docker Environment

The quickest way to get a fully configured environment with all required packages, JupyterLab, and .NET pre-installed is to use the accompanying Docker image:

SamThilmany/JupyterLab-with-R_Docker-Environment

Manual installation — R packages

Package | Source | Purpose
tidyverse | CRAN | Data manipulation and ggplot2 plotting
patchwork | CRAN | Multi-panel figure assembly
scales | CRAN | Axis label formatting
ggrepel | CRAN | Non-overlapping text labels (volcano plots)
conflicted | CRAN | Explicit resolution of namespace conflicts
IRdisplay | CRAN | Inline HTML display in JupyterLab
rawrr | Bioconductor | Reading Thermo .raw files
MSstatsTMT | Bioconductor | TMT protein summarization and group comparison

Install CRAN packages with install.packages(); install Bioconductor packages with BiocManager::install() (BiocManager itself is installed from CRAN).

Raw file access

rawrr ≥ 1.15.3 requires the .NET runtime on all platforms (macOS, Linux, and Windows; Mono support was removed in 1.15.3). Follow the rawrr installation instructions at https://bioconductor.org/packages/rawrr.


Adding new data

  1. Place Proteome Discoverer PSM export .txt files in the relevant data/ directory.
  2. Place the corresponding Thermo .raw files in the same data/ directory (they are excluded from version control by .gitignore).
  3. Re-run the notebook — load_psm_data() picks up all .txt files in data/ automatically.
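Before re-running a notebook, a quick check of data/ can catch .raw files that break the naming convention. The Python sketch below is hypothetical: preflight() is not part of the helper library. It uses a simplified form of the <prefix>_<LC_gradient>_<MS_method>_<replicate>.raw pattern, and the export name BSA_PSMs.txt in the usage example is made up.

```python
import re
import tempfile
from pathlib import Path

# Simplified check for <prefix>_<LC_gradient>_<MS_method>_<replicate>.raw
RAW_NAME = re.compile(r"^.+_[^_]+_[^_]+_\d+\.raw$")


def preflight(data_dir: Path) -> dict:
    """List PSM exports and .raw files in data_dir and flag misnamed .raw files."""
    exports = sorted(p.name for p in data_dir.glob("*.txt"))
    raws = sorted(p.name for p in data_dir.glob("*.raw"))
    return {
        "psm_exports": exports,
        "raw_files": raws,
        "misnamed_raw": [r for r in raws if not RAW_NAME.match(r)],
    }


# Usage: build a throwaway data/ directory with one correctly named and one misnamed run.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    for name in ["BSA_PSMs.txt", "BSA_90min_Top10_1.raw", "BSA_run1.raw"]:
        (data / name).touch()
    print(preflight(data))
```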

About

LC-MS method optimization for Q Exactive: BSA, HeLa, and TMT11plex yeast reference standards
