153 changes: 153 additions & 0 deletions CLI_MIGRATION.md
@@ -0,0 +1,153 @@
# Migrating to msgf-rust from Java MS-GF+

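msgf-rust silently accepts both the canonical, Rust-idiomatic CLI form (named values, kebab-case flags) and the legacy Java MS-GF+ form (numeric IDs and short flag names such as `--ntt`). Scripts that already pass legacy-form search parameters to msgf-rust keep working unchanged; Java-only flags such as `-s`, `-d`, and `-o` must be renamed as shown in Table A.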

This page is a quick-reference for porting commands. For the full CLI reference, see [`DOCS.md`](DOCS.md) §1.

## Table A — Java MS-GF+ flag → msgf-rust flag

| Java MS-GF+ | msgf-rust canonical | msgf-rust legacy alias |
|---|---|---|
| `-s <FILE>` | `--spectrum <FILE>` | — |
| `-d <FILE>` | `--database <FILE>` | — |
| `-o <FILE>` | `--output-pin <FILE>` | — |
| `-mod <FILE>` | `--mods <FILE>` | `--mod <FILE>` |
| `-t 20ppm` | `--precursor-tol-ppm 20` | — |
| `-ti -1,2` | `--isotope-error-min -1 --isotope-error-max 2` | — |
| `-m 3` (HCD) | `--fragmentation HCD` | `--fragmentation 3` |
| `-inst 3` (QExactive) | `--instrument QExactive` | `--instrument 3` |
| `-protocol 4` (TMT) | `--protocol TMT` | `--protocol 4` |
| `-ntt 2` (fully specific) | `--enzyme-specificity fully` | `--ntt 2` |
| `-tda 1` (target+decoy) | (omit — decoys always auto-generated) | — |
| `-e 1` (Trypsin) | (omit — Trypsin is the only enzyme) | — |
| `-outputFormat 1` (TSV) | `--output-tsv <FILE>` | — |
| `-thread N` | `--threads N` | — |
| `-minLength 6` | `--min-length 6` | — |
| `-maxLength 40` | `--max-length 40` | — |
| `-maxMissedCleavages 1` | `--max-missed-cleavages 1` | — |
| `-minNumPeaks 10` | `--min-peaks 10` | — |
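
Two rows in Table A change the shape of the value rather than just the flag name: `-t 20ppm` drops its unit suffix, and `-ti -1,2` splits into two flags. The shell sketch below shows one way a porting script might handle that. The function names `translate_tolerance` and `translate_isotope_range` are hypothetical helpers, not part of msgf-rust, and only symmetric ppm tolerances are handled because Table A lists no other form.

```bash
# Hypothetical helper: turn a Java-style "-t" value such as "20ppm" into
# the msgf-rust flag from Table A. Other units and asymmetric "a,b"
# tolerances are rejected rather than guessed at.
translate_tolerance() {
  case "$1" in
    *,*)  printf 'unsupported tolerance: %s\n' "$1" >&2; return 1 ;;
    *ppm) printf '%s %s\n' --precursor-tol-ppm "${1%ppm}" ;;
    *)    printf 'unsupported tolerance: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: split a Java-style "-ti" value such as "-1,2"
# into the min/max flag pair from Table A.
translate_isotope_range() {
  case "$1" in
    *,*) printf '%s %s %s %s\n' \
           --isotope-error-min "${1%%,*}" \
           --isotope-error-max "${1#*,}" ;;
    *)   printf 'expected MIN,MAX but got: %s\n' "$1" >&2; return 1 ;;
  esac
}

translate_tolerance 20ppm        # prints: --precursor-tol-ppm 20
translate_isotope_range -1,2     # prints: --isotope-error-min -1 --isotope-error-max 2
```

Rejecting unrecognized values keeps the script from silently producing a command that searches with the wrong tolerance.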

## Table B — Numeric-legacy → named values

| Flag | Legacy numeric | Canonical named |
|---|---|---|
| `--fragmentation` | `0` | `auto` |
| `--fragmentation` | `1` | `CID` |
| `--fragmentation` | `2` | `ETD` |
| `--fragmentation` | `3` | `HCD` |
| `--fragmentation` | `4` | `UVPD` |
| `--instrument` | `0` | `low-res` |
| `--instrument` | `1` | `high-res` |
| `--instrument` | `2` | `TOF` |
| `--instrument` | `3` | `QExactive` |
| `--protocol` | `0` | `auto` |
| `--protocol` | `1` | `phospho` |
| `--protocol` | `2` | `iTRAQ` |
| `--protocol` | `3` | `iTRAQ-phospho` |
| `--protocol` | `4` | `TMT` |
| `--protocol` | `5` | `standard` |
| `--enzyme-specificity` (alias: `--ntt`) | `0` | `non-specific` |
| `--enzyme-specificity` | `1` | `semi` |
| `--enzyme-specificity` | `2` | `fully` |

clap parses named values case-insensitively, so `--fragmentation hcd` works the same as `--fragmentation HCD`.
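
For scripts that still carry numeric IDs, Table B can be expressed as a lookup. This is a minimal sketch: `canonical_value` is a hypothetical helper name, and the mapping is copied from the table above rather than taken from the msgf-rust source.

```bash
# Hypothetical helper: map a flag plus its legacy numeric ID to the
# canonical named value listed in Table B.
canonical_value() {
  case "$1=$2" in
    --fragmentation=0) echo auto ;;
    --fragmentation=1) echo CID ;;
    --fragmentation=2) echo ETD ;;
    --fragmentation=3) echo HCD ;;
    --fragmentation=4) echo UVPD ;;
    --instrument=0)    echo low-res ;;
    --instrument=1)    echo high-res ;;
    --instrument=2)    echo TOF ;;
    --instrument=3)    echo QExactive ;;
    --protocol=0)      echo auto ;;
    --protocol=1)      echo phospho ;;
    --protocol=2)      echo iTRAQ ;;
    --protocol=3)      echo iTRAQ-phospho ;;
    --protocol=4)      echo TMT ;;
    --protocol=5)      echo standard ;;
    --enzyme-specificity=0|--ntt=0) echo non-specific ;;
    --enzyme-specificity=1|--ntt=1) echo semi ;;
    --enzyme-specificity=2|--ntt=2) echo fully ;;
    *) printf 'no Table B entry for %s %s\n' "$1" "$2" >&2; return 1 ;;
  esac
}

canonical_value --protocol 4   # prints: TMT
canonical_value --ntt 2        # prints: fully
```

Because msgf-rust accepts the numeric forms directly, a lookup like this is only useful for making old scripts more readable, not for making them run.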

## Worked examples

### (a) Plain Trypsin DDA, 20 ppm precursor tolerance

**Java MS-GF+:**

```bash
java -Xmx4G -jar MSGFPlus.jar \
  -s spectra.mzML \
  -d uniprot.fasta \
  -tda 1 \
  -t 20ppm \
  -ti -1,2 \
  -o results.pin
```

**msgf-rust (canonical):**

```bash
msgf-rust \
  --spectrum spectra.mzML \
  --database uniprot.fasta \
  --precursor-tol-ppm 20 \
  --isotope-error-min -1 --isotope-error-max 2 \
  --output-pin results.pin
```

**msgf-rust (legacy-form, drop-in for existing quantms scripts):**

The Java command above cannot be passed to msgf-rust verbatim: `-s`, `-d`, and `-o` are Java-only and must be replaced with `--spectrum`, `--database`, and `--output-pin`. The search-parameter values do carry over. For example, an existing quantms script that calls msgf-rust with `--fragmentation 3 --instrument 3 --protocol 4` keeps working unchanged.
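
As an illustration, a legacy-form command might look like the sketch below. It combines the numeric IDs from Table B with the `--mod` alias from Table A; the file names are placeholders borrowed from example (b). The `run` wrapper is a hypothetical dry-run helper that prints the command instead of executing it, so the snippet can be tried without msgf-rust installed.

```bash
# Hypothetical dry-run helper: print the command line instead of running it.
# Replace the body with "$@" to execute the command for real.
run() { printf '%s\n' "$*"; }

# Legacy-form values: --instrument 3 (QExactive), --fragmentation 3 (HCD),
# --protocol 4 (TMT), and --mod as the legacy alias of --mods.
run msgf-rust \
  --spectrum tmt_spectra.mzML \
  --database hsapiens.fasta \
  --precursor-tol-ppm 20 \
  --instrument 3 \
  --fragmentation 3 \
  --protocol 4 \
  --mod tmt_mods.txt \
  --output-pin results.pin
```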

### (b) TMT 10-plex search

**Java MS-GF+:**

```bash
java -Xmx8G -jar MSGFPlus.jar \
  -s tmt_spectra.mzML \
  -d hsapiens.fasta \
  -tda 1 \
  -t 20ppm \
  -inst 3 \
  -m 3 \
  -protocol 4 \
  -mod tmt_mods.txt \
  -o results.pin
```

**msgf-rust:**

```bash
msgf-rust \
  --spectrum tmt_spectra.mzML \
  --database hsapiens.fasta \
  --precursor-tol-ppm 20 \
  --instrument QExactive \
  --fragmentation HCD \
  --protocol TMT \
  --mods tmt_mods.txt \
  --output-pin results.pin
```

### (c) Phospho STY search

**Java MS-GF+:**

```bash
java -Xmx4G -jar MSGFPlus.jar \
  -s phospho.mzML \
  -d uniprot.fasta \
  -tda 1 \
  -t 10ppm \
  -inst 1 \
  -m 3 \
  -protocol 1 \
  -mod phospho_mods.txt \
  -o results.pin
```

**msgf-rust:**

```bash
msgf-rust \
  --spectrum phospho.mzML \
  --database uniprot.fasta \
  --precursor-tol-ppm 10 \
  --instrument high-res \
  --fragmentation HCD \
  --protocol phospho \
  --mods phospho_mods.txt \
  --output-pin results.pin
```

## Notes

- `-tda 1` (target+decoy database analysis) is always on in msgf-rust — decoys are generated by reversing target sequences at search time. The decoy prefix is configurable via `--decoy-prefix` (default `XXX_`).
- The Java `-e` enzyme flag is not exposed; Trypsin is hardcoded. For non-tryptic searches, use a custom `.param` file via `--param-file`.
- mzXML, MS2, PKL, and `_dta.txt` inputs are not supported. Use mzML or MGF.
- mzIdentML output is not supported. Use PIN (with Percolator) or TSV.
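
The input-format note above can be checked before a long search starts. The sketch below is one possible pre-flight check; `check_input_format` is a hypothetical helper, and matching on the file extension is an assumption (msgf-rust itself may detect formats differently).

```bash
# Hypothetical pre-flight check based on the file extension only.
# Returns 0 for mzML/MGF, 1 for formats the notes list as unsupported,
# and 2 for anything else.
check_input_format() {
  # Lowercase the name so "run.MZML" and "run.mzML" are treated alike.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.mzml|*.mgf)
      return 0 ;;
    *.mzxml|*.ms2|*.pkl|*_dta.txt)
      printf '%s: unsupported input; convert to mzML or MGF\n' "$1" >&2
      return 1 ;;
    *)
      printf '%s: unrecognized extension\n' "$1" >&2
      return 2 ;;
  esac
}

check_input_format spectra.mzML && echo "ok: spectra.mzML"
```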