High-performance Rust parser for NASDAQ TotalView-ITCH 5.0 binary protocol with Parquet output.
- Fast: Parses 400M+ messages in minutes (5.6 GB compressed → 8.6 GB Parquet)
- Complete: All 21 ITCH 5.0 message types supported
- Streaming: Memory-efficient with 256 MB chunk processing
- Validated: Exact field-level match with a reference Python implementation
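Chunked streaming has one subtle point: a length-prefixed ITCH message can straddle two chunks. Below is a minimal sketch of chunked reading that carries leftover bytes forward. It assumes the framing used by NASDAQ's binary ITCH files, where each message is preceded by a 2-byte big-endian length. `for_each_message` and its `chunk_size` parameter are hypothetical and are not this parser's API. For the configuration described above, `chunk_size` would be 256 MB, and the reader could be any `Read` source, such as a gzip decoder.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not the parser's API). Assumes each ITCH message is
/// preceded by a 2-byte big-endian length. Calls `on_msg` once per complete
/// message body and returns the number of messages seen.
fn for_each_message<R: Read, F: FnMut(&[u8])>(
    mut reader: R,
    chunk_size: usize,
    mut on_msg: F,
) -> io::Result<u64> {
    // Room for one chunk plus the largest possible partial message.
    let mut buf: Vec<u8> = Vec::with_capacity(chunk_size + u16::MAX as usize + 2);
    let mut chunk = vec![0u8; chunk_size];
    let mut count = 0u64;
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break; // end of input
        }
        buf.extend_from_slice(&chunk[..n]);
        // Consume every complete message currently buffered.
        let mut pos = 0;
        while buf.len() - pos >= 2 {
            let len = u16::from_be_bytes([buf[pos], buf[pos + 1]]) as usize;
            if buf.len() - pos - 2 < len {
                break; // the rest of this message arrives in the next chunk
            }
            on_msg(&buf[pos + 2..pos + 2 + len]);
            count += 1;
            pos += 2 + len;
        }
        // Keep only the unconsumed tail (a partial message, if any).
        buf.drain(..pos);
    }
    if !buf.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "input ends with a partial message",
        ));
    }
    Ok(count)
}
```

Because only the unconsumed tail is carried forward, memory stays at about two chunk-sized buffers, however large the file is.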
```bash
git clone https://github.com/ml4t/itch-parser.git
cd itch-parser
cargo build --release
```

See Releases for Linux/macOS/Windows binaries.
```bash
# Parse a gzipped ITCH file
./target/release/itch_parser data.itch.gz ./output 01302020

# Arguments:
#   <input_file>  Path to ITCH file (supports .gz)
#   <output_dir>  Directory for Parquet output
#   <MMDDYYYY>    Trade date for timestamp conversion
```

Output directory layout:

```
output/
├── A/ # Add Order (no MPID attribution)
│ ├── part-0.parquet
│ ├── part-1.parquet
│ └── ...
├── D/ # Order Delete
├── E/ # Order Executed
├── F/ # Add Order (with MPID)
├── P/ # Trade (non-cross)
├── U/ # Order Replace
├── X/ # Order Cancel
└── ... # All 21 message types
```
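The `<MMDDYYYY>` argument anchors ITCH's nanoseconds-since-midnight timestamps to a calendar day. The sketch below shows one way to do that. It makes a simplifying assumption, labeled in the code: midnight is treated as UTC midnight. ITCH midnight is actually US Eastern time, so a real conversion would also apply the UTC offset. The helper names are hypothetical. The parser depends on `chrono` and may do this differently.

```rust
/// Hypothetical helper: parses "MMDDYYYY" (e.g. "01302020") into (year, month, day).
/// Only basic range checks; days per month are not validated.
fn parse_trade_date(s: &str) -> Option<(i64, u32, u32)> {
    if s.len() != 8 || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let month: u32 = s[0..2].parse().ok()?;
    let day: u32 = s[2..4].parse().ok()?;
    let year: i64 = s[4..8].parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let m = i64::from(m);
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March = 0
    let doy = (153 * mp + 2) / 5 + i64::from(d) - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Combines the trade date with an ITCH timestamp (nanoseconds since midnight).
/// Assumption: midnight is treated as UTC midnight (no Eastern-time offset).
fn to_epoch_nanos(trade_date: &str, nanos_since_midnight: u64) -> Option<i64> {
    let (y, m, d) = parse_trade_date(trade_date)?;
    let day_start = days_from_civil(y, m, d) * 86_400 * 1_000_000_000;
    Some(day_start + nanos_since_midnight as i64)
}
```

For the example date, `to_epoch_nanos("01302020", 0)` returns the nanosecond value of 2020-01-30 00:00:00 UTC.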
| Type | Name | Description |
|---|---|---|
| Order Book | ||
| A | Add Order | New order (no attribution) |
| F | Add Order MPID | New order with market maker ID |
| D | Order Delete | Order removed from book |
| E | Order Executed | Order matched |
| C | Order Executed with Price | Order matched at different price |
| X | Order Cancel | Partial cancel |
| U | Order Replace | Modify order (price/size) |
| Trades | ||
| P | Trade | Non-cross trade message |
| Q | Cross Trade | Cross/auction trade |
| B | Broken Trade | Trade broken/cancelled |
| System | ||
| S | System Event | Market open/close events |
| R | Stock Directory | Instrument definitions |
| H | Trading Action | Trading halt/resume |
| h | Operational Halt | Market-specific operational halt/resume |
| Y | Reg SHO | Short sale restriction |
| L | Market Participant Position | Market maker position |
| Auction | ||
| I | NOII | Net Order Imbalance Indicator |
| J | Auction Collar | LULD collar prices |
| K | IPO Quoting | IPO release info |
| Circuit Breaker | ||
| V | MWCB Decline Level | MWCB thresholds |
| W | MWCB Status | MWCB triggered |
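Each message starts with the one-byte type code listed in the table, and each type has a fixed length. The sketch below validates a message body against that length. The byte lengths are our reading of the ITCH 5.0 specification's message layouts, so check them against the spec revision you use. `expected_len` and `check_message` are hypothetical helpers, not part of this parser.

```rust
/// Hypothetical helper: fixed length in bytes (including the type byte) of
/// each of the 21 message types above, per the ITCH 5.0 message layouts.
fn expected_len(msg_type: u8) -> Option<usize> {
    let len = match msg_type {
        b'S' => 12, b'R' => 39, b'H' => 25, b'Y' => 20, b'L' => 26,
        b'V' => 35, b'W' => 12, b'K' => 28, b'J' => 35, b'h' => 21,
        b'A' => 36, b'F' => 40, b'E' => 31, b'C' => 36, b'X' => 23,
        b'D' => 19, b'U' => 35, b'P' => 44, b'Q' => 40, b'B' => 19,
        b'I' => 50,
        _ => return None, // type not in the table above
    };
    Some(len)
}

/// Checks that a message body (length prefix removed) has a known type
/// and exactly the length that type requires; returns the type byte.
fn check_message(msg: &[u8]) -> Result<u8, String> {
    let t = *msg.first().ok_or("empty message")?;
    match expected_len(t) {
        Some(n) if n == msg.len() => Ok(t),
        Some(n) => Err(format!("type {:?}: expected {} bytes, got {}", t as char, n, msg.len())),
        None => Err(format!("unknown message type {:?}", t as char)),
    }
}
```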
Add Order (A) columns:

| Column | Type | Description |
|---|---|---|
| timestamp | Int64 | Nanoseconds from midnight |
| stock_locate | UInt16 | Stock identifier |
| order_reference_number | UInt64 | Unique order ID |
| buy_sell_indicator | String | "B" or "S" |
| shares | UInt32 | Order quantity |
| stock | String | Stock symbol (8 chars) |
| price | UInt32 | Price × 10,000 |
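The columns above map directly onto the fixed byte layout of an Add Order message. That layout is 36 bytes, big-endian, in this order: type, stock locate, tracking number, 6-byte timestamp, order reference, side, shares, stock, and price. The sketch below decodes one message body. `AddOrder`, `be_uint`, and `decode_add_order` are illustrative names, not the parser's types. The tracking number is skipped because the table does not list it.

```rust
/// Illustrative struct (not the parser's type) holding the Add Order columns.
#[derive(Debug, PartialEq)]
struct AddOrder {
    stock_locate: u16,
    timestamp: u64, // nanoseconds since midnight (6 bytes on the wire)
    order_reference_number: u64,
    buy_sell_indicator: char, // 'B' or 'S'
    shares: u32,
    stock: String, // right-padded with spaces on the wire; trimmed here
    price: u32,    // price x 10,000
}

/// Interprets a big-endian byte slice (up to 8 bytes) as an unsigned integer.
fn be_uint(b: &[u8]) -> u64 {
    b.iter().fold(0u64, |acc, &x| (acc << 8) | u64::from(x))
}

/// Decodes a 36-byte Add Order body (length prefix already removed).
fn decode_add_order(msg: &[u8]) -> Option<AddOrder> {
    if msg.len() != 36 || msg[0] != b'A' {
        return None;
    }
    Some(AddOrder {
        stock_locate: be_uint(&msg[1..3]) as u16,
        // msg[3..5] is the tracking number, not among the columns above.
        timestamp: be_uint(&msg[5..11]),
        order_reference_number: be_uint(&msg[11..19]),
        buy_sell_indicator: msg[19] as char,
        shares: be_uint(&msg[20..24]) as u32,
        stock: String::from_utf8_lossy(&msg[24..32]).trim_end().to_string(),
        price: be_uint(&msg[32..36]) as u32,
    })
}
```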
Trade (P) columns:

| Column | Type | Description |
|---|---|---|
| timestamp | Int64 | Nanoseconds from midnight |
| order_reference_number | UInt64 | Order that was executed |
| buy_sell_indicator | String | "B" or "S" |
| shares | UInt32 | Trade quantity |
| stock | String | Stock symbol |
| price | UInt32 | Trade price × 10,000 |
| match_number | UInt64 | Unique trade ID |
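Prices in both tables are fixed-point integers with four implied decimals, and timestamps are nanoseconds since midnight. The sketch below converts both for display. It uses integer arithmetic so that no floating-point rounding creeps in. `format_price` and `format_time` are hypothetical helpers.

```rust
/// Hypothetical helper: formats a price stored as price x 10,000.
fn format_price(raw: u32) -> String {
    format!("{}.{:04}", raw / 10_000, raw % 10_000)
}

/// Hypothetical helper: formats nanoseconds since midnight as HH:MM:SS.nnnnnnnnn.
fn format_time(nanos: u64) -> String {
    let secs = nanos / 1_000_000_000;
    let frac = nanos % 1_000_000_000;
    format!("{:02}:{:02}:{:02}.{:09}", secs / 3600, (secs / 60) % 60, secs % 60, frac)
}
```

For example, a raw price of `3_245_000` formats as `324.5000`, and a timestamp of `34_200_000_000_000` formats as `09:30:00.000000000`.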
Tested on NASDAQ TotalView-ITCH data (January 30, 2020):
| Metric | Value |
|---|---|
| Input size | 5.6 GB (compressed) |
| Output size | 8.6 GB (Parquet) |
| Messages parsed | 417,149,497 |
| Processing time | ~3 minutes |
| Throughput | ~31 MB/s compressed (~2.3M messages/s) |
Largest message types by count:

| Type | Count | Description |
|---|---|---|
| A | 184,735,355 | Add orders |
| D | 180,285,101 | Delete orders |
| U | 36,777,372 | Replace orders |
| E | 8,415,610 | Executions |
| X | 4,990,972 | Cancels |
| P | 1,779,727 | Trades |
Validated against Python reference implementation with exact match on all fields:
```
Type   Python Rows    Rust Rows      Match
--------------------------------------------------
A      184,735,355    184,735,355    ✓
D      180,285,101    180,285,101    ✓
E        8,415,610      8,415,610    ✓
P        1,779,727      1,779,727    ✓
X        4,990,972      4,990,972    ✓
U       36,777,372     36,777,372    ✓
```
Schema columns and data types match exactly between implementations.
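The row-count check above amounts to comparing two per-type count maps. The sketch below is a hypothetical helper, `compare_counts`. It is not the project's validation code, which compares against a Python reference.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: returns (type, left_count, right_count) for every
/// message type whose row counts differ. A type missing on one side counts as 0.
fn compare_counts(
    left: &BTreeMap<char, u64>,
    right: &BTreeMap<char, u64>,
) -> Vec<(char, u64, u64)> {
    let mut types: Vec<char> = left.keys().chain(right.keys()).copied().collect();
    types.sort_unstable();
    types.dedup();
    types
        .into_iter()
        .filter_map(|t| {
            let l = left.get(&t).copied().unwrap_or(0);
            let r = right.get(&t).copied().unwrap_or(0);
            (l != r).then_some((t, l, r))
        })
        .collect()
}
```

An empty result corresponds to a column of ✓ marks in the table above.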
Official source: NASDAQ TotalView-ITCH
Alternative source with free samples: databento.com/pcaps
- Nasdaq TotalView-ITCH 5.0: Available from 2018-05-01
- Includes raw network packets for protocol edge case testing

To get a sample:

1. Visit databento.com/pcaps.
2. Find the "Nasdaq TotalView-ITCH 5.0" section.
3. Download a sample file (e.g., `ny4-xnas-tvitch-a-20230822T133000.pcap.zst`).
Sample files are zstd-compressed PCAP with:
- Nanosecond-resolution timestamps
- MoldUDP64 transport layer
- Raw ITCH 5.0 binary messages
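MoldUDP64 wraps ITCH messages in UDP packets. Per the MoldUDP64 specification, each downstream packet has a 20-byte header: a 10-byte session, an 8-byte big-endian sequence number for the first message, and a 2-byte message count. That many message blocks follow, each a 2-byte big-endian length plus the message bytes. A count of 0 marks a heartbeat, and 0xFFFF marks end of session. The sketch below unpacks one UDP payload. `MoldPacket` and `parse_moldudp64` are hypothetical names.

```rust
/// Hypothetical type: one MoldUDP64 downstream packet, borrowed from a UDP payload.
#[derive(Debug)]
struct MoldPacket<'a> {
    session: &'a [u8],       // 10-byte session identifier
    sequence: u64,           // sequence number of the first message
    messages: Vec<&'a [u8]>, // ITCH message bodies, length prefixes stripped
    end_of_session: bool,    // message count was 0xFFFF
}

fn parse_moldudp64(payload: &[u8]) -> Result<MoldPacket<'_>, String> {
    if payload.len() < 20 {
        return Err("payload shorter than the 20-byte MoldUDP64 header".into());
    }
    let session = &payload[0..10];
    let sequence = payload[10..18].iter().fold(0u64, |a, &b| (a << 8) | u64::from(b));
    let count = u16::from_be_bytes([payload[18], payload[19]]);
    if count == 0xFFFF {
        return Ok(MoldPacket { session, sequence, messages: Vec::new(), end_of_session: true });
    }
    // count == 0 is a heartbeat and simply yields no messages.
    let mut messages = Vec::with_capacity(count as usize);
    let mut pos = 20;
    for _ in 0..count {
        if pos + 2 > payload.len() {
            return Err("truncated message length".into());
        }
        let len = u16::from_be_bytes([payload[pos], payload[pos + 1]]) as usize;
        pos += 2;
        if pos + len > payload.len() {
            return Err("truncated message body".into());
        }
        messages.push(&payload[pos..pos + len]);
        pos += len;
    }
    Ok(MoldPacket { session, sequence, messages, end_of_session: false })
}
```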
Use the validation script to test the parser against PCAP samples:
```bash
# Decompress if needed
zstd -d ny4-xnas-tvitch-a-20230822T133000.pcap.zst

# Run validation
cd tests
python validate_pcap.py ../ny4-xnas-tvitch-a-20230822T133000.pcap --trade-date 08222023
```

The script:
- Extracts ITCH messages from PCAP (handles MoldUDP64, VLAN tags)
- Runs the Rust parser on extracted data
- Validates message counts match exactly
- Spot-checks field content (order refs, prices, symbols)
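Extracting ITCH messages from PCAP starts by peeling headers off each captured frame to reach the UDP payload, which then goes to MoldUDP64 parsing. The sketch below handles Ethernet II frames carrying IPv4/UDP, with zero or more VLAN tags (EtherType 0x8100, or 0x88A8 treated the same way). PCAP file and record headers, IP fragmentation, and IPv6 are out of scope. `udp_payload` is a hypothetical Rust helper and not the validation script's code, which is Python.

```rust
/// Hypothetical helper: returns the UDP payload of an Ethernet frame, skipping
/// VLAN tags. Returns None for non-IPv4, non-UDP, or truncated frames.
fn udp_payload(frame: &[u8]) -> Option<&[u8]> {
    let mut pos = 12; // skip destination and source MAC addresses
    let mut ethertype = u16::from_be_bytes([*frame.get(pos)?, *frame.get(pos + 1)?]);
    // Each VLAN tag is 4 bytes (TPID + TCI); the real EtherType follows it.
    while ethertype == 0x8100 || ethertype == 0x88A8 {
        pos += 4;
        ethertype = u16::from_be_bytes([*frame.get(pos)?, *frame.get(pos + 1)?]);
    }
    pos += 2; // past the EtherType field
    if ethertype != 0x0800 {
        return None; // not IPv4
    }
    let ip = frame.get(pos..)?;
    let first = *ip.first()?;
    if first >> 4 != 4 {
        return None; // IP version field is not 4
    }
    let ihl = usize::from(first & 0x0F) * 4; // IPv4 header length in bytes
    if ihl < 20 || *ip.get(9)? != 17 {
        return None; // malformed header, or protocol is not UDP
    }
    let udp = ip.get(ihl..)?;
    // The UDP length field (header + payload) trims any Ethernet padding.
    let udp_len = usize::from(u16::from_be_bytes([*udp.get(4)?, *udp.get(5)?]));
    if udp_len < 8 {
        return None;
    }
    udp.get(8..udp_len)
}
```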
Tested against a DataBento sample (August 22, 2023, 9:30-9:40 AM ET):
| Metric | Value |
|---|---|
| PCAP size | 1.9 GB (474 MB compressed) |
| Messages | 20,288,210 |
| Duration | 10 minutes |
| Sequence gaps | 0 |
| Match rate | 100% |
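The "Sequence gaps" row can be checked from MoldUDP64 headers alone. Each packet's first sequence number should equal the previous packet's sequence number plus its message count. The sketch below works over (sequence, count) pairs. It assumes packets are in capture order and treats late or duplicate packets as overlaps rather than gaps. `find_gaps` is a hypothetical helper.

```rust
/// Hypothetical helper: given (first_sequence, message_count) per MoldUDP64
/// packet in capture order, returns missing ranges as half-open (start, end).
fn find_gaps(packets: &[(u64, u16)]) -> Vec<(u64, u64)> {
    let mut gaps = Vec::new();
    let mut expected: Option<u64> = None;
    for &(seq, count) in packets {
        if count == 0xFFFF {
            continue; // end-of-session marker carries no messages
        }
        if let Some(exp) = expected {
            if seq > exp {
                gaps.push((exp, seq)); // messages exp..seq never arrived
            }
        }
        let next = seq + u64::from(count);
        // Late or duplicate packets never move the expectation backwards.
        expected = Some(expected.map_or(next, |exp| exp.max(next)));
    }
    gaps
}
```

Heartbeats (count 0) carry the next expected sequence number, so they pass through without creating gaps.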
Message type breakdown:
| Type | Count | Match |
|---|---|---|
| A (Add Order) | 8,824,973 | ✓ |
| D (Delete) | 7,882,019 | ✓ |
| U (Replace) | 1,889,620 | ✓ |
| E (Execute) | 1,044,952 | ✓ |
| X (Cancel) | 305,668 | ✓ |
| P (Trade) | 266,399 | ✓ |
| Others | 74,579 | ✓ |
```toml
[dependencies]
arrow = "54.3"
parquet = { version = "54.3", features = ["arrow"] }
bytes = "1.11"
flate2 = "1.1"
chrono = "0.4"
anyhow = "1.0"
indicatif = "0.17"
once_cell = "1.21"
```

```bash
# Debug build
cargo build

# Release build (recommended)
cargo build --release

# Run tests
cargo test

# Run benchmarks
cargo bench
```

See also:

- ml4t/ml4t-code - Machine Learning for Trading code
- DataBento - Market data provider with harmonized feeds
License: MIT
Issues and PRs welcome. Please run cargo fmt and cargo clippy before submitting.
Developed as part of Machine Learning for Algorithmic Trading, 3rd Edition.