ml4t/itch-parser

NASDAQ ITCH 5.0 Parser

High-performance Rust parser for NASDAQ TotalView-ITCH 5.0 binary protocol with Parquet output.

Features

  • Fast: Parses 417M messages in about 3 minutes (5.6 GB compressed → 8.6 GB Parquet)
  • Complete: All 21 ITCH 5.0 message types supported
  • Streaming: Memory-efficient with 256 MB chunk processing
  • Validated: Byte-for-byte match with reference Python implementation
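
As a rough illustration of the streaming idea, the sketch below reads one message at a time into a reused buffer, so memory use stays flat regardless of file size. It assumes the common file layout where each message is preceded by a 2-byte big-endian length. The function names `read_message` and `count_by_type` are hypothetical, and this does not reproduce the parser's own 256 MB chunking.

```rust
use std::io::{self, Read};

/// Reads one length-prefixed message into `buf`.
/// Returns Ok(false) on a clean end of stream before a new message starts.
/// Assumption: each message is preceded by a 2-byte big-endian length.
fn read_message<R: Read>(reader: &mut R, buf: &mut Vec<u8>) -> io::Result<bool> {
    let mut len_bytes = [0u8; 2];
    // Read the first length byte on its own so a clean EOF can be told
    // apart from a stream that ends in the middle of a message.
    if reader.read(&mut len_bytes[..1])? == 0 {
        return Ok(false);
    }
    reader.read_exact(&mut len_bytes[1..])?;
    let len = u16::from_be_bytes(len_bytes) as usize;
    buf.resize(len, 0);
    reader.read_exact(buf)?;
    Ok(true)
}

/// Counts messages per type byte (the first byte of each payload).
fn count_by_type<R: Read>(mut reader: R) -> io::Result<[u64; 256]> {
    let mut counts = [0u64; 256];
    let mut buf = Vec::new(); // reused for every message
    while read_message(&mut reader, &mut buf)? {
        if let Some(&type_byte) = buf.first() {
            counts[type_byte as usize] += 1;
        }
    }
    Ok(counts)
}
```

In practice the reader would be wrapped in a `std::io::BufReader` (and a gzip decoder for `.gz` input) so the small reads do not turn into small system calls.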

Installation

From Source

git clone https://github.com/ml4t/itch-parser.git
cd itch-parser
cargo build --release

Pre-built Binaries

See Releases for Linux/macOS/Windows binaries.

Usage

# Parse gzipped ITCH file
./target/release/itch_parser data.itch.gz ./output 01302020

# Arguments:
#   <input_file>   - Path to ITCH file (supports .gz)
#   <output_dir>   - Directory for Parquet output
#   <MMDDYYYY>     - Trade date for timestamp conversion
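
The trade date argument is easy to get wrong (for example by passing YYYYMMDD). A minimal sketch of validating the MMDDYYYY form is shown below. `parse_trade_date` is a hypothetical helper, not a function from this repository, and how the parser itself validates the argument is not documented here.

```rust
/// Parses a trade date in MMDDYYYY form into (year, month, day).
/// Hypothetical helper for illustration only.
fn parse_trade_date(s: &str) -> Result<(u16, u8, u8), String> {
    if s.len() != 8 || !s.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("expected 8 digits (MMDDYYYY), got {:?}", s));
    }
    // All bytes are ASCII digits, so byte slicing is safe here.
    let month: u8 = s[0..2].parse().map_err(|_| "bad month".to_string())?;
    let day: u8 = s[2..4].parse().map_err(|_| "bad day".to_string())?;
    let year: u16 = s[4..8].parse().map_err(|_| "bad year".to_string())?;

    if !(1..=12).contains(&month) {
        return Err(format!("month out of range: {}", month));
    }
    let leap = year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    let days_in_month = match month {
        4 | 6 | 9 | 11 => 30,
        2 => {
            if leap {
                29
            } else {
                28
            }
        }
        _ => 31,
    };
    if day == 0 || day > days_in_month {
        return Err(format!("day out of range: {}", day));
    }
    Ok((year, month, day))
}
```

With this sketch, `parse_trade_date("01302020")` yields `Ok((2020, 1, 30))`, while `"2020-01-30"` and `"13012020"` are rejected.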

Output Structure

output/
├── A/          # Add Order (no MPID attribution)
│   ├── part-0.parquet
│   ├── part-1.parquet
│   └── ...
├── D/          # Order Delete
├── E/          # Order Executed
├── F/          # Add Order (with MPID)
├── P/          # Trade (non-cross)
├── U/          # Order Replace
├── X/          # Order Cancel
└── ...         # All 21 message types
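
When consuming this layout, part files should be ordered by their numeric index, since a plain string sort would place part-10 before part-2. The sketch below shows one way to build and order these paths. `part_path` and `part_index` are hypothetical helpers that assume the `part-<n>.parquet` naming shown above.

```rust
use std::path::{Path, PathBuf};

/// Builds `<output_dir>/<type>/part-<index>.parquet`.
fn part_path(output_dir: &Path, msg_type: char, index: usize) -> PathBuf {
    output_dir
        .join(msg_type.to_string())
        .join(format!("part-{}.parquet", index))
}

/// Extracts the numeric index from a `part-<n>.parquet` file name.
fn part_index(file_name: &str) -> Option<usize> {
    file_name
        .strip_prefix("part-")?
        .strip_suffix(".parquet")?
        .parse()
        .ok()
}

/// Sorts file names by part index; names that do not match sort first.
fn sort_parts(names: &mut Vec<String>) {
    names.sort_by_key(|name| part_index(name));
}
```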

Message Types

| Category | Type | Name | Description |
|---|---|---|---|
| Order Book | A | Add Order | New order (no attribution) |
| Order Book | F | Add Order MPID | New order with market maker ID |
| Order Book | D | Order Delete | Order removed from book |
| Order Book | E | Order Executed | Order matched |
| Order Book | C | Order Executed with Price | Order matched at different price |
| Order Book | X | Order Cancel | Partial cancel |
| Order Book | U | Order Replace | Modify order (price/size) |
| Trades | P | Trade | Non-cross trade message |
| Trades | Q | Cross Trade | Cross/auction trade |
| Trades | B | Broken Trade | Trade broken/cancelled |
| System | S | System Event | Market open/close events |
| System | R | Stock Directory | Instrument definitions |
| System | H | Trading Action | Trading halt/resume |
| System | h | Operational Halt | Operational halt on a specific market center |
| System | Y | Reg SHO | Short sale restriction |
| System | L | Market Participant Position | Market maker position |
| Auction | I | NOII | Net Order Imbalance Indicator |
| Auction | J | Auction Collar | LULD collar prices |
| Auction | K | IPO Quoting | IPO release info |
| Circuit Breaker | V | MWCB Decline Level | MWCB thresholds |
| Circuit Breaker | W | MWCB Status | MWCB triggered |
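
The message types listed above can be mirrored in code as a lookup on the type byte, which is the first byte of every message. The sketch below is illustrative only: the `Category` enum and `classify` function are hypothetical names, not the parser's actual types. Note that `H` and `h` are distinct types, so the match is case-sensitive.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    OrderBook,
    Trades,
    System,
    Auction,
    CircuitBreaker,
}

/// Maps a message type byte to its category and name.
/// Returns None for bytes that are not among the 21 listed types.
fn classify(type_byte: u8) -> Option<(Category, &'static str)> {
    Some(match type_byte {
        b'A' => (Category::OrderBook, "Add Order"),
        b'F' => (Category::OrderBook, "Add Order MPID"),
        b'D' => (Category::OrderBook, "Order Delete"),
        b'E' => (Category::OrderBook, "Order Executed"),
        b'C' => (Category::OrderBook, "Order Executed with Price"),
        b'X' => (Category::OrderBook, "Order Cancel"),
        b'U' => (Category::OrderBook, "Order Replace"),
        b'P' => (Category::Trades, "Trade"),
        b'Q' => (Category::Trades, "Cross Trade"),
        b'B' => (Category::Trades, "Broken Trade"),
        b'S' => (Category::System, "System Event"),
        b'R' => (Category::System, "Stock Directory"),
        b'H' => (Category::System, "Trading Action"),
        b'h' => (Category::System, "Operational Halt"),
        b'Y' => (Category::System, "Reg SHO"),
        b'L' => (Category::System, "Market Participant Position"),
        b'I' => (Category::Auction, "NOII"),
        b'J' => (Category::Auction, "Auction Collar"),
        b'K' => (Category::Auction, "IPO Quoting"),
        b'V' => (Category::CircuitBreaker, "MWCB Decline Level"),
        b'W' => (Category::CircuitBreaker, "MWCB Status"),
        _ => return None,
    })
}
```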

Parquet Schema Examples

Add Order (Type A)

| Column | Type | Description |
|---|---|---|
| timestamp | Int64 | Nanoseconds from midnight |
| stock_locate | UInt16 | Stock identifier |
| order_reference_number | UInt64 | Unique order ID |
| buy_sell_indicator | String | "B" or "S" |
| shares | UInt32 | Order quantity |
| stock | String | Stock symbol (8 chars) |
| price | UInt32 | Price × 10,000 |
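
To show where these columns come from, here is a minimal sketch of decoding a single Add Order message into the fields above. It assumes the 36-byte big-endian layout of the ITCH 5.0 Add Order message (type, stock locate, tracking number, 6-byte timestamp, order reference, side, shares, 8-byte symbol, price). The `AddOrder` struct and `parse_add_order` function are hypothetical, and trimming the space-padded symbol is a choice made here, not a documented behavior of the parser.

```rust
#[derive(Debug, PartialEq)]
struct AddOrder {
    stock_locate: u16,
    timestamp: i64, // nanoseconds from midnight
    order_reference_number: u64,
    buy_sell_indicator: char,
    shares: u32,
    stock: String,
    price: u32, // price × 10,000
}

/// Interprets up to 8 bytes as a big-endian unsigned integer.
fn be_uint(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| (acc << 8) | b as u64)
}

/// Decodes one Add Order ('A') message; returns None on a wrong type or length.
fn parse_add_order(msg: &[u8]) -> Option<AddOrder> {
    if msg.len() != 36 || msg[0] != b'A' {
        return None;
    }
    Some(AddOrder {
        stock_locate: be_uint(&msg[1..3]) as u16,
        // Bytes 3..5 hold the tracking number, which is not part of the schema above.
        timestamp: be_uint(&msg[5..11]) as i64, // 48-bit field
        order_reference_number: be_uint(&msg[11..19]),
        buy_sell_indicator: msg[19] as char,
        shares: be_uint(&msg[20..24]) as u32,
        stock: String::from_utf8_lossy(&msg[24..32]).trim_end().to_string(),
        price: be_uint(&msg[32..36]) as u32,
    })
}
```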

Trade (Type P)

| Column | Type | Description |
|---|---|---|
| timestamp | Int64 | Nanoseconds from midnight |
| order_reference_number | UInt64 | Order that was executed |
| buy_sell_indicator | String | "B" or "S" |
| shares | UInt32 | Trade quantity |
| stock | String | Stock symbol |
| price | UInt32 | Trade price × 10,000 |
| match_number | UInt64 | Unique trade ID |
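
Both schemas store prices as integers scaled by 10,000 and timestamps as nanoseconds from midnight. A small sketch of turning those raw values into readable form follows. The helper names are hypothetical, and the timestamp helper assumes a non-negative value.

```rust
/// Formats a raw price (price × 10,000) as a decimal string without
/// going through floating point, e.g. 1_234_500 becomes "123.4500".
fn format_price(raw: u32) -> String {
    format!("{}.{:04}", raw / 10_000, raw % 10_000)
}

/// Converts a raw price to f64 for numeric work.
fn price_to_f64(raw: u32) -> f64 {
    raw as f64 / 10_000.0
}

/// Formats nanoseconds from midnight as HH:MM:SS.nnnnnnnnn.
/// Assumes `ns` is non-negative.
fn format_time_of_day(ns: i64) -> String {
    let secs = ns / 1_000_000_000;
    format!(
        "{:02}:{:02}:{:02}.{:09}",
        secs / 3600,
        (secs / 60) % 60,
        secs % 60,
        ns % 1_000_000_000
    )
}
```

Keeping prices as integers until display time avoids rounding drift when comparing or aggregating them.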

Performance

Tested on NASDAQ TotalView-ITCH data (January 30, 2020):

| Metric | Value |
|---|---|
| Input size | 5.6 GB (compressed) |
| Output size | 8.6 GB (Parquet) |
| Messages parsed | 417,149,497 |
| Processing time | ~3 minutes |
| Throughput | ~120 MB/s (compressed) |

Message Distribution

| Type | Count | Description |
|---|---|---|
| A | 184,735,355 | Add orders |
| D | 180,285,101 | Delete orders |
| U | 36,777,372 | Replace orders |
| E | 8,415,610 | Executions |
| X | 4,990,972 | Cancels |
| P | 1,779,727 | Trades |

Validation

Validated against the Python reference implementation, with an exact match on all fields. Row counts per message type:

Type    Python Rows       Rust Rows    Match
--------------------------------------------------
A       184,735,355     184,735,355        ✓
D       180,285,101     180,285,101        ✓
E         8,415,610       8,415,610        ✓
P         1,779,727       1,779,727        ✓
X         4,990,972       4,990,972        ✓
U        36,777,372      36,777,372        ✓

Schema columns and data types match exactly between implementations.
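
A row-count comparison like the one above can be expressed as a diff over two per-type count maps. The sketch below is an illustration only, with the hypothetical name `diff_counts`. It is not the validation code used in this repository, and it checks counts rather than field contents.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns (type, reference_count, candidate_count) for every message type
/// whose row counts differ, including types present on only one side.
fn diff_counts(
    reference: &BTreeMap<char, u64>,
    candidate: &BTreeMap<char, u64>,
) -> Vec<(char, Option<u64>, Option<u64>)> {
    let keys: BTreeSet<char> = reference
        .keys()
        .chain(candidate.keys())
        .copied()
        .collect();
    let mut mismatches = Vec::new();
    for key in keys {
        let r = reference.get(&key).copied();
        let c = candidate.get(&key).copied();
        if r != c {
            mismatches.push((key, r, c));
        }
    }
    mismatches
}
```

An empty result means every type has the same row count in both outputs.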

Data Sources

NASDAQ Historical Data

Official source: NASDAQ TotalView-ITCH

DataBento PCAP Samples

Alternative source with free samples: databento.com/pcaps

  • Nasdaq TotalView-ITCH 5.0: Available from 2018-05-01
  • Includes raw network packets for protocol edge case testing

Downloading PCAP Samples

  1. Visit databento.com/pcaps
  2. Find "Nasdaq TotalView-ITCH 5.0" section
  3. Download a sample file (e.g., ny4-xnas-tvitch-a-20230822T133000.pcap.zst)

Sample files are zstd-compressed PCAP with:

  • Nanosecond-resolution timestamps
  • MoldUDP64 transport layer
  • Raw ITCH 5.0 binary messages
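
To make the transport layer concrete, the sketch below splits one MoldUDP64 payload into its ITCH messages. It assumes the standard downstream packet layout: a 10-byte session, an 8-byte big-endian sequence number, a 2-byte message count, then one 2-byte length plus data per message. `MoldPacket` and `parse_moldudp64` are hypothetical names, and stripping the Ethernet, VLAN, IP and UDP headers from the PCAP record is not shown.

```rust
#[derive(Debug, PartialEq)]
struct MoldPacket<'a> {
    session: &'a [u8],
    sequence_number: u64, // sequence number of the first message in the packet
    messages: Vec<&'a [u8]>,
}

/// Parses a MoldUDP64 UDP payload. Returns None if the payload is truncated.
fn parse_moldudp64<'a>(payload: &'a [u8]) -> Option<MoldPacket<'a>> {
    if payload.len() < 20 {
        return None;
    }
    let session = &payload[..10];
    let mut seq = [0u8; 8];
    seq.copy_from_slice(&payload[10..18]);
    let sequence_number = u64::from_be_bytes(seq);
    let count = u16::from_be_bytes([payload[18], payload[19]]);

    let mut messages = Vec::new();
    // A count of 0 is a heartbeat and 0xFFFF marks end of session;
    // neither carries message blocks.
    if count != 0xFFFF {
        let mut pos = 20usize;
        for _ in 0..count {
            let len_bytes = payload.get(pos..pos + 2)?;
            let len = u16::from_be_bytes([len_bytes[0], len_bytes[1]]) as usize;
            let start = pos + 2;
            messages.push(payload.get(start..start + len)?);
            pos = start + len;
        }
    }
    Some(MoldPacket {
        session,
        sequence_number,
        messages,
    })
}
```

Each returned slice starts with the ITCH message type byte, so it can be handed to a message decoder directly.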

PCAP Validation

Use the validation script to test the parser against PCAP samples:

# Decompress if needed
zstd -d ny4-xnas-tvitch-a-20230822T133000.pcap.zst

# Run validation
cd tests
python validate_pcap.py ../ny4-xnas-tvitch-a-20230822T133000.pcap --trade-date 08222023

The script:

  1. Extracts ITCH messages from PCAP (handles MoldUDP64, VLAN tags)
  2. Runs the Rust parser on extracted data
  3. Validates message counts match exactly
  4. Spot-checks field content (order refs, prices, symbols)

Validation Results

Tested against DataBento sample (August 22, 2023, 9:30-9:40 AM ET):

| Metric | Value |
|---|---|
| PCAP size | 1.9 GB (474 MB compressed) |
| Messages | 20,288,210 |
| Duration | 10 minutes |
| Sequence gaps | 0 |
| Match rate | 100% |
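
The sequence gap figure can be understood as follows: in MoldUDP64 each packet header carries the sequence number of its first message, so the next packet is expected to start at that number plus the message count. The sketch below counts gaps over a list of (sequence number, message count) pairs. `count_sequence_gaps` is a hypothetical function, and the validation script may compute this differently.

```rust
/// Returns (gap_events, missing_messages) for packet headers given in
/// arrival order as (sequence_number, message_count).
fn count_sequence_gaps(packets: &[(u64, u16)]) -> (u64, u64) {
    let mut expected: Option<u64> = None;
    let mut gap_events = 0u64;
    let mut missing = 0u64;
    for &(seq, count) in packets {
        if let Some(exp) = expected {
            if seq > exp {
                gap_events += 1;
                missing += seq - exp;
            }
        }
        // End-of-session packets (count 0xFFFF) carry no messages.
        let carried = if count == 0xFFFF { 0 } else { count as u64 };
        let next = seq + carried;
        // Never move backwards, so duplicate or replayed packets are ignored.
        expected = Some(match expected {
            Some(exp) => exp.max(next),
            None => next,
        });
    }
    (gap_events, missing)
}
```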

Message type breakdown:

| Type | Count | Match |
|---|---|---|
| A (Add Order) | 8,824,973 | ✓ |
| D (Delete) | 7,882,019 | ✓ |
| U (Replace) | 1,889,620 | ✓ |
| E (Execute) | 1,044,952 | ✓ |
| X (Cancel) | 305,668 | ✓ |
| P (Trade) | 266,399 | ✓ |
| Others | 74,579 | ✓ |

Dependencies

[dependencies]
arrow = "54.3"
parquet = { version = "54.3", features = ["arrow"] }
bytes = "1.11"
flate2 = "1.1"
chrono = "0.4"
anyhow = "1.0"
indicatif = "0.17"
once_cell = "1.21"

Building from Source

# Debug build
cargo build

# Release build (recommended)
cargo build --release

# Run tests
cargo test

# Run benchmarks
cargo bench

License

MIT

Contributing

Issues and PRs welcome. Please run cargo fmt and cargo clippy before submitting.

Acknowledgments

Developed as part of Machine Learning for Algorithmic Trading, 3rd Edition.
