This document provides a detailed comparison between the rust_diff Binary Ninja plugin and the current smartdiff MCP implementation, highlighting features to port and architectural differences.
| Feature | rust_diff (BN Plugin) | smartdiff MCP | Port Priority |
|---|---|---|---|
| Binary Analysis | ✅ Full support | ❌ None | 🔴 Critical |
| Source Code Analysis | ❌ None | ✅ Full support | ✅ Keep |
| MCP Protocol | ❌ None | ✅ Full support | ✅ Keep |
| Binary Ninja Integration | ✅ Direct API | ❌ None | 🔴 Critical |
| Function Matching | ✅ Binary-optimized | ✅ Source-optimized | 🟡 Merge |
| CFG Analysis | ✅ Binary CFG | ✅ AST-based | 🟡 Merge |
| Similarity Scoring | ✅ Binary metrics | ✅ Tree edit distance | 🟡 Merge |
| Multi-phase Matching | ✅ 4 phases | ✅ Hungarian + Graph | 🟡 Merge |
| Parallel Processing | ✅ Rayon | ✅ Rayon | ✅ Keep |
| Export Formats | ✅ JSON/CSV/HTML | ✅ JSON | 🟢 Nice-to-have |
| GUI | ✅ Qt-based | ❌ Web-based | 🟢 Nice-to-have |
| Stateful Comparisons | ❌ None | ✅ Full support | ✅ Keep |
| AI Agent Interface | ❌ None | ✅ MCP tools | ✅ Keep |
Location: rust_diff/src/lib.rs lines 124-218
Key Features:
- Extracts function metadata from Binary Ninja BinaryView
- Captures basic blocks with instruction details
- Computes CFG and call graph hashes
- Calculates cyclomatic complexity
Data Structures:

    pub struct FunctionInfo {
        pub name: String,
        pub address: u64,
        pub size: u64,
        pub basic_blocks: Vec<BasicBlockInfo>,
        pub instructions: Vec<InstructionInfo>,
        pub cyclomatic_complexity: u32,
        pub call_graph_hash: String,
        pub cfg_hash: String,
        pub instruction_count: usize,
        pub call_count: usize,
    }

    pub struct BasicBlockInfo {
        pub address: u64,
        pub size: u64,
        pub instructions: Vec<InstructionInfo>,
        pub edges: Vec<u64>,
        pub mnemonic_hash: String,
        pub instruction_count: usize,
    }

    pub struct InstructionInfo {
        pub address: u64,
        pub mnemonic: String,
        pub operands: Vec<String>,
        pub bytes: Vec<u8>,
        pub length: usize,
    }

Port Strategy: Create equivalent structures in crates/binary-ninja-bridge/
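
To make the extraction step more concrete, here is a minimal sketch of how cyclomatic complexity and a CFG hash could be derived from the ported block data. It assumes the standard formula M = E - N + 2 for a single connected function. The `Block` type is a hypothetical, trimmed-down stand-in for `BasicBlockInfo`, and the hashing scheme (std's `DefaultHasher` over successor indices) is an assumption for illustration; this document does not specify how rust_diff computes `cfg_hash`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, trimmed-down stand-in for `BasicBlockInfo`.
pub struct Block {
    pub address: u64,
    pub edges: Vec<u64>, // start addresses of successor blocks
}

/// Cyclomatic complexity M = E - N + 2 for a single connected CFG.
pub fn cyclomatic_complexity(blocks: &[Block]) -> u32 {
    if blocks.is_empty() {
        return 0;
    }
    let nodes = blocks.len() as i64;
    let edges: i64 = blocks.iter().map(|b| b.edges.len() as i64).sum();
    (edges - nodes + 2).max(1) as u32
}

/// Address-independent hash of the CFG shape: blocks are numbered in
/// address order and each block contributes its sorted successor indices.
/// Edges that leave the function are hashed as `None`.
pub fn cfg_shape_hash(blocks: &[Block]) -> String {
    let mut order: Vec<&Block> = blocks.iter().collect();
    order.sort_by_key(|b| b.address);
    let mut hasher = DefaultHasher::new();
    order.len().hash(&mut hasher);
    for block in &order {
        let mut successors: Vec<Option<usize>> = block
            .edges
            .iter()
            .map(|&target| order.iter().position(|b| b.address == target))
            .collect();
        successors.sort();
        successors.hash(&mut hasher);
    }
    format!("{:016x}", hasher.finish())
}
```

Because successors are hashed by index and not by address, the same function relocated to a different base address produces the same hash. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a persistent cache would need a fixed hash function instead.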
Location: rust_diff/src/lib.rs lines 220-424
Phase 1: Exact Hash Matching (lines 248-286)
- Uses combined CFG + call graph hash
- O(1) average-case lookup per function via HashMap (O(n + m) overall)
- Highest confidence matches
- Port Priority: 🔴 Critical
Phase 2: Name Matching (lines 288-327)
- Matches functions by name
- Validates with similarity threshold
- Medium-high confidence
- Port Priority: 🔴 Critical
Phase 3: Structural Matching (lines 329-373)
- Compares basic block count, complexity, size
- Finds best match per function
- Medium confidence
- Port Priority: 🔴 Critical
Phase 4: Heuristic Matching (lines 375-424)
- Parallel processing with rayon
- Detailed similarity calculation
- Lowest confidence
- Port Priority: 🟡 Important
Comparison to smartdiff:
- smartdiff uses Hungarian algorithm for optimal matching
- smartdiff uses tree edit distance for similarity
- Both approaches are valid and can be merged
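
The phase ordering above can be sketched as a small driver in which each phase only sees functions that earlier, higher-confidence phases left unmatched. This is an illustrative sketch, not rust_diff's code: the `Func` type, the predicate functions, and the greedy first-hit policy are assumptions, and the heuristic phase is omitted for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Phase {
    ExactHash,
    Name,
    Structural,
}

/// Hypothetical summary of a function; not the real `FunctionInfo`.
pub struct Func {
    pub name: String,
    pub combined_hash: String,
    pub basic_blocks: usize,
}

type Accepts = fn(&Func, &Func) -> bool;

fn same_hash(a: &Func, b: &Func) -> bool {
    a.combined_hash == b.combined_hash
}

fn same_name(a: &Func, b: &Func) -> bool {
    a.name == b.name
}

fn similar_shape(a: &Func, b: &Func) -> bool {
    a.basic_blocks.abs_diff(b.basic_blocks) <= 2
}

/// Runs the phases in order of decreasing confidence. A function matched
/// in one phase is removed from consideration in all later phases.
pub fn match_in_phases(a: &[Func], b: &[Func]) -> Vec<(usize, usize, Phase)> {
    let phases: [(Phase, Accepts); 3] = [
        (Phase::ExactHash, same_hash as Accepts),
        (Phase::Name, same_name as Accepts),
        (Phase::Structural, similar_shape as Accepts),
    ];
    let mut used_a = vec![false; a.len()];
    let mut used_b = vec![false; b.len()];
    let mut matches = Vec::new();
    for &(phase, accepts) in phases.iter() {
        for (i, fa) in a.iter().enumerate() {
            if used_a[i] {
                continue;
            }
            let hit = b
                .iter()
                .enumerate()
                .find(|&(j, fb)| !used_b[j] && accepts(fa, fb));
            if let Some((j, _)) = hit {
                used_a[i] = true;
                used_b[j] = true;
                matches.push((i, j, phase));
            }
        }
    }
    matches
}
```

Recording the phase alongside each match keeps the confidence level visible to downstream consumers such as an MCP tool response.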
Location: rust_diff/src/lib.rs lines 435-470
Weighted Similarity Formula:

    weighted_similarity =
        cfg_similarity * 0.5 +            // 50% weight
        bb_similarity * 0.15 +            // 15% weight
        instruction_similarity * 0.10 +   // 10% weight
        edge_similarity * 0.25            // 25% weight

Individual Metrics:
- CFG Similarity (line 437):
  - Binary: exact hash match (1.0 or 0.0)
  - smartdiff: tree edit distance on AST
- Basic Block Similarity (lines 472-488):
  - Ratio of min/max basic block counts
  - Simple but effective for binaries
- Instruction Similarity (lines 490-506):
  - Ratio of min/max instruction counts
  - Binary-specific metric
- Edge Similarity (lines 508-520):
  - Based on cyclomatic complexity
  - Measures control flow similarity
- Name Similarity (lines 522-538):
  - Exact match, substring match, or character overlap
  - Useful for both source and binary
- Call Similarity (lines 540-556):
  - Ratio of function call counts
  - Binary-specific metric
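
As a rough illustration of how the weighted formula combines the first four metrics, here is a minimal sketch. The `Metrics` type is hypothetical, and treating edge similarity as a min/max ratio of cyclomatic complexities is an assumption; the text above only says it is "based on cyclomatic complexity".

```rust
/// Hypothetical per-function summary holding only what the formula needs.
pub struct Metrics {
    pub cfg_hash: String,
    pub basic_blocks: usize,
    pub instructions: usize,
    pub cyclomatic_complexity: u32,
}

/// min/max ratio in [0, 1]; two zero values count as identical.
fn ratio(a: f64, b: f64) -> f64 {
    let hi = a.max(b);
    if hi == 0.0 {
        1.0
    } else {
        a.min(b) / hi
    }
}

pub fn weighted_similarity(a: &Metrics, b: &Metrics) -> f64 {
    // CFG similarity is all-or-nothing: exact hash match or no match.
    let cfg = if a.cfg_hash == b.cfg_hash { 1.0 } else { 0.0 };
    let bb = ratio(a.basic_blocks as f64, b.basic_blocks as f64);
    let instr = ratio(a.instructions as f64, b.instructions as f64);
    // Assumption: edge similarity as a ratio of cyclomatic complexities.
    let edge = ratio(
        a.cyclomatic_complexity as f64,
        b.cyclomatic_complexity as f64,
    );
    cfg * 0.5 + bb * 0.15 + instr * 0.10 + edge * 0.25
}
```

For example, two functions with different CFG hashes where one has exactly twice the blocks, instructions, and complexity of the other score 0 * 0.5 + 0.5 * 0.15 + 0.5 * 0.10 + 0.5 * 0.25 = 0.25. Since the CFG term carries half the weight, any pair without an exact CFG hash match is capped at 0.5.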
Port Strategy:
- Keep smartdiff's tree edit distance for source code
- Add binary-specific metrics for binary analysis
- Create unified similarity interface
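
One possible shape for the unified similarity interface is a trait with an associated function type, so that matching logic can be written once and reused for source and binary inputs. Everything here (`SimilarityMetric`, `BinaryFn`, `BlockCountMetric`, `best_match`) is a hypothetical name for illustration; a source-side implementation would wrap smartdiff's existing tree edit distance in the same trait.

```rust
/// Hypothetical unified interface: each analysis domain supplies its own
/// function representation and scoring, in the range 0.0..=1.0.
pub trait SimilarityMetric {
    type Function;
    fn similarity(&self, a: &Self::Function, b: &Self::Function) -> f64;
}

/// Stand-in for a binary function summary.
pub struct BinaryFn {
    pub basic_blocks: usize,
}

/// Placeholder binary metric: min/max ratio of basic block counts.
pub struct BlockCountMetric;

impl SimilarityMetric for BlockCountMetric {
    type Function = BinaryFn;

    fn similarity(&self, a: &BinaryFn, b: &BinaryFn) -> f64 {
        let hi = a.basic_blocks.max(b.basic_blocks);
        if hi == 0 {
            return 1.0;
        }
        a.basic_blocks.min(b.basic_blocks) as f64 / hi as f64
    }
}

/// Domain-agnostic helper: index and score of the best candidate at or
/// above `threshold`, or `None` if no candidate qualifies.
pub fn best_match<M: SimilarityMetric>(
    metric: &M,
    target: &M::Function,
    candidates: &[M::Function],
    threshold: f64,
) -> Option<(usize, f64)> {
    candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (i, metric.similarity(target, c)))
        .filter(|&(_, score)| score >= threshold)
        .fold(None, |best, cur| match best {
            Some((_, score)) if score >= cur.1 => best,
            _ => Some(cur),
        })
}
```

With this split, `best_match` never needs to know whether it is comparing ASTs or basic blocks, which is the separation the port strategy aims for.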
Location: rust_diff/src/lib.rs lines 558-585
Confidence Boosting:

    confidence = similarity

    // Boost for similar sizes (< 10% difference)
    if size_diff < 0.1:
        confidence += 0.1

    // Boost for similar complexity (< 2 difference)
    if complexity_diff < 2:
        confidence += 0.1

    // Boost for similar basic block count (< 2 difference)
    if bb_diff < 2:
        confidence += 0.1

    // Boost for same name
    if name_match:
        confidence += 0.2

    confidence = min(confidence, 1.0)

Port Strategy: Add to binary matching engine
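
A Rust rendering of the confidence boosting rules might look like the sketch below. The `Summary` type is hypothetical, and computing `size_diff` as the absolute size difference divided by the larger size is an assumption; the pseudocode does not define it.

```rust
/// Hypothetical per-function summary used only for confidence boosting.
pub struct Summary {
    pub name: String,
    pub size: u64,
    pub complexity: u32,
    pub basic_blocks: usize,
}

pub fn confidence(similarity: f64, a: &Summary, b: &Summary) -> f64 {
    let mut confidence = similarity;

    // Assumption: relative size difference, guarded against division by zero.
    let max_size = a.size.max(b.size).max(1) as f64;
    let size_diff = a.size.abs_diff(b.size) as f64 / max_size;
    if size_diff < 0.1 {
        confidence += 0.1;
    }
    if a.complexity.abs_diff(b.complexity) < 2 {
        confidence += 0.1;
    }
    if a.basic_blocks.abs_diff(b.basic_blocks) < 2 {
        confidence += 0.1;
    }
    if a.name == b.name {
        confidence += 0.2;
    }
    // Boosts can push the total past 1.0, so clamp at the end.
    confidence.min(1.0)
}
```

With all four boosts applied, a base similarity of 0.4 rises to 0.9, and anything at 0.5 or above saturates at 1.0. That is worth keeping in mind when choosing acceptance thresholds.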
Location: rust_diff/__init__.py
Key Components:
- BinaryDiffTask (lines 42-108):
  - Background thread for long-running analysis
  - Progress reporting
  - Cancellation support
- Feature Extraction (lines 109-280):
  - Extracts features from BinaryView
  - Handles instruction iteration
  - Computes hashes and metrics
- Function Matching (lines 282-632):
  - Implements matching phases in Python
  - Uses Binary Ninja's analysis results
  - Handles edge cases
- GUI Integration (lines 636-778):
  - Qt-based results viewer
  - Sortable/filterable table
  - Export functionality
Port Strategy:
- Rust implementation in `crates/binary-ninja-bridge/`
- MCP tools replace Python plugin interface
- Optional: Keep GUI as separate tool
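
If the Rust port needs an equivalent of BinaryDiffTask's background execution, progress reporting, and cancellation, one standard-library-only approach is a worker thread with an atomic cancel flag and a progress channel. This is a sketch under that assumption, not a description of the plugin's Python code; `DiffTask` and its methods are hypothetical names, and the per-function work is left as a placeholder.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical background task with cooperative cancellation.
pub struct DiffTask {
    cancel_flag: Arc<AtomicBool>,
    /// Receives the number of functions processed so far.
    pub progress: Receiver<usize>,
    handle: JoinHandle<usize>,
}

impl DiffTask {
    /// Starts a worker that processes `total` functions.
    pub fn spawn(total: usize) -> Self {
        let cancel_flag = Arc::new(AtomicBool::new(false));
        let (tx, progress) = channel();
        let flag = Arc::clone(&cancel_flag);
        let handle = thread::spawn(move || {
            let mut done = 0;
            for _ in 0..total {
                // Check the flag between work items (cooperative cancel).
                if flag.load(Ordering::Relaxed) {
                    break;
                }
                // Placeholder: per-function analysis would run here.
                done += 1;
                // The receiver may already be gone; ignore send errors.
                let _ = tx.send(done);
            }
            done
        });
        DiffTask { cancel_flag, progress, handle }
    }

    /// Requests cancellation; the worker stops before its next item.
    pub fn cancel(&self) {
        self.cancel_flag.store(true, Ordering::Relaxed);
    }

    /// Waits for the worker and returns how many items it completed.
    pub fn join(self) -> usize {
        self.handle.join().expect("worker thread panicked")
    }
}
```

An MCP tool handler could poll `progress` to report status for a long comparison and call `cancel` if the client abandons the request.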
rust_diff architecture:

┌─────────────────────────────────────┐
│ Binary Ninja Plugin (Python) │
│ - UI Integration │
│ - BinaryView Access │
│ - Feature Extraction │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Rust Core (lib.rs) │
│ - Matching Algorithms │
│ - Similarity Scoring │
│ - C FFI Exports │
└─────────────────────────────────────┘

smartdiff MCP architecture:

┌─────────────────────────────────────┐
│ MCP Server (JSON-RPC) │
│ - Tools (compare, list, diff) │
│ - Resources (results, summaries) │
│ - Comparison Manager │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Diff Engine │
│ - Function Matcher │
│ - Tree Edit Distance │
│ - Hungarian Algorithm │
│ - Change Classifier │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│ Parser Engine │
│ - Tree-sitter Parsers │
│ - AST Extraction │
│ - Multi-language Support │
└─────────────────────────────────────┘

Proposed integrated architecture:

┌─────────────────────────────────────────────────────┐
│ MCP Server (JSON-RPC) │
│ - Source Code Tools (existing) │
│ - Binary Analysis Tools (NEW) │
│ - Unified Resources │
│ - Comparison Manager (extended) │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│ Diff Engine (Extended) │
│ - Source Function Matcher (existing) │
│ - Binary Function Matcher (NEW) │
│ - Unified Similarity Interface │
│ - Change Classifier (extended) │
└──────────────┬──────────────────────────────────────┘
               │
         ┌─────┴─────┐
         ▼           ▼
┌─────────────┐ ┌─────────────────────┐
│ Parser │ │ Binary Ninja Bridge │
│ Engine │ │ (NEW) │
│ (existing)│ │ - BinaryView Access │
│ │ │ - Feature Extract │
│ │ │ - CFG Analysis │
└─────────────┘ └─────────────────────┘
Source: rust_diff/src/lib.rs lines 248-286
Algorithm:
- Build HashMap of combined hashes for binary B
- For each function in binary A:
  - Compute combined hash (CFG + call graph)
  - Lookup in HashMap
  - If found and not used, create match
  - Mark as used to prevent duplicates
Complexity: O(n + m) where n, m are function counts
Port to: crates/diff-engine/src/binary_matcher.rs
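
The exact hash matching steps can be sketched as follows. The `Func` type and the `cfg:callgraph` string concatenation are assumptions for illustration. Instead of a separate "used" set, this variant keeps a bucket of candidate indices per hash and pops one on each match, which also handles several functions sharing the same hash.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in carrying only the two hashes.
pub struct Func {
    pub cfg_hash: String,
    pub call_graph_hash: String,
}

fn combined_hash(f: &Func) -> String {
    format!("{}:{}", f.cfg_hash, f.call_graph_hash)
}

/// Returns (index in A, index in B) pairs with identical combined hashes.
/// Each function in B is matched at most once. Runs in O(n + m) on average.
pub fn exact_hash_matches(a: &[Func], b: &[Func]) -> Vec<(usize, usize)> {
    // Build the lookup table for B. Iterating in reverse means `pop`
    // later hands out the lowest unmatched index first.
    let mut by_hash: HashMap<String, Vec<usize>> = HashMap::new();
    for (j, f) in b.iter().enumerate().rev() {
        by_hash.entry(combined_hash(f)).or_default().push(j);
    }

    let mut matches = Vec::new();
    for (i, f) in a.iter().enumerate() {
        // Popping removes the candidate, so it cannot be matched twice.
        if let Some(j) = by_hash.get_mut(&combined_hash(f)).and_then(|v| v.pop()) {
            matches.push((i, j));
        }
    }
    matches
}
```

When several functions share a combined hash (for example, trivial stubs), this pairs them in index order; a real port may want to break such ties by name or address proximity.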
Source: rust_diff/src/lib.rs lines 426-433
Algorithm:

    fn is_structurally_similar(func_a: &FunctionInfo, func_b: &FunctionInfo) -> bool {
        let bb_diff = func_a.basic_blocks.len().abs_diff(func_b.basic_blocks.len());
        let complexity_diff = func_a
            .cyclomatic_complexity
            .abs_diff(func_b.cyclomatic_complexity);
        // Guard added here so two zero-sized functions do not divide by zero.
        let max_size = func_a.size.max(func_b.size).max(1);
        let size_ratio = func_a.size.abs_diff(func_b.size) as f64 / max_size as f64;
        bb_diff <= 2 && complexity_diff <= 2 && size_ratio < 0.3
    }

Port to: crates/diff-engine/src/binary_matcher.rs
Source: rust_diff/src/lib.rs lines 375-424
Algorithm:
- Use rayon to parallelize over functions in binary A
- For each function A, find best match in binary B:
  - Calculate similarity with all unmatched functions in B
  - Track best match above threshold
- Collect all candidate matches
- Resolve conflicts (multiple A's matching same B)
- Add non-conflicting matches
Complexity: O(n * m) but parallelized
Port to: crates/diff-engine/src/binary_matcher.rs
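
The heuristic phase can be sketched with the standard library alone. Sequential iteration stands in for rayon here so the example has no dependencies; the outer loop over A is the part a parallel iterator would replace. The conflict policy shown (the highest-scoring candidate wins each function in B, losers are dropped) is an assumption, since the steps above do not say how conflicts are resolved.

```rust
/// Finds, for each item in `a`, its best-scoring partner in `b` at or
/// above `threshold`, then resolves conflicts so each `b` is used once.
/// Returns (index in A, index in B, score), highest score first.
pub fn heuristic_matches<T, F>(
    a: &[T],
    b: &[T],
    threshold: f64,
    sim: F,
) -> Vec<(usize, usize, f64)>
where
    F: Fn(&T, &T) -> f64,
{
    // Step 1: best candidate per function in A. Each iteration is
    // independent, which is what makes this loop parallelizable.
    let mut candidates: Vec<(usize, usize, f64)> = a
        .iter()
        .enumerate()
        .filter_map(|(i, fa)| {
            b.iter()
                .enumerate()
                .map(|(j, fb)| (j, sim(fa, fb)))
                .filter(|&(_, score)| score >= threshold)
                .max_by(|x, y| x.1.total_cmp(&y.1))
                .map(|(j, score)| (i, j, score))
        })
        .collect();

    // Step 2: resolve conflicts. Sort by descending score so the
    // strongest claim on each function in B is processed first.
    candidates.sort_by(|x, y| y.2.total_cmp(&x.2));
    let mut used_b = vec![false; b.len()];
    let mut matches = Vec::new();
    for (i, j, score) in candidates {
        if !used_b[j] {
            used_b[j] = true;
            matches.push((i, j, score));
        }
    }
    matches
}
```

A function in A that loses a conflict stays unmatched in this sketch. Retrying it against its second-best candidate, or handing the full score matrix to smartdiff's Hungarian matcher, are both options for the merged engine.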
rust_diff data flow:

Binary File (BNDB)
↓
Binary Ninja Analysis
↓
BinaryView API
↓
Feature Extraction (Python)
↓
Rust Matching Engine (C FFI)
↓
Match Results
↓
GUI Display / Export

smartdiff MCP data flow:

Source File
↓
Tree-sitter Parser
↓
AST Extraction
↓
Diff Engine
↓
Comparison Manager
↓
MCP Tools
↓
AI Agent (Claude)

Proposed unified data flow:

Binary File (BNDB) ──┐
                     ├──> Binary Ninja Bridge
Source File ─────────┤         ↓
                     └──> Parser Engine
                               ↓
                Unified Function Representation
                               ↓
                          Diff Engine
                  (Source or Binary Matcher)
                               ↓
                      Comparison Manager
                               ↓
                           MCP Tools
                               ↓
                       AI Agent (Claude)

- Create `crates/binary-ninja-bridge/` crate
- Implement `BinaryLoader` for BNDB files
- Implement `FunctionExtractor` for binary functions
- Port `FunctionInfo`, `BasicBlockInfo`, `InstructionInfo` structs
- Implement CFG hash computation
- Implement call graph hash computation
- Add error handling for Binary Ninja API
- Write unit tests
- Create `crates/diff-engine/src/binary_matcher.rs`
- Port exact hash matching algorithm
- Port name matching algorithm
- Port structural matching algorithm
- Port heuristic matching algorithm
- Implement binary similarity scoring
- Implement confidence calculation
- Add parallel processing support
- Write comprehensive tests
- Design MCP tool schemas for binary analysis
- Implement `compare_binaries` tool
- Implement `list_binary_function_matches` tool
- Implement `get_binary_function_diff` tool
- Implement `load_binary_in_binja` tool
- Implement `list_binary_functions` tool
- Add binary-specific resources
- Update MCP server documentation
- Extend `ComparisonManager` for binary comparisons
- Add binary comparison state management
- Implement result caching
- Add export formats (JSON, CSV, HTML)
- Write integration tests
- Performance benchmarking
- Documentation updates
- Accuracy: Binary function matching accuracy ≥ 90% (same as rust_diff)
- Performance: Binary comparison < 5 seconds for typical binaries
- MCP Compliance: All tools follow MCP specification
- Architecture: Clean separation of concerns, no tight coupling
- Testing: > 80% code coverage for new components
- Documentation: Comprehensive docs for all new features
The integration of Binary Ninja diff capabilities into smartdiff via MCP is feasible and valuable. The key is to:
- Preserve smartdiff's architecture: Use MCP layer, maintain separation of concerns
- Port proven algorithms: Bring over rust_diff's effective binary matching
- Unify interfaces: Create common abstractions for source and binary analysis
- Maintain quality: Comprehensive testing and documentation
This will enable AI agents to perform sophisticated binary analysis while maintaining the clean architecture that makes smartdiff powerful and maintainable.