Skip to content

Latest commit

 

History

History
454 lines (371 loc) · 15.2 KB

File metadata and controls

454 lines (371 loc) · 15.2 KB

Binary Ninja Diff Feature Comparison

Overview

This document provides a detailed comparison between the rust_diff Binary Ninja plugin and the current smartdiff MCP implementation, highlighting features to port and architectural differences.

Feature Matrix

Feature rust_diff (BN Plugin) smartdiff MCP Port Priority
Binary Analysis ✅ Full support ❌ None 🔴 Critical
Source Code Analysis ❌ None ✅ Full support ✅ Keep
MCP Protocol ❌ None ✅ Full support ✅ Keep
Binary Ninja Integration ✅ Direct API ❌ None 🔴 Critical
Function Matching ✅ Binary-optimized ✅ Source-optimized 🟡 Merge
CFG Analysis ✅ Binary CFG ✅ AST-based 🟡 Merge
Similarity Scoring ✅ Binary metrics ✅ Tree edit distance 🟡 Merge
Multi-phase Matching ✅ 4 phases ✅ Hungarian + Graph 🟡 Merge
Parallel Processing ✅ Rayon ✅ Rayon ✅ Keep
Export Formats ✅ JSON/CSV/HTML ✅ JSON 🟢 Nice-to-have
GUI ✅ Qt-based ❌ Web-based 🟢 Nice-to-have
Stateful Comparisons ❌ None ✅ Full support ✅ Keep
AI Agent Interface ❌ None ✅ MCP tools ✅ Keep

Detailed Feature Analysis

1. Binary Function Extraction (rust_diff)

Location: rust_diff/src/lib.rs lines 124-218

Key Features:

  • Extracts function metadata from Binary Ninja BinaryView
  • Captures basic blocks with instruction details
  • Computes CFG and call graph hashes
  • Calculates cyclomatic complexity

Data Structures:

pub struct FunctionInfo {
    pub name: String,
    pub address: u64,
    pub size: u64,
    pub basic_blocks: Vec<BasicBlockInfo>,
    pub instructions: Vec<InstructionInfo>,
    pub cyclomatic_complexity: u32,
    pub call_graph_hash: String,
    pub cfg_hash: String,
    pub instruction_count: usize,
    pub call_count: usize,
}

pub struct BasicBlockInfo {
    pub address: u64,
    pub size: u64,
    pub instructions: Vec<InstructionInfo>,
    pub edges: Vec<u64>,
    pub mnemonic_hash: String,
    pub instruction_count: usize,
}

pub struct InstructionInfo {
    pub address: u64,
    pub mnemonic: String,
    pub operands: Vec<String>,
    pub bytes: Vec<u8>,
    pub length: usize,
}

Port Strategy: Create equivalent structures in crates/binary-ninja-bridge/

2. Multi-Phase Function Matching (rust_diff)

Location: rust_diff/src/lib.rs lines 220-424

Phase 1: Exact Hash Matching (lines 248-286)

  • Uses combined CFG + call graph hash
  • O(n) lookup via HashMap
  • Highest confidence matches
  • Port Priority: 🔴 Critical

Phase 2: Name Matching (lines 288-327)

  • Matches functions by name
  • Validates with similarity threshold
  • Medium-high confidence
  • Port Priority: 🔴 Critical

Phase 3: Structural Matching (lines 329-373)

  • Compares basic block count, complexity, size
  • Finds best match per function
  • Medium confidence
  • Port Priority: 🔴 Critical

Phase 4: Heuristic Matching (lines 375-424)

  • Parallel processing with rayon
  • Detailed similarity calculation
  • Lowest confidence
  • Port Priority: 🟡 Important

Comparison to smartdiff:

  • smartdiff uses Hungarian algorithm for optimal matching
  • smartdiff uses tree edit distance for similarity
  • Both approaches are valid, can be merged

3. Similarity Scoring (rust_diff)

Location: rust_diff/src/lib.rs lines 435-470

Weighted Similarity Formula:

weighted_similarity = 
    cfg_similarity * 0.5 +        // 50% weight
    bb_similarity * 0.15 +         // 15% weight
    instruction_similarity * 0.10 + // 10% weight
    edge_similarity * 0.25          // 25% weight

Individual Metrics:

  1. CFG Similarity (lines 437):

    • Binary: exact hash match (1.0 or 0.0)
    • smartdiff: tree edit distance on AST
  2. Basic Block Similarity (lines 472-488):

    • Ratio of min/max basic block counts
    • Simple but effective for binaries
  3. Instruction Similarity (lines 490-506):

    • Ratio of min/max instruction counts
    • Binary-specific metric
  4. Edge Similarity (lines 508-520):

    • Based on cyclomatic complexity
    • Measures control flow similarity
  5. Name Similarity (lines 522-538):

    • Exact match, substring match, or character overlap
    • Useful for both source and binary
  6. Call Similarity (lines 540-556):

    • Ratio of function call counts
    • Binary-specific metric

Port Strategy:

  • Keep smartdiff's tree edit distance for source code
  • Add binary-specific metrics for binary analysis
  • Create unified similarity interface

4. Confidence Calculation (rust_diff)

Location: rust_diff/src/lib.rs lines 558-585

Confidence Boosting:

base_confidence = similarity

// Boost for similar sizes (< 10% difference)
if size_diff < 0.1:
    confidence += 0.1

// Boost for similar complexity (< 2 difference)
if complexity_diff < 2:
    confidence += 0.1

// Boost for similar basic block count (< 2 difference)
if bb_diff < 2:
    confidence += 0.1

// Boost for same name
if name_match:
    confidence += 0.2

confidence = min(confidence, 1.0)

Port Strategy: Add to binary matching engine

5. Binary Ninja Python Integration (rust_diff)

Location: rust_diff/__init__.py

Key Components:

  1. BinaryDiffTask (lines 42-108):

    • Background thread for long-running analysis
    • Progress reporting
    • Cancellation support
  2. Feature Extraction (lines 109-280):

    • Extracts features from BinaryView
    • Handles instruction iteration
    • Computes hashes and metrics
  3. Function Matching (lines 282-632):

    • Implements matching phases in Python
    • Uses Binary Ninja's analysis results
    • Handles edge cases
  4. GUI Integration (lines 636-778):

    • Qt-based results viewer
    • Sortable/filterable table
    • Export functionality

Port Strategy:

  • Rust implementation in crates/binary-ninja-bridge/
  • MCP tools replace Python plugin interface
  • Optional: Keep GUI as separate tool

Architecture Mapping

rust_diff Architecture

┌─────────────────────────────────────┐
│   Binary Ninja Plugin (Python)     │
│   - UI Integration                  │
│   - BinaryView Access               │
│   - Feature Extraction              │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│   Rust Core (lib.rs)                │
│   - Matching Algorithms             │
│   - Similarity Scoring              │
│   - C FFI Exports                   │
└─────────────────────────────────────┘

smartdiff MCP Architecture

┌─────────────────────────────────────┐
│   MCP Server (JSON-RPC)             │
│   - Tools (compare, list, diff)     │
│   - Resources (results, summaries)  │
│   - Comparison Manager              │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│   Diff Engine                       │
│   - Function Matcher                │
│   - Tree Edit Distance              │
│   - Hungarian Algorithm             │
│   - Change Classifier               │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│   Parser Engine                     │
│   - Tree-sitter Parsers             │
│   - AST Extraction                  │
│   - Multi-language Support          │
└─────────────────────────────────────┘

Proposed Integrated Architecture

┌─────────────────────────────────────────────────────┐
│   MCP Server (JSON-RPC)                             │
│   - Source Code Tools (existing)                    │
│   - Binary Analysis Tools (NEW)                     │
│   - Unified Resources                               │
│   - Comparison Manager (extended)                   │
└──────────────┬──────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────┐
│   Diff Engine (Extended)                            │
│   - Source Function Matcher (existing)              │
│   - Binary Function Matcher (NEW)                   │
│   - Unified Similarity Interface                    │
│   - Change Classifier (extended)                    │
└──────────────┬──────────────────────────────────────┘
               │
         ┌─────┴─────┐
         ▼           ▼
┌─────────────┐ ┌─────────────────────┐
│   Parser    │ │ Binary Ninja Bridge │
│   Engine    │ │ (NEW)               │
│   (existing)│ │ - BinaryView Access │
│             │ │ - Feature Extract   │
│             │ │ - CFG Analysis      │
└─────────────┘ └─────────────────────┘

Key Algorithms to Port

1. Exact Hash Matching

Source: rust_diff/src/lib.rs lines 248-286

Algorithm:

  1. Build HashMap of combined hashes for binary B
  2. For each function in binary A:
    • Compute combined hash (CFG + call graph)
    • Lookup in HashMap
    • If found and not used, create match
    • Mark as used to prevent duplicates

Complexity: O(n + m) where n, m are function counts

Port to: crates/diff-engine/src/binary_matcher.rs

2. Structural Similarity Check

Source: rust_diff/src/lib.rs lines 426-433

Algorithm:

fn is_structurally_similar(func_a, func_b) -> bool {
    let bb_diff = abs(func_a.basic_blocks.len() - func_b.basic_blocks.len());
    let complexity_diff = abs(func_a.complexity - func_b.complexity);
    let size_ratio = abs(func_a.size - func_b.size) / max(func_a.size, func_b.size);
    
    bb_diff <= 2 && complexity_diff <= 2 && size_ratio < 0.3
}

Port to: crates/diff-engine/src/binary_matcher.rs

3. Parallel Heuristic Matching

Source: rust_diff/src/lib.rs lines 375-424

Algorithm:

  1. Use rayon to parallelize over functions in binary A
  2. For each function A, find best match in binary B:
    • Calculate similarity with all unmatched functions in B
    • Track best match above threshold
  3. Collect all candidate matches
  4. Resolve conflicts (multiple A's matching same B)
  5. Add non-conflicting matches

Complexity: O(n * m) but parallelized

Port to: crates/diff-engine/src/binary_matcher.rs

Data Flow Comparison

rust_diff Data Flow

Binary File (BNDB)
    ↓
Binary Ninja Analysis
    ↓
BinaryView API
    ↓
Feature Extraction (Python)
    ↓
Rust Matching Engine (C FFI)
    ↓
Match Results
    ↓
GUI Display / Export

smartdiff MCP Data Flow

Source File
    ↓
Tree-sitter Parser
    ↓
AST Extraction
    ↓
Diff Engine
    ↓
Comparison Manager
    ↓
MCP Tools
    ↓
AI Agent (Claude)

Proposed Integrated Data Flow

Binary File (BNDB) ──┐
                     ├──> Binary Ninja Bridge
Source File ─────────┤        ↓
                     └──> Parser Engine
                              ↓
                     Unified Function Representation
                              ↓
                        Diff Engine
                     (Source or Binary Matcher)
                              ↓
                     Comparison Manager
                              ↓
                         MCP Tools
                              ↓
                      AI Agent (Claude)

Implementation Checklist

Phase 1: Binary Ninja Bridge

  • Create crates/binary-ninja-bridge/ crate
  • Implement BinaryLoader for BNDB files
  • Implement FunctionExtractor for binary functions
  • Port FunctionInfo, BasicBlockInfo, InstructionInfo structs
  • Implement CFG hash computation
  • Implement call graph hash computation
  • Add error handling for Binary Ninja API
  • Write unit tests

Phase 2: Binary Matching Engine

  • Create crates/diff-engine/src/binary_matcher.rs
  • Port exact hash matching algorithm
  • Port name matching algorithm
  • Port structural matching algorithm
  • Port heuristic matching algorithm
  • Implement binary similarity scoring
  • Implement confidence calculation
  • Add parallel processing support
  • Write comprehensive tests

Phase 3: MCP Tools

  • Design MCP tool schemas for binary analysis
  • Implement compare_binaries tool
  • Implement list_binary_function_matches tool
  • Implement get_binary_function_diff tool
  • Implement load_binary_in_binja tool
  • Implement list_binary_functions tool
  • Add binary-specific resources
  • Update MCP server documentation

Phase 4: Integration

  • Extend ComparisonManager for binary comparisons
  • Add binary comparison state management
  • Implement result caching
  • Add export formats (JSON, CSV, HTML)
  • Write integration tests
  • Performance benchmarking
  • Documentation updates

Success Metrics

  1. Accuracy: Binary function matching accuracy ≥ 90% (same as rust_diff)
  2. Performance: Binary comparison < 5 seconds for typical binaries
  3. MCP Compliance: All tools follow MCP specification
  4. Architecture: Clean separation of concerns, no tight coupling
  5. Testing: > 80% code coverage for new components
  6. Documentation: Comprehensive docs for all new features

Conclusion

The integration of Binary Ninja diff capabilities into smartdiff via MCP is feasible and valuable. The key is to:

  1. Preserve smartdiff's architecture: Use MCP layer, maintain separation of concerns
  2. Port proven algorithms: Bring over rust_diff's effective binary matching
  3. Unify interfaces: Create common abstractions for source and binary analysis
  4. Maintain quality: Comprehensive testing and documentation

This will enable AI agents to perform sophisticated binary analysis while maintaining the clean architecture that makes smartdiff powerful and maintainable.