fix: resolve ONNX weight extraction failure (#589)
Conversation
Expand ALWAYS_DANGEROUS_MODULES with ~45 new modules not covered by PR #518:

- network (smtplib, imaplib, poplib, nntplib, xmlrpc, socketserver, ssl, requests, aiohttp)
- compilation (codeop, marshal, types, compileall, py_compile)
- FFI (_ctypes)
- debugging (bdb, trace)
- operator bypasses (_operator, functools)
- pickle recursion (pickle, _pickle, dill, cloudpickle, joblib)
- filesystem (tempfile, filecmp, fileinput, glob, distutils, pydoc, pexpect)
- venv/pip (venv, ensurepip, pip)
- threading (signal, _signal, threading)
- and more (webbrowser, asyncio, mmap, select, selectors, logging, syslog, tarfile, zipfile, shelve, sqlite3, _sqlite3, doctest, idlelib, lib2to3, uuid)

Add uuid as a novel detection (VULN-6: _get_command_stdout calls subprocess.Popen) and pkgutil.resolve_name as a dynamic-resolution trampoline. Mirror the new modules in SUSPICIOUS_GLOBALS for defense-in-depth. Fix multi-stream opcode analysis so detailed checks run across all pickle streams. Add NEWOBJ_EX to all opcode check lists. Includes 10 regression tests covering the pkgutil trampoline, uuid RCE, the multi-stream exploit, NEWOBJ_EX, and spot-checks for smtplib, sqlite3, tarfile, marshal, cloudpickle, and webbrowser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
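The blocklist matching this expansion feeds could be sketched as follows. The module names here are an illustrative subset and the prefix-matching helper is an assumption for illustration; the real list lives in modelaudit/detectors/suspicious_symbols.py.

```python
# Illustrative subset of the expanded blocklist, not the real one.
ALWAYS_DANGEROUS_MODULES = frozenset({
    "smtplib", "marshal", "tarfile", "webbrowser", "uuid", "logging",
})


def is_dangerous_module(module: str) -> bool:
    """Flag the module itself or any parent package on the blocklist."""
    parts = module.split(".")
    return any(
        ".".join(parts[: i + 1]) in ALWAYS_DANGEROUS_MODULES
        for i in range(len(parts))
    )
```

Prefix matching means a submodule such as `logging.config` is caught by blocking its parent package, which is why blocking whole modules covers entries like `logging.config.fileConfig`.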
…e scanning

- Fix test regression: track first pickle STOP position and use it for binary content scanning instead of f.tell() (which advances past multi-stream data)
- Fix multi-stream resync bypass: skip junk separator bytes between streams (up to 256 bytes) instead of returning immediately
- Gracefully handle ValueErrors on subsequent streams (non-pickle data after a valid pickle) instead of propagating them as file-level errors
- Add type hints: ScanResult return type on _scan_bytes, -> None on all test methods
- Add separator-byte resync regression test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Buffer opcodes for subsequent pickle streams and discard partial streams (binary tensor data misinterpreted as opcodes) to prevent MANY_DANGEROUS_OPCODES false positives
- Add resync logic to skip up to 256 non-pickle separator bytes between streams, preventing single-byte bypass of multi-stream detection
- Track the pickle memo (BINPUT/BINGET) for safe ML globals so REDUCE calls referencing safe callables via memo are not counted as dangerous opcodes
- Reset dangerous-opcode counters at STOP boundaries so each stream is evaluated independently
- Use first_pickle_end_pos for binary content scanning instead of f.tell(), which advances past tensor data during multi-stream parsing

Validated against 5 HuggingFace models with A/B comparison to the main branch; no new false positives introduced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
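The resync behavior in the second bullet can be sketched independently of the scanner. The function shape and the 256-byte cap placement are assumptions mirroring the commit message; the real implementation lives in pickle_scanner.py.

```python
import io
import pickle
import pickletools

MAX_RESYNC_BYTES = 256  # cap from the commit message above


def iter_pickle_streams(data: bytes):
    """Yield per-stream opcode lists, skipping short junk separators."""
    pos = 0
    junk_skipped = 0
    while pos < len(data):
        buf = io.BytesIO(data[pos:])
        try:
            # genops stops after the STOP opcode of the current stream.
            ops = [(op.name, arg) for op, arg, _ in pickletools.genops(buf)]
        except Exception:
            # Not a valid pickle here: resync one byte at a time, bounded.
            junk_skipped += 1
            if junk_skipped > MAX_RESYNC_BYTES:
                break  # too much non-pickle data between streams; give up
            pos += 1
            continue
        junk_skipped = 0
        yield ops
        pos += buf.tell()
```

A single junk byte between two valid pickles no longer hides the second stream, because parsing resumes after the separator instead of returning at the first decode error.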
Scanners that return early (missing optional dependency, parse error, etc.) left bytes_scanned at its default value of 0, causing the scan summary to always display "Size: 0 bytes". This affected ONNX, TFLite, Flax, XGBoost, and several other format scanners. Add a fallback in _scan_file_internal that sets bytes_scanned to the actual file size (already computed via os.path.getsize) whenever a scanner leaves it at zero. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
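The described fallback can be sketched with a minimal stand-in for the real ScanResult class (the dataclass and function name below are illustrative, not modelaudit's actual API):

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ScanResult:  # minimal stand-in for modelaudit's ScanResult
    bytes_scanned: int = 0


def apply_bytes_scanned_fallback(result: ScanResult, path: str) -> ScanResult:
    # If a scanner returned early and never set bytes_scanned, fall back
    # to the actual on-disk size so the summary is never "Size: 0 bytes".
    if result.bytes_scanned == 0:
        result.bytes_scanned = os.path.getsize(path)
    return result


# Demo: a scanner that bailed out leaves bytes_scanned at 0.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 10)
fixed = apply_bytes_scanned_fallback(ScanResult(), tmp.name)
os.unlink(tmp.name)
```

Note the guard only triggers on the zero default, so scanners that did report a partial byte count keep their more precise value.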
The weight distribution scanner never ran on ONNX files because the scanner dispatch system only executes one scanner per file, and OnnxScanner (priority 6) always won over WeightDistributionScanner (priority 13). Additionally, the standalone _extract_onnx_weights method called onnx.load() without load_external_data=False, causing a ValidationError on models with missing external data files.

Changes:
- Add _check_weight_distribution to OnnxScanner that extracts inline weights from the already-loaded model and delegates statistical analysis to WeightDistributionScanner._analyze_weight_distributions
- Fix _extract_onnx_weights to use load_external_data=False and skip external-data tensors whose raw bytes are unavailable
- Add an onnx import check to WeightDistributionScanner.can_handle()
- Add a max_array_size guard to both extraction paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings. Use the following commands to manage reviews:
Walkthrough

Expanded detector blocklists; added ONNX weight-distribution analysis into OnnxScanner; reworked PickleScanner for multi-stream resynchronization, NEWOBJ_EX handling, and extended allow/block lists; added regression tests for pickle hardening and a bytes_scanned fallback in core scan reporting.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scanner as OnnxScanner
    participant ONNX as "onnx (lib)"
    participant WDS as WeightDistributionScanner
    participant Result as ScanResult
    Scanner->>Scanner: scan() existing checks
    Scanner->>Scanner: _check_weight_distribution(model, path, result)
    Scanner->>ONNX: lazy import onnx
    ONNX-->>Scanner: import success / ImportError
    alt import success
        Scanner->>Scanner: iterate initializers, filter 2D+, skip EXTERNAL, enforce max_array_size
        Scanner->>Scanner: build weights_info map
        alt weights_info non-empty
            Scanner->>WDS: analyze(weights_info)
            activate WDS
            WDS->>WDS: detect anomalies
            WDS-->>Scanner: anomalies list
            deactivate WDS
            Scanner->>Result: emit anomaly checks and update metadata
        else no suitable tensors
            Scanner->>Scanner: exit silently
        end
    else import fails
        Scanner->>Scanner: exit silently
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/onnx_scanner.py`:
- Around line 441-443: The except blocks that currently call logger.debug(...)
are swallowing failures in the ONNX weight extraction/distribution analysis and
must surface degraded coverage to the scan result; replace those silent
debug-only handlers (the logger.debug calls in the weight
extraction/distribution analysis code paths) with logic that (1) logs the full
exception at warning/error level and (2) updates the ScanResult (e.g., set a
coverage_degraded flag or call scan_result.add_finding()/add_warning with a
clear message and exception details) so downstream consumers know the weight
analysis failed; apply this change to both locations where logger.debug is used
for these exceptions.
- Line 177: Add regression tests in test_onnx_scanner.py that exercise the ONNX
scanner's _check_weight_distribution path: create small ONNX models or mocked
TensorProto objects and call the ONNX scanning entry (or directly invoke
_check_weight_distribution via a scanner instance) to verify external-data
tensors are skipped (simulate TensorProto with external_data set), verify
max_array_size size guard blocks large tensors (set a small max_array_size in
scanner config and supply a large tensor), and verify detection behavior for
benign vs malicious-style weight distributions (supply normal random weights and
crafted anomalous patterns and assert expected result/anomaly flags). Reference
the _check_weight_distribution method and the scanner instance used by
test_onnx_scanner.py to locate where to add these focused cases.
- Around line 434-438: The code currently calls
onnx.numpy_helper.to_array(initializer) (in onnx_scanner.py within the loop
handling initializers) before checking max_array_size, which can allocate huge
arrays; change the logic to estimate the byte size first by converting
initializer.dims and the element dtype via
tensor_dtype_to_np_dtype(initializer.data_type) (or equivalent) to compute
estimated_bytes = np.dtype(np_dtype).itemsize * product(initializer.dims) and
only call onnx.numpy_helper.to_array(initializer) if estimated_bytes <=
max_array_size (and max_array_size > 0); apply the same pre-check pattern in
weight_distribution_scanner.py where to_array is used so oversized tensors are
skipped before materialization.
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 1549-1591: The _safe_memo map is never cleared at STOP and
individual memo slots can carry stale True values; modify the STOP handling in
the loop (the branch where opcode.name == "STOP") to clear or reinitialize
_safe_memo (e.g., _safe_memo.clear()) so memo safety can't be carried across
STOP boundaries, and in the BINPUT/LONG_BINPUT handling (the block that looks up
prev GLOBAL/STACK_GLOBAL and writes _safe_memo[arg]) set a default safe state
first (e.g., _safe_memo[arg] = False) before reclassifying based on
_is_safe_ml_global, and apply the same default-reset fix to the analogous block
later in the file (the second BINPUT/LONG_BINPUT handling around the later
lines) so no memo slot retains stale True values.
In `@modelaudit/scanners/weight_distribution_scanner.py`:
- Around line 433-437: The current size guard checks array.nbytes after calling
onnx.numpy_helper.to_array() in the ONNX extractor; change this to estimate size
before materialization by converting initializer.data_type with
onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type), computing
element_count from initializer.dims, and comparing element_count *
np_dtype.itemsize against self.max_array_size before calling
onnx.numpy_helper.to_array(); apply the same pre-conversion-size check logic to
the other extractors (_extract_tensorflow_checkpoint_weights and
_extract_tensorflow_pb_weights) so behavior is consistent (or add a clear
comment documenting an intentional post-conversion check if you choose not to
change all three).
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 647-695: Each test should assert that the scan itself succeeded in
addition to detecting issues: in test_smtplib_blocked, test_sqlite3_blocked,
test_tarfile_blocked, test_marshal_blocked, test_cloudpickle_blocked, and
test_webbrowser_blocked, add an assertion after creating result (result =
self._scan_bytes(...)) such as "assert result.success" (or "assert
result.success is True") before checking result.has_errors and result.issues so
scan failures can't be hidden by later issue assertions.
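The default-unsafe memo classification requested for pickle_scanner.py above could look like the following sketch. SAFE_ML_GLOBALS and classify_memo_puts are illustrative stand-ins, not the scanner's real API; the key behaviors are the default-False write per BINPUT and the state reset at STOP.

```python
import collections
import os
import pickle
import pickletools

SAFE_ML_GLOBALS = {("collections", "OrderedDict")}  # illustrative allowlist


def classify_memo_puts(payload: bytes):
    """Return (memo_slot, is_safe) for each memo PUT, defaulting to unsafe.

    A slot is only marked safe when the value just stored came from an
    allowlisted GLOBAL, so stale True values can never linger.
    """
    events = []
    prev_global = None
    for op, arg, _pos in pickletools.genops(payload):
        if op.name in ("GLOBAL", "STACK_GLOBAL"):
            prev_global = arg  # genops renders GLOBAL args as "module name"
        elif op.name in ("PUT", "BINPUT", "LONG_BINPUT"):
            safe = False  # default-unsafe first, then reclassify
            if isinstance(prev_global, str):
                mod, _, name = prev_global.partition(" ")
                safe = (mod, name) in SAFE_ML_GLOBALS
            events.append((int(arg), safe))
            prev_global = None
        elif op.name == "STOP":
            prev_global = None  # never carry state across stream boundaries
    return events
```

Pickling `collections.OrderedDict` at protocol 2 emits a GLOBAL followed by a BINPUT, which this sketch classifies as safe, while a global like `os.system` stays unsafe.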
ℹ️ Review info
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- modelaudit/detectors/suspicious_symbols.py
- modelaudit/scanners/onnx_scanner.py
- modelaudit/scanners/pickle_scanner.py
- modelaudit/scanners/weight_distribution_scanner.py
- tests/scanners/test_pickle_scanner.py
```python
self._check_custom_ops(model, path, result)
self._check_external_data(model, path, result)
self._check_tensor_sizes(model, path, result)
self._check_weight_distribution(model, path, result)
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check for ONNX scanner regression tests tied to new weight-distribution behavior
echo "=== Finding ONNX test files ==="
fd -t f "test*onnx*.py" tests 2>/dev/null || echo "No exact match, searching more broadly..."
fd -t f "*.py" tests -E __pycache__ | grep -i onnx
echo -e "\n=== Searching for weight-distribution and external-data test coverage ==="
rg -n "load_external_data|TensorProto\.EXTERNAL|max_array_size|weight.distribution|_check_weight_distribution" tests -S 2>/dev/null
echo -e "\n=== Getting context from onnx_scanner.py around line 177 ==="
wc -l modelaudit/scanners/onnx_scanner.py
```

Repository: promptfoo/modelaudit
Length of output: 2110
🏁 Script executed:
```shell
#!/bin/bash
# Get specific line context from onnx_scanner.py
echo "=== Lines around 177 in onnx_scanner.py ==="
sed -n '170,185p' modelaudit/scanners/onnx_scanner.py
echo -e "\n=== Lines 400-467 in onnx_scanner.py ==="
sed -n '400,467p' modelaudit/scanners/onnx_scanner.py
```

Repository: promptfoo/modelaudit
Length of output: 3693
🏁 Script executed:
```shell
#!/bin/bash
# Check for test files that test onnx_scanner
echo "=== Searching for test files that import or test onnx_scanner ==="
rg -n "onnx_scanner|OnnxScanner" tests -S -t py
echo -e "\n=== Check test directory structure ==="
fd -t f "*.py" tests | head -20
```

Repository: promptfoo/modelaudit
Length of output: 3044
🏁 Script executed:
```shell
#!/bin/bash
# Read the ONNX scanner test file to check weight-distribution coverage
echo "=== Content of test_onnx_scanner.py ==="
cat -n tests/scanners/test_onnx_scanner.py
```

Repository: promptfoo/modelaudit
Length of output: 4854
🏁 Script executed:
```shell
#!/bin/bash
# Check weight distribution scanner tests to understand test patterns
echo "=== Weight distribution scanner test file (first 150 lines) ==="
head -150 tests/scanners/test_weight_distribution_scanner.py
echo -e "\n=== Check if ONNX weight-distribution tests exist ==="
rg -n "_check_weight_distribution|weight.distribution.*onnx|test.*weight.*onnx" tests/scanners/test_onnx_scanner.py
```

Repository: promptfoo/modelaudit
Length of output: 4658
🏁 Script executed:
```shell
#!/bin/bash
# Search for tests with benign and malicious samples related to weights/ONNX
echo "=== Searching for benign/malicious sample tests ==="
rg -n "benign|malicious|anomal" tests/scanners/test_onnx_scanner.py -A 3
```

Repository: promptfoo/modelaudit
Length of output: 115
🏁 Script executed:
```shell
#!/bin/bash
# Double-check: Search for any weight distribution tests specifically in ONNX context
echo "=== Search for weight-distribution tests anywhere in test_onnx_scanner.py ==="
grep -n "weight" tests/scanners/test_onnx_scanner.py || echo "No 'weight' references found in test_onnx_scanner.py"
echo -e "\n=== Full count of test functions in test_onnx_scanner.py ==="
grep -c "^def test_" tests/scanners/test_onnx_scanner.py
echo -e "\n=== Check test_safetensors_optimization.py for weight_distribution context ==="
grep -B 2 -A 5 "test_weight_distribution_scanner_skips_safetensors" tests/test_safetensors_optimization.py
```

Repository: promptfoo/modelaudit
Length of output: 703
Add regression tests for ONNX weight-distribution analysis, especially external-data tensor handling and malicious weight patterns.
The _check_weight_distribution() method (lines 400–467) integrates weight distribution scanning into ONNX processing with external-data tensor skipping and size guards, but test_onnx_scanner.py has no tests covering this path. Add focused tests for:
- External-data tensors (skipped safely during analysis)
- Size guard enforcement (max_array_size limiting)
- Benign and malicious-style weight distribution samples to ensure anomaly detection works
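A dependency-free sketch of the first two cases, using stand-in objects rather than real onnx TensorProto instances (EXTERNAL, extract_inline_weights, and the field names are assumptions for illustration; real tests would build small ONNX models):

```python
from types import SimpleNamespace

EXTERNAL = 1  # stand-in for onnx.TensorProto.EXTERNAL
DEFAULT = 0


def extract_inline_weights(initializers, max_array_size=1024):
    """Mirror the requested behavior: skip external-data tensors and
    anything over the size guard, keeping only inline weights."""
    kept = {}
    for init in initializers:
        if init.data_location == EXTERNAL:
            continue  # raw bytes unavailable with load_external_data=False
        if max_array_size and len(init.raw_data) > max_array_size:
            continue  # size guard: never materialize oversized tensors
        kept[init.name] = init.raw_data
    return kept


inits = [
    SimpleNamespace(name="w_ext", data_location=EXTERNAL, raw_data=b""),
    SimpleNamespace(name="w_big", data_location=DEFAULT, raw_data=b"\x00" * 4096),
    SimpleNamespace(name="w_ok", data_location=DEFAULT, raw_data=b"\x00" * 64),
]
kept = extract_inline_weights(inits)
```

Only the inline, under-limit tensor survives, which is exactly what the requested regression tests should assert against the real scanner.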
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelaudit/scanners/onnx_scanner.py` at line 177, Add regression tests in
test_onnx_scanner.py that exercise the ONNX scanner's _check_weight_distribution
path: create small ONNX models or mocked TensorProto objects and call the ONNX
scanning entry (or directly invoke _check_weight_distribution via a scanner
instance) to verify external-data tensors are skipped (simulate TensorProto with
external_data set), verify max_array_size size guard blocks large tensors (set a
small max_array_size in scanner config and supply a large tensor), and verify
detection behavior for benign vs malicious-style weight distributions (supply
normal random weights and crafted anomalous patterns and assert expected
result/anomaly flags). Reference the _check_weight_distribution method and the
scanner instance used by test_onnx_scanner.py to locate where to add these
focused cases.
- Upgrade logger.debug() to logger.warning() in weight extraction/analysis except blocks so scan output signals reduced coverage on failure
- Pre-check estimated array size before calling to_array() to avoid materializing oversized tensors into memory
- Assert result.success in blocked-module pickle tests before checking result.has_errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
♻️ Duplicate comments (2)
modelaudit/scanners/onnx_scanner.py (2)
177-177: ⚠️ Potential issue | 🟠 Major

Add ONNX regression tests for the newly integrated weight-distribution path.

Line 177 introduces scanner behavior that should be regression-tested (external-data tensor skipping, `max_array_size` guard behavior, and anomaly propagation into `ScanResult` metadata/checks). This path is security-relevant and currently not covered in the provided test changes.

As per coding guidelines: "tests/**/*.py: Add comprehensive tests including edge cases and regression coverage for scanner changes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/onnx_scanner.py` at line 177, Add regression tests for the newly exercised weight-distribution scan path by creating tests that call the ONNX scanner's _check_weight_distribution via the public scan entry (e.g., using the scanner class that invokes _check_weight_distribution) and assert: (1) external-data tensors are skipped (construct an ONNX ModelProto with external-data tensors and verify no weight reads occur), (2) max_array_size guard triggers appropriately (create a large tensor > max_array_size and assert the scanner avoids loading contents and records the guard), and (3) anomalies detected by _check_weight_distribution are propagated into the ScanResult metadata and checks (inspect the returned ScanResult object for the expected anomaly entries and check flags). Use existing test utilities to build small ONNX models and ensure tests cover both normal and edge cases to provide regression coverage.
422-449: ⚠️ Potential issue | 🟠 Major

Harden weight extraction robustness to prevent scan interruption and numeric overflow.

The outer `try` block (lines 422-449) aborts extraction for all remaining initializers if any single tensor fails, reducing security coverage. Additionally, `numpy.prod(initializer.dims)` can overflow with large dimensions; for example, `np.prod([2**32, 2**32])` returns `0` instead of `2**64`, causing size estimates to wrap and bypass the `max_array_size` guard.

Both issues should be fixed by moving exception handling inside the loop (allowing continued processing) and switching to `math.prod` for arbitrary-precision arithmetic. The new weight distribution path is also untested and should include regression tests with representative ONNX models.

🔧 Proposed fix
```diff
+from math import prod
+
 weights_info: dict[str, Any] = {}
-try:
-    for initializer in model.graph.initializer:
-        self.check_interrupted()
+raw_max_array_size = self.config.get("max_array_size", 100 * 1024 * 1024)
+try:
+    max_array_size = int(raw_max_array_size)
+except (TypeError, ValueError):
+    max_array_size = 100 * 1024 * 1024
+
+for initializer in model.graph.initializer:
+    try:
+        self.check_interrupted()
         # Skip external-data tensors — their bytes were not loaded.
         if initializer.data_location == onnx.TensorProto.EXTERNAL:
             continue
         # Only 2-D+ tensors are interesting for distribution analysis.
         if len(initializer.dims) < 2:
             continue
         # Pre-check estimated size before materializing the array.
         import numpy as np
-        estimated_bytes = (
-            np.prod(initializer.dims)
-            * np.dtype(onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type)).itemsize
-        )
+        numel = prod(int(dim) for dim in initializer.dims)
+        itemsize = int(np.dtype(onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type)).itemsize)
+        estimated_bytes = numel * itemsize
         if max_array_size and max_array_size > 0 and estimated_bytes > max_array_size:
             continue
         array = onnx.numpy_helper.to_array(initializer)
         weights_info[initializer.name] = array
-except Exception as e:
-    logger.warning(f"Failed to extract ONNX weights for distribution analysis: {e}")
+    except Exception as e:
+        logger.warning(
+            "Failed to extract ONNX initializer '%s' for distribution analysis: %s",
+            initializer.name,
+            e,
+        )
+        continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/onnx_scanner.py` around lines 422 - 449, The current outer try/except around iterating model.graph.initializer aborts processing of all initializers on any failure and uses np.prod which can overflow; change to per-initializer error handling by moving the try/except inside the for initializer in model.graph.initializer loop (catch and log exceptions per-initializer while continuing), and replace np.prod(initializer.dims) with math.prod(initializer.dims) to use arbitrary-precision multiplication (import math where needed); keep the existing max_array_size guard logic but compute estimated_bytes using math.prod and the dtype itemsize, and add regression tests exercising the weights extraction/distribution path with representative ONNX models (including large-dimension cases and an initializer that intentionally raises) to ensure robustness.
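The wraparound behind this comment can be reproduced with the standard library alone; the modulo loop below simulates int64 overflow rather than calling numpy, which is an assumption about why `np.prod` returns 0 here.

```python
import math

dims = [2**32, 2**32]

# math.prod uses Python's arbitrary-precision integers: no wraparound.
exact = math.prod(dims)  # 2**64

# np.prod on an int64 array multiplies modulo 2**64; simulate that here.
wrapped = 1
for d in dims:
    wrapped = (wrapped * d) % (1 << 64)
# wrapped is 0, so a size estimate built from it sails under any guard.
```

This is why the proposed fix computes `estimated_bytes` from `math.prod` over `int(dim)` values before comparing against `max_array_size`.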
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modelaudit/scanners/onnx_scanner.py`:
- Line 177: Add regression tests for the newly exercised weight-distribution
scan path by creating tests that call the ONNX scanner's
_check_weight_distribution via the public scan entry (e.g., using the scanner
class that invokes _check_weight_distribution) and assert: (1) external-data
tensors are skipped (construct an ONNX ModelProto with external-data tensors and
verify no weight reads occur), (2) max_array_size guard triggers appropriately
(create a large tensor > max_array_size and assert the scanner avoids loading
contents and records the guard), and (3) anomalies detected by
_check_weight_distribution are propagated into the ScanResult metadata and
checks (inspect the returned ScanResult object for the expected anomaly entries
and check flags). Use existing test utilities to build small ONNX models and
ensure tests cover both normal and edge cases to provide regression coverage.
- Around line 422-449: The current outer try/except around iterating
model.graph.initializer aborts processing of all initializers on any failure and
uses np.prod which can overflow; change to per-initializer error handling by
moving the try/except inside the for initializer in model.graph.initializer loop
(catch and log exceptions per-initializer while continuing), and replace
np.prod(initializer.dims) with math.prod(initializer.dims) to use
arbitrary-precision multiplication (import math where needed); keep the existing
max_array_size guard logic but compute estimated_bytes using math.prod and the
dtype itemsize, and add regression tests exercising the weights
extraction/distribution path with representative ONNX models (including
large-dimension cases and an initializer that intentionally raises) to ensure
robustness.
ℹ️ Review info
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- modelaudit/scanners/onnx_scanner.py
- tests/scanners/test_pickle_scanner.py
Resolve conflicts in pickle_scanner.py:
- ALWAYS_DANGEROUS_GLOBALS: keep both branch additions (pkgutil, uuid) and main additions (cProfile, ctypes, etc.)
- ALWAYS_DANGEROUS_MODULES: merge the branch's expanded blocklist with main's PR #518 modules and linecache/logging.config notes
- REDUCE opcode handling: use main's resolved_callables approach (handles memo indirection via symbolic stack simulation), replacing the manual lookback loop
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelaudit/scanners/pickle_scanner.py (1)
2132-2138: ⚠️ Potential issue | 🟡 Minor

Use `resolved_callables` for NEWOBJ/NEWOBJ_EX execution-pattern detection.

At Line 2132, this branch requires `arg` to be `str`, so it doesn't leverage the callable targets already resolved for OBJ/NEWOBJ/NEWOBJ_EX. That leaves this detection path weaker than intended.

🔧 Proposed fix
```diff
-    # Check for INST or OBJ opcodes which can also be used for code execution
-    if opcode.name in ["INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"] and isinstance(arg, str):
-        return {
-            "pattern": f"{opcode.name}_EXECUTION",
-            "argument": arg,
-            "position": pos,
-            "opcode": opcode.name,
-        }
+    # Check for INST/OBJ/NEWOBJ/NEWOBJ_EX opcodes which can also be used for code execution
+    if opcode.name in {"INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}:
+        resolved = resolved_callables.get(i)
+        if resolved:
+            mod, func = resolved
+            if _is_dangerous_ref(mod, func):
+                return {
+                    "pattern": f"{opcode.name}_EXECUTION",
+                    "module": mod,
+                    "function": func,
+                    "position": pos,
+                    "opcode": opcode.name,
+                }
+        elif opcode.name == "INST" and isinstance(arg, str):
+            return {
+                "pattern": f"{opcode.name}_EXECUTION",
+                "argument": arg,
+                "position": pos,
+                "opcode": opcode.name,
+            }
```

Based on learnings: "Preserve or strengthen security detections; test both benign and malicious samples when modifying scanners".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/pickle_scanner.py` around lines 2132 - 2138, The branch that returns an execution-pattern for opcode names ["INST","OBJ","NEWOBJ","NEWOBJ_EX"] should not require arg to be a str and must consult the already-resolved callable targets (resolved_callables) for NEWOBJ/NEWOBJ_EX; update the check in the block around opcode.name handling so that for NEWOBJ and NEWOBJ_EX you inspect resolved_callables (matching entries for the current pos/arg/stack position) and use those callable names/targets when building the returned dict (pattern, argument, position, opcode) instead of only using arg when it's a str; ensure you still handle OBJ/INST as before and fall back safely when no resolved callable is available.
♻️ Duplicate comments (1)
modelaudit/scanners/weight_distribution_scanner.py (1)
491-495: ⚠️ Potential issue | 🟠 Major

Move the ONNX size guard before tensor materialization.

At Line 491, `onnx.numpy_helper.to_array()` materializes the tensor before the size check at Line 494, so oversized tensors can still trigger memory spikes and reduce scan coverage.

🔧 Proposed fix
```diff
 for initializer in model.graph.initializer:
     # Skip external-data tensors — their raw bytes are not available
     # when loaded with load_external_data=False and calling to_array
     # on them would raise a ValidationError.
     if initializer.data_location == onnx.TensorProto.EXTERNAL:
         continue
     if len(initializer.dims) >= 2:
+        if self.max_array_size and self.max_array_size > 0:
+            # Pre-check approximate size before materializing tensor
+            elem_count = 1
+            for dim in initializer.dims:
+                elem_count *= int(dim)
+            np_dtype = onnx.helper.tensor_dtype_to_np_dtype(initializer.data_type)
+            estimated_nbytes = elem_count * np_dtype.itemsize
+            if estimated_nbytes > self.max_array_size:
+                continue
         array = onnx.numpy_helper.to_array(  # type: ignore[possibly-unresolved-reference]
             initializer,
         )
         if self.max_array_size and self.max_array_size > 0 and array.nbytes > self.max_array_size:
             continue
         weights_info[initializer.name] = array
```

```shell
#!/bin/bash
# Verify guard ordering around ONNX tensor materialization and size checks.
rg -n -C3 "numpy_helper\.to_array|max_array_size|initializer\.dims" modelaudit/scanners/weight_distribution_scanner.py
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelaudit/scanners/weight_distribution_scanner.py` around lines 491 - 495, The size guard is applied after materializing the ONNX tensor via onnx.numpy_helper.to_array, causing large tensors to be loaded into memory; instead, compute the tensor byte size from the initializer metadata and apply the max_array_size check before calling numpy_helper.to_array. Update the code around numpy_helper.to_array (referencing initializer, max_array_size, and onnx.numpy_helper.to_array) to calculate nbytes from initializer.dims and its data type / raw_data length (or use initializer.raw_data length when present) and skip materialization (continue) if the computed size exceeds self.max_array_size, only calling to_array when the size check passes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/detectors/suspicious_symbols.py`:
- Around line 173-237: Add explicit regression tests that assert the new
suspicious module patterns are detected: create unit tests for
cloudpickle/joblib recursive pickle loading, for functools.partial and
functools.reduce callable-chain cases, for threading/_thread/_signal
thread-spawning patterns, for tarfile/zipfile path-traversal (zip-slip) cases,
and for logging.config.fileConfig execution; also add tests covering
cProfile/profile/pdb/timeit.timeit and dynamic-loading/compilation patterns
(pkgutil.resolve_name, marshal.loads, py_compile, compileall). Place these cases
into the existing suspicious-symbols detector test module and include malicious
pickle/asset payloads to exercise detection paths so each named symbol
(cloudpickle, joblib, functools.partial/reduce, threading, _thread, _signal,
tarfile, zipfile, logging.config.fileConfig, cProfile, profile, pdb,
timeit.timeit, pkgutil.resolve_name, marshal.loads, py_compile, compileall) is
explicitly asserted as flagged by the scanner.
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 723-740: Add a companion test to ensure NEWOBJ_EX doesn't
false-positive: create a new test (e.g., test_newobj_ex_benign_class) modeled on
test_newobj_ex_dangerous_class that crafts a pickle for a benign global class
(for example use GLOBAL 'builtins\nlist\n' or another harmless class) with
EMPTY_TUPLE, EMPTY_DICT, NEWOBJ_EX and STOP; call self._scan_bytes(data) and
assert result.success and that there are no CRITICAL issues referencing "os" (or
other malicious markers), ensuring the scanner accepts this benign NEWOBJ_EX
path to prevent overblocking regressions.
---
Outside diff comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 2132-2138: The branch that returns an execution-pattern for opcode
names ["INST","OBJ","NEWOBJ","NEWOBJ_EX"] should not require arg to be a str and
must consult the already-resolved callable targets (resolved_callables) for
NEWOBJ/NEWOBJ_EX; update the check in the block around opcode.name handling so
that for NEWOBJ and NEWOBJ_EX you inspect resolved_callables (matching entries
for the current pos/arg/stack position) and use those callable names/targets
when building the returned dict (pattern, argument, position, opcode) instead of
only using arg when it's a str; ensure you still handle OBJ/INST as before and
fall back safely when no resolved callable is available.
---
Duplicate comments:
In `@modelaudit/scanners/weight_distribution_scanner.py`:
- Around line 491-495: The size guard is applied after materializing the ONNX
tensor via onnx.numpy_helper.to_array, causing large tensors to be loaded into
memory; instead, compute the tensor byte size from the initializer metadata and
apply the max_array_size check before calling numpy_helper.to_array. Update the
code around numpy_helper.to_array (referencing initializer, max_array_size, and
onnx.numpy_helper.to_array) to calculate nbytes from initializer.dims and its
data type / raw_data length (or use initializer.raw_data length when present)
and skip materialization (continue) if the computed size exceeds
self.max_array_size, only calling to_array when the size check passes.
ℹ️ Review info
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- modelaudit/detectors/suspicious_symbols.py
- modelaudit/scanners/pickle_scanner.py
- modelaudit/scanners/weight_distribution_scanner.py
- tests/scanners/test_pickle_scanner.py
```python
def test_newobj_ex_dangerous_class(self) -> None:
    """NEWOBJ_EX opcode with a dangerous class should be flagged."""
    # Craft pickle: PROTO 4 | GLOBAL 'os _wrap_close' | EMPTY_TUPLE | EMPTY_DICT | NEWOBJ_EX | STOP
    # Protocol 4 is needed for NEWOBJ_EX (opcode 0x92)
    proto = b"\x80\x04"
    global_op = b"c" + b"os\n_wrap_close\n"
    empty_tuple = b")"
    empty_dict = b"}"
    newobj_ex = b"\x92"  # NEWOBJ_EX opcode
    stop = b"."
    data = proto + global_op + empty_tuple + empty_dict + newobj_ex + stop

    result = self._scan_bytes(data)
    assert result.success
    assert result.has_errors
    os_issues = [i for i in result.issues if i.severity == IssueSeverity.CRITICAL and "os" in i.message.lower()]
    assert os_issues, f"Expected CRITICAL os issue for NEWOBJ_EX, got: {[i.message for i in result.issues]}"
```
🧹 Nitpick | 🔵 Trivial
Add a benign NEWOBJ_EX counterpart test to lock false-positive behavior.
This section validates the malicious NEWOBJ_EX path well; adding one safe NEWOBJ_EX case would prevent overblocking regressions as allowlists/blocklists evolve.
Based on learnings: "Add comprehensive tests including edge cases and regression coverage for scanner changes" and "Preserve or strengthen security detections; test both benign and malicious samples when modifying scanners".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/scanners/test_pickle_scanner.py` around lines 723 - 740, Add a
companion test to ensure NEWOBJ_EX doesn't false-positive: create a new test
(e.g., test_newobj_ex_benign_class) modeled on test_newobj_ex_dangerous_class
that crafts a pickle for a benign global class (for example use GLOBAL
'builtins\nlist\n' or another harmless class) with EMPTY_TUPLE, EMPTY_DICT,
NEWOBJ_EX and STOP; call self._scan_bytes(data) and assert result.success and
that there are no CRITICAL issues referencing "os" (or other malicious markers),
ensuring the scanner accepts this benign NEWOBJ_EX path to prevent overblocking
regressions.
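As a sketch, the benign counterpart's pickle bytes could be crafted like this. It is stand-alone here, checking only that the stream is well-formed and actually harmless; the real test would feed `data` to `self._scan_bytes` as the prompt describes.

```python
import pickle
import pickletools

# PROTO 4 | GLOBAL 'builtins list' | EMPTY_TUPLE | EMPTY_DICT | NEWOBJ_EX | STOP
proto = b"\x80\x04"
global_op = b"cbuiltins\nlist\n"   # harmless class, unlike os._wrap_close
empty_tuple = b")"
empty_dict = b"}"
newobj_ex = b"\x92"                # NEWOBJ_EX opcode (protocol 4+)
data = proto + global_op + empty_tuple + empty_dict + newobj_ex + b"."

# Sanity-check the crafted stream: it parses and round-trips to an empty list.
names = [op.name for op, _arg, _pos in pickletools.genops(data)]
assert "NEWOBJ_EX" in names
assert pickle.loads(data) == []
```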
- pickle_scanner: Clear `_safe_memo` at STOP opcode boundaries to prevent stale True values from carrying across pickle streams and bypassing dangerous-global checks. Default memo slots to False before reclassifying in BINPUT/LONG_BINPUT handling.
- pickle_scanner: Continue scanning after partial stream errors instead of terminating, so malicious payloads hidden after a partial stream are not missed.
- onnx_scanner: Move try/except inside the initializer loop so a single failing tensor does not abort extraction for all remaining tensors. Replace np.prod with math.prod for arbitrary-precision arithmetic to prevent silent integer overflow on large dimension products. Surface weight distribution analysis failures in ScanResult.
- weight_distribution_scanner: Pre-check estimated tensor byte size before calling onnx.numpy_helper.to_array() to avoid memory spikes from oversized arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
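A simplified model of the memo tracking described in the first bullet above. Opcode names match pickletools; the allowlist entry is purely illustrative, and the real scanner tracks considerably more state.

```python
SAFE_GLOBALS = {"numpy.core.multiarray._reconstruct"}  # illustrative allowlist

def scan_ops(ops):
    """Track which memo slots hold safe callables; reset at STOP so state
    never leaks across pickle streams."""
    safe_memo = {}
    last_global_safe = False
    dangerous_reduces = 0
    for name, arg in ops:
        if name in ("GLOBAL", "STACK_GLOBAL"):
            last_global_safe = arg in SAFE_GLOBALS
        elif name in ("BINPUT", "LONG_BINPUT"):
            safe_memo[arg] = last_global_safe   # defaults to False unless reclassified
        elif name == "BINGET":
            last_global_safe = safe_memo.get(arg, False)
        elif name == "REDUCE":
            if not last_global_safe:
                dangerous_reduces += 1
        elif name == "STOP":
            safe_memo.clear()                   # evaluate each stream independently
            last_global_safe = False
    return dangerous_reduces

# Stream 1 memoizes a safe callable in slot 0; stream 2 reuses slot 0 for
# os.system. Without the STOP-boundary clear, the stale True would leak.
ops = [("GLOBAL", "numpy.core.multiarray._reconstruct"), ("BINPUT", 0),
       ("REDUCE", None), ("STOP", None),
       ("GLOBAL", "os.system"), ("BINPUT", 0), ("BINGET", 0), ("REDUCE", None),
       ("STOP", None)]
assert scan_ops(ops) == 1
```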
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Around line 52-55: The resync limit in _genops_with_fallback (using
_MAX_RESYNC_BYTES and resync_skipped) allows an attacker to hide a second pickle
stream after >256 padding bytes; change the logic so it does not unconditionally
stop searching after _MAX_RESYNC_BYTES but instead either (a) make
_MAX_RESYNC_BYTES configurable and significantly larger or (b) loop until EOF
while attempting to validate potential stream headers, only giving up on
definite EOF or a configurable hard limit, and ensure resync_skipped is
correctly reset when a candidate stream is found; update all uses in
_genops_with_fallback (and the similar block around lines 107-116) to perform a
robust scan-for-next-stream routine rather than aborting after 256 bytes.
In `@modelaudit/scanners/weight_distribution_scanner.py`:
- Around line 473-509: The try/except around the whole ONNX load-and-loop lets
one malformed initializer abort processing; move the exception handling to
per-initializer level inside the for initializer in model.graph.initializer loop
(around the size-estimate and onnx.numpy_helper.to_array call) so each
initializer is individually guarded, log a debug message including
initializer.name, path and the caught exception, and continue to the next
initializer; keep the outer model = onnx.load(...) call as-is but ensure
weights_info[initializer.name] is only set after a successful to_array
conversion.
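The per-initializer guard described above could look roughly like this. It is a sketch with a hypothetical `to_array` callable stubbed in; the failure counter mirrors the `extraction_failures` metadata mentioned elsewhere in this PR.

```python
import logging

log = logging.getLogger("weight_scan")

def extract_weights(initializers, to_array):
    """Guard each initializer individually so one bad tensor cannot
    abort extraction for the rest; count failures for scan metadata."""
    weights_info = {}
    extraction_failures = 0
    for init in initializers:
        try:
            weights_info[init["name"]] = to_array(init)  # only set on success
        except Exception as exc:
            extraction_failures += 1
            log.debug("skipping initializer %s: %s", init["name"], exc)
    return weights_info, extraction_failures

def fake_to_array(init):
    """Stand-in for onnx.numpy_helper.to_array that fails on one tensor."""
    if init.get("corrupt"):
        raise ValueError("malformed tensor")
    return init["data"]

inits = [{"name": "w1", "data": [1.0]},
         {"name": "bad", "corrupt": True},
         {"name": "w2", "data": [2.0]}]
weights, failures = extract_weights(inits, fake_to_array)
assert list(weights) == ["w1", "w2"]
assert failures == 1
```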
📒 Files selected for processing (3)
- modelaudit/scanners/onnx_scanner.py
- modelaudit/scanners/pickle_scanner.py
- modelaudit/scanners/weight_distribution_scanner.py
```python
# Maximum number of consecutive non-pickle bytes to skip when resyncing
_MAX_RESYNC_BYTES = 256
resync_skipped = 0
# Track whether we've successfully parsed at least one complete stream
```
Resync window is bypassable with >256 separator bytes.
_genops_with_fallback stops searching for the next pickle stream after 256 non-pickle bytes. A crafted file can place a malicious second stream after 257+ bytes of padding and evade multi-stream analysis.
🔧 Proposed fix

```diff
-def _genops_with_fallback(file_obj: BinaryIO, *, multi_stream: bool = False) -> Any:
+def _genops_with_fallback(
+    file_obj: BinaryIO,
+    *,
+    multi_stream: bool = False,
+    max_resync_bytes: int = 64 * 1024,
+) -> Any:
 @@
-    _MAX_RESYNC_BYTES = 256
 @@
-    if resync_skipped >= _MAX_RESYNC_BYTES:
+    if resync_skipped >= max_resync_bytes:
         return
```

Also applies to: 107-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelaudit/scanners/pickle_scanner.py` around lines 52 - 55, The resync limit
in _genops_with_fallback (using _MAX_RESYNC_BYTES and resync_skipped) allows an
attacker to hide a second pickle stream after >256 padding bytes; change the
logic so it does not unconditionally stop searching after _MAX_RESYNC_BYTES but
instead either (a) make _MAX_RESYNC_BYTES configurable and significantly larger
or (b) loop until EOF while attempting to validate potential stream headers,
only giving up on definite EOF or a configurable hard limit, and ensure
resync_skipped is correctly reset when a candidate stream is found; update all
uses in _genops_with_fallback (and the similar block around lines 107-116) to
perform a robust scan-for-next-stream routine rather than aborting after 256
bytes.
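One way the suggested scan-for-next-stream routine could look, as a buffer-level sketch. The real `_genops_with_fallback` works on a file object; here the candidate-header check is simply "byte 0x80 (the PROTO opcode, which every protocol-2+ stream starts with) followed by a parseable stream", and the window size is configurable rather than hard-coded at 256.

```python
import pickle
import pickletools

def find_next_stream(buf: bytes, start: int, max_resync_bytes: int = 4096):
    """Scan forward from `start`, skipping separator bytes, for a position
    where a parseable pickle stream begins; None if none in the window."""
    end = min(len(buf), start + max_resync_bytes)
    for pos in range(start, end):
        if buf[pos] != 0x80:          # PROTO opcode marks a proto>=2 stream
            continue
        try:
            list(pickletools.genops(buf[pos:]))  # validate candidate header
            return pos
        except Exception:
            continue
    return None

first = pickle.dumps([1, 2])
second = pickle.dumps("payload")
# 300 junk bytes would evade a fixed 256-byte window, but not this one.
blob = first + b"\x00" * 300 + second
pos = find_next_stream(blob, len(first))
assert pos == len(first) + 300
assert pickle.loads(blob[pos:]) == "payload"
```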
The test_real_joblib_compressed test was failing on Windows CI (Python 3.11) because compressed joblib files produce "Unable to parse pickle file: MemoryError" rather than an opcode-related message. Broaden the keyword match to accept opcode, parse, format, memory, or unable.

Validated with 5 ONNX models (mobilenetv2-7, resnet50-v2-7, squeezenet1.0-12, shufflenet-7, efficientnet-lite4-11); all scan clean with no false positives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
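The broadened keyword match could be sketched like this (function and constant names are illustrative, not the actual test helper):

```python
# Accept any of several failure phrasings, since the same corrupt input
# surfaces differently per platform (e.g. "MemoryError" on Windows CI).
ACCEPTED_KEYWORDS = ("opcode", "parse", "format", "memory", "unable")

def matches_parse_failure(message: str) -> bool:
    lowered = message.lower()
    return any(keyword in lowered for keyword in ACCEPTED_KEYWORDS)

assert matches_parse_failure("Unable to parse pickle file: MemoryError")
assert matches_parse_failure("invalid opcode at position 12")
assert not matches_parse_failure("permission denied")
```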
- Move per-initializer try/except inside loop in weight_distribution_scanner so one bad initializer does not abort extraction for remaining layers
- Move numpy import outside the per-initializer loop in onnx_scanner
- Track and expose extraction_failures count in scan metadata
- Increase pickle multi-stream resync window from 256 to 4096 bytes to prevent bypass via separator byte injection
- Fix formatting in test_real_world_dill_joblib.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Missing external data files are common for HuggingFace-hosted ONNX models and are NOT a security issue. Reporting them as CRITICAL causes false positive security alerts. Also deduplicate per-file (not per-tensor) to avoid flooding scan output with hundreds of identical warnings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
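The per-file deduplication described above can be sketched as a seen-set keyed on path rather than tensor name (function name and message wording are illustrative):

```python
def collect_external_data_warnings(missing):
    """Report each file with missing external data once, not once per tensor."""
    seen_files = set()
    warnings = []
    for path, tensor_name in missing:
        if path in seen_files:
            continue
        seen_files.add(path)
        warnings.append(f"{path}: external data file missing "
                        f"(first affected tensor: {tensor_name})")
    return warnings

# 200 tensors pointing at the same sidecar file yield a single warning.
missing = [("model.onnx.data", f"layer{i}.weight") for i in range(200)]
warnings = collect_external_data_warnings(missing)
assert len(warnings) == 1
assert warnings[0].startswith("model.onnx.data")
```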
…er merge

The merge of main into fix/onnx-weights incorrectly split the try block, placing extraction code after a return statement (unreachable) and creating a duplicate except clause.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
`OnnxScanner` (priority 6) always won over `WeightDistributionScanner` (priority 13) for `.onnx` files, so weight distribution analysis never ran. Additionally, `_extract_onnx_weights` called `onnx.load()` without `load_external_data=False`, causing `ValidationError` on models with missing external data files (common with HuggingFace downloads).
- Add `_check_weight_distribution()` to `OnnxScanner` that extracts inline weights from the already-loaded model and delegates statistical analysis to `WeightDistributionScanner._analyze_weight_distributions()`. Also fix the standalone extraction method to use `load_external_data=False` and skip external-data tensors.
- Add a `max_array_size` guard (default 100MB) to both extraction paths, an `onnx` import check to `can_handle()`, and proper handling for external-data-only models.

Test plan
- (`kennysuper007/vispo-models`)
- `Xenova/all-MiniLM-L6-v2` (`model.onnx`: 40 layers analyzed)
- `model_quantized.onnx` (40 layers, 25 anomalies detected)
- `model.int8.onnx` (352 layers, 0 anomalies)
- (`pytest tests/ -x -q`)

🤖 Generated with Claude Code
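The external-data skip the summary describes can be sketched as a predicate on the initializer's `data_location` field. The `EXTERNAL` value of 1 is assumed from the ONNX `TensorProto.DataLocation` enum, and `SimpleNamespace` stands in for real initializers:

```python
from types import SimpleNamespace

EXTERNAL = 1  # assumed value of onnx.TensorProto.EXTERNAL (DataLocation enum)

def is_external_tensor(initializer) -> bool:
    """True when the tensor's bytes live in a sidecar file, not the model."""
    return getattr(initializer, "data_location", 0) == EXTERNAL

def inline_initializers(initializers):
    """Keep only tensors embedded in the .onnx file itself, so missing
    external data files never trigger a load error."""
    return [t for t in initializers if not is_external_tensor(t)]

tensors = [SimpleNamespace(name="w0", data_location=0),
           SimpleNamespace(name="w1", data_location=EXTERNAL),
           SimpleNamespace(name="w2")]           # field may be absent entirely
kept = inline_initializers(tensors)
assert [t.name for t in kept] == ["w0", "w2"]
```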