feat: add ONNX operator builders and converter enhancements #5
Open
dndungu wants to merge 16 commits into zerfoo:main from
Conversation
…NBits support

Critical additions for Gemma 3 and quantized model import:

1. convertAttribute: add an AttributeProto_TENSOR case, converting the embedded ONNX TensorProto to a zmf.Attribute_Tensor. Required for Constant nodes.
2. convertTensorWithPath: add the UINT8 and INT8 dtypes. Quantized model weights are stored as UINT8; missing this caused all 4-bit model imports to fail.
3. Initializer storage: extend to cover UINT8, INT8, INT32, and INT64 so that quantized weights and integer shape constants are preserved as ZMF parameters.
4. Constant node handling: detect the "Constant" ONNX op in ONNXToZMFWithPath; store the embedded tensor as a ZMF parameter keyed by each output name and the node name; skip adding a graph node. Downstream nodes reference the constant as a regular parameter input.
5. MatMulNBits handling: dequantize 4-bit quantized weights to float32 [K, N] at import time using block-wise scale (and optional zero-point) unpacking. Emit a standard MatMul ZMF node so no specialised runtime kernel is needed. Both symmetric (zp=8) and asymmetric (explicit zero-point tensor) modes are handled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
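The block-wise 4-bit unpacking in point 5 can be sketched roughly as below. This is a hypothetical illustration, not the converter's actual code: the function name `dequantize4Bit`, the low-nibble-first packing order, and the `blockSize` parameter are assumptions. It shows the symmetric mode only (implicit zero-point of 8); the asymmetric mode would subtract a per-block zero-point read from the explicit tensor instead.

```go
package main

// dequantize4Bit is a hypothetical sketch of import-time MatMulNBits
// dequantization: each byte packs two 4-bit values (low nibble first,
// an assumption), and each run of blockSize values shares one float32
// scale. Symmetric mode uses an implicit zero-point of 8.
func dequantize4Bit(packed []byte, scales []float32, blockSize int) []float32 {
	n := len(packed) * 2
	out := make([]float32, n)
	for i := 0; i < n; i++ {
		b := packed[i/2]
		var q byte
		if i%2 == 0 {
			q = b & 0x0F // low nibble
		} else {
			q = b >> 4 // high nibble
		}
		scale := scales[i/blockSize]
		out[i] = float32(int(q)-8) * scale // implicit zp = 8 (symmetric mode)
	}
	return out
}
```

Doing this once at import time trades model size for a simpler runtime: the emitted node is a plain MatMul over a float32 [K, N] weight.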
ONNX Slice (opset 10+) encodes starts/ends/axes/steps as input tensors; ONNX Pad (opset 11+) encodes pads and constant_value as inputs; ONNX TopK encodes K as an input tensor. Add explicit convertNode cases that lift these positional inputs into named ZMF node attributes so the zerfoo runtime builders receive them directly. Tests cover all three operators plus Softmax, Sigmoid, Erf, and LayerNormalization attribute round-trips.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
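The "lift positional inputs into named attributes" pattern can be sketched for Slice as below. This is an illustrative stand-in, not the converter's real types: `attr`, `liftSliceInputs`, and the initializer map are assumptions; the real code reads values out of the converted ZMF parameters.

```go
package main

// attr is a stand-in for a ZMF node attribute holding an INTS value.
type attr struct {
	Ints []int64
}

// liftSliceInputs maps Slice's positional inputs 1..4 (starts, ends,
// axes, steps) onto named attributes, skipping absent (empty-string)
// optional inputs. Input 0 is the data tensor and is left alone.
func liftSliceInputs(inputs []string, initializers map[string][]int64) map[string]attr {
	names := []string{"starts", "ends", "axes", "steps"}
	attrs := map[string]attr{}
	for i, name := range names {
		pos := i + 1 // input 0 is the data tensor
		if pos >= len(inputs) || inputs[pos] == "" {
			continue // optional input absent in ONNX
		}
		if vals, ok := initializers[inputs[pos]]; ok {
			attrs[name] = attr{Ints: vals}
		}
	}
	return attrs
}
```

Pad and TopK would follow the same shape with their own positional-input lists.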
…ad, TopK builders

Register seven ONNX operator builders in the importer layer registry. The actual attribute extraction for Slice/Pad/TopK is handled upstream in the converter; these builders record the operators in the registry and serve as extension points for future runtime construction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts (T39.5)

- Resize: promote input[2] scales (FLOAT tensor) to a "scales" FLOATS attribute and input[3] sizes (INT64 tensor) to a "sizes" INTS attribute
- Generic input loop: skip empty-string inputs (ONNX optional absent inputs) to prevent empty strings appearing as graph node inputs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
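The generic input-loop fix amounts to filtering before copying input names into the graph node. A minimal sketch, assuming a hypothetical `filterInputs` helper (the real fix is inline in the conversion loop):

```go
package main

// filterInputs drops the empty-string entries ONNX uses to mark absent
// optional inputs, so they never appear as ZMF graph node input names.
func filterInputs(onnxInputs []string) []string {
	out := make([]string, 0, len(onnxInputs))
	for _, in := range onnxInputs {
		if in == "" {
			continue // optional input absent in ONNX
		}
		out = append(out, in)
	}
	return out
}
```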
…n, Resize stubs (T39.5)

Each stub registers its op name via init() and returns a nil placeholder until full zerfoo graph integration is implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Models like Llama 3 store weight data in separate files (e.g., model.onnx_data) using ONNX external data storage. The importer was only reading raw_data from the protobuf, resulting in empty tensor data for externally-stored weights. Add loadExternalData to read from external files using the ONNX external_data metadata (location, offset, length).
Gemma 3 ONNX uses ConstantOfShape with TENSOR attributes. The zmf v0.3.0 protobuf schema adds Attribute_Tensor to handle this.
Gemma 3 ONNX uses ConstantOfShape with TENSOR attributes of type BOOL. Map onnx.TensorProto_BOOL to zmf.Tensor_BOOL in convertTensorWithPath.
Previously, INT64/INT32 initializer inputs were promoted to node attributes, losing their input position. The initializers are already converted to ZMF parameters, so they will be resolved as parameterNodes during graph construction. This fixes 727+ nodes in Gemma 3 that had missing inputs (Equal, Concat, Where, Range, Sub, Max, etc.).
Resolve go.mod/go.sum conflict: keep zmf v0.3.0 (required for Attribute_Tensor support added in 2bb6bf3).
Quantizes all FLOAT32 parameter tensors in a ZMF model in-place. Supports q4_0 (~7x compression) and q8_0 (~3.6x compression).
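For q8_0, each block of 32 float32 values collapses to one float32 scale plus 32 int8 values, which accounts for the roughly 3.6x figure (132 bytes vs 128 bytes... of the original 128 float32 bytes per 32 values: 36 bytes vs 128). A minimal per-block sketch, with the function name and block size as assumptions in the style of common q8_0 implementations:

```go
package main

import "math"

// q8BlockSize is the assumed q8_0 block length.
const q8BlockSize = 32

// quantizeQ8Block is a hypothetical sketch of q8_0 quantization for one
// block: scale = max|x| / 127, then each value is rounded to int8 in
// units of the scale. An all-zero block yields scale 0 and zero codes.
func quantizeQ8Block(vals []float32) (scale float32, q []int8) {
	var amax float64
	for _, v := range vals {
		if a := math.Abs(float64(v)); a > amax {
			amax = a
		}
	}
	scale = float32(amax / 127.0)
	q = make([]int8, len(vals))
	if scale == 0 {
		return
	}
	for i, v := range vals {
		q[i] = int8(math.Round(float64(v) / float64(scale)))
	}
	return
}
```

q4_0 follows the same block structure but rounds to 4-bit codes around an implicit zero-point, giving the larger ~7x compression at higher error.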
Quantizes float32 weights during ONNX-to-ZMF conversion. Usage: zonnx convert --quantize q4_0 model.onnx model-q4.zmf
Quantizing LayerNorm/RMSNorm weights and embeddings with Q4_0 causes NaN in the forward pass. Skip tensors whose names contain "norm" or "embed", tensors with bias suffixes, and tensors with fewer than 1024 elements.
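The skip heuristic can be expressed as a predicate like the following. This is a sketch mirroring the rules stated in the commit message; the function name and the exact substring/suffix checks are assumptions about how "bias suffixes" and the name matching are implemented.

```go
package main

import "strings"

// shouldQuantize is a hypothetical sketch of the Q4_0 skip heuristic:
// leave norm weights, embeddings, biases, and small tensors in float32,
// since quantizing them produces NaNs in the forward pass.
func shouldQuantize(name string, numElements int) bool {
	lower := strings.ToLower(name)
	if strings.Contains(lower, "norm") || strings.Contains(lower, "embed") {
		return false
	}
	if strings.HasSuffix(lower, "bias") {
		return false
	}
	return numElements >= 1024
}
```

The large matmul weights that dominate model size still quantize, so most of the compression is retained.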