feat: add ONNX operator builders and converter enhancements#5

Open
dndungu wants to merge 16 commits into zerfoo:main from dndungu:main

Conversation


@dndungu dndungu commented Mar 3, 2026

Summary

  • Add TENSOR attribute support, UINT8/INT8 data types, Constant and MatMulNBits converter support
  • Add Softmax, Sigmoid, Erf, LayerNorm, Slice, Pad, TopK layer builders
  • Promote Slice/Pad/TopK input tensors to ZMF attributes
  • Add Conv, GlobalAveragePool, BatchNormalization, Resize layer stubs
  • Add Resize special case handling and skip empty optional inputs
  • Add MoEGate and MixtureOfExperts layer stubs

Test plan

  • Verify existing tests pass
  • Verify new operator builders integrate with zerfoo layer registry

dndungu and others added 16 commits August 25, 2025 02:48
…NBits support

Critical additions for Gemma 3 and quantized model import:

1. convertAttribute: add AttributeProto_TENSOR case, converting the embedded
   ONNX TensorProto to a zmf.Attribute_Tensor.  Required for Constant nodes.

2. convertTensorWithPath: add UINT8 and INT8 dtypes.  Quantized model weights
   are stored as UINT8; missing this caused all 4-bit model imports to fail.

3. Initializer storage: extend to include UINT8, INT8, INT32, INT64 so that
   quantized weights and integer shape constants are preserved as ZMF parameters.

4. Constant node handling: detect "Constant" ONNX op in ONNXToZMFWithPath;
   store the embedded tensor as a ZMF parameter keyed by each output name and
   the node name; skip adding a graph node.  Downstream nodes reference the
   constant as a regular parameter input.

5. MatMulNBits handling: dequantize 4-bit quantized weights to float32 [K, N]
   at import time using block-wise scale (and optional zero-point) unpacking.
   Emit a standard MatMul ZMF node so no specialised runtime kernel is needed.
   Both symmetric (zp=8) and asymmetric (explicit zero-point tensor) modes
   are handled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ONNX Slice (opset 10+) encodes starts/ends/axes/steps as input tensors;
ONNX Pad (opset 11+) encodes pads and constant_value as inputs; ONNX TopK
encodes K as an input tensor. Add explicit convertNode cases that lift these
positional inputs into named ZMF node attributes so zerfoo runtime builders
receive them directly. Tests cover all three operators plus Softmax, Sigmoid,
Erf, and LayerNormalization attribute round-trips.
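The lifting described above can be sketched for Slice. The types here are simplified stand-ins for the ONNX/ZMF protobufs, and the function name is illustrative; the real converter reads the initializer tensors rather than a plain map:

```go
package main

import "fmt"

// liftSliceInputs sketches promoting opset-10+ Slice positional input
// tensors (starts/ends/axes/steps) into named attributes, so runtime
// builders receive them directly instead of resolving extra inputs.
func liftSliceInputs(inputs []string, initializers map[string][]int64) (data string, attrs map[string][]int64) {
	names := []string{"", "starts", "ends", "axes", "steps"} // position -> attribute name
	attrs = map[string][]int64{}
	for i, in := range inputs {
		if i == 0 {
			data = in // the data tensor stays a graph input
			continue
		}
		if in == "" {
			continue // absent optional input
		}
		if vals, ok := initializers[in]; ok && i < len(names) {
			attrs[names[i]] = vals
		}
	}
	return data, attrs
}

func main() {
	inits := map[string][]int64{"s": {0}, "e": {3}}
	data, attrs := liftSliceInputs([]string{"x", "s", "e"}, inits)
	fmt.Println(data, attrs["starts"], attrs["ends"])
}
```

Pad (pads, constant_value) and TopK (K) follow the same pattern with their own position-to-name tables.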

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ad, TopK builders

Register seven ONNX operator builders in the importer layer registry.
The actual attribute extraction for Slice/Pad/TopK is handled upstream in
the converter; these builders record the operators in the registry and serve
as extension points for future runtime construction.
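The registration pattern can be sketched like this. The registry type and builder signature are illustrative assumptions, not zerfoo's actual API; the point is that each operator self-registers via init() and acts as an extension point:

```go
package main

import "fmt"

// builderFunc is a stand-in for the importer's layer-builder signature.
type builderFunc func(attrs map[string]any) (any, error)

var registry = map[string]builderFunc{}

func register(op string, b builderFunc) { registry[op] = b }

func init() {
	// The seven operators registered by this change.
	for _, op := range []string{"Softmax", "Sigmoid", "Erf", "LayerNormalization", "Slice", "Pad", "TopK"} {
		op := op // capture per-iteration value for the closure
		register(op, func(attrs map[string]any) (any, error) {
			// Extension point: real construction lands with runtime
			// integration; for now the entry records the operator.
			return nil, fmt.Errorf("%s: builder not yet implemented", op)
		})
	}
}

func main() {
	_, ok := registry["TopK"]
	fmt.Println(len(registry), ok)
}
```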

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts (T39.5)

- Resize: promote input[2] scales (FLOAT tensor) to "scales" FLOATS
  attribute and input[3] sizes (INT64 tensor) to "sizes" INTS attribute
- Generic input loop: skip empty-string inputs (ONNX optional absent inputs)
  to prevent empty strings appearing as graph node inputs
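The generic input-loop change can be sketched in a few lines; the function name is illustrative:

```go
package main

import "fmt"

// filterInputs sketches skipping absent optional inputs: ONNX encodes a
// missing optional input as an empty string, which previously leaked into
// the ZMF graph node's input list.
func filterInputs(inputs []string) []string {
	out := make([]string, 0, len(inputs))
	for _, in := range inputs {
		if in == "" {
			continue // optional input not provided
		}
		out = append(out, in)
	}
	return out
}

func main() {
	// Resize commonly omits input[1] (roi): ["X", "", "scales"].
	fmt.Println(filterInputs([]string{"X", "", "scales"}))
}
```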

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n, Resize stubs (T39.5)

Each stub registers its op name via init() and returns nil placeholder
until full zerfoo graph integration is implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Models like Llama 3 store weight data in separate files (e.g.,
model.onnx_data) using ONNX external data storage. The importer
was only reading raw_data from the protobuf, resulting in empty
tensor data for externally-stored weights.

Add loadExternalData to read from external files using the ONNX
external_data metadata (location, offset, length).

Gemma 3 ONNX uses ConstantOfShape with TENSOR attributes. The zmf
v0.3.0 protobuf schema adds Attribute_Tensor to handle this.

Gemma 3 ONNX uses ConstantOfShape with TENSOR attributes of type BOOL.
Map onnx.TensorProto_BOOL to zmf.Tensor_BOOL in convertTensorWithPath.

Previously, INT64/INT32 initializer inputs were promoted to node
attributes, losing their input position. The initializers are already
converted to ZMF parameters, so they will be resolved as parameterNodes
during graph construction. This fixes 727+ nodes in Gemma 3 that had
missing inputs (Equal, Concat, Where, Range, Sub, Max, etc.).

Resolve go.mod/go.sum conflict: keep zmf v0.3.0 (required for
Attribute_Tensor support added in 2bb6bf3).

Quantizes all FLOAT32 parameter tensors in a ZMF model in-place.
Supports q4_0 (~7x compression) and q8_0 (~3.6x compression).

Quantizes float32 weights during ONNX-to-ZMF conversion.
Usage: zonnx convert --quantize q4_0 model.onnx model-q4.zmf

Quantizing LayerNorm/RMSNorm weights and embeddings with Q4_0 causes
NaN in forward pass. Skip tensors with "norm", "embed" in name, bias
suffixes, and tensors < 1024 elements.
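That skip heuristic can be sketched as a predicate. The name substrings and the 1024-element threshold come from the commit message; the function itself is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldQuantize returns false for tensors that must stay float32:
// norm/embedding weights and bias tensors (Q4_0 on these produced NaNs
// in the forward pass) and tensors with fewer than 1024 elements.
func shouldQuantize(name string, numElements int) bool {
	lower := strings.ToLower(name)
	if strings.Contains(lower, "norm") || strings.Contains(lower, "embed") {
		return false
	}
	if strings.HasSuffix(lower, "bias") {
		return false
	}
	return numElements >= 1024
}

func main() {
	fmt.Println(shouldQuantize("layers.0.attn.q_proj.weight", 4096*4096))
	fmt.Println(shouldQuantize("model.norm.weight", 4096))
	fmt.Println(shouldQuantize("tok_embeddings.weight", 1<<20))
	fmt.Println(shouldQuantize("layers.0.mlp.up.bias", 4096))
}
```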
