feat: add ONNX operator builders and converter enhancements#5

Open
dndungu wants to merge 16 commits into zerfoo:main from dndungu:main

Conversation


@dndungu dndungu commented Mar 3, 2026

Summary

  • Add TENSOR attribute support, UINT8/INT8 data types, Constant and MatMulNBits converter support
  • Add Softmax, Sigmoid, Erf, LayerNorm, Slice, Pad, TopK layer builders
  • Promote Slice/Pad/TopK input tensors to ZMF attributes
  • Add Conv, GlobalAveragePool, BatchNormalization, Resize layer stubs
  • Add Resize special case handling and skip empty optional inputs
  • Add MoEGate and MixtureOfExperts layer stubs

Test plan

  • Verify existing tests pass
  • Verify new operator builders integrate with zerfoo layer registry

dndungu and others added 16 commits August 25, 2025 02:48
…NBits support

Critical additions for Gemma 3 and quantized model import:

1. convertAttribute: add AttributeProto_TENSOR case, converting the embedded
   ONNX TensorProto to a zmf.Attribute_Tensor.  Required for Constant nodes.

2. convertTensorWithPath: add UINT8 and INT8 dtypes.  Quantized model weights
   are stored as UINT8; missing this caused all 4-bit model imports to fail.

3. Initializer storage: extend to include UINT8, INT8, INT32, INT64 so that
   quantized weights and integer shape constants are preserved as ZMF parameters.

4. Constant node handling: detect "Constant" ONNX op in ONNXToZMFWithPath;
   store the embedded tensor as a ZMF parameter keyed by each output name and
   the node name; skip adding a graph node.  Downstream nodes reference the
   constant as a regular parameter input.

5. MatMulNBits handling: dequantize 4-bit quantized weights to float32 [K, N]
   at import time using block-wise scale (and optional zero-point) unpacking.
   Emit a standard MatMul ZMF node so no specialised runtime kernel is needed.
   Both symmetric (zp=8) and asymmetric (explicit zero-point tensor) modes
   are handled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ONNX Slice (opset 10+) encodes starts/ends/axes/steps as input tensors;
ONNX Pad (opset 11+) encodes pads and constant_value as inputs; ONNX TopK
encodes K as an input tensor. Add explicit convertNode cases that lift these
positional inputs into named ZMF node attributes so zerfoo runtime builders
receive them directly. Tests cover all three operators plus Softmax, Sigmoid,
Erf, and LayerNormalization attribute round-trips.
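The lifting described above can be sketched for Slice. The types here are simplified stand-ins for the ONNX/ZMF protobufs, and the function name is illustrative; the real converter reads the initializer tensors rather than a plain map:

```go
package main

import "fmt"

// liftSliceInputs sketches promoting opset-10+ Slice positional input
// tensors (starts/ends/axes/steps) into named attributes, so runtime
// builders receive them directly instead of resolving extra inputs.
func liftSliceInputs(inputs []string, initializers map[string][]int64) (data string, attrs map[string][]int64) {
	names := []string{"", "starts", "ends", "axes", "steps"} // position -> attribute name
	attrs = map[string][]int64{}
	for i, in := range inputs {
		if i == 0 {
			data = in // the data tensor stays a graph input
			continue
		}
		if in == "" {
			continue // absent optional input
		}
		if vals, ok := initializers[in]; ok && i < len(names) {
			attrs[names[i]] = vals
		}
	}
	return data, attrs
}

func main() {
	inits := map[string][]int64{"s": {0}, "e": {3}}
	data, attrs := liftSliceInputs([]string{"x", "s", "e"}, inits)
	fmt.Println(data, attrs["starts"], attrs["ends"])
}
```

Pad (pads, constant_value) and TopK (K) follow the same pattern with their own position-to-name tables.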

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ad, TopK builders

Register seven ONNX operator builders in the importer layer registry.
The actual attribute extraction for Slice/Pad/TopK is handled upstream in
the converter; these builders record the operators in the registry and serve
as extension points for future runtime construction.
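The registration pattern can be sketched like this. The registry type and builder signature are illustrative assumptions, not zerfoo's actual API; the point is that each operator self-registers via init() and acts as an extension point:

```go
package main

import "fmt"

// builderFunc is a stand-in for the importer's layer-builder signature.
type builderFunc func(attrs map[string]any) (any, error)

var registry = map[string]builderFunc{}

func register(op string, b builderFunc) { registry[op] = b }

func init() {
	// The seven operators registered by this change.
	for _, op := range []string{"Softmax", "Sigmoid", "Erf", "LayerNormalization", "Slice", "Pad", "TopK"} {
		op := op // capture per-iteration value for the closure
		register(op, func(attrs map[string]any) (any, error) {
			// Extension point: real construction lands with runtime
			// integration; for now the entry records the operator.
			return nil, fmt.Errorf("%s: builder not yet implemented", op)
		})
	}
}

func main() {
	_, ok := registry["TopK"]
	fmt.Println(len(registry), ok)
}
```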

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ts (T39.5)

- Resize: promote input[2] scales (FLOAT tensor) to "scales" FLOATS
  attribute and input[3] sizes (INT64 tensor) to "sizes" INTS attribute
- Generic input loop: skip empty-string inputs (ONNX optional absent inputs)
  to prevent empty strings appearing as graph node inputs
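The generic input-loop change can be sketched in a few lines; the function name is illustrative:

```go
package main

import "fmt"

// filterInputs sketches skipping absent optional inputs: ONNX encodes a
// missing optional input as an empty string, which previously leaked into
// the ZMF graph node's input list.
func filterInputs(inputs []string) []string {
	out := make([]string, 0, len(inputs))
	for _, in := range inputs {
		if in == "" {
			continue // optional input not provided
		}
		out = append(out, in)
	}
	return out
}

func main() {
	// Resize commonly omits input[1] (roi): ["X", "", "scales"].
	fmt.Println(filterInputs([]string{"X", "", "scales"}))
}
```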

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n, Resize stubs (T39.5)

Each stub registers its op name via init() and returns nil placeholder
until full zerfoo graph integration is implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Models like Llama 3 store weight data in separate files (e.g.,
model.onnx_data) using ONNX external data storage. The importer
was only reading raw_data from the protobuf, resulting in empty
tensor data for externally-stored weights.

Add loadExternalData to read from external files using the ONNX
external_data metadata (location, offset, length).

Gemma 3 ONNX uses ConstantOfShape with TENSOR attributes. The zmf
v0.3.0 protobuf schema adds Attribute_Tensor to handle this.

Gemma 3 ONNX uses ConstantOfShape with TENSOR attributes of type BOOL.
Map onnx.TensorProto_BOOL to zmf.Tensor_BOOL in convertTensorWithPath.

Previously, INT64/INT32 initializer inputs were promoted to node
attributes, losing their input position. The initializers are already
converted to ZMF parameters, so they will be resolved as parameterNodes
during graph construction. This fixes 727+ nodes in Gemma 3 that had
missing inputs (Equal, Concat, Where, Range, Sub, Max, etc.).

Resolve go.mod/go.sum conflict: keep zmf v0.3.0 (required for
Attribute_Tensor support added in 2bb6bf3).

Quantizes all FLOAT32 parameter tensors in a ZMF model in-place.
Supports q4_0 (~7x compression) and q8_0 (~3.6x compression).

Quantizes float32 weights during ONNX-to-ZMF conversion.
Usage: zonnx convert --quantize q4_0 model.onnx model-q4.zmf

Quantizing LayerNorm/RMSNorm weights and embeddings with Q4_0 causes
NaN in forward pass. Skip tensors with "norm", "embed" in name, bias
suffixes, and tensors < 1024 elements.
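That skip heuristic can be sketched as a predicate. The name substrings and the 1024-element threshold come from the commit message; the function itself is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldQuantize returns false for tensors that must stay float32:
// norm/embedding weights and bias tensors (Q4_0 on these produced NaNs
// in the forward pass) and tensors with fewer than 1024 elements.
func shouldQuantize(name string, numElements int) bool {
	lower := strings.ToLower(name)
	if strings.Contains(lower, "norm") || strings.Contains(lower, "embed") {
		return false
	}
	if strings.HasSuffix(lower, "bias") {
		return false
	}
	return numElements >= 1024
}

func main() {
	fmt.Println(shouldQuantize("layers.0.attn.q_proj.weight", 4096*4096))
	fmt.Println(shouldQuantize("model.norm.weight", 4096))
	fmt.Println(shouldQuantize("tok_embeddings.weight", 1<<20))
	fmt.Println(shouldQuantize("layers.0.mlp.up.bias", 4096))
}
```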
