Gemma 3 - Zerfoo Implementation

This repository contains a Gemma 3 language model implementation using the Zerfoo ML framework. The implementation has been updated to use the new Zerfoo architecture with ZMF (Zerfoo Model Format) support.

Features

Modern Zerfoo Architecture: Uses the latest Zerfoo framework with generic tensor types and graph-based computation
ZMF Model Loading: Supports loading models from ZMF format instead of legacy ONNX
Tokenizer Integration: Includes SentencePiece tokenizer support via github.com/sugarme/tokenizer
Comprehensive Testing: Unit tests for all components plus integration tests
CPU Engine Support: Optimized for CPU inference using Zerfoo's CPU engine

Architecture

Core Components

Model (gemma/gemma.go): Main model combining embedding, transformer stack, and LM head
GemmaStack (gemma/gemma_stack.go): Multi-layer transformer implementation with local attention
Tokenizer (tokenizer/tokenizer.go): SentencePiece tokenizer wrapper for text processing
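
When the embedding and the LM head share one weight matrix, the same rows are used to look tokens up on the way in and to score them on the way out. The following is a minimal, framework-free sketch of that idea. The `tiedEmbedding` type and its methods are hypothetical names chosen for illustration and are not part of the Zerfoo or gemma packages.

```go
package main

import "fmt"

// tiedEmbedding holds one vocab x hidden matrix that serves both as the
// input embedding table and as the LM head projection.
// Hypothetical type for illustration; not a Zerfoo API.
type tiedEmbedding struct {
	vocab, hidden int
	weights       []float32 // row-major: vocab rows of length hidden
}

// embed returns the embedding row for a token ID.
func (e *tiedEmbedding) embed(tokenID int) []float32 {
	return e.weights[tokenID*e.hidden : (tokenID+1)*e.hidden]
}

// logits projects a hidden state back to vocabulary scores using the
// same matrix: logits[v] = dot(weights[v], h).
func (e *tiedEmbedding) logits(h []float32) []float32 {
	out := make([]float32, e.vocab)
	for v := 0; v < e.vocab; v++ {
		row := e.embed(v)
		var sum float32
		for i, x := range h {
			sum += row[i] * x
		}
		out[v] = sum
	}
	return out
}

func main() {
	e := &tiedEmbedding{vocab: 3, hidden: 2, weights: []float32{
		1, 0, // token 0
		0, 1, // token 1
		1, 1, // token 2
	}}
	h := e.embed(2)          // hidden state taken straight from the table
	fmt.Println(e.logits(h)) // prints [1 1 2]
}
```

Because only one matrix is stored, the parameter count drops by vocab x hidden compared with keeping a separate output projection.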

Key Features

Generic Tensor Types: Uses TensorNumeric[T] for type-safe numeric operations
Local Attention: Implements sliding window attention with configurable window size
Weight Sharing: Shares weights between embedding and LM head layers
Flexible Architecture: Configurable number of layers, heads, dimensions, etc.
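
The local attention feature can be pictured as a causal mask that is further limited to a window of recent positions. The sketch below is a standalone illustration, not the code in gemma/gemma_stack.go. It assumes that every `globalInterval`-th layer uses full causal attention and the rest use the sliding window; the exact layer schedule in this repository is not stated here, and `isGlobalLayer` and `canAttend` are hypothetical helper names.

```go
package main

import "fmt"

// isGlobalLayer reports whether a layer uses full (global) attention.
// Assumption: every globalInterval-th layer is global; the rest are local.
func isGlobalLayer(layer, globalInterval int) bool {
	return globalInterval > 0 && (layer+1)%globalInterval == 0
}

// canAttend reports whether query position q may attend to key position k
// under causal masking, optionally restricted to a sliding window.
func canAttend(q, k, window int, global bool) bool {
	if k > q {
		return false // causal: never attend to future positions
	}
	if global {
		return true
	}
	return q-k < window // local: only the most recent `window` positions
}

func main() {
	const seqLen, window, globalInterval = 6, 3, 2 // illustrative values
	for layer := 0; layer < 2; layer++ {
		global := isGlobalLayer(layer, globalInterval)
		fmt.Printf("layer %d (global=%v)\n", layer, global)
		for q := 0; q < seqLen; q++ {
			for k := 0; k < seqLen; k++ {
				if canAttend(q, k, window, global) {
					fmt.Print("1 ")
				} else {
					fmt.Print(". ")
				}
			}
			fmt.Println()
		}
	}
}
```

Running it prints one mask per layer: the local layer shows a diagonal band of width 3, while the global layer shows the full lower triangle.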

Usage

Basic Inference

package main

import (
    "context"
    "github.com/zerfoo/gemma/tokenizer"
    "github.com/zerfoo/zerfoo/compute"
    "github.com/zerfoo/zerfoo/layers/registry"
    "github.com/zerfoo/zerfoo/model"
    "github.com/zerfoo/zerfoo/numeric"
)

func main() {
    // Initialize layer registry
    registry.RegisterAll()

    // Load ZMF model
    zmfModel, err := model.LoadZMF("data/model_with_weights.zmf")
    if err != nil {
        panic(err)
    }

    // Build Zerfoo graph
    ops := numeric.Float32Ops{}
    engine := compute.NewCPUEngine[float32](ops)
    graph, err := model.BuildFromZMF[float32](engine, ops, zmfModel)
    if err != nil {
        panic(err)
    }

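    // Initialize tokenizer (named tok so it does not shadow the tokenizer package)
    tok, err := tokenizer.NewGemmaTokenizer("data/tokenizer.json")
    if err != nil {
        panic(err)
    }

    // Tokenize input
    tokens, err := tok.Encode("What is the meaning of life?")
    if err != nil {
        panic(err)
    }
    _ = tokens // convert the token IDs into the graph's input tensor(s) before calling Forward

    // Run inference
    output, err := graph.Forward(context.Background(), /* inputs */)
    if err != nil {
        panic(err)
    }
    _ = output // output of the forward pass; decode it into text as needed
}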
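
Once a forward pass has produced logits, the simplest way to choose the next token is greedy decoding: take the highest-scoring vocabulary entry at the last position. The sketch below assumes the logits have already been copied out of the output tensor into a flat, row-major `[]float32` of shape [seqLen, vocabSize]. How to extract that slice from the Zerfoo output is not shown in this README, and `argmax` and `nextToken` are hypothetical helpers.

```go
package main

import "fmt"

// argmax returns the index of the largest value, or -1 for an empty slice.
func argmax(xs []float32) int {
	best := -1
	for i, x := range xs {
		if best == -1 || x > xs[best] {
			best = i
		}
	}
	return best
}

// nextToken picks the greedy next token from flattened logits of shape
// [seqLen, vocabSize] by reading the scores at the last position.
func nextToken(logits []float32, seqLen, vocabSize int) (int, error) {
	if seqLen <= 0 || vocabSize <= 0 || len(logits) != seqLen*vocabSize {
		return 0, fmt.Errorf("logits length %d does not match %d x %d",
			len(logits), seqLen, vocabSize)
	}
	last := logits[(seqLen-1)*vocabSize:]
	return argmax(last), nil
}

func main() {
	// Two positions, vocabulary of three tokens (made-up scores).
	logits := []float32{
		0.1, 0.7, 0.2, // position 0
		0.3, 0.1, 0.9, // position 1 (last)
	}
	id, err := nextToken(logits, 2, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // prints 2
}
```

The chosen ID would then be appended to the input tokens and the forward pass repeated until an end-of-sequence token or a length limit is reached.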

Model Creation (Programmatic)

import "github.com/zerfoo/gemma/gemma"

// Create a Gemma model programmatically
model, err := gemma.New[float32](
    engine,           // Compute engine
    ops,              // Numeric operations
    vocabSize,        // Vocabulary size
    hiddenSize,       // Hidden dimension
    numHeads,         // Number of attention heads
    numKeyValueHeads, // Number of key-value heads
    ffnDim,           // Feed-forward dimension
    epsilon,          // RMS norm epsilon
    base,             // Rotary embedding base
    maxSeqLen,        // Maximum sequence length
    numLayers,        // Number of layers
    localWindowSize,  // Local attention window size
    globalInterval,   // Global attention interval
)
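
The `numHeads` and `numKeyValueHeads` parameters suggest grouped-query attention, where several query heads share one key-value head. The sketch below shows the usual mapping under the assumption that `numHeads` must be a multiple of `numKeyValueHeads`. It is a standalone illustration, and `kvHeadFor` is a hypothetical helper, not a function from the gemma package.

```go
package main

import (
	"errors"
	"fmt"
)

// kvHeadFor maps a query head index to the key-value head it shares.
// Consecutive query heads are grouped: with 8 query heads and 2 KV heads,
// heads 0-3 use KV head 0 and heads 4-7 use KV head 1.
func kvHeadFor(queryHead, numHeads, numKeyValueHeads int) (int, error) {
	if numHeads <= 0 || numKeyValueHeads <= 0 {
		return 0, errors.New("head counts must be positive")
	}
	if numHeads%numKeyValueHeads != 0 {
		return 0, fmt.Errorf("numHeads (%d) must be a multiple of numKeyValueHeads (%d)",
			numHeads, numKeyValueHeads)
	}
	if queryHead < 0 || queryHead >= numHeads {
		return 0, fmt.Errorf("query head %d out of range [0, %d)", queryHead, numHeads)
	}
	groupSize := numHeads / numKeyValueHeads
	return queryHead / groupSize, nil
}

func main() {
	const numHeads, numKeyValueHeads = 8, 2 // illustrative values only
	for q := 0; q < numHeads; q++ {
		kv, err := kvHeadFor(q, numHeads, numKeyValueHeads)
		if err != nil {
			panic(err)
		}
		fmt.Printf("query head %d -> kv head %d\n", q, kv)
	}
}
```

A check like this before constructing the model turns an incompatible head configuration into an early, readable error.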

Testing

The project includes comprehensive tests:

Unit Tests

# Test model components
go test ./gemma/...

# Test tokenizer
go test ./tokenizer/...

Integration Tests

# API integration (fast)
go test -run TestAPIIntegration

# Model architecture compatibility
go test -run TestModelArchitectureCompatibility

# End-to-end test (requires model files)
go test -run TestGemma3EndToEnd

Test Coverage

Gemma Stack: Tests forward pass with mock data
Gemma Model: Tests full model forward pass
Tokenizer API: Tests special token handling and encoding
API Integration: Tests component compatibility and tensor operations
Architecture Tests: Validates model creation with various parameters

Model Files

The implementation expects these files in the data/ directory:

model_with_weights.zmf: ZMF model file with full weights
tokenizer.json: Tokenizer configuration for Gemma's SentencePiece vocabulary, in the tokenizer.json format read by github.com/sugarme/tokenizer
Additional config files (optional)

Dependencies

require (
    github.com/sugarme/tokenizer v0.2.2
    github.com/zerfoo/zerfoo v0.2.0
)

Performance

CPU Optimized: Uses Zerfoo's optimized CPU engine
Memory Efficient: Supports various quantization formats
Scalable: Configurable model size and attention patterns

Known Issues

Tokenizer Compatibility: Some Gemma tokenizer.json files may not be fully compatible with the sugarme/tokenizer library
Large Model Loading: Full-size models may require significant memory and loading time
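Test Timeouts: Integration tests with large models may time out. go test stops a test binary after 10 minutes by default; pass -timeout (for example, -timeout 30m) to raise the limit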
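
To get a rough sense of the memory a model's weights need, multiply the parameter count by the bytes stored per parameter. The sketch below does only that arithmetic. The parameter count is a made-up round number rather than the size of any particular Gemma 3 checkpoint, and the figure ignores activations, caches, and loader overhead.

```go
package main

import "fmt"

// weightGiB estimates the memory needed for model weights alone, in GiB.
func weightGiB(numParams int64, bytesPerParam float64) float64 {
	return float64(numParams) * bytesPerParam / (1 << 30)
}

func main() {
	const params = 1_000_000_000 // hypothetical parameter count
	fmt.Printf("float32: %.2f GiB\n", weightGiB(params, 4)) // 3.73 GiB
	fmt.Printf("16-bit:  %.2f GiB\n", weightGiB(params, 2)) // 1.86 GiB
	fmt.Printf("8-bit:   %.2f GiB\n", weightGiB(params, 1)) // 0.93 GiB
}
```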

Contributing

Follow the existing code style
Add tests for new features
Update documentation
Ensure all tests pass before submitting PRs

License

See LICENSE file for details.
