-
Notifications
You must be signed in to change notification settings - Fork 79
Unify Model Builder with Strategy Pattern #548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…omPaths functions
…rmat package for GGUF and Safetensors
…th FromPath and FromPaths
…ge for GGUF and Safetensors
…ntralized files package
…onsolidating weight file types and improving error messaging
…filename handling and error logging
…essary Hugging Face handling
…for GGUF quantization
…ase conversion and simplifying tag handling
Summary of ChangesHello @ilopezluna, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the model building and handling infrastructure. It introduces a unified API for creating models from local files, abstracting away format-specific details like GGUF and Safetensors. New packages for file classification and format-specific strategies enhance modularity and extensibility. Additionally, the changes refine the HuggingFace model pulling mechanism to improve GGUF quantization selection and simplify model name normalization, leading to a more robust and maintainable system. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and well-executed refactoring to unify the model builder logic. The introduction of the format and files packages provides a much cleaner and more extensible architecture for handling different model formats. The simplification of the HuggingFace pull logic and the improved model name normalization are also major improvements in terms of robustness and maintainability. Overall, this is a high-quality change that greatly improves the codebase. I have one suggestion regarding the re-implementation of standard library functions.
…porting full URL formats
…functions with standard library methods
| } | ||
|
|
||
| return nameWithOrg + ":" + tag | ||
| // Lowercase ONLY the name part (registry/org/repo). Tag stays unchanged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because HF repositories do not follow the OCI naming convention, and an HF repository name may contain uppercase letters. To simplify handling, we always convert the repository name to lowercase (but not the tag, since OCI allows tags to be either lowercase or uppercase).
…rences and streamline model name normalization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 9 issues, and left some high level feedback:
- The
isHuggingFaceReference/parseHFReferencehandling is slightly inconsistent:parseHFReferencesupportshttps://huggingface.co/...butisHuggingFaceReferencedoes not, so a full HTTPS URL tohuggingface.cowould go through the OCI path instead of native HF pulling; consider addinghttps://huggingface.co/toisHuggingFaceReferencefor symmetry. - The new
SafetensorsFormatre-implements safetensors header parsing and metadata extraction logic that already exists ininternal/safetensors; consider reusing the existingHeader/parsing code or removing the old version to avoid duplicated, diverging implementations.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `isHuggingFaceReference`/`parseHFReference` handling is slightly inconsistent: `parseHFReference` supports `https://huggingface.co/...` but `isHuggingFaceReference` does not, so a full HTTPS URL to `huggingface.co` would go through the OCI path instead of native HF pulling; consider adding `https://huggingface.co/` to `isHuggingFaceReference` for symmetry.
- The new `SafetensorsFormat` re-implements safetensors header parsing and metadata extraction logic that already exists in `internal/safetensors`; consider reusing the existing `Header`/parsing code or removing the old version to avoid duplicated, diverging implementations.
## Individual Comments
### Comment 1
<location> `pkg/distribution/huggingface/model.go:32-33` </location>
<code_context>
- // Step 2: Filter to model files (safetensors + configs)
- safetensorsFiles, configFiles := FilterModelFiles(files)
+ // Filter to model files (weights + configs)
+ weightFiles, configFiles := FilterModelFiles(files)
- if len(safetensorsFiles) == 0 {
</code_context>
<issue_to_address>
**issue:** Mixed GGUF+safetensors repos can break because GGUF selection and builder expect homogeneous formats
`FilterModelFiles` can now return a mixed set of GGUF and safetensors weights, but the downstream flow still assumes a single format:
- `isGGUFModel(weightFiles)` and `SelectGGUFFiles` assume GGUF-only input
- `builder.FromPaths` enforces a single format and will fail on mixed GGUF+safetensors
For repos that publish both formats, this can cause incorrect GGUF selection and/or a "mixed formats" error. Consider separating GGUF and safetensors earlier (either by splitting `weightFiles` into format-specific slices before selection, or by having `FilterModelFiles` return `ggufWeights` and `safetensorsWeights` explicitly) and then only passing format-homogeneous paths into the builder.
</issue_to_address>
### Comment 2
<location> `pkg/distribution/huggingface/model.go:41-43` </location>
<code_context>
+
+ // For GGUF repos with multiple quantizations, select the appropriate files
+ var mmprojFile *RepoFile
+ if isGGUFModel(weightFiles) && len(weightFiles) > 1 {
+ // Use the tag as quantization hint (e.g., "Q4_K_M", "Q8_0", or "latest")
+ weightFiles, mmprojFile = SelectGGUFFiles(weightFiles, tag)
+ if len(weightFiles) == 0 {
+ return nil, fmt.Errorf("no GGUF files found matching quantization %q in repository %s", tag, repo)
</code_context>
<issue_to_address>
**issue (bug_risk):** SelectGGUFFiles assumes GGUF-only input but is called with potentially mixed weight types
Since `SelectGGUFFiles` is GGUF-specific (it assumes GGUF and only distinguishes `mmproj` vs model files), calling it directly with `weightFiles` that may include safetensors can misbehave:
- Safetensor filenames can participate in quantization matching (`containsQuantization`) and be incorrectly chosen as the GGUF variant.
- The function’s contract becomes ambiguous: its name suggests GGUF-only handling, but callers might assume it supports any `RepoFile` slice.
Either filter `weightFiles` to GGUF entries before calling `SelectGGUFFiles`, or have `SelectGGUFFiles` itself discard non-GGUF files up front so it’s safe on mixed inputs.
</issue_to_address>
### Comment 3
<location> `pkg/distribution/distribution/client.go:677` </location>
<code_context>
// isHuggingFaceReference checks if a reference is a HuggingFace model reference
func isHuggingFaceReference(reference string) bool {
- return strings.HasPrefix(reference, "huggingface.co/")
-}
</code_context>
<issue_to_address>
**issue (bug_risk):** Hugging Face URL detection misses "https://huggingface.co/..." references
Currently this only matches bare `huggingface.co/` and misses full URLs like `https://huggingface.co/...`, which will incorrectly fall back to OCI pulls instead of the HF-native path.
To align with `parseHFReference`, consider normalizing the reference (e.g., strip `https://`) and then checking for `huggingface.co/`, or explicitly handle `https://huggingface.co/` in this helper.
</issue_to_address>
### Comment 4
<location> `pkg/distribution/huggingface/repository.go:243-248` </location>
<code_context>
-// isSafetensorsModel checks if the files contain at least one safetensors file
-func isSafetensorsModel(files []RepoFile) bool {
+// isShardedFile checks if a filename follows the sharded pattern
+// e.g., "model-00001-of-00003.gguf"
+func isShardedFile(filename string) bool {
+ // Simple check: contains "-00001-of-" or similar pattern
+ return strings.Contains(filename, "-of-") && strings.Contains(filename, "-0")
+}
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Shard detection is very loose and may produce false positives for non-shard filenames
The current heuristic:
```go
return strings.Contains(filename, "-of-") && strings.Contains(filename, "-0")
```
marks any filename with "-of-" and "-0" as sharded, even if it doesn’t match the intended `model-00001-of-00003.gguf` pattern. That can make `selectFirstGGUF`/`findAllShards` group unrelated files.
Since `indexOfShardPattern` already precisely detects the `-00001-of-00003` pattern, consider reusing it here:
```go
func isShardedFile(filename string) bool {
return indexOfShardPattern(filename) >= 0
}
```
so shard detection stays consistent with `extractShardPrefix` and avoids false positives.
```suggestion
// isShardedFile checks if a filename follows the sharded pattern
// e.g., "model-00001-of-00003.gguf"
func isShardedFile(filename string) bool {
// Delegate to indexOfShardPattern so shard detection is precise and consistent
return indexOfShardPattern(filename) >= 0
}
```
</issue_to_address>
### Comment 5
<location> `pkg/distribution/distribution/normalize_test.go:346-347` </location>
<code_context>
{"not huggingface", "registry.example.com/model:latest", false},
{"docker hub", "ai/gemma3:latest", false},
- {"hf.co prefix (not normalized)", "hf.co/org/model", false}, // This is the un-normalized form
+ {"hf.co prefix (short form)", "hf.co/org/model", true}, // Short form is also recognized
+ {"hf.co with quantization", "hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M", true},
{"empty", "", false},
}
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for https-based Hugging Face references in isHuggingFaceReference
Since isHuggingFaceReference now supports full https URLs (e.g., "https://hf.co/..." and "https://huggingface.co/...."), please add corresponding cases to TestIsHuggingFaceReference (with and without tags) so these new URL forms are covered and guarded against regressions.
</issue_to_address>
### Comment 6
<location> `pkg/distribution/distribution/normalize_test.go:67` </location>
<code_context>
{
- name: "name with uppercase (not huggingface)",
- input: "MyModel",
- expected: "ai/MyModel:latest",
+ name: "name with uppercase (not huggingface)",
+ input: "MyModel",
+ expected: "ai/mymodel:latest",
},
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding normalizeModelName tests that assert tag case is preserved while name is lowercased
The current test verifies that the model name is lowercased, but doesn’t assert that tags retain their original casing. Please add tests with mixed‑case tags (e.g., "ai/model:Q4_K_M") to ensure tag case is preserved and to prevent future regressions in tag handling.
</issue_to_address>
### Comment 7
<location> `pkg/distribution/huggingface/repository_test.go:7-11` </location>
<code_context>
- }
-}
-
func TestFilterModelFiles(t *testing.T) {
- files := []RepoFile{
+ repoFiles := []RepoFile{
</code_context>
<issue_to_address>
**suggestion (testing):** Extend FilterModelFiles tests to cover GGUF, license, chat template, and non-file entries
Since FilterModelFiles now relies on files.Classify and supports GGUF, config, chat template, license, and unknown types while skipping non-"file" entries, the test should cover these cases. Please add subtests to verify:
- GGUF files are included in the weights slice
- Jinja/chat template files are included in the configs slice
- License/unknown files are skipped
- Entries with Type != "file" are ignored
Suggested implementation:
```golang
func TestFilterModelFiles(t *testing.T) {
repoFiles := []RepoFile{
{Type: "file", Path: "model.safetensors", Size: 1000},
{Type: "file", Path: "config.json", Size: 100},
{Type: "file", Path: "tokenizer.json", Size: 200},
{Type: "file", Path: "model-00002-of-00002.safetensors", Size: 2000},
// GGUF weight file should be treated as a model weight
{Type: "file", Path: "model.gguf", Size: 3000},
// Chat template / jinja file should be treated as a config
{Type: "file", Path: "chat_template.jinja", Size: 150},
// License / unknown file should be ignored
{Type: "file", Path: "LICENSE", Size: 50},
// Non-file entry should be ignored entirely
{Type: "directory", Path: "subdir/model.safetensors", Size: 500},
}
weights, configs := FilterModelFiles(repoFiles)
containsPath := func(files []RepoFile, path string) bool {
for _, f := range files {
if f.Path == path {
return true
}
}
return false
}
t.Run("includes GGUF and safetensors as weights", func(t *testing.T) {
if got, want := len(weights), 4; got != want {
t.Fatalf("expected %d weight files, got %d", want, got)
}
if !containsPath(weights, "model.safetensors") {
t.Fatalf("expected model.safetensors to be included in weights")
}
if !containsPath(weights, "model-00002-of-00002.safetensors") {
t.Fatalf("expected model-00002-of-00002.safetensors to be included in weights")
}
if !containsPath(weights, "model.gguf") {
t.Fatalf("expected model.gguf to be included in weights")
}
// ensure non-file entry isn't treated as weight
if containsPath(weights, "subdir/model.safetensors") {
t.Fatalf("did not expect non-file entry subdir/model.safetensors in weights")
}
})
t.Run("includes config, tokenizer, and chat template as configs", func(t *testing.T) {
if got, want := len(configs), 3; got != want {
t.Fatalf("expected %d config files, got %d", want, got)
}
if !containsPath(configs, "config.json") {
t.Fatalf("expected config.json to be included in configs")
}
if !containsPath(configs, "tokenizer.json") {
t.Fatalf("expected tokenizer.json to be included in configs")
}
if !containsPath(configs, "chat_template.jinja") {
t.Fatalf("expected chat_template.jinja to be included in configs")
}
})
t.Run("skips license and unknown files", func(t *testing.T) {
if containsPath(weights, "LICENSE") {
t.Fatalf("expected LICENSE to be skipped from weights")
}
if containsPath(configs, "LICENSE") {
t.Fatalf("expected LICENSE to be skipped from configs")
}
})
t.Run("ignores non-file entries", func(t *testing.T) {
if containsPath(weights, "subdir/model.safetensors") {
t.Fatalf("expected non-file subdir/model.safetensors to be ignored in weights")
}
if containsPath(configs, "subdir/model.safetensors") {
t.Fatalf("expected non-file subdir/model.safetensors to be ignored in configs")
}
})
```
This change assumes:
- `FilterModelFiles` now returns `([]RepoFile, []RepoFile)` where the first slice contains *all* weight files (including GGUF) and the second slice contains configs (including tokenizer and chat template).
- The previous `len(safetensors) != 3` assertion has been fully replaced by the new subtests; if there were additional assertions after that block, they may need to be reintroduced inside appropriate subtests.
If the actual behavior of `FilterModelFiles` differs (for example, tokenizer files are not treated as configs), adjust the expected counts and `containsPath` checks in the subtests accordingly.
</issue_to_address>
### Comment 8
<location> `pkg/distribution/huggingface/repository_gguf_test.go:7-16` </location>
<code_context>
func TestSelectGGUFFiles(t *testing.T) {
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for shard-selection helpers (selectFirstGGUF/findAllShards/indexOfShardPattern)
The new GGUF selection logic adds helpers like selectFirstGGUF, isShardedFile, findAllShards, extractShardPrefix, and indexOfShardPattern, but TestSelectGGUFFiles doesn’t exercise sharded filenames end-to-end. Please add a test with multiple sharded GGUF files (e.g. model-Q4_K_M-00001-of-00003.gguf … -00003-of-00003.gguf) to verify that all shards for the chosen quantization are selected and that selection is stable regardless of input order, so the shard-detection helpers are covered.
Suggested implementation:
```golang
func TestSelectGGUFFiles_Sharded(t *testing.T) {
// Ensure that sharded GGUF files are selected as a complete, ordered set
// for the requested quantization, regardless of input order.
files := []RepoFile{
// Intentionally out of order to verify stable shard ordering in selection.
{Filename: "model-Q4_K_M-00003-of-00003.gguf"},
{Filename: "model-Q4_K_M-00001-of-00003.gguf"},
{Filename: "model-Q4_K_M-00002-of-00003.gguf"},
// Different quantization shards should not be selected when Q4_K_M is requested.
{Filename: "model-Q5_K_M-00001-of-00003.gguf"},
{Filename: "model-Q5_K_M-00002-of-00003.gguf"},
{Filename: "model-Q5_K_M-00003-of-00003.gguf"},
// Non-GGUF files should be ignored.
{Filename: "README.md"},
{Filename: "config.json"},
}
requestedQuant := "Q4_K_M"
selected, mmproj := selectGGUFFiles(files, requestedQuant)
if mmproj != nil {
t.Fatalf("expected no mmproj file for sharded-only selection, got %v", mmproj)
}
var selectedNames []string
for _, f := range selected {
// Adjust field access if RepoFile uses a different name (e.g. Path or Name).
selectedNames = append(selectedNames, f.Filename)
}
expectedNames := []string{
"model-Q4_K_M-00001-of-00003.gguf",
"model-Q4_K_M-00002-of-00003.gguf",
"model-Q4_K_M-00003-of-00003.gguf",
}
if !reflect.DeepEqual(selectedNames, expectedNames) {
t.Fatalf("expected selected GGUF shards %v, got %v", expectedNames, selectedNames)
}
}
func TestSelectGGUFFiles(t *testing.T) {
```
1. Ensure `reflect` is imported at the top of the file:
- Add `import "reflect"` or extend the existing import block to include `"reflect"`.
2. Adjust the `RepoFile` field access if it uses a different property name than `Filename` (for example, `Path`, `Name`, or `FileName`).
3. If `selectGGUFFiles` has a different signature than `func selectGGUFFiles(files []RepoFile, requestedQuant string) ([]RepoFile, *RepoFile)`, update the call and assertions to match the actual return types.
4. If the repository uses helper assertion functions (e.g. `require`/`assert` from `testify`), you may want to replace the `t.Fatalf` calls with those helpers to match existing test style.
</issue_to_address>
### Comment 9
<location> `pkg/distribution/format/format_test.go:81-18` </location>
<code_context>
func TestDetectFromPaths(t *testing.T) {
</code_context>
<issue_to_address>
**suggestion (testing):** Add targeted tests for DiscoverShards behavior (Safetensors and GGUF) beyond format detection
These tests cover detection helpers well, but they don’t exercise the format-specific `DiscoverShards` logic. Please add targeted tests that:
- For Safetensors: use a partial shard set (e.g., only `model-00001-of-00002.safetensors`) and assert that `DiscoverShards` errors on missing shards, plus a complete set case to verify the returned shards are sorted.
- For GGUF: add a smoke test that `DiscoverShards` on a non-sharded `.gguf` file returns a single-element slice.
You can use empty temp files; no real model contents are needed.
Suggested implementation:
```golang
}
}
func TestDiscoverShardsSafetensors(t *testing.T) {
t.Run("partial shard set errors", func(t *testing.T) {
dir := t.TempDir()
// Only create the first shard of two
firstShard := filepath.Join(dir, "model-00001-of-00002.safetensors")
if err := os.WriteFile(firstShard, []byte{}, 0o644); err != nil {
t.Fatalf("failed to create shard file: %v", err)
}
shards, err := DiscoverShards(context.Background(), []string{firstShard})
if err == nil {
t.Fatalf("expected error for missing shards, got none (shards: %#v)", shards)
}
})
t.Run("complete shard set is sorted", func(t *testing.T) {
dir := t.TempDir()
// Create shards in unsorted order
firstShard := filepath.Join(dir, "model-00001-of-00002.safetensors")
secondShard := filepath.Join(dir, "model-00002-of-00002.safetensors")
if err := os.WriteFile(secondShard, []byte{}, 0o644); err != nil {
t.Fatalf("failed to create second shard file: %v", err)
}
if err := os.WriteFile(firstShard, []byte{}, 0o644); err != nil {
t.Fatalf("failed to create first shard file: %v", err)
}
shards, err := DiscoverShards(context.Background(), []string{firstShard, secondShard})
if err != nil {
t.Fatalf("unexpected error discovering shards: %v", err)
}
if len(shards) != 2 {
t.Fatalf("expected 2 shards, got %d", len(shards))
}
// Ensure the returned shard paths are sorted by shard index
expected := []string{firstShard, secondShard}
if !reflect.DeepEqual(shards, expected) {
t.Fatalf("expected sorted shards %v, got %v", expected, shards)
}
})
}
func TestDiscoverShardsGGUF(t *testing.T) {
dir := t.TempDir()
ggufPath := filepath.Join(dir, "model.gguf")
if err := os.WriteFile(ggufPath, []byte{}, 0o644); err != nil {
t.Fatalf("failed to create gguf file: %v", err)
}
shards, err := DiscoverShards(context.Background(), []string{ggufPath})
if err != nil {
t.Fatalf("unexpected error discovering shards: %v", err)
}
if len(shards) != 1 {
t.Fatalf("expected 1 shard for non-sharded gguf, got %d", len(shards))
}
if shards[0] != ggufPath {
t.Fatalf("expected shard path %q, got %q", ggufPath, shards[0])
}
}
func TestDetectFromPaths(t *testing.T) {
```
To make these tests compile and match your actual APIs, you will likely need to:
1. Ensure `DiscoverShards` has a compatible signature, or adjust the calls:
- These tests currently assume: `func DiscoverShards(ctx context.Context, paths []string) ([]string, error)`.
- If your actual function returns a different shard type (e.g., `[]Shard`), update the tests to compare on the appropriate field (e.g., `shards[i].Path`).
- If `DiscoverShards` is a method on a format type (e.g., `SafetensorsFormat.DiscoverShards`), first construct/select the appropriate format instance and then call the method.
2. Add the necessary imports at the top of `format_test.go`:
- `context`
- `os`
- `path/filepath`
- `reflect`
For example:
```go
import (
"context"
"os"
"path/filepath"
"reflect"
"testing"
"github.com/your/module/pkg/distribution/types"
)
```
(Keep the existing imports and adjust module path as appropriate.)
3. If your shard discovery for safetensors expects to infer missing shards from a directory rather than the explicit `paths` slice:
- Change the tests to call `DiscoverShards` with just the directory (or whatever input it expects).
- In that case, only the first shard path may be needed as a starting point; adapt the test setup accordingly while preserving the assertions:
- Partial case: only one `model-00001-of-00002.safetensors` file in the directory must cause an error.
- Complete case: both shards exist, and the returned slice respects index order.
4. If GGUF discovery uses format detection internally, you might want to keep the test’s `paths` argument as-is but verify that the function is indeed selecting GGUF-specific logic (e.g., by asserting it does not attempt to expand into multiple shards).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
…ving full URL support
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
|
Normally not a fan of design patterns. I have seen forced design pattern too many times over-complicate and over-engineer a codebase, but lets see where this goes. |
Actually, using the Strategy pattern was not my initial idea. I was initially looking to simplify the process, remove duplicated code, and use Hugging Face’s native pulling for GGUF. However, I then realized that the refactor would be easier if we used common interfaces for both GGUF and Safetensors, which naturally led to the Strategy pattern. I think the result is much better in terms of complexity and readability, but I’m happy to discuss alternatives as well. |
Refactors model format handling by introducing a unified format package using the Strategy Pattern, consolidating GGUF and Safetensors processing under a single interface.
It also updates the pull GGUF models from Hugging Face to use the native pulling so it simplifies the process of pulling models from OCI registries:
You can try it out by running: