feat: implement remote gguf inspection on hugging face#39
Conversation
|
Warning Review limit reached
More reviews will be available in 13 minutes and 41 seconds. Learn how PR review limits work. To continue reviewing without waiting, enable usage-based billing in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (7)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 72 |
| Duplication | 8 |
AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Pull Request Overview
This PR successfully implements remote GGUF inspection for Hugging Face repositories using HTTP Range requests, which is a significant functional addition. However, the PR is currently not up to standards according to Codacy analysis.
There are two primary areas of concern: high complexity and potential stability issues. Specifically, fetch_huggingface_repo has seen a massive complexity increase (+37) that makes it harder to maintain. Furthermore, the RemoteFileStream implementation lacks buffer management, which could lead to high memory usage when encountering malformed headers. There are also implementation gaps in the UI where user-provided GPU utilization parameters are ignored in the comparison table logic. While error handling for gated models and missing repositories was implemented, it lacks automated test verification.
About this PR
- The PR adds handling for 401 Unauthorized (gated models) and 404 Not Found scenarios, but these paths are not covered by the current test suite. Please add unit tests to verify the behavior of the remote parser when these HTTP errors occur.
1 comment outside of the diff
src/modelinfo/cli.py
line 133🟡 MEDIUM RISK
This function is too complex (16). It handles path resolution, remote repo detection, and multiple file format logic in one block. Refactoring this into a dispatcher pattern would improve maintainability.
Test suggestions
- Remote GGUF single file header parsing via range-based streaming
- Remote GGUF multi-file (group) repository detection and metadata aggregation
- Targeting a specific quantization file within a remote Hugging Face repository
- CLI UI rendering of the GGUF quantization comparison table with VRAM estimates
- Handling of 401 Unauthorized (Gated models) and 404 Not Found (Missing repos) for GGUF paths
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Handling of 401 Unauthorized (Gated models) and 404 Not Found (Missing repos) for GGUF paths
Low confidence findings
- The UI uses a hardcoded 600MB overhead fallback for GGUF groups (line 82). This fixed value may lead to inaccurate 'Fits' calculations for models that deviate significantly from standard overhead patterns.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
| row_data = [filename, file_size_str, kv_cache_str, total_vram_str] | ||
| if show_fits: | ||
| utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0 | ||
| if utilization <= 0.90: |
There was a problem hiding this comment.
🟡 MEDIUM RISK
The 'Fits' column logic should use the 'gpu_util' parameter instead of a hardcoded 0.90 threshold to respect the user's CLI configuration.
| pass | ||
|
|
||
|
|
||
| def fetch_huggingface_repo(repo_id: str, fetch_tensors: bool = False, timeout: float = 10.0) -> Tuple[Dict[str, Any], Dict[str, Any] | None, str, float]: |
There was a problem hiding this comment.
🟡 MEDIUM RISK
Suggestion: The function fetch_huggingface_repo has grown to ~150 lines and contains several branching paths for different repository types. Refactor this by splitting the logic into smaller private helper methods (e.g., _handle_gguf_repo, _handle_safetensors_repo) for each model format detection path.
| self.url = url | ||
| self.chunk_size = chunk_size | ||
| self.timeout = timeout | ||
| self.buffer = b"" |
There was a problem hiding this comment.
🟡 MEDIUM RISK
Suggestion: The buffer in RemoteFileStream accumulates all read data and is never cleared. Since this stream is primarily used for header parsing, consider implementing a sliding window or limiting the maximum buffer size to prevent high memory usage on malformed files.
| assert len(info["tensors"]["__metadata__"]["gguf_variants"]) == 2 | ||
|
|
||
|
|
||
| def test_print_model_info_gguf_group(capsys): |
There was a problem hiding this comment.
⚪ LOW RISK
This test method exceeds the length limit (63 lines). Move the large tensors and footprint dictionary definitions into reusable pytest fixtures or a local helper function.
| except Exception: | ||
| raise |
There was a problem hiding this comment.
⚪ LOW RISK
Nitpick: This exception handler is redundant and can be removed as it only re-raises the exception.
|
|
||
| url = f"{_get_hf_endpoint()}/{real_repo_id}/resolve/main/{target_filename}" | ||
| stream = RemoteFileStream(url, timeout=timeout) | ||
| from modelinfo.parsers.gguf import parse_gguf_header |
There was a problem hiding this comment.
⚪ LOW RISK
Nitpick: The import of parse_gguf_header is duplicated multiple times. Move it to the top of the fetch_huggingface_repo function to keep the logic clean.
…er, add error tests
Summary
This PR implements remote GGUF inspection for Hugging Face repositories, enabling the CLI to parse and calculate VRAM footprints for remote GGUF weights without downloading the full files.
Motivation & Context
Prior to this change, remote inspection on Hugging Face only supported SafeTensors format. Attempting to parse GGUF repositories resulted in a crash. This change introduces support for single and multi-quantization GGUF repositories, showing a comparison table of available files when no specific quantization is targeted.
Type of Change
How Has This Been Tested?
The changes were tested using pytest:
Unit tests added to
tests/test_parsers.pyto cover remote GGUF single, group, and targeted parsing with mocked HTTP responses.Unit tests added to
tests/test_cli.pyto cover CLI routing andprint_model_infogroup rendering.Manual testing against live HF GGUF repository (
bartowski/Meta-Llama-3-8B-Instruct-GGUF).Unit tests
Integration tests
Manual testing
Screenshots (if appropriate)
N/A
Checklist