
mlx-community/Qwen2-VL-2B-Instruct-4bit runs out of memory on larger images #602

@jrp2014

Description


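The following compares the results of asking mlx-community/Qwen2-VL-2B-Instruct-4bit to describe two images. It does a good enough job on the 2.1 MP image (1080x1920), but fails on the 46.7 MP one (8640x5400): generation aborts when Metal is asked for a single 135,383,101,952-byte buffer, larger than the 86,586,540,032-byte maximum buffer size. The failure is the same if I reuse the prompt from the 2.1 MP run. Other models process the same 46.7 MP image without running out of memory.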

 % python check_models.py -m mlx-community/Qwen2-VL-2B-Instruct-4bit -i ~/Downloads/National_monument_Edinburgh.jpg 
2025-11-25 20:57:26,844 - 📝 Full environment dump written to:
2025-11-25 20:57:26,844 - /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:57:26,844 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,844 - ⚠️  SECURITY WARNING: --trust-remote-code is enabled.
2025-11-25 20:57:26,844 - ⚠️  This allows execution of remote code and may pose security risks.
2025-11-25 20:57:26,844 - ====================================================================================================
2025-11-25 20:57:26,844 -                                   MLX Vision Language Model Check                                   
2025-11-25 20:57:26,844 - ====================================================================================================
2025-11-25 20:57:26,844 - Image File:      /Users/jrp/Downloads/National_monument_Edinburgh.jpg
2025-11-25 20:57:26,857 - Image dimensions: 1080x1920 (2.1 MPixels)
2025-11-25 20:57:26,857 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,857 - ▶ [ IMAGE METADATA ]
2025-11-25 20:57:26,860 - Date: 2010-09-30 12:58:31 BST
2025-11-25 20:57:26,860 - Description: 
2025-11-25 20:57:26,860 - GPS Location: 
2025-11-25 20:57:26,860 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,860 - ▶ [ PROMPT CONFIGURATION ]
2025-11-25 20:57:26,860 - Generating default prompt based on image metadata.
2025-11-25 20:57:26,860 - Final prompt: Provide a factual caption, description, and keywords suitable for cataloguing, or searching for, the image. 

The photo was taken around 2010-09-30 12:58:31 BST . Focus on visual content, drawing on a...
2025-11-25 20:57:26,860 - Processing specified models: mlx-community/Qwen2-VL-2B-Instruct-4bit
2025-11-25 20:57:26,860 - Processing 1 model(s)...
2025-11-25 20:57:26,860 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,860 - 
2025-11-25 20:57:26,860 - Processing Model [1/1]: mlx-community/Qwen2-VL-2B-Instruct-4bit
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 28799.84it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
2025-11-25 20:57:32,419 - [RUN 1/1] SUMMARY model=mlx-community/Qwen2-VL-2B-Instruct-4bit status=OK
2025-11-25 20:57:32,419 - The image depicts the National Monument of Scotland, a historic monument located on Calton Hill in
2025-11-25 20:57:32,419 - Edinburgh, Scotland. The monument was constructed in the late 19th century and features a series of
2025-11-25 20:57:32,419 - columns arranged in a symmetrical pattern. The photograph was taken in 2010, and the surrounding
2025-11-25 20:57:32,419 - area is visible, including a cityscape and a forested area.
2025-11-25 20:57:32,419 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,419 - ▶ [ PERFORMANCE SUMMARY ]
2025-11-25 20:57:32,420 - Model                                      Token      Prompt    Generation     Total    Prompt    Gen    Peak    Generation    Load    Total  Quality    Output
2025-11-25 20:57:32,420 - Name                                                (tokens)      (tokens)    Tokens       Tps    TPS    (GB)           (s)     (s)      (s)  Issues
2025-11-25 20:57:32,420 - mlx-community/Qwen2-VL-2B-Instruct-4bit  151,645       2,779            76     2,855       670    225     5.9         4.53s   0.86s    5.39s             The image depicts the National Monument of
2025-11-25 20:57:32,420 -                                                                                                                                                          Scotland, a historic monument located on Calton
2025-11-25 20:57:32,420 -                                                                                                                                                          Hill in Edinburgh, Scotland. The monument was
2025-11-25 20:57:32,420 -                                                                                                                                                          constructed in the late 19th century and features
2025-11-25 20:57:32,420 -                                                                                                                                                          a series of columns arranged in a symmetrical
2025-11-25 20:57:32,421 -                                                                                                                                                          pattern. The photograph was taken in 2010, and the
2025-11-25 20:57:32,421 -                                                                                                                                                          surrounding area is visible, including a cityscape
2025-11-25 20:57:32,421 -                                                                                                                                                          and a forested area.
2025-11-25 20:57:32,421 - 
2025-11-25 20:57:32,421 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,421 - Results Summary
2025-11-25 20:57:32,421 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,421 - 🏆 Performance Highlights:
2025-11-25 20:57:32,421 -    Fastest: mlx-community/Qwen2-VL-2B-Instruct-4bit (225.1 tps)
2025-11-25 20:57:32,421 -    💾 Most efficient: mlx-community/Qwen2-VL-2B-Instruct-4bit (5.9 GB)
2025-11-25 20:57:32,421 -    ⚡ Fastest load: mlx-community/Qwen2-VL-2B-Instruct-4bit (0.00s)
2025-11-25 20:57:32,421 -    📊 Average TPS: 225.1 across 1 models
2025-11-25 20:57:32,421 - 
2025-11-25 20:57:32,421 - 📈 Resource Usage:
2025-11-25 20:57:32,421 -    Total peak memory: 5.9 GB
2025-11-25 20:57:32,421 -    Average peak memory: 5.9 GB
2025-11-25 20:57:32,421 -    Memory efficiency: 481 tokens/GB
2025-11-25 20:57:32,421 - 
2025-11-25 20:57:32,421 - ✅ Successful Models (1):
2025-11-25 20:57:32,421 -   - mlx-community/Qwen2-VL-2B-Instruct-4bit: 225.1 tps (Active: 1.2GB, Cache: 0.1GB)
2025-11-25 20:57:32,781 - 
2025-11-25 20:57:32,781 - 📊 Reports successfully generated:
2025-11-25 20:57:32,781 -    HTML Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.html
2025-11-25 20:57:32,781 -    Markdown Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.md
2025-11-25 20:57:32,781 -    TSV Report:    /Users/jrp/Documents/AI/mlx/check_models/src/output/results.tsv
2025-11-25 20:57:32,781 -    JSONL Report:  /Users/jrp/Documents/AI/mlx/check_models/src/output/results.jsonl
2025-11-25 20:57:32,781 -    Log File: /Users/jrp/Documents/AI/mlx/check_models/src/output/check_models.log
2025-11-25 20:57:32,781 -    Environment: /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:57:32,781 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,781 - ▶ [ FINAL SUMMARY ]
2025-11-25 20:57:32,781 - 
2025-11-25 20:57:32,781 - ⏱  Overall runtime: 6.97s
2025-11-25 20:57:32,781 - --- Library Versions ---
2025-11-25 20:57:32,781 - Pillow          : 12.0.0
2025-11-25 20:57:32,781 - huggingface-hub : 0.36.0
2025-11-25 20:57:32,781 - mlx             : 0.30.1.dev20251125+c9f4dc85
2025-11-25 20:57:32,781 - mlx-lm          : 0.28.4
2025-11-25 20:57:32,781 - mlx-vlm         : 0.3.7
2025-11-25 20:57:32,781 - tokenizers      : 0.22.1
2025-11-25 20:57:32,781 - transformers    : 4.57.3
2025-11-25 20:57:32,781 - Generated: 2025-11-25 20:57:32 GMT
2025-11-25 20:57:33,005 - 
2025-11-25 20:57:33,005 - --- System Information ---
2025-11-25 20:57:33,006 - OS                  : Darwin 25.1.0
2025-11-25 20:57:33,006 - macOS Version       : 26.1
2025-11-25 20:57:33,006 - SDK Version         : 26.1
2025-11-25 20:57:33,006 - Python Version      : 3.13.9
2025-11-25 20:57:33,006 - Architecture        : arm64
2025-11-25 20:57:33,006 - GPU/Chip            : Apple M4 Max
2025-11-25 20:57:33,006 - GPU Cores           : 40
2025-11-25 20:57:33,006 - Metal Support       : Metal 4
2025-11-25 20:57:33,006 - RAM                 : 128.0 GB
2025-11-25 20:57:33,006 - CPU Cores (Physical): 16
2025-11-25 20:57:33,006 - CPU Cores (Logical) : 16
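Some context on why image size matters so much for this model: the log shows the Hugging Face Qwen2VLImageProcessor being used, and Qwen2-VL resizes each image to multiples of 28 px, cuts it into 14x14 patches for the vision encoder and merges them 2x2 into language-model tokens, so cost grows with pixel count up to a ceiling. The sketch below is modelled on the smart_resize rule in transformers' Qwen2-VL image processor. It assumes max_pixels = 12,845,056 (the value in the upstream Qwen2-VL-2B-Instruct preprocessor config) and has not been checked against the mlx-community checkpoint. Under that assumption the 1080x1920 image gives 2,691 image tokens, consistent with the 2,779 prompt tokens reported in the run above, while the 8640x5400 image used in the next run is capped at 2828x4508, i.e. 65,044 patches.

```python
import math

# Hypothetical helper modelled on the smart_resize rule used by Hugging Face
# transformers' Qwen2-VL image processor (not mlx-vlm's own code).
PATCH = 14                      # vision patch size in pixels
MERGE = 2                       # 2x2 patches are merged into one LLM token
FACTOR = PATCH * MERGE          # sides are snapped to multiples of 28 px
MIN_PIXELS = 56 * 56            # 3,136
MAX_PIXELS = 12_845_056         # ASSUMED: upstream preprocessor_config.json value


def smart_resize(height: int, width: int,
                 min_pixels: int = MIN_PIXELS,
                 max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Return the (height, width) an image is resized to before patching."""
    h_bar = round(height / FACTOR) * FACTOR
    w_bar = round(width / FACTOR) * FACTOR
    if h_bar * w_bar > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = max(FACTOR, math.floor(height / beta / FACTOR) * FACTOR)
        w_bar = max(FACTOR, math.floor(width / beta / FACTOR) * FACTOR)
    elif h_bar * w_bar < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / FACTOR) * FACTOR
        w_bar = math.ceil(width * beta / FACTOR) * FACTOR
    return h_bar, w_bar


def vision_counts(height: int, width: int) -> tuple[int, int]:
    """Return (patches seen by the vision encoder, merged image tokens)."""
    h, w = smart_resize(height, width)
    patches = (h // PATCH) * (w // PATCH)
    return patches, patches // (MERGE * MERGE)


# Dimensions as printed in the logs (order does not change the counts).
for w, h in [(1080, 1920), (8640, 5400)]:
    patches, tokens = vision_counts(h, w)
    print(f"{w}x{h}: resized to {smart_resize(h, w)}, "
          f"{patches:,} patches, {tokens:,} image tokens")
```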
(mlx-vlm) jrp@Johns-MacBook-Pro src % python check_models.py -m mlx-community/Qwen2-VL-2B-Instruct-4bit                                               
2025-11-25 20:58:30,717 - 📝 Full environment dump written to:
2025-11-25 20:58:30,717 - /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:58:30,718 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,718 - ⚠️  SECURITY WARNING: --trust-remote-code is enabled.
2025-11-25 20:58:30,718 - ⚠️  This allows execution of remote code and may pose security risks.
2025-11-25 20:58:30,718 - ====================================================================================================
2025-11-25 20:58:30,718 -                                   MLX Vision Language Model Check                                   
2025-11-25 20:58:30,718 - ====================================================================================================
2025-11-25 20:58:30,718 - Scanning folder: /Users/jrp/Pictures/Processed
2025-11-25 20:58:30,766 - Image File:      /Users/jrp/Pictures/Processed/20251115-161928_DSC07372.jpg
2025-11-25 20:58:30,771 - Image dimensions: 8640x5400 (46.7 MPixels)
2025-11-25 20:58:30,771 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,771 - ▶ [ IMAGE METADATA ]
2025-11-25 20:58:30,772 - Date: 2025-11-15 16:19:28 GMT
2025-11-25 20:58:30,772 - Description: , White Rock, Hastings, England, United Kingdom, UK
Here is a caption for the image, written for the specified audience and context:

A damp, late autumn afternoon along the seafront at White Rock in Hastings, England. The low-angle perspective highlights the rain-slicked paving stones, which reflect the cool, overcast sky as dusk begins to fall. In the background, a row of hotels and apartments, typical of a British seaside town, lines the road. The motion blur of a passing car's taillights adds a streak of colour and a sense of movement to the otherwise quiet, atmospheric scene. This area was developed significantly during the Victorian era as Hastings grew into a popular coastal resort, and this image captures a modern, tranquil moment outside the peak tourist season.
2025-11-25 20:58:30,772 - GPS Location: 
2025-11-25 20:58:30,772 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,772 - ▶ [ PROMPT CONFIGURATION ]
2025-11-25 20:58:30,772 - Generating default prompt based on image metadata.
2025-11-25 20:58:30,772 - Final prompt: Provide a factual caption, description, and keywords suitable for cataloguing, or searching for, the image. 

Context: The image relates to ', White Rock, Hastings, England, United Kingdom, UK
Here is...
2025-11-25 20:58:30,772 - Processing specified models: mlx-community/Qwen2-VL-2B-Instruct-4bit
2025-11-25 20:58:30,772 - Processing 1 model(s)...
2025-11-25 20:58:30,772 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,772 - 
2025-11-25 20:58:30,772 - Processing Model [1/1]: mlx-community/Qwen2-VL-2B-Instruct-4bit
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 34251.93it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
2025-11-25 20:58:32,235 - Runtime error for mlx-community/Qwen2-VL-2B-Instruct-4bit
Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/check_models/src/check_models.py", line 4655, in _run_model_generation
    output: GenerationResult | SupportsGenerationResult = generate(
                                                          ~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<13 lines>...
        max_tokens=params.max_tokens,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 539, in generate
    for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
                    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 429, in stream_generate
    for n, (token, logprobs) in enumerate(
                                ~~~~~~~~~^
        generate_step(input_ids, model, pixel_values, mask, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 324, in generate_step
    mx.async_eval(y)
    ~~~~~~~~~~~~~^^^
RuntimeError: [metal::malloc] Attempting to allocate 135383101952 bytes which is greater than the maximum allowed buffer size of 86586540032 bytes.
2025-11-25 20:58:32,254 - [RUN 1/1] SUMMARY model=mlx-community/Qwen2-VL-2B-Instruct-4bit status=FAIL stage=OOM
2025-11-25 20:58:32,254 - 
2025-11-25 20:58:32,254 - 
2025-11-25 20:58:32,254 - Error: Model runtime error during generation for mlx-community/Qwen2-VL-2B-Instruct-4bit: [metal::malloc] Attempting to allocate 135383101952 bytes which is greater than the maximum allowed buffer size of 86586540032 bytes.
2025-11-25 20:58:32,254 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,254 - ▶ [ PERFORMANCE SUMMARY ]
2025-11-25 20:58:32,255 - Model                                      Generation    Load    Total  Quality    Output
2025-11-25 20:58:32,255 - Name                                              (s)     (s)      (s)  Issues
2025-11-25 20:58:32,255 - mlx-community/Qwen2-VL-2B-Instruct-4bit                                            Error: OOM - Model runtime error during generation
2025-11-25 20:58:32,255 -                                                                                    for mlx-community/Qwen2-VL-2B-Instruct-4bit:
2025-11-25 20:58:32,255 -                                                                                    [metal::malloc] Attempting to allocate
2025-11-25 20:58:32,255 -                                                                                    135383101952 bytes which is greater than the
2025-11-25 20:58:32,255 -                                                                                    maximum allowed buffer size of 86586540032 bytes.
2025-11-25 20:58:32,255 - 
2025-11-25 20:58:32,255 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,255 - Results Summary
2025-11-25 20:58:32,255 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,255 - ❌ Failed Models (1):
2025-11-25 20:58:32,255 -   - mlx-community/Qwen2-VL-2B-Instruct-4bit (OOM)
2025-11-25 20:58:32,255 - 
2025-11-25 20:58:32,846 - 
2025-11-25 20:58:32,846 - 📊 Reports successfully generated:
2025-11-25 20:58:32,847 -    HTML Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.html
2025-11-25 20:58:32,847 -    Markdown Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.md
2025-11-25 20:58:32,847 -    TSV Report:    /Users/jrp/Documents/AI/mlx/check_models/src/output/results.tsv
2025-11-25 20:58:32,847 -    JSONL Report:  /Users/jrp/Documents/AI/mlx/check_models/src/output/results.jsonl
2025-11-25 20:58:32,847 -    Log File: /Users/jrp/Documents/AI/mlx/check_models/src/output/check_models.log
2025-11-25 20:58:32,847 -    Environment: /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:58:32,847 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,847 - ▶ [ FINAL SUMMARY ]
2025-11-25 20:58:32,847 - 
2025-11-25 20:58:32,847 - ⏱  Overall runtime: 2.61s
2025-11-25 20:58:32,847 - --- Library Versions ---
2025-11-25 20:58:32,847 - Pillow          : 12.0.0
2025-11-25 20:58:32,847 - huggingface-hub : 0.36.0
2025-11-25 20:58:32,847 - mlx             : 0.30.1.dev20251125+c9f4dc85
2025-11-25 20:58:32,847 - mlx-lm          : 0.28.4
2025-11-25 20:58:32,847 - mlx-vlm         : 0.3.7
2025-11-25 20:58:32,847 - tokenizers      : 0.22.1
2025-11-25 20:58:32,847 - transformers    : 4.57.3
2025-11-25 20:58:32,847 - Generated: 2025-11-25 20:58:32 GMT
2025-11-25 20:58:33,062 - 
2025-11-25 20:58:33,062 - --- System Information ---
2025-11-25 20:58:33,062 - OS                  : Darwin 25.1.0
2025-11-25 20:58:33,062 - macOS Version       : 26.1
2025-11-25 20:58:33,062 - SDK Version         : 26.1
2025-11-25 20:58:33,062 - Python Version      : 3.13.9
2025-11-25 20:58:33,062 - Architecture        : arm64
2025-11-25 20:58:33,062 - GPU/Chip            : Apple M4 Max
2025-11-25 20:58:33,062 - GPU Cores           : 40
2025-11-25 20:58:33,062 - Metal Support       : Metal 4
2025-11-25 20:58:33,062 - RAM                 : 128.0 GB
2025-11-25 20:58:33,062 - CPU Cores (Physical): 16
2025-11-25 20:58:33,062 - CPU Cores (Logical) : 16
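The size of the failed allocation can be reproduced with simple arithmetic under a few assumptions, none of them confirmed in the mlx-vlm source: the vision encoder attends over all N patches of the image at once and materialises a full attention-score array of shape (heads, N, N) in a 2-byte dtype (float16/bfloat16) with 16 heads, and the processor downscales the 8640x5400 photo to 2828x4508 (assuming max_pixels = 12,845,056). With those inputs the result equals the logged 135,383,101,952 bytes exactly. The same formula gives the largest patch count that fits in the 86,586,540,032-byte Metal buffer limit.

```python
import math

# Sketch: memory for one full attention-score array in the vision encoder.
# ASSUMPTIONS (not confirmed against mlx-vlm): 16 heads, 2-byte dtype,
# scores materialised as (heads, N, N) for all N patches of one image.
PATCH = 14
NUM_HEADS = 16
BYTES_PER_ELEMENT = 2                 # float16 / bfloat16

# Figures taken from the [metal::malloc] error in the log.
REQUESTED_BYTES = 135_383_101_952
MAX_BUFFER_BYTES = 86_586_540_032


def attention_scores_bytes(num_patches: int) -> int:
    """Bytes needed for a (heads, N, N) attention-score array."""
    return NUM_HEADS * num_patches * num_patches * BYTES_PER_ELEMENT


def max_patches_within(limit_bytes: int) -> int:
    """Largest N whose (heads, N, N) score array still fits in limit_bytes."""
    return math.isqrt(limit_bytes // (NUM_HEADS * BYTES_PER_ELEMENT))


# ASSUMED resized size of the 8640x5400 photo (height x width, multiples of 28).
resized_h, resized_w = 2828, 4508
patches = (resized_h // PATCH) * (resized_w // PATCH)

needed = attention_scores_bytes(patches)
print(f"{patches:,} patches -> {needed:,} bytes ({needed / 2**30:.1f} GiB)")
print("matches logged allocation:", needed == REQUESTED_BYTES)

limit_patches = max_patches_within(MAX_BUFFER_BYTES)
print(f"largest patch count under the buffer limit: {limit_patches:,} "
      f"(about {limit_patches * PATCH * PATCH / 1e6:.1f} MP after resizing)")
```

If these assumptions hold, keeping the resized image below roughly 52,000 patches (about 10 MP), for example by downscaling the photo before passing it in, would avoid this particular allocation.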
