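The following compares the results of asking mlx-community/Qwen2-VL-2B-Instruct-4bit to describe two images. It does a good enough job on the 2.1 Mpx image, but fails on the 46.7 Mpx one: generation stops with a [metal::malloc] error (reported as OOM) because it tries to allocate about 135 GB against a maximum buffer size of about 87 GB. (The result is the same if I reuse the prompt from the 2.1 Mpx run.) Other models run against the same 46.7 Mpx image without running out of memory.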
% python check_models.py -m mlx-community/Qwen2-VL-2B-Instruct-4bit -i ~/Downloads/National_monument_Edinburgh.jpg
2025-11-25 20:57:26,844 - 📝 Full environment dump written to:
2025-11-25 20:57:26,844 - /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:57:26,844 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,844 - ⚠️ SECURITY WARNING: --trust-remote-code is enabled.
2025-11-25 20:57:26,844 - ⚠️ This allows execution of remote code and may pose security risks.
2025-11-25 20:57:26,844 - ====================================================================================================
2025-11-25 20:57:26,844 - MLX Vision Language Model Check
2025-11-25 20:57:26,844 - ====================================================================================================
2025-11-25 20:57:26,844 - Image File: /Users/jrp/Downloads/National_monument_Edinburgh.jpg
2025-11-25 20:57:26,857 - Image dimensions: 1080x1920 (2.1 MPixels)
2025-11-25 20:57:26,857 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,857 - ▶ [ IMAGE METADATA ]
2025-11-25 20:57:26,860 - Date: 2010-09-30 12:58:31 BST
2025-11-25 20:57:26,860 - Description:
2025-11-25 20:57:26,860 - GPS Location:
2025-11-25 20:57:26,860 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,860 - ▶ [ PROMPT CONFIGURATION ]
2025-11-25 20:57:26,860 - Generating default prompt based on image metadata.
2025-11-25 20:57:26,860 - Final prompt: Provide a factual caption, description, and keywords suitable for cataloguing, or searching for, the image.
The photo was taken around 2010-09-30 12:58:31 BST . Focus on visual content, drawing on a...
2025-11-25 20:57:26,860 - Processing specified models: mlx-community/Qwen2-VL-2B-Instruct-4bit
2025-11-25 20:57:26,860 - Processing 1 model(s)...
2025-11-25 20:57:26,860 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:26,860 -
2025-11-25 20:57:26,860 - Processing Model [1/1]: mlx-community/Qwen2-VL-2B-Instruct-4bit
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 28799.84it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
2025-11-25 20:57:32,419 - [RUN 1/1] SUMMARY model=mlx-community/Qwen2-VL-2B-Instruct-4bit status=OK
2025-11-25 20:57:32,419 - The image depicts the National Monument of Scotland, a historic monument located on Calton Hill in
2025-11-25 20:57:32,419 - Edinburgh, Scotland. The monument was constructed in the late 19th century and features a series of
2025-11-25 20:57:32,419 - columns arranged in a symmetrical pattern. The photograph was taken in 2010, and the surrounding
2025-11-25 20:57:32,419 - area is visible, including a cityscape and a forested area.
2025-11-25 20:57:32,419 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,419 - ▶ [ PERFORMANCE SUMMARY ]
2025-11-25 20:57:32,420 - Model Token Prompt Generation Total Prompt Gen Peak Generation Load Total Quality Output
2025-11-25 20:57:32,420 - Name (tokens) (tokens) Tokens Tps TPS (GB) (s) (s) (s) Issues
2025-11-25 20:57:32,420 - mlx-community/Qwen2-VL-2B-Instruct-4bit 151,645 2,779 76 2,855 670 225 5.9 4.53s 0.86s 5.39s The image depicts the National Monument of
2025-11-25 20:57:32,420 - Scotland, a historic monument located on Calton
2025-11-25 20:57:32,420 - Hill in Edinburgh, Scotland. The monument was
2025-11-25 20:57:32,420 - constructed in the late 19th century and features
2025-11-25 20:57:32,420 - a series of columns arranged in a symmetrical
2025-11-25 20:57:32,421 - pattern. The photograph was taken in 2010, and the
2025-11-25 20:57:32,421 - surrounding area is visible, including a cityscape
2025-11-25 20:57:32,421 - and a forested area.
2025-11-25 20:57:32,421 -
2025-11-25 20:57:32,421 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,421 - Results Summary
2025-11-25 20:57:32,421 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,421 - 🏆 Performance Highlights:
2025-11-25 20:57:32,421 - Fastest: mlx-community/Qwen2-VL-2B-Instruct-4bit (225.1 tps)
2025-11-25 20:57:32,421 - 💾 Most efficient: mlx-community/Qwen2-VL-2B-Instruct-4bit (5.9 GB)
2025-11-25 20:57:32,421 - ⚡ Fastest load: mlx-community/Qwen2-VL-2B-Instruct-4bit (0.00s)
2025-11-25 20:57:32,421 - 📊 Average TPS: 225.1 across 1 models
2025-11-25 20:57:32,421 -
2025-11-25 20:57:32,421 - 📈 Resource Usage:
2025-11-25 20:57:32,421 - Total peak memory: 5.9 GB
2025-11-25 20:57:32,421 - Average peak memory: 5.9 GB
2025-11-25 20:57:32,421 - Memory efficiency: 481 tokens/GB
2025-11-25 20:57:32,421 -
2025-11-25 20:57:32,421 - ✅ Successful Models (1):
2025-11-25 20:57:32,421 - - mlx-community/Qwen2-VL-2B-Instruct-4bit: 225.1 tps (Active: 1.2GB, Cache: 0.1GB)
2025-11-25 20:57:32,781 -
2025-11-25 20:57:32,781 - 📊 Reports successfully generated:
2025-11-25 20:57:32,781 - HTML Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.html
2025-11-25 20:57:32,781 - Markdown Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.md
2025-11-25 20:57:32,781 - TSV Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.tsv
2025-11-25 20:57:32,781 - JSONL Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.jsonl
2025-11-25 20:57:32,781 - Log File: /Users/jrp/Documents/AI/mlx/check_models/src/output/check_models.log
2025-11-25 20:57:32,781 - Environment: /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:57:32,781 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:57:32,781 - ▶ [ FINAL SUMMARY ]
2025-11-25 20:57:32,781 -
2025-11-25 20:57:32,781 - ⏱ Overall runtime: 6.97s
2025-11-25 20:57:32,781 - --- Library Versions ---
2025-11-25 20:57:32,781 - Pillow : 12.0.0
2025-11-25 20:57:32,781 - huggingface-hub : 0.36.0
2025-11-25 20:57:32,781 - mlx : 0.30.1.dev20251125+c9f4dc85
2025-11-25 20:57:32,781 - mlx-lm : 0.28.4
2025-11-25 20:57:32,781 - mlx-vlm : 0.3.7
2025-11-25 20:57:32,781 - tokenizers : 0.22.1
2025-11-25 20:57:32,781 - transformers : 4.57.3
2025-11-25 20:57:32,781 - Generated: 2025-11-25 20:57:32 GMT
2025-11-25 20:57:33,005 -
2025-11-25 20:57:33,005 - --- System Information ---
2025-11-25 20:57:33,006 - OS : Darwin 25.1.0
2025-11-25 20:57:33,006 - macOS Version : 26.1
2025-11-25 20:57:33,006 - SDK Version : 26.1
2025-11-25 20:57:33,006 - Python Version : 3.13.9
2025-11-25 20:57:33,006 - Architecture : arm64
2025-11-25 20:57:33,006 - GPU/Chip : Apple M4 Max
2025-11-25 20:57:33,006 - GPU Cores : 40
2025-11-25 20:57:33,006 - Metal Support : Metal 4
2025-11-25 20:57:33,006 - RAM : 128.0 GB
2025-11-25 20:57:33,006 - CPU Cores (Physical): 16
2025-11-25 20:57:33,006 - CPU Cores (Logical) : 16
(mlx-vlm) jrp@Johns-MacBook-Pro src % python check_models.py -m mlx-community/Qwen2-VL-2B-Instruct-4bit
2025-11-25 20:58:30,717 - 📝 Full environment dump written to:
2025-11-25 20:58:30,717 - /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:58:30,718 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,718 - ⚠️ SECURITY WARNING: --trust-remote-code is enabled.
2025-11-25 20:58:30,718 - ⚠️ This allows execution of remote code and may pose security risks.
2025-11-25 20:58:30,718 - ====================================================================================================
2025-11-25 20:58:30,718 - MLX Vision Language Model Check
2025-11-25 20:58:30,718 - ====================================================================================================
2025-11-25 20:58:30,718 - Scanning folder: /Users/jrp/Pictures/Processed
2025-11-25 20:58:30,766 - Image File: /Users/jrp/Pictures/Processed/20251115-161928_DSC07372.jpg
2025-11-25 20:58:30,771 - Image dimensions: 8640x5400 (46.7 MPixels)
2025-11-25 20:58:30,771 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,771 - ▶ [ IMAGE METADATA ]
2025-11-25 20:58:30,772 - Date: 2025-11-15 16:19:28 GMT
2025-11-25 20:58:30,772 - Description: , White Rock, Hastings, England, United Kingdom, UK
Here is a caption for the image, written for the specified audience and context:
A damp, late autumn afternoon along the seafront at White Rock in Hastings, England. The low-angle perspective highlights the rain-slicked paving stones, which reflect the cool, overcast sky as dusk begins to fall. In the background, a row of hotels and apartments, typical of a British seaside town, lines the road. The motion blur of a passing car's taillights adds a streak of colour and a sense of movement to the otherwise quiet, atmospheric scene. This area was developed significantly during the Victorian era as Hastings grew into a popular coastal resort, and this image captures a modern, tranquil moment outside the peak tourist season.
2025-11-25 20:58:30,772 - GPS Location:
2025-11-25 20:58:30,772 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,772 - ▶ [ PROMPT CONFIGURATION ]
2025-11-25 20:58:30,772 - Generating default prompt based on image metadata.
2025-11-25 20:58:30,772 - Final prompt: Provide a factual caption, description, and keywords suitable for cataloguing, or searching for, the image.
Context: The image relates to ', White Rock, Hastings, England, United Kingdom, UK
Here is...
2025-11-25 20:58:30,772 - Processing specified models: mlx-community/Qwen2-VL-2B-Instruct-4bit
2025-11-25 20:58:30,772 - Processing 1 model(s)...
2025-11-25 20:58:30,772 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:30,772 -
2025-11-25 20:58:30,772 - Processing Model [1/1]: mlx-community/Qwen2-VL-2B-Instruct-4bit
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 34251.93it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
2025-11-25 20:58:32,235 - Runtime error for mlx-community/Qwen2-VL-2B-Instruct-4bit
Traceback (most recent call last):
File "/Users/jrp/Documents/AI/mlx/check_models/src/check_models.py", line 4655, in _run_model_generation
output: GenerationResult | SupportsGenerationResult = generate(
~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<13 lines>...
max_tokens=params.max_tokens,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 539, in generate
for response in stream_generate(model, processor, prompt, image, audio, **kwargs):
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 429, in stream_generate
for n, (token, logprobs) in enumerate(
~~~~~~~~~^
generate_step(input_ids, model, pixel_values, mask, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
):
^
File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/generate.py", line 324, in generate_step
mx.async_eval(y)
~~~~~~~~~~~~~^^^
RuntimeError: [metal::malloc] Attempting to allocate 135383101952 bytes which is greater than the maximum allowed buffer size of 86586540032 bytes.
2025-11-25 20:58:32,254 - [RUN 1/1] SUMMARY model=mlx-community/Qwen2-VL-2B-Instruct-4bit status=FAIL stage=OOM
2025-11-25 20:58:32,254 -
2025-11-25 20:58:32,254 -
2025-11-25 20:58:32,254 - Error: Model runtime error during generation for mlx-community/Qwen2-VL-2B-Instruct-4bit: [metal::malloc] Attempting to allocate 135383101952 bytes which is greater than the maximum allowed buffer size of 86586540032 bytes.
2025-11-25 20:58:32,254 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,254 - ▶ [ PERFORMANCE SUMMARY ]
2025-11-25 20:58:32,255 - Model Generation Load Total Quality Output
2025-11-25 20:58:32,255 - Name (s) (s) (s) Issues
2025-11-25 20:58:32,255 - mlx-community/Qwen2-VL-2B-Instruct-4bit Error: OOM - Model runtime error during generation
2025-11-25 20:58:32,255 - for mlx-community/Qwen2-VL-2B-Instruct-4bit:
2025-11-25 20:58:32,255 - [metal::malloc] Attempting to allocate
2025-11-25 20:58:32,255 - 135383101952 bytes which is greater than the
2025-11-25 20:58:32,255 - maximum allowed buffer size of 86586540032 bytes.
2025-11-25 20:58:32,255 -
2025-11-25 20:58:32,255 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,255 - Results Summary
2025-11-25 20:58:32,255 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,255 - ❌ Failed Models (1):
2025-11-25 20:58:32,255 - - mlx-community/Qwen2-VL-2B-Instruct-4bit (OOM)
2025-11-25 20:58:32,255 -
2025-11-25 20:58:32,846 -
2025-11-25 20:58:32,846 - 📊 Reports successfully generated:
2025-11-25 20:58:32,847 - HTML Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.html
2025-11-25 20:58:32,847 - Markdown Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.md
2025-11-25 20:58:32,847 - TSV Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.tsv
2025-11-25 20:58:32,847 - JSONL Report: /Users/jrp/Documents/AI/mlx/check_models/src/output/results.jsonl
2025-11-25 20:58:32,847 - Log File: /Users/jrp/Documents/AI/mlx/check_models/src/output/check_models.log
2025-11-25 20:58:32,847 - Environment: /Users/jrp/Documents/AI/mlx/check_models/src/output/environment.log
2025-11-25 20:58:32,847 - ────────────────────────────────────────────────────────────────────────────────────────────────────
2025-11-25 20:58:32,847 - ▶ [ FINAL SUMMARY ]
2025-11-25 20:58:32,847 -
2025-11-25 20:58:32,847 - ⏱ Overall runtime: 2.61s
2025-11-25 20:58:32,847 - --- Library Versions ---
2025-11-25 20:58:32,847 - Pillow : 12.0.0
2025-11-25 20:58:32,847 - huggingface-hub : 0.36.0
2025-11-25 20:58:32,847 - mlx : 0.30.1.dev20251125+c9f4dc85
2025-11-25 20:58:32,847 - mlx-lm : 0.28.4
2025-11-25 20:58:32,847 - mlx-vlm : 0.3.7
2025-11-25 20:58:32,847 - tokenizers : 0.22.1
2025-11-25 20:58:32,847 - transformers : 4.57.3
2025-11-25 20:58:32,847 - Generated: 2025-11-25 20:58:32 GMT
2025-11-25 20:58:33,062 -
2025-11-25 20:58:33,062 - --- System Information ---
2025-11-25 20:58:33,062 - OS : Darwin 25.1.0
2025-11-25 20:58:33,062 - macOS Version : 26.1
2025-11-25 20:58:33,062 - SDK Version : 26.1
2025-11-25 20:58:33,062 - Python Version : 3.13.9
2025-11-25 20:58:33,062 - Architecture : arm64
2025-11-25 20:58:33,062 - GPU/Chip : Apple M4 Max
2025-11-25 20:58:33,062 - GPU Cores : 40
2025-11-25 20:58:33,062 - Metal Support : Metal 4
2025-11-25 20:58:33,062 - RAM : 128.0 GB
2025-11-25 20:58:33,062 - CPU Cores (Physical): 16
2025-11-25 20:58:33,062 - CPU Cores (Logical) : 16
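A quick check on the two byte counts in the [metal::malloc] error may help narrow this down. The sketch below assumes 2 bytes per element (float16 or bfloat16) and asks whether the requested size is a perfect square N x N array. That assumption is mine and is not confirmed by the log. If it holds, the request looks like a single square matrix over N tokens or patches (for example attention scores or a mask), which would grow quadratically with image size.

```python
import math

# Both figures are copied from the RuntimeError in the failing run.
requested = 135_383_101_952  # bytes the allocator was asked for
limit = 86_586_540_032       # maximum allowed Metal buffer size in bytes

# Assumption (not from the log): 2 bytes per element, square N x N array.
BYTES_PER_ELEMENT = 2
elements = requested // BYTES_PER_ELEMENT
n = math.isqrt(elements)

print(n, n * n == elements)  # 260176 True
print(f"{requested / 2**30:.1f} GiB requested, {limit / 2**30:.1f} GiB allowed")
# 126.1 GiB requested, 80.6 GiB allowed
```

The exact square suggests, but does not prove, that one N x N buffer with N = 260,176 is being materialised in a single allocation.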
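As a possible stopgap, the large image could be downscaled before it is passed to the model. This is a minimal sketch, not part of check_models.py or mlx-vlm: the helper names `fit_within_pixel_budget` and `downscale_image` are hypothetical, and the pixel budget is an assumption taken from the 1080x1920 image that is known to work. The report does not show where between 2.1 Mpx and 46.7 Mpx the failure starts.

```python
import math


def fit_within_pixel_budget(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Return (width, height) scaled down, keeping aspect ratio, so the area is at most max_pixels."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    # Floor both sides so the resulting area cannot exceed the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def downscale_image(src_path: str, dst_path: str, max_pixels: int) -> tuple[int, int]:
    """Write a copy of src_path that fits the pixel budget and return its size."""
    from PIL import Image  # Pillow, imported lazily so the helper above stays stdlib-only

    with Image.open(src_path) as img:
        size = fit_within_pixel_budget(img.width, img.height, max_pixels)
        if size != img.size:
            img = img.resize(size, Image.Resampling.LANCZOS)
        # Note: saving like this may drop EXIF metadata (date, description).
        img.save(dst_path)
    return size


# Assumed budget: the area of the image that ran successfully (1080 x 1920).
BUDGET = 1080 * 1920
small = fit_within_pixel_budget(1080, 1920, BUDGET)
large = fit_within_pixel_budget(8640, 5400, BUDGET)
print(small, large)
```

Because the script builds its default prompt from image metadata, a downscaled copy that loses EXIF data may produce a different prompt, so this only tests whether resolution is the trigger.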