Summary
I ran my script against 36 models using mlx-vlm with transformers==5.0.0rc1. 26 models passed and 10 failed, mostly (I believe) due to breaking API changes in the new transformers release candidate.
Environment:
mlx-vlm: 0.3.10
transformers: 5.0.0rc1
mlx: 0.30.3.dev20251219+d9f4d8d5
1. Breaking Changes in Transformers 5.0.0rc1
A. Removal of _validate_images_text_input_order
Affected Models: mlx-community/Kimi-VL-A3B-Thinking-2506-bf16, mlx-community/Kimi-VL-A3B-Thinking-8bit
Error:
ImportError: cannot import name '_validate_images_text_input_order' from 'transformers.processing_utils'
Cause: The internal function _validate_images_text_input_order seems to have been removed or refactored in v5. Models relying on this in their dynamic modules will fail to load.
B. padding_side Argument in Processors
Affected Models: microsoft/Phi-3.5-vision-instruct, mlx-community/Phi-3.5-vision-instruct-bf16
Error:
TypeError: Phi3VProcessor.__call__() got an unexpected keyword argument 'padding_side'
Cause: In v5, padding_side is no longer accepted as a direct argument in the processor call for Phi3VProcessor. It likely needs to be set on the tokenizer instance directly.
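The pattern below sketches the assumed fix with stand-in classes (the `Dummy*` names are mine, not the real Phi3V types): configure `padding_side` once on the tokenizer object instead of passing it per-call to the processor.

```python
# Minimal sketch of the assumed v5-compatible pattern: set padding_side on
# the tokenizer, not in the processor call. DummyTokenizer/DummyProcessor
# are stand-ins for the real Phi3V classes.
class DummyTokenizer:
    def __init__(self):
        self.padding_side = "right"

class DummyProcessor:
    """Mimics a v5 processor whose __call__ no longer accepts padding_side."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, text, images=None):
        # The processor reads padding_side from its tokenizer internally.
        return {"text": text, "padding_side": self.tokenizer.padding_side}

processor = DummyProcessor(DummyTokenizer())
processor.tokenizer.padding_side = "left"  # v5-friendly: set on the tokenizer
out = processor(text="hello")
```

With a real processor the equivalent one-liner would be `processor.tokenizer.padding_side = "left"` before invoking `processor(...)`, assuming the Phi3V processing code is updated to read it from there.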
C. additional_special_tokens in Tokenizers
Affected Models: prince-canuma/Florence-2-large-ft
Error:
AttributeError: RobertaTokenizer has no attribute additional_special_tokens
Cause: The handling of additional_special_tokens appears to have changed for RobertaTokenizer in v5, leading to this attribute error during initialization.
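One possible workaround, assuming the attribute proxy was removed but the public `special_tokens_map` dict is still exposed (true in v4; I have not verified it in v5), is to go through the map instead of the attribute:

```python
# Sketch (assumption, not a verified v5 fix): read extra special tokens via
# the public special_tokens_map dict rather than the removed
# additional_special_tokens attribute proxy.
def get_additional_special_tokens(tokenizer):
    mapping = getattr(tokenizer, "special_tokens_map", {}) or {}
    return mapping.get("additional_special_tokens", [])
```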
D. Image Processor Type Mismatch
Affected Models: mlx-community/InternVL3-14B-8bit
Error:
TypeError: Received a InternVLImageProcessor for argument image_processor, but a ImageProcessingMixin was expected.
Cause: Likely a stricter type check or hierarchy change in v5's image processing modules.
2. Other Failures
- OOM:
mlx-community/Qwen2-VL-2B-Instruct-4bit failed with a Metal OOM while trying to allocate a 136 GB buffer.
- Unsupported Model:
mlx-community/LFM2-VL-1.6B-8bit returned Error: Unsupported model: lfm2-vl.
- Missing Weights:
microsoft/Florence-2-large-ft failed with ValueError: Missing 1 parameters: language_model.lm_head.weight.
- Chat Template:
mlx-community/gemma-3-12b-pt-8bit failed because the processor lacks a chat template.
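For base ("pt") checkpoints like this, a defensive prompt builder can avoid the hard failure. This is a sketch of that guard, not mlx-vlm's actual code path; `apply_chat_template` matches the real transformers tokenizer API, while the plain-text fallback format is my assumption.

```python
# Guard against tokenizers/processors that ship without a chat template
# (common for pretrained-only checkpoints): fall back to the raw prompt.
def build_prompt(tokenizer, user_text):
    if getattr(tokenizer, "chat_template", None):
        messages = [{"role": "user", "content": user_text}]
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Base models without a template: just use the plain text.
    return user_text
```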
3. Quality Observations (Successful Runs)
- Context Ignorance: Several models completely ignored the provided text context (e.g., date "2025-12-13") and hallucinated.
  - mlx-community/deepseek-vl2-8bit
  - mlx-community/paligemma2-3b-pt-896-4bit
  - mlx-community/Llama-3.2-11B-Vision-Instruct-8bit
- Verbosity:
Qwen3-VL and GLM-4 tended to be very verbose or output internal "thinking" traces.
- Repetitive Loops:
paligemma2 models (10b-ft, 3b-ft) got stuck in repetitive loops ("There are gray stone pillars...", "The squares alternate...").
4. Successful Models
For reference, the following models worked well:
- mlx-community/SmolVLM2-2.2B-Instruct-mlx (Fast & Efficient)
- mlx-community/Ministral-3-3B-Instruct-2512-4bit (Fastest)
- qnguyen3/nanoLLaVA
- mlx-community/pixtral-12b-bf16
Recommendation: For models with custom code (remote code), repo owners may need to update their processing_*.py files to align with Transformers v5.
Full details in various formats are at https://github.com/jrp2014/check_models/tree/main/src/output