Transformers 5.0.0rc1 Compatibility Issues with MLX-VLM Models #640

@jrp2014

Description

Summary

I ran my script against 36 models using mlx-vlm with transformers==5.0.0rc1. Of those, 26 passed and 10 failed, mostly (I suspect) due to breaking API changes in the new release candidate.

Environment:

  • mlx-vlm: 0.3.10
  • transformers: 5.0.0rc1
  • mlx: 0.30.3.dev20251219+d9f4d8d5

1. Breaking Changes in Transformers 5.0.0rc1

A. Removal of _validate_images_text_input_order

Affected Models: mlx-community/Kimi-VL-A3B-Thinking-2506-bf16, mlx-community/Kimi-VL-A3B-Thinking-8bit
Error:

ImportError: cannot import name '_validate_images_text_input_order' from 'transformers.processing_utils'

Cause: The internal function _validate_images_text_input_order seems to have been removed or refactored in v5. Models relying on this in their dynamic modules will fail to load.
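
Since the affected repos import this helper at module scope, one stopgap for repo owners is an import shim in their custom processing_*.py. Below is a minimal sketch, assuming the helper's v4 behavior of swapping arguments that arrive in the legacy text-first order; it is not an official fix:

    # Reuse the v4 helper when present, otherwise reimplement its
    # documented behavior locally.
    try:
        from transformers.processing_utils import _validate_images_text_input_order
    except ImportError:
        def _is_text_like(x):
            # Strings, or (nested) lists/tuples of strings, count as text.
            return isinstance(x, str) or (
                isinstance(x, (list, tuple)) and len(x) > 0 and _is_text_like(x[0])
            )

        def _validate_images_text_input_order(images, text):
            # Legacy callers passed (text, images); swap when that is detected.
            if _is_text_like(images) and not _is_text_like(text):
                return text, images
            return images, text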

B. padding_side Argument in Processors

Affected Models: microsoft/Phi-3.5-vision-instruct, mlx-community/Phi-3.5-vision-instruct-bf16
Error:

TypeError: Phi3VProcessor.__call__() got an unexpected keyword argument 'padding_side'

Cause: In v5, padding_side is no longer accepted as a direct argument in the processor call for Phi3VProcessor. It likely needs to be set on the tokenizer instance directly.
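
A user-side workaround sketch, assuming the tokenizer attribute is still honored in v5 (the prompt format and the blank image are illustrative placeholders):

    from PIL import Image
    from transformers import AutoProcessor

    processor = AutoProcessor.from_pretrained(
        "microsoft/Phi-3.5-vision-instruct", trust_remote_code=True
    )
    # Set padding_side on the wrapped tokenizer instead of passing it as a
    # __call__ kwarg, which Phi3VProcessor rejects under v5.
    processor.tokenizer.padding_side = "left"

    image = Image.new("RGB", (336, 336))  # placeholder image
    prompt = "<|image_1|>\nDescribe this image."
    inputs = processor(text=prompt, images=[image])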

C. additional_special_tokens in Tokenizers

Affected Models: prince-canuma/Florence-2-large-ft
Error:

AttributeError: RobertaTokenizer has no attribute additional_special_tokens

Cause: The handling of additional_special_tokens appears to have changed for RobertaTokenizer in v5, leading to this attribute error during initialization.
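
If the attribute was dropped rather than renamed, a defensive access pattern in the remote processing code could paper over it. A hedged sketch, assuming special_tokens_map still carries the entry in v5 (roberta-base stands in for Florence-2's tokenizer here):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
    # Fall back to special_tokens_map when the direct attribute is gone.
    extra = getattr(tokenizer, "additional_special_tokens", None)
    if extra is None:
        extra = tokenizer.special_tokens_map.get("additional_special_tokens", [])
    print(extra)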

D. Image Processor Type Mismatch

Affected Models: mlx-community/InternVL3-14B-8bit
Error:

TypeError: Received a InternVLImageProcessor for argument image_processor, but a ImageProcessingMixin was expected.

Cause: Likely a stricter type check or hierarchy change in v5's image processing modules.
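
A quick diagnostic sketch, assuming the v5 constructor enforces an isinstance() check against ImageProcessingMixin and that the repo's preprocessor config loads standalone. Inspecting the MRO shows which class hierarchy was actually loaded; a class duplicated between the library and a remote-code module would fail the check despite the identical name:

    from transformers import AutoImageProcessor
    from transformers.image_processing_utils import ImageProcessingMixin

    ip = AutoImageProcessor.from_pretrained("mlx-community/InternVL3-14B-8bit")
    print(type(ip).__mro__)                      # which hierarchy got loaded?
    print(isinstance(ip, ImageProcessingMixin))  # the check v5 appears to apply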

2. Other Failures

  • OOM: mlx-community/Qwen2-VL-2B-Instruct-4bit failed with a Metal out-of-memory error while trying to allocate a 136 GB buffer (implausibly large for a 4-bit 2B model, so likely a runaway allocation rather than genuine memory pressure).
  • Unsupported Model: mlx-community/LFM2-VL-1.6B-8bit returned Error: Unsupported model: lfm2-vl.
  • Missing Weights: microsoft/Florence-2-large-ft failed with ValueError: Missing 1 parameters: language_model.lm_head.weight.
  • Chat Template: mlx-community/gemma-3-12b-pt-8bit failed because the processor lacks a chat template, unsurprising for a base ("pt") checkpoint; a possible workaround is sketched below.
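
For the chat-template failure, one possible workaround is to attach a minimal template before any apply_chat_template() call. This is a sketch only: the Jinja stub is a generic placeholder, not Gemma's official format, and a base checkpoint is not tuned to follow any template:

    from transformers import AutoProcessor

    processor = AutoProcessor.from_pretrained("mlx-community/gemma-3-12b-pt-8bit")
    if getattr(processor, "chat_template", None) is None:
        # Generic placeholder: concatenate role-prefixed text parts.
        processor.chat_template = (
            "{% for message in messages %}"
            "{{ message['role'] }}: "
            "{% for item in message['content'] %}"
            "{% if item['type'] == 'text' %}{{ item['text'] }}{% endif %}"
            "{% endfor %}\n"
            "{% endfor %}"
        )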

3. Quality Observations (Successful Runs)

  • Context Ignorance: Several models completely ignored the provided text context (e.g., the date "2025-12-13") and hallucinated details instead:
    • mlx-community/deepseek-vl2-8bit
    • mlx-community/paligemma2-3b-pt-896-4bit
    • mlx-community/Llama-3.2-11B-Vision-Instruct-8bit
  • Verbosity: Qwen3-VL and GLM-4 tended to be very verbose or to emit internal "thinking" traces.
  • Repetitive Loops: paligemma2 models (10b-ft, 3b-ft) got stuck in repetitive loops ("There are gray stone pillars...", "The squares alternate...").

4. Successful Models

For reference, the following models worked well:

  • mlx-community/SmolVLM2-2.2B-Instruct-mlx (Fast & Efficient)
  • mlx-community/Ministral-3-3B-Instruct-2512-4bit (Fastest)
  • qnguyen3/nanoLLaVA
  • mlx-community/pixtral-12b-bf16

Update Configs: For models with custom code (remote code), the repo owners may need to update their processing_*.py files to align with Transformers v5.
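
A hypothetical version-guard pattern for such files, assuming only the v4/v5 differences catalogued above need branching:

    import transformers
    from packaging import version

    IS_V5 = version.parse(transformers.__version__).major >= 5

    if not IS_V5:
        from transformers.processing_utils import _validate_images_text_input_order
    # Under v5, use a local shim instead (see 1A) and set padding_side on the
    # tokenizer rather than in the processor call (see 1B).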

Full details in various formats are at https://github.com/jrp2014/check_models/tree/main/src/output
