When you try to run Ministral 3, there is an issue decoding the LM's response. Adding the line `tokenizer._detokenizer_class = BPEStreamingDetokenizer` to the `try` block in `fix_ministral_pre_tokenizer` should fix it. The root cause appears to be how mlx-lm selects the streaming detokenizer for this model; the issue does not occur when the model is loaded with mlx-vlm alone.
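A rough sketch of the change, assuming `BPEStreamingDetokenizer` is imported from `mlx_lm.tokenizer_utils` (the actual body of `fix_ministral_pre_tokenizer` and its signature may differ in your copy of the code):

```diff
 from mlx_lm.tokenizer_utils import BPEStreamingDetokenizer

 def fix_ministral_pre_tokenizer(tokenizer):
     try:
         # ... existing pre-tokenizer fix-up logic ...
+        # Force the BPE streaming detokenizer so mlx-lm decodes
+        # Ministral 3 responses correctly
+        tokenizer._detokenizer_class = BPEStreamingDetokenizer
     except Exception:
         pass
```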