Two weeks ago Mistral AI released the Mistral 3 collection, which includes Ministral 3 (which you support already) and Mistral Large 3 (which is not yet supported).
Like Mistral 3 Small, it has vision capability, which is probably why it fits with `mlx-vlm` rather than `mlx-lm`.
Reason: Mistral is really good at translation to European languages, and to some Asian languages.
- The 24B Mistral Small models generated some of the best translations to French, which is probably not surprising given that Mistral is a French company.
- The new Mistral Large 3 seems to produce the best Finnish translations.
I work on a large translation project. MLX support for this model would be a huge help.
Architecture: Like Kimi K2, Mistral Large 3 uses the "standard" DeepSeek V3-0324 architecture, with some minor variations.
This is confirmed by the model's `config.json` file. There are some (minor) differences.
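As a quick way to see how close the two architectures actually are, here is a small sketch that diffs the two `config.json` files from the Hugging Face Hub. The Mistral repo id below is a guess and should be replaced with the real one.

```python
# Sketch: compare Mistral Large 3's config.json against DeepSeek V3-0324's
# to see which architecture fields differ. Repo ids are assumptions.
import json
from huggingface_hub import hf_hub_download

def load_config(repo_id: str) -> dict:
    """Download and parse a model's config.json from the Hugging Face Hub."""
    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path) as f:
        return json.load(f)

# Hypothetical repo id for Mistral Large 3 -- substitute the actual repo.
large3 = load_config("mistralai/Mistral-Large-3-Instruct")
dsv3 = load_config("deepseek-ai/DeepSeek-V3-0324")

# Fields present in both configs but with different values.
for key in sorted(large3.keys() & dsv3.keys()):
    if large3[key] != dsv3[key]:
        print(f"{key}: Mistral={large3[key]!r}  DeepSeek={dsv3[key]!r}")

# Fields unique to either model (e.g. any vision-related settings).
print("Only in Mistral Large 3:", sorted(large3.keys() - dsv3.keys()))
print("Only in DeepSeek V3:", sorted(dsv3.keys() - large3.keys()))
```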
They also provide a 12B Eagle Speculator model for speculative decoding. It uses NVIDIA's Eagle3 speculative decoding approach with the Model Optimizer to predict a single draft token efficiently, making it useful for high-concurrency inference scenarios where fast token generation is a priority. I do not know whether this could work with MLX.
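For reference, this is the general draft-and-verify idea behind speculative decoding, greatly simplified: greedy acceptance only, with stand-in model callables rather than a real MLX or Eagle3 API. Eagle3 itself additionally conditions its draft head on the target model's hidden states, which is not shown here.

```python
# Minimal sketch of the draft-and-verify loop used in speculative decoding.
# The model callables are stand-ins; this is not the Eagle3 algorithm itself.
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # returns the greedy next token

def speculative_step(target: NextToken, draft: NextToken,
                     prompt: List[Token], n_draft: int = 4) -> List[Token]:
    """Propose n_draft tokens with the cheap draft model, then keep the
    longest prefix the target model agrees with, plus one corrected token."""
    # 1. Draft: propose n_draft tokens with the small model.
    drafted: List[Token] = []
    ctx = list(prompt)
    for _ in range(n_draft):
        t = draft(ctx)
        drafted.append(t)
        ctx.append(t)

    # 2. Verify: replay the drafted tokens through the target model and
    #    accept them while the target's greedy choice matches the draft.
    #    (In practice the target scores all drafted positions in a single
    #    batched forward pass, which is where the speedup comes from; the
    #    per-token loop here is only for clarity.)
    accepted: List[Token] = []
    ctx = list(prompt)
    for t in drafted:
        expected = target(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # first disagreement: take the target's token
            break
    else:
        accepted.append(target(ctx))  # all drafts accepted: add one bonus token

    return accepted
```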