Two weeks ago Mistral AI released the Mistral 3 collection, which includes Ministral 3 (which you support already) and Mistral Large 3 (which is not yet supported).
Like Mistral 3 Small, it has vision capability, which is probably why it fits with `mlx-vlm` rather than `mlx-lm`.
Reason: Mistral is really good at translation to European languages, and to some Asian languages.
- The 24B Mistral Small models generated some of the best translations to French, which is probably not surprising given that Mistral is a French company.
- The new Mistral Large 3 seems to produce the best Finnish translations.
I work on a large translation project. MLX support for this model would be a huge help.
Architecture: Like Kimi K2, Mistral Large 3 uses the "standard" DeepSeek V3-0324 architecture, with some minor variations.
This is confirmed by the model's `config.json` file. There are some (minor) differences.
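As a quick way to see how close the two architectures actually are, here is a small sketch that diffs the two `config.json` files from the Hugging Face Hub. The Mistral repo id below is a guess and should be replaced with the real one.

```python
# Sketch: compare Mistral Large 3's config.json against DeepSeek V3-0324's
# to see which architecture fields differ. Repo ids are assumptions.
import json
from huggingface_hub import hf_hub_download

def load_config(repo_id: str) -> dict:
    """Download and parse a model's config.json from the Hugging Face Hub."""
    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path) as f:
        return json.load(f)

# Hypothetical repo id for Mistral Large 3 -- substitute the actual repo.
large3 = load_config("mistralai/Mistral-Large-3-Instruct")
dsv3 = load_config("deepseek-ai/DeepSeek-V3-0324")

# Fields present in both configs but with different values.
for key in sorted(large3.keys() & dsv3.keys()):
    if large3[key] != dsv3[key]:
        print(f"{key}: Mistral={large3[key]!r}  DeepSeek={dsv3[key]!r}")

# Fields unique to either model (e.g. any vision-related settings).
print("Only in Mistral Large 3:", sorted(large3.keys() - dsv3.keys()))
print("Only in DeepSeek V3:", sorted(dsv3.keys() - large3.keys()))
```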
They also provide a 12B Eagle Speculator model for speculative decoding. It uses NVIDIA's Eagle3 speculative decoding approach with the Model Optimizer to predict a single draft token efficiently, making it useful for high-concurrency inference scenarios where fast token generation is a priority. I do not know whether this could work with MLX.
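For reference, this is the general draft-and-verify idea behind speculative decoding, greatly simplified: greedy acceptance only, with stand-in model callables rather than a real MLX or Eagle3 API. Eagle3 itself additionally conditions its draft head on the target model's hidden states, which is not shown here.

```python
# Minimal sketch of the draft-and-verify loop used in speculative decoding.
# The model callables are stand-ins; this is not the Eagle3 algorithm itself.
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # returns the greedy next token

def speculative_step(target: NextToken, draft: NextToken,
                     prompt: List[Token], n_draft: int = 4) -> List[Token]:
    """Propose n_draft tokens with the cheap draft model, then keep the
    longest prefix the target model agrees with, plus one corrected token."""
    # 1. Draft: propose n_draft tokens with the small model.
    drafted: List[Token] = []
    ctx = list(prompt)
    for _ in range(n_draft):
        t = draft(ctx)
        drafted.append(t)
        ctx.append(t)

    # 2. Verify: replay the drafted tokens through the target model and
    #    accept them while the target's greedy choice matches the draft.
    #    (In practice the target scores all drafted positions in a single
    #    batched forward pass, which is where the speedup comes from; the
    #    per-token loop here is only for clarity.)
    accepted: List[Token] = []
    ctx = list(prompt)
    for t in drafted:
        expected = target(ctx)
        if expected == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # first disagreement: take the target's token
            break
    else:
        accepted.append(target(ctx))  # all drafts accepted: add one bonus token

    return accepted
```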