
Conversation

@noorbhatia
Copy link
Contributor

  • Implements prewarm() for MLXLanguageModel to improve first-response time.
  • Prewarms the model with instructions, tools, and the optional promptPrefix.

@noorbhatia noorbhatia force-pushed the noor/mlx-prewarm-model branch from ede4b54 to 6d7cbd5 on January 29, 2026 at 11:25

Copilot AI left a comment


Pull request overview

Implements prewarm(for:promptPrefix:) for MLXLanguageModel to reduce first-response latency by loading the model context and priming the MLX processor with session instructions, tools, and an optional prompt prefix.

Changes:

  • Add MLXLanguageModel.prewarm(for:promptPrefix:) implementation.
  • Prewarm loads and caches the ModelContext and calls context.processor.prepare(input:) with a minimal chat history plus tool specs.
  • Include session instructions and optional prompt prefix in the prewarm input.
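
For context, a rough caller-side sketch of how this might be used (the initializer, session type, and exact call shape are illustrative assumptions; only prewarm(for:promptPrefix:) itself comes from this PR):

// Hypothetical usage sketch; types, initializers, and labels other than
// prewarm(for:promptPrefix:) are assumed, not taken from this change.
let model = MLXLanguageModel(modelId: "mlx-community/Qwen2.5-1.5B-Instruct-4bit")
let session = LanguageModelSession(model: model, instructions: "You are a concise assistant.")

// Kick off prewarming early (e.g. at launch) so the ModelContext is loaded and
// the shared prefix is tokenized before the first response is requested.
// Add `await`/`try` as the actual signature requires.
model.prewarm(for: session, promptPrefix: "Summarize the following text:")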


Comment on lines +373 to +377
Task {

    let context = try await loadContext(modelId: modelId, hub: hub, directory: directory)

    // Build chat history similar to respond() to prime the cache effectively

Copilot AI Jan 29, 2026


Task { ... } inherits the caller’s actor context. If prewarm() is called from the main actor/UI, the model load + tokenization work inside this task can end up running on the main actor and cause UI hitches. Prefer running this as a detached/background task (e.g., Task.detached or explicitly hopping off the main actor) and consider setting an appropriate priority for prewarming work.
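
A minimal sketch of the suggested pattern, with the MLX-specific work stubbed out (everything here other than Task.detached is a placeholder, not code from this PR):

// Placeholder for the real prewarm body (context load, chat/tool-spec build,
// processor.prepare(input:)).
func performPrewarmWork() async throws {
    // ...
}

func prewarm() {
    // Detached so the heavy work never inherits the caller's actor (e.g. the
    // main actor when called from UI code), at a modest priority because
    // prewarming is opportunistic.
    Task.detached(priority: .utility) {
        do {
            try await performPrewarmWork()
        } catch {
            // Best-effort: ignore failures; the model loads on demand later.
        }
    }
}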

Comment on lines +375 to +402
let context = try await loadContext(modelId: modelId, hub: hub, directory: directory)

// Build chat history similar to respond() to prime the cache effectively
var chat: [MLXLMCommon.Chat.Message] = []

// Add system instructions if present
if let instructions, !instructions.isEmpty {
    chat.append(.init(role: .system, content: instructions))
}

// Add prompt prefix or minimal user message
let promptText = promptPrefix?.description ?? "."
chat.append(.init(role: .user, content: promptText))

// Convert tools to MLX format
let toolSpecs: [ToolSpec]? =
    tools.isEmpty
        ? nil
        : tools.map { convertToolToMLXSpec($0) }

let userInput = MLXLMCommon.UserInput(
    chat: chat,
    processing: .init(resize: .init(width: 512, height: 512)),
    tools: toolSpecs
)

// Prepare input - triggers tokenization and processor initialization
_ = try await context.processor.prepare(input: userInput)

Copilot AI Jan 29, 2026


The Task body can throw (loadContext / prepare), but the error is never handled or surfaced. Since prewarm is fire-and-forget, it should catch and intentionally ignore (or otherwise report) failures so prewarm errors don’t get silently lost in a failed task.

Suggested change (wrap the prewarm body in do/catch):

do {
    let context = try await loadContext(modelId: modelId, hub: hub, directory: directory)
    // Build chat history similar to respond() to prime the cache effectively
    var chat: [MLXLMCommon.Chat.Message] = []
    // Add system instructions if present
    if let instructions, !instructions.isEmpty {
        chat.append(.init(role: .system, content: instructions))
    }
    // Add prompt prefix or minimal user message
    let promptText = promptPrefix?.description ?? "."
    chat.append(.init(role: .user, content: promptText))
    // Convert tools to MLX format
    let toolSpecs: [ToolSpec]? =
        tools.isEmpty
            ? nil
            : tools.map { convertToolToMLXSpec($0) }
    let userInput = MLXLMCommon.UserInput(
        chat: chat,
        processing: .init(resize: .init(width: 512, height: 512)),
        tools: toolSpecs
    )
    // Prepare input - triggers tokenization and processor initialization
    _ = try await context.processor.prepare(input: userInput)
} catch {
    // Intentionally ignore prewarm failures (model will be loaded on demand)
    // You may replace this with a more sophisticated logging mechanism if desired.
    print("MLXLanguageModel prewarm failed for modelId \(modelId): \(error)")
}

Comment on lines +385 to +387
// Add prompt prefix or minimal user message
let promptText = promptPrefix?.description ?? "."
chat.append(.init(role: .user, content: promptText))
Owner


Unless "." has special significance in MLX, this makes me think that promptPrefix should be non-optional (and maybe non-empty?)

What do you think?
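
For illustration only, one possible shape (the first parameter's label and type are assumed here, not taken from this PR):

// Hypothetical signature sketch: a required prefix means the cache is always
// primed with real tokens rather than a "." placeholder user message.
func prewarm(for session: LanguageModelSession, promptPrefix: Prompt) async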


let userInput = MLXLMCommon.UserInput(
    chat: chat,
    processing: .init(resize: .init(width: 512, height: 512)),
Owner


This seems like the kind of thing that we'd want to parameterize in the method, rather than hard-code.
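
For example, a sketch of one way to parameterize it (parameter names, types, and defaults here are placeholders, not an agreed API):

// Hypothetical: let callers supply (or omit) image preprocessing instead of
// hard-coding a 512x512 resize inside prewarm.
func prewarm(
    for session: LanguageModelSession,          // label/type assumed for illustration
    promptPrefix: Prompt? = nil,
    processing: UserInput.Processing? = nil     // hypothetical new parameter
) async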

@mattt
Owner

mattt commented Jan 29, 2026

Thanks for opening this PR, @noorbhatia!

I think this kind of functionality gets into the realm of KV cache management, which so far this implementation hasn't attempted to support. At a high-level, I'd expect an API that has some concept of prewarming a common prefix of tokens, caching that, and then reusing for various suffixes. Most likely, cache selection and management would be automatic; I'm not sure yet what controls we'd want to expose.

Can you say more about how you understand the problem?
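
To make that concrete, a rough sketch of the kind of API shape described above (entirely hypothetical; not proposed in this PR or present in MLX):

// Hypothetical prefix-cache shape: prewarm a shared prefix once, then reuse the
// cached state for many different suffixes.
struct PrefixCacheKey: Hashable {
    let modelID: String
    let prefixHash: Int
}

protocol PrefixPrewarming {
    /// Tokenizes and evaluates the common prefix (instructions + tools + prompt
    /// prefix) and stores the resulting KV cache under a derived key.
    func prewarmPrefix(_ prefix: String) async throws -> PrefixCacheKey

    /// Responds to `suffix`, reusing the cached prefix state when the key is
    /// still valid and falling back to a full forward pass otherwise.
    func respond(reusing key: PrefixCacheKey, to suffix: String) async throws -> String
}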
