Sources/AnyLanguageModel/Models/MLXLanguageModel.swift (45 additions & 0 deletions)

@@ -357,6 +357,51 @@ import Foundation

        return LanguageModelSession.ResponseStream(stream: stream)
    }

    /// Prewarms the model for the given session and optional prompt prefix.
    public func prewarm(
        for session: LanguageModelSession,
        promptPrefix: Prompt?
    ) {
        let modelId = self.modelId
        let hub = self.hub
        let directory = self.directory

        let instructions = session.instructions?.description
        let tools = session.tools

        Task {
            let context = try await loadContext(modelId: modelId, hub: hub, directory: directory)

            // Build chat history similar to respond() to prime the cache effectively
Copilot AI (Jan 29, 2026) — comment on lines +373 to +377:

Task { ... } inherits the caller’s actor context. If prewarm() is called from the main actor/UI, the model load + tokenization work inside this task can end up running on the main actor and cause UI hitches. Prefer running this as a detached/background task (e.g., Task.detached or explicitly hopping off the main actor) and consider setting an appropriate priority for prewarming work.
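The pattern the comment suggests can be sketched as follows. This is a minimal illustration, not the PR's code: `loadAndPrepare` is a hypothetical stand-in for the model load and tokenization work above.

```swift
import Foundation

// Hypothetical stand-in for the expensive prewarm work (model load, tokenization).
func loadAndPrepare() async -> Bool {
    return true
}

func prewarmOffMainActor() {
    // Task.detached does not inherit the caller's actor context or task-locals,
    // so a main-actor caller won't run this work on the main actor.
    // A .utility priority keeps prewarming from competing with user-initiated work.
    Task.detached(priority: .utility) {
        _ = await loadAndPrepare()
    }
}
```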
            var chat: [MLXLMCommon.Chat.Message] = []

            // Add system instructions if present
            if let instructions, !instructions.isEmpty {
                chat.append(.init(role: .system, content: instructions))
            }

            // Add prompt prefix or minimal user message
            let promptText = promptPrefix?.description ?? "."
            chat.append(.init(role: .user, content: promptText))
Owner — comment on lines +385 to +387:

Unless "." has special significance in MLX, this makes me think that promptPrefix should be non-optional (and maybe non-empty?).

What do you think?
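The Owner's suggestion would amount to a signature along these lines. This is a hypothetical sketch of the proposed direction, not the PR's actual API:

```swift
// Sketch: require a prefix so callers decide what to prime with,
// and the "." fallback disappears from the body.
public func prewarm(
    for session: LanguageModelSession,
    promptPrefix: Prompt
) {
    // ... same body as above, but using promptPrefix.description directly,
    // with no `?? "."` fallback needed ...
}
```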


            // Convert tools to MLX format
            let toolSpecs: [ToolSpec]? =
                tools.isEmpty
                ? nil
                : tools.map { convertToolToMLXSpec($0) }

            let userInput = MLXLMCommon.UserInput(
                chat: chat,
                processing: .init(resize: .init(width: 512, height: 512)),
Owner — comment on the `processing:` argument:

This seems like the kind of thing that we'd want to parameterize in the method, rather than hard-code.

                tools: toolSpecs
            )

            // Prepare input - triggers tokenization and processor initialization
            _ = try await context.processor.prepare(input: userInput)
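One way to address the parameterization comment above is to accept the processing options as an argument with the current values as a default. The parameter name and the `Processing` type name are assumptions inferred from the `.init(resize:)` call in the PR, not confirmed API:

```swift
// Sketch: let callers pass image-processing options instead of hard-coding 512x512.
public func prewarm(
    for session: LanguageModelSession,
    promptPrefix: Prompt?,
    processing: MLXLMCommon.UserInput.Processing? = nil  // type name assumed
) {
    // ...
    let userInput = MLXLMCommon.UserInput(
        chat: chat,
        // Fall back to the previous hard-coded values when the caller passes nil.
        processing: processing ?? .init(resize: .init(width: 512, height: 512)),
        tools: toolSpecs
    )
    // ...
}
```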
Copilot AI (Jan 29, 2026) — comment on lines +375 to +402:

The Task body can throw (loadContext / prepare), but the error is never handled or surfaced. Since prewarm is fire-and-forget, it should catch and intentionally ignore (or otherwise report) failures so prewarm errors don’t get silently lost in a failed task.

Suggested change (wrap the task body in do/catch; the body is otherwise unchanged):

            do {
                let context = try await loadContext(modelId: modelId, hub: hub, directory: directory)

                // Build chat history similar to respond() to prime the cache effectively
                var chat: [MLXLMCommon.Chat.Message] = []

                // Add system instructions if present
                if let instructions, !instructions.isEmpty {
                    chat.append(.init(role: .system, content: instructions))
                }

                // Add prompt prefix or minimal user message
                let promptText = promptPrefix?.description ?? "."
                chat.append(.init(role: .user, content: promptText))

                // Convert tools to MLX format
                let toolSpecs: [ToolSpec]? =
                    tools.isEmpty
                    ? nil
                    : tools.map { convertToolToMLXSpec($0) }

                let userInput = MLXLMCommon.UserInput(
                    chat: chat,
                    processing: .init(resize: .init(width: 512, height: 512)),
                    tools: toolSpecs
                )

                // Prepare input - triggers tokenization and processor initialization
                _ = try await context.processor.prepare(input: userInput)
            } catch {
                // Intentionally ignore prewarm failures (model will be loaded on demand)
                // You may replace this with a more sophisticated logging mechanism if desired.
                print("MLXLanguageModel prewarm failed for modelId \(modelId): \(error)")
            }
        }
    }
}

// MARK: - Options Mapping