diff --git a/deployment/on-device/leap-sdk-changelog.mdx b/deployment/on-device/leap-sdk-changelog.mdx
index cf3c82a4..0c1a4d79 100644
--- a/deployment/on-device/leap-sdk-changelog.mdx
+++ b/deployment/on-device/leap-sdk-changelog.mdx
@@ -3,7 +3,7 @@ title: "Changelog"
 description: "Release notes for the LEAP SDK, including the 0.9.x → 0.10.x Kotlin Multiplatform transition."
 ---
 
-Latest release: **v0.10.6** ([GitHub](https://github.com/Liquid4All/leap-sdk/releases/tag/v0.10.6)).
+Latest release: **v0.10.7** ([GitHub](https://github.com/Liquid4All/leap-sdk/releases/tag/v0.10.7)).
 
 This page covers user-visible changes in the LEAP SDK across releases. For per-build commit detail, see the release notes on [`Liquid4All/leap-sdk`](https://github.com/Liquid4All/leap-sdk/releases).
 
@@ -37,7 +37,7 @@ v0.10.0 raises the minimum iOS deployment target from 15.0 to **17.0** and macOS
 - **SPM URL change.** Point your Swift Package Manager dependency at `https://github.com/Liquid4All/leap-sdk.git` (not the deprecated `leap-ios` repo).
 - **CocoaPods removed.** The SDK ships exclusively through SPM in v0.10.0 onward.
 - **Toolchain bump.** Xcode 16 and Swift 6.0 are required.
-- **`ModelDownloader` → `LeapModelDownloader`.** The downloader class was renamed; update call sites accordingly. See [Model Loading](/deployment/on-device/sdk/model-loading) for the 0.10.x constructor signature.
+- **Swift downloader name.** In current 0.10.x, Swift code instantiates `ModelDownloader` from the `LeapModelDownloader` SPM product. Android code still uses the Kotlin class `ai.liquid.leap.downloader.LeapModelDownloader`. See [Model Loading](/deployment/on-device/sdk/model-loading) for the constructor signatures.
 
 ## Major additions since 0.9.x
 
@@ -147,6 +147,34 @@ val runner = downloader.loadModel(
 
 ## Per-release notes
 
+### v0.10.7 — 2026-05-18
+
+KMP target completion for `leap-openai-client` plus a repo-wide bytecode-hardening pass. iOS / macOS Swift surface is unchanged from v0.10.6 — this is a Kotlin/JVM ergonomics release for non-Apple consumers.
+
+**New targets on `leap-openai-client`** ([PR #256](https://github.com/Liquid4All/leap-android-sdk/pull/256)):
+
+- **`jvm`** (Ktor CIO engine) — Maven Central now publishes `ai.liquid.leap:leap-openai-client-jvm:0.10.7`. Pure-JVM desktop / server apps can route OpenAI-compatible chat completions without dragging in Android or KMP targets. (The 0.10.0 — 0.10.6 SPM cascade only shipped Android + Apple + Linux/MinGW K/N + wasmJs metadata; the JVM slice was absent.)
+- **`wasmJs`** (Ktor Js engine) — browser-side chat-completions client matching what `leap-sdk` already targets.
+
+The Apple slice (`LeapOpenAIClient.xcframework`) ships unchanged: same SSE-stream surface, same `OpenAiClientConfig`, same OpenRouter extra-headers support. SKIE is still not applied to this module in v0.10.7, so the Kotlin/Native exports remain the same as v0.10.6: `Flow<ChatCompletionEvent>` is not bridged to Swift `AsyncSequence`, and `onEnum(of:)` is not generated for `ChatCompletionEvent`. **The next release will enable SKIE on `leap-openai-client`**, bringing `for try await` over the stream, exhaustive `onEnum(of:)` switching, and SKIE-bundled Swift convenience inits. See the [OpenAI client page](/deployment/on-device/sdk/openai-client) for the current pinning guidance.
+
+**Bytecode hardening:**
+
+- The `leap-sdk-jvm`, `leap-openai-client-jvm`, `leap-ui-jvm`, and `leap-ui-android` artifacts had been silently shipping Java 17 / Java 21 bytecode against the project's stated JVM-target-11 stance. All ten published JVM / Android slices now consistently emit class-file major version `0x37` (Java 11). Consumers running on JDK 11 — particularly long-running services and JDK-11-pinned Android Gradle builds — are no longer at risk of `UnsupportedClassVersionError`.
+
+**Internal: KMP build centralization** (no consumer-visible API change):
+
+- Root-level `subprojects { tasks.withType<KotlinCompile>().configureEach { compilerOptions { jvmTarget.set(JvmTarget.JVM_11) } } }` replaces 17 per-site `JVM_11` pins.
+- Karma + headless Chrome runner for `wasmJs` targets centralized into the same `subprojects {}` block — replaces 3 per-site copies. Future modules pick up both patterns automatically.
+
+**Test coverage:**
+
+- `OpenAiClientTest`'s seven SSE-stream + auth-header + error-event + malformed-chunk cases were promoted from `androidHostTest` to `commonTest`. They now also run on `jvmTest`, `macosArm64Test`, `iosSimulatorArm64Test`, `linuxX64Test`, `mingwX64Test`, and `wasmJsTest`.
+
+**iOS surface (unchanged from v0.10.6):**
+
+The four XCFrameworks (`LeapSDK`, `LeapModelDownloader`, `LeapOpenAIClient`, `LeapUi`) ship the same Swift APIs as v0.10.6. The v0.10.6 ObjC class rename to `ModelDownloader`, the dual-import guard, the dynamic `LeapModelDownloader` framework, and the `LeapDownloaderConfig()` parameterless init all remain in place.
+
 ### v0.10.6 — 2026-05-12
 
 iOS `ModelDownloader` (the Swift class formerly known as `LeapModelDownloader` — see the rename note below) reaches parity with the cross-platform `LeapDownloader`. Callers no longer need to pair the two classes to download and load a model on Apple platforms — every entry point routes file transfer through `URLSession` and then hands off to the loader.
diff --git a/deployment/on-device/sdk/advanced-features.mdx b/deployment/on-device/sdk/advanced-features.mdx
index 74ce2176..88baa9de 100644
--- a/deployment/on-device/sdk/advanced-features.mdx
+++ b/deployment/on-device/sdk/advanced-features.mdx
@@ -11,25 +11,25 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    `GenerationOptions` is a Kotlin `data class` bridged into Swift. Kotlin parameter defaults don't survive the ObjC bridge, so the canonical Swift idiom is the parameterless init plus chained `.with(...)` builders:
+
     ```swift
-    public struct GenerationOptions {
+    public class GenerationOptions {
       public var temperature: Float?
       public var topP: Float?
       public var minP: Float?
       public var repetitionPenalty: Float?
+      public var topK: Int32?
+      public var rngSeed: Int64?
       public var jsonSchemaConstraint: String?
-      public var functionCallParser: LeapFunctionCallParserProtocol?
-
-      public init(
-        temperature: Float? = nil,
-        topP: Float? = nil,
-        minP: Float? = nil,
-        repetitionPenalty: Float? = nil,
-        jsonSchemaConstraint: String? = nil,
-        functionCallParser: LeapFunctionCallParserProtocol? = LFMFunctionCallParser()
-      )
-
-      public mutating func setResponseFormat<T: GeneratableType>(type: T.Type) throws
+      public var functionCallParser: LeapFunctionCallParser?
+      public var injectSchemaIntoPrompt: Bool        // default true
+      public var maxTokens: Int32?
+      public var inlineThinkingTags: Bool            // default false
+      public var enableThinking: Bool                // default false
+      public var extras: String?
+
+      public convenience init()
     }
     ```
 
@@ -40,8 +40,10 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
         .with(temperature: 0.3)
         .with(minP: 0.15)
         .with(repetitionPenalty: 1.05)
-        .with(jsonSchema: schemaString)
+        .with(jsonSchema: CityFact.jsonSchema())   // or any other schema string
     ```
+
+    For the legacy compat-class path (`Leap.load(...)` flows), `GenerationOptionsCompat` additionally exposes `setResponseFormat(jsonSchema: String)`.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
@@ -50,10 +52,17 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
         var topP: Float? = null,
         var minP: Float? = null,
         var repetitionPenalty: Float? = null,
+        var topK: Int? = null,
+        var rngSeed: Long? = null,
         var jsonSchemaConstraint: String? = null,
         var functionCallParser: LeapFunctionCallParser? = LFMFunctionCallParser(),
+        var injectSchemaIntoPrompt: Boolean = true,
+        var maxTokens: Int? = null,
+        var inlineThinkingTags: Boolean = false,
+        var enableThinking: Boolean = false,
+        var extras: String? = null,
     ) {
-        fun setResponseFormatType(kClass: KClass<*>)
+        inline fun <reified T : Any> setResponseFormatType()
 
         companion object {
             fun build(buildAction: GenerationOptions.() -> Unit): GenerationOptions
@@ -63,9 +72,14 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
   </Tab>
 </Tabs>
 
-- **Sampling fields** — use the model card's recommended values; arbitrary defaults from generic tutorials usually underperform.
-- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the `setResponseFormat(type:)` / `setResponseFormatType(...)` helpers instead of writing the schema by hand.
+- **Sampling fields** — `temperature`, `topP`, `minP`, `topK`, and `repetitionPenalty`. Use the model bundle's recommended values; arbitrary defaults from generic tutorials usually underperform.
+- **`rngSeed`** — deterministic sampling seed for tests and reproducible runs.
+- **`maxTokens`** — maximum completion tokens to generate. Prompt tokens do not count toward this cap.
+- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the higher-level helpers — Swift `options.with(jsonSchema: T.jsonSchema())` / Kotlin `setResponseFormatType<T>()` — instead of writing the schema by hand.
+- **`injectSchemaIntoPrompt`** — when `true` (default), the schema is also appended to the system message for semantic guidance. Set `false` to use only the structural constraint.
 - **`functionCallParser`** — `LFMFunctionCallParser` (default), `HermesFunctionCallParser()`, or `null`/`nil` to disable parsing and surface raw tool-call text in `Chunk`s.
+- **`enableThinking` / `inlineThinkingTags`** — reasoning-mode controls for models that emit `<think>` content.
+- **`extras`** — backend-specific JSON payload.
 
 ## Constrained generation utilities
 
@@ -73,11 +87,13 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
   <Tab title="Swift (iOS / macOS)">
     ```swift
     // Compile-time schema synthesis lives in the @Generatable macro.
-    // For ad-hoc inspection:
-    let schemaString = try JSONSchemaGenerator.getJSONSchema(for: CityFact.self)
+    // For ad-hoc inspection (ships in the LeapSDKMacros SPM product):
+    import LeapSDKMacros
+
+    let schemaString = JSONSchemaGenerator.getJSONSchema(for: CityFact.self)
     ```
 
-    `JSONSchemaGenerator.getJSONSchema(for:)` returns the same JSON Schema string the macro emits at compile time. Useful when embedding the schema in the prompt itself, or when you want to debug the schema the model is being constrained against.
+    `JSONSchemaGenerator.getJSONSchema(for:)` is non-throwing — it forwards to the `jsonSchema()` method that the `@Generatable` macro adds to the type, so the schema is produced at compile time. Useful when embedding the schema in the prompt itself, or when you want to debug the schema the model is being constrained against.
 
     See [Constrained Generation](./constrained-generation) for the full `@Generatable` / `@Guide` macro reference.
   </Tab>
@@ -89,11 +105,14 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
 
     object JSONSchemaGenerator {
       @Throws(LeapGeneratableSchematizationException::class)
-      fun <T : Any> getJSONSchema(klass: KClass<T>, indentSpaces: Int? = null): String
+      fun <T : Any> getJSONSchema(serializer: KSerializer<T>, indentSpaces: Int? = null): String
+
+      @Throws(LeapGeneratableSchematizationException::class)
+      inline fun <reified T : Any> getJSONSchema(indentSpaces: Int? = null): String
     }
     ```
 
-    - `klass` — must be a data class annotated with `@Generatable`.
+    - `serializer` — the `KSerializer<T>` for a data class annotated with `@Generatable` and `@Serializable`. The reified-`T` overload calls `serializer<T>()` for you.
     - `indentSpaces` — non-null formats the output with the given indent (pretty-print).
 
     Throws `LeapGeneratableSchematizationException` if the class can't be translated.
@@ -101,16 +120,18 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani
     ### `GeneratableFactory`
 
     ```kotlin
+    import kotlinx.serialization.json.JsonObject
+
     object GeneratableFactory {
       @Throws(LeapGeneratableDeserializationException::class)
-      fun <T : Any> createFromJSONObject(jsonObject: JSONObject, klass: KClass<T>): T
+      fun <T : Any> createFromJsonObject(jsonObject: JsonObject, serializer: KSerializer<T>): T
 
       @Throws(LeapGeneratableDeserializationException::class)
-      inline fun <reified T : Any> createFromJSONObject(jsonObject: JSONObject): T
+      inline fun <reified T : Any> createFromJsonObject(jsonObject: JsonObject): T
     }
     ```
 
-    The reified-`T` overload is a convenience when the target type can be inferred from context.
+    Note the camelCase `Json` in the method name and the `kotlinx.serialization.json.JsonObject` argument (not `org.json.JSONObject`). The reified-`T` overload is a convenience when the target type can be inferred from context.
 
     ### Annotations
 
@@ -179,16 +200,19 @@ The full surface is documented in [Function Calling](./function-calling); the ty
         val optional: Boolean = false,
     )
 
-    sealed class LeapFunctionParameterType(description: String? = null) {
-      val description: String? = description
-
-      class String(val enumValues: List<kotlin.String>? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description)
-      class Number(val enumValues: List<kotlin.Number>? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description)
-      class Integer(val enumValues: List<Int>? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description)
-      class Boolean(description: kotlin.String? = null) : LeapFunctionParameterType(description)
-      class Null : LeapFunctionParameterType()
-      class Array(val itemType: LeapFunctionParameterType, description: kotlin.String? = null) : LeapFunctionParameterType(description)
-      class Object(
+    sealed class LeapFunctionParameterType(typeDescription: kotlin.String? = null) {
+      var description: kotlin.String? = typeDescription
+        private set
+
+      // Nested class names carry a `Leap` prefix so they don't shadow `kotlin.String`,
+      // `kotlin.Number`, etc. at use sites.
+      class LeapStr(val enumValues: List<kotlin.String>? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description)
+      class LeapNum(val enumValues: List<kotlin.Number>? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description)
+      class LeapInt(val enumValues: List<Int>? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description)
+      class LeapBool(description: kotlin.String? = null) : LeapFunctionParameterType(description)
+      class LeapNull : LeapFunctionParameterType()
+      class LeapArr(val itemType: LeapFunctionParameterType, description: kotlin.String? = null) : LeapFunctionParameterType(description)
+      class LeapObj(
         val properties: Map<kotlin.String, LeapFunctionParameterType>,
         val required: List<kotlin.String> = listOf(),
         description: kotlin.String? = null,
@@ -210,29 +234,25 @@ Two parser implementations ship with the SDK on every platform:
 - **`LFMFunctionCallParser`** — default. Handles Liquid Foundation Model (LFM2) Pythonic-style control tokens (`<|tool_call_start|>` / `<|tool_call_end|>`).
 - **`HermesFunctionCallParser`** — Qwen3 and other models using the [Hermes function-calling format](https://github.com/NousResearch/Hermes-Function-Calling).
 
-Implement `LeapFunctionCallParserProtocol` (Swift) / `LeapFunctionCallParser` (Kotlin) to add support for a new format.
+Subclass `LeapFunctionCallParser` (Kotlin `abstract class`, bridged to Swift as a class with the same name) to add support for a new format.
 
-## Backend-specific extras
+## Prompt token budgeting
 
-Some runtime utilities are exposed on the concrete `LiquidInferenceEngineRunner` rather than the cross-platform `ModelRunner` protocol/interface. The most common is **prompt token budgeting** — useful when you need to estimate context usage before sending a long request.
+`getPromptTokensSize(messages:addBosToken:)` is declared directly on `ModelRunner`, so no cast is required. It is useful when you need to estimate context usage before sending a long request.
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    if let engine = runner as? LiquidInferenceEngineRunner {
-        let count = engine.getPromptTokensSize(messages: history, addBosToken: true)
-        print("Prompt would consume \(count) tokens")
-    }
+    let count = try await runner.getPromptTokensSize(messages: history, addBosToken: true)
+    print("Prompt would consume \(count) tokens")
     ```
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
-    (runner as? LiquidInferenceEngineRunner)?.let { engine ->
-        val count = engine.getPromptTokensSize(messages = history, addBosToken = true)
-        println("Prompt would consume $count tokens")
-    }
+    val count = runner.getPromptTokensSize(messages = history, addBosToken = true)
+    println("Prompt would consume $count tokens")
     ```
+
+    `getPromptTokensSize` is `suspend` — call it from a coroutine.
   </Tab>
 </Tabs>
-
-These methods are backend-specific and may be elevated to the `ModelRunner` interface in a future release — defensively check the cast.
diff --git a/deployment/on-device/sdk/ai-agent-usage-guide.mdx b/deployment/on-device/sdk/ai-agent-usage-guide.mdx
index de399d02..bd117c2b 100644
--- a/deployment/on-device/sdk/ai-agent-usage-guide.mdx
+++ b/deployment/on-device/sdk/ai-agent-usage-guide.mdx
@@ -49,6 +49,9 @@ Every agent has the same shape: send a `ChatMessage`, iterate the response strea
                 Task { await dispatch(call) }
             }
         case .audioSample(let audio):
+            // `audio.samples` is a `KotlinFloatArray`. If your renderer expects Swift-native
+            // sample data, convert it with `LeapSDK.ArrayConversionsKt.floatArrayToNSData(array:)`
+            // (see the demo in leap-ui-demo/shared/).
             audioPlayer.enqueue(audio.samples, sampleRate: Int(audio.sampleRate))
         case .complete(let completion):
             currentText = ""
@@ -88,7 +91,7 @@ The defining feature of an agent: the model emits `FunctionCalls`, you execute t
     ```swift
     func agentLoop(initialQuestion: String) async throws {
         var workingConv = conversation!
-        var pending = ChatMessage(role: .user, content: [.text(initialQuestion)])
+        var pending = ChatMessage(role: .user, textContent: initialQuestion)
 
         while true {
             var toolCalls: [LeapFunctionCall] = []
@@ -107,14 +110,16 @@ The defining feature of an agent: the model emits `FunctionCalls`, you execute t
 
             if toolCalls.isEmpty { break }   // Agent is done
 
-            // Execute tools, append results, loop
-            let toolMessages = await toolCalls.asyncMap { call in
+            // Execute tools sequentially, append results, loop.
+            // (Swift's stdlib has no `asyncMap`; a plain `for` loop that awaits each call does the job.)
+            var toolMessages: [ChatMessage] = []
+            for call in toolCalls {
                 let result = await runtimeDispatch(call)
-                return ChatMessage(role: .tool, content: [.text(result)])
+                toolMessages.append(ChatMessage(role: .tool, textContent: result))
             }
             let updatedHistory = workingConv.history + toolMessages
             workingConv = workingConv.modelRunner.createConversationFromHistory(history: updatedHistory)
-            pending = ChatMessage(role: .user, content: [.text("")])  // empty turn — let the model continue
+            pending = ChatMessage(role: .user, textContent: "")  // empty turn — let the model continue
         }
     }
     ```
@@ -148,7 +153,7 @@ The defining feature of an agent: the model emits `FunctionCalls`, you execute t
                 )
             }
             val updatedHistory = workingConv.history + toolMessages
-            workingConv = modelRunner.createConversationFromHistory(updatedHistory)
+            workingConv = workingConv.modelRunner.createConversationFromHistory(updatedHistory)
             pending = ChatMessage(role = ChatMessage.Role.USER, content = listOf(ChatMessageContent.Text("")))
         }
     }
@@ -161,7 +166,7 @@ Define `runtimeDispatch(_:)` as your tool-call → result router: validate argum
 ## Multimodal inputs
 
 <Info>
-**Multimodality is model-specific.** Most multimodal models ship as text + one other modality (vision OR audio), not both. Send `.image(...)` parts only to a vision-capable model and `.audio(...)` parts only to an audio-capable model. Verify on the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up the input.
+**Multimodality is model-specific.** Most multimodal models ship as text + one other modality (vision OR audio), not both. Send image parts (Swift `ChatMessageContent.fromJPEGData(_:)` / Kotlin `ImageUtils.fromBitmap(...)`) only to a vision-capable model, and audio parts (Swift `ChatMessageContent.fromWAVData(_:)` / Kotlin `ChatMessageContent.Audio(...)`) only to an audio-capable model. Verify on the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up the input.
 </Info>
 
 <Tabs>
@@ -170,20 +175,26 @@ Define `runtimeDispatch(_:)` as your tool-call → result router: validate argum
     // Vision-capable model
     let imageMessage = ChatMessage(
       role: .user,
-      content: [.text("Describe what you see."), .image(jpegData)]
+      content: [.text("Describe what you see."), ChatMessageContent.fromJPEGData(jpegData)],
+      reasoningContent: nil,
+      functionCalls: nil
     )
 
     // Audio-capable model — WAV blob
     let audioMessage = ChatMessage(
       role: .user,
-      content: [.text("Transcribe."), .audio(wavData)]
+      content: [.text("Transcribe."), ChatMessageContent.fromWAVData(wavData)],
+      reasoningContent: nil,
+      functionCalls: nil
     )
 
     // Audio-capable model — raw float32 PCM samples (no WAV re-encode)
     let pcmMessage = ChatMessage(
       role: .user,
       content: [.text("How's my pronunciation?"),
-                ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)]
+                ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)],
+      reasoningContent: nil,
+      functionCalls: nil
     )
     ```
   </Tab>
@@ -272,10 +283,14 @@ A `ChatViewModel` that loads the model, registers a tool, drives generation, and
             generationTask = Task { [weak self] in
                 defer { Task { @MainActor in self?.isGenerating = false } }
                 do {
-                    let userMessage = ChatMessage(role: .user, content: [.text(text)])
+                    let userMessage = ChatMessage(role: .user, textContent: text)
+                    let options = GenerationOptions()
+                        .with(temperature: 0.3)
+                        .with(minP: 0.15)
+                        .with(repetitionPenalty: 1.05)
                     for try await response in conversation.generateResponse(
                         message: userMessage,
-                        generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
+                        generationOptions: options
                     ) {
                         await MainActor.run { self?.handle(response) }
                     }
@@ -314,11 +329,11 @@ A `ChatViewModel` that loads the model, registers a tool, drives generation, and
     import androidx.lifecycle.viewModelScope
     import ai.liquid.leap.Conversation
     import ai.liquid.leap.GenerationOptions
-    import ai.liquid.leap.MessageResponse
+    import ai.liquid.leap.message.MessageResponse
     import ai.liquid.leap.ModelRunner
     import ai.liquid.leap.message.ChatMessage
     import ai.liquid.leap.message.ChatMessageContent
-    import ai.liquid.leap.model_downloader.LeapModelDownloader
+    import ai.liquid.leap.downloader.LeapModelDownloader
     import kotlinx.coroutines.*
     import kotlinx.coroutines.flow.*
 
@@ -430,14 +445,14 @@ A `ChatViewModel` that loads the model, registers a tool, drives generation, and
     - **Min SDK 31** (Android 12).
     - Use a real device for testing — the emulator may crash loading model bundles.
     - `LeapModelDownloader` (the Android one) requires `POST_NOTIFICATIONS` at runtime on Android 13+ and a few manifest entries — see [Quick Start → Install the SDK](./quick-start#2-install-the-sdk).
-    - Background downloads use WorkManager + a foreground service; the SDK ships notification configuration via `LeapModelDownloaderNotificationConfig`.
+    - Background prefetch uses `requestDownloadModel(...)`, which enqueues the WorkManager downloader and runs it as a foreground worker while files transfer. The SDK ships notification configuration via `LeapModelDownloaderNotificationConfig`.
     - For most cases, hold the runner in a `ViewModel` with `viewModelScope`. Unload via `runBlocking(Dispatchers.IO) { runner.unload() }` in `onCleared()`.
   </Tab>
   <Tab title="JVM / Linux native / Windows native">
     - JVM: JDK 11+. No `Context` parameter, no foreground service, no notifications — `LeapDownloader` is a simple async fetcher with a configurable `saveDir`.
     - Linux native runtime: glibc **2.34+** (Ubuntu 22.04, Debian 12, RHEL 9 or newer). Older hosts fail at process start.
     - Windows native: Windows 10+. DLLs co-locate next to the `.exe` (Windows' standard search order finds them).
-    - **Pin to 0.10.5+** for Kotlin/Native — earlier 0.10.x releases have unresolved cinterop / linker issues that prevent producing a working executable. See [Desktop & Native Platforms](./desktop-platforms).
+    - **Pin to 0.10.7 or newer** for Kotlin/Native — earlier 0.10.x releases (0.10.0, 0.10.1) have unresolved cinterop / linker issues that prevent producing a working executable; the fixes shipped in the 0.10.4.x point releases (SPM) and v0.10.6 / v0.10.7 (Android-SDK repo). See [Desktop & Native Platforms](./desktop-platforms).
   </Tab>
 </Tabs>
 
diff --git a/deployment/on-device/sdk/cloud-ai-comparison.mdx b/deployment/on-device/sdk/cloud-ai-comparison.mdx
index a1298632..06be32e4 100644
--- a/deployment/on-device/sdk/cloud-ai-comparison.mdx
+++ b/deployment/on-device/sdk/cloud-ai-comparison.mdx
@@ -237,10 +237,10 @@ Both LEAP and the OpenAI Python streaming client run inside an async context. Th
 
 | Concept | OpenAI | LEAP |
 |---|---|---|
-| Role-tagged messages | `{"role": "user", "content": "..."}` | `ChatMessage(role: .user, content: [.text("...")])` |
-| Streaming responses | `stream=True` iterator | `AsyncThrowingStream` (Swift) / `Flow` (Kotlin) |
-| Function calling | Tool definitions + `tool_calls` field | `registerFunction(LeapFunction)` + `MessageResponse.functionCalls` |
-| Structured output | `response_format = json_schema` | `GenerationOptions.setResponseFormat(type:)` |
+| Role-tagged messages | `{"role": "user", "content": "..."}` | `ChatMessage(role: .user, textContent: "...")` |
+| Streaming responses | `stream=True` iterator | `SkieSwiftFlow<MessageResponse>` (Swift, iterable with `for try await`) / `Flow<MessageResponse>` (Kotlin) |
+| Function calling | Tool definitions + `tool_calls` field | `registerFunction(LeapFunction)` + `MessageResponse.FunctionCalls` |
+| Structured output | `response_format = json_schema` | Swift `options.with(jsonSchema: T.jsonSchema())` / Kotlin `setResponseFormatType<T>()` |
 | Token usage stats | `usage` object on completion | `Complete.stats` (`promptTokens`, `completionTokens`, `tokenPerSecond`) |
 
 ## What's different
diff --git a/deployment/on-device/sdk/constrained-generation.mdx b/deployment/on-device/sdk/constrained-generation.mdx
index a92bf2b7..11573bf9 100644
--- a/deployment/on-device/sdk/constrained-generation.mdx
+++ b/deployment/on-device/sdk/constrained-generation.mdx
@@ -3,7 +3,7 @@ title: "Constrained Generation"
 description: "Generate structured JSON output with compile-time validation — same approach on every platform."
 ---
 
-Constrained generation forces the model to emit JSON matching a schema. Use the language's native facility — Swift macros (`@Generatable` / `@Guide`) or Kotlin annotations (`@Generatable` / `@Guide`) — to define the structure, then set it on `GenerationOptions`. The schema is computed at compile time (Swift) or via reflection at load time (Kotlin), and the model's output decodes directly into your type.
+Constrained generation forces the model to emit JSON matching a schema. Use the language's native facility — Swift macros (`@Generatable` / `@Guide`) or Kotlin annotations (`@Generatable` / `@Guide`) — to define the structure, then set it on `GenerationOptions`. The schema is computed at compile time (Swift) or built from the `kotlinx.serialization` descriptor at runtime (Kotlin), and the model's output decodes directly into your type.
 
 ## Define the structured type
 
@@ -13,6 +13,7 @@ Constrained generation forces the model to emit JSON matching a schema. Use the
 
     ```swift
     import LeapModelDownloader
+    import LeapSDKMacros
 
     @Generatable("A joke with metadata")
     struct Joke: Codable {
@@ -35,16 +36,26 @@ Constrained generation forces the model to emit JSON matching a schema. Use the
     </Info>
   </Tab>
   <Tab title="Kotlin (all platforms)">
-    `@Generatable` and `@Guide` are runtime annotations on Kotlin `data class` declarations. All properties must be declared in the primary constructor.
+    `@Generatable` and `@Guide` are `@SerialInfo` annotations applied to `@Serializable` Kotlin `data class` declarations. All properties must be declared in the primary constructor, and the class itself must carry `@kotlinx.serialization.Serializable` so the schema generator can read its descriptor.
 
     ```kotlin
     package ai.liquid.leap.structuredoutput
 
+    @kotlinx.serialization.SerialInfo
+    @Target(AnnotationTarget.CLASS)
+    @Retention(AnnotationRetention.RUNTIME)
     annotation class Generatable(val description: String)
+
+    @kotlinx.serialization.SerialInfo
+    @Target(AnnotationTarget.PROPERTY)
+    @Retention(AnnotationRetention.RUNTIME)
     annotation class Guide(val description: String)
     ```
 
     ```kotlin
+    import kotlinx.serialization.Serializable
+
+    @Serializable
     @Generatable(description = "Facts about a city")
     data class CityFact(
         @Guide(description = "Name of the city")
@@ -60,6 +71,8 @@ Constrained generation forces the model to emit JSON matching a schema. Use the
         val placeOfInterests: List<String>,
     )
     ```
+
+    The `@Serializable` annotation is required — `JSONSchemaGenerator.getJSONSchema<T>()` resolves the type via `kotlinx.serialization`'s `serializer<T>()` and throws `LeapGeneratableSchematizationException("Type must be @Serializable to generate JSON Schema")` if the type isn't serializable.
   </Tab>
 </Tabs>
 
@@ -68,10 +81,18 @@ Constrained generation forces the model to emit JSON matching a schema. Use the
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    var options = GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
-    try options.setResponseFormat(type: Joke.self)
+    let options = GenerationOptions()
+        .with(temperature: 0.3)
+        .with(minP: 0.15)
+        .with(repetitionPenalty: 1.05)
+        .with(jsonSchema: Joke.jsonSchema())
 
-    let message = ChatMessage(role: .user, content: [.text("Tell me a programming joke")])
+    let message = ChatMessage(
+        role: .user,
+        content: [.text("Tell me a programming joke")],
+        reasoningContent: nil,
+        functionCalls: nil
+    )
 
     for try await response in conversation.generateResponse(
         message: message,
@@ -91,8 +112,11 @@ Constrained generation forces the model to emit JSON matching a schema. Use the
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
+    import kotlinx.serialization.json.Json
+    import kotlinx.serialization.json.JsonObject
+
     val options = GenerationOptions.build {
-        setResponseFormatType(CityFact::class)
+        setResponseFormatType<CityFact>()
         temperature = 0.3f
         minP = 0.15f
         repetitionPenalty = 1.05f
@@ -102,14 +126,15 @@ Constrained generation forces the model to emit JSON matching a schema. Use the
         .onEach { response ->
             if (response is MessageResponse.Complete) {
                 val jsonContent = (response.fullMessage.content.first() as ChatMessageContent.Text).text
-                val cityFact: CityFact = GeneratableFactory.createFromJSONObject(JSONObject(jsonContent))
+                val kxObj = Json.parseToJsonElement(jsonContent) as JsonObject
+                val cityFact = GeneratableFactory.createFromJsonObject<CityFact>(kxObj)
                 println(cityFact)
             }
         }
         .collect()
     ```
 
-    If the model's JSON doesn't deserialize cleanly into the data class, `GeneratableFactory.createFromJSONObject` throws `LeapGeneratableDeserializationException`.
+    If the model's JSON doesn't deserialize cleanly into the data class, `GeneratableFactory.createFromJsonObject` throws `LeapGeneratableDeserializationException`.
   </Tab>
 </Tabs>
 
@@ -120,16 +145,20 @@ Some models do better when the JSON Schema is also included in the prompt text.
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    let schemaString = try JSONSchemaGenerator.getJSONSchema(for: Joke.self)
+    let schemaString = JSONSchemaGenerator.getJSONSchema(for: Joke.self)
     let message = ChatMessage(
         role: .user,
-        content: [.text("Tell me a programming joke following this JSON Schema: \(schemaString)")]
+        content: [.text("Tell me a programming joke following this JSON Schema: \(schemaString)")],
+        reasoningContent: nil,
+        functionCalls: nil
     )
     ```
+
+    `JSONSchemaGenerator.getJSONSchema(for:)` ships in the `LeapSDKMacros` product. It is non-throwing, so no `try` is needed; it forwards to the macro-synthesized `Joke.jsonSchema()`.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
-    val jsonSchema = JSONSchemaGenerator.getJSONSchema(CityFact::class)
+    val jsonSchema = JSONSchemaGenerator.getJSONSchema<CityFact>()
     conversation.generateResponse(
         "Show the city facts about Tokyo following this JSON Schema: $jsonSchema",
         options
@@ -194,6 +223,9 @@ Composition types are supported as long as the leaf types are supported.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
+    import kotlinx.serialization.Serializable
+
+    @Serializable
     @Generatable("A recipe with ingredients and instructions")
     data class Recipe(
         @Guide("Name of the dish")
@@ -215,6 +247,7 @@ Composition types are supported as long as the leaf types are supported.
         val nutrition: NutritionInfo? = null,
     )
 
+    @Serializable
     @Generatable("Nutritional information for a recipe")
     data class NutritionInfo(
         @Guide("Calories per serving")
@@ -247,7 +280,7 @@ Smaller, single-responsibility types produce better output than sprawling struct
 
 ### Lower temperature for structured output
 
-Temperature `0.3–0.5` typically improves adherence to the schema. The default `0.7` is biased toward conversational variation that doesn't help when you need parseable JSON.
+Temperature `0.1–0.3` typically improves adherence to the schema — high-temperature sampling adds variation that doesn't help when you need parseable JSON. Use the per-model defaults the LFM model cards recommend (e.g. `0.1` for instruct/VL, `0.3` for LFM2 text) and lower from there if the model strays.
 
 ### Validate the decoded output
 
@@ -269,10 +302,14 @@ Even with constrained generation, you should handle parse failures gracefully. T
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
-    fun <T : Any> parse(jsonText: String, kClass: KClass<T>): T? = try {
-        GeneratableFactory.createFromJSONObject(JSONObject(jsonText)) as T
+    import kotlinx.serialization.json.Json
+    import kotlinx.serialization.json.JsonObject
+
+    inline fun <reified T : Any> parse(jsonText: String): T? = try {
+        val kxObj = Json.parseToJsonElement(jsonText) as JsonObject
+        GeneratableFactory.createFromJsonObject<T>(kxObj)
     } catch (e: LeapGeneratableDeserializationException) {
-        Log.e(TAG, "Failed to decode response as ${kClass.simpleName}", e)
+        Log.e(TAG, "Failed to decode response as ${T::class.simpleName}", e)
         null
     }
     ```
@@ -281,8 +318,8 @@ Even with constrained generation, you should handle parse failures gracefully. T
 
 ## How it works
 
-1. **Compile/load time** — `@Generatable` produces a JSON Schema for your type. (Swift: compile-time macro; Kotlin: reflective build at load time.)
-2. **Configuration** — `GenerationOptions.setResponseFormat(type:)` / `setResponseFormatType(...)` installs the schema as `jsonSchemaConstraint` on the generation options.
+1. **Compile/load time** — `@Generatable` produces a JSON Schema for your type. (Swift: compile-time macro emits `jsonSchema()`; Kotlin: built at runtime from the `kotlinx.serialization` descriptor.)
+2. **Configuration** — Swift `options.with(jsonSchema: T.jsonSchema())` (or `GenerationOptionsCompat.setResponseFormat(jsonSchema:)`) / Kotlin `setResponseFormatType<T>()` installs the schema as `jsonSchemaConstraint` on the generation options.
 3. **Generation** — the SDK constrains decoding so only tokens that produce schema-valid JSON are emitted. The model's output is guaranteed to parse.
 
 ## Error handling
diff --git a/deployment/on-device/sdk/conversation-generation.mdx b/deployment/on-device/sdk/conversation-generation.mdx
index 6925d55a..e9815a26 100644
--- a/deployment/on-device/sdk/conversation-generation.mdx
+++ b/deployment/on-device/sdk/conversation-generation.mdx
@@ -60,30 +60,34 @@ Hold a strong reference for as long as you need to perform generations, then cal
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    `Conversation` is a Kotlin `interface` bridged to Swift as a protocol — the get-only properties surface as `{ get }` in Swift. The generation methods return a SKIE-bridged `SkieSwiftFlow<MessageResponse>` (iterable with `for try await`):
+
     ```swift
-    public class Conversation {
-      public let modelRunner: ModelRunner
-      public private(set) var history: [ChatMessage]
-      public private(set) var functions: [LeapFunction]
-      public private(set) var isGenerating: Bool
-
-      public func registerFunction(_ function: LeapFunction)
-      public func registerFunctions(_ functions: [LeapFunction])
-      public func appendToHistory(_ message: ChatMessage)
-      public func removeLastMessage()
-      public func exportToJSON() throws -> [[String: Any]]
-
-      public func generateResponse(
+    public protocol Conversation {
+      var modelRunner: ModelRunner { get }
+      var history: [ChatMessage] { get }
+      var functions: [LeapFunction] { get }
+      var isGenerating: Bool { get }
+
+      func registerFunction(function: LeapFunction)
+      func registerFunctions(functions: [LeapFunction])
+      func appendToHistory(message: ChatMessage)
+      func removeLastMessage()
+      func exportToJSON() -> String
+
+      func generateResponse(
         userTextMessage: String,
-        generationOptions: GenerationOptions? = nil
-      ) -> AsyncThrowingStream<MessageResponse, Error>
+        generationOptions: GenerationOptions?
+      ) -> SkieSwiftFlow<MessageResponse>
 
-      public func generateResponse(
+      func generateResponse(
         message: ChatMessage,
-        generationOptions: GenerationOptions? = nil
-      ) -> AsyncThrowingStream<MessageResponse, Error>
+        generationOptions: GenerationOptions?
+      ) -> SkieSwiftFlow<MessageResponse>
     }
     ```
+
+    The Swift method labels match the Kotlin parameter names (`function:`, `functions:`, `message:`). Kotlin parameter defaults don't propagate through Kotlin/Native, so `generationOptions` must be passed explicitly; a `ConvenienceExtensions.swift` overlay adds `generateResponse(message:)` without the options argument for the common case.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
@@ -108,8 +112,6 @@ Hold a strong reference for as long as you need to perform generations, then cal
           message: ChatMessage,
           generationOptions: GenerationOptions? = null
       ): Flow<MessageResponse>
-
-      fun exportToJSONArray(): JSONArray
     }
     ```
   </Tab>
@@ -123,7 +125,7 @@ Hold a strong reference for as long as you need to perform generations, then cal
 
 - **`history`** — a snapshot copy of the chat messages. Mutations don't affect generation. Once the stream emits `Complete`, `history` includes the final assistant reply.
 - **`isGenerating`** — `true` while a generation is in flight. Starting a second generation while one is running is blocked.
-- **`functions`** (Swift only field, registered via `registerFunction` on both platforms) — tool definitions the model may invoke.
+- **`functions`** — tool definitions the model may invoke. Registered through `registerFunction` / `registerFunctions` on both platforms (the Swift argument labels are `function:` and `functions:`).
 
 ### Streaming generation
 
@@ -132,13 +134,17 @@ The async stream is the recommended way to drive generation — both platforms e
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    let user = ChatMessage(role: .user, content: [.text("Hello! What can you do?")])
+    let user = ChatMessage(role: .user, textContent: "Hello! What can you do?")
+    let options = GenerationOptions()
+      .with(temperature: 0.3)
+      .with(minP: 0.15)
+      .with(repetitionPenalty: 1.05)
 
     Task {
       do {
         for try await response in conversation.generateResponse(
           message: user,
-          generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
+          generationOptions: options
         ) {
           switch onEnum(of: response) {
           case .chunk(let c):
@@ -148,6 +154,10 @@ The async stream is the recommended way to drive generation — both platforms e
           case .functionCalls(let payload):
             handleFunctionCalls(payload.functionCalls)
           case .audioSample(let audio):
+            // `audio.samples` is a `KotlinFloatArray` from Kotlin/Native — bridge to
+            // `[Float]` via NSData if your renderer expects a Swift array:
+            //   let nsData = LeapSDK.ArrayConversionsKt.floatArrayToNSData(array: audio.samples)
+            //   let floats = nsData.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
             audioRenderer.enqueue(audio.samples, sampleRate: Int(audio.sampleRate))
           case .complete(let completion):
             let text = completion.fullMessage.content.compactMap { part -> String? in
@@ -216,18 +226,23 @@ The async stream is the recommended way to drive generation — both platforms e
 
 ### Export chat history
 
-Both platforms expose a serializer compatible with OpenAI's chat-completions message format. Useful for persistence, analytics, or replaying conversations through a cloud fallback.
+Persisting, replaying, or shipping the conversation to a cloud fallback all boil down to serializing `conversation.history`. Swift exposes `exportToJSON()` (returns a JSON string in OpenAI chat-completions shape); Kotlin uses `kotlinx.serialization` (`ChatMessage` and `ChatMessageContent` are `@Serializable`).
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    let payload: [[String: Any]] = try conversation.exportToJSON()
+    let jsonString: String = conversation.exportToJSON()
     ```
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
-    val payload: JSONArray = conversation.exportToJSONArray()
+    import kotlinx.serialization.json.Json
+    import kotlinx.serialization.encodeToString
+
+    val jsonString = Json.encodeToString(conversation.history)
     ```
+
+    Add `org.jetbrains.kotlinx:kotlinx-serialization-json` to your dependencies — see [Utilities → Serialization](./utilities#serialization) for the round-trip pattern.
   </Tab>
 </Tabs>
 
@@ -252,11 +267,11 @@ A sealed type with one case per kind of incremental output the engine emits.
   <Tab title="Kotlin (all platforms)">
     ```kotlin
     sealed interface MessageResponse {
-      class Chunk(val text: String) : MessageResponse
-      class ReasoningChunk(val reasoning: String) : MessageResponse
-      class FunctionCalls(val functionCalls: List<LeapFunctionCall>) : MessageResponse
-      class AudioSample(val samples: FloatArray, val sampleRate: Int) : MessageResponse
-      class Complete(
+      data class Chunk(val text: String) : MessageResponse
+      data class ReasoningChunk(val reasoning: String) : MessageResponse
+      data class FunctionCalls(val functionCalls: List<LeapFunctionCall>) : MessageResponse
+      data class AudioSample(val samples: FloatArray, val sampleRate: Int) : MessageResponse
+      data class Complete(
         val fullMessage: ChatMessage,
         val finishReason: GenerationFinishReason,
         val stats: GenerationStats?,
@@ -270,7 +285,7 @@ A sealed type with one case per kind of incremental output the engine emits.
 - **`ReasoningChunk`** — thinking-style tokens emitted by reasoning models (wrapped between `<think>` / `</think>` upstream). Only fires when `GenerationOptions.enableThinking = true` *and* the model supports it.
 - **`FunctionCalls`** — one or more tool invocations the model wants you to execute. See [Function Calling](./function-calling).
 - **`AudioSample`** — float32 mono PCM frames from audio-capable checkpoints. The sample rate is constant for a generation; route the frames to a renderer.
-- **`Complete`** — final marker. `fullMessage` is the assembled assistant `ChatMessage` (also present in `conversation.history`). `stats` holds token counts and `tokenPerSecond` (may be `null` on some backends).
+- **`Complete`** — final marker. `fullMessage` is the assembled assistant `ChatMessage` (also present in `conversation.history`). `stats` is nullable (`GenerationStats?`); when present it holds `promptTokens`, `completionTokens`, `totalTokens`, `tokenPerSecond` (non-nullable `Float`), and `cachedPromptTokens`.
 
 ### `GenerationFinishReason`
 
@@ -290,33 +305,53 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    `GenerationOptions` is a Kotlin `data class` bridged into Swift. Kotlin parameter defaults don't survive the ObjC bridge, so the canonical Swift idiom is the parameterless init plus chained `.with(...)` builders from `ConvenienceExtensions.swift`:
+
     ```swift
-    public struct GenerationOptions {
+    public class GenerationOptions {
       public var temperature: Float?
       public var topP: Float?
       public var minP: Float?
-      public var topK: Int32?
       public var repetitionPenalty: Float?
+      public var topK: Int32?
       public var rngSeed: Int64?
-      public var maxTokens: Int32?
       public var jsonSchemaConstraint: String?
+      public var functionCallParser: LeapFunctionCallParser?
       public var injectSchemaIntoPrompt: Bool        // default true
-      public var functionCallParser: LeapFunctionCallParserProtocol?
+      public var maxTokens: Int32?
       public var inlineThinkingTags: Bool            // default false
       public var enableThinking: Bool                // default false
       public var extras: String?
 
-      public init(/* all fields as optional kwargs */)
-      public mutating func setResponseFormat<T: GeneratableType>(type: T.Type) throws
+      public convenience init()                      // builder entry point
+
+      // Builders (chainable):
+      public func with(temperature: Float) -> GenerationOptions
+      public func with(topP: Float) -> GenerationOptions
+      public func with(minP: Float) -> GenerationOptions
+      public func with(repetitionPenalty: Float) -> GenerationOptions
+      public func with(topK: Int32) -> GenerationOptions
+      public func with(rngSeed: Int64) -> GenerationOptions
+      public func with(jsonSchema: String) -> GenerationOptions
+      public func with(maxTokens: Int32) -> GenerationOptions
+      public func with(injectSchemaIntoPrompt: Bool) -> GenerationOptions
+      public func with(inlineThinkingTags: Bool) -> GenerationOptions
+      public func with(enableThinking: Bool) -> GenerationOptions
     }
     ```
 
+    For constrained generation, pass the schema string produced by the `@Generatable` macro into the JSON-schema builder:
+
     ```swift
-    var options = GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05, maxTokens: 512)
-    try options.setResponseFormat(type: CityFact.self)
+    let options = GenerationOptions()
+        .with(temperature: 0.3)
+        .with(minP: 0.15)
+        .with(repetitionPenalty: 1.05)
+        .with(maxTokens: 512)
+        .with(jsonSchema: CityFact.jsonSchema())
     ```
 
-    Builder style is available too — chain `.with(temperature:)`, `.with(topP:)`, `.with(maxTokens:)`, etc.
+    The Apple-only `GenerationOptionsCompat` sibling type (used by legacy `Leap.load(...)` flows) additionally exposes `setResponseFormat(jsonSchema: String)` — see [Constrained Generation](./constrained-generation).
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
@@ -336,7 +371,6 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per
         var extras: String? = null,
     ) {
       inline fun <reified T : Any> setResponseFormatType()
-      fun setResponseFormatType(kClass: KClass<*>)
 
       companion object {
         fun build(buildAction: GenerationOptions.() -> Unit): GenerationOptions
@@ -350,7 +384,7 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per
         minP = 0.15f
         repetitionPenalty = 1.05f
         maxTokens = 512
-        setResponseFormatType(CityFact::class)
+        setResponseFormatType<CityFact>()
     }
     ```
   </Tab>
@@ -359,7 +393,7 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per
 - **Sampling fields** (`temperature`, `topP`, `minP`, `topK`, `repetitionPenalty`) — standard sampling knobs. Use the values from the LEAP bundle manifest (`sampling_parameters` under `generation_time_parameters` in each model's `<Quant>.json` on [LiquidAI/LeapBundles](https://huggingface.co/LiquidAI/LeapBundles)); they're tuned per checkpoint by the training team and differ from the HF model card defaults (the manifest values are the llama.cpp-engine path the SDK runs). Arbitrary "0.7" defaults from generic AI tutorials usually underperform.
 - **`rngSeed`** — set for deterministic / reproducible output (testing, debugging). Default is non-deterministic.
 - **`maxTokens`** — cap the response length. The model stops after this many completion tokens (prompt tokens don't count). Defaults to "until EOS or context limit." Useful for cost control with constrained output.
-- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the higher-level `setResponseFormat(type:)` / `setResponseFormatType(...)` helpers with `@Generatable` types. See [Constrained Generation](./constrained-generation).
+- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the higher-level helpers — Swift `options.with(jsonSchema: T.jsonSchema())` (or `GenerationOptionsCompat.setResponseFormat(jsonSchema:)`) / Kotlin `setResponseFormatType<T>()` — with `@Generatable` types. See [Constrained Generation](./constrained-generation).
 - **`injectSchemaIntoPrompt`** — when `true` (default), the schema is appended to the system message for semantic guidance *in addition* to the structural constraint at decode time. Set `false` to skip the prompt injection (matches `llama-server` grammar mode) — saves prompt tokens for large schemas.
 - **`functionCallParser`** — picks the tokenizer expected by the model. `LFMFunctionCallParser` (default) for Liquid Foundation Models; `HermesFunctionCallParser()` for Hermes/Qwen3 formats; `null` to receive raw tool-call text in `Chunk`s.
 - **`enableThinking`** — turn on reasoning mode for models that support it (e.g. LFM2.5-Thinking). Reasoning tokens arrive as `ReasoningChunk`s.
diff --git a/deployment/on-device/sdk/desktop-platforms.mdx b/deployment/on-device/sdk/desktop-platforms.mdx
index 0659b913..26ca1c65 100644
--- a/deployment/on-device/sdk/desktop-platforms.mdx
+++ b/deployment/on-device/sdk/desktop-platforms.mdx
@@ -49,13 +49,13 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux
     }
 
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
 
-        // Optional: OpenAI-compatible cloud chat client
-        // implementation("ai.liquid.leap:leap-openai-client:0.10.6")
+        // Optional: OpenAI-compatible cloud chat client (JVM support added in v0.10.7)
+        // implementation("ai.liquid.leap:leap-openai-client:0.10.7")
 
         // Optional: Compose Multiplatform voice widget (also runs on JVM)
-        // implementation("ai.liquid.leap:leap-ui:0.10.6")
+        // implementation("ai.liquid.leap:leap-ui:0.10.7")
     }
 
     application {
@@ -75,7 +75,7 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux
     }
 
     dependencies {
-        implementation 'ai.liquid.leap:leap-sdk:0.10.6'
+        implementation 'ai.liquid.leap:leap-sdk:0.10.7'
     }
 
     application {
@@ -89,7 +89,7 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux
       <dependency>
         <groupId>ai.liquid.leap</groupId>
         <artifactId>leap-sdk-jvm</artifactId>
-        <version>0.10.6</version>
+        <version>0.10.7</version>
       </dependency>
     </dependencies>
     ```
@@ -107,9 +107,9 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux
 `LeapDownloader` is the cross-platform downloader. Point it at a writable directory and call `loadModel(modelName:, quantizationType:)` for manifest-based downloads, or `loadSimpleModel(model: ModelSource(...))` for a GGUF you already have on disk.
 
 ```kotlin
-import ai.liquid.leap.LeapDownloader
-import ai.liquid.leap.LeapDownloaderConfig
-import ai.liquid.leap.ModelSource
+import ai.liquid.leap.manifest.LeapDownloader
+import ai.liquid.leap.manifest.LeapDownloaderConfig
+import ai.liquid.leap.manifest.ModelSource
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
 import kotlinx.coroutines.runBlocking
@@ -132,7 +132,7 @@ fun main() = runBlocking {
     )
 
     conversation.generateResponse(
-        ChatMessage.user("What is the capital of France?")
+        ChatMessage(ChatMessage.Role.USER, "What is the capital of France?")
     ).collect { response ->
         when (response) {
             is MessageResponse.Chunk -> print(response.text)
@@ -163,7 +163,7 @@ Pass `mmprojPath = "..."` for vision models, or `audioDecoderPath = "..."` (and
 
 ### Runtime expectations
 
-- **Memory.** Plan for at least `model_size_on_disk + 1 GiB` of free RAM. With `use_mmap=true` (the default since v0.10.4 — see the [changelog](/deployment/on-device/leap-sdk-changelog#mmap-default)) the OS pages weights in lazily, so resident memory grows as the model is exercised rather than at load time.
+- **Memory.** Plan for at least `model_size_on_disk + 1 GiB` of free RAM. With `use_mmap=true` (the default since v0.10.4 — see the [changelog](/deployment/on-device/leap-sdk-changelog#memory-mapped-model-loading-by-default)) the OS pages weights in lazily, so resident memory grows as the model is exercised rather than at load time.
 - **Threads.** The engine defaults to a sensible CPU thread count for the host (`CpuThreadAdvisor.getRecommendedThreadCount()`). Override by passing `ModelLoadingOptions(cpuThreads = N)` through `loadModel(...)` if you need to share the box with other workloads.
 - **GPU acceleration.** Available on macOS (Metal, automatic) and on Linux JVM builds with a CUDA-capable GPU when the matching native variant is on the classpath. GPU offload is configured through the `extras` JSON payload on `ModelLoadingOptions` (advanced use only — most desktop workloads run pure-CPU).
 
@@ -194,11 +194,11 @@ dependencyResolutionManagement {
 // build.gradle.kts
 plugins {
     kotlin("multiplatform") version "2.3.20"
-    id("ai.liquid.leap.nativelibs") version "0.10.6"
+    id("ai.liquid.leap.nativelibs") version "0.10.7"
 }
 
 dependencies {
-    implementation("ai.liquid.leap:leap-sdk:0.10.6")
+    implementation("ai.liquid.leap:leap-sdk:0.10.7")
 }
 
 kotlin {
@@ -216,7 +216,7 @@ Build with the usual Kotlin/Native link tasks:
 The resulting binary lives at `build/bin/linuxX64/releaseExecutable/`, alongside the `.so` files the plugin installed (`libinference_engine.so`, `libinference_engine_llamacpp_backend.so`, `libie_zip.so`, plus their transitive dependencies). Keep them co-located when you ship — the cinterop manifest bakes `-rpath=$ORIGIN` into the binary so the dynamic linker resolves siblings.
 
 <Warning>
-**Versions 0.10.0, 0.10.1, and 0.10.2 cannot link a working Kotlin/Native executable** due to three separate Maven Central / cinterop issues that have all been fixed in 0.10.5. Maven Central is immutable per GAV, so the older versions cannot be republished — pin to **0.10.5 or newer**. See [the changelog](/deployment/on-device/leap-sdk-changelog#kotlin-native-linux-windows) for the full story.
+**Versions 0.10.0 and 0.10.1 cannot link a working Kotlin/Native executable** due to Maven Central / cinterop issues. v0.10.2 and v0.10.3 were never published, and the fixes shipped across v0.10.4.x, v0.10.6, and v0.10.7. Maven Central is immutable per GAV, so the older versions cannot be republished; pin to **0.10.7 or newer**. See [the changelog](/deployment/on-device/leap-sdk-changelog#kotlin-native-linux-windows) for the full story.
 </Warning>
 
 ### Manual recipe (if you can't apply the plugin)
@@ -227,7 +227,7 @@ plugins {
 }
 
 dependencies {
-    implementation("ai.liquid.leap:leap-sdk:0.10.6")
+    implementation("ai.liquid.leap:leap-sdk:0.10.7")
 }
 
 val nativesDir = layout.buildDirectory.dir("bin/linuxX64/releaseExecutable")
@@ -241,7 +241,7 @@ kotlin {
 
 val leapSdkNatives by configurations.creating
 dependencies {
-    leapSdkNatives("ai.liquid.leap:leap-sdk-linuxx64:0.10.6:natives@zip")
+    leapSdkNatives("ai.liquid.leap:leap-sdk-linuxx64:0.10.7:natives@zip")
 }
 
 val installLeapNatives by tasks.registering(Copy::class) {
@@ -259,8 +259,8 @@ tasks.named("linkReleaseExecutableLinuxX64") { dependsOn(installLeapNatives) }
 
 The Maven coordinates for the `-natives.zip` artifacts:
 
-- `ai.liquid.leap:leap-sdk-linuxx64:0.10.6:natives@zip`
-- `ai.liquid.leap:leap-sdk-linuxarm64:0.10.6:natives@zip`
+- `ai.liquid.leap:leap-sdk-linuxx64:0.10.7:natives@zip`
+- `ai.liquid.leap:leap-sdk-linuxarm64:0.10.7:natives@zip`
 
 ## Windows native (MinGW x64)
 
@@ -269,11 +269,11 @@ The same Kotlin/Native flow works for Windows x86_64 via the MinGW-w64 toolchain
 ```kotlin
 plugins {
     kotlin("multiplatform") version "2.3.20"
-    id("ai.liquid.leap.nativelibs") version "0.10.6"
+    id("ai.liquid.leap.nativelibs") version "0.10.7"
 }
 
 dependencies {
-    implementation("ai.liquid.leap:leap-sdk:0.10.6")
+    implementation("ai.liquid.leap:leap-sdk:0.10.7")
 }
 
 kotlin {
@@ -291,7 +291,7 @@ The plugin installs `inference_engine.dll`, `libinference_engine_llamacpp_backen
 
 The Maven coordinates for the `-natives.zip` artifact:
 
-- `ai.liquid.leap:leap-sdk-mingwx64:0.10.6:natives@zip`
+- `ai.liquid.leap:leap-sdk-mingwx64:0.10.7:natives@zip`
 
 <Info>
 **Building from macOS or Linux for Windows?** Kotlin/Native does not support cross-compiling to MinGW from a non-Windows host as of 2.3.20 — the build must run on Windows (native or in CI). GitHub Actions `windows-latest` works without extra setup.
@@ -316,12 +316,12 @@ Identical Swift API to iOS — same `ModelDownloader`, `Conversation`, `ChatMess
 ```swift
 .binaryTarget(
   name: "LeapSDK",
-  url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapSDK.xcframework.zip",
-  checksum: "ae9ecddbe5dc226ddd4ec8fe42178b721faeab71a20b3f14efceaae5a2495b7e"
+  url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapSDK.xcframework.zip",
+  checksum: "6f2721aa45d7555646f78cbcaedb57aba3d869f56b24d681ad332846e131ae3d"
 )
 ```
 
-The XCFramework slice for macOS ARM64 is in the same zip as the iOS slices. Mac Catalyst (`x86_64-apple-ios13.0-macabi`, `arm64-apple-ios13.0-macabi`) is also included.
+The XCFramework slice for macOS ARM64 is in the same zip as the iOS slices. The released framework ships exactly three slices — `ios-arm64`, `ios-arm64-simulator`, `macos-arm64`; **Mac Catalyst is not supported**, and the iOS simulator slice is ARM64-only (Intel-Mac simulator hosts cannot run it).
 
 ### From Kotlin (JVM, Compose for Desktop)
 
@@ -329,8 +329,8 @@ If you're targeting macOS as a JVM host — for example with Compose Multiplatfo
 
 ```kotlin
 dependencies {
-    implementation("ai.liquid.leap:leap-sdk:0.10.6")
-    implementation("ai.liquid.leap:leap-ui:0.10.6") // Compose voice widget runs on JVM too
+    implementation("ai.liquid.leap:leap-sdk:0.10.7")
+    implementation("ai.liquid.leap:leap-ui:0.10.7") // Compose voice widget runs on JVM too
 }
 ```
 
diff --git a/deployment/on-device/sdk/function-calling.mdx b/deployment/on-device/sdk/function-calling.mdx
index e3a016d1..1bec418e 100644
--- a/deployment/on-device/sdk/function-calling.mdx
+++ b/deployment/on-device/sdk/function-calling.mdx
@@ -19,23 +19,28 @@ Vision and audio-capable models require companion files. Bundles embed these ref
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    The Kotlin `LeapFunction` / `LeapFunctionParameter` constructors carry `@ObjCName` annotations on their `description` parameters, so the Swift labels are `functionDescription:` and `parameterDescription:`. `LeapFunctionParameter`'s `optional` parameter has no Swift default, so pass `optional: false` for required parameters.
+
     ```swift
     conversation.registerFunction(
-      LeapFunction(
+      function: LeapFunction(
         name: "get_weather",
-        description: "Query the weather of a city",
+        functionDescription: "Query the weather of a city",
         parameters: [
           LeapFunctionParameter(
             name: "city",
-            type: LeapFunctionParameterType.string(StringType()),
-            description: "The city to query weather for"
+            type: LeapFunctionParameterType.LeapStr(enumValues: nil, description: nil),
+            parameterDescription: "The city to query weather for",
+            optional: false
           ),
           LeapFunctionParameter(
             name: "unit",
-            type: LeapFunctionParameterType.string(
-              StringType(enumValues: ["celsius", "fahrenheit"])
+            type: LeapFunctionParameterType.LeapStr(
+              enumValues: ["celsius", "fahrenheit"],
+              description: nil
             ),
-            description: "Temperature unit (celsius or fahrenheit)"
+            parameterDescription: "Temperature unit (celsius or fahrenheit)",
+            optional: false
           ),
         ]
       )
@@ -73,24 +78,26 @@ Vision and audio-capable models require companion files. Bundles embed these ref
 Use normal identifiers — letters, underscores, and digits (not starting with a digit). Most models trained for tool use recognize that shape.
 
 <Info>
-The Kotlin parameter type classes are named with a `Leap` prefix (`LeapStr`, `LeapNum`, `LeapInt`, `LeapBool`, `LeapArr`, `LeapObj`, `LeapNull`) to avoid collisions with Kotlin's built-in `String`, `Number`, `Int`, `Boolean`, etc. The Swift bindings expose the same primitives under cleaner names (`.string(...)`, `.number(...)`, etc.) via SKIE.
+The Kotlin parameter type classes are named with a `Leap` prefix (`LeapStr`, `LeapNum`, `LeapInt`, `LeapBool`, `LeapArr`, `LeapObj`, `LeapNull`) to avoid collisions with Kotlin's built-in `String`, `Number`, `Int`, `Boolean`, etc. The Swift bindings expose the same names — there are no separate `.string(...)` / `.number(...)` aliases; SKIE preserves the Kotlin nested-class names.
 </Info>
 
 ## Handle the response
 
-Function calls arrive as `MessageResponse.functionCalls` (Swift) / `MessageResponse.FunctionCalls` (Kotlin), which wraps a list of `LeapFunctionCall`.
+Function calls arrive as the `MessageResponse.FunctionCalls` variant on both platforms, wrapping a list of `LeapFunctionCall` payloads.
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    `LeapFunctionCall` is a Kotlin `data class` bridged into Swift. `arguments` is a Kotlin `Map<String, Any?>` exposed as Swift `[String: Any]` (the ObjC bridge collapses `Any?` to non-optional `id`):
+
     ```swift
-    public struct LeapFunctionCall {
-      public let name: String
-      public let arguments: [String: Any?]
+    public class LeapFunctionCall {
+      public var name: String
+      public var arguments: [String: Any]
     }
     ```
 
     ```swift
-    let userMessage = ChatMessage(role: .user, content: [.text("What's the weather in NYC?")])
+    let userMessage = ChatMessage(role: .user, textContent: "What's the weather in NYC?")
 
     for try await response in conversation.generateResponse(message: userMessage) {
         switch onEnum(of: response) {
@@ -147,12 +154,12 @@ Append the tool's output as a `tool`-role message and continue the conversation.
     ```swift
     let toolMessage = ChatMessage(
       role: .tool,
-      content: [.text(#"{"temperature":72,"conditions":"sunny"}"#)]
+      textContent: #"{"temperature":72,"conditions":"sunny"}"#
     )
 
-    guard let current = conversation else { return }
-    let updatedHistory = current.history + [toolMessage]
-    conversation = current.modelRunner.createConversationFromHistory(history: updatedHistory)
+    let updatedHistory = conversation.history + [toolMessage]
+    let nextConversation = conversation.modelRunner.createConversationFromHistory(history: updatedHistory)
+    // Continue generation against `nextConversation`.
     ```
   </Tab>
   <Tab title="Kotlin (all platforms)">
@@ -176,18 +183,29 @@ Then call `generateResponse(...)` on the new conversation to get the model's too
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    Both types are Kotlin `data class`es bridged into Swift. `@ObjCName` annotations rename the `description` parameter on the Swift inits to `functionDescription:` / `parameterDescription:`.
+
     ```swift
-    public struct LeapFunction: Equatable {
-      public let name: String
-      public let description: String
-      public let parameters: [LeapFunctionParameter]
+    public class LeapFunction {
+      public var name: String
+      public var functionDescription: String       // ObjC-renamed from Kotlin `description`
+      public var parameters: [LeapFunctionParameter]
+
+      public init(name: String, functionDescription: String, parameters: [LeapFunctionParameter])
     }
 
-    public struct LeapFunctionParameter: Equatable {
-      public let name: String
-      public let type: LeapFunctionParameterType
-      public let description: String
-      public let optional: Bool
+    public class LeapFunctionParameter {
+      public var name: String
+      public var type: LeapFunctionParameterType
+      public var parameterDescription: String      // ObjC-renamed from Kotlin `description`
+      public var optional: Bool
+
+      public init(
+        name: String,
+        type: LeapFunctionParameterType,
+        parameterDescription: String,
+        optional: Bool                              // no default in Swift — pass `false` for required
+      )
     }
     ```
   </Tab>
@@ -215,22 +233,27 @@ Then call `generateResponse(...)` on the new conversation to get the model's too
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    `LeapFunctionParameterType` is a Kotlin `sealed class`. SKIE generates an `onEnum(of:)`-compatible enum view, but the constructors you use to build instances keep the Kotlin nested-class names — there is no `.string(...)` / `.number(...)` alias.
+
     ```swift
-    public indirect enum LeapFunctionParameterType: Codable, Equatable {
-      case string(StringType)
-      case number(NumberType)
-      case integer(IntegerType)
-      case boolean(BooleanType)
-      case array(ArrayType)
-      case object(ObjectType)
-      case null(NullType)
-    }
+    // Direct constructors (use these to build parameter types):
+    LeapFunctionParameterType.LeapStr(enumValues: [String]?, description: String?)
+    LeapFunctionParameterType.LeapNum(enumValues: [NSNumber]?, description: String?)
+    LeapFunctionParameterType.LeapInt(enumValues: [KotlinInt]?, description: String?)
+    LeapFunctionParameterType.LeapBool(description: String?)
+    LeapFunctionParameterType.LeapArr(itemType: LeapFunctionParameterType, description: String?)
+    LeapFunctionParameterType.LeapObj(
+      properties: [String: LeapFunctionParameterType],
+      required: [String],
+      description: String?
+    )
+    LeapFunctionParameterType.LeapNull()   // no description parameter
     ```
 
-    - `StringType`, `NumberType`, `IntegerType` accept `enumValues` to constrain valid values.
-    - `ArrayType` has `itemType` describing element type.
-    - `ObjectType` has `properties: [String: LeapFunctionParameterType]` and `required: [String]`.
-    - All non-`null` types take an optional `description` (only used when nested via `ArrayType.itemType` or object properties — when used directly as `LeapFunctionParameter.type`, the outer `description` wins).
+    - `LeapStr` / `LeapNum` / `LeapInt` accept `enumValues` to constrain valid values.
+    - `LeapArr` has `itemType` describing the element type.
+    - `LeapObj` has `properties: [String: LeapFunctionParameterType]` and `required: [String]`.
+    - The nested `description` is overridden when the type is used directly as `LeapFunctionParameter.type`; it's only consulted when the type is used inside `LeapArr.itemType` or `LeapObj.properties`.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
@@ -258,21 +281,25 @@ Then call `generateResponse(...)` on the new conversation to get the model's too
     ```swift
     LeapFunction(
       name: "get_weather",
-      description: "Query the weather of cities",
+      functionDescription: "Query the weather of cities",
       parameters: [
         LeapFunctionParameter(
           name: "cities",
-          type: LeapFunctionParameterType.array(
-            ArrayType(itemType: .string(StringType()))
+          type: LeapFunctionParameterType.LeapArr(
+            itemType: LeapFunctionParameterType.LeapStr(enumValues: nil, description: nil),
+            description: nil
           ),
-          description: "Names of the cities to query weather for"
+          parameterDescription: "Names of the cities to query weather for",
+          optional: false
         ),
         LeapFunctionParameter(
           name: "unit",
-          type: LeapFunctionParameterType.string(
-            StringType(enumValues: ["celsius", "fahrenheit"])
+          type: LeapFunctionParameterType.LeapStr(
+            enumValues: ["celsius", "fahrenheit"],
+            description: nil
           ),
-          description: "Temperature unit"
+          parameterDescription: "Temperature unit",
+          optional: false
         ),
       ]
     )
diff --git a/deployment/on-device/sdk/messages-content.mdx b/deployment/on-device/sdk/messages-content.mdx
index 085d4f52..f30edb60 100644
--- a/deployment/on-device/sdk/messages-content.mdx
+++ b/deployment/on-device/sdk/messages-content.mdx
@@ -3,76 +3,87 @@ title: "Messages & Content"
 description: "ChatMessage, ChatMessageContent, audio format requirements — same shape on every platform."
 ---
 
-`ChatMessage` and `ChatMessageContent` mirror the OpenAI chat-completions message schema. The same fields exist on iOS / macOS (`struct ChatMessage`, `enum ChatMessageContent`) and the Kotlin platforms (`data class ChatMessage`, `sealed interface ChatMessageContent`).
+`ChatMessage` and `ChatMessageContent` mirror the OpenAI chat-completions message schema. Both are declared once in `commonMain` (`data class ChatMessage`, `sealed class ChatMessageContent`) and Kotlin/Native + SKIE bridge the Kotlin types into Swift — there are no separate "native" Swift declarations.
 
 ## `ChatMessage`
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    The Swift class is generated from the Kotlin `data class`. Kotlin parameter defaults don't propagate, so the primary init requires all four arguments explicitly:
+
     ```swift
-    public struct ChatMessage {
-      public var role: ChatMessageRole
+    public class ChatMessage {
+      public var role: ChatMessage.Role
       public var content: [ChatMessageContent]
       public var reasoningContent: String?
       public var functionCalls: [LeapFunctionCall]?
 
+      // Primary init — pass `reasoningContent: nil, functionCalls: nil` for ordinary messages.
       public init(
-        role: ChatMessageRole,
+        role: ChatMessage.Role,
         content: [ChatMessageContent],
-        reasoningContent: String? = nil,
-        functionCalls: [LeapFunctionCall]? = nil
+        reasoningContent: String?,
+        functionCalls: [LeapFunctionCall]?
       )
 
-      public init(from json: [String: Any]) throws
-    }
+      // Secondary inits (from Kotlin secondary constructors):
+      public init(role: ChatMessage.Role, content: ChatMessageContent)  // single content
+      public init(role: ChatMessage.Role, textContent: String)          // plain text
 
-    public enum ChatMessageRole: String {
-      case user, system, assistant, tool
+      public enum Role {
+        case system, user, assistant, tool
+      }
     }
     ```
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
+    @Serializable(with = ChatMessageJsonSerializer::class)
     data class ChatMessage(
         val role: Role,
         val content: List<ChatMessageContent>,
-        val reasoningContent: String? = null,
-        val functionCalls: List<LeapFunctionCall>? = null,
+        @SerialName("reasoning_content") val reasoningContent: String? = null,
+        @SerialName("tool_calls") val functionCalls: List<LeapFunctionCall>? = null,
     ) {
+        // Single-content secondary ctor (wraps the part in a list; reasoningContent and functionCalls stay null).
+        constructor(role: Role, content: ChatMessageContent)
+        // Plain-text secondary ctor (parameter name is `textContent`).
+        constructor(role: Role, textContent: String)
+
         enum class Role(val type: String) {
             SYSTEM("system"),
             USER("user"),
             ASSISTANT("assistant"),
-            TOOL("tool"),
-        }
+            TOOL("tool");
 
-        fun toJSONObject(): JSONObject
-
-        companion object {
-            fun fromJSONObject(obj: JSONObject): ChatMessage
+            companion object {
+                fun fromTypeString(type: String): Role  // throws LeapSerializationException on unknown values
+            }
         }
     }
     ```
+
+    `ChatMessage` is `@Serializable` via the dedicated `ChatMessageJsonSerializer` — encode/decode through `kotlinx.serialization.json.Json` rather than ad-hoc `JSONObject` helpers. See [Utilities → Serialization](./utilities#serialization).
   </Tab>
 </Tabs>
 
 ### Fields
 
 - **`role`** — the speaker (`user`, `system`, `assistant`, or `tool`). Use `tool` when appending function-call results back into the history.
-- **`content`** — ordered fragments. Supported part types: `Text`, `Image` (JPEG bytes), `Audio` (WAV bytes), and on Kotlin `AudioPcmF32` for raw float samples.
+- **`content`** — ordered fragments. Supported part types: `Text`, `Image` (JPEG bytes wrapped in a data URL), `Audio` (WAV bytes or `input_audio` payload), and on Kotlin `AudioPcmF32` for raw float samples.
 - **`reasoningContent`** — text emitted by reasoning models inside `<think>` / `</think>` tags. `null` for non-reasoning responses.
-- **`functionCalls`** — calls returned by `MessageResponse.functionCalls` on the previous turn, included when appending tool-call results to history.
+- **`functionCalls`** — calls returned by `MessageResponse.FunctionCalls` on the previous turn, included when appending tool-call results to history.
 
 ### Serialization
 
-Both platforms expose round-trip JSON helpers compatible with OpenAI's `ChatCompletionRequestMessage`.
+Round-trip the message through `kotlinx.serialization` — there is no separate "from `[String: Any]`" initializer on either platform.
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
-    `ChatMessage(from: [String: Any])` constructs a message from an OpenAI-style payload. Throws `LeapSerializationError` on unrecognized shapes.
+    Encode with `LeapJson.encodeToString` (or your own `JSONEncoder` against the OpenAI shape) and decode with the matching Kotlin serializer. See [Utilities → Serialization](./utilities#serialization) for examples that route through `LeapJson`.
   </Tab>
   <Tab title="Kotlin (all platforms)">
-    `ChatMessage.toJSONObject()` / `ChatMessage.fromJSONObject(obj)`. Throws `LeapSerializationException` on unrecognized shapes. See [Utilities → Serialization Support](./utilities#serialization-support).
+    `ChatMessage` is `@Serializable`. Encode with `Json.encodeToString(message)` and decode with `Json.decodeFromString<ChatMessage>(jsonString)` — see [Utilities → Serialization](./utilities#serialization). On error, expect a `LeapSerializationException` (not `LeapSerializationError`).
   </Tab>
 </Tabs>
 
@@ -80,48 +91,78 @@ Both platforms expose round-trip JSON helpers compatible with OpenAI's `ChatComp
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
-    ```swift
-    public enum ChatMessageContent {
-      case text(String)
-      case image(Data)   // JPEG bytes
-      case audio(Data)   // WAV bytes
+    `ChatMessageContent` is the Kotlin `sealed class` bridged to Swift — switch on its subclasses with SKIE's `onEnum(of:)` helper. There is no native Swift `enum`, no positional `.image(_:)` / `.audio(_:)` factory, and no `init(from json:)`. Use the static factories on the Swift overlay:
 
-      public init(from json: [String: Any]) throws
-    }
+    ```swift
+    // Text (cross-platform):
+    ChatMessageContent.text(_ text: String) -> ChatMessageContent
+
+    // Image:
+    ChatMessageContent.fromJPEGData(_ jpegData: Data) -> ChatMessageContent.Image
+    ChatMessageContent.image(url: String) -> ChatMessageContent.Image           // data URL or remote URL
+
+    // Audio:
+    ChatMessageContent.fromWAVData(_ wavData: Data) -> ChatMessageContent.Audio
+    ChatMessageContent.audio(data: Data, format: String = "wav") -> ChatMessageContent.Audio
+    ChatMessageContent.fromFloatSamples(_ samples: [Float], sampleRate: Int, channelCount: Int = 1)
+        -> ChatMessageContent.Audio
+
+    // iOS only — UIKit:
+    public static func fromUIImage(_ image: UIImage) throws -> ChatMessageContent
+    // (JPEG quality is fixed at 0.85; no compressionQuality parameter is exposed.)
     ```
 
-    Helper initializers simplify interop with platform-native buffers:
+    `fromUIImage` is iOS-only and takes only the image — JPEG compression quality is hard-coded to `0.85` in the overlay (`leap-sdk/src/iosMain/.../ChatMessageContentExtensionsIos.kt`). There is no `fromNSImage` factory; on macOS, convert your `NSImage` to JPEG `Data` yourself and pass it through `fromJPEGData(_:)`.
 
-    - `ChatMessageContent.fromUIImage(image, compressionQuality:)` — UIKit
-    - `ChatMessageContent.fromNSImage(image, compressionQuality:)` — AppKit
-    - `ChatMessageContent.fromWAVData(data)` — pass-through validator
-    - `ChatMessageContent.fromFloatSamples(samples, sampleRate:, channelCount:)` — wrap raw float32 PCM into a WAV blob
-
-    On the wire, image parts are encoded as OpenAI-style `image_url` payloads and audio parts as `input_audio` arrays with Base64 data.
+    On the wire, image parts are encoded as OpenAI-style `image_url` payloads (with a `data:image/jpeg;base64,...` URL) and audio parts as `input_audio` arrays with Base64 data.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
-    sealed interface ChatMessageContent {
-        fun clone(): ChatMessageContent
-        fun toJSONObject(): JSONObject
-
-        data class Text(val text: String) : ChatMessageContent
-        data class Image(val jpegByteArray: ByteArray) : ChatMessageContent
-        data class Audio(val wavByteArray: ByteArray) : ChatMessageContent
-        data class AudioPcmF32(val samples: FloatArray, val sampleRate: Int) : ChatMessageContent
-    }
+    sealed class ChatMessageContent {
+        data class Text(val text: String) : ChatMessageContent()
+
+        data class Image(val imageUrl: ImageUrl) : ChatMessageContent() {
+            // Convenience secondary ctor — wraps the bytes in a data: URL.
+            constructor(jpegByteArray: ByteArray)
+            val jpegByteArray: ByteArray   // derived property: decodes the data: URL
 
-    fun ChatMessageContent.fromJSONObject(obj: JSONObject): ChatMessageContent
+            // Nested wrapper for the OpenAI `image_url` wire shape.
+            data class ImageUrl(val url: String)
+        }
+
+        data class Audio(val inputAudio: InputAudio) : ChatMessageContent() {
+            // Convenience secondary ctor — wraps the bytes in an InputAudio.
+            constructor(data: ByteArray)
+            val data: ByteArray            // derived property: decodes the base64 InputAudio payload
+
+            data class InputAudio(val data: String, val format: String)  // base64-encoded `data`
+        }
+
+        // Convenience helpers (declared on the sealed class) wrap raw PCM into Audio:
+        fun toWavBytes(): ByteArray        // on AudioPcmF32 — encodes float samples as 16-bit PCM WAV
+        fun toAudio(): Audio               // on AudioPcmF32 — same bytes wrapped as ChatMessageContent.Audio
+
+        data class AudioPcmF32(val samples: FloatArray, val sampleRate: Int) : ChatMessageContent()
+    }
     ```
 
-    Android-specific helper: `ChatMessageContent.Image.fromBitmap(bitmap, compressionQuality = 85)` re-encodes an Android `Bitmap` to JPEG.
+    Serialize via `kotlinx.serialization` (every variant is `@Serializable`).
+
+    Android-specific helper: `ImageUtils.fromBitmap(bitmap, compressionQuality = 85)` (in `ai.liquid.leap.message`) re-encodes an Android `Bitmap` to JPEG and returns a `ChatMessageContent.Image`. It's a `suspend` function — call it from a coroutine.
+
+    ```kotlin
+    import ai.liquid.leap.message.ImageUtils
+
+    val image: ChatMessageContent.Image = ImageUtils.fromBitmap(bitmap, compressionQuality = 85)
+    ```
   </Tab>
 </Tabs>
 
 - **`Text`** — plain text fragment.
 - **`Image`** — JPEG-encoded image bytes. Only vision-capable models can interpret image parts.
 - **`Audio`** — WAV-encoded audio bytes (see [audio format requirements](#audio-format-requirements) below).
-- **`AudioPcmF32`** (Kotlin) / `fromFloatSamples(...)` (Swift) — raw float32 mono PCM in memory. Avoids re-encoding when you already have samples.
+- **`AudioPcmF32`** (Kotlin) — raw float32 mono PCM in memory. Avoids the WAV encoding step when you already have samples; the engine handles framing internally. Kotlin-only.
+- **`fromFloatSamples(...)` (Swift)** — convenience that wraps `[Float]` samples into a `ChatMessageContent.Audio` WAV blob (via `FloatAudioBuffer.makeAudioContent()`). Different from Kotlin's `AudioPcmF32`: this one DOES re-encode through WAV. There is no Swift surface for raw `AudioPcmF32` today.
 
 ## Audio format requirements
 
@@ -168,8 +209,10 @@ The engine **only accepts WAV**. M4A, MP3, AAC, OGG, and other compressed format
         role: .user,
         content: [
             .text("What is being said in this audio?"),
-            .audio(wavData)
-        ]
+            ChatMessageContent.fromWAVData(wavData)
+        ],
+        reasoningContent: nil,
+        functionCalls: nil
     )
     ```
   </Tab>
@@ -205,7 +248,9 @@ The engine **only accepts WAV**. M4A, MP3, AAC, OGG, and other compressed format
 
     let message = ChatMessage(
         role: .user,
-        content: [.text("Transcribe this audio"), audioContent]
+        content: [.text("Transcribe this audio"), audioContent],
+        reasoningContent: nil,
+        functionCalls: nil
     )
     ```
   </Tab>
@@ -252,7 +297,7 @@ The engine **only accepts WAV**. M4A, MP3, AAC, OGG, and other compressed format
     recorder.stop()
 
     let wavData = try Data(contentsOf: audioURL)
-    let audioContent: ChatMessageContent = .audio(wavData)
+    let audioContent: ChatMessageContent = ChatMessageContent.fromWAVData(wavData)
     ```
   </Tab>
   <Tab title="Kotlin (Android)">
diff --git a/deployment/on-device/sdk/model-loading.mdx b/deployment/on-device/sdk/model-loading.mdx
index 1e79299a..e33ee29a 100644
--- a/deployment/on-device/sdk/model-loading.mdx
+++ b/deployment/on-device/sdk/model-loading.mdx
@@ -11,12 +11,12 @@ The LEAP SDK ships two downloader classes built on the same pipeline. They diffe
 | **iOS / macOS (Swift)** | `ModelDownloader` | One-shot `loadModel(...)` and `loadSimpleModel(...)` that route every file transfer through `URLSession`. Pass `sessionConfiguration: .background(withIdentifier:)` for downloads that survive app suspension. Also exposes the underlying `downloadModel` / `requestDownloadModel` / `queryStatus` lifecycle for prefetch flows. The class ships in the `LeapModelDownloader` SPM library product. |
 | **All platforms (iOS, Android, JVM, Linux native, Windows native, macOS Kotlin)** | `LeapDownloader` | The cross-platform manifest loader. One-shot `loadModel(...)` and `loadSimpleModel(...)`. No platform-native background integration — the iOS `ModelDownloader` and Android `LeapModelDownloader` classes wrap one of these internally. |
 
-Both classes return the same `ModelRunner` and share an on-disk model cache when constructed with the same `LeapDownloaderConfig.saveDir`. The platform downloader wraps a `LeapDownloader` internally — once a download has landed, calling `LeapDownloader.loadModel(...)` against the shared cache picks up the files without re-downloading.
+All downloader classes return the same `ModelRunner` type. They share an on-disk model cache when pointed at the same directory: `LeapDownloaderConfig.saveDir` for Swift / JVM / native, and `modelFileDir` for Android `LeapModelDownloader`. Once a download has landed, calling `LeapDownloader.loadModel(...)` against the shared cache picks up the files without re-downloading.
 
 <Info>
 **Parameter naming.** Every loader uses the same parameter labels across Swift and Kotlin:
 
-- **`loadModel(...)` / `downloadModel(...)` / `requestDownloadModel(...)` / `queryStatus(...)` / `removeModel(...)`** all use `modelName:` / `quantizationType:` on the Swift `ModelDownloader` (iOS, macOS), the Kotlin `LeapModelDownloader` (Android), and the cross-platform `LeapDownloader`.
+- Manifest loaders and lifecycle methods use `modelName:` / `quantizationType:` consistently. Swift `ModelDownloader` exposes `downloadModel(...)`, `requestDownloadModel(...)`, `queryStatus(...)`, and `removeModel(...)`; Android `LeapModelDownloader` exposes `requestDownloadModel(...)`, `requestStopDownload(...)`, `queryStatus(...)`, and `getModelResourceFolder(...)`; cross-platform `LeapDownloader` exposes foreground `downloadModel(...)` / `loadModel(...)` plus cache cleanup helpers.
 - **`ModelSource` (sideloaded)** uses `quantizationId` — the field is part of the source descriptor, not a loader parameter.
 </Info>
 
@@ -68,17 +68,19 @@ Both classes return the same `ModelRunner` and share an on-disk model cache when
     class LeapModelDownloader(
         private val context: Context,
         modelFileDir: File? = null,
-        private val extraHTTPRequestHeaders: Map<String, String> = mapOf(),
         private val notificationConfig: LeapModelDownloaderNotificationConfig = LeapModelDownloaderNotificationConfig(),
+        private val downloaderConfig: LeapDownloaderConfig = LeapDownloaderConfig(),
+        private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO,
     )
     ```
 
     | Field | Description |
     |---|---|
     | `context` | Activity or Application context. |
-    | `modelFileDir` | Override the model cache directory. Defaults to app's external files directory. |
-    | `extraHTTPRequestHeaders` | Extra headers to attach to download requests. |
-    | `notificationConfig` | Foreground service notification title/content/icon strings. |
+    | `modelFileDir` | Override the model cache directory. Defaults to `File(context.filesDir, "leap_models")`. |
+    | `notificationConfig` | Notification channel, title, and content strings used by the WorkManager download worker. |
+    | `downloaderConfig` | Network / validation settings for the underlying `LeapDownloader` (`baseUrl`, SHA-256 validation, SSL, and timeouts). The cache directory comes from `modelFileDir`, not `downloaderConfig.saveDir`. |
+    | `ioDispatcher` | Coroutine dispatcher for blocking I/O. Defaults to `Dispatchers.IO`. |
   </Tab>
   <Tab title="Kotlin (JVM / native)">
     ```kotlin
@@ -87,6 +89,11 @@ Both classes return the same `ModelRunner` and share an on-disk model cache when
     data class LeapDownloaderConfig(
         val saveDir: String = "leap_models",
         val validateSha256: Boolean = true,
+        val disableSslValidation: Boolean = false,
+        val baseUrl: String? = null,
+        val connectTimeoutMillis: Long = 30_000,
+        val socketTimeoutMillis: Long = 60_000,
+        val requestTimeoutMillis: Long = 600_000,
     )
     ```
 
@@ -195,8 +202,8 @@ Resolves the GGUF manifest for the given model + quantization slug, downloads an
 
     - **`forceDownload`** — re-fetch even when cached. Use after a corrupted download or when the manifest has changed upstream.
     - **`forceLocal`** — skip the Leap Model Service and load in-process. Useful for testing the local path when the service is installed.
-    - **`progress`** — pass a callback to load eagerly inside `loadModel(...)` and observe progress; pass `null` (the default) to defer loading until the first session is created.
-    - **Background staging** — use `requestDownloadModel(modelName, quantizationType, forceDownload)` + `observeDownloadProgress(modelName, quantizationType): Flow<ProgressData>` for WorkManager-backed transfers. See [Utilities](./utilities).
+    - **`progress`** — observe manifest / model download bytes as `ProgressData`. On the Leap Model Service path, passing `null` preserves the service's deferred-load behavior; if the service is unavailable, the in-process fallback still loads before `loadModel(...)` returns.
+    - **Background staging** — call `requestDownloadModel(modelName, quantizationType, forceDownload)` to enqueue a unique WorkManager download worker, then observe `observeDownloadProgress(modelName, quantizationType): StateFlow<ModelDownloadProgress?>`. See [Utilities](./utilities).
   </Tab>
   <Tab title="Kotlin (JVM / native)">
     ```kotlin
@@ -211,7 +218,9 @@ Resolves the GGUF manifest for the given model + quantization slug, downloads an
     ```
 
     ```kotlin
-    val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir))
+    // `saveDir` is a String filesystem path (not java.io.File). On Android pass
+    // `context.cacheDir.absolutePath`; on JVM/native pass any writable directory:
+    val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = "/var/cache/leap"))
 
     val runner = downloader.loadModel(
         modelName = "LFM2-1.2B",
@@ -289,19 +298,33 @@ Use this path when you ship the model as an app asset, `adb push` it for develop
     ```
 
     <Accordion title="Legacy: Leap.load(url:options:)">
-      The 0.9.x-style URL-based loader still works:
+      The 0.9.x-style URL-based loader still works for the common case. Auto-detection picks up a sibling `mmproj-*.gguf` for vision, and audio decoder files whose name contains "audio" and "decoder":
 
       ```swift
       let runner = try await Leap.load(url: ggufURL)
+      ```
+
+      If you need to override the companion-file picks, build a fully specified `LiquidInferenceEngineOptions`. The Kotlin/Native ObjC bridge strips default-argument metadata, so the Swift designated init requires every field; there is no `LiquidInferenceEngineOptions(bundlePath: …)` single-arg overload today. Pass `nil` for optional fields you don't need to set, and `false` for the non-optional `audioDecoderUseGpu`:
 
+      ```swift
       let options = LiquidInferenceEngineOptions(
         bundlePath: ggufURL.path,
-        mmProjPath: mmprojURL.path
+        cacheOptions: nil,
+        cpuThreads: nil,
+        contextSize: nil,
+        nGpuLayers: nil,
+        mmProjPath: mmprojURL.path,
+        audioDecoderPath: nil,
+        chatTemplate: nil,
+        audioTokenizerPath: nil,
+        audioDecoderUseGpu: false,
+        useMmap: nil,
+        extras: nil
       )
       let runner = try await Leap.load(url: ggufURL, options: options, autoDetectCompanionFiles: false)
       ```
 
-      Auto-detection picks up sibling `mmproj-*.gguf` (vision) and audio decoder files (`.gguf`/`.bin` whose name contains "audio" and "decoder"). New code should prefer `loadSimpleModel(model: ModelSource(...))` for race-free, explicit wiring.
+      New code should prefer `loadSimpleModel(model: ModelSource(...))` for race-free, explicit wiring.
     </Accordion>
   </Tab>
   <Tab title="Kotlin (all platforms)">
@@ -407,7 +430,7 @@ Useful for onboarding flows that prefetch over Wi-Fi or staging models you'll lo
     }
 
     public struct DownloadedModelManifest {
-      public let manifest: ModelManifest
+      public let manifest: Manifest
       public let localModelPath: String
       public let localMultimodalProjectorPath: String?
       public let localAudioDecoderPath: String?
@@ -418,22 +441,15 @@ Useful for onboarding flows that prefetch over Wi-Fi or staging models you'll lo
   </Tab>
   <Tab title="Kotlin (Android)">
     ```kotlin
-    suspend fun downloadModel(
-        modelName: String,
-        quantizationType: String,
-        progress: ((ProgressData) -> Unit)? = null,
-    ): Manifest
-
-    // Background variant (WorkManager): fire-and-forget, returns immediately
+    // Enqueues a unique WorkManager download worker and returns after staging it.
     suspend fun requestDownloadModel(modelName: String, quantizationType: String, forceDownload: Boolean = false)
     suspend fun requestStopDownload(modelName: String, quantizationType: String)
     suspend fun queryStatus(modelName: String, quantizationType: String): ModelDownloadStatus
-    fun observeDownloadProgress(modelName: String, quantizationType: String): Flow<ProgressData>
+    fun observeDownloadProgress(modelName: String, quantizationType: String): StateFlow<ModelDownloadProgress?>
     fun getModelResourceFolder(modelName: String, quantizationType: String): File
-    suspend fun requestStopService()
     ```
 
-    The background variant runs on WorkManager and survives app restarts. See [Utilities → Android background staging](./utilities) for the full status-polling lifecycle.
+    Android `LeapModelDownloader` does not expose a foreground-only `downloadModel(...)`. Use `requestDownloadModel(...)` to prefetch by enqueuing the WorkManager downloader, or `loadModel(...)` when you want download + load in one call. The queued worker survives app restarts. See [Utilities → Android background staging](./utilities) for the full status-polling lifecycle.
   </Tab>
   <Tab title="Kotlin (JVM / native)">
     ```kotlin
@@ -458,7 +474,7 @@ Per-load runtime overrides. Default values come from the model bundle's manifest
   <Tab title="Swift (iOS / macOS)">
     ```swift
     public struct LiquidInferenceEngineOptions {
-      public var bundlePath: String
+      public let bundlePath: String
       public let cacheOptions: LiquidCacheOptions?
       public let cpuThreads: UInt32?
       public let contextSize: UInt32?
@@ -468,20 +484,30 @@ Per-load runtime overrides. Default values come from the model bundle's manifest
       public let audioTokenizerPath: String?
       public let audioDecoderUseGpu: Bool       // default false
       public let chatTemplate: String?
+      public let useMmap: Bool?
       public let extras: String?
     }
 
-    // Manifest-based variant — accepts cacheOptions + contextSize without bundlePath
+    // Manifest-based variant — used with downloader.loadModel(...). No bundlePath
+    // (the downloader supplies it) and no companion-path / mmap fields (the manifest
+    // pins those). Only cache + tuning fields are exposed:
     public struct LiquidInferenceEngineManifestOptions {
       public let cacheOptions: LiquidCacheOptions?
+      public let cpuThreads: UInt32?
       public let contextSize: UInt32?
-      // …same companion-file and tuning fields…
+      public let nGpuLayers: UInt32?
+      public let audioDecoderUseGpu: Bool       // default false
+      public let chatTemplate: String?
+      public let extras: String?
     }
     ```
 
     Pass `LiquidInferenceEngineManifestOptions` to `ModelDownloader.loadModel(modelName:, quantizationType:, options:, ...)` for manifest-based loads, and `LiquidInferenceEngineOptions` to `Leap.load(url:, options:)` for sideloaded GGUFs:
 
     ```swift
+    // Manifest-based load (preferred — LiquidInferenceEngineManifestOptions has a
+    // SKIE-bundled convenience init in ConvenienceExtensions.swift that lets you
+    // pass just the fields you care about):
     let manifestOpts = LiquidInferenceEngineManifestOptions(
       contextSize: 8192,
       cpuThreads: 6
@@ -491,25 +517,17 @@ Per-load runtime overrides. Default values come from the model bundle's manifest
       quantizationType: "Q4_K_M",
       options: manifestOpts
     )
-
-    // Sideloaded variant (URL-based)
-    let options = LiquidInferenceEngineOptions(
-      bundlePath: ggufURL.path,
-      cpuThreads: 6,
-      contextSize: 8192
-    )
-    let runner = try await Leap.load(url: ggufURL, options: options)
     ```
 
-    **Builder style.** Chain `.with(...)` on `GenerationOptions`, `LiquidInferenceEngineOptions`, or `LiquidInferenceEngineManifestOptions`:
+    **Builder style on the manifest variant** — `LiquidInferenceEngineManifestOptions` exposes `.with(...)` chains that match the Kotlin builder surface:
 
     ```swift
-    let opts = LiquidInferenceEngineOptions(bundlePath: ggufURL.path)
+    let opts = LiquidInferenceEngineManifestOptions(contextSize: 8192)
         .with(cpuThreads: 6)
-        .with(contextSize: 8192)
-        .with(useMmap: false)
         .with(cacheOptions: .enabled(path: cacheDir.path))
     ```
+
+    **Sideloaded `LiquidInferenceEngineOptions` (URL-based load).** The non-manifest variant does not ship a Swift convenience init in v0.10.7: the K/N-generated designated init takes all 12 fields, and there is no `LiquidInferenceEngineOptions(bundlePath: …)` 1-arg form today. Either build it fully (verbose) or use `loadSimpleModel(model: ModelSource(...))` on `ModelDownloader` (preferred for new code; see the Sideloaded files section). The builder `.with(...)` overloads exist, but each one creates a new instance through the same 12-arg init, so you still need a fully built starting instance.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
@@ -523,7 +541,6 @@ Per-load runtime overrides. Default values come from the model bundle's manifest
         var extras: String? = null,
     ) {
         companion object {
-            fun build(action: ModelLoadingOptions.() -> Unit): ModelLoadingOptions
             fun cacheOptions(path: String, maxEntriesDisk: Int = 40): EngineOptions.CacheOptions
         }
     }
@@ -536,10 +553,10 @@ Per-load runtime overrides. Default values come from the model bundle's manifest
             modelName = "LFM2-1.2B",
             quantizationId = "Q5_K_M",
         ),
-        options = ModelLoadingOptions.build {
-            cpuThreads = 6
-            contextSize = 4096
-        }
+        options = ModelLoadingOptions(
+            cpuThreads = 6,
+            contextSize = 4096,
+        )
     )
     ```
 
@@ -583,6 +600,7 @@ data class SamplingParameters(
     val topP: Double? = null,
     val minP: Double? = null,
     val repetitionPenalty: Double? = null,
+    val topK: Int? = null,
 )
 ```
 
@@ -758,9 +776,9 @@ The service requires the `POST_NOTIFICATIONS` runtime permission (Android 13+) t
 
 ### Notes
 
-- The service ignores caller-supplied `cacheDir` paths (it maintains its own KV cache directory) — pass `cacheOptions` on `ModelLoadingOptions` to control the in-memory + disk caps, not the path.
+- The service does not accept caller-supplied `cacheOptions`; it maintains its own KV cache directory and policy. `LeapModelDownloader` forwards first-class load options such as `cpuThreads`, `randomSeed`, `chatTemplate`, `contextSize`, `extras`, and `useMmap`, but intentionally omits `cacheOptions` from the AIDL parcel. Use `forceLocal = true` when you need caller-controlled KV cache settings.
 - First-load wins: when multiple apps request the same model simultaneously, the first call's `ModelLoadingOptions` are applied; subsequent callers receive the shared runner regardless of their options. Read the effective config back via `LeapServiceClient.getLoadedModelConfig`.
-- Models stay loaded until the service is shut down or restarted. `evictUnusedModel` is a no-op by design — eviction would race with in-flight generations.
+- Models stay loaded until the service is shut down or restarted. The service has no public mid-flight eviction API — caller-driven eviction would race with in-flight generations.
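+
+The first-load-wins rule above can be pictured with a small sketch. This is not the service's implementation; `LoadOptions` and `SharedRunnerRegistry` are hypothetical stand-ins, and a real service would also need to synchronize concurrent callers.
+
+```kotlin
+// Hypothetical stand-ins: none of these names come from the LEAP SDK.
+data class LoadOptions(val cpuThreads: Int, val contextSize: Int)
+
+class SharedRunnerRegistry {
+    private val effective = mutableMapOf<String, LoadOptions>()
+
+    // Stores the first caller's options and returns the options actually in effect.
+    fun load(modelName: String, requested: LoadOptions): LoadOptions =
+        effective.getOrPut(modelName) { requested }
+}
+
+fun main() {
+    val registry = SharedRunnerRegistry()
+    val first = registry.load("LFM2-1.2B", LoadOptions(cpuThreads = 6, contextSize = 4096))
+    val second = registry.load("LFM2-1.2B", LoadOptions(cpuThreads = 2, contextSize = 8192))
+    println(second == first) // prints "true": the later caller's options were ignored
+}
+```
+
+This is why reading the effective configuration back matters: a caller that asked for `contextSize = 8192` may be talking to a runner loaded with `4096`.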
 
 ## `ProgressData` / `Manifest`
 
diff --git a/deployment/on-device/sdk/openai-client.mdx b/deployment/on-device/sdk/openai-client.mdx
index 09432202..b63416b2 100644
--- a/deployment/on-device/sdk/openai-client.mdx
+++ b/deployment/on-device/sdk/openai-client.mdx
@@ -9,7 +9,7 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs —
 
 - **Hybrid on-device + cloud routing.** Run small / fast models on-device with `LeapSDK`, fall back to a larger cloud model for hard prompts.
 - **Standardised cloud API.** Talk to any OpenAI-compatible backend without pulling in a heavier OpenAI SDK.
-- **Streaming first.** SSE streaming is the only mode — non-streaming requests aren't exposed (`stream = true` is the default).
+- **Streaming first.** SSE streaming is the only mode — non-streaming requests aren't exposed. `streamChatCompletion(...)` forces `stream = true` on the outgoing request regardless of the `stream` field on the `ChatCompletionRequest` you pass in.
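+
+The forced-streaming behavior amounts to overwriting one field before the request is sent. A minimal sketch, assuming a request type with a `stream` flag; `SketchRequest` and `prepareOutgoing` are hypothetical names, not the SDK's types:
+
+```kotlin
+// Hypothetical mirror of a chat-completions request; not the SDK's ChatCompletionRequest.
+data class SketchRequest(val model: String, val stream: Boolean = false)
+
+// Whatever the caller set, the outgoing copy always asks for an SSE stream.
+fun prepareOutgoing(request: SketchRequest): SketchRequest = request.copy(stream = true)
+
+fun main() {
+    val outgoing = prepareOutgoing(SketchRequest(model = "gpt-4o-mini", stream = false))
+    println(outgoing.stream) // prints "true"
+}
+```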
 
 ## Add the dependency
 
@@ -19,7 +19,7 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs —
 
     ```swift
     dependencies: [
-        .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.6")
+        .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.7")
     ]
 
     targets: [
@@ -37,22 +37,32 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs —
   <Tab title="Android (Gradle)">
     ```kotlin
     dependencies {
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
-      implementation("ai.liquid.leap:leap-openai-client:0.10.6")
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-openai-client:0.10.7")
     }
     ```
 
     Bundles an OkHttp-engine Ktor client. No extra HTTP setup needed.
   </Tab>
-  <Tab title="JVM / native (Gradle)">
+  <Tab title="JVM (Gradle)">
     ```kotlin
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
-        implementation("ai.liquid.leap:leap-openai-client:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-openai-client:0.10.7")
     }
     ```
 
-    Bundles the CIO Ktor engine on JVM, and platform-appropriate engines on Linux native / Windows native. Maven users: use `leap-openai-client-jvm` for the JVM artifact.
+    JVM support landed in v0.10.7; the `jvm` slice was absent from v0.10.0 through v0.10.6. Pure-Maven JVM projects should depend on the `-jvm` artifact directly: `ai.liquid.leap:leap-openai-client-jvm:0.10.7`. The JVM artifact bundles the CIO Ktor engine.
+  </Tab>
+  <Tab title="Kotlin/Native (Gradle)">
+    ```kotlin
+    dependencies {
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-openai-client:0.10.7")
+    }
+    ```
+
+    Covers the Kotlin/Native targets `linuxX64`, `linuxArm64`, and `mingwX64` (Windows native), plus the Kotlin/Wasm target `wasmJs` (browser via the Ktor Js engine, added in v0.10.7).
   </Tab>
 </Tabs>
 
@@ -60,10 +70,23 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs —
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    <Warning>
+    The `leap-sdk-openai-client` Kotlin module does **not** apply the SKIE plugin in v0.10.7 (only `leap-sdk`, `leap-sdk-model-downloader`, and `leap-ui` do). That means `Flow<ChatCompletionEvent>` is **not** bridged to a Swift `AsyncSequence` and the `onEnum(of:)` helper is **not** generated for `ChatCompletionEvent`. Swift consumers on v0.10.7 must collect the Kotlin `Flow` through its native collector and downcast each event with `as?`. For most Swift apps that just need cloud chat completions, an off-the-shelf OpenAI Swift client is more ergonomic — use `LeapOpenAIClient` from Swift only if you need to share Kotlin code with Android.
+
+    **Coming in the next release:** SKIE will be enabled on `leap-sdk-openai-client`, adding the same Swift-friendly surface as `LeapSDK` — `for try await event in client.streamChatCompletion(...)`, `onEnum(of: event)` exhaustive switching, and nested-class Swift names (`ChatCompletionEvent.Delta` instead of the current flattened `ChatCompletionEventDelta`). Swift convenience inits and builders for `OpenAiClientConfig` are also planned. Pin to v0.10.7 if you need the current behavior frozen; otherwise expect the more ergonomic surface to land soon.
+    </Warning>
+
+    Manual collection pattern (the `Flow<ChatCompletionEvent>.collect(...)` shape varies by Kotlin/Native version — check the framework header in your Xcode build for the exact label):
+
     ```swift
     import LeapOpenAIClient
 
-    let client = OpenAiClient(
+    // The Kotlin top-level `fun OpenAiClient(config: OpenAiClientConfig)` exports as
+    // `OpenAiClientKt.OpenAiClient(config:)` (PascalCase preserved from the Kotlin
+    // function name). Without SKIE the K/N export also flattens Kotlin's nested
+    // class names — `ChatMessage.User` → `ChatMessageUser`,
+    // `ChatCompletionEvent.Delta` → `ChatCompletionEventDelta`, etc.
+    let client = OpenAiClientKt.OpenAiClient(
         config: OpenAiClientConfig(
             apiKey: "sk-…",
             baseUrl: "https://api.openai.com/v1"
@@ -73,24 +96,25 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs —
     let request = ChatCompletionRequest(
         model: "gpt-4o-mini",
         messages: [
-            ChatMessage.System(content: "You are a helpful assistant."),
-            ChatMessage.User(content: "What is the capital of Japan?")
+            ChatMessageSystem(content: "You are a helpful assistant."),
+            ChatMessageUser(content: "What is the capital of Japan?")
         ],
         temperature: 0.7
     )
 
-    for try await event in client.streamChatCompletion(request: request) {
-        switch onEnum(of: event) {
-        case .delta(let d):
-            print(d.content, terminator: "")
-        case .done(let d):
-            if let usage = d.usage {
-                print("\nTokens: \(usage.totalTokens)")
+    // Pseudocode: the collector signature depends on your Kotlin/Native version and framework
+    // headers, and `FlowCollector { ... }` stands in for whatever collector type they declare. Without SKIE, there is no `for try await` integration.
+    try await client.streamChatCompletion(request: request).collect(
+        collector: FlowCollector { event in
+            if let delta = event as? ChatCompletionEventDelta {
+                print(delta.content, terminator: "")
+            } else if let done = event as? ChatCompletionEventDone {
+                if let usage = done.usage { print("\nTokens: \(usage.totalTokens)") }
+            } else if let err = event as? ChatCompletionEventError {
+                print("\nError: \(err.message)")
             }
-        case .error(let e):
-            print("\nError: \(e.message)")
         }
-    }
+    )
 
     client.close()  // closes the underlying URLSession-backed HttpClient
     ```
@@ -157,7 +181,11 @@ data class OpenAiClientConfig(
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    let client = OpenAiClient(
+    // The leap-sdk-openai-client module has no SKIE plugin applied, so the
+    // top-level Kotlin `fun OpenAiClient(config:)` factory is exported as
+    // `OpenAiClientKt.OpenAiClient(config:)`. See the warning under "Basic usage"
+    // for the full reasoning.
+    let client = OpenAiClientKt.OpenAiClient(
         config: OpenAiClientConfig(
             apiKey: "sk-or-…",
             baseUrl: "https://openrouter.ai/api/v1",
@@ -190,7 +218,7 @@ data class OpenAiClientConfig(
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    let client = OpenAiClient(
+    let client = OpenAiClientKt.OpenAiClient(
         config: OpenAiClientConfig(
             apiKey: "anything",  // Required by config but typically unused
             baseUrl: "http://10.0.0.42:8000/v1"
@@ -242,7 +270,7 @@ data class ChatCompletionRequest(
 
 ## Response shape
 
-`streamChatCompletion(request)` returns an `AsyncSequence<ChatCompletionEvent>` (Swift) / `Flow<ChatCompletionEvent>` (Kotlin):
+`streamChatCompletion(request)` returns a `Flow<ChatCompletionEvent>` in Kotlin. In v0.10.7 Swift sees the same `Flow` unbridged: this module does not apply SKIE yet, so there is no Swift `AsyncSequence`, and you collect it through the native `Flow.collect(...)` shape shown above. Events:
 
 | Variant | Meaning |
 |---|---|
@@ -276,15 +304,24 @@ Route simple prompts to a small on-device LFM; escalate harder prompts to a clou
 
         func send(_ text: String, useCloud: Bool) async throws {
             if useCloud {
+                // Cloud path: leap-sdk-openai-client has no SKIE — collect the Kotlin
+                // Flow manually and downcast each event with `as?`. Note the flattened
+                // Swift type names (`ChatMessageUser`, `ChatCompletionEventDelta`).
                 let request = ChatCompletionRequest(
                     model: "gpt-4o-mini",
-                    messages: [ChatMessage.User(content: text)]
+                    messages: [ChatMessageUser(content: text)]
+                )
+                try await cloud.streamChatCompletion(request: request).collect(
+                    collector: FlowCollector { event in
+                        if let delta = event as? ChatCompletionEventDelta {
+                            appendChunk(delta.content)
+                        }
+                    }
                 )
-                for try await event in cloud.streamChatCompletion(request: request) {
-                    if case let .delta(d) = onEnum(of: event) { appendChunk(d.content) }
-                }
             } else {
-                let userMessage = LeapModelDownloader.ChatMessage(role: .user, content: [.text(text)])
+                // On-device path: leap-sdk has SKIE — `for try await` + `onEnum(of:)`
+                // work as written.
+                let userMessage = ChatMessage(role: .user, textContent: text)
                 for try await response in onDevice.generateResponse(message: userMessage) {
                     if case let .chunk(c) = onEnum(of: response) { appendChunk(c.text) }
                 }
@@ -300,7 +337,7 @@ Route simple prompts to a small on-device LFM; escalate harder prompts to a clou
   <Tab title="Kotlin (Android)">
     ```kotlin
     import ai.liquid.leap.Conversation
-    import ai.liquid.leap.MessageResponse
+    import ai.liquid.leap.message.MessageResponse
     import ai.liquid.leap.openai.ChatCompletionEvent
     import ai.liquid.leap.openai.ChatCompletionRequest
     import ai.liquid.leap.openai.ChatMessage as CloudChatMessage
@@ -375,7 +412,7 @@ See [Cloud AI Comparison](./cloud-ai-comparison) for a side-by-side feature brea
 
 ## Lifecycle
 
-The platform `OpenAiClient(config:)` factory creates an `HttpClient` internally and ties it to the returned client — call `close()` when you're done.
+The platform `OpenAiClient(config:)` factory (Kotlin `fun OpenAiClient(config:)` → Swift `OpenAiClientKt.OpenAiClient(config:)`) creates an `HttpClient` internally and ties it to the returned client — call `close()` when you're done.
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
@@ -383,7 +420,7 @@ The platform `OpenAiClient(config:)` factory creates an `HttpClient` internally
     deinit { client.close() }
     ```
 
-    The lower-level constructor that accepts an externally-managed `HttpClient` is part of the Kotlin/Ktor surface and isn't a useful entry point from Swift — the Ktor engine machinery isn't bridged into the public Swift API. Use `OpenAiClient(config:)` and let the SDK own the session. If multiple consumers share a client, share the `OpenAiClient` instance and `close()` once at teardown.
+    The lower-level constructor that accepts an externally-managed `HttpClient` is part of the Kotlin/Ktor surface and isn't a useful entry point from Swift — the Ktor engine machinery isn't bridged into the public Swift API. Use `OpenAiClientKt.OpenAiClient(config:)` and let the SDK own the session. If multiple consumers share a client, share the `OpenAiClient` instance and `close()` once at teardown.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
diff --git a/deployment/on-device/sdk/quick-start.mdx b/deployment/on-device/sdk/quick-start.mdx
index a9d5e40c..3496557e 100644
--- a/deployment/on-device/sdk/quick-start.mdx
+++ b/deployment/on-device/sdk/quick-start.mdx
@@ -3,9 +3,32 @@ title: "Quick Start"
 description: "Install the LEAP SDK on iOS, macOS, Android, JVM, Linux, or Windows — same API everywhere."
 ---
 
-Latest version: `v0.10.6`
+Latest version: `v0.10.7`
 
-The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conversation` / `MessageResponse` API runs on every supported target. The code differs only in **language** (Swift vs. Kotlin) and **packaging** (SPM, Gradle, or Kotlin/Native plugin) — the call shapes are identical.
+## What is the LEAP SDK?
+
+The **LEAP SDK** is Liquid AI's official on-device inference SDK and the **only SDK with first-class support for [Liquid Foundation Models](https://www.liquid.ai/blog/liquid-foundation-models-our-first-series-of-generative-ai-models) (LFMs)**: LFM2, LFM2.5 (text, thinking, JP, VL), and LFM2.5-Audio. "First-class" means every published Liquid checkpoint is supported, validated, and shipped through this SDK on day one, because the same team that trains the models ships the engine, sampler defaults, chat templates, and tool-call parsers that run them. There is no separate adapter layer, no community port, no upstream-rebase lag.
+
+It's also a Kotlin Multiplatform library: the same `ModelRunner` / `Conversation` / `MessageResponse` API runs on iOS, macOS, Android, JVM desktop, Linux native, Windows native, and (preview) wasmJs. The Swift surface is generated through Kotlin/Native + SKIE and ships as XCFrameworks; the Android/JVM surface ships as Maven Central artifacts. The call shapes are identical on both surfaces; only the language and packaging differ.
+
+### What "first-class support for Liquid models" gets you
+
+- **Day-one model coverage.** New LFM checkpoints land in the SDK release that announces them — no waiting for a generic runtime to catch up to a new architecture, no manual quant conversion, no template-mismatch debugging. The [LEAP Model Library](https://leap.liquid.ai/models) is the canonical distribution path and the SDK pulls directly from it.
+- **Per-checkpoint validated defaults.** The sampling parameters baked into each model's bundle manifest (`sampling_parameters` under `generation_time_parameters` in each `<Quant>.json` on [LiquidAI/LeapBundles](https://huggingface.co/LiquidAI/LeapBundles)) are the values the training team validated for that exact checkpoint. The SDK applies them automatically — no `temperature=0.7` placeholder retuning, no token-stream artifacts from the wrong `min_p` / `repetition_penalty`.
+- **LFM-native special tokens and chat templates.** The shipped engine knows how to filter LFM control tokens before they reach your stream, applies the right chat template per checkpoint, and parses LFM's Hermes and pythonic function-call dialects out of the box. Generic SDKs treat these as opaque text and surface raw tokens; LEAP surfaces typed `MessageResponse.FunctionCalls` with parsed argument maps.
+- **Multimodal LFMs in one API.** Vision (LFM2-VL family) and audio (LFM2.5-Audio) plug into the same `ChatMessage` / `ChatMessageContent` types you already use for text. Image inputs travel as JPEG bytes; audio travels as WAV blobs (or raw float32 PCM on Kotlin via `AudioPcmF32`). Output `MessageResponse.AudioSample` streams float32 PCM frames for audio-out checkpoints. No separate runtime per modality.
+- **Constrained generation, end-to-end.** Kotlin annotations (`@Generatable` / `@Guide` on `@Serializable` data classes) and Swift macros (`@Generatable` / `@Guide` synthesizing `jsonSchema()` at compile time) produce JSON Schemas the engine enforces at decode time. The model's output is guaranteed to parse into your type.
+- **One-call model fetching from the LEAP Model Library.** The platform downloader's `loadModel(...)` (`ModelDownloader` in Swift, `LeapModelDownloader` on Android, `LeapDownloader` on JVM and Kotlin/Native) resolves a manifest, downloads the right GGUF + matching `mmproj`/audio-decoder companion files for the checkpoint, caches them on disk, and hands back a `ModelRunner`: one call, no manual path wiring, no companion-file detection. Background-safe on iOS (`URLSessionConfiguration.background(withIdentifier:)`), WorkManager-backed on Android (survives app restarts).
+
+### Other features
+
+- **On-device by default.** No cloud round-trip, no per-token cost, full privacy, full offline operation.
+- **KV cache reuse for fast multi-turn.** Bounded-LRU disk + memory `CacheOptions` skip the prefill step for shared prompt prefixes, so time to first token (TTFT) on a long system prompt or RAG preamble drops from seconds to under a hundred milliseconds on cache hits. Disabled by default; opt in with `LiquidCacheOptions.enabled(path:)` / `ModelLoadingOptions.cacheOptions(path = ...)`.
+- **Memory-mapped weight loading.** `use_mmap=true` has been the default since v0.10.4. Model weights are file-backed rather than anonymous RSS, so iOS jetsam and Android LMK score the app much lower under memory pressure, cold load returns as soon as the file is mapped, and warm reloads stream from the kernel page cache.
+- **Hybrid on-device + cloud routing.** `leap-openai-client` ships alongside the SDK in each release as an opt-in OpenAI-compatible chat-completions client (OpenAI, OpenRouter, vLLM, llama-server). One binary, two code paths: route small/fast prompts on-device and fall back to a cloud model for hard ones. The cloud client uses its own `ChatMessage` type (`ai.liquid.leap.openai.ChatMessage`), which is separate from the on-device `ChatMessage`.
+- **Drop-in voice assistant UI.** `leap-ui` ships a Compose Multiplatform voice widget — animated orb, mic button, status label, state machine — that pairs with `VoiceConversation` to wire LFM2.5-Audio into a working voice experience without writing the recording-and-playback plumbing yourself.
+
+Implementation deep-dives: [Model Loading](/deployment/on-device/sdk/model-loading), [Conversation & Generation](/deployment/on-device/sdk/conversation-generation), [Constrained Generation](/deployment/on-device/sdk/constrained-generation), [Function Calling](/deployment/on-device/sdk/function-calling), [Voice Assistant Widget](/deployment/on-device/sdk/voice-assistant), [OpenAI-Compatible Client](/deployment/on-device/sdk/openai-client).
 
 <Info>
 **Migrating from 0.9.x?** v0.10.0 unifies the SDK into a single Kotlin Multiplatform distribution published from [`Liquid4All/leap-sdk`](https://github.com/Liquid4All/leap-sdk). The standalone `Liquid4All/leap-ios` repo is no longer the source-of-truth. See the [SDK changelog](/deployment/on-device/leap-sdk-changelog#0-9-x-0-10-x-kotlin-multiplatform-unification) for the transition story and drop-in replacements for legacy `Leap.load(...)` / `LiquidEngine(...)` call sites.
@@ -16,7 +39,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
 <Tabs>
   <Tab title="iOS / macOS">
     - Xcode 16.0+ with Swift 6.0.
-    - iOS **17.0+** or macOS **15.0+** (Mac Catalyst 17.0+ also supported).
+    - iOS **17.0+** or macOS **15.0+** (Apple Silicon only — Mac Catalyst is **not** supported; the shipped XCFrameworks contain only `ios-arm64`, `ios-arm64-simulator`, and `macos-arm64` slices, and `Package.swift` declares only `.iOS(.v17)` / `.macOS(.v15)` platforms).
     - A physical iPhone or iPad with at least 3 GB RAM for best performance. The simulator works for development but runs models much slower.
 
     <Warning>
@@ -68,7 +91,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
 
     1. In Xcode choose **File → Add Package Dependencies**.
     2. Enter `https://github.com/Liquid4All/leap-sdk.git`.
-    3. Select the `0.10.6` release (or newer).
+    3. Select the `0.10.7` release (or newer).
     4. Add the products you need to your app target.
 
     The package vends five products. Most apps only need one or two:
@@ -94,7 +117,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
     </Info>
 
     <Accordion title="Pin to explicit binary XCFrameworks">
-      For explicit pinning, declare each framework as a `.binaryTarget` in your `Package.swift`. The XCFramework assets live on the `Liquid4All/leap-sdk` v0.10.6 release page — copy the SHA-256 values from there.
+      For explicit pinning, declare each framework as a `.binaryTarget` in your `Package.swift`. The XCFramework assets live on the `Liquid4All/leap-sdk` v0.10.7 release page — copy the SHA-256 values from there.
 
       <Warning>
       The constrained-generation macros (`@Generatable`, `@Guide`) are Swift macros, not XCFrameworks — they ship as the `LeapSDKMacros` source target inside the SPM package and **cannot be installed as a `.binaryTarget`**. If you need them, use the standard SPM package URL above (or add the `LeapSDKMacros` source target separately on top of your binary targets).
@@ -103,23 +126,23 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
       ```swift
       .binaryTarget(
         name: "LeapSDK",
-        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapSDK.xcframework.zip",
-        checksum: "236fb6c897d25fc5804be64edc16a9ee73c26678d02e58dab4a1b77ab2e4898f"
+        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapSDK.xcframework.zip",
+        checksum: "6f2721aa45d7555646f78cbcaedb57aba3d869f56b24d681ad332846e131ae3d"
       ),
       .binaryTarget(
         name: "LeapModelDownloader",
-        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapModelDownloader.xcframework.zip",
-        checksum: "a2a57f9c932ef7005d42b33b69d7a67f0ffb65fb79dffa954be99a0225932a61"
+        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapModelDownloader.xcframework.zip",
+        checksum: "f649aa6c1aa3e87bbeb1073d5aeeb7224879359a24b18eeccc665d24abc725d8"
       ),
       .binaryTarget(
         name: "LeapOpenAIClient",
-        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapOpenAIClient.xcframework.zip",
-        checksum: "b661059af8bfb086931099f8fac9f54e957272d5d6bbc9dd36e3e154fddf8222"
+        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapOpenAIClient.xcframework.zip",
+        checksum: "79bc5443a1cce6fcd4c49c91eeb85727034aaca10d3ef69582c061989c3d9b70"
       ),
       .binaryTarget(
         name: "LeapUi",
-        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapUi.xcframework.zip",
-        checksum: "694f4b8a8d1a8cd9086ce718a9fc15f4e74c442541b983816fd0eef8cecc7875"
+        url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapUi.xcframework.zip",
+        checksum: "f1b198cef88c2a37eaf6dc1f36395d6aed024b0c6c2b43724d942e25b60d22e0"
       ),
       ```
 
@@ -131,14 +154,14 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
 
     ```kotlin
     dependencies {
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
-      implementation("ai.liquid.leap:leap-model-downloader:0.10.6") // Android background downloads
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-model-downloader:0.10.7") // Android background downloads
 
       // Optional: OpenAI-compatible cloud chat client
-      // implementation("ai.liquid.leap:leap-openai-client:0.10.6")
+      // implementation("ai.liquid.leap:leap-openai-client:0.10.7")
 
       // Optional: Voice assistant widget (Compose Multiplatform)
-      // implementation("ai.liquid.leap:leap-ui:0.10.6")
+      // implementation("ai.liquid.leap:leap-ui:0.10.7")
     }
     ```
 
@@ -147,7 +170,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
 
       ```toml
       [versions]
-      leapSdk = "0.10.6"
+      leapSdk = "0.10.7"
 
       [libraries]
       leap-sdk = { module = "ai.liquid.leap:leap-sdk", version.ref = "leapSdk" }
@@ -166,7 +189,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
       ```
     </Accordion>
 
-    Also declare these permissions in `AndroidManifest.xml` — `LeapModelDownloader` runs as a foreground service for reliable downloads:
+    Also declare these permissions in `AndroidManifest.xml` — `LeapModelDownloader.requestDownloadModel(...)` enqueues a WorkManager download worker that runs in the foreground while transferring model files:
 
     ```xml
     <uses-permission android:name="android.permission.INTERNET" />
@@ -191,11 +214,11 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
     }
 
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
 
         // Optional:
-        // implementation("ai.liquid.leap:leap-openai-client:0.10.6")
-        // implementation("ai.liquid.leap:leap-ui:0.10.6") // Compose for Desktop voice widget
+        // implementation("ai.liquid.leap:leap-openai-client:0.10.7")
+        // implementation("ai.liquid.leap:leap-ui:0.10.7") // Compose for Desktop voice widget
     }
     ```
 
@@ -222,11 +245,11 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
     // build.gradle.kts
     plugins {
         kotlin("multiplatform") version "2.3.20"
-        id("ai.liquid.leap.nativelibs") version "0.10.6"
+        id("ai.liquid.leap.nativelibs") version "0.10.7"
     }
 
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
     }
 
     kotlin {
@@ -242,7 +265,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver
 
 ## 3. Load a model
 
-The recommended path is **manifest-based** loading. On every platform, the platform downloader's `loadModel(...)` downloads (if needed) and loads in one call — `LeapModelDownloader.loadModel(...)` on iOS / macOS / Android, `LeapDownloader.loadModel(...)` on JVM and Linux / Windows Kotlin/Native. All paths fetch from the [LEAP Model Library](https://leap.liquid.ai/models) on first use and load from cache thereafter.
+The recommended path is **manifest-based** loading. On every platform, the platform downloader's `loadModel(...)` downloads (if needed) and loads in one call — `ModelDownloader.loadModel(...)` on iOS / macOS, `LeapModelDownloader.loadModel(...)` on Android, and `LeapDownloader.loadModel(...)` on JVM and Linux / Windows Kotlin/Native. All paths fetch from the [LEAP Model Library](https://leap.liquid.ai/models) on first use and load from cache thereafter.
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
@@ -298,8 +321,8 @@ The recommended path is **manifest-based** loading. On every platform, the platf
     import androidx.lifecycle.viewModelScope
     import ai.liquid.leap.Conversation
     import ai.liquid.leap.ModelRunner
-    import ai.liquid.leap.model_downloader.LeapModelDownloader
-    import ai.liquid.leap.model_downloader.LeapModelDownloaderNotificationConfig
+    import ai.liquid.leap.downloader.LeapModelDownloader
+    import ai.liquid.leap.downloader.LeapModelDownloaderNotificationConfig
     import kotlinx.coroutines.Dispatchers
     import kotlinx.coroutines.flow.MutableStateFlow
     import kotlinx.coroutines.flow.StateFlow
@@ -313,7 +336,7 @@ The recommended path is **manifest-based** loading. On every platform, the platf
             notificationConfig = LeapModelDownloaderNotificationConfig.build {
                 notificationTitleDownloading = "Downloading AI model..."
                 notificationTitleDownloaded = "Model ready!"
-                notificationContentDownloading = "Please wait while the model downloads"
+                notificationContentDownloadingTemplate = "Please wait while the model downloads"
             }
         )
 
@@ -351,8 +374,8 @@ The recommended path is **manifest-based** loading. On every platform, the platf
   </Tab>
   <Tab title="Kotlin (JVM / native)">
     ```kotlin
-    import ai.liquid.leap.LeapDownloader
-    import ai.liquid.leap.LeapDownloaderConfig
+    import ai.liquid.leap.manifest.LeapDownloader
+    import ai.liquid.leap.manifest.LeapDownloaderConfig
     import ai.liquid.leap.message.ChatMessage
     import ai.liquid.leap.message.MessageResponse
     import kotlinx.coroutines.runBlocking
@@ -372,7 +395,9 @@ The recommended path is **manifest-based** loading. On every platform, the platf
 
         val conversation = runner.createConversation(systemPrompt = "You are a helpful assistant.")
 
-        conversation.generateResponse(ChatMessage.user("Hello!")).collect { resp ->
+        conversation.generateResponse(
+            ChatMessage(ChatMessage.Role.USER, "Hello!")
+        ).collect { resp ->
             when (resp) {
                 is MessageResponse.Chunk -> print(resp.text)
                 is MessageResponse.Complete -> println("\n[done]")
@@ -431,12 +456,16 @@ Both platforms expose the same streaming shape: an async sequence of `MessageRes
     func send(_ text: String) {
         guard let conversation else { return }
         generationTask?.cancel()
-        let userMessage = ChatMessage(role: .user, content: [.text(text)])
+        let userMessage = ChatMessage(role: .user, textContent: text)
+        let options = GenerationOptions()
+            .with(temperature: 0.3)
+            .with(minP: 0.15)
+            .with(repetitionPenalty: 1.05)
         generationTask = Task { [weak self] in
             do {
                 for try await response in conversation.generateResponse(
                     message: userMessage,
-                    generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
+                    generationOptions: options
                 ) {
                     self?.handle(response)
                 }
@@ -479,7 +508,7 @@ Both platforms expose the same streaming shape: an async sequence of `MessageRes
                 ?.onEach { response ->
                     when (response) {
                         is MessageResponse.Chunk -> _responseText.value += response.text
-                        is MessageResponse.ReasoningChunk -> Log.d(TAG, "Reasoning: ${response.text}")
+                        is MessageResponse.ReasoningChunk -> Log.d(TAG, "Reasoning: ${response.reasoning}")
                         is MessageResponse.FunctionCalls -> handleFunctionCalls(response.functionCalls)
                         is MessageResponse.AudioSample -> audioRenderer.enqueue(response.samples, response.sampleRate)
                         is MessageResponse.Complete -> Log.d(TAG, "Done. Stats: ${response.stats}")
@@ -501,22 +530,28 @@ Cancel the in-flight task (Swift) or coroutine job (Kotlin) to interrupt generat
 If the loaded model is multimodal (and its companion files were detected), you can attach a non-text part — an image, a WAV blob, or raw PCM samples — alongside the text in a `ChatMessage`.
 
 <Info>
-**Multimodality is model-specific.** Most multimodal models we ship are text + one other modality: text + vision (the VLM family) or text + audio (the audio family) — not both in the same checkpoint. Send `.image(...)` parts only to a vision-capable model, and `.audio(...)` / `.fromFloatSamples(...)` parts only to an audio-capable model. Mixing modalities a model wasn't trained on will either fail to load the companion file or produce nonsense. Check the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up a non-text input path.
+**Multimodality is model-specific.** Most multimodal models we ship are text + one other modality: text + vision (the VLM family) or text + audio (the audio family) — not both in the same checkpoint. Send image content (`fromJPEGData(_:)`, `image(url:)`, `fromBitmap(...)` / `fromUIImage(_:)`) only to a vision-capable model, and audio content (`fromWAVData(_:)`, `fromFloatSamples(_:sampleRate:)`) only to an audio-capable model. Mixing modalities a model wasn't trained on will either fail to load the companion file or produce nonsense. Check the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up a non-text input path.
 </Info>
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
     ```swift
-    // Text + image (vision-capable model)
+    // Text + image (vision-capable model). Use `ChatMessageContent.fromJPEGData(_:)`
+    // for raw JPEG bytes, or `.image(url:)` for a data URL / remote URL.
     let imageMessage = ChatMessage(
       role: .user,
-      content: [.text("Describe what you see."), .image(jpegData)]
+      content: [.text("Describe what you see."), ChatMessageContent.fromJPEGData(jpegData)],
+      reasoningContent: nil,
+      functionCalls: nil
     )
 
-    // Text + WAV audio (audio-capable model)
+    // Text + WAV audio (audio-capable model). `fromWAVData` validates the header;
+    // use `.audio(data:format:)` if you already know the bytes are a supported format.
     let wavMessage = ChatMessage(
       role: .user,
-      content: [.text("Transcribe and summarize this clip."), .audio(wavData)]
+      content: [.text("Transcribe and summarize this clip."), ChatMessageContent.fromWAVData(wavData)],
+      reasoningContent: nil,
+      functionCalls: nil
     )
 
     // Text + raw PCM samples (audio-capable model)
@@ -525,14 +560,17 @@ If the loaded model is multimodal (and its companion files were detected), you c
       content: [
         .text("Give feedback on my pronunciation."),
         ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)
-      ]
+      ],
+      reasoningContent: nil,
+      functionCalls: nil
     )
     ```
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
     // Text + image (vision-capable model)
-    val imageMessage = ChatMessage.user(
+    val imageMessage = ChatMessage(
+        role = ChatMessage.Role.USER,
         content = listOf(
             ChatMessageContent.Text("Describe what you see."),
             ChatMessageContent.Image(jpegBytes)
@@ -540,7 +578,8 @@ If the loaded model is multimodal (and its companion files were detected), you c
     )
 
     // Text + WAV audio (audio-capable model)
-    val wavMessage = ChatMessage.user(
+    val wavMessage = ChatMessage(
+        role = ChatMessage.Role.USER,
         content = listOf(
             ChatMessageContent.Text("Transcribe and summarize this clip."),
             ChatMessageContent.Audio(wavBytes)
@@ -548,7 +587,8 @@ If the loaded model is multimodal (and its companion files were detected), you c
     )
 
     // Text + raw PCM samples (audio-capable model)
-    val pcmMessage = ChatMessage.user(
+    val pcmMessage = ChatMessage(
+        role = ChatMessage.Role.USER,
         content = listOf(
             ChatMessageContent.Text("Give feedback on my pronunciation."),
             ChatMessageContent.AudioPcmF32(samples, sampleRate = 16000)
diff --git a/deployment/on-device/sdk/utilities.mdx b/deployment/on-device/sdk/utilities.mdx
index 4766013a..e57ad7c1 100644
--- a/deployment/on-device/sdk/utilities.mdx
+++ b/deployment/on-device/sdk/utilities.mdx
@@ -9,14 +9,14 @@ This page covers error types, serialization helpers, and a few platform-specific
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
-    Errors surface as `LeapError` values. The most common cases:
+    Errors are subclasses of `LeapException` (`LeapError` is a type alias for `LeapException` provided for backward compatibility). The most common subclasses:
 
-    - **`LeapError.modelLoadingFailure`** — problems reading or validating the model bundle.
-    - **`LeapError.generationFailure`** — unexpected native inference errors.
-    - **`LeapError.promptExceedContextLengthFailure`** — prompt length exceeded the configured context size.
-    - **`LeapError.serializationFailure`** — JSON encoding/decoding problems on chat history or function calls.
+    - **`LeapModelLoadingException`** — problems reading or validating the model bundle.
+    - **`LeapGenerationException`** — unexpected native inference errors.
+    - **`LeapGenerationPromptExceedContextLengthException`** — prompt length exceeded the configured context size.
+    - **`LeapSerializationException`** — JSON encoding/decoding problems on chat history or function calls.
 
-    Handle thrown errors with `do` / `catch` on async streams, or use `onErrorCallback` on the lower-level callback APIs.
+    Handle thrown errors with `do` / `catch` on the async streams returned by `Conversation.generateResponse(...)`, or downcast with `if let err = error as? LeapModelLoadingException { ... }` to inspect a specific subclass.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     All errors are subclasses of `LeapException`:
@@ -38,19 +38,21 @@ This page covers error types, serialization helpers, and a few platform-specific
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
-    Use the JSON initializers directly on `ChatMessage` and `ChatMessageContent`:
+    Use `Conversation.exportToJSON()` to get an OpenAI-shaped JSON string, then route restores back through Kotlin's serializer (there is no `ChatMessage(from: [String: Any])` initializer):
 
     ```swift
-    // Serialize the conversation history
-    let payload: [[String: Any]] = try conversation.exportToJSON()
-    let data = try JSONSerialization.data(withJSONObject: payload, options: [])
-
-    // Round-trip a single message
-    let json: [String: Any] = ["role": "user", "content": "Hello"]
-    let message = try ChatMessage(from: json)
+    // Serialize the conversation history (compact JSON string, OpenAI chat-completions shape)
+    let jsonString: String = conversation.exportToJSON()
+    let data: Data = Data(jsonString.utf8)
+
+    // Restore: `LeapJson.decodeFromString(...)` is the Kotlin-side decoder.
+    // For Swift-only round trips, persist `jsonString`, rebuild the
+    // `[ChatMessage]` list from it via your shared Kotlin code (or a
+    // server-side round trip through your sync backend), then pass that list
+    // to `modelRunner.createConversationFromHistory(history:)`.
     ```
 
-    Persist `data` to disk, UserDefaults, or your sync backend. On restore, decode it back to `[[String: Any]]`, map each entry through `ChatMessage(from:)`, and rebuild via `modelRunner.createConversationFromHistory(history:)`.
+    Persist `data` to disk, UserDefaults, or your sync backend. On restore, decode the JSON via Kotlin's `LeapJson` (re-exported through SKIE) into a `[ChatMessage]` and rebuild via `modelRunner.createConversationFromHistory(history:)`. There is no Swift-native dictionary-based `ChatMessage` initializer — the Kotlin serializer is the source of truth on both platforms.
   </Tab>
   <Tab title="Kotlin (all platforms)">
     The SDK uses [kotlinx.serialization](https://github.com/Kotlin/kotlinx.serialization) — `@Serializable` is already declared on the relevant types in the core SDK.
@@ -102,11 +104,11 @@ This page covers error types, serialization helpers, and a few platform-specific
 This section is **Android-only**. iOS / macOS callers use the Swift `ModelDownloader` (shipped in the `LeapModelDownloader` SPM product), which routes transfers through `URLSession` — see [Model Loading → Constructing the downloader](./model-loading#constructing-the-downloader) for background-session configuration. The cross-platform `LeapDownloader` (used directly on JVM, Linux native, Windows native) is a plain async fetcher with no platform background-service hooks.
 </Info>
 
-Beyond the high-level `loadModel` / `loadSimpleModel` / `downloadModel` methods covered in [Model Loading](./model-loading), the Android `LeapModelDownloader` exposes a few lower-level methods for background staging, status polling, and service control.
+Beyond the high-level `loadModel` / `loadSimpleModel` methods covered in [Model Loading](./model-loading), the Android `LeapModelDownloader` exposes a few lower-level methods for WorkManager background staging and status polling.
 
 ### Permission setup
 
-The downloader runs as a [foreground service](https://developer.android.com/develop/background-work/services/fgs) and displays notifications. Declare these in your `AndroidManifest.xml`:
+`requestDownloadModel(...)` enqueues a WorkManager download worker. During transfer, the worker runs in the foreground and displays notifications, so declare these in your `AndroidManifest.xml`:
 
 ```xml
 <uses-permission android:name="android.permission.INTERNET" />
@@ -141,43 +143,61 @@ if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
 class LeapModelDownloader(
     private val context: Context,
     modelFileDir: File? = null,
-    private val extraHTTPRequestHeaders: Map<String, String> = mapOf(),
     private val notificationConfig: LeapModelDownloaderNotificationConfig = LeapModelDownloaderNotificationConfig(),
+    private val downloaderConfig: LeapDownloaderConfig = LeapDownloaderConfig(),
+    private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO,
 ) {
-    fun requestDownloadModel(modelName: String, quantizationType: String, forceDownload: Boolean = false)
-    fun requestStopDownload(modelName: String, quantizationType: String)
+    suspend fun requestDownloadModel(modelName: String, quantizationType: String, forceDownload: Boolean = false)
+    suspend fun requestStopDownload(modelName: String, quantizationType: String)
     suspend fun queryStatus(modelName: String, quantizationType: String): ModelDownloadStatus
-    fun observeDownloadProgress(modelName: String, quantizationType: String): Flow<ProgressData>
+    fun observeDownloadProgress(modelName: String, quantizationType: String): StateFlow<ModelDownloadProgress?>
     fun getModelResourceFolder(modelName: String, quantizationType: String): File
-    fun requestStopService()
-}
 
-sealed interface ModelDownloadStatus {
-    data object NotOnLocal : ModelDownloadStatus
-    data class DownloadInProgress(
-        val totalSizeInBytes: Long,
-        val downloadedSizeInBytes: Long,
-    ) : ModelDownloadStatus
-    data class Downloaded(val totalSizeInBytes: Long) : ModelDownloadStatus
+    @Deprecated("No longer needed with WorkManager - downloads are managed automatically")
+    suspend fun requestStopService()
+
+    // `ModelDownloadStatus` is nested under `LeapModelDownloader`.
+    sealed interface ModelDownloadStatus {
+        data object NotOnLocal : ModelDownloadStatus
+        data class DownloadInProgress(
+            val totalSizeInBytes: Long,
+            val downloadedSizeInBytes: Long,
+        ) : ModelDownloadStatus
+        data class Downloaded(val totalSizeInBytes: Long) : ModelDownloadStatus
+    }
+
+    class ModelDownloadProgress {
+        var totalSizeInBytes: Long
+        var downloadedSizeInBytes: Long
+        val progress: Double
+    }
 }
 ```
 
-- **`requestDownloadModel`** — fire-and-forget download via WorkManager. Returns immediately; the download survives app restarts.
-- **`requestStopDownload`** — cancel an in-flight background download.
-- **`queryStatus`** — one-shot status check.
-- **`observeDownloadProgress`** — `Flow<ProgressData>` for UI updates during a background download.
+Refer to the nested status type as `LeapModelDownloader.ModelDownloadStatus.NotOnLocal` / `.DownloadInProgress` / `.Downloaded` on Android — the Android downloader does not expose a top-level `ai.liquid.leap.downloader.ModelDownloadStatus`. (Apple ships a top-level `ai.liquid.leap.downloader.ModelDownloadStatus` `sealed interface` with a different payload — `DownloadInProgress(progress: Double)` and a `data object Downloaded` with no size — so don't share status-decoding code unmodified across platforms.)
+
+- **`requestDownloadModel`** — `suspend` fire-and-forget prefetch. It enqueues a unique WorkManager download worker; the download itself survives app restarts, and the call returns after staging the work request.
+- **`requestStopDownload`** — `suspend`; cancels an in-flight background download.
+- **`queryStatus`** — `suspend` one-shot status check.
+- **`observeDownloadProgress`** — `StateFlow<ModelDownloadProgress?>` for UI updates during a background download. It emits `null` when no download is active.
 - **`getModelResourceFolder`** — the directory the SDK will use for this model+quantization on disk.
-- **`requestStopService`** — gracefully stop the foreground service (it auto-stops when no work is queued, but you can force it).
+- **`requestStopService`** — `@Deprecated` no-op since v0.10.6 (WorkManager handles the worker lifecycle automatically). Kept for source compatibility; new code shouldn't call it.
 
 ### Removing a downloaded model
 
-Use the cross-platform `LeapDownloader.deleteModelResources(...)` to clean up disk:
+Use the Android downloader's resource folder to clean up disk, or construct a cross-platform `LeapDownloader` with the same `saveDir` and call its instance method `deleteModelResources(...)`:
 
 ```kotlin
-LeapDownloader.deleteModelResources(
+val resourceFolder = downloader.getModelResourceFolder(
+    modelName = "LFM2-1.2B",
+    quantizationType = "Q5_K_M",
+)
+resourceFolder.deleteRecursively()
+
+// Alternative to the two calls above (run one or the other) when you know the saveDir:
+LeapDownloader(LeapDownloaderConfig(saveDir = resourceFolder.parentFile!!.absolutePath)).deleteModelResources(
     modelName = "LFM2-1.2B",
     quantizationType = "Q5_K_M",
-    baseDir = baseDir, // same dir LeapModelDownloader / LeapDownloader was configured with
 )
 ```
 
@@ -198,14 +218,17 @@ A minimal end-to-end snippet exercising load → conversation → tool registrat
     )
     let conversation = runner.createConversation(systemPrompt: "You are a travel assistant.")
 
-    conversation.registerFunction(weatherFunction)
+    conversation.registerFunction(function: weatherFunction)
 
-    var options = GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
-    try options.setResponseFormat(type: TripRecommendation.self)
+    let options = GenerationOptions()
+      .with(temperature: 0.3)
+      .with(minP: 0.15)
+      .with(repetitionPenalty: 1.05)
+      .with(jsonSchema: TripRecommendation.jsonSchema())
 
     let userMessage = ChatMessage(
       role: .user,
-      content: [.text("Plan a 3-day trip to Kyoto with food highlights")]
+      textContent: "Plan a 3-day trip to Kyoto with food highlights"
     )
 
     for try await response in conversation.generateResponse(
@@ -218,7 +241,11 @@ A minimal end-to-end snippet exercising load → conversation → tool registrat
   </Tab>
   <Tab title="Kotlin (all platforms)">
     ```kotlin
-    val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir))
+    // `LeapDownloaderConfig.saveDir` is a `String` (filesystem path), so on Android
+    // pass `cacheDir.absolutePath`, not the `File` itself. Android apps should
+    // prefer `LeapModelDownloader(application)`; the cross-platform `LeapDownloader`
+    // also works there, but doesn't integrate with WorkManager.
+    val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir.absolutePath))
     val runner = downloader.loadModel(
         modelName = "LFM2.5-1.2B-Instruct",
         quantizationType = "Q4_K_M"
@@ -231,10 +258,10 @@ A minimal end-to-end snippet exercising load → conversation → tool registrat
         temperature = 0.3f
         minP = 0.15f
         repetitionPenalty = 1.05f
-        setResponseFormatType(TripRecommendation::class)
+        setResponseFormatType<TripRecommendation>()
     }
 
-    val userMessage = ChatMessage.user("Plan a 3-day trip to Kyoto with food highlights")
+    val userMessage = ChatMessage(ChatMessage.Role.USER, "Plan a 3-day trip to Kyoto with food highlights")
 
     conversation.generateResponse(userMessage, options).onEach(::process).collect()
     ```
diff --git a/deployment/on-device/sdk/voice-assistant.mdx b/deployment/on-device/sdk/voice-assistant.mdx
index 1784dd0c..14f18848 100644
--- a/deployment/on-device/sdk/voice-assistant.mdx
+++ b/deployment/on-device/sdk/voice-assistant.mdx
@@ -11,7 +11,7 @@ The `leap-ui` module (introduced in v0.10.0) ships a ready-to-use voice assistan
 - **macOS** — bridged to AppKit via `VoiceAssistantNSViewController`. SwiftUI hosts via `NSViewControllerRepresentable` + `NSHostingController`.
 - **Android** — direct Compose for Android.
 - **JVM Desktop** — Compose for Desktop. Same Maven artifact; you provide audio I/O implementations (the demo apps in `leap-ui-demo/` ship patterns you can adapt).
-- **Web (Wasm, experimental)** — present in the source tree (`leap-ui-demo/web`) but not yet covered by the v0.10.6 stable release notes — treat as preview.
+- **Web (Wasm, experimental)** — present in the source tree (`leap-ui-demo/web`) but not yet covered by the stable release notes through v0.10.7 — treat as preview.
 
 ## Add the dependency
 
@@ -21,7 +21,7 @@ The `leap-ui` module (introduced in v0.10.0) ships a ready-to-use voice assistan
 
     ```swift
     dependencies: [
-        .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.6")
+        .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.7")
     ]
 
     targets: [
@@ -46,12 +46,12 @@ The `leap-ui` module (introduced in v0.10.0) ships a ready-to-use voice assistan
   <Tab title="Android / JVM (Gradle)">
     ```kotlin
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
-        implementation("ai.liquid.leap:leap-ui:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-ui:0.10.7")
     }
     ```
 
-    `leap-ui` brings in Compose runtime, foundation, and material3 transitively. If your project doesn't already use Compose, add the standard Compose dependencies too.
+    `leap-ui` depends on Compose runtime, foundation, and material3 internally (with `implementation` scope), so the artifacts are available at runtime but their APIs are not on your module's compile classpath. If your project uses Compose directly, declare the same Compose dependencies in your own module.
   </Tab>
 </Tabs>
 
@@ -103,9 +103,11 @@ The `VoiceConversation` adapter looks similar on every platform — both impleme
                     modelName: "LFM2.5-Audio-1.5B",
                     quantizationType: "Q4_0",
                     downloadProgress: { fraction, _ in
+                        // `fraction` is a `Double` from the Kotlin `(Double, Long) -> Unit`
+                        // closure; `setModelProgress(fraction:message:)` takes a `Float`, so convert.
                         Task { @MainActor in
                             self.store.setModelProgress(
-                                fraction: fraction,
+                                fraction: Float(fraction),
                                 message: "Downloading (\(Int(fraction * 100))%)"
                             )
                         }
@@ -136,7 +138,7 @@ The `VoiceConversation` adapter looks similar on every platform — both impleme
   </Tab>
   <Tab title="Kotlin (Android)">
     ```kotlin
-    import ai.liquid.leap.model_downloader.LeapModelDownloader
+    import ai.liquid.leap.downloader.LeapModelDownloader
     import ai.liquid.leap.ui.VoiceAssistantIntent
     import ai.liquid.leap.ui.VoiceAssistantStore
     import ai.liquid.leap.ui.VoiceAssistantStoreState
@@ -258,6 +260,7 @@ The `VoiceConversation` adapter looks similar on every platform — both impleme
   <Tab title="Kotlin (Android / Compose Desktop)">
     ```kotlin
     import ai.liquid.leap.ui.VoiceAssistantWidget
+    import android.os.Bundle
     import androidx.activity.ComponentActivity
     import androidx.activity.compose.setContent
     import androidx.compose.foundation.background
@@ -299,8 +302,11 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
 
 <Tabs>
   <Tab title="Swift (iOS / macOS)">
+    The `VoiceConversation` protocol comes from `LeapUi`, so its `audioSamples` and `onAudioChunk` parameters use `LeapUi.KotlinFloatArray` / `LeapUi.KotlinInt`, not native Swift `[Float]` / `Int32`. The on-device runner lives in `LeapSDK`, which has its own `LeapSDK.KotlinFloatArray`. Bridge between the two via the `floatArrayToNSData` / `nsDataToFloatArray` helpers exposed in both frameworks (see `leap-ui-demo/shared/AppleVoiceConversation.swift` for the canonical pattern).
+
     ```swift
     import LeapModelDownloader
+    import LeapSDK
     import LeapUi
 
     final class AppleVoiceConversation: VoiceConversation {
@@ -310,21 +316,34 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
             self.conversation = conversation
         }
 
+        // Note: this method is `__generateResponse` in the SKIE-generated overlay
+        // because `LeapUi` and `LeapSDK` are separate frameworks with separate Kotlin
+        // runtimes; taking Kotlin runtime types as parameters forces the underscore prefix.
         func generateResponse(
-            audioSamples: [Float],
+            audioSamples: LeapUi.KotlinFloatArray,
             sampleRate: Int32,
-            onAudioChunk: @escaping (_ samples: [Float], _ sampleRate: Int32) -> Void
-        ) async throws -> GenerationStats? {
+            onAudioChunk: @escaping (LeapUi.KotlinFloatArray, LeapUi.KotlinInt) -> Void
+        ) async throws -> Leap_sdkGenerationStats? {
+            // LeapUi.KotlinFloatArray -> Swift [Float] (for use inside this method body):
+            let nsData = LeapUi.ArrayConversionsKt.floatArrayToNSData(array: audioSamples)
+            let samples: [Float] = nsData.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
+
+            let audioContent = ChatMessageContent.fromFloatSamples(samples, sampleRate: Int(sampleRate))
             let userMessage = ChatMessage(
                 role: .user,
-                content: [ChatMessageContent.fromFloatSamples(audioSamples, sampleRate: Int(sampleRate))]
+                content: [audioContent as ChatMessageContent],
+                reasoningContent: nil,
+                functionCalls: nil
             )
 
-            var stats: GenerationStats?
+            var stats: Leap_sdkGenerationStats?
             for try await response in conversation.generateResponse(message: userMessage) {
                 switch onEnum(of: response) {
                 case .audioSample(let chunk):
-                    onAudioChunk(chunk.samples, Int32(chunk.sampleRate))
+                    // Bridge LeapSDK.KotlinFloatArray -> LeapUi.KotlinFloatArray via NSData.
+                    let data = LeapSDK.ArrayConversionsKt.floatArrayToNSData(array: chunk.samples)
+                    let uiSamples = LeapUi.ArrayConversionsKt.nsDataToFloatArray(data: data)
+                    onAudioChunk(uiSamples, LeapUi.KotlinInt(value: chunk.sampleRate))
                 case .complete(let c):
                     stats = c.stats
                 case .chunk, .reasoningChunk, .functionCalls:
@@ -335,7 +354,9 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
         }
 
         func reset() -> VoiceConversation {
-            AppleVoiceConversation(conversation: conversation.modelRunner.createConversation())
+            AppleVoiceConversation(
+                conversation: conversation.modelRunner.createConversation(systemPrompt: nil)
+            )
         }
     }
     ```
@@ -343,11 +364,11 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
   <Tab title="Kotlin (all platforms)">
     ```kotlin
     import ai.liquid.leap.Conversation
-    import ai.liquid.leap.MessageResponse
+    import ai.liquid.leap.audio.FloatAudioBuffer
     import ai.liquid.leap.message.ChatMessage
     import ai.liquid.leap.message.ChatMessageContent
     import ai.liquid.leap.message.GenerationStats
-    import ai.liquid.leap.message.encodePcm16Wav
+    import ai.liquid.leap.message.MessageResponse
     import ai.liquid.leap.ui.VoiceConversation
 
     class LeapVoiceConversation(private val conv: Conversation) : VoiceConversation {
@@ -357,10 +378,10 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
             sampleRate: Int,
             onAudioChunk: (samples: FloatArray, sampleRate: Int) -> Unit,
         ): GenerationStats? {
-            val wavBytes = encodePcm16Wav(audioSamples, sampleRate)
+            // Send raw float32 PCM directly — no WAV re-encode needed.
             val userMessage = ChatMessage(
                 role = ChatMessage.Role.USER,
-                content = listOf(ChatMessageContent.Audio(wavBytes)),
+                content = listOf(ChatMessageContent.AudioPcmF32(audioSamples, sampleRate)),
             )
 
             var stats: GenerationStats? = null
diff --git a/examples/android/leap-koog-agent.mdx b/examples/android/leap-koog-agent.mdx
index 6bf77ad0..4049c64d 100644
--- a/examples/android/leap-koog-agent.mdx
+++ b/examples/android/leap-koog-agent.mdx
@@ -71,9 +71,9 @@ Before running this example, ensure you have the following:
 
 <Accordion title="Minimum SDK Requirements">
   This example requires:
-  - **Minimum SDK**: API 24 (Android 7.0)
-  - **Target SDK**: API 34 or higher
-  - **Kotlin**: 1.9.0 or higher
+  - **Minimum SDK**: API 31 (Android 12)
+  - **Target SDK**: API 36
+  - **Kotlin**: 2.3.0 or higher
 
   **Hardware recommendations:**
   - At least 4GB RAM (agents require more memory for reasoning)
@@ -87,17 +87,17 @@ Before running this example, ensure you have the following:
   # Ensure device is connected
   adb devices
 
-  # Create directory
-  adb shell mkdir -p /tmp/models
+  # Create directory (world-readable so the app can read it)
+  adb shell mkdir -p /data/local/tmp/liquid/
 
   # Push the GGUF model file
-  adb push lfm2-1.2b-q5_k_m.gguf /tmp/models/
+  adb push lfm2-1.2b-q5_k_m.gguf /data/local/tmp/liquid/
 
   # Verify deployment
-  adb shell ls -lh /tmp/models/
+  adb shell ls -lh /data/local/tmp/liquid/
   ```
 
-  **Note:** The path `/tmp/models` is used in this example. If you deploy to a different location, update the `modelPath` in your app code accordingly. The example snippets below use `loadSimpleModel(model: ModelSource(...))` to load the sideloaded file; switch to `loadModel(modelName:, quantizationType:)` if you'd rather have the SDK download the model automatically.
+  **Note:** Apps cannot read `/tmp/` on Android. Use `/data/local/tmp/<your-namespace>/` for ADB-pushed assets, as the other Android examples do. If you deploy to a different location, update the `modelPath` in your app code accordingly. The example snippets below use `loadSimpleModel(model: ModelSource(...))` to load the sideloaded file; switch to `loadModel(modelName:, quantizationType:)` if you'd rather have the SDK download the model automatically.
 </Accordion>
 
 <Accordion title="Dependencies Setup">
@@ -106,8 +106,8 @@ Before running this example, ensure you have the following:
   ```kotlin
   dependencies {
       // LeapSDK for on-device AI (0.10.0+)
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
-      implementation("ai.liquid.leap:leap-model-downloader:0.10.6")
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
 
       // Koog framework for AI agents
       implementation("ai.koog:koog-agents:0.5.0")
@@ -139,9 +139,9 @@ Follow these steps to build and run AI agents on Android:
    cd LeapSDK-Examples/Android/LeapKoogAgent
    ```
 
-2. **Deploy the model bundle**
+2. **Deploy the model**
    - Follow the ADB commands in the setup section above
-   - Ensure the bundle is at `/tmp/models/lfm2-1.2b-tool.bundle`
+   - Ensure the GGUF is at `/data/local/tmp/liquid/lfm2-1.2b-q5_k_m.gguf`
 
 3. **Open in Android Studio**
    - Launch Android Studio
@@ -176,7 +176,7 @@ Load the LEAP model first, then bridge it to a Koog agent. The Koog APIs below a
 ```kotlin
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.manifest.ModelSource
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 
 class AgentViewModel(application: Application) : AndroidViewModel(application) {
     private val downloader = LeapModelDownloader(application)
@@ -188,7 +188,7 @@ class AgentViewModel(application: Application) : AndroidViewModel(application) {
             // Sideloaded GGUF that was pushed via ADB (see Model Setup).
             runner = downloader.loadSimpleModel(
                 model = ModelSource(
-                    modelPath = "/tmp/models/lfm2-1.2b-tool.gguf",
+                    modelPath = "/data/local/tmp/liquid/lfm2-1.2b-q5_k_m.gguf",
                     modelName = "LFM2-1.2B",
                     quantizationId = "Q5_K_M",
                 ),
diff --git a/examples/android/recipe-generator-constrained-output.mdx b/examples/android/recipe-generator-constrained-output.mdx
index 20272ccf..67efce03 100644
--- a/examples/android/recipe-generator-constrained-output.mdx
+++ b/examples/android/recipe-generator-constrained-output.mdx
@@ -63,9 +63,9 @@ Before running this example, ensure you have the following:
 
 <Accordion title="Minimum SDK Requirements">
   This example requires:
-  - **Minimum SDK**: API 24 (Android 7.0)
-  - **Target SDK**: API 34 or higher
-  - **Kotlin**: 1.9.0 or higher
+  - **Minimum SDK**: API 31 (Android 12)
+  - **Target SDK**: API 36
+  - **Kotlin**: 2.3.0 or higher
   - **LeapSDK**: 0.10.0 or higher
   - **Internet connectivity**: Required for first-time model download
 </Accordion>
@@ -105,8 +105,8 @@ Before running this example, ensure you have the following:
   ```kotlin
   dependencies {
       // LeapSDK + the Android downloader module
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
-      implementation("ai.liquid.leap:leap-model-downloader:0.10.6")
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
 
       // Kotlin serialization for type-safe parsing
       implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.0")
@@ -148,7 +148,7 @@ Follow these steps to generate structured recipes:
 
 3. **Gradle sync**
    - Wait for Gradle to sync all dependencies
-   - Ensure LeapSDK 0.10.6 is downloaded
+   - Ensure LeapSDK 0.10.7 is downloaded
 
 4. **Run the app**
    - Connect your Android device or start an emulator
@@ -207,13 +207,12 @@ data class Ingredient(
 Annotate the data class with `@Generatable`. LeapSDK derives the JSON schema from the Kotlin types and enforces it during generation — no hand-written schema string required.
 
 ```kotlin
-import ai.liquid.leap.Generatable
-import ai.liquid.leap.Guide
+import ai.liquid.leap.structuredoutput.Generatable
+import ai.liquid.leap.structuredoutput.Guide
 import kotlinx.serialization.Serializable
 
-@Generatable
 @Serializable
-@Guide("A complete recipe with metadata, ingredients, and instructions.")
+@Generatable("A complete recipe with metadata, ingredients, and instructions.")
 data class Recipe(
     val name: String,
     val description: String,
@@ -227,8 +226,8 @@ data class Recipe(
     val tags: List<String>,
 )
 
-@Generatable
 @Serializable
+@Generatable("A single recipe ingredient with amount and unit.")
 data class Ingredient(
     val item: String,
     val amount: String,
@@ -243,7 +242,7 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import android.app.Application
 import androidx.lifecycle.AndroidViewModel
 import androidx.lifecycle.viewModelScope
@@ -283,7 +282,7 @@ class MainActivityViewModel(application: Application) : AndroidViewModel(applica
 
 ### Generate Structured Recipes
 
-`GenerationOptions.build { setResponseFormatType(Recipe::class) }` tells the engine to constrain the stream to the schema derived from `@Generatable`. The streamed `Chunk` values arrive as JSON; concatenate them and decode at the end with `kotlinx-serialization`.
+`GenerationOptions.build { setResponseFormatType<Recipe>() }` tells the engine to constrain the stream to the schema derived from `@Generatable`. The streamed `Chunk` values are fragments of a single JSON document; concatenate them and decode the result at the end with `kotlinx-serialization`.
 
 ```kotlin
 fun generateRecipe(userInput: String) {
@@ -304,12 +303,12 @@ fun generateRecipe(userInput: String) {
             temperature = 0.3f
             minP = 0.15f
             repetitionPenalty = 1.05f
-            setResponseFormatType(Recipe::class)
+            setResponseFormatType<Recipe>()
         }
 
         try {
             val buffer = StringBuilder()
-            conversation.generateResponse(ChatMessage.user(prompt), options)
+            conversation.generateResponse(ChatMessage(ChatMessage.Role.USER, prompt), options)
                 .onEach { resp ->
                     if (resp is MessageResponse.Chunk) buffer.append(resp.text)
                 }
diff --git a/examples/android/slogan-generator.mdx b/examples/android/slogan-generator.mdx
index 177450db..0bc295da 100644
--- a/examples/android/slogan-generator.mdx
+++ b/examples/android/slogan-generator.mdx
@@ -37,9 +37,9 @@ Before running this example, ensure you have the following:
 
 <Accordion title="Minimum SDK Requirements">
   This example requires:
-  - **Minimum SDK**: API 24 (Android 7.0)
-  - **Target SDK**: API 34 or higher
-  - **Kotlin**: 1.9.0 or higher
+  - **Minimum SDK**: API 31 (Android 12)
+  - **Target SDK**: API 36
+  - **Kotlin**: 2.3.0 or higher
 </Accordion>
 
 <Accordion title="LeapSDK Dependency Setup">
@@ -47,7 +47,8 @@ Before running this example, ensure you have the following:
 
   ```kotlin
   dependencies {
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
 
       // Android UI components
       implementation("androidx.appcompat:appcompat:1.6.1")
@@ -147,7 +148,7 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import kotlinx.coroutines.MainScope
 import kotlinx.coroutines.flow.collect
 import kotlinx.coroutines.flow.onEach
@@ -191,7 +192,7 @@ class MainActivity : AppCompatActivity() {
                 minP = 0.15f
                 repetitionPenalty = 1.05f
             }
-            conversation.generateResponse(ChatMessage.user(prompt), options)
+            conversation.generateResponse(ChatMessage(ChatMessage.Role.USER, prompt), options)
                 .onEach { resp ->
                     if (resp is MessageResponse.Chunk) {
                         sloganOutput.append(resp.text)
diff --git a/examples/android/vision-language-model-example.mdx b/examples/android/vision-language-model-example.mdx
index 85656f4a..3b8f7586 100644
--- a/examples/android/vision-language-model-example.mdx
+++ b/examples/android/vision-language-model-example.mdx
@@ -22,7 +22,7 @@ The VLMExample showcases cutting-edge multimodal AI capabilities:
 - **On-device Inference** - Complete privacy with local VLM processing
 - **Interactive Q&A** - Ask questions about images and get contextual answers
 
-This example demonstrates the **LFM2-VL-1.6B** model, a vision-language model that can understand and reason about visual content.
+This example demonstrates the **LFM2.5-VL-1.6B** model, a vision-language model that can understand and reason about visual content.
 
 ## What are Vision Language Models?
 
@@ -59,9 +59,9 @@ Before running this example, ensure you have the following:
 
 <Accordion title="Minimum SDK Requirements">
   This example requires:
-  - **Minimum SDK**: API 24 (Android 7.0)
-  - **Target SDK**: API 34 or higher
-  - **Kotlin**: 1.9.0 or higher
+  - **Minimum SDK**: API 31 (Android 12)
+  - **Target SDK**: API 36
+  - **Kotlin**: 2.3.0 or higher
 
   **Hardware recommendations:**
   - At least 4GB RAM (6GB+ recommended for better performance)
@@ -69,7 +69,7 @@ Before running this example, ensure you have the following:
 </Accordion>
 
 <Accordion title="VLM Model Bundle Deployment">
-  This example requires the **LFM2-VL-1.6B** vision language model bundle.
+  This example requires the **LFM2.5-VL-1.6B** vision language model bundle.
 
   **Step 1: Obtain the model bundle**
 
@@ -92,8 +92,8 @@ Before running this example, ensure you have the following:
   ```kotlin
   dependencies {
       // LeapSDK for VLM processing (0.10.0+)
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
-      implementation("ai.liquid.leap:leap-model-downloader:0.10.6")
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
 
       // Coil for image loading
       implementation("io.coil-kt:coil-compose:2.5.0")
@@ -196,8 +196,9 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.ChatMessageContent
+import ai.liquid.leap.message.ImageUtils
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import android.app.Application
 import androidx.lifecycle.AndroidViewModel
 import androidx.lifecycle.viewModelScope
@@ -205,7 +206,6 @@ import kotlinx.coroutines.CoroutineScope
 import kotlinx.coroutines.Dispatchers
 import kotlinx.coroutines.flow.onEach
 import kotlinx.coroutines.launch
-import java.io.ByteArrayOutputStream
 
 class VLMViewModel(application: Application) : AndroidViewModel(application) {
     private val downloader = LeapModelDownloader(application)
@@ -225,16 +225,16 @@ class VLMViewModel(application: Application) : AndroidViewModel(application) {
         val runner = runner ?: return
         viewModelScope.launch(Dispatchers.Default) {
             val bitmap = loadBitmapFromUri(imageUri)
-            val pngBytes = ByteArrayOutputStream().use { out ->
-                bitmap.compress(Bitmap.CompressFormat.PNG, 100, out)
-                out.toByteArray()
-            }
+            // ChatMessageContent.Image expects JPEG bytes; its secondary constructor wraps them
+            // in a `data:image/jpeg;base64,...` URL. Use the SDK's ImageUtils helper rather than
+            // re-encoding by hand.
+            val imageContent = ImageUtils.fromBitmap(bitmap, compressionQuality = 85)
 
             val conversation = runner.createConversation()
             val message = ChatMessage(
                 role = ChatMessage.Role.USER,
                 content = listOf(
-                    ChatMessageContent.Image(pngBytes),
+                    imageContent,
                     ChatMessageContent.Text("Describe this image in detail."),
                 ),
             )
@@ -336,41 +336,67 @@ fun ImageAnalysisDisplay(analysis: ImageAnalysis) {
 
 ### Interactive Q&A Mode
 
-Allow users to ask questions about images:
+Reuse the streaming pipeline above, but parameterize the question. `ImageUtils.fromBitmap(...)` is a suspend function that JPEG-encodes the bitmap internally; its result is combined with the user's question into a single `ChatMessage`:
 
 ```kotlin
-fun askQuestionAboutImage(bitmap: Bitmap, question: String): String {
-    return vlmModel.generateFromImage(
-        image = bitmap,
-        prompt = "Answer this question about the image: $question",
-        maxTokens = 150
+suspend fun askQuestionAboutImage(
+    runner: ModelRunner,
+    bitmap: Bitmap,
+    question: String,
+    options: GenerationOptions,
+): String {
+    val conversation = runner.createConversation()
+    val message = ChatMessage(
+        role = ChatMessage.Role.USER,
+        content = listOf(
+            ImageUtils.fromBitmap(bitmap, compressionQuality = 85),
+            ChatMessageContent.Text("Answer this question about the image: $question"),
+        ),
     )
+
+    val builder = StringBuilder()
+    conversation.generateResponse(message, options).collect { response ->
+        if (response is MessageResponse.Chunk) builder.append(response.text)
+    }
+    return builder.toString()
 }
 
-// Example usage
-val answer1 = askQuestionAboutImage(bitmap, "What is the main object in this image?")
-val answer2 = askQuestionAboutImage(bitmap, "What colors are prominent?")
-val answer3 = askQuestionAboutImage(bitmap, "Is this indoors or outdoors?")
+// Example usage (inside a coroutine):
+// val answer = askQuestionAboutImage(runner, bitmap, "What colors are prominent?", options)
 ```
 
 ### Memory Management
 
-Vision models require more memory. Implement proper lifecycle handling:
+Vision models require more memory. Free the runner when the activity goes to the background by calling `ModelRunner.unload()`:
 
 ```kotlin
+class VLMViewModel(application: Application) : AndroidViewModel(application) {
+    private var runner: ModelRunner? = null
+
+    suspend fun releaseModel() {
+        runner?.unload()
+        runner = null
+    }
+
+    suspend fun initializeModel() {
+        if (runner != null) return // already loaded — don't re-download
+        // ...same loadModel(...) path as above; assign to runner
+    }
+}
+// In the hosting Activity (lifecycleScope is not available inside the ViewModel):
 override fun onStop() {
     super.onStop()
-    // Release model when app goes to background to free memory
-    viewModel.releaseModel()
+    lifecycleScope.launch { viewModel.releaseModel() }
 }
 
 override fun onStart() {
     super.onStart()
-    // Reload model when app returns to foreground
-    viewModel.initializeModel()
+    lifecycleScope.launch { viewModel.initializeModel() }
 }
 ```
 
+`ModelRunner.unload()` is `suspend` (per `ai.liquid.leap.ModelRunner`), so call it from a coroutine scope.
+
 ## Results
 
 The VLMExample demonstrates powerful image understanding capabilities:
diff --git a/examples/android/web-content-summarizer.mdx b/examples/android/web-content-summarizer.mdx
index c5dd6d2a..ab85ae3f 100644
--- a/examples/android/web-content-summarizer.mdx
+++ b/examples/android/web-content-summarizer.mdx
@@ -50,9 +50,9 @@ Before running this example, ensure you have the following:
 
 <Accordion title="Minimum SDK Requirements">
   This example requires:
-  - **Minimum SDK**: API 24 (Android 7.0)
-  - **Target SDK**: API 34 or higher
-  - **Kotlin**: 1.9.0 or higher
+  - **Minimum SDK**: API 31 (Android 12)
+  - **Target SDK**: API 36
+  - **Kotlin**: 2.3.0 or higher
 </Accordion>
 
 <Accordion title="Dependencies Setup">
@@ -61,7 +61,8 @@ Before running this example, ensure you have the following:
   ```kotlin
   dependencies {
       // LeapSDK for AI processing (0.10.0+)
-      implementation("ai.liquid.leap:leap-sdk:0.10.6")
+      implementation("ai.liquid.leap:leap-sdk:0.10.7")
+      implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
 
       // Networking for web scraping
       implementation("com.squareup.okhttp3:okhttp:4.12.0")
@@ -212,7 +213,7 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import kotlinx.coroutines.flow.onEach
 
 // Cache the runner on a ViewModel or singleton so the model loads once.
@@ -238,7 +239,7 @@ suspend fun summarizeContent(
     }
 
     val out = StringBuilder()
-    conversation.generateResponse(ChatMessage.user(prompt), options)
+    conversation.generateResponse(ChatMessage(ChatMessage.Role.USER, prompt), options)
         .onEach { resp ->
             if (resp is MessageResponse.Chunk) out.append(resp.text)
         }
diff --git a/leap/edge-sdk/overview.mdx b/leap/edge-sdk/overview.mdx
deleted file mode 100644
index e0a43df5..00000000
--- a/leap/edge-sdk/overview.mdx
+++ /dev/null
@@ -1,38 +0,0 @@
----
-title: "Overview"
-description: "The LEAP Edge SDK is a native framework for running LFMs (and other open source models) on mobile devices."
----
-
-## Improving access[​](#improving-access "Direct link to Improving access")
-
-Up until now, deploying small language models (SLMs) on mobile devices has been an extremely painful process, generally accessible to only inference engineers or AI/ML programmers.
-
-Written for Android (Kotlin) and iOS (Swift), the goal of the Edge SDK is to make SLM deployment as easy as calling a cloud LLM API endpoint - for any app developer.
-
-## Get started[​](#get-started "Direct link to Get started")
-
-Choose your platform to get started
-
-<CardGroup cols={2}>
-  <Card title="iOS" icon="apple" href="./ios/ios-quick-start-guide">
-    Get started with the LEAP Edge SDK for iOS using Swift. Deploy models directly in your iOS app.
-  </Card>
-
-  <Card title="Android" icon="robot" href="./android/android-quick-start-guide">
-    Get started with the LEAP Edge SDK for Android using Kotlin. Deploy models directly in your Android app.
-  </Card>
-</CardGroup>
-
-## Features[​](#features "Direct link to Features")
-
-The current list of main features includes:
-
-* Model downloading service
-* Chat completion (generation)
-* Constrained generation
-* Function calling
-* Gson support (Android)
-* Image support (for LFM2-VL)
-
-We are consistently adding to this list - see our [changelog](/leap/changelog) for detailed updates.
-