diff --git a/deployment/on-device/leap-sdk-changelog.mdx b/deployment/on-device/leap-sdk-changelog.mdx index cf3c82a4..0c1a4d79 100644 --- a/deployment/on-device/leap-sdk-changelog.mdx +++ b/deployment/on-device/leap-sdk-changelog.mdx @@ -3,7 +3,7 @@ title: "Changelog" description: "Release notes for the LEAP SDK, including the 0.9.x → 0.10.x Kotlin Multiplatform transition." --- -Latest release: **v0.10.6** ([GitHub](https://github.com/Liquid4All/leap-sdk/releases/tag/v0.10.6)). +Latest release: **v0.10.7** ([GitHub](https://github.com/Liquid4All/leap-sdk/releases/tag/v0.10.7)). This page covers user-visible changes in the LEAP SDK across releases. For per-build commit detail, see the release notes on [`Liquid4All/leap-sdk`](https://github.com/Liquid4All/leap-sdk/releases). @@ -37,7 +37,7 @@ v0.10.0 raises the minimum iOS deployment target from 15.0 to **17.0** and macOS - **SPM URL change.** Point your Swift Package Manager dependency at `https://github.com/Liquid4All/leap-sdk.git` (not the deprecated `leap-ios` repo). - **CocoaPods removed.** The SDK ships exclusively through SPM in v0.10.0 onward. - **Toolchain bump.** Xcode 16 and Swift 6.0 are required. -- **`ModelDownloader` → `LeapModelDownloader`.** The downloader class was renamed; update call sites accordingly. See [Model Loading](/deployment/on-device/sdk/model-loading) for the 0.10.x constructor signature. +- **Swift downloader name.** In current 0.10.x, Swift code instantiates `ModelDownloader` from the `LeapModelDownloader` SPM product. Android code still uses the Kotlin class `ai.liquid.leap.downloader.LeapModelDownloader`. See [Model Loading](/deployment/on-device/sdk/model-loading) for the constructor signatures. ## Major additions since 0.9.x @@ -147,6 +147,34 @@ val runner = downloader.loadModel( ## Per-release notes +### v0.10.7 — 2026-05-18 + +KMP target completion for `leap-openai-client` plus a repo-wide bytecode-hardening pass. 
iOS / macOS Swift surface is unchanged from v0.10.6 — this is a Kotlin/JVM ergonomics release for non-Apple consumers. + +**New targets on `leap-openai-client`** ([PR #256](https://github.com/Liquid4All/leap-android-sdk/pull/256)): + +- **`jvm`** (Ktor CIO engine) — Maven Central now publishes `ai.liquid.leap:leap-openai-client-jvm:0.10.7`. Pure-JVM desktop / server apps can route OpenAI-compatible chat completions without dragging in Android or KMP targets. (The 0.10.0 — 0.10.6 SPM cascade only shipped Android + Apple + Linux/MinGW K/N + wasmJs metadata; the JVM slice was absent.) +- **`wasmJs`** (Ktor Js engine) — browser-side chat-completions client matching what `leap-sdk` already targets. + +The Apple slice (`LeapOpenAIClient.xcframework`) ships unchanged — same SSE-stream surface, same `OpenAiClientConfig`, same OpenRouter extra-headers support. SKIE is still not applied to this module in v0.10.7, so the Kotlin/Native exports remain the same as v0.10.6: `Flow` is not bridged to Swift `AsyncSequence`, and `onEnum(of:)` is not generated for `ChatCompletionEvent`. **The next release will enable SKIE on `leap-sdk-openai-client`**, bringing `for try await` over the stream, exhaustive `onEnum(of:)` switching, and SKIE-bundled Swift convenience inits — see the [OpenAI client page](/deployment/on-device/sdk/openai-client) for the current pinning guidance. + +**Bytecode hardening:** + +- The `leap-sdk-jvm`, `leap-openai-client-jvm`, `leap-ui-jvm`, and `leap-ui-android` artifacts had been silently shipping Java 17 / Java 21 bytecode against the project's stated JVM-target-11 stance. All ten published JVM / Android slices now consistently emit class-file major version `0x37` (Java 11). Consumers running on JDK 11 — particularly long-running services and JDK-11-pinned Android Gradle builds — are no longer at risk of `UnsupportedClassVersionError`. 
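One way to confirm the bytecode level of a published artifact is to read the class-file header directly. The following is a minimal Kotlin sketch, assuming a `.class` file has already been extracted from the jar; `classFileMajorVersion` is a hypothetical helper for illustration, not part of the LEAP SDK, and the 8-byte header in `main` is synthetic.

```kotlin
import java.io.ByteArrayInputStream
import java.io.DataInputStream
import java.io.InputStream

// Hypothetical helper (not part of the LEAP SDK): reads the class-file header
// and returns the major version, e.g. 55 (0x37) for Java 11.
fun classFileMajorVersion(input: InputStream): Int {
    val data = DataInputStream(input)
    val magic = data.readInt()
    require(magic == 0xCAFEBABE.toInt()) { "Not a class file" }
    data.readUnsignedShort()        // minor_version, unused here
    return data.readUnsignedShort() // major_version
}

fun main() {
    // Synthetic 8-byte header standing in for a real .class file.
    val header = byteArrayOf(
        0xCA.toByte(), 0xFE.toByte(), 0xBA.toByte(), 0xBE.toByte(),
        0x00, 0x00, // minor_version
        0x00, 0x37, // major_version = 55 (Java 11)
    )
    val major = classFileMajorVersion(ByteArrayInputStream(header))
    println("major version = $major") // prints "major version = 55"
}
```

A value above 55 would indicate bytecode that a JDK 11 runtime rejects with `UnsupportedClassVersionError`.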
+ +**Internal: KMP build centralization** (no consumer-visible API change): + +- Root-level `subprojects { tasks.withType().configureEach { compilerOptions { jvmTarget.set(JvmTarget.JVM_11) } } }` replaces 17 per-site `JVM_11` pins. +- Karma + headless Chrome runner for `wasmJs` targets centralized into the same `subprojects {}` block — replaces 3 per-site copies. Future modules pick up both patterns automatically. + +**Test coverage:** + +- `OpenAiClientTest`'s seven SSE-stream + auth-header + error-event + malformed-chunk cases were promoted from `androidHostTest` to `commonTest`. They now also run on `jvmTest`, `macosArm64Test`, `iosSimulatorArm64Test`, `linuxX64Test`, `mingwX64Test`, and `wasmJsTest`. + +**iOS surface (unchanged from v0.10.6):** + +The four XCFrameworks (`LeapSDK`, `LeapModelDownloader`, `LeapOpenAIClient`, `LeapUi`) ship the same Swift APIs as v0.10.6. The v0.10.6 ObjC class rename to `ModelDownloader`, the dual-import guard, the dynamic `LeapModelDownloader` framework, and the `LeapDownloaderConfig()` parameterless init all remain in place. + ### v0.10.6 — 2026-05-12 iOS `ModelDownloader` (the Swift class formerly known as `LeapModelDownloader` — see the rename note below) reaches parity with the cross-platform `LeapDownloader`. Callers no longer need to pair the two classes to download and load a model on Apple platforms — every entry point routes file transfer through `URLSession` and then hands off to the loader. diff --git a/deployment/on-device/sdk/advanced-features.mdx b/deployment/on-device/sdk/advanced-features.mdx index 74ce2176..88baa9de 100644 --- a/deployment/on-device/sdk/advanced-features.mdx +++ b/deployment/on-device/sdk/advanced-features.mdx @@ -11,25 +11,25 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani + `GenerationOptions` is a Kotlin `data class` bridged into Swift. 
Kotlin parameter defaults don't survive the ObjC bridge, so the canonical Swift idiom is the parameterless init plus chained `.with(...)` builders: + ```swift - public struct GenerationOptions { + public class GenerationOptions { public var temperature: Float? public var topP: Float? public var minP: Float? public var repetitionPenalty: Float? + public var topK: Int32? + public var rngSeed: Int64? public var jsonSchemaConstraint: String? - public var functionCallParser: LeapFunctionCallParserProtocol? - - public init( - temperature: Float? = nil, - topP: Float? = nil, - minP: Float? = nil, - repetitionPenalty: Float? = nil, - jsonSchemaConstraint: String? = nil, - functionCallParser: LeapFunctionCallParserProtocol? = LFMFunctionCallParser() - ) - - public mutating func setResponseFormat(type: T.Type) throws + public var functionCallParser: LeapFunctionCallParser? + public var injectSchemaIntoPrompt: Bool // default true + public var maxTokens: Int32? + public var inlineThinkingTags: Bool // default false + public var enableThinking: Bool // default false + public var extras: String? + + public convenience init() } ``` @@ -40,8 +40,10 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani .with(temperature: 0.3) .with(minP: 0.15) .with(repetitionPenalty: 1.05) - .with(jsonSchema: schemaString) + .with(jsonSchema: CityFact.jsonSchema()) // or any other schema string ``` + + For the legacy compat-class path (`Leap.load(...)` flows), `GenerationOptionsCompat` additionally exposes `setResponseFormat(jsonSchema: String)`. ```kotlin @@ -50,10 +52,17 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani var topP: Float? = null, var minP: Float? = null, var repetitionPenalty: Float? = null, + var topK: Int? = null, + var rngSeed: Long? = null, var jsonSchemaConstraint: String? = null, var functionCallParser: LeapFunctionCallParser? 
= LFMFunctionCallParser(), + var injectSchemaIntoPrompt: Boolean = true, + var maxTokens: Int? = null, + var inlineThinkingTags: Boolean = false, + var enableThinking: Boolean = false, + var extras: String? = null, ) { - fun setResponseFormatType(kClass: KClass<*>) + inline fun setResponseFormatType() companion object { fun build(buildAction: GenerationOptions.() -> Unit): GenerationOptions @@ -63,9 +72,14 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani -- **Sampling fields** — use the model card's recommended values; arbitrary defaults from generic tutorials usually underperform. -- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the `setResponseFormat(type:)` / `setResponseFormatType(...)` helpers instead of writing the schema by hand. +- **Sampling fields** — `temperature`, `topP`, `minP`, `topK`, and `repetitionPenalty`. Use the model bundle's recommended values; arbitrary defaults from generic tutorials usually underperform. +- **`rngSeed`** — deterministic sampling seed for tests and reproducible runs. +- **`maxTokens`** — maximum completion tokens to generate. Prompt tokens do not count toward this cap. +- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the higher-level helpers — Swift `options.with(jsonSchema: T.jsonSchema())` / Kotlin `setResponseFormatType()` — instead of writing the schema by hand. +- **`injectSchemaIntoPrompt`** — when `true` (default), the schema is also appended to the system message for semantic guidance. Set `false` to use only the structural constraint. - **`functionCallParser`** — `LFMFunctionCallParser` (default), `HermesFunctionCallParser()`, or `null`/`nil` to disable parsing and surface raw tool-call text in `Chunk`s. +- **`enableThinking` / `inlineThinkingTags`** — reasoning-mode controls for models that emit `` content. +- **`extras`** — backend-specific JSON payload. 
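The null-means-fallback behaviour of these fields can be pictured with a small sketch. It uses local stand-in types rather than the SDK's own classes, so it runs without the LEAP dependency; `RequestOptions`, `ResolvedOptions`, and `resolve` are hypothetical names, and the merge step is an assumption about the semantics, not the SDK's actual implementation.

```kotlin
// Local stand-ins for illustration only; these are NOT the SDK's types.
data class RequestOptions(
    val temperature: Float? = null,
    val minP: Float? = null,
    val repetitionPenalty: Float? = null,
    val maxTokens: Int? = null,
)

data class ResolvedOptions(
    val temperature: Float,
    val minP: Float,
    val repetitionPenalty: Float,
    val maxTokens: Int?,
)

// Hypothetical merge step: a null per-request field falls back to the default.
fun resolve(request: RequestOptions, defaults: ResolvedOptions) = ResolvedOptions(
    temperature = request.temperature ?: defaults.temperature,
    minP = request.minP ?: defaults.minP,
    repetitionPenalty = request.repetitionPenalty ?: defaults.repetitionPenalty,
    maxTokens = request.maxTokens ?: defaults.maxTokens,
)

fun main() {
    // Default values borrowed from the sampling examples on this page.
    val defaults = ResolvedOptions(
        temperature = 0.3f,
        minP = 0.15f,
        repetitionPenalty = 1.05f,
        maxTokens = null,
    )
    // Only temperature is overridden; every other field keeps its default.
    val resolved = resolve(RequestOptions(temperature = 0.1f), defaults)
    println(resolved)
}
```

Overriding a single field this way leaves the remaining sampling values at whatever the defaults supply, which is why the field list above suggests setting only what you need.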
## Constrained generation utilities @@ -73,11 +87,13 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani ```swift // Compile-time schema synthesis lives in the @Generatable macro. - // For ad-hoc inspection: - let schemaString = try JSONSchemaGenerator.getJSONSchema(for: CityFact.self) + // For ad-hoc inspection (ships in the LeapSDKMacros SPM product): + import LeapSDKMacros + + let schemaString = JSONSchemaGenerator.getJSONSchema(for: CityFact.self) ``` - `JSONSchemaGenerator.getJSONSchema(for:)` returns the same JSON Schema string the macro emits at compile time. Useful when embedding the schema in the prompt itself, or when you want to debug the schema the model is being constrained against. + `JSONSchemaGenerator.getJSONSchema(for:)` is non-throwing — it forwards to the `jsonSchema()` method that the `@Generatable` macro adds to the type, so the schema is produced at compile time. Useful when embedding the schema in the prompt itself, or when you want to debug the schema the model is being constrained against. See [Constrained Generation](./constrained-generation) for the full `@Generatable` / `@Guide` macro reference. @@ -89,11 +105,14 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani object JSONSchemaGenerator { @Throws(LeapGeneratableSchematizationException::class) - fun getJSONSchema(klass: KClass, indentSpaces: Int? = null): String + fun getJSONSchema(serializer: KSerializer, indentSpaces: Int? = null): String + + @Throws(LeapGeneratableSchematizationException::class) + inline fun getJSONSchema(indentSpaces: Int? = null): String } ``` - - `klass` — must be a data class annotated with `@Generatable`. + - `serializer` — the `KSerializer` for a data class annotated with `@Generatable` and `@Serializable`. The reified-`T` overload calls `serializer()` for you. - `indentSpaces` — non-null formats the output with the given indent (pretty-print). 
Throws `LeapGeneratableSchematizationException` if the class can't be translated. @@ -101,16 +120,18 @@ Per-request controls. Leave any field as `null` / `nil` to fall back to the mani ### `GeneratableFactory` ```kotlin + import kotlinx.serialization.json.JsonObject + object GeneratableFactory { @Throws(LeapGeneratableDeserializationException::class) - fun createFromJSONObject(jsonObject: JSONObject, klass: KClass): T + fun createFromJsonObject(jsonObject: JsonObject, serializer: KSerializer): T @Throws(LeapGeneratableDeserializationException::class) - inline fun createFromJSONObject(jsonObject: JSONObject): T + inline fun createFromJsonObject(jsonObject: JsonObject): T } ``` - The reified-`T` overload is a convenience when the target type can be inferred from context. + Note the camelCase `Json` in the method name and the `kotlinx.serialization.json.JsonObject` argument (not `org.json.JSONObject`). The reified-`T` overload is a convenience when the target type can be inferred from context. ### Annotations @@ -179,16 +200,19 @@ The full surface is documented in [Function Calling](./function-calling); the ty val optional: Boolean = false, ) - sealed class LeapFunctionParameterType(description: String? = null) { - val description: String? = description - - class String(val enumValues: List? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description) - class Number(val enumValues: List? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description) - class Integer(val enumValues: List? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description) - class Boolean(description: kotlin.String? = null) : LeapFunctionParameterType(description) - class Null : LeapFunctionParameterType() - class Array(val itemType: LeapFunctionParameterType, description: kotlin.String? = null) : LeapFunctionParameterType(description) - class Object( + sealed class LeapFunctionParameterType(typeDescription: kotlin.String? 
= null) { + var description: kotlin.String? = typeDescription + private set + + // Nested class names carry a `Leap` prefix so they don't shadow `kotlin.String`, + // `kotlin.Number`, etc. at use sites. + class LeapStr(val enumValues: List? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description) + class LeapNum(val enumValues: List? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description) + class LeapInt(val enumValues: List? = null, description: kotlin.String? = null) : LeapFunctionParameterType(description) + class LeapBool(description: kotlin.String? = null) : LeapFunctionParameterType(description) + class LeapNull : LeapFunctionParameterType() + class LeapArr(val itemType: LeapFunctionParameterType, description: kotlin.String? = null) : LeapFunctionParameterType(description) + class LeapObj( val properties: Map, val required: List = listOf(), description: kotlin.String? = null, @@ -210,29 +234,25 @@ Two parser implementations ship with the SDK on every platform: - **`LFMFunctionCallParser`** — default. Handles Liquid Foundation Model (LFM2) Pythonic-style control tokens (`<|tool_call_start|>` / `<|tool_call_end|>`). - **`HermesFunctionCallParser`** — Qwen3 and other models using the [Hermes function-calling format](https://github.com/NousResearch/Hermes-Function-Calling). -Implement `LeapFunctionCallParserProtocol` (Swift) / `LeapFunctionCallParser` (Kotlin) to add support for a new format. +Subclass `LeapFunctionCallParser` (Kotlin `abstract class`, bridged to Swift as a class with the same name) to add support for a new format. -## Backend-specific extras +## Prompt token budgeting -Some runtime utilities are exposed on the concrete `LiquidInferenceEngineRunner` rather than the cross-platform `ModelRunner` protocol/interface. The most common is **prompt token budgeting** — useful when you need to estimate context usage before sending a long request. 
+`getPromptTokensSize(messages:, addBosToken:)` is declared directly on `ModelRunner` — no cast required. Useful when you need to estimate context usage before sending a long request. ```swift - if let engine = runner as? LiquidInferenceEngineRunner { - let count = engine.getPromptTokensSize(messages: history, addBosToken: true) - print("Prompt would consume \(count) tokens") - } + let count = try await runner.getPromptTokensSize(messages: history, addBosToken: true) + print("Prompt would consume \(count) tokens") ``` ```kotlin - (runner as? LiquidInferenceEngineRunner)?.let { engine -> - val count = engine.getPromptTokensSize(messages = history, addBosToken = true) - println("Prompt would consume $count tokens") - } + val count = runner.getPromptTokensSize(messages = history, addBosToken = true) + println("Prompt would consume $count tokens") ``` + + `getPromptTokensSize` is `suspend` — call it from a coroutine. - -These methods are backend-specific and may be elevated to the `ModelRunner` interface in a future release — defensively check the cast. diff --git a/deployment/on-device/sdk/ai-agent-usage-guide.mdx b/deployment/on-device/sdk/ai-agent-usage-guide.mdx index de399d02..bd117c2b 100644 --- a/deployment/on-device/sdk/ai-agent-usage-guide.mdx +++ b/deployment/on-device/sdk/ai-agent-usage-guide.mdx @@ -49,6 +49,9 @@ Every agent has the same shape: send a `ChatMessage`, iterate the response strea Task { await dispatch(call) } } case .audioSample(let audio): + // `audio.samples` is `KotlinFloatArray` — bridge to `[Float]` via + // `LeapSDK.ArrayConversionsKt.floatArrayToNSData(array:)` if your + // renderer expects a Swift array (see the demo in leap-ui-demo/shared/). 
audioPlayer.enqueue(audio.samples, sampleRate: Int(audio.sampleRate)) case .complete(let completion): currentText = "" @@ -88,7 +91,7 @@ The defining feature of an agent: the model emits `FunctionCalls`, you execute t ```swift func agentLoop(initialQuestion: String) async throws { var workingConv = conversation! - var pending = ChatMessage(role: .user, content: [.text(initialQuestion)]) + var pending = ChatMessage(role: .user, textContent: initialQuestion) while true { var toolCalls: [LeapFunctionCall] = [] @@ -107,14 +110,16 @@ The defining feature of an agent: the model emits `FunctionCalls`, you execute t if toolCalls.isEmpty { break } // Agent is done - // Execute tools, append results, loop - let toolMessages = await toolCalls.asyncMap { call in + // Execute tools sequentially, append results, loop. + // (Swift's stdlib has no `asyncMap`; use a plain `for` loop that awaits each call in turn.) + var toolMessages: [ChatMessage] = [] + for call in toolCalls { let result = await runtimeDispatch(call) - return ChatMessage(role: .tool, content: [.text(result)]) + toolMessages.append(ChatMessage(role: .tool, textContent: result)) } let updatedHistory = workingConv.history + toolMessages workingConv = workingConv.modelRunner.createConversationFromHistory(history: updatedHistory) - pending = ChatMessage(role: .user, content: [.text("")]) // empty turn — let the model continue + pending = ChatMessage(role: .user, textContent: "") // empty turn — let the model continue } } ``` @@ -148,7 +153,7 @@ The defining feature of an agent: the model emits `FunctionCalls`, you execute t ) } val updatedHistory = workingConv.history + toolMessages - workingConv = modelRunner.createConversationFromHistory(updatedHistory) + workingConv = workingConv.modelRunner.createConversationFromHistory(updatedHistory) pending = ChatMessage(role = ChatMessage.Role.USER, content = listOf(ChatMessageContent.Text(""))) } } @@ -161,7 +166,7 @@ Define `runtimeDispatch(_:)` as your tool-call → result router: validate 
argum ## Multimodal inputs -**Multimodality is model-specific.** Most multimodal models ship as text + one other modality (vision OR audio), not both. Send `.image(...)` parts only to a vision-capable model and `.audio(...)` parts only to an audio-capable model. Verify on the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up the input. +**Multimodality is model-specific.** Most multimodal models ship as text + one other modality (vision OR audio), not both. Send image parts (Swift `ChatMessageContent.fromJPEGData(_:)` / Kotlin `ImageUtils.fromBitmap(...)`) only to a vision-capable model, and audio parts (Swift `ChatMessageContent.fromWAVData(_:)` / Kotlin `ChatMessageContent.Audio(...)`) only to an audio-capable model. Verify on the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up the input. @@ -170,20 +175,26 @@ Define `runtimeDispatch(_:)` as your tool-call → result router: validate argum // Vision-capable model let imageMessage = ChatMessage( role: .user, - content: [.text("Describe what you see."), .image(jpegData)] + content: [.text("Describe what you see."), ChatMessageContent.fromJPEGData(jpegData)], + reasoningContent: nil, + functionCalls: nil ) // Audio-capable model — WAV blob let audioMessage = ChatMessage( role: .user, - content: [.text("Transcribe."), .audio(wavData)] + content: [.text("Transcribe."), ChatMessageContent.fromWAVData(wavData)], + reasoningContent: nil, + functionCalls: nil ) // Audio-capable model — raw float32 PCM samples (no WAV re-encode) let pcmMessage = ChatMessage( role: .user, content: [.text("How's my pronunciation?"), - ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)] + ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)], + reasoningContent: nil, + functionCalls: nil ) ``` @@ -272,10 +283,14 @@ A `ChatViewModel` that loads the model, registers a tool, drives generation, and generationTask = Task { [weak self] in defer { Task { @MainActor in 
self?.isGenerating = false } } do { - let userMessage = ChatMessage(role: .user, content: [.text(text)]) + let userMessage = ChatMessage(role: .user, textContent: text) + let options = GenerationOptions() + .with(temperature: 0.3) + .with(minP: 0.15) + .with(repetitionPenalty: 1.05) for try await response in conversation.generateResponse( message: userMessage, - generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05) + generationOptions: options ) { await MainActor.run { self?.handle(response) } } @@ -314,11 +329,11 @@ A `ChatViewModel` that loads the model, registers a tool, drives generation, and import androidx.lifecycle.viewModelScope import ai.liquid.leap.Conversation import ai.liquid.leap.GenerationOptions - import ai.liquid.leap.MessageResponse + import ai.liquid.leap.message.MessageResponse import ai.liquid.leap.ModelRunner import ai.liquid.leap.message.ChatMessage import ai.liquid.leap.message.ChatMessageContent - import ai.liquid.leap.model_downloader.LeapModelDownloader + import ai.liquid.leap.downloader.LeapModelDownloader import kotlinx.coroutines.* import kotlinx.coroutines.flow.* @@ -430,14 +445,14 @@ A `ChatViewModel` that loads the model, registers a tool, drives generation, and - **Min SDK 31** (Android 12). - Use a real device for testing — the emulator may crash loading model bundles. - `LeapModelDownloader` (the Android one) requires `POST_NOTIFICATIONS` at runtime on Android 13+ and a few manifest entries — see [Quick Start → Install the SDK](./quick-start#2-install-the-sdk). - - Background downloads use WorkManager + a foreground service; the SDK ships notification configuration via `LeapModelDownloaderNotificationConfig`. + - Background prefetch uses `requestDownloadModel(...)`, which enqueues the WorkManager downloader and runs it as a foreground worker while files transfer. The SDK ships notification configuration via `LeapModelDownloaderNotificationConfig`. 
- For most cases, hold the runner in a `ViewModel` with `viewModelScope`. Unload via `runBlocking(Dispatchers.IO) { runner.unload() }` in `onCleared()`. - JVM: JDK 11+. No `Context` parameter, no foreground service, no notifications — `LeapDownloader` is a simple async fetcher with a configurable `saveDir`. - Linux native runtime: glibc **2.34+** (Ubuntu 22.04, Debian 12, RHEL 9 or newer). Older hosts fail at process start. - Windows native: Windows 10+. DLLs co-locate next to the `.exe` (Windows' standard search order finds them). - - **Pin to 0.10.5+** for Kotlin/Native — earlier 0.10.x releases have unresolved cinterop / linker issues that prevent producing a working executable. See [Desktop & Native Platforms](./desktop-platforms). + - **Pin to 0.10.7 or newer** for Kotlin/Native — earlier 0.10.x releases (0.10.0, 0.10.1) have unresolved cinterop / linker issues that prevent producing a working executable; the fixes shipped in the 0.10.4.x point releases (SPM) and v0.10.6 / v0.10.7 (Android-SDK repo). See [Desktop & Native Platforms](./desktop-platforms). diff --git a/deployment/on-device/sdk/cloud-ai-comparison.mdx b/deployment/on-device/sdk/cloud-ai-comparison.mdx index a1298632..06be32e4 100644 --- a/deployment/on-device/sdk/cloud-ai-comparison.mdx +++ b/deployment/on-device/sdk/cloud-ai-comparison.mdx @@ -237,10 +237,10 @@ Both LEAP and the OpenAI Python streaming client run inside an async context. 
Th | Concept | OpenAI | LEAP | |---|---|---| -| Role-tagged messages | `{"role": "user", "content": "..."}` | `ChatMessage(role: .user, content: [.text("...")])` | -| Streaming responses | `stream=True` iterator | `AsyncThrowingStream` (Swift) / `Flow` (Kotlin) | -| Function calling | Tool definitions + `tool_calls` field | `registerFunction(LeapFunction)` + `MessageResponse.functionCalls` | -| Structured output | `response_format = json_schema` | `GenerationOptions.setResponseFormat(type:)` | +| Role-tagged messages | `{"role": "user", "content": "..."}` | `ChatMessage(role: .user, textContent: "...")` | +| Streaming responses | `stream=True` iterator | `SkieSwiftFlow` (Swift, iterable with `for try await`) / `Flow` (Kotlin) | +| Function calling | Tool definitions + `tool_calls` field | `registerFunction(LeapFunction)` + `MessageResponse.FunctionCalls` | +| Structured output | `response_format = json_schema` | Swift `options.with(jsonSchema: T.jsonSchema())` / Kotlin `setResponseFormatType()` | | Token usage stats | `usage` object on completion | `Complete.stats` (`promptTokens`, `completionTokens`, `tokenPerSecond`) | ## What's different diff --git a/deployment/on-device/sdk/constrained-generation.mdx b/deployment/on-device/sdk/constrained-generation.mdx index a92bf2b7..11573bf9 100644 --- a/deployment/on-device/sdk/constrained-generation.mdx +++ b/deployment/on-device/sdk/constrained-generation.mdx @@ -3,7 +3,7 @@ title: "Constrained Generation" description: "Generate structured JSON output with compile-time validation — same approach on every platform." --- -Constrained generation forces the model to emit JSON matching a schema. Use the language's native facility — Swift macros (`@Generatable` / `@Guide`) or Kotlin annotations (`@Generatable` / `@Guide`) — to define the structure, then set it on `GenerationOptions`. The schema is computed at compile time (Swift) or via reflection at load time (Kotlin), and the model's output decodes directly into your type. 
+Constrained generation forces the model to emit JSON matching a schema. Use the language's native facility — Swift macros (`@Generatable` / `@Guide`) or Kotlin annotations (`@Generatable` / `@Guide`) — to define the structure, then set it on `GenerationOptions`. The schema is computed at compile time (Swift) or built from the `kotlinx.serialization` descriptor at runtime (Kotlin), and the model's output decodes directly into your type. ## Define the structured type @@ -13,6 +13,7 @@ Constrained generation forces the model to emit JSON matching a schema. Use the ```swift import LeapModelDownloader + import LeapSDKMacros @Generatable("A joke with metadata") struct Joke: Codable { @@ -35,16 +36,26 @@ Constrained generation forces the model to emit JSON matching a schema. Use the - `@Generatable` and `@Guide` are runtime annotations on Kotlin `data class` declarations. All properties must be declared in the primary constructor. + `@Generatable` and `@Guide` are `@SerialInfo` annotations applied to `@Serializable` Kotlin `data class` declarations. All properties must be declared in the primary constructor, and the class itself must carry `@kotlinx.serialization.Serializable` so the schema generator can read its descriptor. ```kotlin package ai.liquid.leap.structuredoutput + @kotlinx.serialization.SerialInfo + @Target(AnnotationTarget.CLASS) + @Retention(AnnotationRetention.RUNTIME) annotation class Generatable(val description: String) + + @kotlinx.serialization.SerialInfo + @Target(AnnotationTarget.PROPERTY) + @Retention(AnnotationRetention.RUNTIME) annotation class Guide(val description: String) ``` ```kotlin + import kotlinx.serialization.Serializable + + @Serializable @Generatable(description = "Facts about a city") data class CityFact( @Guide(description = "Name of the city") @@ -60,6 +71,8 @@ Constrained generation forces the model to emit JSON matching a schema. 
Use the val placeOfInterests: List, ) ``` + + The `@Serializable` annotation is required — `JSONSchemaGenerator.getJSONSchema()` resolves the type via `kotlinx.serialization`'s `serializer()` and throws `LeapGeneratableSchematizationException("Type must be @Serializable to generate JSON Schema")` if the type isn't serializable. @@ -68,10 +81,18 @@ Constrained generation forces the model to emit JSON matching a schema. Use the ```swift - var options = GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05) - try options.setResponseFormat(type: Joke.self) + let options = GenerationOptions() + .with(temperature: 0.3) + .with(minP: 0.15) + .with(repetitionPenalty: 1.05) + .with(jsonSchema: Joke.jsonSchema()) - let message = ChatMessage(role: .user, content: [.text("Tell me a programming joke")]) + let message = ChatMessage( + role: .user, + content: [.text("Tell me a programming joke")], + reasoningContent: nil, + functionCalls: nil + ) for try await response in conversation.generateResponse( message: message, @@ -91,8 +112,11 @@ Constrained generation forces the model to emit JSON matching a schema. Use the ```kotlin + import kotlinx.serialization.json.Json + import kotlinx.serialization.json.JsonObject + val options = GenerationOptions.build { - setResponseFormatType(CityFact::class) + setResponseFormatType() temperature = 0.3f minP = 0.15f repetitionPenalty = 1.05f @@ -102,14 +126,15 @@ Constrained generation forces the model to emit JSON matching a schema. 
Use the .onEach { response -> if (response is MessageResponse.Complete) { val jsonContent = (response.fullMessage.content.first() as ChatMessageContent.Text).text - val cityFact: CityFact = GeneratableFactory.createFromJSONObject(JSONObject(jsonContent)) + val kxObj = Json.parseToJsonElement(jsonContent) as JsonObject + val cityFact = GeneratableFactory.createFromJsonObject(kxObj) println(cityFact) } } .collect() ``` - If the model's JSON doesn't deserialize cleanly into the data class, `GeneratableFactory.createFromJSONObject` throws `LeapGeneratableDeserializationException`. + If the model's JSON doesn't deserialize cleanly into the data class, `GeneratableFactory.createFromJsonObject` throws `LeapGeneratableDeserializationException`. @@ -120,16 +145,20 @@ Some models do better when the JSON Schema is also included in the prompt text. ```swift - let schemaString = try JSONSchemaGenerator.getJSONSchema(for: Joke.self) + let schemaString = JSONSchemaGenerator.getJSONSchema(for: Joke.self) let message = ChatMessage( role: .user, - content: [.text("Tell me a programming joke following this JSON Schema: \(schemaString)")] + content: [.text("Tell me a programming joke following this JSON Schema: \(schemaString)")], + reasoningContent: nil, + functionCalls: nil ) ``` + + `JSONSchemaGenerator.getJSONSchema(for:)` ships in the `LeapSDKMacros` product (no `try` needed; it's non-throwing — it just forwards to the macro-synthesized `Joke.jsonSchema()`). ```kotlin - val jsonSchema = JSONSchemaGenerator.getJSONSchema(CityFact::class) + val jsonSchema = JSONSchemaGenerator.getJSONSchema() conversation.generateResponse( "Show the city facts about Tokyo following this JSON Schema: $jsonSchema", options @@ -194,6 +223,9 @@ Composition types are supported as long as the leaf types are supported. 
```kotlin + import kotlinx.serialization.Serializable + + @Serializable @Generatable("A recipe with ingredients and instructions") data class Recipe( @Guide("Name of the dish") @@ -215,6 +247,7 @@ Composition types are supported as long as the leaf types are supported. val nutrition: NutritionInfo? = null, ) + @Serializable @Generatable("Nutritional information for a recipe") data class NutritionInfo( @Guide("Calories per serving") @@ -247,7 +280,7 @@ Smaller, single-responsibility types produce better output than sprawling struct ### Lower temperature for structured output -Temperature `0.3–0.5` typically improves adherence to the schema. The default `0.7` is biased toward conversational variation that doesn't help when you need parseable JSON. +Temperature `0.1–0.3` typically improves adherence to the schema — high-temperature sampling adds variation that doesn't help when you need parseable JSON. Use the per-model defaults the LFM model cards recommend (e.g. `0.1` for instruct/VL, `0.3` for LFM2 text) and lower from there if the model strays. ### Validate the decoded output @@ -269,10 +302,14 @@ Even with constrained generation, you should handle parse failures gracefully. T ```kotlin - fun parse(jsonText: String, kClass: KClass): T? = try { - GeneratableFactory.createFromJSONObject(JSONObject(jsonText)) as T + import kotlinx.serialization.json.Json + import kotlinx.serialization.json.JsonObject + + inline fun parse(jsonText: String): T? = try { + val kxObj = Json.parseToJsonElement(jsonText) as JsonObject + GeneratableFactory.createFromJsonObject(kxObj) } catch (e: LeapGeneratableDeserializationException) { - Log.e(TAG, "Failed to decode response as ${kClass.simpleName}", e) + Log.e(TAG, "Failed to decode response as ${T::class.simpleName}", e) null } ``` @@ -281,8 +318,8 @@ Even with constrained generation, you should handle parse failures gracefully. T ## How it works -1. **Compile/load time** — `@Generatable` produces a JSON Schema for your type. 
(Swift: compile-time macro; Kotlin: reflective build at load time.) -2. **Configuration** — `GenerationOptions.setResponseFormat(type:)` / `setResponseFormatType(...)` installs the schema as `jsonSchemaConstraint` on the generation options. +1. **Compile/load time** — `@Generatable` produces a JSON Schema for your type. (Swift: compile-time macro emits `jsonSchema()`; Kotlin: built at runtime from the `kotlinx.serialization` descriptor.) +2. **Configuration** — Swift `options.with(jsonSchema: T.jsonSchema())` (or `GenerationOptionsCompat.setResponseFormat(jsonSchema:)`) / Kotlin `setResponseFormatType<T>()` installs the schema as `jsonSchemaConstraint` on the generation options. 3. **Generation** — the SDK constrains decoding so only tokens that produce schema-valid JSON are emitted. The model's output is guaranteed to parse. ## Error handling diff --git a/deployment/on-device/sdk/conversation-generation.mdx b/deployment/on-device/sdk/conversation-generation.mdx index 6925d55a..e9815a26 100644 --- a/deployment/on-device/sdk/conversation-generation.mdx +++ b/deployment/on-device/sdk/conversation-generation.mdx @@ -60,30 +60,34 @@ Hold a strong reference for as long as you need to perform generations, then cal + `Conversation` is a Kotlin `interface` bridged to Swift as a protocol — the get-only properties surface as `{ get }` in Swift.
The generation methods return a SKIE-bridged `SkieSwiftFlow<MessageResponse>` (iterable with `for try await`): + ```swift - public class Conversation { - public let modelRunner: ModelRunner - public private(set) var history: [ChatMessage] - public private(set) var functions: [LeapFunction] - public private(set) var isGenerating: Bool - - public func registerFunction(_ function: LeapFunction) - public func registerFunctions(_ functions: [LeapFunction]) - public func appendToHistory(_ message: ChatMessage) - public func removeLastMessage() - public func exportToJSON() throws -> [[String: Any]] - - public func generateResponse( + public protocol Conversation { + var modelRunner: ModelRunner { get } + var history: [ChatMessage] { get } + var functions: [LeapFunction] { get } + var isGenerating: Bool { get } + + func registerFunction(function: LeapFunction) + func registerFunctions(functions: [LeapFunction]) + func appendToHistory(message: ChatMessage) + func removeLastMessage() + func exportToJSON() -> String + + func generateResponse( userTextMessage: String, - generationOptions: GenerationOptions? = nil - ) -> AsyncThrowingStream<MessageResponse, Error> + generationOptions: GenerationOptions? + ) -> SkieSwiftFlow<MessageResponse> - public func generateResponse( + func generateResponse( message: ChatMessage, - generationOptions: GenerationOptions? = nil - ) -> AsyncThrowingStream<MessageResponse, Error> + generationOptions: GenerationOptions? + ) -> SkieSwiftFlow<MessageResponse> } ``` + + Kotlin parameter defaults don't propagate through Kotlin/Native, so the Swift method labels match the Kotlin parameter names (`function:`, `functions:`, `message:`) and `generationOptions` must be passed explicitly. A `ConvenienceExtensions.swift` overlay adds `generateResponse(message:)` without the options argument for the common case. ```kotlin @@ -108,8 +112,6 @@ Hold a strong reference for as long as you need to perform generations, then cal message: ChatMessage, generationOptions: GenerationOptions?
= null ): Flow<MessageResponse> - - fun exportToJSONArray(): JSONArray } ``` @@ -123,7 +125,7 @@ Hold a strong reference for as long as you need to perform generations, then cal - **`history`** — a snapshot copy of the chat messages. Mutations don't affect generation. Once the stream emits `Complete`, `history` includes the final assistant reply. - **`isGenerating`** — `true` while a generation is in flight. Starting a second generation while one is running is blocked. -- **`functions`** (Swift only field, registered via `registerFunction` on both platforms) — tool definitions the model may invoke. +- **`functions`** — tool definitions the model may invoke. Registered through `registerFunction(_:)` / `registerFunctions(_:)` on both platforms. ### Streaming generation @@ -132,13 +134,17 @@ The async stream is the recommended way to drive generation — both platforms e ```swift - let user = ChatMessage(role: .user, content: [.text("Hello! What can you do?")]) + let user = ChatMessage(role: .user, textContent: "Hello!
What can you do?") + let options = GenerationOptions() + .with(temperature: 0.3) + .with(minP: 0.15) + .with(repetitionPenalty: 1.05) Task { do { for try await response in conversation.generateResponse( message: user, - generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05) + generationOptions: options ) { switch onEnum(of: response) { case .chunk(let c): @@ -148,6 +154,10 @@ The async stream is the recommended way to drive generation — both platforms e case .functionCalls(let payload): handleFunctionCalls(payload.functionCalls) case .audioSample(let audio): + // `audio.samples` is a `KotlinFloatArray` from Kotlin/Native — bridge to + // `[Float]` via NSData if your renderer expects a Swift array: + // let nsData = LeapSDK.ArrayConversionsKt.floatArrayToNSData(array: audio.samples) + // let floats = nsData.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) } audioRenderer.enqueue(audio.samples, sampleRate: Int(audio.sampleRate)) case .complete(let completion): let text = completion.fullMessage.content.compactMap { part -> String? in @@ -216,18 +226,23 @@ The async stream is the recommended way to drive generation — both platforms e ### Export chat history -Both platforms expose a serializer compatible with OpenAI's chat-completions message format. Useful for persistence, analytics, or replaying conversations through a cloud fallback. +Persisting, replaying, or shipping the conversation to a cloud fallback all boil down to serializing `conversation.history`. Swift exposes `exportToJSON()` (returns a JSON string in OpenAI chat-completions shape); Kotlin uses `kotlinx.serialization` (`ChatMessage` and `ChatMessageContent` are `@Serializable`). 
```swift - let payload: [[String: Any]] = try conversation.exportToJSON() + let jsonString: String = conversation.exportToJSON() ``` ```kotlin - val payload: JSONArray = conversation.exportToJSONArray() + import kotlinx.serialization.json.Json + import kotlinx.serialization.encodeToString + + val jsonString = Json.encodeToString(conversation.history) ``` + + Add `org.jetbrains.kotlinx:kotlinx-serialization-json` to your dependencies — see [Utilities → Serialization](./utilities#serialization) for the round-trip pattern. @@ -252,11 +267,11 @@ A sealed type with one case per kind of incremental output the engine emits. ```kotlin sealed interface MessageResponse { - class Chunk(val text: String) : MessageResponse - class ReasoningChunk(val reasoning: String) : MessageResponse - class FunctionCalls(val functionCalls: List<LeapFunctionCall>) : MessageResponse - class AudioSample(val samples: FloatArray, val sampleRate: Int) : MessageResponse - class Complete( + data class Chunk(val text: String) : MessageResponse + data class ReasoningChunk(val reasoning: String) : MessageResponse + data class FunctionCalls(val functionCalls: List<LeapFunctionCall>) : MessageResponse + data class AudioSample(val samples: FloatArray, val sampleRate: Int) : MessageResponse + data class Complete( val fullMessage: ChatMessage, val finishReason: GenerationFinishReason, val stats: GenerationStats?, @@ -270,7 +285,7 @@ A sealed type with one case per kind of incremental output the engine emits. - **`ReasoningChunk`** — thinking-style tokens emitted by reasoning models (wrapped between `<think>` / `</think>` upstream). Only fires when `GenerationOptions.enableThinking = true` *and* the model supports it. - **`FunctionCalls`** — one or more tool invocations the model wants you to execute. See [Function Calling](./function-calling). - **`AudioSample`** — float32 mono PCM frames from audio-capable checkpoints. The sample rate is constant for a generation; route the frames to a renderer. -- **`Complete`** — final marker.
`fullMessage` is the assembled assistant `ChatMessage` (also present in `conversation.history`). `stats` holds token counts and `tokenPerSecond` (may be `null` on some backends). +- **`Complete`** — final marker. `fullMessage` is the assembled assistant `ChatMessage` (also present in `conversation.history`). `stats` is nullable (`GenerationStats?`); when present it holds `promptTokens`, `completionTokens`, `totalTokens`, `tokenPerSecond` (non-nullable `Float`), and `cachedPromptTokens`. ### `GenerationFinishReason` @@ -290,33 +305,53 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per + `GenerationOptions` is a Kotlin `data class` bridged into Swift. Kotlin parameter defaults don't survive the ObjC bridge, so the canonical Swift idiom is the parameterless init plus chained `.with(...)` builders from `ConvenienceExtensions.swift`: + ```swift - public struct GenerationOptions { + public class GenerationOptions { public var temperature: Float? public var topP: Float? public var minP: Float? - public var topK: Int32? public var repetitionPenalty: Float? + public var topK: Int32? public var rngSeed: Int64? - public var maxTokens: Int32? public var jsonSchemaConstraint: String? + public var functionCallParser: LeapFunctionCallParser? public var injectSchemaIntoPrompt: Bool // default true - public var functionCallParser: LeapFunctionCallParserProtocol? + public var maxTokens: Int32? public var inlineThinkingTags: Bool // default false public var enableThinking: Bool // default false public var extras: String? 
- public init(/* all fields as optional kwargs */) - public mutating func setResponseFormat(type: T.Type) throws + public convenience init() // builder entry point + + // Builders (chainable): + public func with(temperature: Float) -> GenerationOptions + public func with(topP: Float) -> GenerationOptions + public func with(minP: Float) -> GenerationOptions + public func with(repetitionPenalty: Float) -> GenerationOptions + public func with(topK: Int32) -> GenerationOptions + public func with(rngSeed: Int64) -> GenerationOptions + public func with(jsonSchema: String) -> GenerationOptions + public func with(maxTokens: Int32) -> GenerationOptions + public func with(injectSchemaIntoPrompt: Bool) -> GenerationOptions + public func with(inlineThinkingTags: Bool) -> GenerationOptions + public func with(enableThinking: Bool) -> GenerationOptions } ``` + For constrained generation, pass the schema string produced by the `@Generatable` macro into the JSON-schema builder: + ```swift - var options = GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05, maxTokens: 512) - try options.setResponseFormat(type: CityFact.self) + let options = GenerationOptions() + .with(temperature: 0.3) + .with(minP: 0.15) + .with(repetitionPenalty: 1.05) + .with(maxTokens: 512) + .with(jsonSchema: CityFact.jsonSchema()) ``` - Builder style is available too — chain `.with(temperature:)`, `.with(topP:)`, `.with(maxTokens:)`, etc. + The Apple-only `GenerationOptionsCompat` sibling type (used by legacy `Leap.load(...)` flows) additionally exposes `setResponseFormat(jsonSchema: String)` — see [Constrained Generation](./constrained-generation). ```kotlin @@ -336,7 +371,6 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per var extras: String? 
= null, ) { inline fun <reified T> setResponseFormatType() - fun setResponseFormatType(kClass: KClass<*>) companion object { fun build(buildAction: GenerationOptions.() -> Unit): GenerationOptions @@ -350,7 +384,7 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per minP = 0.15f repetitionPenalty = 1.05f maxTokens = 512 - setResponseFormatType(CityFact::class) + setResponseFormatType<CityFact>() } ``` @@ -359,7 +393,7 @@ Tune sampling, structured output, tool-call parsing, and reasoning behavior per - **Sampling fields** (`temperature`, `topP`, `minP`, `topK`, `repetitionPenalty`) — standard sampling knobs. Use the values from the LEAP bundle manifest (`sampling_parameters` under `generation_time_parameters` in each model's `.json` on [LiquidAI/LeapBundles](https://huggingface.co/LiquidAI/LeapBundles)); they're tuned per checkpoint by the training team and differ from the HF model card defaults (the manifest values are the llama.cpp-engine path the SDK runs). Arbitrary "0.7" defaults from generic AI tutorials usually underperform. - **`rngSeed`** — set for deterministic / reproducible output (testing, debugging). Default is non-deterministic. - **`maxTokens`** — cap the response length. The model stops after this many completion tokens (prompt tokens don't count). Defaults to "until EOS or context limit." Useful for cost control with constrained output. -- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the higher-level `setResponseFormat(type:)` / `setResponseFormatType(...)` helpers with `@Generatable` types. See [Constrained Generation](./constrained-generation). +- **`jsonSchemaConstraint`** — JSON Schema string for constrained generation. Use the higher-level helpers — Swift `options.with(jsonSchema: T.jsonSchema())` (or `GenerationOptionsCompat.setResponseFormat(jsonSchema:)`) / Kotlin `setResponseFormatType<T>()` — with `@Generatable` types. See [Constrained Generation](./constrained-generation).
- **`injectSchemaIntoPrompt`** — when `true` (default), the schema is appended to the system message for semantic guidance *in addition* to the structural constraint at decode time. Set `false` to skip the prompt injection (matches `llama-server` grammar mode) — saves prompt tokens for large schemas. - **`functionCallParser`** — picks the tokenizer expected by the model. `LFMFunctionCallParser` (default) for Liquid Foundation Models; `HermesFunctionCallParser()` for Hermes/Qwen3 formats; `null` to receive raw tool-call text in `Chunk`s. - **`enableThinking`** — turn on reasoning mode for models that support it (e.g. LFM2.5-Thinking). Reasoning tokens arrive as `ReasoningChunk`s. diff --git a/deployment/on-device/sdk/desktop-platforms.mdx b/deployment/on-device/sdk/desktop-platforms.mdx index 0659b913..26ca1c65 100644 --- a/deployment/on-device/sdk/desktop-platforms.mdx +++ b/deployment/on-device/sdk/desktop-platforms.mdx @@ -49,13 +49,13 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux } dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") - // Optional: OpenAI-compatible cloud chat client - // implementation("ai.liquid.leap:leap-openai-client:0.10.6") + // Optional: OpenAI-compatible cloud chat client (JVM support added in v0.10.7) + // implementation("ai.liquid.leap:leap-openai-client:0.10.7") // Optional: Compose Multiplatform voice widget (also runs on JVM) - // implementation("ai.liquid.leap:leap-ui:0.10.6") + // implementation("ai.liquid.leap:leap-ui:0.10.7") } application { @@ -75,7 +75,7 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux } dependencies { - implementation 'ai.liquid.leap:leap-sdk:0.10.6' + implementation 'ai.liquid.leap:leap-sdk:0.10.7' } application { @@ -89,7 +89,7 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux ai.liquid.leap leap-sdk-jvm - 0.10.6 + 0.10.7 ``` @@ -107,9 
+107,9 @@ The JVM target supports Kotlin and Java projects on macOS (Apple Silicon), Linux `LeapDownloader` is the cross-platform downloader. Point it at a writable directory and call `loadModel(modelName:, quantizationType:)` for manifest-based downloads, or `loadSimpleModel(model: ModelSource(...))` for a GGUF you already have on disk. ```kotlin -import ai.liquid.leap.LeapDownloader -import ai.liquid.leap.LeapDownloaderConfig -import ai.liquid.leap.ModelSource +import ai.liquid.leap.manifest.LeapDownloader +import ai.liquid.leap.manifest.LeapDownloaderConfig +import ai.liquid.leap.manifest.ModelSource import ai.liquid.leap.message.ChatMessage import ai.liquid.leap.message.MessageResponse import kotlinx.coroutines.runBlocking @@ -132,7 +132,7 @@ fun main() = runBlocking { ) conversation.generateResponse( - ChatMessage.user("What is the capital of France?") + ChatMessage(ChatMessage.Role.USER, "What is the capital of France?") ).collect { response -> when (response) { is MessageResponse.Chunk -> print(response.text) @@ -163,7 +163,7 @@ Pass `mmprojPath = "..."` for vision models, or `audioDecoderPath = "..."` (and ### Runtime expectations -- **Memory.** Plan for at least `model_size_on_disk + 1 GiB` of free RAM. With `use_mmap=true` (the default since v0.10.4 — see the [changelog](/deployment/on-device/leap-sdk-changelog#mmap-default)) the OS pages weights in lazily, so resident memory grows as the model is exercised rather than at load time. +- **Memory.** Plan for at least `model_size_on_disk + 1 GiB` of free RAM. With `use_mmap=true` (the default since v0.10.4 — see the [changelog](/deployment/on-device/leap-sdk-changelog#memory-mapped-model-loading-by-default)) the OS pages weights in lazily, so resident memory grows as the model is exercised rather than at load time. - **Threads.** The engine defaults to a sensible CPU thread count for the host (`CpuThreadAdvisor.getRecommendedThreadCount()`). 
Override by passing `ModelLoadingOptions(cpuThreads = N)` through `loadModel(...)` if you need to share the box with other workloads. - **GPU acceleration.** Available on macOS (Metal, automatic) and on Linux JVM builds with a CUDA-capable GPU when the matching native variant is on the classpath. GPU offload is configured through the `extras` JSON payload on `ModelLoadingOptions` (advanced use only — most desktop workloads run pure-CPU). @@ -194,11 +194,11 @@ dependencyResolutionManagement { // build.gradle.kts plugins { kotlin("multiplatform") version "2.3.20" - id("ai.liquid.leap.nativelibs") version "0.10.6" + id("ai.liquid.leap.nativelibs") version "0.10.7" } dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") } kotlin { @@ -216,7 +216,7 @@ Build with the usual Kotlin/Native link tasks: The resulting binary lives at `build/bin/linuxX64/releaseExecutable/`, alongside the `.so` files the plugin installed (`libinference_engine.so`, `libinference_engine_llamacpp_backend.so`, `libie_zip.so`, plus their transitive dependencies). Keep them co-located when you ship — the cinterop manifest bakes `-rpath=$ORIGIN` into the binary so the dynamic linker resolves siblings. -**Versions 0.10.0, 0.10.1, and 0.10.2 cannot link a working Kotlin/Native executable** due to three separate Maven Central / cinterop issues that have all been fixed in 0.10.5. Maven Central is immutable per GAV, so the older versions cannot be republished — pin to **0.10.5 or newer**. See [the changelog](/deployment/on-device/leap-sdk-changelog#kotlin-native-linux-windows) for the full story. +**Versions 0.10.0 and 0.10.1 cannot link a working Kotlin/Native executable** due to Maven Central / cinterop issues; v0.10.2 and v0.10.3 were never published, and the fixes shipped across v0.10.4.x, v0.10.6, and v0.10.7. Maven Central is immutable per GAV, so the older versions cannot be republished — pin to **0.10.7 or newer**. 
See [the changelog](/deployment/on-device/leap-sdk-changelog#kotlin-native-linux-windows) for the full story. ### Manual recipe (if you can't apply the plugin) @@ -227,7 +227,7 @@ plugins { } dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") } val nativesDir = layout.buildDirectory.dir("bin/linuxX64/releaseExecutable") @@ -241,7 +241,7 @@ kotlin { val leapSdkNatives by configurations.creating dependencies { - leapSdkNatives("ai.liquid.leap:leap-sdk-linuxx64:0.10.6:natives@zip") + leapSdkNatives("ai.liquid.leap:leap-sdk-linuxx64:0.10.7:natives@zip") } val installLeapNatives by tasks.registering(Copy::class) { @@ -259,8 +259,8 @@ tasks.named("linkReleaseExecutableLinuxX64") { dependsOn(installLeapNatives) } The Maven coordinates for the `-natives.zip` artifacts: -- `ai.liquid.leap:leap-sdk-linuxx64:0.10.6:natives@zip` -- `ai.liquid.leap:leap-sdk-linuxarm64:0.10.6:natives@zip` +- `ai.liquid.leap:leap-sdk-linuxx64:0.10.7:natives@zip` +- `ai.liquid.leap:leap-sdk-linuxarm64:0.10.7:natives@zip` ## Windows native (MinGW x64) @@ -269,11 +269,11 @@ The same Kotlin/Native flow works for Windows x86_64 via the MinGW-w64 toolchain ```kotlin plugins { kotlin("multiplatform") version "2.3.20" - id("ai.liquid.leap.nativelibs") version "0.10.6" + id("ai.liquid.leap.nativelibs") version "0.10.7" } dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") } kotlin { @@ -291,7 +291,7 @@ The plugin installs `inference_engine.dll`, `libinference_engine_llamacpp_backen The Maven coordinates for the `-natives.zip` artifact: -- `ai.liquid.leap:leap-sdk-mingwx64:0.10.6:natives@zip` +- `ai.liquid.leap:leap-sdk-mingwx64:0.10.7:natives@zip` **Building from macOS or Linux for Windows?** Kotlin/Native does not support cross-compiling to MinGW from a non-Windows host as of 2.3.20 — the build must run on Windows (native or in CI). 
GitHub Actions `windows-latest` works without extra setup. @@ -316,12 +316,12 @@ Identical Swift API to iOS — same `ModelDownloader`, `Conversation`, `ChatMess ```swift .binaryTarget( name: "LeapSDK", - url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapSDK.xcframework.zip", - checksum: "ae9ecddbe5dc226ddd4ec8fe42178b721faeab71a20b3f14efceaae5a2495b7e" + url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapSDK.xcframework.zip", + checksum: "6f2721aa45d7555646f78cbcaedb57aba3d869f56b24d681ad332846e131ae3d" ) ``` -The XCFramework slice for macOS ARM64 is in the same zip as the iOS slices. Mac Catalyst (`x86_64-apple-ios13.0-macabi`, `arm64-apple-ios13.0-macabi`) is also included. +The XCFramework slice for macOS ARM64 is in the same zip as the iOS slices. The released framework ships exactly three slices — `ios-arm64`, `ios-arm64-simulator`, `macos-arm64`; **Mac Catalyst is not supported**, and the iOS simulator slice is ARM64-only (Intel-Mac simulator hosts cannot run it). ### From Kotlin (JVM, Compose for Desktop) @@ -329,8 +329,8 @@ If you're targeting macOS as a JVM host — for example with Compose Multiplatfo ```kotlin dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") - implementation("ai.liquid.leap:leap-ui:0.10.6") // Compose voice widget runs on JVM too + implementation("ai.liquid.leap:leap-sdk:0.10.7") + implementation("ai.liquid.leap:leap-ui:0.10.7") // Compose voice widget runs on JVM too } ``` diff --git a/deployment/on-device/sdk/function-calling.mdx b/deployment/on-device/sdk/function-calling.mdx index e3a016d1..1bec418e 100644 --- a/deployment/on-device/sdk/function-calling.mdx +++ b/deployment/on-device/sdk/function-calling.mdx @@ -19,23 +19,28 @@ Vision and audio-capable models require companion files. 
Bundles embed these ref + The Kotlin `LeapFunction` / `LeapFunctionParameter` constructors carry `@ObjCName` annotations on `description:`, so the Swift labels are `functionDescription:` and `parameterDescription:`. `LeapFunctionParameter`'s `optional` parameter has no Swift default — pass `optional: false` for required parameters. + ```swift conversation.registerFunction( - LeapFunction( + function: LeapFunction( name: "get_weather", - description: "Query the weather of a city", + functionDescription: "Query the weather of a city", parameters: [ LeapFunctionParameter( name: "city", - type: LeapFunctionParameterType.string(StringType()), - description: "The city to query weather for" + type: LeapFunctionParameterType.LeapStr(enumValues: nil, description: nil), + parameterDescription: "The city to query weather for", + optional: false ), LeapFunctionParameter( name: "unit", - type: LeapFunctionParameterType.string( - StringType(enumValues: ["celsius", "fahrenheit"]) + type: LeapFunctionParameterType.LeapStr( + enumValues: ["celsius", "fahrenheit"], + description: nil ), - description: "Temperature unit (celsius or fahrenheit)" + parameterDescription: "Temperature unit (celsius or fahrenheit)", + optional: false ), ] ) @@ -73,24 +78,26 @@ Vision and audio-capable models require companion files. Bundles embed these ref Use normal identifiers — letters, underscores, and digits (not starting with a digit). Most models trained for tool use recognize that shape. -The Kotlin parameter type classes are named with a `Leap` prefix (`LeapStr`, `LeapNum`, `LeapInt`, `LeapBool`, `LeapArr`, `LeapObj`, `LeapNull`) to avoid collisions with Kotlin's built-in `String`, `Number`, `Int`, `Boolean`, etc. The Swift bindings expose the same primitives under cleaner names (`.string(...)`, `.number(...)`, etc.) via SKIE. 
+The Kotlin parameter type classes are named with a `Leap` prefix (`LeapStr`, `LeapNum`, `LeapInt`, `LeapBool`, `LeapArr`, `LeapObj`, `LeapNull`) to avoid collisions with Kotlin's built-in `String`, `Number`, `Int`, `Boolean`, etc. The Swift bindings expose the same names — there are no separate `.string(...)` / `.number(...)` aliases; SKIE preserves the Kotlin nested-class names. ## Handle the response -Function calls arrive as `MessageResponse.functionCalls` (Swift) / `MessageResponse.FunctionCalls` (Kotlin), which wraps a list of `LeapFunctionCall`. +Function calls arrive as the `MessageResponse.FunctionCalls` variant on both platforms, wrapping a list of `LeapFunctionCall` payloads. + `LeapFunctionCall` is a Kotlin `data class` bridged into Swift. `arguments` is a Kotlin `Map<String, Any?>` exposed as Swift `[String: Any]` (the ObjC bridge collapses `Any?` to non-optional `id`): + ```swift - public struct LeapFunctionCall { - public let name: String - public let arguments: [String: Any?] + public class LeapFunctionCall { + public var name: String + public var arguments: [String: Any] } ``` ```swift - let userMessage = ChatMessage(role: .user, content: [.text("What's the weather in NYC?")]) + let userMessage = ChatMessage(role: .user, textContent: "What's the weather in NYC?") for try await response in conversation.generateResponse(message: userMessage) { switch onEnum(of: response) { @@ -147,12 +154,12 @@ Append the tool's output as a `tool`-role message and continue the conversation.
```swift let toolMessage = ChatMessage( role: .tool, - content: [.text(#"{"temperature":72,"conditions":"sunny"}"#)] + textContent: #"{"temperature":72,"conditions":"sunny"}"# ) - guard let current = conversation else { return } - let updatedHistory = current.history + [toolMessage] - conversation = current.modelRunner.createConversationFromHistory(history: updatedHistory) + let updatedHistory = conversation.history + [toolMessage] + let nextConversation = conversation.modelRunner.createConversationFromHistory(history: updatedHistory) + // Continue generation against `nextConversation`. ``` @@ -176,18 +183,29 @@ Then call `generateResponse(...)` on the new conversation to get the model's too + Both types are Kotlin `data class`es bridged into Swift. `@ObjCName` annotations rename the `description` parameter on the Swift inits to `functionDescription:` / `parameterDescription:`. + ```swift - public struct LeapFunction: Equatable { - public let name: String - public let description: String - public let parameters: [LeapFunctionParameter] + public class LeapFunction { + public var name: String + public var functionDescription: String // ObjC-renamed from Kotlin `description` + public var parameters: [LeapFunctionParameter] + + public init(name: String, functionDescription: String, parameters: [LeapFunctionParameter]) } - public struct LeapFunctionParameter: Equatable { - public let name: String - public let type: LeapFunctionParameterType - public let description: String - public let optional: Bool + public class LeapFunctionParameter { + public var name: String + public var type: LeapFunctionParameterType + public var parameterDescription: String // ObjC-renamed from Kotlin `description` + public var optional: Bool + + public init( + name: String, + type: LeapFunctionParameterType, + parameterDescription: String, + optional: Bool // no default in Swift — pass `false` for required + ) } ``` @@ -215,22 +233,27 @@ Then call `generateResponse(...)` on the new 
conversation to get the model's too + `LeapFunctionParameterType` is a Kotlin `sealed class`. SKIE generates an `onEnum(of:)`-compatible enum view, but the constructors you use to build instances keep the Kotlin nested-class names — there is no `.string(...)` / `.number(...)` alias. + ```swift - public indirect enum LeapFunctionParameterType: Codable, Equatable { - case string(StringType) - case number(NumberType) - case integer(IntegerType) - case boolean(BooleanType) - case array(ArrayType) - case object(ObjectType) - case null(NullType) - } + // Direct constructors (use these to build parameter types): + LeapFunctionParameterType.LeapStr(enumValues: [String]?, description: String?) + LeapFunctionParameterType.LeapNum(enumValues: [NSNumber]?, description: String?) + LeapFunctionParameterType.LeapInt(enumValues: [KotlinInt]?, description: String?) + LeapFunctionParameterType.LeapBool(description: String?) + LeapFunctionParameterType.LeapArr(itemType: LeapFunctionParameterType, description: String?) + LeapFunctionParameterType.LeapObj( + properties: [String: LeapFunctionParameterType], + required: [String], + description: String? + ) + LeapFunctionParameterType.LeapNull() // no description parameter ``` - - `StringType`, `NumberType`, `IntegerType` accept `enumValues` to constrain valid values. - - `ArrayType` has `itemType` describing element type. - - `ObjectType` has `properties: [String: LeapFunctionParameterType]` and `required: [String]`. - - All non-`null` types take an optional `description` (only used when nested via `ArrayType.itemType` or object properties — when used directly as `LeapFunctionParameter.type`, the outer `description` wins). + - `LeapStr` / `LeapNum` / `LeapInt` accept `enumValues` to constrain valid values. + - `LeapArr` has `itemType` describing the element type. + - `LeapObj` has `properties: [String: LeapFunctionParameterType]` and `required: [String]`. 
+ - The nested `description` is overridden when the type is used directly as `LeapFunctionParameter.type`; it's only consulted when the type is used inside `LeapArr.itemType` or `LeapObj.properties`. ```kotlin @@ -258,21 +281,25 @@ Then call `generateResponse(...)` on the new conversation to get the model's too ```swift LeapFunction( name: "get_weather", - description: "Query the weather of cities", + functionDescription: "Query the weather of cities", parameters: [ LeapFunctionParameter( name: "cities", - type: LeapFunctionParameterType.array( - ArrayType(itemType: .string(StringType())) + type: LeapFunctionParameterType.LeapArr( + itemType: LeapFunctionParameterType.LeapStr(enumValues: nil, description: nil), + description: nil ), - description: "Names of the cities to query weather for" + parameterDescription: "Names of the cities to query weather for", + optional: false ), LeapFunctionParameter( name: "unit", - type: LeapFunctionParameterType.string( - StringType(enumValues: ["celsius", "fahrenheit"]) + type: LeapFunctionParameterType.LeapStr( + enumValues: ["celsius", "fahrenheit"], + description: nil ), - description: "Temperature unit" + parameterDescription: "Temperature unit", + optional: false ), ] ) diff --git a/deployment/on-device/sdk/messages-content.mdx b/deployment/on-device/sdk/messages-content.mdx index 085d4f52..f30edb60 100644 --- a/deployment/on-device/sdk/messages-content.mdx +++ b/deployment/on-device/sdk/messages-content.mdx @@ -3,76 +3,87 @@ title: "Messages & Content" description: "ChatMessage, ChatMessageContent, audio format requirements — same shape on every platform." --- -`ChatMessage` and `ChatMessageContent` mirror the OpenAI chat-completions message schema. The same fields exist on iOS / macOS (`struct ChatMessage`, `enum ChatMessageContent`) and the Kotlin platforms (`data class ChatMessage`, `sealed interface ChatMessageContent`). +`ChatMessage` and `ChatMessageContent` mirror the OpenAI chat-completions message schema. 
Both are declared once in `commonMain` (`data class ChatMessage`, `sealed class ChatMessageContent`) and Kotlin/Native + SKIE bridge the Kotlin types into Swift — there are no separate "native" Swift declarations. ## `ChatMessage` + The Swift class is generated from the Kotlin `data class`. Kotlin parameter defaults don't propagate, so the primary init requires all four arguments explicitly: + ```swift - public struct ChatMessage { - public var role: ChatMessageRole + public class ChatMessage { + public var role: ChatMessage.Role public var content: [ChatMessageContent] public var reasoningContent: String? public var functionCalls: [LeapFunctionCall]? + // Primary init — pass `reasoningContent: nil, functionCalls: nil` for ordinary messages. public init( - role: ChatMessageRole, + role: ChatMessage.Role, content: [ChatMessageContent], - reasoningContent: String? = nil, - functionCalls: [LeapFunctionCall]? = nil + reasoningContent: String?, + functionCalls: [LeapFunctionCall]? ) - public init(from json: [String: Any]) throws - } + // Secondary inits (from Kotlin secondary constructors): + public init(role: ChatMessage.Role, content: ChatMessageContent) // single content + public init(role: ChatMessage.Role, textContent: String) // plain text - public enum ChatMessageRole: String { - case user, system, assistant, tool + public enum Role { + case system, user, assistant, tool + } } ``` ```kotlin + @Serializable(with = ChatMessageJsonSerializer::class) data class ChatMessage( val role: Role, val content: List<ChatMessageContent>, - val reasoningContent: String? = null, - val functionCalls: List<LeapFunctionCall>? = null, + @SerialName("reasoning_content") val reasoningContent: String? = null, + @SerialName("tool_calls") val functionCalls: List<LeapFunctionCall>? = null, ) { + // Single-content secondary ctor (wraps the part in a list, drops defaults). + constructor(role: Role, content: ChatMessageContent) + // Plain-text secondary ctor (parameter name is `textContent`).
+ constructor(role: Role, textContent: String) + + enum class Role(val type: String) { SYSTEM("system"), USER("user"), ASSISTANT("assistant"), - TOOL("tool"), - } + TOOL("tool"); - fun toJSONObject(): JSONObject - - companion object { - fun fromJSONObject(obj: JSONObject): ChatMessage + companion object { + fun fromTypeString(type: String): Role // throws LeapSerializationException on unknown values + } } } ``` + + `ChatMessage` is `@Serializable` via the dedicated `ChatMessageJsonSerializer` — encode/decode through `kotlinx.serialization.json.Json` rather than ad-hoc `JSONObject` helpers. See [Utilities → Serialization](./utilities#serialization). ### Fields - **`role`** — the speaker (`user`, `system`, `assistant`, or `tool`). Use `tool` when appending function-call results back into the history. -- **`content`** — ordered fragments. Supported part types: `Text`, `Image` (JPEG bytes), `Audio` (WAV bytes), and on Kotlin `AudioPcmF32` for raw float samples. +- **`content`** — ordered fragments. Supported part types: `Text`, `Image` (JPEG bytes wrapped in a data URL), `Audio` (WAV bytes or `input_audio` payload), and on Kotlin `AudioPcmF32` for raw float samples. - **`reasoningContent`** — text emitted by reasoning models inside `<think>` / `</think>` tags. `null` for non-reasoning responses. -- **`functionCalls`** — calls returned by `MessageResponse.functionCalls` on the previous turn, included when appending tool-call results to history. +- **`functionCalls`** — calls returned by `MessageResponse.FunctionCalls` on the previous turn, included when appending tool-call results to history. ### Serialization -Both platforms expose round-trip JSON helpers compatible with OpenAI's `ChatCompletionRequestMessage`. +Round-trip the message through `kotlinx.serialization` — there is no separate "from `[String: Any]`" initializer on either platform. - `ChatMessage(from: [String: Any])` constructs a message from an OpenAI-style payload. Throws `LeapSerializationError` on unrecognized shapes.
+ Encode with `LeapJson.encodeToString` (or your own `JSONEncoder` against the OpenAI shape) and decode with the matching Kotlin serializer. See [Utilities → Serialization](./utilities#serialization) for examples that route through `LeapJson`. - `ChatMessage.toJSONObject()` / `ChatMessage.fromJSONObject(obj)`. Throws `LeapSerializationException` on unrecognized shapes. See [Utilities → Serialization Support](./utilities#serialization-support). + `ChatMessage` is `@Serializable`. Encode with `Json.encodeToString(message)` and decode with `Json.decodeFromString(jsonString)` — see [Utilities → Serialization](./utilities#serialization). On error, expect a `LeapSerializationException` (not `LeapSerializationError`). @@ -80,48 +91,78 @@ Both platforms expose round-trip JSON helpers compatible with OpenAI's `ChatComp - ```swift - public enum ChatMessageContent { - case text(String) - case image(Data) // JPEG bytes - case audio(Data) // WAV bytes + `ChatMessageContent` is the Kotlin `sealed class` bridged to Swift — switch on its subclasses with SKIE's `onEnum(of:)` helper. There is no native Swift `enum`, no positional `.image(_:)` / `.audio(_:)` factory, and no `init(from json:)`. 
Use the static factories on the Swift overlay: - public init(from json: [String: Any]) throws - } + ```swift + // Text (cross-platform): + ChatMessageContent.text(_ text: String) -> ChatMessageContent + + // Image: + ChatMessageContent.fromJPEGData(_ jpegData: Data) -> ChatMessageContent.Image + ChatMessageContent.image(url: String) -> ChatMessageContent.Image // data URL or remote URL + + // Audio: + ChatMessageContent.fromWAVData(_ wavData: Data) -> ChatMessageContent.Audio + ChatMessageContent.audio(data: Data, format: String = "wav") -> ChatMessageContent.Audio + ChatMessageContent.fromFloatSamples(_ samples: [Float], sampleRate: Int, channelCount: Int = 1) + -> ChatMessageContent.Audio + + // iOS only — UIKit: + public static func fromUIImage(_ image: UIImage) throws -> ChatMessageContent + // (JPEG quality is fixed at 0.85; no compressionQuality parameter is exposed.) ``` - Helper initializers simplify interop with platform-native buffers: + `fromUIImage` is iOS-only and takes only the image — JPEG compression quality is hard-coded to `0.85` in the overlay (`leap-sdk/src/iosMain/.../ChatMessageContentExtensionsIos.kt`). There is no `fromNSImage` factory; on macOS, convert your `NSImage` to JPEG `Data` yourself and pass it through `fromJPEGData(_:)`. - - `ChatMessageContent.fromUIImage(image, compressionQuality:)` — UIKit - - `ChatMessageContent.fromNSImage(image, compressionQuality:)` — AppKit - - `ChatMessageContent.fromWAVData(data)` — pass-through validator - - `ChatMessageContent.fromFloatSamples(samples, sampleRate:, channelCount:)` — wrap raw float32 PCM into a WAV blob - - On the wire, image parts are encoded as OpenAI-style `image_url` payloads and audio parts as `input_audio` arrays with Base64 data. + On the wire, image parts are encoded as OpenAI-style `image_url` payloads (with a `data:image/jpeg;base64,...` URL) and audio parts as `input_audio` arrays with Base64 data. 
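The `image_url` wire shape described here can be sketched in a few lines of Kotlin. This is only an illustration using `java.util.Base64` from the JVM standard library; `jpegToDataUrl` and `dataUrlToBytes` are hypothetical helper names, not SDK functions, and the SDK's own encoding may differ in details such as padding or MIME handling.

```kotlin
import java.util.Base64

// Hypothetical helper (not part of the LEAP SDK): wraps JPEG bytes in a
// data: URL of the form used by OpenAI-style `image_url` payloads.
fun jpegToDataUrl(jpegBytes: ByteArray): String =
    "data:image/jpeg;base64," + Base64.getEncoder().encodeToString(jpegBytes)

// Hypothetical inverse: recovers the raw bytes from a base64 data: URL.
fun dataUrlToBytes(url: String): ByteArray {
    val marker = ";base64,"
    val start = url.indexOf(marker)
    require(start >= 0) { "not a base64 data URL" }
    return Base64.getDecoder().decode(url.substring(start + marker.length))
}

fun main() {
    // 0xFF 0xD8 is the JPEG start-of-image marker; a real image has many more bytes.
    val url = jpegToDataUrl(byteArrayOf(0xFF.toByte(), 0xD8.toByte()))
    println(url) // prints "data:image/jpeg;base64,/9g="
}
```

Note that `java.util.Base64` needs API level 26 or higher on Android; on older devices `android.util.Base64` would be the usual substitute.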
```kotlin - sealed interface ChatMessageContent { - fun clone(): ChatMessageContent - fun toJSONObject(): JSONObject - - data class Text(val text: String) : ChatMessageContent - data class Image(val jpegByteArray: ByteArray) : ChatMessageContent - data class Audio(val wavByteArray: ByteArray) : ChatMessageContent - data class AudioPcmF32(val samples: FloatArray, val sampleRate: Int) : ChatMessageContent - } + sealed class ChatMessageContent { + data class Text(val text: String) : ChatMessageContent() + + data class Image(val imageUrl: ImageUrl) : ChatMessageContent() { + // Convenience secondary ctor — wraps the bytes in a data: URL. + constructor(jpegByteArray: ByteArray) + val jpegByteArray: ByteArray // derived property: decodes the data: URL - fun ChatMessageContent.fromJSONObject(obj: JSONObject): ChatMessageContent + // Nested wrapper for the OpenAI `image_url` wire shape. + data class ImageUrl(val url: String) + } + + data class Audio(val inputAudio: InputAudio) : ChatMessageContent() { + // Convenience secondary ctor — wraps the bytes in an InputAudio. + constructor(data: ByteArray) + val data: ByteArray // derived property: decodes the base64 InputAudio payload + + data class InputAudio(val data: String, val format: String) // base64-encoded `data` + } + + // Convenience helpers (declared on the sealed class) wrap raw PCM into Audio: + fun toWavBytes(): ByteArray // on AudioPcmF32 — encodes float samples as 16-bit PCM WAV + fun toAudio(): Audio // on AudioPcmF32 — same bytes wrapped as ChatMessageContent.Audio + + data class AudioPcmF32(val samples: FloatArray, val sampleRate: Int) : ChatMessageContent() + } ``` - Android-specific helper: `ChatMessageContent.Image.fromBitmap(bitmap, compressionQuality = 85)` re-encodes an Android `Bitmap` to JPEG. + Serialize via `kotlinx.serialization` (every variant is `@Serializable`). 
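The `toWavBytes()` comment in the class outline says float samples are encoded as 16-bit PCM WAV. As a rough sketch of what such a conversion involves (this is not the SDK's actual implementation), the function below writes a canonical 44-byte WAV header followed by the clamped, scaled samples. The name `floatSamplesToWav` is hypothetical, and mono input is assumed because `AudioPcmF32` carries no channel count.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper (not the SDK's code): packs mono float32 samples in
// [-1.0, 1.0] into a 16-bit PCM WAV container.
fun floatSamplesToWav(samples: FloatArray, sampleRate: Int): ByteArray {
    val channels = 1          // assumption: mono
    val bitsPerSample = 16
    val blockAlign = channels * bitsPerSample / 8
    val dataSize = samples.size * blockAlign
    val buffer = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)

    buffer.put("RIFF".toByteArray(Charsets.US_ASCII))
    buffer.putInt(36 + dataSize)                  // RIFF chunk size
    buffer.put("WAVE".toByteArray(Charsets.US_ASCII))
    buffer.put("fmt ".toByteArray(Charsets.US_ASCII))
    buffer.putInt(16)                             // fmt chunk size for PCM
    buffer.putShort(1.toShort())                  // audio format 1 = PCM
    buffer.putShort(channels.toShort())
    buffer.putInt(sampleRate)
    buffer.putInt(sampleRate * blockAlign)        // byte rate
    buffer.putShort(blockAlign.toShort())
    buffer.putShort(bitsPerSample.toShort())
    buffer.put("data".toByteArray(Charsets.US_ASCII))
    buffer.putInt(dataSize)

    for (sample in samples) {
        // Clamp to the valid range, then scale to a signed 16-bit integer.
        val clamped = sample.coerceIn(-1f, 1f)
        buffer.putShort((clamped * Short.MAX_VALUE).toInt().toShort())
    }
    return buffer.array()
}

fun main() {
    val wav = floatSamplesToWav(floatArrayOf(0f, 1f, -1f), sampleRate = 16_000)
    println(wav.size) // prints 50 (44-byte header + 3 samples * 2 bytes)
}
```

The sample rate is passed through unchanged; whether the engine expects a particular rate is covered by the audio format requirements section rather than by this sketch.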
+ + Android-specific helper: `ImageUtils.fromBitmap(bitmap, compressionQuality = 85)` (in `ai.liquid.leap.message`) re-encodes an Android `Bitmap` to JPEG and returns a `ChatMessageContent.Image`. It's a `suspend` function — call it from a coroutine. + + ```kotlin + import ai.liquid.leap.message.ImageUtils + + val image: ChatMessageContent.Image = ImageUtils.fromBitmap(bitmap, compressionQuality = 85) + ``` - **`Text`** — plain text fragment. - **`Image`** — JPEG-encoded image bytes. Only vision-capable models can interpret image parts. - **`Audio`** — WAV-encoded audio bytes (see [audio format requirements](#audio-format-requirements) below). -- **`AudioPcmF32`** (Kotlin) / `fromFloatSamples(...)` (Swift) — raw float32 mono PCM in memory. Avoids re-encoding when you already have samples. +- **`AudioPcmF32`** (Kotlin) — raw float32 mono PCM in memory. Avoids the WAV encoding step when you already have samples; the engine handles framing internally. Kotlin-only. +- **`fromFloatSamples(...)` (Swift)** — convenience that wraps `[Float]` samples into a `ChatMessageContent.Audio` WAV blob (via `FloatAudioBuffer.makeAudioContent()`). Different from Kotlin's `AudioPcmF32`: this one DOES re-encode through WAV. There is no Swift surface for raw `AudioPcmF32` today. ## Audio format requirements @@ -168,8 +209,10 @@ The engine **only accepts WAV**. M4A, MP3, AAC, OGG, and other compressed format role: .user, content: [ .text("What is being said in this audio?"), - .audio(wavData) - ] + ChatMessageContent.fromWAVData(wavData) + ], + reasoningContent: nil, + functionCalls: nil ) ``` @@ -205,7 +248,9 @@ The engine **only accepts WAV**. M4A, MP3, AAC, OGG, and other compressed format let message = ChatMessage( role: .user, - content: [.text("Transcribe this audio"), audioContent] + content: [.text("Transcribe this audio"), audioContent], + reasoningContent: nil, + functionCalls: nil ) ``` @@ -252,7 +297,7 @@ The engine **only accepts WAV**. 
M4A, MP3, AAC, OGG, and other compressed format recorder.stop() let wavData = try Data(contentsOf: audioURL) - let audioContent: ChatMessageContent = .audio(wavData) + let audioContent: ChatMessageContent = ChatMessageContent.fromWAVData(wavData) ``` diff --git a/deployment/on-device/sdk/model-loading.mdx b/deployment/on-device/sdk/model-loading.mdx index 1e79299a..e33ee29a 100644 --- a/deployment/on-device/sdk/model-loading.mdx +++ b/deployment/on-device/sdk/model-loading.mdx @@ -11,12 +11,12 @@ The LEAP SDK ships two downloader classes built on the same pipeline. They diffe | **iOS / macOS (Swift)** | `ModelDownloader` | One-shot `loadModel(...)` and `loadSimpleModel(...)` that route every file transfer through `URLSession`. Pass `sessionConfiguration: .background(withIdentifier:)` for downloads that survive app suspension. Also exposes the underlying `downloadModel` / `requestDownloadModel` / `queryStatus` lifecycle for prefetch flows. The class ships in the `LeapModelDownloader` SPM library product. | | **All platforms (iOS, Android, JVM, Linux native, Windows native, macOS Kotlin)** | `LeapDownloader` | The cross-platform manifest loader. One-shot `loadModel(...)` and `loadSimpleModel(...)`. No platform-native background integration — the iOS `ModelDownloader` and Android `LeapModelDownloader` classes wrap one of these internally. | -Both classes return the same `ModelRunner` and share an on-disk model cache when constructed with the same `LeapDownloaderConfig.saveDir`. The platform downloader wraps a `LeapDownloader` internally — once a download has landed, calling `LeapDownloader.loadModel(...)` against the shared cache picks up the files without re-downloading. +All downloader classes return the same `ModelRunner` type. They share an on-disk model cache when pointed at the same directory: `LeapDownloaderConfig.saveDir` for Swift / JVM / native, and `modelFileDir` for Android `LeapModelDownloader`. 
Once a download has landed, calling `LeapDownloader.loadModel(...)` against the shared cache picks up the files without re-downloading. **Parameter naming.** Every loader uses the same parameter labels across Swift and Kotlin: -- **`loadModel(...)` / `downloadModel(...)` / `requestDownloadModel(...)` / `queryStatus(...)` / `removeModel(...)`** all use `modelName:` / `quantizationType:` on the Swift `ModelDownloader` (iOS, macOS), the Kotlin `LeapModelDownloader` (Android), and the cross-platform `LeapDownloader`. +- Manifest loaders and lifecycle methods use `modelName:` / `quantizationType:` consistently. Swift `ModelDownloader` exposes `downloadModel(...)`, `requestDownloadModel(...)`, `queryStatus(...)`, and `removeModel(...)`; Android `LeapModelDownloader` exposes `requestDownloadModel(...)`, `requestStopDownload(...)`, `queryStatus(...)`, and `getModelResourceFolder(...)`; cross-platform `LeapDownloader` exposes foreground `downloadModel(...)` / `loadModel(...)` plus cache cleanup helpers. - **`ModelSource` (sideloaded)** uses `quantizationId` — the field is part of the source descriptor, not a loader parameter. @@ -68,17 +68,19 @@ Both classes return the same `ModelRunner` and share an on-disk model cache when class LeapModelDownloader( private val context: Context, modelFileDir: File? = null, - private val extraHTTPRequestHeaders: Map = mapOf(), private val notificationConfig: LeapModelDownloaderNotificationConfig = LeapModelDownloaderNotificationConfig(), + private val downloaderConfig: LeapDownloaderConfig = LeapDownloaderConfig(), + private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO, ) ``` | Field | Description | |---|---| | `context` | Activity or Application context. | - | `modelFileDir` | Override the model cache directory. Defaults to app's external files directory. | - | `extraHTTPRequestHeaders` | Extra headers to attach to download requests. | - | `notificationConfig` | Foreground service notification title/content/icon strings. 
| + | `modelFileDir` | Override the model cache directory. Defaults to `File(context.filesDir, "leap_models")`. | + | `notificationConfig` | Notification channel, title, and content strings used by the WorkManager download worker. | + | `downloaderConfig` | Network / validation settings for the underlying `LeapDownloader` (`baseUrl`, SHA-256 validation, SSL, and timeouts). The cache directory comes from `modelFileDir`, not `downloaderConfig.saveDir`. | + | `ioDispatcher` | Coroutine dispatcher for blocking I/O. Defaults to `Dispatchers.IO`. | ```kotlin @@ -87,6 +89,11 @@ Both classes return the same `ModelRunner` and share an on-disk model cache when data class LeapDownloaderConfig( val saveDir: String = "leap_models", val validateSha256: Boolean = true, + val disableSslValidation: Boolean = false, + val baseUrl: String? = null, + val connectTimeoutMillis: Long = 30_000, + val socketTimeoutMillis: Long = 60_000, + val requestTimeoutMillis: Long = 600_000, ) ``` @@ -195,8 +202,8 @@ Resolves the GGUF manifest for the given model + quantization slug, downloads an - **`forceDownload`** — re-fetch even when cached. Use after a corrupted download or when the manifest has changed upstream. - **`forceLocal`** — skip the Leap Model Service and load in-process. Useful for testing the local path when the service is installed. - - **`progress`** — pass a callback to load eagerly inside `loadModel(...)` and observe progress; pass `null` (the default) to defer loading until the first session is created. - - **Background staging** — use `requestDownloadModel(modelName, quantizationType, forceDownload)` + `observeDownloadProgress(modelName, quantizationType): Flow` for WorkManager-backed transfers. See [Utilities](./utilities). + - **`progress`** — observe manifest / model download bytes as `ProgressData`. 
On the Leap Model Service path, passing `null` preserves the service's deferred-load behavior; if the service is unavailable, the in-process fallback still loads before `loadModel(...)` returns. + - **Background staging** — call `requestDownloadModel(modelName, quantizationType, forceDownload)` to enqueue a unique WorkManager download worker, then observe `observeDownloadProgress(modelName, quantizationType): StateFlow`. See [Utilities](./utilities). ```kotlin @@ -211,7 +218,9 @@ Resolves the GGUF manifest for the given model + quantization slug, downloads an ``` ```kotlin - val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir)) + // `saveDir` is a String filesystem path (not java.io.File). On Android pass + // `context.cacheDir.absolutePath`; on JVM/native pass any writable directory: + val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = "/var/cache/leap")) val runner = downloader.loadModel( modelName = "LFM2-1.2B", @@ -289,19 +298,33 @@ Use this path when you ship the model as an app asset, `adb push` it for develop ``` - The 0.9.x-style URL-based loader still works: + The 0.9.x-style URL-based loader still works for the common case (auto-detection picks up sibling `mmproj-*.gguf` for vision and audio decoder files whose name contains "audio" and "decoder"): ```swift let runner = try await Leap.load(url: ggufURL) + ``` + + If you need to override the companion-file picks, build a fully-specified `LiquidInferenceEngineOptions`. The Kotlin/Native ObjC bridge strips default-argument metadata, so the Swift designated init requires every field — there is no `LiquidInferenceEngineOptions(bundlePath: …)` single-arg overload today. 
Pass `nil` for fields you don't need to set: + ```swift let options = LiquidInferenceEngineOptions( bundlePath: ggufURL.path, - mmProjPath: mmprojURL.path + cacheOptions: nil, + cpuThreads: nil, + contextSize: nil, + nGpuLayers: nil, + mmProjPath: mmprojURL.path, + audioDecoderPath: nil, + chatTemplate: nil, + audioTokenizerPath: nil, + audioDecoderUseGpu: false, + useMmap: nil, + extras: nil ) let runner = try await Leap.load(url: ggufURL, options: options, autoDetectCompanionFiles: false) ``` - Auto-detection picks up sibling `mmproj-*.gguf` (vision) and audio decoder files (`.gguf`/`.bin` whose name contains "audio" and "decoder"). New code should prefer `loadSimpleModel(model: ModelSource(...))` for race-free, explicit wiring. + New code should prefer `loadSimpleModel(model: ModelSource(...))` for race-free, explicit wiring. @@ -407,7 +430,7 @@ Useful for onboarding flows that prefetch over Wi-Fi or staging models you'll lo } public struct DownloadedModelManifest { - public let manifest: ModelManifest + public let manifest: Manifest public let localModelPath: String public let localMultimodalProjectorPath: String? public let localAudioDecoderPath: String? @@ -418,22 +441,15 @@ Useful for onboarding flows that prefetch over Wi-Fi or staging models you'll lo ```kotlin - suspend fun downloadModel( - modelName: String, - quantizationType: String, - progress: ((ProgressData) -> Unit)? = null, - ): Manifest - - // Background variant (WorkManager): fire-and-forget, returns immediately + // Enqueues a unique WorkManager download worker and returns after staging it. 
suspend fun requestDownloadModel(modelName: String, quantizationType: String, forceDownload: Boolean = false) suspend fun requestStopDownload(modelName: String, quantizationType: String) suspend fun queryStatus(modelName: String, quantizationType: String): ModelDownloadStatus - fun observeDownloadProgress(modelName: String, quantizationType: String): Flow + fun observeDownloadProgress(modelName: String, quantizationType: String): StateFlow fun getModelResourceFolder(modelName: String, quantizationType: String): File - suspend fun requestStopService() ``` - The background variant runs on WorkManager and survives app restarts. See [Utilities → Android background staging](./utilities) for the full status-polling lifecycle. + Android `LeapModelDownloader` does not expose foreground-only `downloadModel(...)`; use `requestDownloadModel(...)` to prefetch by enqueuing the WorkManager downloader, or `loadModel(...)` when you want download + load in one call. The queued worker survives app restarts. See [Utilities → Android background staging](./utilities) for the full status-polling lifecycle. ```kotlin @@ -458,7 +474,7 @@ Per-load runtime overrides. Default values come from the model bundle's manifest ```swift public struct LiquidInferenceEngineOptions { - public var bundlePath: String + public let bundlePath: String public let cacheOptions: LiquidCacheOptions? public let cpuThreads: UInt32? public let contextSize: UInt32? @@ -468,20 +484,30 @@ Per-load runtime overrides. Default values come from the model bundle's manifest public let audioTokenizerPath: String? public let audioDecoderUseGpu: Bool // default false public let chatTemplate: String? + public let useMmap: Bool? public let extras: String? } - // Manifest-based variant — accepts cacheOptions + contextSize without bundlePath + // Manifest-based variant — used with downloader.loadModel(...). No bundlePath + // (the downloader supplies it) and no companion-path / mmap fields (the manifest + // pins those). 
Only cache + tuning fields are exposed: public struct LiquidInferenceEngineManifestOptions { public let cacheOptions: LiquidCacheOptions? + public let cpuThreads: UInt32? public let contextSize: UInt32? - // …same companion-file and tuning fields… + public let nGpuLayers: UInt32? + public let audioDecoderUseGpu: Bool // default false + public let chatTemplate: String? + public let extras: String? } ``` Pass `LiquidInferenceEngineManifestOptions` to `ModelDownloader.loadModel(modelName:, quantizationType:, options:, ...)` for manifest-based loads, and `LiquidInferenceEngineOptions` to `Leap.load(url:, options:)` for sideloaded GGUFs: ```swift + // Manifest-based load (preferred — LiquidInferenceEngineManifestOptions has a + // SKIE-bundled convenience init in ConvenienceExtensions.swift that lets you + // pass just the fields you care about): let manifestOpts = LiquidInferenceEngineManifestOptions( contextSize: 8192, cpuThreads: 6 @@ -491,25 +517,17 @@ Per-load runtime overrides. Default values come from the model bundle's manifest quantizationType: "Q4_K_M", options: manifestOpts ) - - // Sideloaded variant (URL-based) - let options = LiquidInferenceEngineOptions( - bundlePath: ggufURL.path, - cpuThreads: 6, - contextSize: 8192 - ) - let runner = try await Leap.load(url: ggufURL, options: options) ``` - **Builder style.** Chain `.with(...)` on `GenerationOptions`, `LiquidInferenceEngineOptions`, or `LiquidInferenceEngineManifestOptions`: + **Builder style on the manifest variant** — `LiquidInferenceEngineManifestOptions` exposes `.with(...)` chains that match the Kotlin builder surface: ```swift - let opts = LiquidInferenceEngineOptions(bundlePath: ggufURL.path) + let opts = LiquidInferenceEngineManifestOptions(contextSize: 8192) .with(cpuThreads: 6) - .with(contextSize: 8192) - .with(useMmap: false) .with(cacheOptions: .enabled(path: cacheDir.path)) ``` + + **Sideloaded `LiquidInferenceEngineOptions` (URL-based load).** The non-manifest variant does NOT ship a 
Swift convenience init in v0.10.7 — the K/N-generated designated init takes all 12 fields. Either build it fully (verbose) or use `loadSimpleModel(model: ModelSource(...))` on `ModelDownloader` (preferred for new code; see the Sideloaded files section). The builder `.with(...)` overloads exist but they create a new instance internally via the same 12-arg init, so you still need a fully-built starting instance — there is no `LiquidInferenceEngineOptions(bundlePath: …)` 1-arg form today. ```kotlin @@ -523,7 +541,6 @@ Per-load runtime overrides. Default values come from the model bundle's manifest var extras: String? = null, ) { companion object { - fun build(action: ModelLoadingOptions.() -> Unit): ModelLoadingOptions fun cacheOptions(path: String, maxEntriesDisk: Int = 40): EngineOptions.CacheOptions } } @@ -536,10 +553,10 @@ Per-load runtime overrides. Default values come from the model bundle's manifest modelName = "LFM2-1.2B", quantizationId = "Q5_K_M", ), - options = ModelLoadingOptions.build { - cpuThreads = 6 - contextSize = 4096 - } + options = ModelLoadingOptions( + cpuThreads = 6, + contextSize = 4096, + ) ) ``` @@ -583,6 +600,7 @@ data class SamplingParameters( val topP: Double? = null, val minP: Double? = null, val repetitionPenalty: Double? = null, + val topK: Int? = null, ) ``` @@ -758,9 +776,9 @@ The service requires the `POST_NOTIFICATIONS` runtime permission (Android 13+) t ### Notes -- The service ignores caller-supplied `cacheDir` paths (it maintains its own KV cache directory) — pass `cacheOptions` on `ModelLoadingOptions` to control the in-memory + disk caps, not the path. +- The service does not accept caller-supplied `cacheOptions`; it maintains its own KV cache directory and policy. `LeapModelDownloader` forwards first-class load options such as `cpuThreads`, `randomSeed`, `chatTemplate`, `contextSize`, `extras`, and `useMmap`, but intentionally omits `cacheOptions` from the AIDL parcel. 
Use `forceLocal = true` when you need caller-controlled KV cache settings. - First-load wins: when multiple apps request the same model simultaneously, the first call's `ModelLoadingOptions` are applied; subsequent callers receive the shared runner regardless of their options. Read the effective config back via `LeapServiceClient.getLoadedModelConfig`. -- Models stay loaded until the service is shut down or restarted. `evictUnusedModel` is a no-op by design — eviction would race with in-flight generations. +- Models stay loaded until the service is shut down or restarted. The service has no public mid-flight eviction API — caller-driven eviction would race with in-flight generations. ## `ProgressData` / `Manifest` diff --git a/deployment/on-device/sdk/openai-client.mdx b/deployment/on-device/sdk/openai-client.mdx index 09432202..b63416b2 100644 --- a/deployment/on-device/sdk/openai-client.mdx +++ b/deployment/on-device/sdk/openai-client.mdx @@ -9,7 +9,7 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs — - **Hybrid on-device + cloud routing.** Run small / fast models on-device with `LeapSDK`, fall back to a larger cloud model for hard prompts. - **Standardised cloud API.** Talk to any OpenAI-compatible backend without pulling in a heavier OpenAI SDK. -- **Streaming first.** SSE streaming is the only mode — non-streaming requests aren't exposed (`stream = true` is the default). +- **Streaming first.** SSE streaming is the only mode — non-streaming requests aren't exposed. `streamChatCompletion(...)` forces `stream = true` on the outgoing request regardless of the `stream` field on the `ChatCompletionRequest` you pass in. 
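For readers unfamiliar with the SSE framing that the streaming-only design relies on, here is a minimal sketch of how a response body breaks into events. It assumes the backend follows the OpenAI convention of `data:` lines terminated by a `data: [DONE]` sentinel. `ssePayloads` is a hypothetical helper for illustration only; the client itself does this work internally through Ktor, and its real parsing may differ.

```kotlin
// Hypothetical helper (not part of leap-openai-client): extracts the JSON
// payload of each SSE `data:` line, stopping at the `[DONE]` sentinel.
fun ssePayloads(body: String): List<String> =
    body.lineSequence()
        .map { it.trim() }
        .filter { it.startsWith("data:") }        // ignore blank separators and comments
        .map { it.removePrefix("data:").trim() }
        .takeWhile { it != "[DONE]" }              // end-of-stream marker
        .toList()

fun main() {
    val body = "data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n"
    ssePayloads(body).forEach(::println)
}
```

Each extracted payload would then be decoded into a chunk object; the field name `delta` in the sample body is made up and stands in for the real chat-completions chunk schema.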
## Add the dependency @@ -19,7 +19,7 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs — ```swift dependencies: [ - .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.6") + .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.7") ] targets: [ @@ -37,22 +37,32 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs — ```kotlin dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") - implementation("ai.liquid.leap:leap-openai-client:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") + implementation("ai.liquid.leap:leap-openai-client:0.10.7") } ``` Bundles an OkHttp-engine Ktor client. No extra HTTP setup needed. - + ```kotlin dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") - implementation("ai.liquid.leap:leap-openai-client:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") + implementation("ai.liquid.leap:leap-openai-client:0.10.7") } ``` - Bundles the CIO Ktor engine on JVM, and platform-appropriate engines on Linux native / Windows native. Maven users: use `leap-openai-client-jvm` for the JVM artifact. + JVM support landed in v0.10.7 (the `jvm` slice was absent in the v0.10.0–v0.10.6 cascade). Pure-Maven JVM projects should consume the `-jvm` classifier directly: `ai.liquid.leap:leap-openai-client-jvm:0.10.7`. Bundles the CIO Ktor engine. + + + ```kotlin + dependencies { + implementation("ai.liquid.leap:leap-sdk:0.10.7") + implementation("ai.liquid.leap:leap-openai-client:0.10.7") + } + ``` + + Targets `linuxX64`, `linuxArm64`, `mingwX64` (Windows native), and `wasmJs` (browser via Ktor Js engine, added in v0.10.7). @@ -60,10 +70,23 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs — + + The `leap-sdk-openai-client` Kotlin module does **not** apply the SKIE plugin in v0.10.7 (only `leap-sdk`, `leap-sdk-model-downloader`, and `leap-ui` do). 
That means `Flow` is **not** bridged to a Swift `AsyncSequence` and the `onEnum(of:)` helper is **not** generated for `ChatCompletionEvent`. Swift consumers on v0.10.7 must collect the Kotlin `Flow` through its native collector and downcast each event with `as?`. For most Swift apps that just need cloud chat completions, an off-the-shelf OpenAI Swift client is more ergonomic — use `LeapOpenAIClient` from Swift only if you need to share Kotlin code with Android. + + **Coming in the next release:** SKIE will be enabled on `leap-sdk-openai-client`, adding the same Swift-friendly surface as `LeapSDK` — `for try await event in client.streamChatCompletion(...)`, `onEnum(of: event)` exhaustive switching, and nested-class Swift names (`ChatCompletionEvent.Delta` instead of the current flattened `ChatCompletionEventDelta`). Swift convenience inits and builders for `OpenAiClientConfig` are also planned. Pin to v0.10.7 if you need the current behavior frozen; otherwise expect the more ergonomic surface to land soon. + + + Manual collection pattern (the `Flow.collect(...)` shape varies by Kotlin/Native version — check the framework header in your Xcode build for the exact label): + ```swift import LeapOpenAIClient - let client = OpenAiClient( + // The Kotlin top-level `fun OpenAiClient(config: OpenAiClientConfig)` exports as + // `OpenAiClientKt.OpenAiClient(config:)` (PascalCase preserved from the Kotlin + // function name). Without SKIE the K/N export also flattens Kotlin's nested + // class names — `ChatMessage.User` → `ChatMessageUser`, + // `ChatCompletionEvent.Delta` → `ChatCompletionEventDelta`, etc. 
+ let client = OpenAiClientKt.OpenAiClient( config: OpenAiClientConfig( apiKey: "sk-…", baseUrl: "https://api.openai.com/v1" @@ -73,24 +96,25 @@ description: "Lightweight client for OpenAI-compatible chat completions APIs — let request = ChatCompletionRequest( model: "gpt-4o-mini", messages: [ - ChatMessage.System(content: "You are a helpful assistant."), - ChatMessage.User(content: "What is the capital of Japan?") + ChatMessageSystem(content: "You are a helpful assistant."), + ChatMessageUser(content: "What is the capital of Japan?") ], temperature: 0.7 ) - for try await event in client.streamChatCompletion(request: request) { - switch onEnum(of: event) { - case .delta(let d): - print(d.content, terminator: "") - case .done(let d): - if let usage = d.usage { - print("\nTokens: \(usage.totalTokens)") + // Pseudocode — actual collector signature depends on your Kotlin/Native version + // and framework headers. Without SKIE, there is no `for try await` integration. + try await client.streamChatCompletion(request: request).collect( + collector: FlowCollector { event in + if let delta = event as? ChatCompletionEventDelta { + print(delta.content, terminator: "") + } else if let done = event as? ChatCompletionEventDone { + if let usage = done.usage { print("\nTokens: \(usage.totalTokens)") } + } else if let err = event as? ChatCompletionEventError { + print("\nError: \(err.message)") } - case .error(let e): - print("\nError: \(e.message)") } - } + ) client.close() // closes the underlying URLSession-backed HttpClient ``` @@ -157,7 +181,11 @@ data class OpenAiClientConfig( ```swift - let client = OpenAiClient( + // The leap-sdk-openai-client module has no SKIE plugin applied, so the + // top-level Kotlin `fun OpenAiClient(config:)` factory is exported as + // `OpenAiClientKt.OpenAiClient(config:)`. See the [Basic usage](#basic-usage) + // warning for the full reasoning. 
+ let client = OpenAiClientKt.OpenAiClient( config: OpenAiClientConfig( apiKey: "sk-or-…", baseUrl: "https://openrouter.ai/api/v1", @@ -190,7 +218,7 @@ data class OpenAiClientConfig( ```swift - let client = OpenAiClient( + let client = OpenAiClientKt.OpenAiClient( config: OpenAiClientConfig( apiKey: "anything", // Required by config but typically unused baseUrl: "http://10.0.0.42:8000/v1" @@ -242,7 +270,7 @@ data class ChatCompletionRequest( ## Response shape -`streamChatCompletion(request)` returns an `AsyncSequence` (Swift) / `Flow` (Kotlin): +`streamChatCompletion(request)` returns a `Flow` (Kotlin) — and the same `Flow` is exposed verbatim to Swift in v0.10.7 (no SKIE on this module yet, so it's not bridged to a Swift `AsyncSequence`; collect it via the native `Flow.collect(...)` shape shown above). Events: | Variant | Meaning | |---|---| @@ -276,15 +304,24 @@ Route simple prompts to a small on-device LFM; escalate harder prompts to a clou func send(_ text: String, useCloud: Bool) async throws { if useCloud { + // Cloud path: leap-sdk-openai-client has no SKIE — collect the Kotlin + // Flow manually and downcast each event with `as?`. Note the flattened + // Swift type names (`ChatMessageUser`, `ChatCompletionEventDelta`). let request = ChatCompletionRequest( model: "gpt-4o-mini", - messages: [ChatMessage.User(content: text)] + messages: [ChatMessageUser(content: text)] + ) + try await cloud.streamChatCompletion(request: request).collect( + collector: FlowCollector { event in + if let delta = event as? ChatCompletionEventDelta { + appendChunk(delta.content) + } + } ) - for try await event in cloud.streamChatCompletion(request: request) { - if case let .delta(d) = onEnum(of: event) { appendChunk(d.content) } - } } else { - let userMessage = LeapModelDownloader.ChatMessage(role: .user, content: [.text(text)]) + // On-device path: leap-sdk has SKIE — `for try await` + `onEnum(of:)` + // work as written. 
+ let userMessage = ChatMessage(role: .user, textContent: text) for try await response in onDevice.generateResponse(message: userMessage) { if case let .chunk(c) = onEnum(of: response) { appendChunk(c.text) } } @@ -300,7 +337,7 @@ Route simple prompts to a small on-device LFM; escalate harder prompts to a clou ```kotlin import ai.liquid.leap.Conversation - import ai.liquid.leap.MessageResponse + import ai.liquid.leap.message.MessageResponse import ai.liquid.leap.openai.ChatCompletionEvent import ai.liquid.leap.openai.ChatCompletionRequest import ai.liquid.leap.openai.ChatMessage as CloudChatMessage @@ -375,7 +412,7 @@ See [Cloud AI Comparison](./cloud-ai-comparison) for a side-by-side feature brea ## Lifecycle -The platform `OpenAiClient(config:)` factory creates an `HttpClient` internally and ties it to the returned client — call `close()` when you're done. +The platform `OpenAiClient(config:)` factory (Kotlin `fun OpenAiClient(config:)` → Swift `OpenAiClientKt.OpenAiClient(config:)`) creates an `HttpClient` internally and ties it to the returned client — call `close()` when you're done. @@ -383,7 +420,7 @@ The platform `OpenAiClient(config:)` factory creates an `HttpClient` internally deinit { client.close() } ``` - The lower-level constructor that accepts an externally-managed `HttpClient` is part of the Kotlin/Ktor surface and isn't a useful entry point from Swift — the Ktor engine machinery isn't bridged into the public Swift API. Use `OpenAiClient(config:)` and let the SDK own the session. If multiple consumers share a client, share the `OpenAiClient` instance and `close()` once at teardown. + The lower-level constructor that accepts an externally-managed `HttpClient` is part of the Kotlin/Ktor surface and isn't a useful entry point from Swift — the Ktor engine machinery isn't bridged into the public Swift API. Use `OpenAiClientKt.OpenAiClient(config:)` and let the SDK own the session. 
If multiple consumers share a client, share the `OpenAiClient` instance and `close()` once at teardown. ```kotlin diff --git a/deployment/on-device/sdk/quick-start.mdx b/deployment/on-device/sdk/quick-start.mdx index a9d5e40c..3496557e 100644 --- a/deployment/on-device/sdk/quick-start.mdx +++ b/deployment/on-device/sdk/quick-start.mdx @@ -3,9 +3,32 @@ title: "Quick Start" description: "Install the LEAP SDK on iOS, macOS, Android, JVM, Linux, or Windows — same API everywhere." --- -Latest version: `v0.10.6` +Latest version: `v0.10.7` -The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conversation` / `MessageResponse` API runs on every supported target. The code differs only in **language** (Swift vs. Kotlin) and **packaging** (SPM, Gradle, or Kotlin/Native plugin) — the call shapes are identical. +## What is the Leap SDK? + +The **Leap SDK** is Liquid AI's official on-device inference SDK and the **only SDK with first-class support for [Liquid Foundation Models](https://www.liquid.ai/blog/liquid-foundation-models-our-first-series-of-generative-ai-models) (LFMs)** — LFM2, LFM2.5 (text, thinking, JP, VL), and LFM2.5-Audio. "First-class" means every published Liquid checkpoint is supported, validated, and shipped through this SDK on day-one — the same team that trains the models ships the engine, sampler defaults, chat templates, and tool-call parsers that run them. There is no separate adapter layer, no community port, no upstream-rebase lag. + +It's also a Kotlin Multiplatform library: the same `ModelRunner` / `Conversation` / `MessageResponse` API runs on iOS, macOS, Android, JVM desktop, Linux native, Windows native, and (preview) wasmJs. The Swift surface is generated through Kotlin/Native + SKIE and ships as XCFrameworks; the Android/JVM surface ships as Maven Central artifacts. Both call shapes are identical — only the language and packaging differ. 
+ +### What "first-class support for Liquid models" gets you + +- **Day-one model coverage.** New LFM checkpoints land in the SDK release that announces them — no waiting for a generic runtime to catch up to a new architecture, no manual quant conversion, no template-mismatch debugging. The [LEAP Model Library](https://leap.liquid.ai/models) is the canonical distribution path and the SDK pulls directly from it. +- **Per-checkpoint validated defaults.** The sampling parameters baked into each model's bundle manifest (`sampling_parameters` under `generation_time_parameters` in each `.json` on [LiquidAI/LeapBundles](https://huggingface.co/LiquidAI/LeapBundles)) are the values the training team validated for that exact checkpoint. The SDK applies them automatically — no `temperature=0.7` placeholder retuning, no token-stream artifacts from the wrong `min_p` / `repetition_penalty`. +- **LFM-native special tokens and chat templates.** The shipped engine knows how to filter LFM control tokens before they reach your stream, applies the right chat template per checkpoint, and parses LFM's hermes and pythonic function-call dialects out of the box. Generic SDKs treat these as opaque text and surface raw tokens; Leap surfaces typed `MessageResponse.FunctionCalls` with parsed argument maps. +- **Multimodal LFMs in one API.** Vision (LFM2-VL family) and audio (LFM2.5-Audio) plug into the same `ChatMessage` / `ChatMessageContent` types you already use for text. Image inputs travel as JPEG bytes; audio travels as WAV blobs (or raw float32 PCM on Kotlin via `AudioPcmF32`). Output `MessageResponse.AudioSample` streams float32 PCM frames for audio-out checkpoints. No separate runtime per modality. +- **Constrained generation, end-to-end.** Kotlin annotations (`@Generatable` / `@Guide` on `@Serializable` data classes) and Swift macros (`@Generatable` / `@Guide` synthesizing `jsonSchema()` at compile time) produce JSON Schemas the engine enforces at decode time. 
The model's output is guaranteed to parse into your type. +- **One-call model fetching from the LEAP Model Library.** `LeapModelDownloader.loadModel(modelName:, quantizationType:)` resolves a manifest, downloads the right GGUF + matching `mmproj`/audio-decoder companion files for the checkpoint, caches them on disk, and hands back a `ModelRunner` — one call, no manual path wiring, no companion-file detection. Background-safe on iOS (`URLSessionConfiguration.background(withIdentifier:)`), WorkManager-backed on Android (survives app restarts). + +### Other features + +- **On-device by default.** No cloud round-trip, no per-token cost, full privacy, full offline operation. +- **KV cache reuse for fast multi-turn.** Bounded-LRU disk + memory `CacheOptions` skip the prefill step for shared prompt prefixes — TTFT on a long system prompt or RAG preamble drops from seconds to under a hundred milliseconds on cache hits. Disabled by default; opt in with `LiquidCacheOptions.enabled(path:)` / `ModelLoadingOptions.cacheOptions(path = ...)`. +- **Memory-mapped weight loading.** `use_mmap=true` is the default since v0.10.4. Model weights are file-backed, not anonymous RSS — iOS jetsam and Android LMK score the app much lower under memory pressure, cold load returns as soon as the file is mapped, and warm reloads stream from the kernel page cache. +- **Hybrid on-device + cloud routing.** `leap-openai-client` ships in the same release as an opt-in OpenAI-compatible chat-completions client (OpenAI, OpenRouter, vLLM, llama-server). One binary, two code paths — route small/fast prompts on-device, fall back to a cloud model for hard ones, share the same `ChatMessage` types. +- **Drop-in voice assistant UI.** `leap-ui` ships a Compose Multiplatform voice widget — animated orb, mic button, status label, state machine — that pairs with `VoiceConversation` to wire LFM2.5-Audio into a working voice experience without writing the recording-and-playback plumbing yourself. 
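The bounded-LRU behaviour named in the KV cache bullet above can be pictured with a small access-ordered map. This is only a sketch of the eviction policy, not the SDK's cache: `BoundedLru`, the capacity of 2, and the string keys and values are all illustrative assumptions.

```kotlin
// Illustrative bounded LRU map (hypothetical, not the SDK's cache):
// once `capacity` is exceeded, the least recently used entry is evicted.
class BoundedLru<K, V>(private val capacity: Int) :
    java.util.LinkedHashMap<K, V>(16, 0.75f, /* accessOrder = */ true) {
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
        size > capacity
}

fun main() {
    // Keys stand in for shared prompt prefixes, values for cached prefill state.
    val cache = BoundedLru<String, String>(capacity = 2)
    cache.put("system-prompt-A", "kv-state-A")
    cache.put("rag-preamble-B", "kv-state-B")
    cache.get("system-prompt-A")              // touch A so B becomes the eldest entry
    cache.put("system-prompt-C", "kv-state-C") // exceeds capacity, evicts B
    println(cache.keys)
}
```

A cache hit on a shared prefix is what lets a later turn skip the prefill work; the bound keeps disk and memory use predictable.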
+ +Implementation deep-dives: [Model Loading](/deployment/on-device/sdk/model-loading), [Conversation & Generation](/deployment/on-device/sdk/conversation-generation), [Constrained Generation](/deployment/on-device/sdk/constrained-generation), [Function Calling](/deployment/on-device/sdk/function-calling), [Voice Assistant Widget](/deployment/on-device/sdk/voice-assistant), [OpenAI-Compatible Client](/deployment/on-device/sdk/openai-client). **Migrating from 0.9.x?** v0.10.0 unifies the SDK into a single Kotlin Multiplatform distribution published from [`Liquid4All/leap-sdk`](https://github.com/Liquid4All/leap-sdk). The standalone `Liquid4All/leap-ios` repo is no longer the source-of-truth. See the [SDK changelog](/deployment/on-device/leap-sdk-changelog#0-9-x-0-10-x-kotlin-multiplatform-unification) for the transition story and drop-in replacements for legacy `Leap.load(...)` / `LiquidEngine(...)` call sites. @@ -16,7 +39,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver - Xcode 16.0+ with Swift 6.0. - - iOS **17.0+** or macOS **15.0+** (Mac Catalyst 17.0+ also supported). + - iOS **17.0+** or macOS **15.0+** (Apple Silicon only — Mac Catalyst is **not** supported; the shipped XCFrameworks contain only `ios-arm64`, `ios-arm64-simulator`, and `macos-arm64` slices, and `Package.swift` declares only `.iOS(.v17)` / `.macOS(.v15)` platforms). - A physical iPhone or iPad with at least 3 GB RAM for best performance. The simulator works for development but runs models much slower. @@ -68,7 +91,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver 1. In Xcode choose **File → Add Package Dependencies**. 2. Enter `https://github.com/Liquid4All/leap-sdk.git`. - 3. Select the `0.10.6` release (or newer). + 3. Select the `0.10.7` release (or newer). 4. Add the products you need to your app target. The package vends five products. 
Most apps only need one or two: @@ -94,7 +117,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver - For explicit pinning, declare each framework as a `.binaryTarget` in your `Package.swift`. The XCFramework assets live on the `Liquid4All/leap-sdk` v0.10.6 release page — copy the SHA-256 values from there. + For explicit pinning, declare each framework as a `.binaryTarget` in your `Package.swift`. The XCFramework assets live on the `Liquid4All/leap-sdk` v0.10.7 release page — copy the SHA-256 values from there. The constrained-generation macros (`@Generatable`, `@Guide`) are Swift macros, not XCFrameworks — they ship as the `LeapSDKMacros` source target inside the SPM package and **cannot be installed as a `.binaryTarget`**. If you need them, use the standard SPM package URL above (or add the `LeapSDKMacros` source target separately on top of your binary targets). @@ -103,23 +126,23 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver ```swift .binaryTarget( name: "LeapSDK", - url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapSDK.xcframework.zip", - checksum: "236fb6c897d25fc5804be64edc16a9ee73c26678d02e58dab4a1b77ab2e4898f" + url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapSDK.xcframework.zip", + checksum: "6f2721aa45d7555646f78cbcaedb57aba3d869f56b24d681ad332846e131ae3d" ), .binaryTarget( name: "LeapModelDownloader", - url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapModelDownloader.xcframework.zip", - checksum: "a2a57f9c932ef7005d42b33b69d7a67f0ffb65fb79dffa954be99a0225932a61" + url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapModelDownloader.xcframework.zip", + checksum: "f649aa6c1aa3e87bbeb1073d5aeeb7224879359a24b18eeccc665d24abc725d8" ), .binaryTarget( name: "LeapOpenAIClient", - url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapOpenAIClient.xcframework.zip", - 
checksum: "b661059af8bfb086931099f8fac9f54e957272d5d6bbc9dd36e3e154fddf8222" + url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapOpenAIClient.xcframework.zip", + checksum: "79bc5443a1cce6fcd4c49c91eeb85727034aaca10d3ef69582c061989c3d9b70" ), .binaryTarget( name: "LeapUi", - url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.6/LeapUi.xcframework.zip", - checksum: "694f4b8a8d1a8cd9086ce718a9fc15f4e74c442541b983816fd0eef8cecc7875" + url: "https://github.com/Liquid4All/leap-sdk/releases/download/v0.10.7/LeapUi.xcframework.zip", + checksum: "f1b198cef88c2a37eaf6dc1f36395d6aed024b0c6c2b43724d942e25b60d22e0" ), ``` @@ -131,14 +154,14 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver ```kotlin dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") - implementation("ai.liquid.leap:leap-model-downloader:0.10.6") // Android background downloads + implementation("ai.liquid.leap:leap-sdk:0.10.7") + implementation("ai.liquid.leap:leap-model-downloader:0.10.7") // Android background downloads // Optional: OpenAI-compatible cloud chat client - // implementation("ai.liquid.leap:leap-openai-client:0.10.6") + // implementation("ai.liquid.leap:leap-openai-client:0.10.7") // Optional: Voice assistant widget (Compose Multiplatform) - // implementation("ai.liquid.leap:leap-ui:0.10.6") + // implementation("ai.liquid.leap:leap-ui:0.10.7") } ``` @@ -147,7 +170,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver ```toml [versions] - leapSdk = "0.10.6" + leapSdk = "0.10.7" [libraries] leap-sdk = { module = "ai.liquid.leap:leap-sdk", version.ref = "leapSdk" } @@ -166,7 +189,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver ``` - Also declare these permissions in `AndroidManifest.xml` — `LeapModelDownloader` runs as a foreground service for reliable downloads: + Also declare these permissions in `AndroidManifest.xml` — 
`LeapModelDownloader.requestDownloadModel(...)` enqueues a WorkManager download worker that runs in the foreground while transferring model files: ```xml @@ -191,11 +214,11 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver } dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") // Optional: - // implementation("ai.liquid.leap:leap-openai-client:0.10.6") - // implementation("ai.liquid.leap:leap-ui:0.10.6") // Compose for Desktop voice widget + // implementation("ai.liquid.leap:leap-openai-client:0.10.7") + // implementation("ai.liquid.leap:leap-ui:0.10.7") // Compose for Desktop voice widget } ``` @@ -222,11 +245,11 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver // build.gradle.kts plugins { kotlin("multiplatform") version "2.3.20" - id("ai.liquid.leap.nativelibs") version "0.10.6" + id("ai.liquid.leap.nativelibs") version "0.10.7" } dependencies { - implementation("ai.liquid.leap:leap-sdk:0.10.6") + implementation("ai.liquid.leap:leap-sdk:0.10.7") } kotlin { @@ -242,7 +265,7 @@ The LEAP SDK is a Kotlin Multiplatform library: the same `ModelRunner` / `Conver ## 3. Load a model -The recommended path is **manifest-based** loading. On every platform, the platform downloader's `loadModel(...)` downloads (if needed) and loads in one call — `LeapModelDownloader.loadModel(...)` on iOS / macOS / Android, `LeapDownloader.loadModel(...)` on JVM and Linux / Windows Kotlin/Native. All paths fetch from the [LEAP Model Library](https://leap.liquid.ai/models) on first use and load from cache thereafter. +The recommended path is **manifest-based** loading. On every platform, the platform downloader's `loadModel(...)` downloads (if needed) and loads in one call — `ModelDownloader.loadModel(...)` on iOS / macOS, `LeapModelDownloader.loadModel(...)` on Android, and `LeapDownloader.loadModel(...)` on JVM and Linux / Windows Kotlin/Native. 
All paths fetch from the [LEAP Model Library](https://leap.liquid.ai/models) on first use and load from cache thereafter. @@ -298,8 +321,8 @@ The recommended path is **manifest-based** loading. On every platform, the platf import androidx.lifecycle.viewModelScope import ai.liquid.leap.Conversation import ai.liquid.leap.ModelRunner - import ai.liquid.leap.model_downloader.LeapModelDownloader - import ai.liquid.leap.model_downloader.LeapModelDownloaderNotificationConfig + import ai.liquid.leap.downloader.LeapModelDownloader + import ai.liquid.leap.downloader.LeapModelDownloaderNotificationConfig import kotlinx.coroutines.Dispatchers import kotlinx.coroutines.flow.MutableStateFlow import kotlinx.coroutines.flow.StateFlow @@ -313,7 +336,7 @@ The recommended path is **manifest-based** loading. On every platform, the platf notificationConfig = LeapModelDownloaderNotificationConfig.build { notificationTitleDownloading = "Downloading AI model..." notificationTitleDownloaded = "Model ready!" - notificationContentDownloading = "Please wait while the model downloads" + notificationContentDownloadingTemplate = "Please wait while the model downloads" } ) @@ -351,8 +374,8 @@ The recommended path is **manifest-based** loading. On every platform, the platf ```kotlin - import ai.liquid.leap.LeapDownloader - import ai.liquid.leap.LeapDownloaderConfig + import ai.liquid.leap.manifest.LeapDownloader + import ai.liquid.leap.manifest.LeapDownloaderConfig import ai.liquid.leap.message.ChatMessage import ai.liquid.leap.message.MessageResponse import kotlinx.coroutines.runBlocking @@ -372,7 +395,9 @@ The recommended path is **manifest-based** loading. 
On every platform, the platf val conversation = runner.createConversation(systemPrompt = "You are a helpful assistant.") - conversation.generateResponse(ChatMessage.user("Hello!")).collect { resp -> + conversation.generateResponse( + ChatMessage(ChatMessage.Role.USER, "Hello!") + ).collect { resp -> when (resp) { is MessageResponse.Chunk -> print(resp.text) is MessageResponse.Complete -> println("\n[done]") @@ -431,12 +456,16 @@ Both platforms expose the same streaming shape: an async sequence of `MessageRes func send(_ text: String) { guard let conversation else { return } generationTask?.cancel() - let userMessage = ChatMessage(role: .user, content: [.text(text)]) + let userMessage = ChatMessage(role: .user, textContent: text) + let options = GenerationOptions() + .with(temperature: 0.3) + .with(minP: 0.15) + .with(repetitionPenalty: 1.05) generationTask = Task { [weak self] in do { for try await response in conversation.generateResponse( message: userMessage, - generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05) + generationOptions: options ) { self?.handle(response) } @@ -479,7 +508,7 @@ Both platforms expose the same streaming shape: an async sequence of `MessageRes ?.onEach { response -> when (response) { is MessageResponse.Chunk -> _responseText.value += response.text - is MessageResponse.ReasoningChunk -> Log.d(TAG, "Reasoning: ${response.text}") + is MessageResponse.ReasoningChunk -> Log.d(TAG, "Reasoning: ${response.reasoning}") is MessageResponse.FunctionCalls -> handleFunctionCalls(response.functionCalls) is MessageResponse.AudioSample -> audioRenderer.enqueue(response.samples, response.sampleRate) is MessageResponse.Complete -> Log.d(TAG, "Done. 
Stats: ${response.stats}") @@ -501,22 +530,28 @@ Cancel the in-flight task (Swift) or coroutine job (Kotlin) to interrupt generat If the loaded model is multimodal (and its companion files were detected), you can attach a non-text part — an image, a WAV blob, or raw PCM samples — alongside the text in a `ChatMessage`. -**Multimodality is model-specific.** Most multimodal models we ship are text + one other modality: text + vision (the VLM family) or text + audio (the audio family) — not both in the same checkpoint. Send `.image(...)` parts only to a vision-capable model, and `.audio(...)` / `.fromFloatSamples(...)` parts only to an audio-capable model. Mixing modalities a model wasn't trained on will either fail to load the companion file or produce nonsense. Check the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up a non-text input path. +**Multimodality is model-specific.** Most multimodal models we ship are text + one other modality: text + vision (the VLM family) or text + audio (the audio family) — not both in the same checkpoint. Send image content (`fromJPEGData(_:)`, `image(url:)`, `fromBitmap(...)` / `fromUIImage(_:)`) only to a vision-capable model, and audio content (`fromWAVData(_:)`, `fromFloatSamples(_:sampleRate:)`) only to an audio-capable model. Mixing modalities a model wasn't trained on will either fail to load the companion file or produce nonsense. Check the model's [Hugging Face card](https://huggingface.co/LiquidAI) before wiring up a non-text input path. ```swift - // Text + image (vision-capable model) + // Text + image (vision-capable model). Use `ChatMessageContent.fromJPEGData(_:)` + // for raw JPEG bytes, or `.image(url:)` for a data URL / remote URL. 
let imageMessage = ChatMessage( role: .user, - content: [.text("Describe what you see."), .image(jpegData)] + content: [.text("Describe what you see."), ChatMessageContent.fromJPEGData(jpegData)], + reasoningContent: nil, + functionCalls: nil ) - // Text + WAV audio (audio-capable model) + // Text + WAV audio (audio-capable model). `fromWAVData` validates the header; + // use `.audio(data:format:)` if you already know the bytes are a supported format. let wavMessage = ChatMessage( role: .user, - content: [.text("Transcribe and summarize this clip."), .audio(wavData)] + content: [.text("Transcribe and summarize this clip."), ChatMessageContent.fromWAVData(wavData)], + reasoningContent: nil, + functionCalls: nil ) // Text + raw PCM samples (audio-capable model) @@ -525,14 +560,17 @@ If the loaded model is multimodal (and its companion files were detected), you c content: [ .text("Give feedback on my pronunciation."), ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000) - ] + ], + reasoningContent: nil, + functionCalls: nil ) ``` ```kotlin // Text + image (vision-capable model) - val imageMessage = ChatMessage.user( + val imageMessage = ChatMessage( + role = ChatMessage.Role.USER, content = listOf( ChatMessageContent.Text("Describe what you see."), ChatMessageContent.Image(jpegBytes) @@ -540,7 +578,8 @@ If the loaded model is multimodal (and its companion files were detected), you c ) // Text + WAV audio (audio-capable model) - val wavMessage = ChatMessage.user( + val wavMessage = ChatMessage( + role = ChatMessage.Role.USER, content = listOf( ChatMessageContent.Text("Transcribe and summarize this clip."), ChatMessageContent.Audio(wavBytes) @@ -548,7 +587,8 @@ If the loaded model is multimodal (and its companion files were detected), you c ) // Text + raw PCM samples (audio-capable model) - val pcmMessage = ChatMessage.user( + val pcmMessage = ChatMessage( + role = ChatMessage.Role.USER, content = listOf( ChatMessageContent.Text("Give feedback on my 
pronunciation."), ChatMessageContent.AudioPcmF32(samples, sampleRate = 16000) diff --git a/deployment/on-device/sdk/utilities.mdx b/deployment/on-device/sdk/utilities.mdx index 4766013a..e57ad7c1 100644 --- a/deployment/on-device/sdk/utilities.mdx +++ b/deployment/on-device/sdk/utilities.mdx @@ -9,14 +9,14 @@ This page covers error types, serialization helpers, and a few platform-specific - Errors surface as `LeapError` values. The most common cases: + Errors are subclasses of `LeapException` (`LeapError` is a type alias for `LeapException` provided for backward compatibility). The most common subclasses: - - **`LeapError.modelLoadingFailure`** — problems reading or validating the model bundle. - - **`LeapError.generationFailure`** — unexpected native inference errors. - - **`LeapError.promptExceedContextLengthFailure`** — prompt length exceeded the configured context size. - - **`LeapError.serializationFailure`** — JSON encoding/decoding problems on chat history or function calls. + - **`LeapModelLoadingException`** — problems reading or validating the model bundle. + - **`LeapGenerationException`** — unexpected native inference errors. + - **`LeapGenerationPromptExceedContextLengthException`** — prompt length exceeded the configured context size. + - **`LeapSerializationException`** — JSON encoding/decoding problems on chat history or function calls. - Handle thrown errors with `do` / `catch` on async streams, or use `onErrorCallback` on the lower-level callback APIs. + Handle thrown errors with `do` / `catch` on the async streams returned by `Conversation.generateResponse(...)`, or downcast with `if let err = error as? LeapModelLoadingException { ... }` to inspect a specific subclass. 
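As a rough Kotlin counterpart to the downcast pattern above, the sketch below dispatches on the exception subclass. The classes declared here are local stand-ins that only mirror the names listed on this page so the snippet compiles on its own; the real types come from the SDK, and the flat hierarchy shown is an assumption.

```kotlin
// Local stand-ins mirroring the SDK's exception names (assumed flat hierarchy).
// In a real app, import the SDK types instead of declaring these.
open class LeapException(message: String) : Exception(message)
class LeapModelLoadingException(message: String) : LeapException(message)
class LeapGenerationException(message: String) : LeapException(message)
class LeapGenerationPromptExceedContextLengthException(message: String) : LeapException(message)
class LeapSerializationException(message: String) : LeapException(message)

// Hypothetical helper: map an error to a user-facing hint.
// If the SDK nests these types, check the most specific subclass first.
fun describe(error: Throwable): String = when (error) {
    is LeapModelLoadingException -> "Model bundle could not be read or validated"
    is LeapGenerationPromptExceedContextLengthException -> "Prompt is longer than the context window"
    is LeapGenerationException -> "Inference failed unexpectedly"
    is LeapSerializationException -> "Chat history or function-call JSON was malformed"
    is LeapException -> "Unclassified SDK error"
    else -> "Non-SDK error"
}

fun main() {
    val failures = listOf(
        LeapModelLoadingException("bad header"),
        LeapGenerationPromptExceedContextLengthException("too long"),
        IllegalStateException("unrelated"),
    )
    failures.forEach { println(describe(it)) }
}
```

Keeping the catch-all `LeapException` branch last means new SDK subclasses still get a sensible fallback message.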
All errors are subclasses of `LeapException`: @@ -38,19 +38,21 @@ This page covers error types, serialization helpers, and a few platform-specific - Use the JSON initializers directly on `ChatMessage` and `ChatMessageContent`: + Use `Conversation.exportToJSON()` to get an OpenAI-shaped JSON string, then route restores back through Kotlin's serializer (there is no `ChatMessage(from: [String: Any])` initializer): ```swift - // Serialize the conversation history - let payload: [[String: Any]] = try conversation.exportToJSON() - let data = try JSONSerialization.data(withJSONObject: payload, options: []) - - // Round-trip a single message - let json: [String: Any] = ["role": "user", "content": "Hello"] - let message = try ChatMessage(from: json) + // Serialize the conversation history (compact JSON string, OpenAI chat-completions shape) + let jsonString: String = conversation.exportToJSON() + let data: Data = Data(jsonString.utf8) + + // Restore — `LeapJson.decodeFromString(...)` is the Kotlin-side decoder. + // For Swift-only round trips, persist the `jsonString` and pass it back to + // `modelRunner.createConversationFromHistory(history:)` after rebuilding the + // `[ChatMessage]` list via your shared Kotlin code, or use a server-side + // round-trip that talks to your sync backend. ``` - Persist `data` to disk, UserDefaults, or your sync backend. On restore, decode it back to `[[String: Any]]`, map each entry through `ChatMessage(from:)`, and rebuild via `modelRunner.createConversationFromHistory(history:)`. + Persist `data` to disk, UserDefaults, or your sync backend. On restore, decode the JSON via Kotlin's `LeapJson` (re-exported through SKIE) into a `[ChatMessage]` and rebuild via `modelRunner.createConversationFromHistory(history:)`. There is no Swift-native dictionary-based `ChatMessage` initializer — the Kotlin serializer is the source of truth on both platforms. 
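For the persistence step itself, a minimal Kotlin sketch (JVM, or Android API 26+ for `java.nio.file.Files`) might look like the following. `saveHistory` and `loadHistory` are hypothetical helpers, and the JSON literal is a placeholder rather than real `exportToJSON()` output; decoding the string back into messages is left to the SDK's serializer.

```kotlin
import java.io.File
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Hypothetical helper: write via a temp file, then move it into place so a
// crash mid-write does not leave a truncated history file behind.
fun saveHistory(target: File, json: String) {
    val tmp = File(target.parentFile, target.name + ".tmp")
    tmp.writeText(json, Charsets.UTF_8)
    Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING)
}

// Hypothetical helper: return the stored JSON, or null when nothing was saved yet.
fun loadHistory(target: File): String? =
    if (target.exists()) target.readText(Charsets.UTF_8) else null

fun main() {
    val dir = Files.createTempDirectory("leap-history").toFile()
    val file = File(dir, "conversation.json")
    // Placeholder payload standing in for the exported conversation string.
    val exported = """[{"role":"user","content":"Hello"}]"""
    saveHistory(file, exported)
    println(loadHistory(file) == exported)
}
```

Storing the string untouched keeps the serializer as the only component that interprets the message format.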
The SDK uses [kotlinx.serialization](https://github.com/Kotlin/kotlinx.serialization) — `@Serializable` is already declared on the relevant types in the core SDK. @@ -102,11 +104,11 @@ This page covers error types, serialization helpers, and a few platform-specific This section is **Android-only**. iOS / macOS callers use the Swift `ModelDownloader` (shipped in the `LeapModelDownloader` SPM product), which routes transfers through `URLSession` — see [Model Loading → Constructing the downloader](./model-loading#constructing-the-downloader) for background-session configuration. The cross-platform `LeapDownloader` (used directly on JVM, Linux native, Windows native) is a plain async fetcher with no platform background-service hooks. -Beyond the high-level `loadModel` / `loadSimpleModel` / `downloadModel` methods covered in [Model Loading](./model-loading), the Android `LeapModelDownloader` exposes a few lower-level methods for background staging, status polling, and service control. +Beyond the high-level `loadModel` / `loadSimpleModel` methods covered in [Model Loading](./model-loading), the Android `LeapModelDownloader` exposes a few lower-level methods for WorkManager background staging and status polling. ### Permission setup -The downloader runs as a [foreground service](https://developer.android.com/develop/background-work/services/fgs) and displays notifications. Declare these in your `AndroidManifest.xml`: +`requestDownloadModel(...)` enqueues a WorkManager download worker. During transfer, the worker runs in the foreground and displays notifications, so declare these in your `AndroidManifest.xml`: ```xml @@ -141,43 +143,61 @@ if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) { class LeapModelDownloader( private val context: Context, modelFileDir: File? 
= null, - private val extraHTTPRequestHeaders: Map = mapOf(), private val notificationConfig: LeapModelDownloaderNotificationConfig = LeapModelDownloaderNotificationConfig(), + private val downloaderConfig: LeapDownloaderConfig = LeapDownloaderConfig(), + private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO, ) { - fun requestDownloadModel(modelName: String, quantizationType: String, forceDownload: Boolean = false) - fun requestStopDownload(modelName: String, quantizationType: String) + suspend fun requestDownloadModel(modelName: String, quantizationType: String, forceDownload: Boolean = false) + suspend fun requestStopDownload(modelName: String, quantizationType: String) suspend fun queryStatus(modelName: String, quantizationType: String): ModelDownloadStatus - fun observeDownloadProgress(modelName: String, quantizationType: String): Flow + fun observeDownloadProgress(modelName: String, quantizationType: String): StateFlow<ModelDownloadProgress?> fun getModelResourceFolder(modelName: String, quantizationType: String): File - fun requestStopService() -} -sealed interface ModelDownloadStatus { - data object NotOnLocal : ModelDownloadStatus - data class DownloadInProgress( - val totalSizeInBytes: Long, - val downloadedSizeInBytes: Long, - ) : ModelDownloadStatus - data class Downloaded(val totalSizeInBytes: Long) : ModelDownloadStatus + @Deprecated("No longer needed with WorkManager - downloads are managed automatically") + suspend fun requestStopService() + + // `ModelDownloadStatus` is nested under `LeapModelDownloader`.
+ sealed interface ModelDownloadStatus { + data object NotOnLocal : ModelDownloadStatus + data class DownloadInProgress( + val totalSizeInBytes: Long, + val downloadedSizeInBytes: Long, + ) : ModelDownloadStatus + data class Downloaded(val totalSizeInBytes: Long) : ModelDownloadStatus + } + + class ModelDownloadProgress { + var totalSizeInBytes: Long + var downloadedSizeInBytes: Long + val progress: Double + } } ``` -- **`requestDownloadModel`** — fire-and-forget download via WorkManager. Returns immediately; the download survives app restarts. -- **`requestStopDownload`** — cancel an in-flight background download. -- **`queryStatus`** — one-shot status check. -- **`observeDownloadProgress`** — `Flow` for UI updates during a background download. +Refer to the nested status type as `LeapModelDownloader.ModelDownloadStatus.NotOnLocal` / `.DownloadInProgress` / `.Downloaded` on Android — the Android downloader does not expose a top-level `ai.liquid.leap.downloader.ModelDownloadStatus`. (Apple ships a top-level `ai.liquid.leap.downloader.ModelDownloadStatus` `sealed interface` with a different payload — `DownloadInProgress(progress: Double)` and a `data object Downloaded` with no size — so don't share status-decoding code unmodified across platforms.) + +- **`requestDownloadModel`** — `suspend` fire-and-forget prefetch. It enqueues a unique WorkManager download worker; the download itself survives app restarts, and the call returns after staging the work request. +- **`requestStopDownload`** — `suspend`; cancels an in-flight background download. +- **`queryStatus`** — `suspend` one-shot status check. +- **`observeDownloadProgress`** — `StateFlow` for UI updates during a background download. It emits `null` when no download is active. - **`getModelResourceFolder`** — the directory the SDK will use for this model+quantization on disk. -- **`requestStopService`** — gracefully stop the foreground service (it auto-stops when no work is queued, but you can force it). 
+- **`requestStopService`** — `@Deprecated` no-op since v0.10.6 (WorkManager handles the worker lifecycle automatically). Kept for source compatibility; new code shouldn't call it.
 ### Removing a downloaded model
-Use the cross-platform `LeapDownloader.deleteModelResources(...)` to clean up disk:
+Use the Android downloader's resource folder to clean up disk, or construct a cross-platform `LeapDownloader` with the same `saveDir` and call its instance method `deleteModelResources(...)`:
 ```kotlin
-LeapDownloader.deleteModelResources(
+val resourceFolder = downloader.getModelResourceFolder(
+    modelName = "LFM2-1.2B",
+    quantizationType = "Q5_K_M",
+)
+resourceFolder.deleteRecursively()
+
+// Equivalent when you know the saveDir:
+LeapDownloader(LeapDownloaderConfig(saveDir = resourceFolder.parentFile!!.absolutePath)).deleteModelResources(
     modelName = "LFM2-1.2B",
     quantizationType = "Q5_K_M",
-    baseDir = baseDir, // same dir LeapModelDownloader / LeapDownloader was configured with
 )
 ```
@@ -198,14 +218,17 @@ A minimal end-to-end snippet exercising load → conversation → tool registrat
     )
     let conversation = runner.createConversation(systemPrompt: "You are a travel assistant.")
-    conversation.registerFunction(weatherFunction)
+    conversation.registerFunction(function: weatherFunction)
-    var options = GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
-    try options.setResponseFormat(type: TripRecommendation.self)
+    let options = GenerationOptions()
+        .with(temperature: 0.3)
+        .with(minP: 0.15)
+        .with(repetitionPenalty: 1.05)
+        .with(jsonSchema: TripRecommendation.jsonSchema())
     let userMessage = ChatMessage(
         role: .user,
-        content: [.text("Plan a 3-day trip to Kyoto with food highlights")]
+        textContent: "Plan a 3-day trip to Kyoto with food highlights"
     )
     for try await response in conversation.generateResponse(
@@ -218,7 +241,11 @@ A minimal end-to-end snippet exercising load → conversation → tool registrat
     ```kotlin
-    val downloader =
LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir))
+    // `LeapDownloaderConfig.saveDir` is a `String` (filesystem path) — on Android,
+    // pass `cacheDir.absolutePath`, not the `File` itself. On Android, prefer
+    // `LeapModelDownloader(application)` (the cross-platform `LeapDownloader` works
+    // too, but doesn't integrate with WorkManager).
+    val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = cacheDir.absolutePath))
     val runner = downloader.loadModel(
         modelName = "LFM2.5-1.2B-Instruct",
         quantizationType = "Q4_K_M"
@@ -231,10 +258,10 @@ A minimal end-to-end snippet exercising load → conversation → tool registrat
         temperature = 0.3f
         minP = 0.15f
         repetitionPenalty = 1.05f
-        setResponseFormatType(TripRecommendation::class)
+        setResponseFormatType()
     }
-    val userMessage = ChatMessage.user("Plan a 3-day trip to Kyoto with food highlights")
+    val userMessage = ChatMessage(ChatMessage.Role.USER, "Plan a 3-day trip to Kyoto with food highlights")
     conversation.generateResponse(userMessage, options).onEach(::process).collect()
     ```
diff --git a/deployment/on-device/sdk/voice-assistant.mdx b/deployment/on-device/sdk/voice-assistant.mdx
index 1784dd0c..14f18848 100644
--- a/deployment/on-device/sdk/voice-assistant.mdx
+++ b/deployment/on-device/sdk/voice-assistant.mdx
@@ -11,7 +11,7 @@ The `leap-ui` module (introduced in v0.10.0) ships a ready-to-use voice assistan
 - **macOS** — bridged to AppKit via `VoiceAssistantNSViewController`. SwiftUI hosts via `NSViewControllerRepresentable` + `NSHostingController`.
 - **Android** — direct Compose for Android.
 - **JVM Desktop** — Compose for Desktop. Same Maven artifact; you provide audio I/O implementations (the demo apps in `leap-ui-demo/` ship patterns you can adapt).
-- **Web (Wasm, experimental)** — present in the source tree (`leap-ui-demo/web`) but not yet covered by the v0.10.6 stable release notes — treat as preview.
+- **Web (Wasm, experimental)** — present in the source tree (`leap-ui-demo/web`) but not yet covered by the stable release notes through v0.10.7 — treat as preview.
 ## Add the dependency
@@ -21,7 +21,7 @@ The `leap-ui` module (introduced in v0.10.0) ships a ready-to-use voice assistan
     ```swift
     dependencies: [
-        .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.6")
+        .package(url: "https://github.com/Liquid4All/leap-sdk.git", from: "0.10.7")
     ]
     targets: [
@@ -46,12 +46,12 @@ The `leap-ui` module (introduced in v0.10.0) ships a ready-to-use voice assistan
     ```kotlin
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
-        implementation("ai.liquid.leap:leap-ui:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-ui:0.10.7")
     }
     ```
-    `leap-ui` brings in Compose runtime, foundation, and material3 transitively. If your project doesn't already use Compose, add the standard Compose dependencies too.
+    `leap-ui` depends on Compose runtime, foundation, and material3 internally (with `implementation` scope), so the runtime artifacts are pulled in but their APIs are not re-exported to consumer source. If your project uses Compose directly, declare the same Compose dependencies in your own module.
@@ -103,9 +103,11 @@ The `VoiceConversation` adapter looks similar on every platform — both impleme
         modelName: "LFM2.5-Audio-1.5B",
         quantizationType: "Q4_0",
         downloadProgress: { fraction, _ in
+            // `fraction` is `Double` from the Kotlin (Double, Long) -> Unit
+            // closure; `setModelProgress.fraction` is `Float`, so cast.
             Task { @MainActor in
                 self.store.setModelProgress(
-                    fraction: fraction,
+                    fraction: Float(fraction),
                     message: "Downloading (\(Int(fraction * 100))%)"
                 )
             }
@@ -136,7 +138,7 @@ The `VoiceConversation` adapter looks similar on every platform — both impleme
     ```kotlin
-    import ai.liquid.leap.model_downloader.LeapModelDownloader
+    import ai.liquid.leap.downloader.LeapModelDownloader
     import ai.liquid.leap.ui.VoiceAssistantIntent
     import ai.liquid.leap.ui.VoiceAssistantStore
     import ai.liquid.leap.ui.VoiceAssistantStoreState
@@ -258,6 +260,7 @@ The `VoiceConversation` adapter looks similar on every platform — both impleme
     ```kotlin
     import ai.liquid.leap.ui.VoiceAssistantWidget
+    import android.os.Bundle
     import androidx.activity.ComponentActivity
     import androidx.activity.compose.setContent
     import androidx.compose.foundation.background
@@ -299,8 +302,11 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
+    The `VoiceConversation` protocol comes from `LeapUI`, so its `audioSamples` and `onAudioChunk` parameters use `LeapUi.KotlinFloatArray` / `LeapUi.KotlinInt` — not native Swift `[Float]` / `Int32`. The on-device runner lives in `LeapSDK`, which has its own `LeapSDK.KotlinFloatArray`. Bridge between the two via the `floatArrayToNSData` / `nsDataToFloatArray` helpers exposed in both frameworks (see `leap-ui-demo/shared/AppleVoiceConversation.swift` for the canonical pattern).
+
     ```swift
     import LeapModelDownloader
+    import LeapSDK
     import LeapUi
     final class AppleVoiceConversation: VoiceConversation {
@@ -310,21 +316,34 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
             self.conversation = conversation
         }
+        // Note: this method is `__generateResponse` in the SKIE-generated overlay
+        // because `LeapUI` and `LeapSDK` are separate frameworks with separate Kotlin
+        // runtimes. The runtime-types-as-parameters force the underscore prefix.
         func generateResponse(
-            audioSamples: [Float],
+            audioSamples: LeapUi.KotlinFloatArray,
             sampleRate: Int32,
-            onAudioChunk: @escaping (_ samples: [Float], _ sampleRate: Int32) -> Void
-        ) async throws -> GenerationStats? {
+            onAudioChunk: @escaping (LeapUi.KotlinFloatArray, LeapUi.KotlinInt) -> Void
+        ) async throws -> Leap_sdkGenerationStats? {
+            // LeapUi.KotlinFloatArray -> Swift [Float] (for use inside this method body):
+            let nsData = LeapUi.ArrayConversionsKt.floatArrayToNSData(array: audioSamples)
+            let samples: [Float] = nsData.withUnsafeBytes { Array($0.bindMemory(to: Float.self)) }
+
+            let audioContent = ChatMessageContent.fromFloatSamples(samples, sampleRate: Int(sampleRate))
             let userMessage = ChatMessage(
                 role: .user,
-                content: [ChatMessageContent.fromFloatSamples(audioSamples, sampleRate: Int(sampleRate))]
+                content: [audioContent as ChatMessageContent],
+                reasoningContent: nil,
+                functionCalls: nil
             )
-            var stats: GenerationStats?
+            var stats: Leap_sdkGenerationStats?
             for try await response in conversation.generateResponse(message: userMessage) {
                 switch onEnum(of: response) {
                 case .audioSample(let chunk):
-                    onAudioChunk(chunk.samples, Int32(chunk.sampleRate))
+                    // Bridge LeapSDK.KotlinFloatArray -> LeapUi.KotlinFloatArray via NSData.
+                    let data = LeapSDK.ArrayConversionsKt.floatArrayToNSData(array: chunk.samples)
+                    let uiSamples = LeapUi.ArrayConversionsKt.nsDataToFloatArray(data: data)
+                    onAudioChunk(uiSamples, LeapUi.KotlinInt(value: chunk.sampleRate))
                 case .complete(let c):
                     stats = c.stats
                 case .chunk, .reasoningChunk, .functionCalls:
@@ -335,7 +354,9 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
         }
         func reset() -> VoiceConversation {
-            AppleVoiceConversation(conversation: conversation.modelRunner.createConversation())
+            AppleVoiceConversation(
+                conversation: conversation.modelRunner.createConversation(systemPrompt: nil)
+            )
         }
     }
     ```
@@ -343,11 +364,11 @@ The store calls into a `VoiceConversation` you provide.
A minimal adapter that w
     ```kotlin
     import ai.liquid.leap.Conversation
-    import ai.liquid.leap.MessageResponse
+    import ai.liquid.leap.audio.FloatAudioBuffer
     import ai.liquid.leap.message.ChatMessage
     import ai.liquid.leap.message.ChatMessageContent
     import ai.liquid.leap.message.GenerationStats
-    import ai.liquid.leap.message.encodePcm16Wav
+    import ai.liquid.leap.message.MessageResponse
     import ai.liquid.leap.ui.VoiceConversation
     class LeapVoiceConversation(private val conv: Conversation) : VoiceConversation {
@@ -357,10 +378,10 @@ The store calls into a `VoiceConversation` you provide. A minimal adapter that w
             sampleRate: Int,
             onAudioChunk: (samples: FloatArray, sampleRate: Int) -> Unit,
         ): GenerationStats? {
-            val wavBytes = encodePcm16Wav(audioSamples, sampleRate)
+            // Send raw float32 PCM directly — no WAV re-encode needed.
             val userMessage = ChatMessage(
                 role = ChatMessage.Role.USER,
-                content = listOf(ChatMessageContent.Audio(wavBytes)),
+                content = listOf(ChatMessageContent.AudioPcmF32(audioSamples, sampleRate)),
             )
             var stats: GenerationStats?
= null
diff --git a/examples/android/leap-koog-agent.mdx b/examples/android/leap-koog-agent.mdx
index 6bf77ad0..4049c64d 100644
--- a/examples/android/leap-koog-agent.mdx
+++ b/examples/android/leap-koog-agent.mdx
@@ -71,9 +71,9 @@ Before running this example, ensure you have the following:
     This example requires:
-    - **Minimum SDK**: API 24 (Android 7.0)
-    - **Target SDK**: API 34 or higher
-    - **Kotlin**: 1.9.0 or higher
+    - **Minimum SDK**: API 31 (Android 12)
+    - **Target SDK**: API 36
+    - **Kotlin**: 2.3.0 or higher
     **Hardware recommendations:**
     - At least 4GB RAM (agents require more memory for reasoning)
@@ -87,17 +87,17 @@ Before running this example, ensure you have the following:
     # Ensure device is connected
     adb devices
-    # Create directory
-    adb shell mkdir -p /tmp/models
+    # Create directory (world-readable so the app can read it)
+    adb shell mkdir -p /data/local/tmp/liquid/
     # Push the GGUF model file
-    adb push lfm2-1.2b-q5_k_m.gguf /tmp/models/
+    adb push lfm2-1.2b-q5_k_m.gguf /data/local/tmp/liquid/
     # Verify deployment
-    adb shell ls -lh /tmp/models/
+    adb shell ls -lh /data/local/tmp/liquid/
     ```
-    **Note:** The path `/tmp/models` is used in this example. If you deploy to a different location, update the `modelPath` in your app code accordingly. The example snippets below use `loadSimpleModel(model: ModelSource(...))` to load the sideloaded file; switch to `loadModel(modelName:, quantizationType:)` if you'd rather have the SDK download the model automatically.
+    **Note:** Apps cannot read `/tmp/` on Android — use `/data/local/tmp//` for ADB-pushed assets (matches the other Android examples). If you deploy to a different location, update the `modelPath` in your app code accordingly. The example snippets below use `loadSimpleModel(model: ModelSource(...))` to load the sideloaded file; switch to `loadModel(modelName:, quantizationType:)` if you'd rather have the SDK download the model automatically.
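Since the note above depends on the app process being able to read the pushed file, a quick pre-flight check can make a failed `adb push` easier to diagnose. The following is a minimal sketch using only `java.io.File`; the helper name `describeModelFileProblem` is hypothetical and not part of the LEAP SDK, and the path is the one used in the commands above.

```kotlin
import java.io.File

// Hypothetical helper (not an SDK API): returns a human-readable description
// of what is wrong with the sideloaded file, or null when it looks usable.
fun describeModelFileProblem(path: String): String? {
    val file = File(path)
    return when {
        !file.exists() -> "No file at $path; re-run the adb push step"
        !file.isFile -> "$path is not a regular file, expected a .gguf file"
        !file.canRead() -> "$path exists but is not readable by this process"
        file.length() == 0L -> "$path is empty; the push may have been interrupted"
        else -> null
    }
}

fun main() {
    // Path taken from the adb commands above; adjust if you deploy elsewhere.
    val modelPath = "/data/local/tmp/liquid/lfm2-1.2b-q5_k_m.gguf"
    val problem = describeModelFileProblem(modelPath)
    println(problem ?: "Model file looks readable: $modelPath")
}
```

Running a check like this before handing the path to the loader turns an opaque load failure into a specific message. On some devices a missing file and an unreadable parent directory may both surface as "no file", so treat the messages as hints rather than a definitive diagnosis.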
@@ -106,8 +106,8 @@ Before running this example, ensure you have the following:
     ```kotlin
     dependencies {
         // LeapSDK for on-device AI (0.10.0+)
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
-        implementation("ai.liquid.leap:leap-model-downloader:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
         // Koog framework for AI agents
         implementation("ai.koog:koog-agents:0.5.0")
@@ -139,9 +139,9 @@ Follow these steps to build and run AI agents on Android:
    cd LeapSDK-Examples/Android/LeapKoogAgent
    ```
-2. **Deploy the model bundle**
+2. **Deploy the model**
    - Follow the ADB commands in the setup section above
-   - Ensure the bundle is at `/tmp/models/lfm2-1.2b-tool.bundle`
+   - Ensure the GGUF is at `/data/local/tmp/liquid/lfm2-1.2b-q5_k_m.gguf`
 3. **Open in Android Studio**
    - Launch Android Studio
@@ -176,7 +176,7 @@ Load the LEAP model first, then bridge it to a Koog agent. The Koog APIs below a
 ```kotlin
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.manifest.ModelSource
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 class AgentViewModel(application: Application) : AndroidViewModel(application) {
     private val downloader = LeapModelDownloader(application)
@@ -188,7 +188,7 @@ class AgentViewModel(application: Application) : AndroidViewModel(application) {
             // Sideloaded GGUF that was pushed via ADB (see Model Setup).
             runner = downloader.loadSimpleModel(
                 model = ModelSource(
-                    modelPath = "/tmp/models/lfm2-1.2b-tool.gguf",
+                    modelPath = "/data/local/tmp/liquid/lfm2-1.2b-q5_k_m.gguf",
                     modelName = "LFM2-1.2B",
                     quantizationId = "Q5_K_M",
                 ),
diff --git a/examples/android/recipe-generator-constrained-output.mdx b/examples/android/recipe-generator-constrained-output.mdx
index 20272ccf..67efce03 100644
--- a/examples/android/recipe-generator-constrained-output.mdx
+++ b/examples/android/recipe-generator-constrained-output.mdx
@@ -63,9 +63,9 @@ Before running this example, ensure you have the following:
     This example requires:
-    - **Minimum SDK**: API 24 (Android 7.0)
-    - **Target SDK**: API 34 or higher
-    - **Kotlin**: 1.9.0 or higher
+    - **Minimum SDK**: API 31 (Android 12)
+    - **Target SDK**: API 36
+    - **Kotlin**: 2.3.0 or higher
     - **LeapSDK**: 0.10.0 or higher
     - **Internet connectivity**: Required for first-time model download
@@ -105,8 +105,8 @@ Before running this example, ensure you have the following:
     ```kotlin
     dependencies {
         // LeapSDK + the Android downloader module
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
-        implementation("ai.liquid.leap:leap-model-downloader:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
         // Kotlin serialization for type-safe parsing
         implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.0")
@@ -148,7 +148,7 @@ Follow these steps to generate structured recipes:
 3. **Gradle sync**
    - Wait for Gradle to sync all dependencies
-   - Ensure LeapSDK 0.10.6 is downloaded
+   - Ensure LeapSDK 0.10.7 is downloaded
 4. **Run the app**
    - Connect your Android device or start an emulator
@@ -207,13 +207,12 @@ data class Ingredient(
 Annotate the data class with `@Generatable`. LeapSDK derives the JSON schema from the Kotlin types and enforces it during generation — no hand-written schema string required.
 ```kotlin
-import ai.liquid.leap.Generatable
-import ai.liquid.leap.Guide
+import ai.liquid.leap.structuredoutput.Generatable
+import ai.liquid.leap.structuredoutput.Guide
 import kotlinx.serialization.Serializable
-@Generatable
 @Serializable
-@Guide("A complete recipe with metadata, ingredients, and instructions.")
+@Generatable("A complete recipe with metadata, ingredients, and instructions.")
 data class Recipe(
     val name: String,
     val description: String,
@@ -227,8 +226,8 @@ data class Recipe(
     val tags: List,
 )
-@Generatable
 @Serializable
+@Generatable("A single recipe ingredient with amount and unit.")
 data class Ingredient(
     val item: String,
     val amount: String,
@@ -243,7 +242,7 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import android.app.Application
 import androidx.lifecycle.AndroidViewModel
 import androidx.lifecycle.viewModelScope
@@ -283,7 +282,7 @@ class MainActivityViewModel(application: Application) : AndroidViewModel(applica
 ### Generate Structured Recipes
-`GenerationOptions.build { setResponseFormatType(Recipe::class) }` tells the engine to constrain the stream to the schema derived from `@Generatable`. The streamed `Chunk` values arrive as JSON; concatenate them and decode at the end with `kotlinx-serialization`.
+`GenerationOptions.build { setResponseFormatType() }` tells the engine to constrain the stream to the schema derived from `@Generatable`. The streamed `Chunk` values arrive as JSON; concatenate them and decode at the end with `kotlinx-serialization`.
 ```kotlin
 fun generateRecipe(userInput: String) {
@@ -304,12 +303,12 @@ fun generateRecipe(userInput: String) {
             temperature = 0.3f
             minP = 0.15f
             repetitionPenalty = 1.05f
-            setResponseFormatType(Recipe::class)
+            setResponseFormatType()
         }
         try {
             val buffer = StringBuilder()
-            conversation.generateResponse(ChatMessage.user(prompt), options)
+            conversation.generateResponse(ChatMessage(ChatMessage.Role.USER, prompt), options)
                 .onEach { resp ->
                     if (resp is MessageResponse.Chunk) buffer.append(resp.text)
                 }
diff --git a/examples/android/slogan-generator.mdx b/examples/android/slogan-generator.mdx
index 177450db..0bc295da 100644
--- a/examples/android/slogan-generator.mdx
+++ b/examples/android/slogan-generator.mdx
@@ -37,9 +37,9 @@ Before running this example, ensure you have the following:
     This example requires:
-    - **Minimum SDK**: API 24 (Android 7.0)
-    - **Target SDK**: API 34 or higher
-    - **Kotlin**: 1.9.0 or higher
+    - **Minimum SDK**: API 31 (Android 12)
+    - **Target SDK**: API 36
+    - **Kotlin**: 2.3.0 or higher
@@ -47,7 +47,8 @@ Before running this example, ensure you have the following:
     ```kotlin
     dependencies {
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
         // Android UI components
         implementation("androidx.appcompat:appcompat:1.6.1")
@@ -147,7 +148,7 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import kotlinx.coroutines.MainScope
 import kotlinx.coroutines.flow.collect
 import kotlinx.coroutines.flow.onEach
@@ -191,7 +192,7 @@ class MainActivity : AppCompatActivity() {
                 minP = 0.15f
                 repetitionPenalty = 1.05f
             }
-            conversation.generateResponse(ChatMessage.user(prompt), options)
+
conversation.generateResponse(ChatMessage(ChatMessage.Role.USER, prompt), options)
                 .onEach { resp ->
                     if (resp is MessageResponse.Chunk) {
                         sloganOutput.append(resp.text)
diff --git a/examples/android/vision-language-model-example.mdx b/examples/android/vision-language-model-example.mdx
index 85656f4a..3b8f7586 100644
--- a/examples/android/vision-language-model-example.mdx
+++ b/examples/android/vision-language-model-example.mdx
@@ -22,7 +22,7 @@ The VLMExample showcases cutting-edge multimodal AI capabilities:
 - **On-device Inference** - Complete privacy with local VLM processing
 - **Interactive Q&A** - Ask questions about images and get contextual answers
-This example demonstrates the **LFM2-VL-1.6B** model, a vision-language model that can understand and reason about visual content.
+This example demonstrates the **LFM2.5-VL-1.6B** model, a vision-language model that can understand and reason about visual content.
 ## What are Vision Language Models?
@@ -59,9 +59,9 @@ Before running this example, ensure you have the following:
     This example requires:
-    - **Minimum SDK**: API 24 (Android 7.0)
-    - **Target SDK**: API 34 or higher
-    - **Kotlin**: 1.9.0 or higher
+    - **Minimum SDK**: API 31 (Android 12)
+    - **Target SDK**: API 36
+    - **Kotlin**: 2.3.0 or higher
     **Hardware recommendations:**
     - At least 4GB RAM (6GB+ recommended for better performance)
@@ -69,7 +69,7 @@ Before running this example, ensure you have the following:
-    This example requires the **LFM2-VL-1.6B** vision language model bundle.
+    This example requires the **LFM2.5-VL-1.6B** vision language model bundle.
     **Step 1: Obtain the model bundle**
@@ -92,8 +92,8 @@ Before running this example, ensure you have the following:
     ```kotlin
     dependencies {
         // LeapSDK for VLM processing (0.10.0+)
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
-        implementation("ai.liquid.leap:leap-model-downloader:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
         // Coil for image loading
         implementation("io.coil-kt:coil-compose:2.5.0")
@@ -196,8 +196,9 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.ChatMessageContent
+import ai.liquid.leap.message.ImageUtils
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import android.app.Application
 import androidx.lifecycle.AndroidViewModel
 import androidx.lifecycle.viewModelScope
@@ -205,7 +206,6 @@ import kotlinx.coroutines.CoroutineScope
 import kotlinx.coroutines.Dispatchers
 import kotlinx.coroutines.flow.onEach
 import kotlinx.coroutines.launch
-import java.io.ByteArrayOutputStream
 class VLMViewModel(application: Application) : AndroidViewModel(application) {
     private val downloader = LeapModelDownloader(application)
@@ -225,16 +225,16 @@ class VLMViewModel(application: Application) : AndroidViewModel(application) {
         val runner = runner ?: return
         viewModelScope.launch(Dispatchers.Default) {
             val bitmap = loadBitmapFromUri(imageUri)
-            val pngBytes = ByteArrayOutputStream().use { out ->
-                bitmap.compress(Bitmap.CompressFormat.PNG, 100, out)
-                out.toByteArray()
-            }
+            // ChatMessageContent.Image expects JPEG bytes — the secondary ctor wraps them in a
+            // `data:image/jpeg;base64,...` URL. Use the SDK's ImageUtils helper rather than
+            // re-encoding by hand.
+            val imageContent = ImageUtils.fromBitmap(bitmap, compressionQuality = 85)
             val conversation = runner.createConversation()
             val message = ChatMessage(
                 role = ChatMessage.Role.USER,
                 content = listOf(
-                    ChatMessageContent.Image(pngBytes),
+                    imageContent,
                     ChatMessageContent.Text("Describe this image in detail."),
                 ),
             )
@@ -336,41 +336,67 @@ fun ImageAnalysisDisplay(analysis: ImageAnalysis) {
 ### Interactive Q&A Mode
-Allow users to ask questions about images:
+Reuse the streaming pipeline above but parameterize the question. The image is encoded via `ImageUtils.fromBitmap(...)` (suspend, JPEG-encodes internally) and combined with the user's question into a single `ChatMessage`:
 ```kotlin
-fun askQuestionAboutImage(bitmap: Bitmap, question: String): String {
-    return vlmModel.generateFromImage(
-        image = bitmap,
-        prompt = "Answer this question about the image: $question",
-        maxTokens = 150
+suspend fun askQuestionAboutImage(
+    runner: ModelRunner,
+    bitmap: Bitmap,
+    question: String,
+    options: GenerationOptions,
+): String {
+    val conversation = runner.createConversation()
+    val message = ChatMessage(
+        role = ChatMessage.Role.USER,
+        content = listOf(
+            ImageUtils.fromBitmap(bitmap, compressionQuality = 85),
+            ChatMessageContent.Text("Answer this question about the image: $question"),
+        ),
     )
+
+    val builder = StringBuilder()
+    conversation.generateResponse(message, options).collect { response ->
+        if (response is MessageResponse.Chunk) builder.append(response.text)
+    }
+    return builder.toString()
 }
-// Example usage
-val answer1 = askQuestionAboutImage(bitmap, "What is the main object in this image?")
-val answer2 = askQuestionAboutImage(bitmap, "What colors are prominent?")
-val answer3 = askQuestionAboutImage(bitmap, "Is this indoors or outdoors?")
+// Example usage (inside a coroutine):
+// val answer = askQuestionAboutImage(runner, bitmap, "What colors are prominent?", options)
 ```
 ### Memory Management
-Vision models require more memory.
Implement proper lifecycle handling:
+Vision models require more memory. Free the runner when the activity goes to the background by calling `ModelRunner.unload()`:
 ```kotlin
+class VLMViewModel(application: Application) : AndroidViewModel(application) {
+    private var runner: ModelRunner? = null
+
+    suspend fun releaseModel() {
+        runner?.unload()
+        runner = null
+    }
+
+    suspend fun initializeModel() {
+        if (runner != null) return // already loaded — don't re-download
+        // ...same loadModel(...) path as above; assign to runner
+    }
+}
+
 override fun onStop() {
     super.onStop()
-    // Release model when app goes to background to free memory
-    viewModel.releaseModel()
+    lifecycleScope.launch { viewModel.releaseModel() }
 }
 override fun onStart() {
     super.onStart()
-    // Reload model when app returns to foreground
-    viewModel.initializeModel()
+    lifecycleScope.launch { viewModel.initializeModel() }
 }
 ```
+`ModelRunner.unload()` is `suspend` (per `ai.liquid.leap.ModelRunner`), so call it from a coroutine scope.
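Because `onStop()` and `onStart()` each launch their own coroutine, a release and a reload could in principle overlap when the activity is backgrounded and resumed quickly. One way to rule that out is to serialize both operations behind a `Mutex`. The sketch below is an illustration under stated assumptions, not the SDK's own mechanism: `UnloadableRunner` is a stand-in that models only the `suspend fun unload()` mentioned above, `RunnerHolder` is a hypothetical class, and the code assumes `kotlinx.coroutines` is on the classpath.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Stand-in for the SDK runner; only the suspend unload() is modeled here.
interface UnloadableRunner {
    suspend fun unload()
}

// Hypothetical holder: the mutex ensures a load and a release never interleave.
class RunnerHolder(private val load: suspend () -> UnloadableRunner) {
    private val mutex = Mutex()
    private var runner: UnloadableRunner? = null

    // Returns the cached runner, loading it only when none is held.
    suspend fun acquire(): UnloadableRunner = mutex.withLock {
        runner ?: load().also { runner = it }
    }

    // Unloads the runner if one is held; calling it again is a no-op.
    suspend fun release() {
        mutex.withLock {
            runner?.unload()
            runner = null
        }
    }
}

fun main() = runBlocking {
    var loads = 0
    var unloads = 0
    val holder = RunnerHolder {
        loads++
        object : UnloadableRunner {
            override suspend fun unload() { unloads++ }
        }
    }
    holder.acquire()
    holder.acquire() // reuses the runner loaded by the first call
    holder.release()
    holder.release() // nothing held any more, so unload() is not called again
    println("loads=$loads unloads=$unloads") // prints loads=1 unloads=1
}
```

In an app, `acquire()` and `release()` would be called from the same `lifecycleScope.launch { ... }` blocks shown above. The `Mutex` from `kotlinx.coroutines` is not reentrant, so the `load` lambda must not call back into the holder.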
+
 ## Results
 The VLMExample demonstrates powerful image understanding capabilities:
diff --git a/examples/android/web-content-summarizer.mdx b/examples/android/web-content-summarizer.mdx
index c5dd6d2a..ab85ae3f 100644
--- a/examples/android/web-content-summarizer.mdx
+++ b/examples/android/web-content-summarizer.mdx
@@ -50,9 +50,9 @@ Before running this example, ensure you have the following:
     This example requires:
-    - **Minimum SDK**: API 24 (Android 7.0)
-    - **Target SDK**: API 34 or higher
-    - **Kotlin**: 1.9.0 or higher
+    - **Minimum SDK**: API 31 (Android 12)
+    - **Target SDK**: API 36
+    - **Kotlin**: 2.3.0 or higher
@@ -61,7 +61,8 @@ Before running this example, ensure you have the following:
     ```kotlin
     dependencies {
         // LeapSDK for AI processing (0.10.0+)
-        implementation("ai.liquid.leap:leap-sdk:0.10.6")
+        implementation("ai.liquid.leap:leap-sdk:0.10.7")
+        implementation("ai.liquid.leap:leap-model-downloader:0.10.7")
         // Networking for web scraping
         implementation("com.squareup.okhttp3:okhttp:4.12.0")
@@ -212,7 +213,7 @@ import ai.liquid.leap.GenerationOptions
 import ai.liquid.leap.ModelRunner
 import ai.liquid.leap.message.ChatMessage
 import ai.liquid.leap.message.MessageResponse
-import ai.liquid.leap.model_downloader.LeapModelDownloader
+import ai.liquid.leap.downloader.LeapModelDownloader
 import kotlinx.coroutines.flow.onEach
 // Cache the runner on a ViewModel or singleton so the model loads once.
@@ -238,7 +239,7 @@ suspend fun summarizeContent(
     }
     val out = StringBuilder()
-    conversation.generateResponse(ChatMessage.user(prompt), options)
+    conversation.generateResponse(ChatMessage(ChatMessage.Role.USER, prompt), options)
         .onEach { resp ->
             if (resp is MessageResponse.Chunk) out.append(resp.text)
         }
diff --git a/leap/edge-sdk/overview.mdx b/leap/edge-sdk/overview.mdx
deleted file mode 100644
index e0a43df5..00000000
--- a/leap/edge-sdk/overview.mdx
+++ /dev/null
@@ -1,38 +0,0 @@
----
-title: "Overview"
-description: "The LEAP Edge SDK is a native framework for running LFMs (and other open source models) on mobile devices."
----
-
-## Improving access[​](#improving-access "Direct link to Improving access")
-
-Up until now, deploying small language models (SLMs) on mobile devices has been an extremely painful process, generally accessible to only inference engineers or AI/ML programmers.
-
-Written for Android (Kotlin) and iOS (Swift), the goal of the Edge SDK is to make SLM deployment as easy as calling a cloud LLM API endpoint - for any app developer.
-
-## Get started[​](#get-started "Direct link to Get started")
-
-Choose your platform to get started
-
-
-
-    Get started with the LEAP Edge SDK for iOS using Swift. Deploy models directly in your iOS app.
-
-
-
-    Get started with the LEAP Edge SDK for Android using Kotlin. Deploy models directly in your Android app.
-
-
-
-## Features[​](#features "Direct link to Features")
-
-The current list of main features includes:
-
-* Model downloading service
-* Chat completion (generation)
-* Constrained generation
-* Function calling
-* Gson support (Android)
-* Image support (for LFM2-VL)
-
-We are consistently adding to this list - see our [changelog](/leap/changelog) for detailed updates.
-