recipe: llama-cpp-python 0.3.32 by ndonkoHenri · Pull Request #91 · flet-dev/mobile-forge

ndonkoHenri · 2026-07-01T10:37:58Z

Closes flet-dev/flet#6627 — run local GGUF LLMs on-device with Flet.

llama-cpp-python is a scikit-build-core / CMake package that vendors the full llama.cpp engine; its Python layer is a pure-ctypes binding that loads the bundled libllama and libggml* shared libs, which makes it the duckdb archetype crossed with a pyzbar-style loader. No separate flet-lib* recipe is needed.

Recipe shape

- CPU-only baseline: all GPU backends (GGML_METAL/CUDA/VULKAN/OPENCL/HIP/RPC), BLAS, Accelerate, OpenMP and LLAMAFILE are off; `GGML_NATIVE=OFF`; `LLAVA_BUILD=OFF` (the multimodal mtmd surface is imported lazily, so text inference never needs it).
- Android links libc++_shared (flet-libcpp-shared) and gets the 16 KB page-size flags; iOS uses the Unix Makefiles generator.
- `-DCMAKE_INSTALL_LIBDIR=llama_cpp/lib` merges llama.cpp's standard install into the package dir (and drops a duplicate top-level `lib/`).
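The configuration above can be sketched as the environment a build would hand to scikit-build-core. This is a hypothetical helper (`build_env` is not part of mobile-forge, and the recipe's real format is not shown here). It assumes extra CMake flags travel through the `CMAKE_ARGS` environment variable, that linking libc++_shared is expressed as `-DANDROID_STL=c++_shared`, and that "the 16 KB page-size flags" means the common `-Wl,-z,max-page-size=16384` linker flag.

```python
import shlex

# Options turned off for the CPU-only baseline (names taken from the PR text).
DISABLED_OPTIONS = (
    "GGML_METAL", "GGML_CUDA", "GGML_VULKAN", "GGML_OPENCL",
    "GGML_HIP", "GGML_RPC", "GGML_BLAS", "GGML_ACCELERATE",
    "GGML_OPENMP", "GGML_LLAMAFILE", "GGML_NATIVE", "LLAVA_BUILD",
)

# Assumption: the 16 KB page-size linker flag commonly used with the Android NDK.
ANDROID_16KB_LDFLAG = "-Wl,-z,max-page-size=16384"


def build_env(platform: str) -> dict[str, str]:
    """Hypothetical helper: env vars for a CPU-only llama-cpp-python build."""
    args = [f"-D{opt}=OFF" for opt in DISABLED_OPTIONS]
    # Merge llama.cpp's standard install into the package dir.
    args.append("-DCMAKE_INSTALL_LIBDIR=llama_cpp/lib")
    env: dict[str, str] = {}
    if platform == "android":
        # Assumed spelling for linking the shared C++ runtime (flet-libcpp-shared).
        args.append("-DANDROID_STL=c++_shared")
        args.append(f"-DCMAKE_SHARED_LINKER_FLAGS={ANDROID_16KB_LDFLAG}")
    elif platform == "ios":
        # CMake reads the default generator from the CMAKE_GENERATOR env var.
        env["CMAKE_GENERATOR"] = "Unix Makefiles"
    else:
        raise ValueError(f"unsupported platform: {platform!r}")
    # Assumption: extra CMake arguments are passed via CMAKE_ARGS.
    env["CMAKE_ARGS"] = shlex.join(args)
    return env
```

Keeping the disabled options in one tuple makes it easy to confirm that no GPU backend slips back in for either platform.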

`mobile.patch` (4 parts)

1. Gate the CMakeLists Apple block to skip iOS: it FORCE-enables `GGML_METAL` (via `CACHE … FORCE`, so `-DGGML_METAL=OFF` can't override it) and guesses the arch from `uname -m`, whereas the iOS cross build is CPU-only with an explicit arch.
2. Skip llama.cpp's unused `common` helper lib (~5 MB).
3. Strip SONAME versioning from the shipped libs, so the wheel carries single unversioned files instead of a `lib*.dylib` → `.0` → `.0.15.3` symlink triplet (forge's packer dereferences those into 3 copies / colliding iOS frameworks). This cut the wheel from ~14 MB to ~1.7 MB.
4. Rewrite the ctypes loader to (a) handle `sys.platform == "ios"` and find the lib under its iOS framework name (`lib<name>.fwork`), and (b) preload the ggml dependency chain with `RTLD_GLOBAL`, because the bundled libs carry no `RUNPATH` and the platform linker can't resolve siblings on its own.
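A minimal sketch of the loader changes in part 4, under stated assumptions: all libs sit in one `lib_dir`; the ggml chain loads in the order `ggml-base`, `ggml-cpu`, `ggml`, `llama` (an assumption, since the PR does not list the chain); and a `.fwork` marker holds the framework binary's path relative to the app bundle, following CPython's iOS convention from PEP 730. The function names are illustrative, not llama-cpp-python's actual loader code.

```python
import ctypes
import os
import sys
from typing import Optional

# Assumed dependency order: each lib comes after the libs it needs.
GGML_CHAIN = ("ggml-base", "ggml-cpu", "ggml", "llama")


def candidate_filenames(name: str, platform: str = sys.platform) -> list[str]:
    """File names to probe for lib<name> on each platform (hypothetical helper)."""
    if platform == "ios":
        # On iOS the binary lives in a framework; a lib<name>.fwork marker
        # stays where the library would otherwise be.
        return [f"lib{name}.fwork", f"lib{name}.dylib"]
    if platform == "darwin":
        return [f"lib{name}.dylib"]
    if platform == "win32":
        return [f"{name}.dll", f"lib{name}.dll"]
    return [f"lib{name}.so"]  # Linux and Android


def resolve_fwork(path: str, bundle_dir: Optional[str] = None) -> str:
    """Follow a .fwork marker to its framework binary.

    Assumes the PEP 730 convention: the file contains the binary's path
    relative to the app bundle, i.e. the directory of sys.executable.
    """
    if bundle_dir is None:
        bundle_dir = os.path.dirname(sys.executable)
    with open(path, encoding="utf-8") as f:
        return os.path.join(bundle_dir, f.read().strip())


def find_lib(lib_dir: str, name: str, platform: str = sys.platform) -> str:
    """Return the first existing candidate for lib<name> in lib_dir."""
    for filename in candidate_filenames(name, platform):
        path = os.path.join(lib_dir, filename)
        if os.path.exists(path):
            return resolve_fwork(path) if path.endswith(".fwork") else path
    raise FileNotFoundError(f"lib{name} not found in {lib_dir}")


def preload_chain(lib_dir: str, chain=GGML_CHAIN) -> list:
    """dlopen each lib in dependency order with RTLD_GLOBAL, so every
    dependency is already in the process before the libs that need it."""
    return [ctypes.CDLL(find_lib(lib_dir, name), mode=ctypes.RTLD_GLOBAL)
            for name in chain]
```

In this sketch, `preload_chain` would run once, before the binding resolves any `llama_*` symbols; the returned handles should be kept alive for the life of the process.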

Testing

Full 6-slice matrix builds green (iOS device / arm64-sim / x86_64-sim, Android arm64-v8a / x86_64 / armeabi-v7a), ~1.7–2.0 MB per wheel.

On-device (recipe-tester): import, native ctypes calls and real GGUF inference (SmolLM2-135M Q4) all pass on Android arm64 (emulator) and the iOS arm64 simulator.

CI note

The Android job passes fully, including the on-device x86_64 emulator test.

The iOS-simulator on-device test currently fails, but not because of this recipe. It's a serious-python limitation: its darwin packaging only converts `.so` C-extensions into per-slice xcframeworks, so a ctypes-loaded `.dylib` ships the device build into the simulator app and fails to dlopen (`incompatible platform (have 'iOS', need 'iOS-simulator')`). The recipe's iOS wheels are correct (the simulator slices were verified to carry platform 7), and the recipe was shown to run on the simulator.
Fix: companion PR flet-dev/serious-python#223 (fix/darwin-ctypes-dylib-xcframework) — with it, the iOS-sim CI test passes with zero change to this recipe.
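The "platform 7" check most likely refers to the `platform` field of a Mach-O `LC_BUILD_VERSION` load command, where 7 is `PLATFORM_IOSSIMULATOR` and 2 is `PLATFORM_IOS`; `otool -l` prints it. As a sketch, the same field can be read in pure Python. This minimal parser assumes a thin, little-endian 64-bit image and does not handle fat/universal headers.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF          # thin 64-bit Mach-O, little-endian
LC_BUILD_VERSION = 0x32
PLATFORM_NAMES = {2: "iOS", 7: "iOS-simulator"}  # subset of <mach-o/loader.h>


def build_platforms(data: bytes) -> list[int]:
    """Return the LC_BUILD_VERSION platform ids found in a thin 64-bit Mach-O."""
    # mach_header_64: magic, cputype, cpusubtype, filetype,
    # ncmds, sizeofcmds, flags, reserved (8 x uint32 = 32 bytes).
    header = struct.unpack_from("<8I", data, 0)
    magic, ncmds = header[0], header[4]
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O")
    platforms = []
    offset = 32  # load commands start right after the header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_BUILD_VERSION:
            # build_version_command: cmd, cmdsize, platform, minos, sdk, ntools
            (platform,) = struct.unpack_from("<I", data, offset + 8)
            platforms.append(platform)
        offset += cmdsize
    return platforms
```

Usage: `build_platforms(open(path, "rb").read())` on a simulator slice of `libllama.dylib` would be expected to return `[7]`, while a device slice would return `[2]`.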

Notes

GGUF weights are user-supplied at runtime (download-on-first-run); only ~1B–3B Q4 models are practical on a phone.
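Because the weights arrive at runtime, the app needs a download-on-first-run step. A minimal standard-library sketch; `ensure_model` is a hypothetical helper, and the URL and destination are whatever the app chooses (ideally a writable app-data directory).

```python
import os
import shutil
import tempfile
import urllib.request


def ensure_model(url: str, dest: str) -> str:
    """Download a GGUF file to dest unless it is already cached (hypothetical helper)."""
    if os.path.exists(dest):
        return dest  # cached from an earlier run: no network access
    dest_dir = os.path.dirname(dest) or "."
    os.makedirs(dest_dir, exist_ok=True)
    # Stream into a temp file in the same directory, then rename it into place,
    # so an interrupted download never leaves a truncated .gguf behind.
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    os.close(fd)
    try:
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        os.replace(tmp, dest)
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)  # only left over if the download or rename failed
    return dest
```

The returned path can then be passed to llama-cpp-python's `Llama(model_path=...)`.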


ndonkoHenri and others added 2 commits on July 1, 2026 (02:49), including "build number 1" (92f4c79).

ndonkoHenri wants to merge 2 commits into main from llama-cpp-python.