recipe: llama-cpp-python 0.3.32 #91
Open
ndonkoHenri wants to merge 2 commits into
Runs local GGUF LLMs on-device (flet-dev/flet#6627).
scikit-build-core / CMake package that vendors the full llama.cpp engine; the Python layer is a pure-ctypes binding that loads the bundled libllama/libggml* shared libs, so this is the duckdb archetype crossed with a pyzbar-style loader.
CPU-only baseline: all GPU backends (Metal/CUDA/Vulkan/OpenCL/HIP/RPC), BLAS, Accelerate, OpenMP and LLAMAFILE are disabled, GGML_NATIVE=OFF, LLAVA_BUILD=OFF (the multimodal mtmd surface is imported lazily so text inference never needs it). Android links libc++_shared (flet-libcpp-shared) and gets the 16 KB page-size flags; iOS uses the Unix Makefiles generator.
mobile.patch (4 parts):
1. Gate the CMakeLists Apple block to skip iOS: it FORCE-enables GGML_METAL (via CACHE...FORCE, which -DGGML_METAL=OFF can't override) and guesses the arch from `uname -m`; iOS is CPU-only with an explicit arch.
2. Skip llama.cpp's unused `common` helper lib (~5 MB).
3. Strip SONAME versioning from the shipped libs, so the wheel carries single unversioned files instead of a lib*.dylib -> .0 -> .0.15.3 symlink triplet (forge's packer dereferences those into 3 copies / colliding iOS frameworks). Cuts the wheel from ~14 MB to ~1.7 MB.
4. Rewrite the ctypes loader to (a) find the lib under its iOS framework name (lib<name>.fwork) on sys.platform == "ios", and (b) preload the ggml dependency chain with RTLD_GLOBAL: the bundled libs carry no RUNPATH, so the platform linker can't resolve siblings on its own.
Also -DCMAKE_INSTALL_LIBDIR=llama_cpp/lib to merge llama.cpp's standard install into the package dir (drops the duplicate top-level lib/).
Full 6-slice matrix builds green (iOS device/arm64-sim/x86_64-sim, Android arm64-v8a/x86_64/armeabi-v7a). On-device validated end to end on Android arm64 and the iOS arm64 simulator: import + native calls + real GGUF inference (SmolLM2-135M Q4) via the recipe-tester app.
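The loader change in part 4 of mobile.patch can be illustrated roughly as follows. This is a minimal sketch, not the code in the patch: the helper names (`candidate_names`, `find_library`, `preload_chain`) are hypothetical, and the dependency order in the usage note is an assumption about how the ggml libraries depend on each other. Resolving a `.fwork` entry to the actual framework binary is deliberately left out, since this description does not show how the patch does it.

```python
import ctypes
import os
import sys


def candidate_names(name, platform=sys.platform):
    """Return file names to try for a bundled library such as 'llama'.

    Hypothetical helper: the ".fwork" name follows the PR description;
    the other suffixes are the usual per-platform conventions.
    """
    if platform == "ios":
        return [f"lib{name}.fwork", f"lib{name}.dylib"]
    if platform == "darwin":
        return [f"lib{name}.dylib", f"lib{name}.so"]
    if platform == "win32":
        return [f"{name}.dll", f"lib{name}.dll"]
    return [f"lib{name}.so"]  # Linux, Android and other ELF platforms


def find_library(lib_dir, name, platform=sys.platform):
    """Return the first existing candidate path for `name` inside `lib_dir`."""
    for filename in candidate_names(name, platform):
        path = os.path.join(lib_dir, filename)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"no shared library for {name!r} in {lib_dir}")


def preload_chain(paths, loader=ctypes.CDLL):
    """Load libraries in the given order with RTLD_GLOBAL.

    With RTLD_GLOBAL, symbols from earlier libraries are visible when later
    ones are loaded, so siblings need not be found through RUNPATH.
    On iOS a ".fwork" path would first have to be resolved to the real
    framework binary; that step is omitted in this sketch.
    """
    return [loader(path, mode=ctypes.RTLD_GLOBAL) for path in paths]
```

A call might look like `preload_chain([find_library(lib_dir, n) for n in ("ggml-base", "ggml-cpu", "ggml", "llama")])`, where the library names and their order are assumptions rather than something stated in this PR. Keeping the returned handles alive avoids the libraries being unloaded while the binding is in use.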
Closes flet-dev/flet#6627 — run local GGUF LLMs on-device with Flet.
`llama-cpp-python` is a scikit-build-core / CMake package that vendors the full `llama.cpp` engine; its Python layer is a pure-`ctypes` binding that loads the bundled `libllama` + `libggml*` shared libs. No separate `flet-lib*` recipe.
Recipe shape
- CPU-only baseline: all GPU backends (`GGML_METAL/CUDA/VULKAN/OPENCL/HIP/RPC`), BLAS, Accelerate, OpenMP and LLAMAFILE off; `GGML_NATIVE=OFF`; `LLAVA_BUILD=OFF` (the multimodal `mtmd` surface is imported lazily, so text inference never needs it).
- Android links `libc++_shared` (`flet-libcpp-shared`) and gets the 16 KB page-size flags; iOS uses the Unix Makefiles generator.
- `-DCMAKE_INSTALL_LIBDIR=llama_cpp/lib` to merge llama.cpp's standard install into the package dir (drops a duplicate top-level `lib/`).
`mobile.patch` (4 parts)
1. Gate the CMakeLists Apple block to skip iOS: it `FORCE`-enables `GGML_METAL` (via `CACHE … FORCE`, so a `-D` can't override) and guesses the arch from `uname -m`; the iOS cross build is CPU-only with an explicit arch.
2. Skip llama.cpp's unused `common` helper lib (~5 MB).
3. Strip SONAME versioning from the shipped libs, so the wheel carries single unversioned files instead of a `lib*.dylib → .0 → .0.15.3` symlink triplet (forge's packer dereferences those into 3 copies / colliding iOS frameworks). This cut the wheel from ~14 MB to ~1.7 MB.
4. Rewrite the ctypes loader to (a) find the lib under its iOS framework name (`lib<name>.fwork`) on `sys.platform == "ios"`, and (b) preload the ggml dependency chain with `RTLD_GLOBAL`: the bundled libs carry no RUNPATH, so the platform linker can't resolve siblings on its own.
Testing
Full 6-slice matrix builds green (iOS device / arm64-sim / x86_64-sim, Android arm64-v8a / x86_64 / armeabi-v7a), ~1.7–2.0 MB per wheel.
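One way to sanity-check the "single unversioned files" property on a built wheel is to list its shared libraries and flag any versioned names. The helpers below are a hypothetical sketch (the function names and regular expressions are not part of the recipe), and they only look at file names inside the archive, not at SONAMEs or install names recorded in the binaries.

```python
import re
import zipfile

# Any .so / .dylib entry, with or without a trailing version.
SHARED = re.compile(r"\.(so|dylib)(\.\d[\d.]*)?$")
# Versioned names such as libfoo.so.0, libfoo.0.dylib or libfoo.0.15.3.dylib.
VERSIONED = re.compile(r"(\.so\.\d[\d.]*|\.\d[\d.]*\.dylib)$")


def shared_libs(wheel):
    """Return the sorted names of shared libraries inside a wheel.

    `wheel` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as zf:
        return sorted(
            info.filename for info in zf.infolist() if SHARED.search(info.filename)
        )


def versioned_libs(wheel):
    """Return the shared libraries whose file name carries a version suffix."""
    return [name for name in shared_libs(wheel) if VERSIONED.search(name)]
```

For a wheel built with the SONAME stripping described above, `versioned_libs(path_to_wheel)` would be expected to return an empty list.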
On-device (recipe-tester): import + native `ctypes` calls + real GGUF inference (SmolLM2-135M Q4) all pass on Android arm64 (emulator) and iOS arm64 simulator.
CI note
The Android job passes fully, including the on-device x86_64 emulator test.
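For orientation, the three on-device checks named in the Testing section (import, a native `ctypes` call, a short GGUF generation) amount to something like the sketch below. It is not the recipe-tester code: `run_smoke_test`, the prompt and the context size are placeholders, and the model path must point at a GGUF file that is already on the device.

```python
def run_smoke_test(model_path, prompt="Hello", max_tokens=8):
    """Import the binding, make one native call, and generate a few tokens.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so a missing or broken native library surfaces here,
    # at the point where the test is actually run.
    import llama_cpp

    # A direct call into the native library through the ctypes binding.
    system_info = llama_cpp.llama_print_system_info()

    # High-level API: load the model and run a short completion.
    llm = llama_cpp.Llama(model_path=model_path, n_ctx=256, verbose=False)
    result = llm(prompt, max_tokens=max_tokens)
    text = result["choices"][0]["text"]
    return system_info, text
```

If the import or the native call fails, the exception message usually points at the loader (a missing or incompatible shared library) rather than at the model file.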
The iOS-simulator on-device test currently fails, but not because of this recipe. It's a serious-python limitation: its darwin packaging only converts `.so` C-extensions into per-slice xcframeworks, so a `ctypes`-loaded `.dylib` ships the device build into the simulator app and fails to `dlopen` (`incompatible platform (have 'iOS', need 'iOS-simulator')`). The recipe's iOS wheels are correct (verified `platform 7`), and it was proven to run on the simulator.
Fix: companion PR flet-dev/serious-python#223 (`fix/darwin-ctypes-dylib-xcframework`); with it, the iOS-sim CI test passes with zero change to this recipe.
Notes
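On the `platform 7` check mentioned in the CI note: a Mach-O binary records its target platform in the `LC_BUILD_VERSION` load command, where 2 means iOS (device) and 7 means iOS simulator, which is what the `dlopen` error is comparing. The sketch below reads that field from a thin 64-bit little-endian Mach-O file. It illustrates one way to do the check, not necessarily how it was done for this PR; `macho_platform` is a hypothetical helper, and fat (multi-arch) binaries are not handled.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF       # thin 64-bit Mach-O, little-endian on disk
LC_BUILD_VERSION = 0x32        # load command carrying platform / min OS / SDK
PLATFORM_NAMES = {1: "macOS", 2: "iOS", 7: "iOS-simulator"}


def macho_platform(data):
    """Return the LC_BUILD_VERSION platform number, or None if absent.

    `data` is the raw bytes of a thin 64-bit Mach-O file.
    """
    # mach_header_64 is eight 32-bit fields; only magic and ncmds are used.
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _rsvd = struct.unpack_from(
        "<8I", data, 0
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin 64-bit little-endian Mach-O file")

    offset = 32  # load commands start right after the 32-byte header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_BUILD_VERSION:
            # build_version_command: cmd, cmdsize, platform, minos, sdk, ntools
            (platform,) = struct.unpack_from("<I", data, offset + 8)
            return platform
        offset += cmdsize
    return None
```

Usage would be along the lines of `PLATFORM_NAMES.get(macho_platform(open(path, "rb").read()))` on a library extracted from a wheel; a simulator slice should report 7 and a device slice 2.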