Add C++ static runner for CoreML by metascroy · Pull Request #16463 · pytorch/executorch

metascroy · 2026-01-06T01:36:48Z

Summary

Add a C++ runner for static attention CoreML LLM models exported with export_static_llm_coreml.py. This runner:

- Extends TextDecoderRunner from executorch/extension/llm/runner/text_decoder_runner.h
- Uses the existing StaticAttentionIOManager from executorch/examples/models/llama/runner/static_attention_io_manager.h for KV cache management
- Auto-detects model configuration (input_len, cache_len, n_layers, n_kv_heads, head_dim, vocab_size, generate_full_logits) from model metadata
- Supports both regular greedy decoding and lookahead (speculative) decoding
- Processes prompts in chunks during prefill
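
To illustrate the chunked prefill mentioned above, here is a minimal sketch of splitting a tokenized prompt into pieces no longer than the model's input window. It is not code from this PR: the helper name `split_into_chunks` is hypothetical, and it only assumes that `input_len` (one of the metadata fields listed above) is the maximum number of tokens the model accepts per forward call. How the actual runner pads the last chunk or updates the KV cache between chunks is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): split a tokenized prompt into
// consecutive chunks of at most `input_len` tokens each. The final chunk
// may be shorter than `input_len`; padding is left to the caller.
std::vector<std::vector<int64_t>> split_into_chunks(
    const std::vector<int64_t>& tokens, std::size_t input_len) {
  std::vector<std::vector<int64_t>> chunks;
  if (input_len == 0) {
    return chunks;  // A zero-length window cannot hold any tokens.
  }
  for (std::size_t start = 0; start < tokens.size(); start += input_len) {
    const std::size_t end = std::min(start + input_len, tokens.size());
    chunks.emplace_back(
        tokens.begin() + static_cast<std::ptrdiff_t>(start),
        tokens.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return chunks;
}
```

Under these assumptions, a 7-token prompt with `input_len` of 3 would be fed to the model as three calls of 3, 3, and 1 tokens, in order, so that each chunk can attend to the cache entries written by the previous ones.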

New files:

- examples/apple/coreml/llama/runner/static_llm_runner.h - Runner header with StaticLLMConfig, StaticLLMIOManager, StaticLLMTextDecoderRunner, and StaticLLMRunner classes
- examples/apple/coreml/llama/runner/static_llm_runner.cpp - Runner implementation
- examples/apple/coreml/llama/runner/main.cpp - CLI entry point with gflags
- examples/apple/coreml/llama/runner/CMakeLists.txt - CMake build configuration
- examples/apple/coreml/llama/runner/build_and_run.sh - Build and run helper script

Modified files:

- CMakeLists.txt - Add a subdirectory for the static LLM CoreML runner (built when EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER, EXECUTORCH_BUILD_COREML, and APPLE are all enabled)
- examples/apple/coreml/llama/export_static_llm_coreml.py - Add a --cpu_only flag for CI testing (the ANE is not accessible in CI) and a --no_generate_full_logits flag for more efficient models
- .ci/scripts/test_ane_static_llama.sh - Build and test the C++ runner in CI

Known issue: Lookahead decoding currently produces incorrect output (<unk> tokens) for stories110M, but does work for llama1B. This will be addressed in a follow-up PR.

Test plan

CI script .ci/scripts/test_ane_static_llama.sh tests:

- Export static ANE model and CPU-only model
- Build C++ runner with CMake/Ninja
- Run regular decoding and validate output contains expected prefix "Once upon a time, there was"
- Run lookahead decoding (runs without crashing, but output is incorrect; see the known issue above)

pytorch-bot · 2026-01-06T01:36:51Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16463

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Unrelated Failures

As of commit ff8ae0b with merge base 913436a:

NEW FAILURES - The following jobs have failed:

Apple / build-benchmark-app / macos-job (gh)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 65
Apple / build-frameworks-ios / macos-job (gh)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 65
Build Presets / apple (ios) / build (gh)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 65
pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t 6c871d30a664c4b88a0c3fc70dd3ed7e77195087d7bb92d5fc051108f6284464 /exec failed with exit code 1
pull / test-samsung-quantmodels-linux / linux-job (gh)
RuntimeError: Command docker exec -t 729ab886adb1d23e3bb62f972e017b8c35348117738d67d2e32b2a0b9fb7c265 /exec failed with exit code 1
trunk / test-qnn-optimum-model (fp32, focalnet) / linux-job (gh)
RuntimeError: Command docker exec -t f2cc5d0688eb206c30752686b40a24d94dd18c8a6c81b9bbf6f9095a5afcf25a /exec failed with exit code 1

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-qnn-models-linux / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128
Test CUDA Builds / test-models-cuda (resnet18) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / android / run-emulator (gh) (#16137)
Timeout waiting for emulator to boot.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-01-06T01:37:40Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

metascroy · 2026-01-06T01:37:50Z

@larryliu0820 I added a static LLM runner based on the APIs in extension/llm. Can you have a look and give feedback?

metascroy · 2026-01-07T21:37:45Z

@JacobSzwejbka are you able to review this while Mengwei is out? Or is there a better person?

github-actions · 2026-03-09T00:56:03Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

Add C++ static runner for CoreML

aa58075

metascroy requested review from cccclai, kirklandsign and larryliu0820 as code owners January 6, 2026 01:36

meta-cla Bot added the CLA Signed label Jan 6, 2026

metascroy added the ciflow/trunk label Jan 6, 2026

metascroy added 5 commits January 6, 2026 15:17

up

d014e5f

up

7bb8149

up

a2637c1

up

14302bf

up

ff8ae0b

metascroy mentioned this pull request Jan 7, 2026

CoreML sometimes produces garbage output on cached models #16492

Open

metascroy requested a review from JacobSzwejbka January 7, 2026 21:37

metascroy mentioned this pull request Feb 11, 2026

Qualcomm AI Engine Direct - [Multimodal] Muti-turn VLM conversation #17308

Merged

github-actions Bot added the stale label Mar 9, 2026

metascroy wants to merge 6 commits into main from add-static-runner
