Add C++ static runner for CoreML #16463

Open
metascroy wants to merge 6 commits into main from add-static-runner

Conversation

@metascroy
Contributor

Summary

Add a C++ runner for static attention CoreML LLM models exported with export_static_llm_coreml.py. This runner:

  • Extends TextDecoderRunner from executorch/extension/llm/runner/text_decoder_runner.h
  • Uses existing StaticAttentionIOManager from executorch/examples/models/llama/runner/static_attention_io_manager.h for KV cache management
  • Auto-detects model configuration (input_len, cache_len, n_layers, n_kv_heads, head_dim, vocab_size, generate_full_logits) from model metadata
  • Supports both regular greedy decoding and lookahead (speculative) decoding
  • Processes prompts in chunks during prefill
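
The chunked prefill mentioned above can be sketched as follows. This is a minimal illustration, not the code in static_llm_runner.cpp: the helper name `split_into_chunks` is hypothetical, and it only assumes that a static-shape model consumes at most `input_len` tokens per forward call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the PR): split a tokenized prompt into
// consecutive chunks of at most input_len tokens. A static-shape model would
// be invoked once per chunk during prefill; the last chunk may be shorter
// and would need padding or masking by the caller.
std::vector<std::vector<int64_t>> split_into_chunks(
    const std::vector<int64_t>& prompt_tokens,
    std::size_t input_len) {
  assert(input_len > 0 && "input_len must be positive");
  std::vector<std::vector<int64_t>> chunks;
  for (std::size_t pos = 0; pos < prompt_tokens.size(); pos += input_len) {
    const std::size_t end = std::min(pos + input_len, prompt_tokens.size());
    chunks.emplace_back(
        prompt_tokens.begin() + static_cast<std::ptrdiff_t>(pos),
        prompt_tokens.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return chunks;
}
```

For example, a 7-token prompt with `input_len = 3` would yield chunks of 3, 3, and 1 tokens.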

New files:

  • examples/apple/coreml/llama/runner/static_llm_runner.h - Runner header with StaticLLMConfig, StaticLLMIOManager, StaticLLMTextDecoderRunner, and StaticLLMRunner classes
  • examples/apple/coreml/llama/runner/static_llm_runner.cpp - Runner implementation
  • examples/apple/coreml/llama/runner/main.cpp - CLI entry point with gflags
  • examples/apple/coreml/llama/runner/CMakeLists.txt - CMake build configuration
  • examples/apple/coreml/llama/runner/build_and_run.sh - Build and run helper script

Modified files:

  • CMakeLists.txt - Add subdirectory for static LLM CoreML runner (when EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER, EXECUTORCH_BUILD_COREML, and APPLE are enabled)
  • examples/apple/coreml/llama/export_static_llm_coreml.py - Add a --cpu_only flag for CI testing (the ANE is not accessible in CI) and a --no_generate_full_logits flag for more efficient models
  • .ci/scripts/test_ane_static_llama.sh - Build and test the C++ runner in CI
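
To illustrate how the generate_full_logits setting could affect greedy decoding, here is a small sketch. It rests on assumptions that are not confirmed by this PR: that full logits are laid out row-major as [num_positions, vocab_size], and that a model exported with --no_generate_full_logits returns a single row of vocab_size values. The function name `greedy_next_token` and the `last_pos` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the PR): pick the next token greedily.
// Assumed layout: with full logits, `logits` holds one row of vocab_size
// floats per input position and `last_pos` is the row of the last real
// (non-padding) token; otherwise `logits` holds exactly one row.
int64_t greedy_next_token(
    const std::vector<float>& logits,
    std::size_t vocab_size,
    bool full_logits,
    std::size_t last_pos) {
  assert(vocab_size > 0);
  const std::size_t row = full_logits ? last_pos : 0;
  assert((row + 1) * vocab_size <= logits.size());
  const auto begin =
      logits.begin() + static_cast<std::ptrdiff_t>(row * vocab_size);
  const auto end = begin + static_cast<std::ptrdiff_t>(vocab_size);
  // Greedy decoding: the index of the largest logit is the next token id.
  return static_cast<int64_t>(
      std::distance(begin, std::max_element(begin, end)));
}
```

Under these assumptions, the single-row case copies less data out of the model per step, which is one plausible reading of "more efficient".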

Known issue: Lookahead decoding currently produces incorrect output (<unk> tokens) for stories110M, but does work for llama1B. This will be addressed in a follow-up PR.

Test plan

CI script .ci/scripts/test_ane_static_llama.sh tests:

  1. Export static ANE model and CPU-only model
  2. Build C++ runner with CMake/Ninja
  3. Run regular decoding and validate output contains expected prefix "Once upon a time, there was"
  4. Run lookahead decoding (it runs without crashing, but the output is incorrect; see the known issue above)
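
The output check in step 3 amounts to a substring test. The CI script itself is a shell script; the sketch below only restates the same check in C++ for illustration, and the function name `output_contains` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the PR): return true if the runner's
// captured output contains the expected text anywhere within it.
bool output_contains(const std::string& output, const std::string& expected) {
  assert(!expected.empty() && "expected text must not be empty");
  return output.find(expected) != std::string::npos;
}
```

A call such as `output_contains(captured, "Once upon a time, there was")` would mirror the validation described in the test plan, where `captured` stands for whatever the runner printed.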

@pytorch-bot

pytorch-bot Bot commented Jan 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16463

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Unrelated Failures

As of commit ff8ae0b with merge base 913436a (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jan 6, 2026
@github-actions

github-actions Bot commented Jan 6, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@metascroy
Contributor Author

@larryliu0820 I added a static LLM runner based on the APIs in extension/llm. Can you have a look and give feedback?

@metascroy
Contributor Author

@JacobSzwejbka are you able to review this while Mengwei is out? Or is there a better person?

@github-actions

github-actions Bot commented Mar 9, 2026

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the stale label Mar 9, 2026

Labels

ciflow/trunk, CLA Signed, stale