🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16463

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 3 Unrelated Failures

As of commit ff8ae0b with merge base 913436a:

NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
@larryliu0820 I added a static LLM runner based on the APIs in `extension/llm`. Can you have a look and give feedback?
@JacobSzwejbka are you able to review this while Mengwei is out? Or is there a better person?
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
Summary
Add a C++ runner for static attention CoreML LLM models exported with `export_static_llm_coreml.py`. This runner:

- Uses `TextDecoderRunner` from `executorch/extension/llm/runner/text_decoder_runner.h`
- Uses `StaticAttentionIOManager` from `executorch/examples/models/llama/runner/static_attention_io_manager.h` for KV cache management

New files:
- `examples/apple/coreml/llama/runner/static_llm_runner.h` - Runner header with the `StaticLLMConfig`, `StaticLLMIOManager`, `StaticLLMTextDecoderRunner`, and `StaticLLMRunner` classes
- `examples/apple/coreml/llama/runner/static_llm_runner.cpp` - Runner implementation
- `examples/apple/coreml/llama/runner/main.cpp` - CLI entry point with gflags
- `examples/apple/coreml/llama/runner/CMakeLists.txt` - CMake build configuration
- `examples/apple/coreml/llama/runner/build_and_run.sh` - Build and run helper script

Modified files:
- `CMakeLists.txt` - Add subdirectory for the static LLM CoreML runner (when `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER`, `EXECUTORCH_BUILD_COREML`, and `APPLE` are enabled)
- `examples/apple/coreml/llama/export_static_llm_coreml.py` - Add a `--cpu_only` flag for CI testing (ANE is not accessible in CI) and a `--no_generate_full_logits` flag for more efficient models
- `.ci/scripts/test_ane_static_llama.sh` - Build and test the C++ runner in CI

Known issue: Lookahead decoding currently produces incorrect output (`<unk>` tokens) for stories110M, but does work for llama1B. This will be addressed in a follow-up PR.

Test plan
CI script `.ci/scripts/test_ane_static_llama.sh` tests: