Commit 4a6594d

Qualcomm AI Engine Direct - Python API Refactor

1 parent ca2a616 commit 4a6594d

75 files changed: 2064 additions & 2598 deletions

.ci/scripts/test_qnn_static_llm.sh

Lines changed: 3 additions & 3 deletions

@@ -47,11 +47,11 @@ if [[ "${TASK_NAME}" == "stories_110m" ]]; then
 $PYTHON_EXECUTABLE -m pytorch_tokenizers.tools.llama2c.convert -t tokenizer.model -o tokenizer.bin

 # Compile only as weight sharing is not applicable on x86.
-$PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --model SM8650 --build_folder build-android/ --executorch_root . --artifact_dir ./stories_110m_pte_size --llama_artifacts . --compile_only
+$PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --soc_model SM8650 --build_folder build-android/ --executorch_root . --artifact_dir ./stories_110m_pte_size --llama_artifacts . --compile_only
 exit_code1=$?

 # Checks accuracy with weight sharing disabled since x86 does not support weight sharing.
-$PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./stories_110m_accuracy --llama_artifacts . --enable_x86_64
+$PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --soc_model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./stories_110m_accuracy --llama_artifacts . --enable_x86_64
 exit_code2=$?

 # Check the exit codes and print messages

@@ -84,7 +84,7 @@ elif [[ "${TASK_NAME}" == "smollm2_135m" ]]; then
 if [ -n "$2" ]; then
   EXTRA_FLAGS="$EXTRA_FLAGS --static_llm_eval_method $2"
 fi
-$PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_llm_model --model_name smollm2_135m --model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./static_smollm2 --enable_x86_64 $EXTRA_FLAGS
+$PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_static_llm_model --model_name smollm2_135m --soc_model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir ./static_smollm2 --enable_x86_64 $EXTRA_FLAGS
 exit_code1=$?
 if [ $exit_code1 -ne 0 ]; then
   exit 1
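The CI changes above are a mechanical rename of the SoC selector from `--model` to `--soc_model`, which frees `--model_name` to unambiguously mean the network (e.g. `smollm2_135m`). As a hedged sketch of why this matters at the argument-parsing level, the pattern can be illustrated with `argparse`; the parser below is illustrative only, not the actual `test_qnn_delegate.py` parser:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the renamed flags; names mirror the CI diff,
    # not the real test driver's full argument set.
    parser = argparse.ArgumentParser()
    # New, unambiguous spelling used by the updated CI scripts.
    parser.add_argument("--soc_model", required=True, help="Target SoC, e.g. SM8650")
    # `--model_name` is now free to mean the network being tested.
    parser.add_argument("--model_name", default=None)
    return parser

args = make_parser().parse_args(["--soc_model", "SM8650", "--model_name", "smollm2_135m"])
print(args.soc_model)   # SM8650
print(args.model_name)  # smollm2_135m
```

With two distinct flags, a command line can name both the target chip and the model without the previous overloading of `--model`.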

backends/qualcomm/__init__.py

Lines changed: 3 additions & 0 deletions

@@ -1,5 +1,7 @@
 import os

+import torch
+
 from .scripts.download_qnn_sdk import install_qnn_sdk, is_linux_x86


@@ -11,3 +13,4 @@
 ok = install_qnn_sdk()
 if not ok:
     raise RuntimeError("Failed to install QNN SDK. Please check the logs above.")
+torch.backends.mkldnn.enabled = False

backends/qualcomm/bc/test_qnn_static_llama_bc.sh

Lines changed: 2 additions & 2 deletions

@@ -27,11 +27,11 @@ touch ${llama_artifacts}/params.json
 echo '{"dim": 64, "n_layers": 5, "n_heads": 8, "n_kv_heads": 4, "vocab_size": 512, "multiple_of": 4, "max_seq_len": 512}' > ${llama_artifacts}/params.json

 # Checks e2e accuracy
-expected=$($PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_260k --model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir . --llama_artifacts $llama_artifacts --enable_x86_64 | grep "Model CI result:")
+expected=$($PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_260k --soc_model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir . --llama_artifacts $llama_artifacts --enable_x86_64 | grep "Model CI result:")
 exit_code1=$?

 # Checks accuracy with precompiled
-output=$($PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_260k --model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir $PTE_ARTIFACT --llama_artifacts $llama_artifacts --enable_x86_64 --pre_gen_pte $PTE_ARTIFACT | grep "Model CI result:")
+output=$($PYTHON_EXECUTABLE backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_260k --soc_model SM8650 --build_folder build-x86/ --executorch_root . --artifact_dir $PTE_ARTIFACT --llama_artifacts $llama_artifacts --enable_x86_64 --pre_gen_pte $PTE_ARTIFACT | grep "Model CI result:")
 exit_code2=$?

 if [[ "$output" == "$expected" ]]; then

backends/qualcomm/builders/README.md

Lines changed: 2 additions & 3 deletions

@@ -41,12 +41,11 @@ class MyModel(torch.nn.Module):
 ```
 At the time we try to lower it with Qualcomm backend:
 ```python
-from executorch.examples.qualcomm.utils import build_executorch_binary
+from executorch.backends.qualcomm.export_utils import build_executorch_binary

 build_executorch_binary(
     model=MyModel(),
-    inputs=(torch.randn(200, 768),),
-    soc_model="SM8650"
+    qnn_config=qnn_config,
     file_name="my_model",
     dataset=None,
 )
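The README change above captures the heart of this refactor: per-call positional arguments such as `inputs` and `soc_model` are folded into a single `qnn_config` object that can be reused across `build_executorch_binary`, `SimpleADB`, and the debugger flow. As a minimal, self-contained sketch of that config-object pattern (the `QnnConfig` class, its fields, and the stand-in builder below are hypothetical illustrations, not the real executorch types):

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical stand-in for the configuration object introduced by the
# refactor; field names are guesses based on the flags seen in the diffs.
@dataclass
class QnnConfig:
    soc_model: str = "SM8650"
    profile_level: int = 0          # e.g. 3 enables op_trace in the debugger flow
    shared_buffer: bool = False
    device_id: Optional[str] = None

def build_executorch_binary_sketch(model: Any, qnn_config: QnnConfig,
                                   file_name: str, dataset: Any = None) -> str:
    # Stand-in body: a real implementation would lower `model` for the
    # SoC named in qnn_config and write a .pte artifact.
    return f"{file_name}.pte (target {qnn_config.soc_model})"

cfg = QnnConfig(soc_model="SM8650", profile_level=3)
print(build_executorch_binary_sketch(object(), cfg, "my_model"))
```

One `QnnConfig` instance built once at the top of a script replaces the repeated `soc_model`, `device_id`, and profiling flags that previously had to be threaded through every call.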

backends/qualcomm/debugger/README.md

Lines changed: 12 additions & 35 deletions

@@ -31,13 +31,14 @@ To enable model visualization, please add the `--online_prepare` flag.
 ## Details
 ### 1. Lower to QNN backend
 Generate an ExecuTorch binary for Qualcomm platforms.
+Ensure that qnn_config.profile_level is set to 3, which will generate op_trace.
 ```python
+qnn_config.profile_level = 3
 build_executorch_binary(
-    model,
-    example_input,
-    args.model,
-    f"{args.artifact}/{pte_filename}",
-    [example_input],
+    model=model,
+    qnn_config=qnn_config,
+    file_name=f"{args.artifact}/{pte_filename}",
+    dataset=[example_input],
     quant_dtype=QuantDtype.use_8a8w,
     online_prepare=args.online_prepare,
     optrace=True,

@@ -47,14 +48,9 @@ build_executorch_binary(
 Generate optrace and QHAS files using QNN tools under $QNN_SDK_ROOT. After finishing, you will get a `binaries_trace` dictionary.
 ``` python
 adb = SimpleADB(
-    qnn_sdk=os.getenv("QNN_SDK_ROOT"),
-    build_path=f"{args.build_folder}",
+    qnn_config=qnn_config,
     pte_path=f"{args.artifact}/{pte_filename}.pte",
-    workspace=f"/data/local/tmp/executorch/{pte_filename}",
-    device_id=args.device,
-    host_id=args.host,
-    soc_model=args.model,
-    target=args.target,
+    workspace=f"/data/local/tmp/executorch/{pte_filename}",
 )
 binaries_trace = generate_optrace(
     args, adb, f"{args.artifact}/{pte_filename}.pte", example_input

@@ -139,42 +135,23 @@ When executing the script, please add the flag `--dump_intermediate_outputs`. Th
 Initialize a `QNNIntermediateDebugger`. Please pass initialized `QNNIntermediateDebugger` and the `args.dump_intermediate_outputs` to `build_executorch_binary` method as well.
 #### Example:
 ```python
-from executorch.examples.qualcomm.utils import build_executorch_binary
+from executorch.backends.qualcomm.export_utils import build_executorch_binary
 from executorch.backends.qualcomm.debugger.qnn_intermediate_debugger import QNNIntermediateDebugger

 qnn_intermediate_debugger = QNNIntermediateDebugger()
 build_executorch_binary(
     model=MyModel(),
-    inputs=(torch.randn(200, 768),),
-    soc_model="SM8650",
+    qnn_config=qnn_config,
     file_name="my_model",
     dataset=my_dataset,
-    dump_intermediate_outputs=args.dump_intermediate_outputs, # Add this flag
-    qnn_intermediate_debugger=qnn_intermediate_debugger, # Add this flag
+    qnn_intermediate_debugger=qnn_intermediate_debugger, # Provide this param
 )
 ```

 ### 4. Set data num to 1
 It is perfectly fine for users to pass the desired amount of datasets to `build_executorch_binary`, which helps achieve better quantization results. However, after `build_executorch_binary` is called, we need to ensure that we only perform one inference during execution. Please ensure that CPU and QNN is using the same input during execution; otherwise, the debugging results might not be accurate.

-### 5. Pass flag to SimpleADB
-When creating `SimpleADB`, please also pass the flag `args.dump_intermediate_outputs`. This tells the runner to create files that store the intermediate output schema and binary data.
-#### Example:
-```python
-adb = SimpleADB(
-    qnn_sdk=os.getenv("QNN_SDK_ROOT"),
-    build_path=f"{args.build_folder}",
-    pte_path=f"{args.artifact}/{pte_filename}.pte",
-    workspace=f"/data/local/tmp/executorch/{pte_filename}",
-    device_id=args.device,
-    host_id=args.host,
-    soc_model=args.model,
-    shared_buffer=args.shared_buffer,
-    dump_intermediate_outputs=args.dump_intermediate_outputs, # Add this flag
-)
-```
-
-### 6: Pull and process the results.
+### 5: Pull and process the results.
 After QNN execution with the runner, if the previous steps are done correctly, we should be able to get two files: `etdump.etdp` and `debug_output.bin`.
 The following example pulls the files back and calls a callback function to process the results. In this callback function, we create the `Inspector`. Then we perform CPU inference to get CPU intermediate results. Now, we have both QNN and CPU intermediate results, we can start generating results to compare the accuracy. Taking the following example, we should be able to get `debug_graph.svg` as an output in the current directory.
 #### Example:
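(The commit's full example is hidden in this view.) Independent of the executorch `Inspector` tooling the README refers to, the core accuracy comparison it describes — matching QNN intermediate outputs against a CPU reference — can be sketched in plain Python; the layer names, dictionaries, and threshold below are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length flat sequences of floats.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def compare_intermediates(cpu_outputs, qnn_outputs, threshold=0.99):
    """Flag layers whose QNN output drifts from the CPU reference."""
    report = {}
    for name in cpu_outputs:
        sim = cosine_similarity(cpu_outputs[name], qnn_outputs[name])
        report[name] = (sim, sim >= threshold)
    return report

# Hypothetical flattened per-layer outputs from the two runs.
cpu = {"layer0": [1.0, 2.0, 3.0], "layer1": [0.5, -0.5, 1.5]}
qnn = {"layer0": [1.01, 1.98, 3.02], "layer1": [0.4, 0.6, 1.4]}
for name, (sim, ok) in compare_intermediates(cpu, qnn).items():
    print(name, round(sim, 4), "OK" if ok else "CHECK")
```

A per-layer report like this is what makes the single-input requirement in step 4 matter: the comparison is only meaningful when CPU and QNN saw the same input.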
