
Commit 285c644

docs: Update ML library versions and documentation
This commit updates the ML library versions and documentation:

- Update ARM Compute Library from 24.12 to 52.7.0
- Update Arm NN from 24.11 to 26.01
- Update NNStreamer from 2.4.2 to 2.6.0
- Update ONNX Runtime from 1.20.1 to 1.23.2
- Update TensorFlow Lite from 2.18.0 to 2.20.0
- Refresh all test outputs and benchmark results
- Add ML components to AM62DX documentation TOC
- Update component table with latest library information

Signed-off-by: Pratham Deshmukh <p-deshmukh@ti.com>
1 parent 6448b30 commit 285c644

9 files changed

Lines changed: 102 additions & 94 deletions

File tree

configs/AM62DX/AM62DX_linux_toc.txt

Lines changed: 6 additions & 0 deletions
@@ -62,6 +62,12 @@ linux/Foundational_Components/Kernel/Kernel_Drivers/UART
 linux/Foundational_Components/Kernel/Kernel_Drivers/UBIFS
 linux/Foundational_Components/Kernel/Kernel_Drivers/VTM
 linux/Foundational_Components/Kernel/Kernel_Drivers/Watchdog
+linux/Foundational_Components_Machine_Learning
+linux/Foundational_Components/Machine_Learning/arm_compute_library
+linux/Foundational_Components/Machine_Learning/armnn
+linux/Foundational_Components/Machine_Learning/nnstreamer
+linux/Foundational_Components/Machine_Learning/onnxrt
+linux/Foundational_Components/Machine_Learning/tflite
 
 #linux/Foundational_Components_Power_Management
 
Binary file changed (71.7 KB, -40.3 KB); binary file not shown.

source/linux/Foundational_Components/Machine_Learning/arm_compute_library.rst

Lines changed: 22 additions & 20 deletions
@@ -10,7 +10,7 @@ Exact list of functions can be found at https://www.arm.com/products/development
 Supported versions
 ------------------
 
-- ARM Compute Library 24.12
+- ARM Compute Library 52.7.0
 
 Arm Compute Library Testing
 ---------------------------
@@ -19,10 +19,10 @@ Arm Compute Libraries, tests, and sample executables are included in the SDK fil
 
 .. code-block:: console
 
-   root@am62xx-evm:~# LD_LIBRARY_PATH=/usr/lib/tests/ /usr/lib/tests/arm_compute_validation
-   Version = 32bcced2af7feea6969dd1d22e58d0718dc488e3
-   CommandLine = /usr/lib/tests/arm_compute_validation
-   Seed = 3778037091
+   root@am62xx-evm:~# LD_LIBRARY_PATH=/usr/bin/arm-compute-library-52.7.0/tests/ /usr/bin/arm-compute-library-52.7.0/tests/arm_compute_validation
+   Version = c9a1fff898abd5109b759e8e16616519dc758fdd
+   CommandLine = /usr/bin/arm-compute-library-52.7.0/tests/arm_compute_validation
+   Seed = 165977448
    cpu_has_sve = false
    cpu_has_sve2 = false
    cpu_has_svef32mm = false
@@ -34,22 +34,23 @@ Arm Compute Libraries, tests, and sample executables are included in the SDK fil
    cpu_has_bf16 = false
    cpu_has_dotprod = false
    cpu_has_i8mm = false
+   cpu_has_fhm = false
    CPU0 = A53
    CPU1 = A53
    CPU2 = A53
    CPU3 = A53
    Iterations = 1
    Threads = 1
    Dataset mode = PRECOMMIT
-   Running [0] 'UNIT/CPPScheduler/RethrowException'
-   Wall clock/Wall clock time: AVG=3466.0000 us
+   Running [0] 'UNIT/DataTypeUtils/CheckDataTypeIsPrinted@DataType=QSYMM8'
+   Wall clock/Wall clock time: AVG=3.0000 us
 
 
 .. code-block:: console
 
-   root@am62xx-evm:~# /usr/bin/arm-compute-library-24.12/examples/graph_alexnet
+   root@am62xx-evm:~# /usr/bin/arm-compute-library-52.7.0/examples/graph_alexnet
 
-   /usr/bin/arm-compute-library-24.12/examples/graph_alexnet
+   /usr/bin/arm-compute-library-52.7.0/examples/graph_alexnet
 
    Threads : 1
    Target : Neon
@@ -58,8 +59,8 @@ Arm Compute Libraries, tests, and sample executables are included in the SDK fil
    Tuner enabled? : false
    Cache enabled? : false
    Tuner mode : Normal
-   Tuner file :
-   MLGO file :
+   Tuner file :
+   MLGO file :
    Fast math enabled? : false
 
    Test passed
@@ -69,16 +70,17 @@ Sample NN related executables (using Arm Compute Library only):
 
 .. code-block:: console
 
-   root@am62xx-evm:~# ls /usr/bin/arm-compute-library-24.12/examples/graph_*
-   graph_alexnet graph_inception_v4 graph_resnext50 graph_vgg19
-   graph_deepspeech_v0_4_1 graph_lenet graph_shufflenet graph_vgg_vdsr
-   graph_edsr graph_mobilenet graph_squeezenet graph_yolov3
-   graph_googlenet graph_mobilenet_v2 graph_squeezenet_v1_1
-   graph_inception_resnet_v1 graph_resnet12 graph_srcnn955
-   graph_inception_resnet_v2 graph_resnet50 graph_ssd_mobilenet
-   graph_inception_v3 graph_resnet_v2_50 graph_vgg16
+   root@am62xx-evm:~# ls /usr/bin/arm-compute-library-52.7.0/examples/graph_*
+   graph_alexnet graph_lenet graph_squeezenet
+   graph_deepspeech_v0_4_1 graph_mobilenet graph_squeezenet_v1_1
+   graph_edsr graph_mobilenet_v2 graph_srcnn955
+   graph_googlenet graph_resnet12 graph_ssd_mobilenet
+   graph_inception_resnet_v1 graph_resnet50 graph_vgg16
+   graph_inception_resnet_v2 graph_resnet_v2_50 graph_vgg19
+   graph_inception_v3 graph_resnext50 graph_vgg_vdsr
+   graph_inception_v4 graph_shufflenet graph_yolov3
 
 .. code-block:: console
 
-   root@am62xx-evm:~# ls /usr/bin/arm-compute-library-24.12/examples/neon_*
+   root@am62xx-evm:~# ls /usr/bin/arm-compute-library-52.7.0/examples/neon_*
    neon_cnn neon_copy_objects neon_gemm_qasymm8 neon_gemm_s8_f32 neon_permute neon_scale neon_sgemm

source/linux/Foundational_Components/Machine_Learning/armnn.rst

Lines changed: 1 addition & 1 deletion
@@ -23,4 +23,4 @@ in conjunction with the TIDL TensorFlow Lite Delegate.
 Supported versions
 ------------------
 
-- Arm NN 24.11
+- Arm NN 26.01

source/linux/Foundational_Components/Machine_Learning/nnstreamer.rst

Lines changed: 2 additions & 2 deletions
@@ -12,15 +12,15 @@ https://nnstreamer.ai/
 Supported versions
 ------------------
 
-- NNStreamer 2.4.2
+- NNStreamer 2.6.0
 
 Testing NNStreamer
 ------------------
 
 .. code-block:: console
 
    root@am62xx-evm:~# nnstreamer-check
-   NNStreamer version: 2.4.2
+   NNStreamer version: 2.6.0
    loaded : TRUE
    path : /usr/lib/gstreamer-1.0/libnnstreamer.so
    ...

source/linux/Foundational_Components/Machine_Learning/onnxrt.rst

Lines changed: 27 additions & 27 deletions
@@ -18,7 +18,7 @@ https://onnxruntime.ai/
 Supported version
 -----------------
 
-- ONNX Runtime 1.20.1
+- ONNX Runtime 1.23.2
 
 ONNX Runtime test applications
 ------------------------------
@@ -34,7 +34,7 @@ Running benchmark_model
    usage: perf_test [options...] model_path [result_file]
    Options:
    -m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'.
-   Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
+   Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
    -M: Disable memory pattern.
    -A: Disable memory arena
    -I: Generate tensor input binding (Free dimensions are treated as 1.)
@@ -55,19 +55,19 @@ Running benchmark_model
    -o [optimization level]: Default is 99 (all). Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).
    Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
    -u [optimized_model_path]: Specify the optimized model path for saving.
-   -d [CUDA only][cudnn_conv_algorithm]: Specify CUDNN convolution algorithms: 0(benchmark), 1(heuristic), 2(default).
-   -q [CUDA only] use separate stream for copy.
+   -d [CUDA only][cudnn_conv_algorithm]: Specify CUDNN convolution algorithms: 0(benchmark), 1(heuristic), 2(default).
+   -q [CUDA only] use separate stream for copy.
    -z: Set denormal as zero. When turning on this option reduces latency dramatically, a model may have denormals.
-   -C: Specify session configuration entries as key-value pairs: -C "<key1>|<value1> <key2>|<value2>"
-   Refer to onnxruntime_session_options_config_keys.h for valid keys and values.
-   [Example] -C "session.disable_cpu_ep_fallback|1 ep.context_enable|1"
-   -i: Specify EP specific runtime options as key value pairs. Different runtime options available are:
+   -C: Specify session configuration entries as key-value pairs: -C "<key1>|<value1> <key2>|<value2>"
+   Refer to onnxruntime_session_options_config_keys.h for valid keys and values.
+   [Example] -C "session.disable_cpu_ep_fallback|1 ep.context_enable|1"
+   -i: Specify EP specific runtime options as key value pairs. Different runtime options available are:
    [Usage]: -e <provider_name> -i '<key1>|<value1> <key2>|<value2>'
 
-   [ACL only] [enable_fast_math]: Options: 'true', 'false', default: 'false',
+   [ACL only] [enable_fast_math]: Options: 'true', 'false', default: 'false',
 
    -T [Set intra op thread affinities]: Specify intra op thread affinity string
-   [Example]: -T 1,2;3,4;5,6 or -T 1-2;3-4;5-6
+   [Example]: -T 1,2;3,4;5,6 or -T 1-2;3-4;5-6
    Use semicolon to separate configuration between threads.
    E.g. 1,2;3,4;5,6 specifies affinities for three threads, the first thread will be attached to the first and second logical processor.
    The number of affinities must be equal to intra_op_num_threads - 1
@@ -84,22 +84,22 @@ Example of running *onnxruntime_perf_test* on target using the pre-installed mob
 .. code-block:: console
 
    # /usr/bin/onnxruntime-tests/onnxruntime_perf_test -I -m times -r 8 -e acl -P /usr/bin/onnxruntime-tests/testdata/mobilenet_v3_small_excerpt.onnx
-   Session creation time cost: 0.0273071 s
-   First inference time cost: 20 ms
-   Total inference time cost: 0.14188 s
+   Session creation time cost: 0.139671 s
+   First inference time cost: 15 ms
+   Total inference time cost: 0.126396 s
    Total inference requests: 8
-   Average inference time cost: 17.735 ms
-   Total inference run time: 0.141991 s
-   Number of inferences per second: 56.3415
-   Avg CPU usage: 98 %
-   Peak working set size: 35299328 bytes
-   Avg CPU usage:98
-   Peak working set size:35299328
+   Average inference time cost: 15.7995 ms
+   Total inference run time: 0.126518 s
+   Number of inferences per second: 63.232
+   Avg CPU usage: 100 %
+   Peak working set size: 37994496 bytes
+   Avg CPU usage:100
+   Peak working set size:37994496
    Runs:8
-   Min Latency: 0.0159831 s
-   Max Latency: 0.0232702 s
-   P50 Latency: 0.0167086 s
-   P90 Latency: 0.0232702 s
-   P95 Latency: 0.0232702 s
-   P99 Latency: 0.0232702 s
-   P999 Latency: 0.0232702 s
+   Min Latency: 0.00955697 s
+   Max Latency: 0.0239688 s
+   P50 Latency: 0.0156388 s
+   P90 Latency: 0.0239688 s
+   P95 Latency: 0.0239688 s
+   P99 Latency: 0.0239688 s
+   P999 Latency: 0.0239688 s
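The refreshed *onnxruntime_perf_test* summary is internally consistent, which is a useful sanity check when swapping in new benchmark output. A minimal sketch (the constants below are simply the totals copied from the run above) recomputes the derived fields the tool prints:

```python
# Recompute the summary lines of the onnxruntime_perf_test log above.
# The input values are copied from that run; the arithmetic is generic.
runs = 8
total_inference_time_s = 0.126396   # "Total inference time cost"
total_run_time_s = 0.126518         # "Total inference run time"

# Average inference time cost (ms) = total inference time / runs
avg_ms = total_inference_time_s / runs * 1000.0

# Number of inferences per second = runs / total run time
throughput = runs / total_run_time_s

print(f"Average inference time cost: {avg_ms:.4f} ms")
print(f"Number of inferences per second: {throughput:.3f}")
```

Both computed values match the log (15.7995 ms and 63.232 inferences/s), confirming the pasted numbers belong to the same run.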

source/linux/Foundational_Components/Machine_Learning/tflite.rst

Lines changed: 20 additions & 25 deletions
@@ -18,7 +18,7 @@ It supports on-device inference with low latency and a compact binary size. You
 Features
 ********
 
-- TensorFlow Lite v2.18.0 via Yocto - `meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.18.0.bb <https://web.git.yoctoproject.org/meta-arago/tree/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.18.0.bb?h=11.00.09>`__
+- TensorFlow Lite v2.20.0 via Yocto - `meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.20.0.bb <https://web.git.yoctoproject.org/meta-arago/tree/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.18.0.bb?h=11.00.09>`__
 - Multithreaded computation with acceleration using Arm Neon SIMD instructions on Cortex-A cores
 - C++ Library and Python interpreter (supported Python version 3)
 - TensorFlow Lite Model benchmark Tool (i.e. :command:`benchmark_model`)
@@ -89,23 +89,21 @@ The output of the benchmarking application should be similar to:
    root@am62xx-evm:~# /opt/tensorflow-lite/tools/benchmark_model --graph=/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite --num_threads=4 --use_xnnpack=false
    INFO: STARTING!
    INFO: Log parameter values verbosely: [0]
-   INFO: Num threads: [4]
    INFO: Graph: [/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite]
    INFO: Signature to run: []
-   INFO: #threads used for CPU inference: [4]
    INFO: Use xnnpack: [0]
    INFO: Loaded model /usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite
    INFO: The input model file size (MB): 67.3128
-   INFO: Initialized session in 6.418ms.
+   INFO: Initialized session in 5.579ms.
    INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
-   INFO: count=1 curr=1041765
+   INFO: count=1 curr=1357602 p5=1357602 median=1357602 p95=1357602
 
    INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
-   INFO: count=50 first=977738 curr=964908 min=911877 max=1112273 avg=971535 std=39112
+   INFO: count=50 first=1249964 curr=1240143 min=1238588 max=1252566 avg=1.24027e+06 std=2565 p5=1238753 median=1239807 p95=1247415
 
-   INFO: Inference timings in us: Init: 6418, First inference: 1041765, Warmup (avg): 1.04176e+06, Inference (avg): 971535
+   INFO: Inference timings in us: Init: 5579, First inference: 1357602, Warmup (avg): 1.3576e+06, Inference (avg): 1.24027e+06
    INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
-   INFO: Memory footprint delta from the start of the tool (MB): init=6.14844 overall=109.848
+   INFO: Memory footprint delta from the start of the tool (MB): init=6.36328 overall=109.832
 
 Where,
 
@@ -130,26 +128,23 @@ The output of the benchmarking application should be similar to,
    root@am62xx-evm:~# /opt/tensorflow-lite/tools/benchmark_model --graph=/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite --num_threads=4 --use_xnnpack=true
    INFO: STARTING!
    INFO: Log parameter values verbosely: [0]
-   INFO: Num threads: [4]
    INFO: Graph: [/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite]
    INFO: Signature to run: []
-   INFO: #threads used for CPU inference: [4]
    INFO: Use xnnpack: [1]
    INFO: Loaded model /usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    INFO: XNNPACK delegate created.
    INFO: Explicitly applied XNNPACK delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
    INFO: The input model file size (MB): 67.3128
-   INFO: Initialized session in 592.232ms.
+   INFO: Initialized session in 614.333ms.
    INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
-   INFO: count=1 curr=633430
-
+   INFO: count=1 curr=905463 p5=905463 median=905463 p95=905463
    INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
-   INFO: count=50 first=605745 curr=618849 min=568228 max=722188 avg=602943 std=27690
-
-   INFO: Inference timings in us: Init: 592232, First inference: 633430, Warmup (avg): 633430, Inference (avg): 602943
+   INFO: count=50 first=900416 curr=898333 min=898007 max=906121 avg=899641 std=1549 p5=898333 median=899281 p95=904305
+   INFO: Inference timings in us: Init: 614333, First inference: 905463, Warmup (avg): 905463, Inference (avg): 899641
    INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
-   INFO: Memory footprint delta from the start of the tool (MB): init=133.086 overall=149.531
+   INFO: Memory footprint delta from the start of the tool (MB): init=146.363 overall=150.141
+
 
 Where,
 
@@ -166,14 +161,14 @@ The following performance numbers are captured with :command:`benchmark_model` o
    :header: "SOC", "Delegates", "Inference Time (sec)", "Initialization Time (ms)", "Overall Memory Footprint (MB)"
    :widths: 10, 10, 20, 20, 20
 
-   "AM62X", "CPU only", "0.977168", "6.129", "110.07"
-   "", "XNNPACK", "0.613474", "593.558", "149.699"
-   "AM62PX", "CPU only", "0.419261", "4.79", "108.707"
-   "", "XNNPACK", "0.274756", "1208.04", "149.395"
-   "AM64X", "CPU only", "1.10675", "144.535", "109.562"
-   "", "XNNPACK", "0.702809", "601.33", "149.602"
-   "AM62L", "CPU only", "1.04867", "6.088", "110.129"
-   "", "XNNPACK", "0.661133", "466.216", "149.703"
+   "AM62X", "CPU only", "1.24027", "5.579", "109.832"
+   "", "XNNPACK", "0.899641", "614.333", "150.141"
+   "AM62PX", "CPU only", "1.23341", "252.390", "111.121"
+   "", "XNNPACK", "0.875280", "597.639", "150.52"
+   "AM64X", "CPU only", "1.26429", "135.579", "110.188"
+   "", "XNNPACK", "0.740743", "885.636", "150.484"
+   "AM62L", "CPU only", "1.3708", "807.076", "111.152"
+   "", "XNNPACK", "0.930577", "769.145", "150.496"
 
 Based on the above data, using the XNNPACK delegate significantly improves inference times across all SoCs, though it generally increases initialization time and overall memory footprint.
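As a quick check of that conclusion, the per-SoC speedup from enabling XNNPACK can be derived directly from the inference-time column of the updated table (a small sketch; the values below are copied from the refreshed rows):

```python
# XNNPACK speedup per SoC, from the refreshed benchmark table.
# Pairs are (CPU-only, XNNPACK) average inference times in seconds.
times = {
    "AM62X":  (1.24027, 0.899641),
    "AM62PX": (1.23341, 0.875280),
    "AM64X":  (1.26429, 0.740743),
    "AM62L":  (1.3708,  0.930577),
}

for soc, (cpu_s, xnn_s) in times.items():
    speedup = cpu_s / xnn_s
    print(f"{soc}: {speedup:.2f}x faster with XNNPACK")
```

On this data the speedup ranges from roughly 1.4x (AM62X, AM62PX, AM62L) to about 1.7x (AM64X), consistent with the statement above.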