
Commit 985f73e

feat(linux): Update ML library versions and documentation

This commit updates the ML library versions and documentation:

- Update ARM Compute Library from 24.12 to 52.7.0
- Update Arm NN from 24.11 to 26.01
- Update NNStreamer from 2.4.2 to 2.6.0
- Update ONNX Runtime from 1.20.1 to 1.23.2
- Update TensorFlow Lite from 2.18.0 to 2.20.0
- Refresh all test outputs and benchmark results
- Add ML components to AM62DX documentation TOC
- Update component table with latest library information

Signed-off-by: Pratham Deshmukh <p-deshmukh@ti.com>

1 parent 6448b30, commit 985f73e

File tree

8 files changed

+151
-100
lines changed


configs/AM62DX/AM62DX_linux_toc.txt

Lines changed: 6 additions & 0 deletions

@@ -62,6 +62,12 @@ linux/Foundational_Components/Kernel/Kernel_Drivers/UART
 linux/Foundational_Components/Kernel/Kernel_Drivers/UBIFS
 linux/Foundational_Components/Kernel/Kernel_Drivers/VTM
 linux/Foundational_Components/Kernel/Kernel_Drivers/Watchdog
+linux/Foundational_Components_Machine_Learning
+linux/Foundational_Components/Machine_Learning/arm_compute_library
+linux/Foundational_Components/Machine_Learning/armnn
+linux/Foundational_Components/Machine_Learning/nnstreamer
+linux/Foundational_Components/Machine_Learning/onnxrt
+linux/Foundational_Components/Machine_Learning/tflite
 
 #linux/Foundational_Components_Power_Management
(binary file, 91.6 KB; image preview not rendered)

source/linux/Foundational_Components/Machine_Learning/arm_compute_library.rst

Lines changed: 22 additions & 20 deletions

@@ -10,7 +10,7 @@ Exact list of functions can be found at https://www.arm.com/products/development
 Supported versions
 ------------------
 
-- ARM Compute Library 24.12
+- ARM Compute Library 52.7.0
 
 Arm Compute Library Testing
 ---------------------------
@@ -19,10 +19,10 @@ Arm Compute Libraries, tests, and sample executables are included in the SDK fil
 
 .. code-block:: console
 
-root@am62xx-evm:~# LD_LIBRARY_PATH=/usr/lib/tests/ /usr/lib/tests/arm_compute_validation
-Version = 32bcced2af7feea6969dd1d22e58d0718dc488e3
-CommandLine = /usr/lib/tests/arm_compute_validation
-Seed = 3778037091
+root@am62xx-evm:~# LD_LIBRARY_PATH=/usr/bin/arm-compute-library-52.7.0/tests/ /usr/bin/arm-compute-library-52.7.0/tests/arm_compute_validation
+Version = c9a1fff898abd5109b759e8e16616519dc758fdd
+CommandLine = /usr/bin/arm-compute-library-52.7.0/tests/arm_compute_validation
+Seed = 165977448
 cpu_has_sve = false
 cpu_has_sve2 = false
 cpu_has_svef32mm = false
@@ -34,22 +34,23 @@ Arm Compute Libraries, tests, and sample executables are included in the SDK fil
 cpu_has_bf16 = false
 cpu_has_dotprod = false
 cpu_has_i8mm = false
+cpu_has_fhm = false
 CPU0 = A53
 CPU1 = A53
 CPU2 = A53
 CPU3 = A53
 Iterations = 1
 Threads = 1
 Dataset mode = PRECOMMIT
-Running [0] 'UNIT/CPPScheduler/RethrowException'
-Wall clock/Wall clock time: AVG=3466.0000 us
+Running [0] 'UNIT/DataTypeUtils/CheckDataTypeIsPrinted@DataType=QSYMM8'
+Wall clock/Wall clock time: AVG=3.0000 us
 
 
 .. code-block:: console
 
-root@am62xx-evm:~# /usr/bin/arm-compute-library-24.12/examples/graph_alexnet
+root@am62xx-evm:~# /usr/bin/arm-compute-library-52.7.0/examples/graph_alexnet
 
-/usr/bin/arm-compute-library-24.12/examples/graph_alexnet
+/usr/bin/arm-compute-library-52.7.0/examples/graph_alexnet
 
 Threads : 1
 Target : Neon
@@ -58,8 +59,8 @@ Arm Compute Libraries, tests, and sample executables are included in the SDK fil
 Tuner enabled? : false
 Cache enabled? : false
 Tuner mode : Normal
-Tuner file :
-MLGO file :
+Tuner file :
+MLGO file :
 Fast math enabled? : false
 
 Test passed
@@ -69,16 +70,17 @@ Sample NN related executables (using Arm Compute Library only):
 
 .. code-block:: console
 
-root@am62xx-evm:~# ls /usr/bin/arm-compute-library-24.12/examples/graph_*
-graph_alexnet graph_inception_v4 graph_resnext50 graph_vgg19
-graph_deepspeech_v0_4_1 graph_lenet graph_shufflenet graph_vgg_vdsr
-graph_edsr graph_mobilenet graph_squeezenet graph_yolov3
-graph_googlenet graph_mobilenet_v2 graph_squeezenet_v1_1
-graph_inception_resnet_v1 graph_resnet12 graph_srcnn955
-graph_inception_resnet_v2 graph_resnet50 graph_ssd_mobilenet
-graph_inception_v3 graph_resnet_v2_50 graph_vgg16
+root@am62xx-evm:~# ls /usr/bin/arm-compute-library-52.7.0/examples/graph_*
+graph_alexnet graph_lenet graph_squeezenet
+graph_deepspeech_v0_4_1 graph_mobilenet graph_squeezenet_v1_1
+graph_edsr graph_mobilenet_v2 graph_srcnn955
+graph_googlenet graph_resnet12 graph_ssd_mobilenet
+graph_inception_resnet_v1 graph_resnet50 graph_vgg16
+graph_inception_resnet_v2 graph_resnet_v2_50 graph_vgg19
+graph_inception_v3 graph_resnext50 graph_vgg_vdsr
+graph_inception_v4 graph_shufflenet graph_yolov3
 
 .. code-block:: console
 
-root@am62xx-evm:~# ls /usr/bin/arm-compute-library-24.12/examples/neon_*
+root@am62xx-evm:~# ls /usr/bin/arm-compute-library-52.7.0/examples/neon_*
 neon_cnn neon_copy_objects neon_gemm_qasymm8 neon_gemm_s8_f32 neon_permute neon_scale neon_sgemm
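For readers scripting the validation run shown in this diff, here is a small hedged sketch that constructs the `LD_LIBRARY_PATH` environment and command line for the versioned install layout. The helper name `validation_command` is hypothetical; only the paths come from the console capture above.

```python
import os

# Versioned install layout taken from the console output in this diff;
# treat the version string as an assumption for other SDK releases.
ACL_VERSION = "52.7.0"

def validation_command(version: str = ACL_VERSION):
    """Build (env, argv) for running the Arm Compute validation suite
    from its versioned install directory, mirroring the command above."""
    tests_dir = f"/usr/bin/arm-compute-library-{version}/tests"
    env = dict(os.environ, LD_LIBRARY_PATH=tests_dir)
    argv = [f"{tests_dir}/arm_compute_validation"]
    return env, argv

env, argv = validation_command()
# On target, this pair could be handed to subprocess.run(argv, env=env).
print(argv[0])
```

On a host machine the paths will not exist, so this only constructs the invocation; run it on the EVM to actually execute the suite.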

source/linux/Foundational_Components/Machine_Learning/armnn.rst

Lines changed: 1 addition & 1 deletion

@@ -23,4 +23,4 @@ in conjunction with the TIDL TensorFlow Lite Delegate.
 Supported versions
 ------------------
 
-- Arm NN 24.11
+- Arm NN 26.01

source/linux/Foundational_Components/Machine_Learning/nnstreamer.rst

Lines changed: 2 additions & 2 deletions

@@ -12,15 +12,15 @@ https://nnstreamer.ai/
 Supported versions
 ------------------
 
-- NNStreamer 2.4.2
+- NNStreamer 2.6.0
 
 Testing NNStreamer
 ------------------
 
 .. code-block:: console
 
 root@am62xx-evm:~# nnstreamer-check
-NNStreamer version: 2.4.2
+NNStreamer version: 2.6.0
 loaded : TRUE
 path : /usr/lib/gstreamer-1.0/libnnstreamer.so
 ...
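For automated sanity checks of an image, the `nnstreamer-check` output can be parsed with a short script. This is a sketch; the function name and field choices are mine, and the sample text is the console output from this diff.

```python
import re

def parse_nnstreamer_check(output: str) -> dict:
    """Parse key fields from `nnstreamer-check` console output."""
    info = {}
    m = re.search(r"NNStreamer version:\s*([\d.]+)", output)
    if m:
        info["version"] = m.group(1)
    m = re.search(r"loaded\s*:\s*(TRUE|FALSE)", output)
    if m:
        info["loaded"] = m.group(1) == "TRUE"
    m = re.search(r"path\s*:\s*(\S+)", output)
    if m:
        info["path"] = m.group(1)
    return info

# Sample copied from the console capture in this diff.
sample = """\
NNStreamer version: 2.6.0
loaded : TRUE
path : /usr/lib/gstreamer-1.0/libnnstreamer.so
"""
print(parse_nnstreamer_check(sample))
```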

source/linux/Foundational_Components/Machine_Learning/onnxrt.rst

Lines changed: 27 additions & 27 deletions

@@ -18,7 +18,7 @@ https://onnxruntime.ai/
 Supported version
 -----------------
 
-- ONNX Runtime 1.20.1
+- ONNX Runtime 1.23.2
 
 ONNX Runtime test applications
 ------------------------------
@@ -34,7 +34,7 @@ Running benchmark_model
 usage: perf_test [options...] model_path [result_file]
 Options:
 -m [test_mode]: Specifies the test mode. Value could be 'duration' or 'times'.
-Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
+Provide 'duration' to run the test for a fix duration, and 'times' to repeated for a certain times.
 -M: Disable memory pattern.
 -A: Disable memory arena
 -I: Generate tensor input binding (Free dimensions are treated as 1.)
@@ -55,19 +55,19 @@ Running benchmark_model
 -o [optimization level]: Default is 99 (all). Valid values are 0 (disable), 1 (basic), 2 (extended), 99 (all).
 Please see onnxruntime_c_api.h (enum GraphOptimizationLevel) for the full list of all optimization levels.
 -u [optimized_model_path]: Specify the optimized model path for saving.
--d [CUDA only][cudnn_conv_algorithm]: Specify CUDNN convolution algorithms: 0(benchmark), 1(heuristic), 2(default).
--q [CUDA only] use separate stream for copy.
+-d [CUDA only][cudnn_conv_algorithm]: Specify CUDNN convolution algorithms: 0(benchmark), 1(heuristic), 2(default).
+-q [CUDA only] use separate stream for copy.
 -z: Set denormal as zero. When turning on this option reduces latency dramatically, a model may have denormals.
--C: Specify session configuration entries as key-value pairs: -C "<key1>|<value1> <key2>|<value2>"
-Refer to onnxruntime_session_options_config_keys.h for valid keys and values.
-[Example] -C "session.disable_cpu_ep_fallback|1 ep.context_enable|1"
--i: Specify EP specific runtime options as key value pairs. Different runtime options available are:
+-C: Specify session configuration entries as key-value pairs: -C "<key1>|<value1> <key2>|<value2>"
+Refer to onnxruntime_session_options_config_keys.h for valid keys and values.
+[Example] -C "session.disable_cpu_ep_fallback|1 ep.context_enable|1"
+-i: Specify EP specific runtime options as key value pairs. Different runtime options available are:
 [Usage]: -e <provider_name> -i '<key1>|<value1> <key2>|<value2>'
 
-[ACL only] [enable_fast_math]: Options: 'true', 'false', default: 'false',
+[ACL only] [enable_fast_math]: Options: 'true', 'false', default: 'false',
 
 -T [Set intra op thread affinities]: Specify intra op thread affinity string
-[Example]: -T 1,2;3,4;5,6 or -T 1-2;3-4;5-6
+[Example]: -T 1,2;3,4;5,6 or -T 1-2;3-4;5-6
 Use semicolon to separate configuration between threads.
 E.g. 1,2;3,4;5,6 specifies affinities for three threads, the first thread will be attached to the first and second logical processor.
 The number of affinities must be equal to intra_op_num_threads - 1
@@ -84,22 +84,22 @@ Example of running *onnxruntime_perf_test* on target using the pre-installed mob
 .. code-block:: console
 
 # /usr/bin/onnxruntime-tests/onnxruntime_perf_test -I -m times -r 8 -e acl -P /usr/bin/onnxruntime-tests/testdata/mobilenet_v3_small_excerpt.onnx
-Session creation time cost: 0.0273071 s
-First inference time cost: 20 ms
-Total inference time cost: 0.14188 s
+Session creation time cost: 0.139671 s
+First inference time cost: 15 ms
+Total inference time cost: 0.126396 s
 Total inference requests: 8
-Average inference time cost: 17.735 ms
-Total inference run time: 0.141991 s
-Number of inferences per second: 56.3415
-Avg CPU usage: 98 %
-Peak working set size: 35299328 bytes
-Avg CPU usage:98
-Peak working set size:35299328
+Average inference time cost: 15.7995 ms
+Total inference run time: 0.126518 s
+Number of inferences per second: 63.232
+Avg CPU usage: 100 %
+Peak working set size: 37994496 bytes
+Avg CPU usage:100
+Peak working set size:37994496
 Runs:8
-Min Latency: 0.0159831 s
-Max Latency: 0.0232702 s
-P50 Latency: 0.0167086 s
-P90 Latency: 0.0232702 s
-P95 Latency: 0.0232702 s
-P99 Latency: 0.0232702 s
-P999 Latency: 0.0232702 s
+Min Latency: 0.00955697 s
+Max Latency: 0.0239688 s
+P50 Latency: 0.0156388 s
+P90 Latency: 0.0239688 s
+P95 Latency: 0.0239688 s
+P99 Latency: 0.0239688 s
+P999 Latency: 0.0239688 s
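To compare runs such as the before/after numbers in this diff, the statistics can be pulled out of the `onnxruntime_perf_test` output with a short script. This is a sketch under the assumption that the format matches the console capture above; `parse_perf_test` is a hypothetical helper, not part of ONNX Runtime.

```python
import re

def parse_perf_test(output: str) -> dict:
    """Extract latency statistics from onnxruntime_perf_test output."""
    stats = {}
    m = re.search(r"Average inference time cost:\s*([\d.]+)\s*ms", output)
    if m:
        stats["avg_ms"] = float(m.group(1))
    m = re.search(r"Number of inferences per second:\s*([\d.]+)", output)
    if m:
        stats["ips"] = float(m.group(1))
    for pct in ("P50", "P90", "P95", "P99"):
        m = re.search(rf"{pct} Latency:\s*([\d.]+)\s*s", output)
        if m:
            stats[pct.lower()] = float(m.group(1))
    return stats

# Sample lines copied from the updated console capture in this diff.
sample = """\
Average inference time cost: 15.7995 ms
Number of inferences per second: 63.232
P50 Latency: 0.0156388 s
P90 Latency: 0.0239688 s
P95 Latency: 0.0239688 s
P99 Latency: 0.0239688 s
"""
print(parse_perf_test(sample))
```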

source/linux/Foundational_Components/Machine_Learning/tflite.rst

Lines changed: 66 additions & 28 deletions

@@ -18,7 +18,7 @@ It supports on-device inference with low latency and a compact binary size. You
 Features
 ********
 
-- TensorFlow Lite v2.18.0 via Yocto - `meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.18.0.bb <https://web.git.yoctoproject.org/meta-arago/tree/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.18.0.bb?h=11.00.09>`__
+- TensorFlow Lite v2.20.0 via Yocto - `meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.20.0.bb <https://web.git.yoctoproject.org/meta-arago/tree/meta-arago-extras/recipes-framework/tensorflow-lite/tensorflow-lite_2.18.0.bb?h=11.00.09>`__
 - Multithreaded computation with acceleration using Arm Neon SIMD instructions on Cortex-A cores
 - C++ Library and Python interpreter (supported Python version 3)
 - TensorFlow Lite Model benchmark Tool (i.e. :command:`benchmark_model`)
@@ -89,23 +89,21 @@ The output of the benchmarking application should be similar to:
 root@am62xx-evm:~# /opt/tensorflow-lite/tools/benchmark_model --graph=/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite --num_threads=4 --use_xnnpack=false
 INFO: STARTING!
 INFO: Log parameter values verbosely: [0]
-INFO: Num threads: [4]
 INFO: Graph: [/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite]
 INFO: Signature to run: []
-INFO: #threads used for CPU inference: [4]
 INFO: Use xnnpack: [0]
 INFO: Loaded model /usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite
 INFO: The input model file size (MB): 67.3128
-INFO: Initialized session in 6.418ms.
+INFO: Initialized session in 5.579ms.
 INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
-INFO: count=1 curr=1041765
+INFO: count=1 curr=1357602 p5=1357602 median=1357602 p95=1357602
 
 INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
-INFO: count=50 first=977738 curr=964908 min=911877 max=1112273 avg=971535 std=39112
+INFO: count=50 first=1249964 curr=1240143 min=1238588 max=1252566 avg=1.24027e+06 std=2565 p5=1238753 median=1239807 p95=1247415
 
-INFO: Inference timings in us: Init: 6418, First inference: 1041765, Warmup (avg): 1.04176e+06, Inference (avg): 971535
+INFO: Inference timings in us: Init: 5579, First inference: 1357602, Warmup (avg): 1.3576e+06, Inference (avg): 1.24027e+06
 INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
-INFO: Memory footprint delta from the start of the tool (MB): init=6.14844 overall=109.848
+INFO: Memory footprint delta from the start of the tool (MB): init=6.36328 overall=109.832
 
 Where,

@@ -130,26 +128,23 @@ The output of the benchmarking application should be similar to,
 root@am62xx-evm:~# /opt/tensorflow-lite/tools/benchmark_model --graph=/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite --num_threads=4 --use_xnnpack=true
 INFO: STARTING!
 INFO: Log parameter values verbosely: [0]
-INFO: Num threads: [4]
 INFO: Graph: [/usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite]
 INFO: Signature to run: []
-INFO: #threads used for CPU inference: [4]
 INFO: Use xnnpack: [1]
 INFO: Loaded model /usr/share/oob-demo-assets/models/ssd_mobilenet_v2_coco.tflite
 INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
 INFO: XNNPACK delegate created.
 INFO: Explicitly applied XNNPACK delegate, and the model graph will be partially executed by the delegate w/ 1 delegate kernels.
 INFO: The input model file size (MB): 67.3128
-INFO: Initialized session in 592.232ms.
+INFO: Initialized session in 614.333ms.
 INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
-INFO: count=1 curr=633430
-
+INFO: count=1 curr=905463 p5=905463 median=905463 p95=905463
 INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
-INFO: count=50 first=605745 curr=618849 min=568228 max=722188 avg=602943 std=27690
-
-INFO: Inference timings in us: Init: 592232, First inference: 633430, Warmup (avg): 633430, Inference (avg): 602943
+INFO: count=50 first=900416 curr=898333 min=898007 max=906121 avg=899641 std=1549 p5=898333 median=899281 p95=904305
+INFO: Inference timings in us: Init: 614333, First inference: 905463, Warmup (avg): 905463, Inference (avg): 899641
 INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
-INFO: Memory footprint delta from the start of the tool (MB): init=133.086 overall=149.531
+INFO: Memory footprint delta from the start of the tool (MB): init=146.363 overall=150.141
+
 
 Where,
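The `count=50` summary lines above report times in microseconds. As a hedged convenience (the helper name is mine, and the regex assumes the key=value format shown in these captures), the line can be parsed and converted to seconds:

```python
import re

def parse_benchmark_line(line: str) -> dict:
    """Parse a benchmark_model summary line; values are microseconds."""
    return {k: float(v) for k, v in re.findall(r"(\w+)=([\d.eE+]+)", line)}

# Sample taken from the XNNPACK run in this diff.
line = ("count=50 first=900416 curr=898333 min=898007 max=906121 "
        "avg=899641 std=1549 p5=898333 median=899281 p95=904305")
stats = parse_benchmark_line(line)
print(stats["avg"] / 1e6)  # average inference time in seconds
```

The same parser handles the scientific-notation values (e.g. `avg=1.24027e+06`) in the CPU-only run.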

@@ -166,14 +161,14 @@ The following performance numbers are captured with :command:`benchmark_model` o
 :header: "SOC", "Delegates", "Inference Time (sec)", "Initialization Time (ms)", "Overall Memory Footprint (MB)"
 :widths: 10, 10, 20, 20, 20
 
-"AM62X", "CPU only", "0.977168", "6.129", "110.07"
-"", "XNNPACK", "0.613474", "593.558", "149.699"
-"AM62PX", "CPU only", "0.419261", "4.79", "108.707"
-"", "XNNPACK", "0.274756", "1208.04", "149.395"
-"AM64X", "CPU only", "1.10675", "144.535", "109.562"
-"", "XNNPACK", "0.702809", "601.33", "149.602"
-"AM62L", "CPU only", "1.04867", "6.088", "110.129"
-"", "XNNPACK", "0.661133", "466.216", "149.703"
+"AM62X", "CPU only", "1.24027", "5.579", "109.832"
+"", "XNNPACK", "0.899641", "614.333", "150.141"
+"AM62PX", "CPU only", "1.23341", "252.390", "111.121"
+"", "XNNPACK", "0.875280", "597.639", "150.52"
+"AM64X", "CPU only", "1.26429", "135.579", "110.188"
+"", "XNNPACK", "0.740743", "885.636", "150.484"
+"AM62L", "CPU only", "1.3708", "807.076", "111.152"
+"", "XNNPACK", "0.930577", "769.145", "150.496"
 
 Based on the above data, using the XNNPACK delegate significantly improves inference times across all SoCs, though it generally increases initialization time and overall memory footprint.
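The claim that XNNPACK improves inference times on every SoC can be checked directly from the refreshed table. This snippet recomputes the speedups; the values are copied from the table in this diff, while the dictionary layout is mine.

```python
# Inference times in seconds from the updated benchmark table above.
inference_s = {
    "AM62X":  {"cpu": 1.24027, "xnnpack": 0.899641},
    "AM62PX": {"cpu": 1.23341, "xnnpack": 0.875280},
    "AM64X":  {"cpu": 1.26429, "xnnpack": 0.740743},
    "AM62L":  {"cpu": 1.3708,  "xnnpack": 0.930577},
}

# Speedup factor of the XNNPACK delegate over CPU-only inference.
speedup = {soc: t["cpu"] / t["xnnpack"] for soc, t in inference_s.items()}
for soc, s in sorted(speedup.items(), key=lambda kv: -kv[1]):
    print(f"{soc}: {s:.2f}x")
```

With these numbers, every SoC sees a speedup above 1.3x, with AM64X benefiting the most.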

@@ -185,10 +180,12 @@ Based on the above data, using the XNNPACK delegate significantly improves infer
 Example Applications
 ********************
 
-|__SDK_FULL_NAME__| has integrated opensource components like NNStreamer which can be used for neural network inferencing using the sample tflite models under :file:`/usr/share/oob-demo-assets/models/`
-Checkout the Object Detection usecase under :ref:`TI Apps Launcher - User Guide <TI-Apps-Launcher-User-Guide-label>`
+.. ifconfig:: CONFIG_part_variant in ('AM62X', 'AM62LX', 'AM62PX')
 
-Alternatively, if a display is connected, you can run the Object Detection pipeline using this command,
+|__SDK_FULL_NAME__| has integrated opensource components like NNStreamer which can be used for neural network inferencing using the sample tflite models under :file:`/usr/share/oob-demo-assets/models/`
+Checkout the Object Detection usecase under :ref:`TI Apps Launcher - User Guide <TI-Apps-Launcher-User-Guide-label>`
+
+Alternatively, if a display is connected, you can run the Object Detection pipeline using this command,
 
 .. ifconfig:: CONFIG_part_variant in ('AM62X', 'AM62LX')

@@ -248,6 +245,47 @@ Alternatively, if a display is connected, you can run the Object Detection pipel
 The above GStreamer pipeline reads an H.264 video file, decodes it, and processes it for object detection using a TensorFlow Lite model, displaying bounding boxes around detected objects. The processed video is then composited and rendered on the screen using the ``kmssink`` element.
 
+.. ifconfig:: CONFIG_part_variant in ('AM62DX')
+
+|__SDK_FULL_NAME__| has integrated opensource components like NNStreamer which can be used for neural network inferencing using the sample TensorFlow Lite models under :file:`/usr/share/oob-demo-assets/models/`
+
+If an audio input device is connected, you can run the Audio Classification pipeline using this command:
+
+.. code-block:: console
+
+gst-launch-1.0 \
+alsasrc ! \
+audioconvert ! \
+audioresample ! \
+audio/x-raw,format=S16LE,channels=1,rate=16000,layout=interleaved ! \
+tensor_converter frames-per-tensor=3900 ! \
+tensor_aggregator \
+frames-in=3900 \
+frames-out=15600 \
+frames-flush=3900 \
+frames-dim=1 ! \
+tensor_transform \
+mode=arithmetic \
+option=typecast:float32,add:0.5,div:32767.5 ! \
+tensor_transform \
+mode=transpose \
+option=1:0:2:3 ! \
+queue \
+leaky=2 \
+max-size-buffers=10 ! \
+tensor_filter \
+framework=tensorflow2-lite \
+model=/usr/share/oob-demo-assets/models/yamnet_audio_classification.tflite \
+custom=Delegate:XNNPACK,NumThreads:2 ! \
+tensor_decoder \
+mode=image_labeling \
+option1=/usr/share/oob-demo-assets/labels/yamnet_label_list.txt ! \
+filesink \
+buffer-mode=2 \
+location=/dev/stdout
+
+The above GStreamer pipeline captures real-time audio from an ALSA source, converts it to the required format, and processes it for audio event classification using the YAMNet TensorFlow Lite model. The audio data is aggregated into tensors, normalized for machine learning input, and classified to identify various audio events and sounds. The classification results are decoded to human-readable labels and output to stdout.
+
 .. attention::
 
 The Example Applications section is not applicable for AM64x
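The `tensor_aggregator` numbers in the new AM62DX audio pipeline encode a sliding-window framing for YAMNet. As a sketch, the constants below are copied from the pipeline, while the derived quantities are my arithmetic rather than part of the diff:

```python
RATE = 16000          # Hz, from the caps filter in the pipeline above
FRAMES_IN = 3900      # tensor_converter frames-per-tensor / frames-in
FRAMES_OUT = 15600    # tensor_aggregator frames-out
FRAMES_FLUSH = 3900   # tensor_aggregator frames-flush

window_s = FRAMES_OUT / RATE              # analysis window handed to the model
hop_s = FRAMES_FLUSH / RATE               # stride between successive windows
overlap = 1 - FRAMES_FLUSH / FRAMES_OUT   # fraction of each window reused

print(f"window={window_s}s hop={hop_s}s overlap={overlap:.0%}")
```

So each inference sees 0.975 s of audio, a new classification is produced every 0.24375 s, and consecutive windows overlap by 75%.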
