Demo applications DX-RT 3.3.2

DEEPX provides demo applications through the dx-app repository. These applications are included in the pre-built images provided by Embedded Artists, and can also be built into custom Yocto images by including the meta-ea-dx Yocto layer.

How the applications are run depends on the DX-RT version. To check which version is in use, run dxrt-cli -s on the target. The DXRT version is shown on the first line of the output. For 3.3.2 it looks like this:

DXRT v3.3.2
=======================================================
 * Device 0: M1, Accelerator type
---------------------   Version   ---------------------

This page is only for version 3.3.2 and later. For older versions visit Demo applications up to DX-RT 3.1.0 or Demo applications DX-RT 3.2.0.
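
The version check can also be scripted. The following is a minimal sketch, assuming that dxrt-cli -s writes its status to standard output and that the first line has the form shown above. The helper names dxrt_version and version_at_least are hypothetical and not part of DX-RT.

```shell
#!/bin/sh
# Hypothetical helpers; not part of DX-RT.

# Read status text on stdin and print the version from a first line
# such as "DXRT v3.3.2". Prints nothing if the line looks different.
dxrt_version() {
    sed -n '1s/^DXRT v\([0-9][0-9.]*\).*$/\1/p'
}

# Succeed when version $1 is equal to or newer than version $2.
# Both are compared field by field as major.minor.patch.
version_at_least() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, "."); split($2, b, ".")
        for (i = 1; i <= 3; i++) {
            if (a[i] + 0 > b[i] + 0) exit 0
            if (a[i] + 0 < b[i] + 0) exit 1
        }
        exit 0
    }'
}

# Usage on the target:
#   v=$(dxrt-cli -s | dxrt_version)
#   if version_at_least "$v" 3.3.2; then echo "this page applies"; fi
```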

All examples on this page were executed on RZ/G3E hardware running an official build available on sw.embeddedartists.com. Note that the official builds do not include the Renesas proprietary graphics and codec drivers, so the performance figures below are quite low. See the instructions on the wiki, specifically Add Proprietary Packages, for information about adding this support.

Models and Resources

To run the demo applications, compiled DXNN models as well as additional resources - such as sample videos and images - are required. The AI kits have the models and resources already installed, but your own build will not. Follow the steps below to install the resources.

Models

DXNN models have to be downloaded from DeepX Model Zoo one by one.

This is the procedure to download Ultralytics YOLOv8-n-ppu, but the same steps can be used for any model:

Go to DeepX Model Zoo
Locate an interesting model
Go to the DXNN column (you will get 'Q-Lite DXNN' tooltip when you are over the correct icon)
Right-click the download symbol and select 'Copy link address'

In Linux on the RZ/G3E SOM, run this command to download the model:

mkdir -p assets/models
wget -P assets/models https://sdk.deepx.ai/modelzoo/dxnn/2_3_0/YoloV8N_PPU.dxnn

Repeat the steps for all the models that you want to evaluate.
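
When several models are wanted, the copied links can be collected in a text file and downloaded in a loop. This is only a sketch: the file name model_urls.txt and the helper names model_path and fetch_models are hypothetical, and the list is assumed to hold one URL per line.

```shell
#!/bin/sh
# Hypothetical helpers for downloading a list of model URLs.

# Print the local path a model URL would be saved to.
# $1 = destination directory, $2 = URL
model_path() {
    printf '%s/%s\n' "${1%/}" "${2##*/}"
}

# Download every URL read from stdin into directory $1
# (default: assets/models), skipping files that already exist.
fetch_models() {
    dest=${1:-assets/models}
    mkdir -p "$dest"
    while IFS= read -r url; do
        # Ignore blank lines and lines starting with '#'.
        case $url in ''|'#'*) continue ;; esac
        target=$(model_path "$dest" "$url")
        if [ -e "$target" ]; then
            echo "skip: $target"
        else
            wget -P "$dest" "$url"
        fi
    done
}

# Usage (model_urls.txt holds links copied from the Model Zoo):
#   fetch_models assets/models < model_urls.txt
```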

Download sample videos

Sample video files are downloaded from the DEEPX resource directory and unpacked into the directory: assets/videos.

wget https://sdk.deepx.ai/res/video/sample_videos.tar.gz
mkdir -p assets/videos
tar -xvf sample_videos.tar.gz -C assets/videos/

Download sample images

Sample image files are downloaded from the dx-app GitHub repository and stored in the directory: assets/images.

mkdir -p assets/images
wget -P assets/images/ https://raw.githubusercontent.com/DEEPX-AI/dx_app/refs/heads/main/sample/ILSVRC2012/1.jpeg
wget -P assets/images/ https://raw.githubusercontent.com/DEEPX-AI/dx_app/refs/heads/main/sample/img/7.jpg
wget -P assets/images/ https://raw.githubusercontent.com/DEEPX-AI/dx_app/refs/heads/main/sample/img/8.jpg
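
Before running a demo it can help to confirm that the downloaded files are where the command line expects them. The sketch below uses a hypothetical helper, missing_assets, that reports every expected file that is absent below a base directory.

```shell
#!/bin/sh
# Hypothetical helper; prints each expected file that is missing.
# $1 = base directory, remaining arguments = paths relative to it.
# Returns non-zero when at least one file is missing.
missing_assets() {
    base=$1
    shift
    status=0
    for rel in "$@"; do
        if [ ! -f "$base/$rel" ]; then
            echo "missing: $base/$rel"
            status=1
        fi
    done
    return "$status"
}

# Usage with the files fetched by the commands above:
#   missing_assets assets models/YoloV8N_PPU.dxnn \
#       images/1.jpeg images/7.jpg images/8.jpg
```

Note that wget -P assets/images/ saves each file directly in assets/images/ and does not recreate the subdirectories of the repository.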

About Examples

In version 3.3.2 of DX-RT there was an overhaul of the demo applications.

As a result, there are now more than 500 applications (all in /usr/bin/). Common names are:

3ddfa_v2_mobilnet*_*sync
alexnet*_*sync
arcface_iresnet*_*sync
bisenet*_*sync
densenet*_*sync
mobilenet*_*sync
resnet*_*sync
scrfd*_*sync
yolov*_*sync
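
To see which of these applications are present in a given image, the name patterns above can be turned into a directory listing. This is a sketch with a hypothetical helper, list_demo_apps. It matches on the _sync and _async suffixes only, so it may also list unrelated executables that happen to share those suffixes.

```shell
#!/bin/sh
# Hypothetical helper; lists executables whose names end in
# _sync or _async.
# $1 = directory (default /usr/bin), $2 = optional name prefix.
list_demo_apps() {
    dir=${1:-/usr/bin}
    prefix=${2:-}
    for f in "$dir/$prefix"*_sync "$dir/$prefix"*_async; do
        # Skip unmatched glob patterns and non-executable files.
        if [ -f "$f" ] && [ -x "$f" ]; then
            printf '%s\n' "${f##*/}"
        fi
    done | sort
}

# Usage:
#   list_demo_apps                   # everything in /usr/bin
#   list_demo_apps /usr/bin yolov    # only names starting with yolov
```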

Note on closing GUI applications

When running the examples below and passing in an image file (i.e. via the -i parameter) the graphical window that shows the result is difficult to close.

If a mouse is used to click the X in the upper right corner, the window goes away but the program keeps running, so the terminal appears to hang.
Using Ctrl-C in the terminal does not work.

These are two ways to solve the issue:

Use a USB keyboard and press the 'Q' key to close the application. This is the preferred way as the program exits nicely and will print all the statistics from the run.
In the terminal that launched the example (assuming the running program is efficientnetb4_async; otherwise change the pgrep parameter):
- press Ctrl-Z to get back to the terminal
- run pgrep efficientnetb4_async and it will show the process number, e.g. 523
- run kill -9 523

This is only an issue with images. If the input is camera or video then the programs close correctly.

Image classification

Image classification is a computer vision task in which a neural network analyzes an image and assigns it a single label from a predefined set of categories. The image classification demo demonstrates this functionality by running a trained model on a static image.

Run the following command to classify an image:

efficientnetb4_async -m assets/models/EfficientNetB4.dxnn -i assets/images/ILSVRC2012/1.jpeg -l 1

[INFO] Model loaded: assets/models/EfficientNetB4.dxnn
[INFO] Model input size (WxH): 380x380

[INFO] Starting async inference...

==================================================
               PERFORMANCE SUMMARY
==================================================
 Pipeline Step   Avg Latency     Throughput
--------------------------------------------------
 Read               7.92 ms      126.2 FPS
 Preprocess         4.42 ms      226.2 FPS
 Inference          0.00 ms      102.3 FPS*
 Postprocess        0.03 ms     37616.6 FPS
 Render             3.98 ms      251.5 FPS
 Display          218.23 ms        4.6 FPS
--------------------------------------------------
 * Async: turnaround latency (submit to callback)
   Throughput measured independently
--------------------------------------------------
 Infer Completed     :    1
 Infer Inflight Avg  :    0.0
 Infer Inflight Max  :      1
--------------------------------------------------
 Total Frames        :    1
 Total Time          :    8.3 s
 Overall FPS         :   0.1 FPS
==================================================

The program (as of version 3.3.2) no longer prints any information about the result of the classification in the terminal. Instead, the information is overlaid on top of the image in the window.

Object detection

This section describes how to run object detection demos using YOLO-based models. Object detection is a computer vision task in which a neural network identifies and localizes multiple objects within an image or video by drawing bounding boxes around detected objects and assigning class labels.

Run the following command to perform object detection on a video stream. When a display is connected, the video output will be shown with bounding boxes overlaid on the detected objects.

yolov7_ppu_async -m assets/models/YoloV7_PPU.dxnn -v assets/videos/snowboard.mp4

When the demo finishes, a large amount of output will be printed to the console, including performance-related information. One key metric is the frames per second (fps) value, which indicates the runtime performance of the object detection pipeline. An example output is shown below.

[INFO] Model loaded: assets/models/YoloV7_PPU.dxnn
[INFO] Model input size (WxH): 640x640


[INFO] Starting async inference...

==================================================
               PERFORMANCE SUMMARY
==================================================
 Pipeline Step   Avg Latency     Throughput
--------------------------------------------------
 Read              21.96 ms       45.5 FPS
 Preprocess        11.10 ms       90.1 FPS
 Inference         22.60 ms       18.4 FPS*
 Postprocess        0.21 ms     4823.5 FPS
 Render            10.46 ms       95.6 FPS
 Display           11.16 ms       89.6 FPS
--------------------------------------------------
 * Async: turnaround latency (submit to callback)
   Throughput measured independently
--------------------------------------------------
 Infer Completed     :    855
 Infer Inflight Avg  :    0.0
 Infer Inflight Max  :      1
--------------------------------------------------
 Total Frames        :    855
 Total Time          :    46.7 s
 Overall FPS         :   18.3 FPS
==================================================
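
The Overall FPS value can be cross-checked from the two lines above it, since 855 frames in 46.7 s is about 18.3 frames per second. The sketch below assumes the console output has been saved to a file. The name run.log and the helper recompute_fps are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; reads a PERFORMANCE SUMMARY on stdin and
# recomputes frames per second from "Total Frames" and "Total Time".
recompute_fps() {
    awk -F: '
        /Total Frames/ { frames = $2 + 0 }
        /Total Time/   { seconds = $2 + 0 }   # "46.7 s" converts to 46.7
        END {
            if (seconds > 0) printf "%.1f\n", frames / seconds
            else exit 1
        }'
}

# Usage, assuming the summary was captured in run.log:
#   recompute_fps < run.log
```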

Pose estimation

This section describes how to run the pose estimation demo using the YOLOv5 Pose model. Pose estimation is a computer vision task used to identify and track key anatomical landmarks - such as joints or other keypoints - of a person or object within an image or video.

Run the following command to perform pose estimation on a video stream. When a display is connected, the video output will be shown with bounding boxes and lines overlaid on the detected poses.

yolov5pose_ppu_async -m assets/models/YOLOV5Pose_PPU.dxnn -v assets/videos/dance-group.mov

When the demo finishes, a large amount of output will be printed to the console, including performance-related information. One key metric is the frames per second (fps) value, which indicates the runtime performance of the pose estimation pipeline. An example output is shown below.

[INFO] Model loaded: assets/models/YOLOV5Pose_PPU.dxnn
[INFO] Model input size (WxH): 640x640

[INFO] Starting async inference...

==================================================
               PERFORMANCE SUMMARY
==================================================
 Pipeline Step   Avg Latency     Throughput
--------------------------------------------------
 Read              27.72 ms       36.1 FPS
 Preprocess        10.44 ms       95.8 FPS
 Inference         11.81 ms       15.7 FPS*
 Postprocess        0.21 ms     4771.1 FPS
 Render            22.63 ms       44.2 FPS
 Display           12.55 ms       79.7 FPS
--------------------------------------------------
 * Async: turnaround latency (submit to callback)
   Throughput measured independently
--------------------------------------------------
 Infer Completed     :    478
 Infer Inflight Avg  :    0.0
 Infer Inflight Max  :      1
--------------------------------------------------
 Total Frames        :    478
 Total Time          :    30.7 s
 Overall FPS         :   15.6 FPS
==================================================
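
In the two video runs shown on this page, the Overall FPS is close to the throughput of the slowest pipeline step (Inference, 15.7 FPS here against 15.6 FPS overall). The sketch below picks that step out of a saved summary. The helper slowest_stage is hypothetical, and the observation is taken from these example runs only, so it is not a general rule.

```shell
#!/bin/sh
# Hypothetical helper; reads a PERFORMANCE SUMMARY on stdin and prints
# the pipeline step with the lowest throughput, followed by its FPS.
slowest_stage() {
    awk '
        # Step rows look like: "Read  27.72 ms  36.1 FPS"
        $3 == "ms" && $5 ~ /^FPS/ {
            if (name == "" || $4 + 0 < min) { name = $1; min = $4 + 0 }
        }
        END {
            if (name != "") printf "%s %.1f\n", name, min
            else exit 1
        }'
}

# Usage, assuming the summary was captured in run.log:
#   slowest_stage < run.log
```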

Segmentation

This section describes how to run the semantic segmentation demo using the DeepLabV3Plus model. Semantic segmentation is a computer vision task that assigns a class label to each pixel in an image, effectively partitioning the image into meaningful regions. This technique is commonly used for applications such as scene understanding and image analysis.

Run the following command to perform segmentation on an image:

deeplabv3mobilenetv2_async -m assets/models/DeepLabV3PlusMobileNetV2.dxnn -i assets/images/img/sample_crowd.jpg

A window will pop up showing the segmentation as differently colored areas on the image.

[INFO] Model loaded: assets/models/DeepLabV3PlusMobileNetV2.dxnn
[INFO] Model input size (WxH): 512x512

[INFO] Starting async inference...

==================================================
               PERFORMANCE SUMMARY
==================================================
 Pipeline Step   Avg Latency     Throughput
--------------------------------------------------
 Read              54.60 ms       18.3 FPS
 Preprocess        12.22 ms       81.8 FPS
 Inference         12.35 ms       48.4 FPS*
 Postprocess        8.33 ms      120.1 FPS
 Render            36.71 ms       27.2 FPS
 Display          213.83 ms        4.7 FPS
--------------------------------------------------
 * Async: turnaround latency (submit to callback)
   Throughput measured independently
--------------------------------------------------
 Infer Completed     :    1
 Infer Inflight Avg  :    0.0
 Infer Inflight Max  :      1
--------------------------------------------------
 Total Frames        :    1
 Total Time          :    7.8 s
 Overall FPS         :   0.1 FPS
==================================================

Benchmark

A benchmark tool named dxbenchmark is provided as part of the dx-rt repository. This tool can be used to quickly evaluate the inference performance of different models on the DX-M1 accelerator.

Run the following command to benchmark all models located in the assets/models directory. In this example, 100 inference loops are executed for each model:

dxbenchmark --dir assets/models/ -l 100

The program will print a summary when finished:

  -------------------------------------------------------------------------------
  | Device 0                                                                    |
  |            Name                |  min (us)    |  max (us)    | average (us) |
  -------------------------------------------------------------------------------
  |                    Buffer Wait |            4 |        29499 |      8148.47 |
  |       NPU Input Format Handler |            3 |           85 |      9.37273 |
  |                     PCIe Write |          976 |         3474 |      1514.77 |
  |                       NPU Core |        16377 |        29195 |      27147.6 |
  |                      PCIe Read |            3 |            8 |      4.07273 |
  |                       NPU Task |        17544 |        58607 |      52121.9 |
  -------------------------------------------------------------------------------
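
The summary reports times in microseconds. For a quick comparison with the millisecond latencies printed by the demo applications, the average column can be converted, for example 27147.6 us for NPU Core is about 27.15 ms. The sketch below assumes the table layout shown above. The helper benchmark_avg_ms is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; reads the dxbenchmark summary table on stdin
# and prints each row name with its average converted from us to ms.
benchmark_avg_ms() {
    awk -F'|' '
        # Data rows have four cells; header and ruler lines are skipped
        # because their last cell is not a positive number.
        NF >= 6 && $5 + 0 > 0 {
            name = $2
            gsub(/^ +| +$/, "", name)
            printf "%s: %.2f ms\n", name, $5 / 1000
        }'
}

# Usage, assuming the summary was captured in bench.log:
#   benchmark_avg_ms < bench.log
```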

Depending on the number of models and the selected loop count, the benchmark may take several minutes to complete. Once finished, the benchmark results are generated in multiple formats, including:

a JSON file
a CSV file
an HTML report

These reports provide detailed performance metrics for each model, such as inference time and throughput.

As of version 3.3.2 the information is limited and the charts have been removed.

Below is a screenshot showing a portion of the generated HTML report (it is just an example and the values are not correct).

Figure: dxbenchmark HTML report
