#### **CMC Infrastructure for Supporting Cloud and Edge Computing Research**

Lowering barriers to technology adoption

YASSINE HARIRI CMC MICROSYSTEMS


© 2020 and Reg. TM – CMC Microsystems

# LAB Shorten the development cycle

# Access platform-based microsystems design and prototyping environments

- Systems
- Equipment Rental, Test Fixtures
- Services for emerging processes and products
- Contract R&D

And more: training, webinars, events, CMC engineer support





# LAB www.cmc.ca/Lab-Development-Systems

#### Cloud

> FPGA/GPU Cluster

#### Edge

> Xilinx ZCU102 Zynq Ultrascale+ MPSoC Evaluation Kit

#### **RISC-V**

> RISC-V Processor Design and Prototyping

#### Development Kit Rental

> Xilinx ZCU102 Zynq Ultrascale+ MPSoC Evaluation Kit



**Machine Learning Platform:** An AI solution for monitoring and interpreting aerial surveillance video and imaging, a project supported by the National Defence Innovation for Defence Excellence and Security (IDEaS) program.





## CMC Cloud FPGA/GPU Cluster

#### CPUs, GPUs and FPGAs in a pre-validated cluster to scale heterogeneous computing workloads

- Machine learning training and inference (e.g. CNN for object detection, speech recognition)
- Video processing / transcoding, financial computing, database analytics, networking
- Quantum chemistry, molecular dynamics, climate and weather, genomics
- RISC-V accelerators in open source cloud computing

#### **Cluster HW**




#### **Cluster Configuration**

| Environment     | Description                   | # Nodes |
|-----------------|-------------------------------|---------|
| Accel - Cerebro | 2 Alveo FPGA U200             | 3       |
| Accel - Genisys | 2 V100 GPUs                   | 3       |
| Accel - Synergy | 1 Alveo FPGA U200, 1 V100 GPU | 2       |

#### **1 Node Specifications**

- Dual 12-core 3.0 GHz CPU
- 192 GB RAM
- 300 GB local storage
- 100 Gb EDR node interconnect
- 10 GbE storage network
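As a quick sanity check on the configuration table and node specifications above, the following sketch totals the cluster's resources. It assumes that the "Description" column lists accelerators per node and that every node matches the single-node specification; neither assumption is stated explicitly on the slides.

```python
# environment -> (FPGAs per node, GPUs per node, node count)
# Assumption: the accelerator counts in the table are per node.
CLUSTER = {
    "Accel - Cerebro": (2, 0, 3),
    "Accel - Genisys": (0, 2, 3),
    "Accel - Synergy": (1, 1, 2),
}

# Assumption: every node matches the single-node specification.
CORES_PER_NODE = 2 * 12   # dual 12-core CPUs
RAM_GB_PER_NODE = 192


def totals(cluster):
    """Return (nodes, fpgas, gpus) summed over all environments."""
    nodes = sum(n for _, _, n in cluster.values())
    fpgas = sum(f * n for f, _, n in cluster.values())
    gpus = sum(g * n for _, g, n in cluster.values())
    return nodes, fpgas, gpus


nodes, fpgas, gpus = totals(CLUSTER)
print(f"{nodes} nodes, {fpgas} Alveo U200 FPGAs, {gpus} V100 GPUs")
print(f"{nodes * CORES_PER_NODE} CPU cores, {nodes * RAM_GB_PER_NODE} GB RAM")
```

Under these assumptions the cluster has 8 nodes with 8 FPGAs and 8 GPUs in total.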



# Software stack for the FPGA/GPU cluster



## **Edge Platform: Xilinx ZCU102**





# End-to-end Deep Learning platform

Use case



© 2019 and Reg. TM – CMC Microsystems

## Innovation for Defence Excellence and Security (IDEaS)

**Object Detection, Classification and Tracking Using Heterogeneous Computing Architectures** 



# Phase I: Training Flow




#### **DLRSD** dataset



#### 2100 images 256x256 pixels, 21 class labels






## Step 1 - Data preparation
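The slides do not show how the 2100 DLRSD images were divided for training and validation. One common approach is a stratified split, sketched below. The 80/20 ratio, the balanced class sizes (2100 / 21 = 100 images per class) and the function name `stratified_split` are assumptions made for illustration, not details from the project.

```python
import random

NUM_IMAGES = 2100
NUM_CLASSES = 21


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/val lists, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_val = int(round(len(indices) * val_fraction))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return train, val


# Assumption: balanced classes, so labels simply cycle through 0..20.
labels = [i % NUM_CLASSES for i in range(NUM_IMAGES)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

Splitting per class keeps every label represented in the validation set, which matters when each class has only around 100 images.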

## Step 2 - Model definition



## Step 3 - Solver definition

- The solver provides the parameters that drive model optimisation and guide the training and testing process.
- The content of *solver\_1.prototxt* is as follows:

      net: "/home/ideas/.local/install/caffe/cmcideas_dev0/caffenet_train_val_1.prototxt"
      test_iter: 400
      test_interval: 500
      base_lr: 0.001
      lr_policy: "step"
      gamma: 0.1
      stepsize: 5000
      display: 20
      max_iter: 10000
      momentum: 0.9
      weight_decay: 0.0005
      snapshot: 2000
      snapshot_prefix: "/home/ideas/.local/install/caffe/cmcideas_dev0/caffe_model_1"
      solver_mode: GPU
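To make the solver settings above concrete, the sketch below evaluates the learning-rate schedule they imply. It assumes Caffe's documented `step` policy, `base_lr * gamma ^ floor(iter / stepsize)`, and that snapshots are written every `snapshot` iterations. The function and variable names are illustrative only.

```python
# Values copied from solver_1.prototxt
BASE_LR = 0.001
GAMMA = 0.1
STEPSIZE = 5000
MAX_ITER = 10000
SNAPSHOT = 2000
TEST_INTERVAL = 500


def step_lr(iteration, base_lr=BASE_LR, gamma=GAMMA, stepsize=STEPSIZE):
    """Learning rate under the 'step' policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


# Iterations at which a periodic snapshot would be saved.
snapshot_iters = list(range(SNAPSHOT, MAX_ITER + 1, SNAPSHOT))

# Number of periodic test passes scheduled within max_iter.
num_test_passes = MAX_ITER // TEST_INTERVAL

for it in (0, 4999, 5000, 9999):
    print(f"iteration {it:5d}: lr = {step_lr(it):.6f}")
print("snapshots at:", snapshot_iters)
print("periodic test passes:", num_test_passes)
```

With these values the learning rate stays at 0.001 for the first 5000 iterations and drops to 0.0001 for the remaining 5000.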



## Step 4 - Model training

At this step, we are ready to train the model by executing the following Caffe command from the terminal:

>caffe train --solver=/home/ideas/.local/install/caffe/cmcideas\_dev0/solver\_1.prototxt 2>&1 | tee /home/ideas/.local/install/caffe/cmcideas\_dev0/train.log



>python /home/ideas/.local/install/caffe/cmcideas\_dev0/plot\_learning\_curve.py /home/ideas/.local/install/caffe/cmcideas\_dev0/train.log /home/ideas/.local/install/caffe/cmcideas\_dev0/learning\_curve.png
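The contents of *plot\_learning\_curve.py* are not shown on the slides. As a rough illustration of what such a script has to do first, the sketch below extracts (iteration, validation accuracy) pairs from a training log. It assumes the log contains lines of the form `Iteration N, Testing net (#0)` followed by `Test net output #0: accuracy = X`, and that the network's accuracy output is named `accuracy`. The sample log is synthetic and its values are made up.

```python
import re

# Assumed line formats; the glog timestamp prefix is ignored by search().
ITER_RE = re.compile(r"Iteration (\d+), Testing net")
ACC_RE = re.compile(r"Test net output #\d+: accuracy = ([0-9.eE+-]+)")


def parse_test_accuracy(log_text):
    """Return a list of (iteration, accuracy) pairs found in the log."""
    points = []
    current_iter = None
    for line in log_text.splitlines():
        match = ITER_RE.search(line)
        if match:
            current_iter = int(match.group(1))
            continue
        match = ACC_RE.search(line)
        if match and current_iter is not None:
            points.append((current_iter, float(match.group(1))))
    return points


# Synthetic example lines (not taken from a real run).
sample_log = """\
Iteration 0, Testing net (#0)
    Test net output #0: accuracy = 0.1
Iteration 500, Testing net (#0)
    Test net output #0: accuracy = 0.5
"""

for iteration, accuracy in parse_test_accuracy(sample_log):
    print(iteration, accuracy)
```

The resulting pairs can then be passed to any plotting library to draw the learning curve.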



## **Transfer Learning**

Concept: Instead of training the network from scratch, transfer learning starts from a model that has already been trained on one dataset and continues training it on a different dataset.
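With Caffe, this kind of fine-tuning is typically started by passing pretrained weights to `caffe train` through its `--weights` option. The sketch below only assembles such a command line. The solver path is the one used earlier in this deck, while the weights file name `pretrained.caffemodel` and the helper `caffe_train_command` are hypothetical.

```python
def caffe_train_command(solver, weights=None):
    """Build the argument list for a Caffe training run.

    If `weights` is given, training starts from those pretrained
    parameters instead of a random initialisation.
    """
    cmd = ["caffe", "train", f"--solver={solver}"]
    if weights is not None:
        cmd.append(f"--weights={weights}")
    return cmd


SOLVER = "/home/ideas/.local/install/caffe/cmcideas_dev0/solver_1.prototxt"

from_scratch = caffe_train_command(SOLVER)
# Hypothetical weights file, for illustration only.
fine_tune = caffe_train_command(SOLVER, weights="pretrained.caffemodel")

print(" ".join(from_scratch))
print(" ".join(fine_tune))
```

The argument list can be handed to `subprocess.run` on a machine where Caffe is installed.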



• validation accuracy: <u>~85%</u>, after 4000 iterations.



• validation accuracy: <u>~98%</u>, after 1000 iterations.



# Phase II: Inference Flow




## Xilinx DNNDK

- Full-stack SDK for the Deep-learning Processor Unit (DPU)
- Provides CNN quantization, compilation, optimization and runtime support
- Network pruning is available under a separate license
- Supports Caffe and TensorFlow
- Free download from Xilinx (registration required)
- Compatible with existing Xilinx tools/flows (Vivado, SDSoC)
- Supported evaluation boards:
  - ZCU102
  - ZCU104
  - Ultra96

*Figure: Xilinx edge AI stack. Frameworks and models (Caffe, TensorFlow, Model Zoo); software (AI model pruning and optimization, AI model quantizer, edge compiler, edge runtime); hardware overlay (edge AI DSA for CNNs); boards (Xilinx edge boards); custom silicon (Zynq).*



# Build Hardware and Application Projects in the SDSoC Development Environment



## **Implementation results: Xilinx Vivado**



### Run the application on ZCU102

*Screenshot: serial console session (`screen /dev/tty.SLAB_USBtoUART 115200`) showing the ZCU102 Linux boot log, with the Xilinx video PHY, HDMI RX and HDMI TX drivers reporting "probe successful", ending at the `root@xilinx:~#` prompt.*





#### A Unified Design Flow for Advanced Computing Platforms





Yassine Hariri <u>Hariri@cmc.ca</u>




