

### Accelerating Deep Learning for Embedded Vision at the Edge

May 22, 2019

Note: participants are muted upon entering the webinar. Please use the chat feature to ask questions.

Hugh Pollitt-Smith, CMC Microsystems, Pollitt-smith@cmc.ca

© 2019 CMC Microsystems





- Demonstrate an integrated Machine Learning (ML) training and inference flow using tools and hardware available through Canada's National Design Network (CNDN)
  - Xilinx SDSoC/DNNDK
  - CMC HPP/HCC
  - Xilinx ZCU102 Development Kit
- Exploit reconfigurable, heterogeneous processing
- Builds on previous webinar, *Accelerating Deep Learning for Vision Using CAFFE* (February 27, 2019), posted on CMC's YouTube channel

Note: this work was undertaken through a National Defence Innovation for Defence Excellence and Security (IDEaS) competitive project.

# Agenda



- CMC Microsystems
- Overview
- Hardware and software environment
- CNN Training and Quantization Flow
- Inference Demonstration
- How to access
- Q&A





### CMC Microsystems


# What is CMC and what is its role?



- Not-for-profit, federally incorporated in 1984
- Manages Canada's National Design Network<sup>®</sup>
- Delivers micro-nano innovation capabilities across Canada









#### Canada's National Design Network

A Canada-wide collaboration between **66** universities/colleges to connect **10,000** academic participants with **950** companies to design, make and test micro-nanosystem prototypes. CMC Microsystems manages Canada's National Design Network<sup>®</sup>.



# Measured outcomes, published annually

**Research Outcomes for 2017:** 

- 1662 journal publications
- 2116 other publications
- 53 national awards
- 57 international awards
- 443 graduate student courses
- 606 undergraduate student courses

#### **Commercialization Outcomes for 2017:**

- 14 startup companies
- 160 patents (applied for/issued)
- 27 licences
- 442 interactions with industry in Canada, valued at \$21.9M
- 57 interactions with foreign industry, valued at \$4.5M

Value to industry is measured in research collaborations, transfer of highly qualified people, and direct company access to tools and technologies for research collaborations.

#### 700 highly trained researchers joined industry in 2017

#### LOWERING BARRIERS TO TECHNOLOGY ADOPTION







# Services for making working prototypes

- Selection of high-performance Computer Aided Design (CAD) tools and design environments
- Available via desktop or through CMC Cloud
- User guides, application notes, training materials and courses

#### CMC.ca/CAD




# Services for making working prototypes

- Multi-project wafer services with affordable access to foundries worldwide
- Fabrication and travel assistance to prototype at a university-based lab
- Value-added packaging and assembly services
- In-house expertise for first-time-right prototypes

#### CMC.ca/FAB





# Device validation to system demonstration

- Access to platform-based microsystems design and prototyping environments
- Access to test equipment on loan
- Access to contract engineering services

#### CMC.ca/LAB

#### ENGAGING STRATEGICALLY in Canada and worldwide




#### Strategic Engagements, Global Partners

| Region        | Location             | Engagements                                                               |
|---------------|----------------------|---------------------------------------------------------------------------|
| North America | 1. Canada            | 14 CAD, 8 FAB, 13 LABs, 19 Systems & Components, 42 University MNT LABs   |
| North America | 2. USA               | 1 Co-operative Initiative, 15 CAD, 5 FAB, 11 LABs, 8 Systems & Components |
| Europe        | Europe (region-wide) | 1 Co-operative Initiative                                                 |
| Europe        | 3. Ireland           | 1 FAB                                                                     |
| Europe        | 4. UK                | 1 CAD, 1 Systems & Components                                             |
| Europe        | 5. France            | 2 FAB, 1 Co-operative Initiative                                          |
| Europe        | 6. Sweden            | 1 CAD                                                                     |
| Europe        | 7. Netherlands       | 2 FAB                                                                     |
| Europe        | 8. Belgium           | 1 FAB                                                                     |
| Europe        | 9. Germany           | 1 CAD, 2 FAB                                                              |
| Europe        | 10. Austria          | 1 FAB                                                                     |
| Asia          | 11. China            | 1 Co-operative Initiative                                                 |
| Asia          | 12. South Korea      | 1 Co-operative Initiative                                                 |
| Asia          | 13. Taiwan           | 1 Co-operative Initiative, 2 FAB, 1 LAB, 2 Systems & Components           |
| Asia          | 14. Japan            | 1 Co-operative Initiative                                                 |
| Asia          | 15. Singapore        | 3 FAB                                                                     |

#### Canada's National Design Network Academic Landscape 2017-2018






#### **CADnet:** Canadian Design Network for Circuits and Systems



- \$20M infrastructure project targeting the IF 2020 competition, to extend community access to CAD tools
  - 40 participating institutions, 750+ faculty
- Key infrastructure:
  - CAD tools
  - Centralized servers
  - Next-generation
  - Design
- NOI
- Infrastructure Fund project proposal

Less via equipment loan)

### Overview

### AI and Machine Learning



- Machine learning is programming computers to optimize a performance criterion using example data or past experience
- Transforming many industries
- Exploding ecosystem of tools and platforms





- Sense, reason, act and adapt
- ML: Machine Learning
  - Algorithms that improve as they are exposed to data over time


- DL: Deep Learning
  - Multilayered neural networks learn from vast amounts of data

Source: *What's the Difference Between Artificial Intelligence (AI), Machine Learning, and Deep Learning?* by Glenn Evan Touger



### **Sensor Fusion, Edge AI**



- AI/ML processing increasingly moving to the edge, close to the sensors:
  - Sensor fusion
  - Low latency, fast response
  - Power-constrained
  - Harsh environment



Source: https://towardsdatascience.com/sensor-fusion-90135614fde6

#### **Heterogeneous Systems Architecture**





Deep Learning on the HPP/HCC:

- Objective: Accelerating Deep Learning on:
  - Programmable logic (FPGAs)
  - GPGPUs

Deep Learning on the ZCU102:

- Objective: Accelerating Inference:
  - Multicore ARM processors
  - Programmable logic (FPGAs)
  - Embedded GPU, real-time processor

# Advantages of Reconfigurable, Heterogeneous Computing



- Flexibility to map parts of an algorithm to software and other parts to reconfigurable hardware or specialized accelerators, achieving performance speedups, low latency and power efficiency while maintaining software programmability
- Scale up or down to meet higher-performance or lower-power requirements by selecting a device with more or fewer resources from the same chip family, while keeping the same design/architecture
- In-field modifications with no changes to equipment:
  - New network models
  - New applications
  - Better-trained models
  - Continuous learning

### **Convolutional Neural Network (CNN) Training and Inference Development Flow**





### **Application**



- Video-based object classification and detection:
  - Video input (e.g., camera)
  - Identify multiple objects in each frame from a library of classes
  - Mark each detected object with a colour-coded bounding box
  - Output processed frames with bounding boxes





See also the previous webinar, *Accelerating Deep Learning for Vision Using CAFFE*
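Colour-coding the bounding boxes only needs a fixed colour per class. A minimal Python sketch, assuming 21 class IDs (background plus the 20 PASCAL VOC classes) and OpenCV's BGR channel order; `class_colour` is a hypothetical helper, not part of the demo code:

```python
import colorsys
from typing import Tuple


def class_colour(class_id: int, num_classes: int = 21) -> Tuple[int, int, int]:
    """Return a distinct (B, G, R) colour for a class ID.

    Hues are spaced evenly around the colour wheel so each class gets its
    own colour. BGR order is an assumption matching OpenCV drawing calls.
    """
    if not 0 <= class_id < num_classes:
        raise ValueError(f"class_id {class_id} outside [0, {num_classes})")
    hue = class_id / num_classes
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)  # full saturation and brightness
    return (int(b * 255), int(g * 255), int(r * 255))
```

The returned tuple could be passed as the `color` argument of OpenCV's `cv2.rectangle` when drawing each box. Spacing hues evenly gives every class its own colour without maintaining a hand-written palette.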

### **SSD: Single Shot MultiBox Detector**



- SSD needs only a single forward pass through the network to detect multiple objects within an image
- For each detected object, it outputs a class ID with a confidence score, plus the location coordinates of a bounding box containing the object
- https://towardsdatascience.com/understanding-ssd-multibox-real-time-object-detection-in-deep-learning-495ef744fab
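Because SSD predicts from many default boxes at once, several overlapping boxes usually fire on the same object, so the forward pass is followed by a per-class non-maximum suppression (NMS) step. The greedy NMS below is a generic Python sketch of that step, not the Caffe SSD implementation; the 0.45 IoU and 0.01 confidence thresholds are assumed values (we believe they match the `ssd_pascal.py` defaults, but check your own prototxt):

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45, conf_threshold: float = 0.01) -> List[int]:
    """Greedy non-maximum suppression for ONE class.

    Returns the indices of the surviving boxes, highest score first.
    """
    # Drop low-confidence candidates, then visit the rest best-first.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with an already kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```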



#### **PASCAL Visual Object Classes (VOC) Dataset**


- http://host.robots.ox.ac.uk/pascal/VOC/
- Standardised image data sets for object class recognition
- >20,000 images containing >45,000 annotated objects
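Each annotated object in VOC is stored in a per-image XML file (under `Annotations/`) with a class name, a `difficult` flag and a pixel bounding box. A short Python sketch of reading one with only the standard library; `parse_voc_annotation` is a hypothetical helper and the sample annotation is made up for illustration:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

VocObject = Tuple[str, Tuple[int, ...], bool]  # (class name, (xmin, ymin, xmax, ymax), difficult)


def parse_voc_annotation(xml_text: str) -> List[VocObject]:
    """Return (class name, pixel box, difficult flag) for every <object> element."""
    root = ET.fromstring(xml_text)
    objects: List[VocObject] = []
    for obj in root.iter("object"):
        name = obj.findtext("name", default="").strip()
        difficult = obj.findtext("difficult", default="0").strip() == "1"
        bb = obj.find("bndbox")
        # Coordinates are parsed via float() in case a file stores them as decimals.
        box = tuple(int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, box, difficult))
    return objects


# Made-up annotation in the VOC layout (not taken from the dataset).
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>dog</name><difficult>0</difficult>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>200</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>person</name><difficult>1</difficult>
    <bndbox><xmin>10</xmin><ymin>15</ymin><xmax>120</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""
```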



#### **PASCAL VOC Classes**



1. Background
2. Aeroplane
3. Bicycle
4. Bird
5. Boat
6. Bottle
7. Bus
8. Car
9. Cat
10. Chair
11. Cow
12. Dining Table
13. Dog
14. Horse
15. Motorbike
16. Person
17. Potted Plant
18. Sheep
19. Sofa
20. Train
21. TV Monitor
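To turn raw network output into labelled boxes, the class list above becomes a lookup table. This Python sketch assumes Caffe SSD's `DetectionOutput` layout of seven values per detection, `[image_id, label, confidence, xmin, ymin, xmax, ymax]`, with coordinates normalized to [0, 1], and label IDs running from 0 (background) to 20, one less than the slide numbering. Class names use the lowercase spellings from the VOC annotation files; `decode_detections` and its 0.5 threshold are hypothetical:

```python
from typing import List, Sequence, Tuple

# Label 0 is background; labels 1-20 are the VOC object classes, in the order listed above.
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def decode_detections(rows: Sequence[Sequence[float]], width: int, height: int,
                      conf_threshold: float = 0.5) -> List[Tuple[str, float, Tuple[int, int, int, int]]]:
    """Convert normalized detection rows into (class name, confidence, pixel box)."""
    results = []
    for _, label, conf, xmin, ymin, xmax, ymax in rows:
        # Skip weak detections and anything labelled as background.
        if conf < conf_threshold or int(label) == 0:
            continue
        box = (round(xmin * width), round(ymin * height),
               round(xmax * width), round(ymax * height))
        results.append((VOC_CLASSES[int(label)], conf, box))
    return results
```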





### Hardware and software environment

#### Hardware



#### Training:

- Synodic workstation, 2x Intel Xeon E5-2630 v2 CPUs, NVIDIA Tesla K40, Ubuntu 16.04
- Colfax ProEdge SXT9700, 2x Intel Xeon Bronze 3104 CPUs, NVIDIA Tesla Pascal P100, Ubuntu 18.04
- Coming soon: CMC Heterogeneous Compute Cluster (HCC)

#### Inference:

- Xilinx Zynq UltraScale+ MPSoC ZCU102 Development Kit
- Leopard Imaging LI-IMX274MIPI-FMC camera
- HDMI monitor

#### Software



- Xilinx DNNDK (Deep Neural Network Development Kit) v2.08
  - SDSoC 2018.3
  - DNNDK for SDSoC
  - ZCU102 SDSoC reVISION Stack for DNNDK
- Caffe v1.0
- CUDA v8.0
- CuDNN v7.0.5
- NCCL v1.2.3

https://github.com/Xilinx/Edge-AI-Platform-Tutorials/tree/master/docs/ML-SSD-PASCAL

### Xilinx DNNDK v2.08



- Full-stack SDK for the Deep-learning Processor Unit (DPU)
  - Provides CNN quantization, compilation and optimization, plus runtime support
  - Network pruning available under a separate license
  - Supports the Caffe framework
- Free download from Xilinx (registration required)
- Compatible with existing Xilinx tools/flows (Vivado, SDSoC)
- Supported evaluation boards:
  - ZCU102
  - ZCU104
  - Ultra96
- DNNDK v3.0 adds support for TensorFlow
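To give a feel for what quantization does, here is a simplified Python illustration of per-tensor dynamic fixed point: every value in a tensor shares one power-of-two scale, chosen so the largest magnitude still fits in a signed 8-bit integer. It is a sketch for intuition under those assumptions, not DNNDK's actual quantization algorithm, and both function names are hypothetical:

```python
import math
from typing import List, Tuple


def quantize_fixed_point(values: List[float], bit_width: int = 8) -> Tuple[List[int], int]:
    """Quantize floats to signed integers that share one fractional length `frac`.

    Picks the largest `frac` such that max|v| * 2**frac still fits in the
    signed integer range, then rounds and clamps each value.
    """
    qmax = 2 ** (bit_width - 1) - 1   # 127 for 8 bits
    qmin = -(2 ** (bit_width - 1))    # -128 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    frac = 0 if max_abs == 0 else math.floor(math.log2(qmax / max_abs))
    scale = 2.0 ** frac
    q = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return q, frac


def dequantize(q: List[int], frac: int) -> List[float]:
    """Map fixed-point integers back to floats."""
    return [x / 2.0 ** frac for x in q]
```

With `frac` fractional bits the rounding error per value is at most 2<sup>-(frac+1)</sup>, which is why a tensor with a small dynamic range keeps more precision than one with a large range.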

## Xilinx Deep-learning Processor Unit (DPU)



- Co-processor/overlay for Zynq embedded ARM cores
- Supports commonly used network layers, using hardware acceleration from the underlying FPGA architecture
- DPU hardware is generated from an SDSoC project
- Supports multi-threading, with up to 4 DPU cores on chip (limited by available FPGA resources)



Courtesy: Xilinx Inc.
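On the host side, multi-threading typically means feeding frames to several workers at once. The Python sketch below shows the pattern with one thread per DPU core and results returned in frame order; `infer` is a hypothetical stand-in for a per-frame DPU call, and no DNNDK API is used or implied:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def run_multithreaded(frames: Sequence[Frame],
                      infer: Callable[[Frame], Result],
                      num_cores: int = 4) -> List[Result]:
    """Run `infer` on every frame using one worker thread per DPU core.

    ThreadPoolExecutor.map preserves input order, so results line up with frames.
    """
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        return list(pool.map(infer, frames))
```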

### **DNNDK: network pruning**





Source: https://www.xilinx.com/publications/events/developer-forum/2018-frankfurt/machine-learning-for-embedded-deep-dive.pdf
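Pruning removes weights that contribute little to the output, shrinking the model before deployment. As a generic illustration only, unstructured magnitude pruning zeroes the smallest-magnitude fraction of a layer's weights; the DNNDK pruning tool may use a different method (for example, removing whole channels), and `magnitude_prune` is a hypothetical helper:

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    pruned = list(weights)
    if n_prune == 0:
        return pruned
    # Indices of the n_prune weights closest to zero.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    for i in smallest:
        pruned[i] = 0.0
    return pruned
```

A pruned network is normally fine-tuned afterwards to recover accuracy.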





#### https://github.com/Xilinx/Edge-AI-Platform-Tutorials

Screenshot: the `docs/ML-SSD-PASCAL` folder of the Xilinx/Edge-AI-Platform-Tutorials repository on GitHub (*Edge AI Tutorials: ML-SSD-PASCAL-Caffe-Tutorial*), containing `PDF`, `dataset_files/voc0712`, `ref_training`, `.gitattributes` and `README.md`. The tutorial covers how to train and quantize an SSD network on PASCAL VOC (20 classes) using the Caffe framework and the DNNDK tools.

#### **Resources**



#### https://forums.xilinx.com/t5/Deephi-DNNDK/bd-p/Deephi

Screenshot: the *Deephi DNNDK* board of the Xilinx Community Forums (Forums › Applications › Deephi DNNDK), a resource to ask and learn about using Deephi DNNDK for developing AI applications.

#### **Xilinx ZCU102 Features**



Figure: ZCU102 board setup (USB mouse, keyboard, webcam, hub and adapter; DisplayPort monitor; USB-UART to laptop; boot mode switch; SD card; power). Courtesy: Xilinx Inc.

- Xilinx Zynq UltraScale+ MPSoC (ZU9EG)
  - Quad-core ARM A53
  - Dual-core ARM R5
  - ARM GPU
  - 16nm FinFET+ programmable logic
- 4GB 64-bit DDR4 (processor)
- 512MB 16-bit DDR4 (FPGA)
- 2x FMC-HPC connectors
- HDMI video input and output
- DisplayPort video output
- SD Card
- Push buttons, DIP switches, LEDs
- USB UART
- Ethernet

### Xilinx reVISION Stack



- Pre-built platform for algorithm and application development for embedded vision on Xilinx boards (e.g., ZCU102)
- Video capture and sink pipelines
- xfOpenCV library: acceleration-ready OpenCV functions
- PetaLinux BSP
- Design examples (machine vision, CNN)



#### GStreamer



- Modular, open framework for creating streaming multimedia applications
- Individual processing elements (sources, sinks, filters) are called *plugins*
- In an application, plugins are linked and arranged into *pipelines*
- Pipelines can be constructed and executed within an application (e.g., C/C++, Python) or at the command line with *gst-launch-1.0*
- Large library of plugins available (the *good*, *bad* and *ugly* plugin sets)
- Supported in the Xilinx reVISION Stack Linux image
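A `gst-launch-1.0` pipeline is a chain of element descriptions joined by `!`. The Python sketch builds such a description and, if `gst-launch-1.0` is on the `PATH`, runs it. The camera-to-display pipeline uses the standard `v4l2src`, `videoconvert` and `autovideosink` elements; the `/dev/video0` device path is an assumption and depends on how the camera is exposed on your board:

```python
import shlex
import shutil
import subprocess
from typing import Sequence


def build_pipeline(elements: Sequence[str]) -> str:
    """Link GStreamer element descriptions with '!' as gst-launch-1.0 expects."""
    return " ! ".join(elements)


# Hypothetical capture-to-display pipeline; the device path is an assumption.
CAMERA_TO_DISPLAY = [
    "v4l2src device=/dev/video0",
    "videoconvert",
    "autovideosink",
]


def launch(elements: Sequence[str]) -> int:
    """Run the pipeline with gst-launch-1.0 and return its exit code."""
    exe = shutil.which("gst-launch-1.0")
    if exe is None:
        raise FileNotFoundError("gst-launch-1.0 not found on PATH")
    # gst-launch-1.0 accepts the pipeline description as separate argv tokens.
    return subprocess.call([exe] + shlex.split(build_pipeline(elements)))


if __name__ == "__main__":
    print(build_pipeline(CAMERA_TO_DISPLAY))
```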







### CNN Training and Quantization Flow

#### **Inference Development Flow**





#### Step 1: Install the Caffe tool for SSD



- NVIDIA libraries/drivers
- CUDA v8.0
- CuDNN v7.0.5
- NCCL v1.2.3
- SSD Caffe

#### **Step 2: Prepare the dataset and database**



- Download the pre-trained VGG network files:
  - `VGG_ILSVRC_16_layers_fc_reduced_deploy.prototxt` (description)
  - `VGG_ILSVRC_16_layers_fc_reduced.caffemodel` (weights)
- Download the PASCAL VOC dataset:
  - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
  - http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
  - http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

- Create the training/validation and test database files:
  - `create_list.sh` builds the image/annotation list files
  - `create_data.sh` → `VOC0712_trainval_lmdb/data.mdb`, `VOC0712_test_lmdb/data.mdb`
- Run a Python script to add the SSD framework layers to VGGNet:
  - `ssd_pascal.py` → `solver.prototxt`, `deploy.prototxt`, `test.prototxt`, `train.prototxt`
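As we understand the SSD scripts, `create_list.sh` pairs each image with its annotation file before `create_data.sh` packs them into LMDB. A minimal Python sketch of that pairing step, assuming the list format `VOC2007/JPEGImages/<id>.jpg VOC2007/Annotations/<id>.xml` and image IDs read from `ImageSets/Main/trainval.txt` (check the files the scripts actually produce); both function names are hypothetical:

```python
from typing import Iterable, List


def read_image_ids(imageset_text: str) -> List[str]:
    """Parse an ImageSets/Main/*.txt file: one image ID per non-empty line."""
    return [line.strip() for line in imageset_text.splitlines() if line.strip()]


def make_list_lines(year_dir: str, image_ids: Iterable[str]) -> List[str]:
    """Build one 'image_path annotation_path' line per image ID."""
    return [f"{year_dir}/JPEGImages/{i}.jpg {year_dir}/Annotations/{i}.xml"
            for i in image_ids]


if __name__ == "__main__":
    for line in make_list_lines("VOC2007", read_image_ids("000005\n000007\n")):
        print(line)
```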

#### **Step 3: Train the SSD Network**



- Modify SSD Prototxt files for compatibility with DPU/DNNDK
- Run the training script (it executes `caffe train`):

`$CAFFE_ROOT/jobs/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300.sh`

#### Runs 120,000 training iterations:

- Synodic Tower with NVIDIA Tesla K40: 6 days
- Colfax ProEdge SXT9700 with NVIDIA Tesla Pascal P100: 2 days
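The two wall-clock times translate into the following approximate training throughput (the day counts are rounded, and they describe each whole system rather than the GPU alone):

$$
\begin{aligned}
\text{Tesla K40:}\quad & \frac{120\,000\ \text{iterations}}{6 \times 24\ \text{h}} \approx 833\ \text{iterations/h} \approx 0.23\ \text{iterations/s} \\
\text{Tesla P100:}\quad & \frac{120\,000\ \text{iterations}}{2 \times 24\ \text{h}} = 2\,500\ \text{iterations/h} \approx 0.69\ \text{iterations/s} \\
\text{Speedup:}\quad & \frac{6\ \text{days}}{2\ \text{days}} = 3\times
\end{aligned}
$$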

```
I0422 14:22:34.587582 27953 solver.cpp:259]     Train net output #0: mbox_loss = 2.31304 (* 1 = 2.31304 loss)
I0422 14:22:34.587594 27953 sgd_solver.cpp:138] Iteration 119990, lr = 1e-05
I0422 14:23:14.649554 27953 solver.cpp:596] Snapshotting to binary proto file models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel
I0422 14:23:14.991468 27953 sgd_solver.cpp:307] Snapshotting solver state to binary proto file models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.solverstate
I0422 14:23:15.576786 27953 solver.cpp:332] Iteration 120000, loss = 2.33714
I0422 14:23:15.576824 27953 solver.cpp:433] Iteration 120000, Testing net (#0)
I0422 14:23:15.576895 27953 net.cpp:693] Ignoring source layer mbox_loss
I0422 14:27:30.300951 27953 solver.cpp:546]     Test net output #0: detection_eval = 0.762029
I0422 14:27:30.301141 27953 solver.cpp:337] Optimization Done.
I0422 14:27:30.301154 27953 caffe.cpp:254] Optimization Done.
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/jobs/VGGNet/VOC0712/SSD_300x300$

ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/models/VGGNet/VOC0712/SSD_300x300$ ls -al
-rw-rw-r-- 1 ideasuser ideasuser 97443334 Apr 20 13:27 VGG_VOC0712_SSD_300x300_iter_80000.solverstate
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/models/VGGNet/VOC0712/SSD_300x300$
```
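The `lr = 1e-05` at iteration 119990 in the training log is what Caffe's `multistep` learning-rate policy gives if `base_lr` is 0.001, `gamma` is 0.1 and the step values are 80,000, 100,000 and 120,000, settings we believe `ssd_pascal.py` uses by default for PASCAL VOC. A small Python sketch of that policy under those assumed settings (check your own `solver.prototxt`):

```python
from typing import Sequence


def multistep_lr(iteration: int, base_lr: float, gamma: float,
                 stepvalues: Sequence[int]) -> float:
    """Learning rate under Caffe's 'multistep' policy.

    The base rate is multiplied by gamma once for every step value the
    iteration has reached.
    """
    steps_passed = sum(1 for s in stepvalues if iteration >= s)
    return base_lr * gamma ** steps_passed


# Assumed solver settings; verify against the generated solver.prototxt.
BASE_LR, GAMMA, STEPS = 0.001, 0.1, (80000, 100000, 120000)

if __name__ == "__main__":
    print(multistep_lr(119990, BASE_LR, GAMMA, STEPS))
```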

#### **Step 4: Evaluate the Floating Point Network**



Command:

`$CAFFE_DIR/evaluation/score.sh`

```
I0509 14:03:10.999877 27352 net.cpp:228] relu3_1 does not need backward computation.
I0509 14:03:10.999883 27352 net.cpp:228] conv3_1 does not need backward computation.
I0509 14:03:10.999891 27352 net.cpp:228] pool2 does not need backward computation.
I0509 14:03:10.999897 27352 net.cpp:228] relu2_2 does not need backward computation.
I0509 14:03:10.999903 27352 net.cpp:228] conv2_2 does not need backward computation.
I0509 14:03:10.999910 27352 net.cpp:228] relu2_1 does not need backward computation.
I0509 14:03:10.999917 27352 net.cpp:228] conv2_1 does not need backward computation.
I0509 14:03:10.999923 27352 net.cpp:228] pool1 does not need backward computation.
I0509 14:03:10.999930 27352 net.cpp:228] relu1_2 does not need backward computation.
I0509 14:03:10.999938 27352 net.cpp:228] conv1_2 does not need backward computation.
I0509 14:03:10.999943 27352 net.cpp:228] relu1_1 does not need backward computation.
I0509 14:03:10.999950 27352 net.cpp:228] conv1_1 does not need backward computation.
I0509 14:03:10.999958 27352 net.cpp:228] data_data_0_split does not need backward computation.
I0509 14:03:10.999965 27352 net.cpp:228] data does not need backward computation.
I0509 14:03:10.999972 27352 net.cpp:270] This network …
I0509 14:03:11.000078 27352 net.cpp:283] Network initialization done.
Could not create logging file: No such file or directory
COULD NOT CREATE A LOGGINGFILE 20190509-140311.27352!
…lding done.
I0509 14:03:11.006572 27352 caffe.cpp:155] Finetuning …
…300x300_iter_120000.caffemodel
I0509 14:03:11.083011 27352 upgrade_proto.cpp:77] Atte…
…rams: models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_S…
I0509 14:03:11.083052 27352 upgrade_proto.cpp:80] Suc…
…rams.
I0509 14:03:11.157622 27352 upgrade_proto.cpp:77] Atte…
…rams: models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_S…
I0509 14:03:11.157660 27352 upgrade_proto.cpp:80] Succ…
…rams.
I0509 14:03:11.180361 27352 net.cpp:761] Ignoring sou…
I0509 14:03:11.180789 27352 caffe.cpp:251] Starting Optimization
I0509 14:03:11.180804 27352 solver.cpp:294] Solving V…
I0509 14:03:11.180811 27352 solver.cpp:295] Learning Rate Policy: …
I0509 14:03:11.622129 27352 solver.cpp:332] Iteration …
I0509 14:03:11.622176 27352 solver.cpp:433] Iteration …
I0509 14:03:11.639858 27352 net.cpp:693] Ignoring sou…
I0509 14:07:25.647696 27352 solver.cpp:546] Test net output #0: …
I0509 14:07:25.647825 27352 solver.cpp:337] Optimization Done.
I0509 14:07:25.647835 27352 caffe.cpp:254] Optimization Done.
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd$
```
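The "Could not create logging file" lines above come from glog failing to open a log file. The run still completes, as the final "Optimization Done." lines show, but no log file is kept. Capturing the console output yourself is one way to keep the evaluation result. This is a minimal sketch, and the function name and log file name are hypothetical.

```shell
# Run an evaluation command, show its console output on stderr while
# saving a copy to a log file, then print the last detection_eval value.
run_and_report_map() {
    log_file=$1
    shift
    # Caffe logs through glog on stderr, so merge stderr into the pipe.
    "$@" 2>&1 | tee "$log_file" >&2
    grep -o 'detection_eval = [0-9.]*' "$log_file" | tail -n 1 | awk '{ print $NF }'
}
```

Usage might look like `run_and_report_map score.log "$CAFFE_DIR/evaluation/score.sh"`. Progress still scrolls past on the terminal, and only the mAP value is written to stdout.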

# Step 5: Quantize the SSD Network with DECENT



- Input files: float.caffemodel, float.prototxt, calibration dataset (100-1000 images)
- Output files: deploy.caffemodel, deploy.prototxt
- Default bit width: 8 (currently the only bit width the DPU supports)
- Command:

```
decent quantize \
    -model ${model_dir}/float.prototxt \
    -weights ${model_dir}/float.caffemodel \
    -output_dir ${output_dir} \
    -gpu 0 \
    -auto_test
```
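Before running DECENT, it can help to confirm that the calibration set falls in the 100-1000 image range given above. This sketch assumes the calibration images are `.jpg` files in a single directory. That layout and the function name are hypothetical. In the DNNDK flow, where DECENT actually reads the images from is typically set by the data layer in `float.prototxt`, which is not shown here.

```shell
# Hypothetical pre-flight check: count the .jpg files in a calibration
# directory and fail if the count is outside the 100-1000 image range.
check_calib_set() {
    dir=$1
    count=$(find "$dir" -maxdepth 1 -type f -name '*.jpg' | wc -l)
    count=$((count + 0))   # drop any padding that wc adds
    if [ "$count" -lt 100 ] || [ "$count" -gt 1000 ]; then
        echo "warning: $count calibration images (expected 100-1000)" >&2
        return 1
    fi
    echo "$count"
}
```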

```
I0509 13:18:13.569810 26696 net_test.cpp:207] Test iter: 46/50
I0509 13:18:13.978898 26696 net_test.cpp:207] Test iter: 47/50
I0509 13:18:14.402451 26696 net_test.cpp:207] Test iter: 48/50
I0509 13:18:14.814522 26696 net_test.cpp:207] Test iter: 49/50
I0509 13:18:15.209785 26696 net_test.cpp:207] Test iter: 50/50
I0509 13:18:15.218399 26696 net_test.cpp:254] Test Results:
I0509 13:18:15.218410 26696 net_test.cpp:255] Test net output #0: detection_eval = 0.790698
I0509 13:18:15.218444 26696 net_test.cpp:387] Test Done!
I0509 13:18:15.913386 26696 decent.cpp:331] Test Done!
Output Deploy Weights: "/home/ideasuser/Caffe-SSD/caffe-ssd/DNNDK_Project/decent_output/deploy.caffemodel"
Output Deploy Model: "/home/ideasuser/Caffe-SSD/caffe-ssd/DNNDK_Project/decent_output/deploy.prototxt"
```
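This section reports two detection_eval figures: 0.762029 from the floating-point test pass at iteration 120,000, and 0.790698 from DECENT's `-auto_test` run. A small awk sketch can compare them. The auto-test above covers 50 test iterations, but the test-iteration count for the floating-point pass is not shown, so the difference should be read as indicative only. The function name is hypothetical.

```shell
# Report the change in mAP from a floating-point model to a quantized one,
# as an absolute difference and as a percentage of the float value.
map_delta() {
    float_map=$1
    quant_map=$2
    awk -v f="$float_map" -v q="$quant_map" 'BEGIN {
        printf "abs=%+.4f rel=%+.2f%%\n", q - f, 100 * (q - f) / f
    }'
}
```

With the two values above, `map_delta 0.762029 0.790698` prints `abs=+0.0287 rel=+3.76%`.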

# Step 6: Compile the SSD Network with DNNC



- Output file: dpu\_ssd.elf
- Command:

```
dnnc --prototxt=${model_dir}/deploy.prototxt \
    --caffemodel=${model_dir}/deploy.caffemodel \
    --net_name=ssd \
    --dpu=4096FA \
    --cpu_arch=arm64 \
    --abi=0
```

```
DNNC Kernel Information

1. Overview
kernel numbers  : 1
kernel topology : ssd_kernel_graph.jpg

2. Kernel Description in Detail
kernel id       : 0
kernel name     : ssd
type            : DPUKernel
nodes           : NA
input node(s)   : …
output node(s)  : mbox_loc(0) mbox_conf(0)
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/DNNDK_Project$
```
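DNNC's summary reports the kernel name (`ssd`, matching `--net_name`) and the output nodes (`mbox_loc`, `mbox_conf`). The application on the ZCU102 typically refers to these when it loads the kernel and reads results through the DNNDK runtime (N2Cube). This sketch pulls a single field out of a saved copy of the summary. It assumes the `field : value` layout shown above. The function name and saved file are hypothetical.

```shell
# Print the value of one "field : value" line from a saved DNNC summary.
dnnc_field() {
    file=$1
    field=$2
    awk -v key="$field" -F ':' '
        { name = $1; sub(/[ \t]+$/, "", name) }   # trim padding after the field name
        name == key {
            value = substr($0, index($0, ":") + 1)
            sub(/^[ \t]+/, "", value)              # trim space after the colon
            print value
            exit
        }' "$file"
}
```

For example, `dnnc_field dnnc_summary.txt 'output node(s)'` would print `mbox_loc(0) mbox_conf(0)`.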

# Step 7: Compile .elf to shared library



- Input file: dpu\_ssd.elf
- Output file: libdpumodelssd.so
- Command:

`aarch64-linux-gnu-gcc -fPIC -shared dpu_ssd.elf -o libdpumodelssd.so`

```
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/DNNDK_Project/dnnc_output$ aarch64-linux-gnu-gcc -fPIC -shared dpu_ssd.elf -o libdpumodelssd.so
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/DNNDK_Project/dnnc_output$ ls -al
total 48528
drwx------ 2 ideasuser ideasuser 4096 May 9 13:33 .
drwxrwxr-x 6 ideasuser ideasuser 4096 May 9 13:17 ..
-rw-rw-r-- 1 ideasuser ideasuser 24834008 May 9 13:25 dpu_ssd.elf
-rwxrwxr-x 1 ideasuser ideasuser 24850256 May 9 13:33 libdpumodelssd.so
ideasuser@ideasubuntu16:~/Caffe-SSD/caffe-ssd/DNNDK_Project/dnnc_output$
```
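The library must target the ZCU102's 64-bit Arm (AArch64) cores, not the x86 host it was built on. This sketch checks that by reading the `e_machine` field from the ELF header. It assumes a little-endian ELF, which is what an AArch64 Linux toolchain produces. The function name is hypothetical. Where the `file` utility is installed, `file libdpumodelssd.so` reports the same information.

```shell
# Print the target architecture recorded in an ELF file's header, so a
# cross-compiled library can be checked before it is copied to the board.
# Assumes a little-endian ELF (EI_DATA = 1).
elf_machine() {
    file=$1
    # Bytes 0-3 must be the ELF magic number: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -t x1 -N 4 "$file" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file: $file" >&2
        return 1
    fi
    # Byte 5 (EI_DATA) is 1 for little-endian, 2 for big-endian.
    data=$(od -An -t u1 -j 5 -N 1 "$file" | tr -d ' \n')
    if [ "$data" != "1" ]; then
        echo "big-endian ELF not handled: $file" >&2
        return 1
    fi
    # e_machine is a 2-byte field at offset 18; low byte comes first.
    set -- $(od -An -t u1 -j 18 -N 2 "$file")
    machine=$(( $1 + 256 * $2 ))
    case $machine in
        183) echo "AArch64" ;;
        62)  echo "x86-64" ;;
        40)  echo "ARM (32-bit)" ;;
        *)   echo "other (e_machine=$machine)" ;;
    esac
}
```

Running `elf_machine libdpumodelssd.so` should print `AArch64` if the cross-compile targeted the board correctly.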

## **Step 8: Build Hardware and Application Projects in the SDSoC Development Environment**





## **Implementation results: Xilinx Vivado**



*Screenshot: Vivado implemented design (device xczu9eg-ffvb1156-2-e) for the SDSoC-generated `dpucore_zu9` project (Debug configuration), showing the `zcu102_rv_ss` block design.*
| > Open Synthesized Design                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                       |            |         |             |            |             | XOYO                |               | 1.70 x270                   |             |                        |                     |          |                             |
|                                                                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                       |            |         |             |            |             |                     |               |                             |             |                        |                     |          |                             |
| ✓ IMPLEMENTATION                                                          | Tcl Console Messages Log R                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | eports Design Runs                    | Power      | DRC M   | lethodology | Timing     |             |                     |               |                             |             |                        |                     |          | ? _ 0 6                     |
| <ul> <li>Run Implementation</li> <li>Open Implemented Design</li> </ul>   | Q   ¥   ♦   •   ≪   ▶   ≫                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                       |            |         |             |            |             |                     |               |                             |             |                        |                     |          |                             |
| Constraints Wizard                                                        | Name Constraints                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                                       | WNS        | TNS     | WHS TH      | IS TPWS    | Total Power | Failed Routes       | LUT %         | FF %                        | BRAM %      | URA DSP %              | Start               | Elapsed  | Run Strategy                |
| Edit Timing Constraints                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | Synthesis Out-of-date                 |            |         |             |            |             |                     | 0.00          | 0.00                        | 0.00        | 0.00 0.00              |                     |          | Vivado Synthesis Defaults   |
| C Report Timing Summary                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | Implementation Out-of-date            | 0.052      | 0.000   | 0.008 0.    | 000 0.000  | 20.728      | 0                   | 61.72         | 56.14                       | 77.52       | 0.00 53.21             | 4/15/19, 11:23 AM   | 02:37:11 | Congestion_SpreadLogic_high |
| Report Clock Networks                                                     | Gut-of-Context Module Runs     S    S    zcu102_rv_ss                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | Submodule Runs Out-of-da              | a .        |         |             |            |             |                     |               |                             |             |                        | 4/15/19, 10:49 AM   | 00:33:15 |                             |
| Report Clock Networks                                                     | 2CUIO2_1V_33                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | Submodule Nuns Out-01-0a              | -          |         |             |            |             |                     |               |                             |             |                        | 1, 15, 15, 10.49 AM | 00.55.15 |                             |
|                                                                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                       |            |         |             |            |             |                     |               |                             |             |                        |                     |          |                             |
| Report Methodology                                                        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |                                       |            |         |             |            |             |                     |               |                             |             |                        |                     |          |                             |
| Report DRC                                                                | <                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                       |            |         |             |            |             |                     |               |                             |             |                        |                     |          | >                           |

# **Step 9: Copy files to SD Card**



- From the DNNDK pre-built package for ZCU102:
  - video\_cmd
  - libdputils.so → lib
  - libn2cube.so → lib
- From the DNNC output: libdpumodelssd.so → lib
- From the dpucore\_zcu102 project:
  - BOOT.BIN
  - image.ub
  - libdpucore.so → lib
- From the gstsdxtrafficdetect project: libgstsdxtrafficdetect.so → lib
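Step 9 can be scripted on the host. The sketch below assumes a POSIX shell, that the SD card's boot partition is already mounted, and that each "→ lib" entry means a `lib/` directory on the card. The function name `stage_sd_card` and all five directory arguments are hypothetical placeholders, since the slides do not give the build-tree paths.

```shell
# Sketch of Step 9: stage boot files, the video_cmd utility and the shared
# libraries onto the mounted SD card. All directory arguments are hypothetical
# placeholders for wherever these artifacts live in your build tree.
set -eu

stage_sd_card() {
  dnndk_dir=$1     # DNNDK pre-built package for ZCU102
  dnnc_dir=$2      # DNNC output (compiled SSD model kernel library)
  dpucore_dir=$3   # dpucore_zcu102 project output
  plugin_dir=$4    # gstsdxtrafficdetect project output
  sd_dir=$5        # mount point of the SD card boot partition

  mkdir -p "$sd_dir/lib"

  # Boot image and Linux image go in the root of the card
  cp "$dpucore_dir/BOOT.BIN" "$dpucore_dir/image.ub" "$sd_dir/"

  # video_cmd utility used in Step 11
  cp "$dnndk_dir/video_cmd" "$sd_dir/"

  # Every "-> lib" entry goes into lib/ on the card
  cp "$dnndk_dir/libdputils.so" "$dnndk_dir/libn2cube.so" "$sd_dir/lib/"
  cp "$dnnc_dir/libdpumodelssd.so" "$sd_dir/lib/"
  cp "$dpucore_dir/libdpucore.so" "$sd_dir/lib/"
  cp "$plugin_dir/libgstsdxtrafficdetect.so" "$sd_dir/lib/"
}
```

For example, `stage_sd_card ./dnndk ./dnnc_out ./dpucore_zcu102 ./gstsdxtrafficdetect /media/$USER/BOOT` (all paths hypothetical), then run `sync` and unmount the card before removing it.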

## Step 10: Boot ZCU102



*[Screenshot: serial console (`screen /dev/tty.SLAB_USBtoUART 115200`) showing the ZCU102 Linux boot log: HDMI RX/TX and video mixer drivers probing, first-boot package configuration, network and service startup, ending at the `root@xilinx:~#` prompt]*
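The console in the boot screenshot is opened with `screen` on the board's USB-UART at 115200 baud. A small sketch for doing the same on either macOS or Linux follows; the candidate device names are assumptions (the macOS name comes from the screenshot, the Linux name is the usual enumeration of the board's first USB-UART port), and `first_existing`/`open_console` are hypothetical helper names.

```shell
# Sketch: open the ZCU102 serial console at 115200 baud. Device names are
# assumptions: macOS (Silicon Labs driver) uses /dev/tty.SLAB_USBtoUART,
# Linux usually exposes the board's first USB-UART port as /dev/ttyUSB0.

# Print the first path that exists; return non-zero if none do.
first_existing() {
  for candidate in "$@"; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

open_console() {
  port=$(first_existing /dev/tty.SLAB_USBtoUART /dev/ttyUSB0) || {
    echo "no USB-UART device found" >&2
    return 1
  }
  screen "$port" 115200
}
```

Power on the board after `open_console` is running so the full boot log is captured.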

# Step 11: Run application with gst-launch-1.0



```
root@xilinx:~# cd /media/card
root@xilinx:/media/card# ./video_cmd -s 1 -i 640x480@YUY2 -X
root@xilinx:/media/card# ./video_cmd -d 1 &
root@xilinx:/media/card# gst-launch-1.0 \
    v4l2src device=/dev/video2 force-aspect-ratio=false ! \
    "video/x-raw, width=640, height=480, format=YUY2, framerate=3/1, pixel-aspect-ratio=4/3" ! \
    videoconvert ! \
    "video/x-raw, width=640, height=480, format=BGR, framerate=3/1, pixel-aspect-ratio=4/3" ! \
    sdxtrafficdetect ! \
    fpsdisplaysink video-sink="kmssink sync=false plane-id=29 bus-id=b00c0000.v_mix render-rectangle=\"<0,0,640,480>\"" text-overlay=true sync=false
```
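The pipeline only runs if GStreamer can find the `sdxtrafficdetect` element and the DPU libraries it links against. The slides do not say where the `lib` files from Step 9 end up on the target; assuming they stay in `/media/card/lib`, one option is to prepend that directory to `LD_LIBRARY_PATH` and `GST_PLUGIN_PATH` and check the element with `gst-inspect-1.0` before launching. `use_card_libs` is a hypothetical helper name.

```shell
# Hypothetical helper: make libraries copied to <card>/lib visible to the
# dynamic linker (LD_LIBRARY_PATH) and to GStreamer's plugin scanner
# (GST_PLUGIN_PATH). Assumes the files stayed on the card after Step 9.
use_card_libs() {
  card_dir=${1:-/media/card}
  # Prepend <card>/lib, keeping any existing value after a colon
  LD_LIBRARY_PATH="$card_dir/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  GST_PLUGIN_PATH="$card_dir/lib${GST_PLUGIN_PATH:+:$GST_PLUGIN_PATH}"
  export LD_LIBRARY_PATH GST_PLUGIN_PATH
}

# On the board, before running gst-launch-1.0:
#   use_card_libs /media/card
#   gst-inspect-1.0 sdxtrafficdetect   # exits non-zero if the element is not found
```

If the target image already installs these libraries under a system path, this step is unnecessary.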




## **Results**





*[Photo: demo setup, HDMI video input → ZCU102 → HDMI monitor (640x480)]*




#### DEMO

Accelerating Deep Learning for Vision Using CAFFE


# **Getting Started: Xilinx Tools**



- Xilinx tools & licenses available for academic use
- Local and CMC cloud installation



Xilinx SDSoC provides a comprehensive, easy-to-use application development environment for embedded C/C++ applications targeting Xilinx Zynq SoCs. The environment includes:

- A C/C++ full-system optimizing compiler
- System-level profiling
- Automated software acceleration
- Automated system connectivity generation
- Libraries to speed programming
- Support for bare metal, Linux and FreeRTOS operating systems

The SDSoC installation includes the Xilinx Vivado and Vivado HLS tools for design implementation and high-level synthesis. You can find more information on the Xilinx product page at <a href="http://www.xilinx.com/products/design-tools/software-zone/sdsoc.htm">http://www.xilinx.com/products/design-tools/software-zone/sdsoc.htm</a>.

## Getting Started: ZCU102 Zynq Ultrascale+ MPSoC Development Kit



*[Photo: ZCU102 development kit with labelled Monitor (DisplayPort) and Power connections]*

- Available for shared access at universities via the emSYSCAN CFI project: https://community.cmc.ca/community/development-systems
- Available for short-term (~6-month) loan through the CMC Equipment Pool: <u>https://www.cmc.ca/WhatWeOffer/Test/EquipmentLoan.aspx</u>

### **CMC Heterogeneous Systems**



| Accelerator       | Features                                                                      | Host Interface | Compute Performance                                                 | Power                            |
|-------------------|-------------------------------------------------------------------------------|----------------|---------------------------------------------------------------------|----------------------------------|
| Nallatech<br>385  | <ul> <li>Altera Stratix V</li> <li>Memory: 2 banks of<br/>4 GB</li> </ul>     | PCIe 3.0 x8    | Unavailable                                                         | Typical<br>application<br>≤ 25 W |
| Tesla K40         | <ul> <li>2880 CUDA cores</li> <li>Memory: 12 GB at 288 GB/s</li> </ul>        | PCIe 2.0 x16   | 4.29 TFLOPS (single<br>precision), 1.43 TFLOPS<br>(double precision) | 225 W                            |
| Xeon Phi<br>7120A | <ul> <li>61 cores, 1.33 GHz</li> <li>Memory: 16 GB at<br/>352 GB/s</li> </ul> | PCIe 2.0 x16   | Peak double precision<br>1.003 TFLOPS                               | 300 W                            |
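The Tesla K40 figures in the table can be reproduced as a back-of-the-envelope peak, assuming the K40's 745 MHz base clock, one fused multiply-add (2 FLOPs) per CUDA core per cycle, and the GK110B double-precision rate of 1/3 of single precision. These are theoretical peaks, not measured throughput.

$$
\begin{aligned}
\text{SP peak} &= 2880 \times 0.745\ \text{GHz} \times 2\ \tfrac{\text{FLOP}}{\text{cycle}} = 4291.2\ \text{GFLOPS} \approx 4.29\ \text{TFLOPS} \\
\text{DP peak} &= \tfrac{1}{3} \times 4291.2\ \text{GFLOPS} = 1430.4\ \text{GFLOPS} \approx 1.43\ \text{TFLOPS}
\end{aligned}
$$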

#### Supported platforms

- SAP: Simulation Acceleration Platform; CPU + FPGA
- MPA: Multiprocessor Array Platform; CPU + GPU or Xeon Phi
- HPP: Heterogeneous Processing Platform; MPA + SAP

## Getting Started: CMC Cloud: Unified Architecture





#### **Seamless Transition Between Environments**

- CAD: Design using the CMC Cloud desktop
- FAB: Simulate on the CAD Compute Cluster
- LAB: Prototype on the FPGA+GPU cluster

More info: <u>www.cmc.ca/cmccloud</u>

### CMC Cloud: CAD Compute Cluster



#### Speed up your simulations

- CMC engineers provide assistance with using the infrastructure, along with domain knowledge of HPC
- Documentation/reference designs available for ANSYS, COMSOL, Xilinx and more
- Uniform array available in standard and large memory configurations



#### **CAD Compute Cluster** – 8 nodes

- Dual 16-core 2.1-3.7 GHz CPUs
- 4 nodes each with 384 GB RAM
- 4 nodes each with 768 GB RAM
- 300 GB local storage
- 100 Gb/s EDR node interconnect / 10 GbE storage

More info: <u>www.cmc.ca/cmccloud</u>

## CMC Cloud: Multi-FPGA+GPU Cluster



#### CPUs, GPUs and FPGAs in a pre-validated cluster to scale heterogeneous computing workloads

- CMC engineers provide assistance with access and application best practices
- Hosted and managed by CMC as a cloud resource; accessible from your desktop
- Reference designs using a software stack for OpenCL + MPI heterogeneous cluster computing





#### **Heterogeneous Compute Cluster** – 8 nodes

- Dual 12-core 2.2-3.0 GHz CPUs
- 192 GB RAM
- 300 GB local storage
- 100 Gb/s EDR node interconnect / 10 GbE storage
- Xilinx Alveo U200 FPGA + NVIDIA V100 GPU

More info: <u>www.cmc.ca/cmccloud</u>





- Presented flow for CNN training and embedded inference using Xilinx DNNDK and Zynq Ultrascale+ MPSoC Development Kit
- Fast, streamlined flow for rapid prototyping
- Tools and equipment available through CMC

# **Future Work**



- Performance improvements:
  - DNNDK network pruning tool
  - Multi-threading/multi-DPU
- Other networks/models
- Training (Xilinx SDSoC, Zynq Ultrascale+ MPSoC, DNNDK): contact us to express interest, become a lead client!
- AI to ASIC reference design and flow







# For more information, contact:

# Hugh Pollitt-Smith, CMC Microsystems Pollitt-smith@cmc.ca