Workshop: Accelerating AI – Challenges and Opportunities in Cloud and Edge Computing, Toronto

CMC Microsystems is pleased to organize a one-day workshop highlighting the challenges and opportunities of AI acceleration from the cloud to the very edge. The same workshop is hosted twice on different days at different locations. You may attend either or both. If you are interested in another date, view the workshop in March.

This workshop aims to bring together experts from industry and academia to share their latest achievements and innovations targeting both training and inference from cloud to edge, with a focus on:

  • new architectures and approaches to accelerate deep learning (DL) workloads,
  • software stack and deep learning frameworks,
  • open-source RISC-V processor technology customized with ultra-low-power, highly specialized computing engines for DL inference at the very edge, and
  • the latest trends in AI chip design and commercialization.


Presentation Slides
8:30 to 8:50
8:50 to 9:00
Welcome and Opening Remarks
Bio: Over 15 years of experience in advanced computing systems from the cloud to the very edge, with a focus on acceleration of artificial intelligence, computer vision, video, image and sensor fusion workloads, FPGA-based prototyping, software stack, and domain-specific hardware architectures. Currently leading projects related to the specification, development, implementation, deployment, and support of the next generation of advanced computing infrastructure, mainly FPGAs, GPUs, and custom hardware for AI applications. Dr. Hariri earned his B.A.Sc. in Computer Engineering from Ecole Marocaine des Sciences de l’ingénieur, Casablanca, Morocco, in 1998, and the M.S. and Ph.D. degrees from Ecole de Technologie Supérieure (ETS), Montreal, QC, Canada, in 2002 and 2008, respectively, all in electrical engineering.
9:00 to 9:30


Deep-learning-based solutions for embedded vision have become a key application of the growing class of artificial intelligence-based solutions. Specialized accelerators for deep neural networks (DNN) have emerged to achieve the highest performance at low cost and low power. Computational requirements for DNN accelerators continue to increase, driven in particular by autonomous driving applications.

This presentation introduces some techniques for efficient scaling of DNN graph performance on multiple DNN accelerators, with a particular focus on bandwidth reduction technologies. This includes data compression, layer merging and efficient data sharing across multiple accelerators.
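
As a rough illustration of why layer merging reduces bandwidth, the sketch below counts bytes moved to and from external memory for a short chain of layers. It is a simplified model under stated assumptions, not the method presented in the talk: the layer shapes are hypothetical, weights are ignored, and merged layers are assumed to keep every intermediate feature map on chip.

```python
def feature_map_bytes(height, width, channels, bytes_per_value=1):
    """Size of one feature map in bytes (hypothetical 8-bit values by default)."""
    return height * width * channels * bytes_per_value


def traffic_unmerged(sizes):
    """Each layer reads its input from external memory and writes its output back.

    sizes[0] is the network input; sizes[i] is the output of layer i.
    """
    return sum(sizes[i] + sizes[i + 1] for i in range(len(sizes) - 1))


def traffic_merged(sizes):
    """All layers fused: only the first input and the last output leave the chip."""
    return sizes[0] + sizes[-1]


# Hypothetical two-layer chain: 64x64x3 input -> 64x64x16 -> 32x32x32.
sizes = [
    feature_map_bytes(64, 64, 3),
    feature_map_bytes(64, 64, 16),
    feature_map_bytes(32, 32, 32),
]

unmerged = traffic_unmerged(sizes)
merged = traffic_merged(sizes)
saved = unmerged - merged

print(f"unmerged: {unmerged} bytes, merged: {merged} bytes, saved: {saved} bytes")
```

In this model the saving equals twice the size of the intermediate feature map, since merging removes one write and one read of it.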


Dr. Pierre G. Paulin is Director of R&D for Embedded Vision at Synopsys. He is responsible for the application development, architecture design and S/W programming tools for embedded vision processors supporting classical computer vision and deep learning-based solutions. Prior to this, he was director of System-on-Chip Platform Automation at STMicroelectronics in Canada, working on platform programming tools for multi-processor systems-on-a-chip, targeting computer vision, video codecs and network processors.

This followed his previous positions as director of Embedded Systems Technologies for STMicroelectronics in Grenoble, France, and manager of Embedded Software and High-level synthesis tools with Nortel Networks in Canada. His interests include embedded vision, AI, video processing, multi-processor systems, and system-level design.

He obtained a Ph.D. from Carleton University, Ottawa, and B.Sc. and M.Sc. degrees from Laval University, Quebec. He won the best paper award at ISSS-Codes in 2004. He is a member of the IEEE.

9:30 to 10:00


Deep neural networks (DNN) are a key enabler for a number of current and emerging technologies. Computational requirements for DNNs can be large, inspiring research on low-precision and mixed-precision operations in place of full floating-point operations. Generic processors such as CPUs and GPUs are not necessarily optimized to compute neural networks efficiently, especially with low- and mixed-precision operations. In this presentation, recent advances in using programmable logic devices such as FPGAs to build custom accelerators for DNNs will be reviewed, notably those exploiting low precision and quantization. A new FPGA-based DNN accelerator design will also be covered, one that supports mixed-precision operations on low-bit numbers (down to 1 bit).
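
To make low-bit quantization concrete, here is a minimal sketch in plain Python. It assumes a symmetric uniform scheme for multi-bit values and a sign-plus-scale scheme for 1-bit values. These are common textbook choices, not necessarily the ones used by the accelerator described in the talk, and the function names and sample weights are hypothetical.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit integer codes."""
    if bits < 2:
        raise ValueError("use binarize() for 1-bit values")
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def binarize(values):
    """1-bit quantization: keep only the sign, plus one shared scale (mean magnitude)."""
    scale = sum(abs(v) for v in values) / len(values)
    codes = [1 if v >= 0 else -1 for v in values]
    return codes, scale


weights = [0.6, -1.0, 0.2, 0.0]             # hypothetical layer weights

codes4, scale4 = quantize_uniform(weights, bits=4)
codes1, scale1 = binarize(weights)
approx4 = dequantize(codes4, scale4)

print("4-bit codes:", codes4)
print("1-bit codes:", codes1)
```

A mixed-precision design would apply different bit widths to different layers or tensors, for example 4-bit codes for one layer and 1-bit codes for another, trading accuracy for storage and compute cost.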


Sean Wagner is a research scientist with the IBM Canada Research and Development Centre. He earned his B.A.Sc. in Computer Engineering from the University of Waterloo, and M.A.Sc. and Ph.D. from the Electrical and Computer Engineering Department of the University of Toronto. Sean specializes in high-performance computer hardware and architecture (reconfigurable and heterogeneous systems in particular), and has carried out research in photonics, nanofabrication, and communications systems. In his work with SOSCIP, he provides technical leadership to the research consortium and its industry/academic collaborative research projects, helping researchers use SOSCIP’s advanced IBM high-performance computing platforms. With research partners at SOSCIP member institutions, Dr. Wagner has helped with the development of numerous projects including accelerating real-time fMRI analysis for neuroscience applications and accelerating photodynamic cancer therapy planning software.

10:00 to 10:30


DaVinci is an AI processor architecture invented by Huawei. It is a unified architecture for neural network acceleration with best-in-class power efficiency. This keynote will provide an introduction to this architecture and the AI processors/products based on it.


Dr. Xu leads the research and development of IC architecture and algorithm design in the areas of AI, computer graphics and computer vision. He received his B.Sc. and M.Sc. in Computer Science from Tsinghua University and his Ph.D. from the University of Regina. Prior to joining Huawei, he was co-founder and CEO of Yunen Communication Inc., CTO of Reality Commerce Corp., and IC algorithm architect at Teradici Inc., working on various video and graphics processing services and products.


10:30 to 11:00
11:00 to 11:30


The computational demands of modern deep learning algorithms are increasing at a tremendous rate. AMD Radeon Instinct GPUs and AMD’s open-source ROCm software infrastructure provide an efficient, high-performance platform for training a wide range of models used in deep learning. This session will cover AMD’s current product offerings, supported software environments, and machine learning workloads which are particularly suited to GPU acceleration. There will also be a discussion of unique capabilities possible with systems based on the combination of AMD CPU technology and AMD GPU technology.


Niles Burbank leads the Solutions Architecture team in AMD’s Data Center GPU Business Unit. His team supports strategic customers in deploying AMD GPUs for machine learning, high-performance computing, cloud gaming, and virtual desktop infrastructure (VDI). Prior to his current position, Niles served in a variety of product planning and product management roles at AMD and ATI Technologies for more than twenty years. Niles holds a bachelor’s degree in engineering physics from the Royal Military College of Canada and a master’s degree in electrical engineering from the University of Toronto.

11:30 to 12:00
Abstract: AIgean, pronounced like the sea, is an open framework to build and deploy machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). We leverage two open-source projects: Galapagos, for multi-FPGA deployment, and hls4ml, for generating machine learning kernels synthesizable using Vivado HLS. We use particle detection in the physics domain to provide the first driving applications that help us to characterize the framework. We define a flexible implementation stack for ML that includes the layers of Applications & Algorithms, Cluster Deployment & Communication, and Hardware. To use AIgean, the user provides a machine learning algorithm and the resources of their cluster. Then AIgean converts the algorithm into appropriate IP cores and provides the off-chip communication between devices. We demonstrate the effectiveness of AIgean with three use cases: a small network running on a single network-connected FPGA, an autoencoder running on three FPGAs, and ResNet-50 running on five FPGAs.


Paul Chow is a professor in The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto, where he holds the Dusan and Anne Miklas Chair in Engineering Design. He was a major contributor to the early RISC processor technology developed at Stanford University that helped spawn the rapid rise of computing performance in the past 30 years. Paul helped to establish the FPGA research group at UofT and did some of the early research in FPGA architectures, applications and reconfigurable computing. Two of his papers are among the 25 selected as the most influential papers in the first 20 years of FCCM, the premier conference on reconfigurable computing. His current research focuses on reconfigurable computing with an emphasis on programming models, middleware to support programming and portability, and scaling to large-scale, distributed FPGA deployments.

12:00 to 13:00
13:00 to 13:20


For several decades now, Moore’s Law and Dennard Scaling have allowed software developers to effortlessly satisfy the increasing demand for computing power. Recent years, however, have seen the dramatic slowdown of Moore’s Law and the end of Dennard Scaling. Chip designers have thus turned to application-specific accelerators to meet the processing demands imposed by IoT, big data and AI. Nowhere is this more prevalent than in machine learning, where a huge number of accelerators have been proposed recently. While these accelerators offer significant speed-ups, they are challenging for software developers to use because maximizing performance requires specialized knowledge of the underlying hardware of each chip. The difficulty is further exacerbated by the large number of software frameworks available for ML. With new chips being released on a frequent cadence, programmers and framework developers must scramble to maximize performance on the latest ML accelerators. This situation threatens to nullify all the speed-up benefits offered by new ML accelerators. A fundamental shift is needed in how programmers work with accelerators to quickly and accurately program them to maximize performance. This is vital for ensuring we can keep pace with the rising demand for ML compute in the years to come.

Bio: Deborah is a co-founder and the machine learning lead of YetiWare. She holds an M.A.Sc. degree from the University of Toronto, specializing in Operations Research, and has more than 10 years of industry experience applying optimization and machine learning techniques.
13:20 to 13:40


Hardware systems for accelerating Artificial Intelligence (AI) and Deep Learning (DL) models have been widely studied recently, driven by a substantial need for faster execution. Researchers and industry practitioners have proposed different approaches to accelerate the heavy calculations of AI models and keep up with their continuous increase in size and complexity. Among these techniques, neuromorphic technology promises high performance, low cost, and real-time processing.

Neuromorphic technology is devoted to the design and development of computational hardware that mimics the characteristics and capabilities of neuro-biological systems. Neuromorphic systems were mostly implemented in the analog domain before hybrid and purely digital approaches emerged, the latter using Field Programmable Gate Arrays (FPGAs). FPGAs improve the flexibility of the implemented models, which helps achieve better accuracy and real-time processing. FPGAs also enable hybrid systems to be created quickly. In addition, FPGA boards can implement realistic models that incorporate the non-linearity, plasticity, excitations, and extinctions of the biological model. Several neuromorphic platforms (SpiNNaker, Loihi, TrueNorth, and others) exist and can implement large neuronal networks consisting of thousands to millions of neurons.


Idir Mellal, Ph.D., is an Industrial Postdoctoral Fellow and a specialist in hardware implementation and digital design. He is a talented FPGA implementation expert with more than 10 years of experience. His primary interest is building an effective and robust neuromorphic platform for neurocomputing and accelerating AI models. He is currently leading a project at the Krembil Research Institute, at Toronto Western Hospital, to build a digital neuromorphic platform mimicking biological neurons. Dr. Mellal earned all his degrees in electrical engineering: B.A.Sc. and M.S. degrees at Mouloud Mammeri University in Tizi Ouzou, Algeria, in 2008 and 2010, respectively, and a Ph.D. at the University of Quebec in Outaouais, Gatineau, Quebec, in 2018.

13:40 to 14:00


The democratization of DNA sequencing has been an important addition to the genomics industry and the bioinformatics research field since the market introduction of small and cheap sequencers in 2014. This development is the result of decades-long research in nanosensors and the unabated advancements in semiconductor technology. These advancements have yielded custom chips capable of integrating nanopore sensor arrays consisting of thousands of channels per square centimetre and mixed-signal CMOS circuits along with substantial computing power. However, these mobile sequencers still face serious computing challenges in terms of power, speed and memory. These are among the technological obstacles thwarting the transition of these devices into highly commoditized molecular sensors that may be economically deployed at large scales. This presentation highlights these challenges, shows our research results towards addressing them, and outlines our future work on an embedded solution that would potentially compete with existing sequencing devices.


Karim Hammad received his B.Sc. and M.Sc. degrees in Electronics and Communications Engineering from the Arab Academy for Science, Technology and Maritime Transport (AASTMT), Cairo, Egypt, in 2005 and 2009, respectively, and the Ph.D. degree in Electrical and Computer Engineering from the University of Western Ontario, Canada, in 2016. Currently, he is an Assistant Professor in the Department of Electronics and Communications Engineering at the AASTMT, Cairo, Egypt, and a Postdoctoral Visitor at York University, Toronto, Ontario, Canada. His research interests include cross-layer design for wireless networks, physical layer security and digital circuit design.

14:00 to 14:20


This session will provide a brief overview of the RISC-V instruction set architecture and describe the CORE-V family of open-source cores that implement the RISC-V ISA. RISC-V (pronounced “risk-five”) is an open, free ISA enabling a new era of processor innovation through open standard collaboration. Born in academia and research, the RISC-V ISA delivers a new level of free, extensible software and hardware freedom on architecture, paving the way for the next 50 years of computing design and innovation. Based on the original PULP Platform development at ETH Zurich, CORE-V is a series of RISC-V based open-source processor cores with associated processor subsystem IP, tools and software for electronic system designers. The CORE-V family provides quality core IP in line with industry best practices in both silicon and FPGA optimized implementations. These cores can be used to facilitate rapid design innovation and ensure effective manufacturability of production SoCs. The session will describe barriers to the adoption of open-source IP and opportunities to overcome these barriers.


Rick O’Connor is Founder and serves as President & CEO of the OpenHW Group, a not-for-profit, global organization driven by its members and individual contributors, where hardware and software designers collaborate on open-source cores, related IP, tools and software projects. The OpenHW Group CORE-V family is a series of RISC-V based open-source cores with associated processor subsystem IP, tools and software for electronic system designers. Previously, Rick was Executive Director of the RISC-V Foundation. RISC-V (pronounced “risk-five”) is a free and open ISA enabling a new era of processor innovation through open standard collaboration. Founded by Rick in 2015 with the support of over 40 Founding Members, the RISC-V Foundation currently comprises more than 235 members building an open, collaborative community of software and hardware innovators powering processor innovation. Throughout his career, Rick has continued to be at the leading edge of technology and corporate strategy and has held executive positions in many industry standards bodies. With many years of executive-level management experience in semiconductor and systems companies, Rick possesses a unique combination of business and technical skills and was responsible for the development of dozens of products accounting for over $750 million in revenue. With very strong interpersonal skills, Rick is a regular speaker at key industry forums and has built a very strong professional network of key executives at many of the largest global technology firms including Altera (now part of Intel), AMD, ARM, Cadence, Dell, Ericsson, Facebook, Google, Huawei, HP, IBM, IDT, Intel, Microsoft, Nokia, NXP, RedHat, Synopsys, Texas Instruments, Western Digital, Xilinx and many more. Rick holds an Executive MBA degree from the University of Ottawa and is an honours graduate of the faculty of Electronics Engineering Technology at Algonquin College.

14:20 to 14:40
14:40 to 14:50

A high-level overview of the extensive products and services delivered by the CAD, FAB, and LAB business units. There will be a deeper dive into the CAD offering and the infrastructure that is available to make industry-grade CAD tools and resources available to users across Canada.

Craig has been involved in engineering and IT infrastructure design and management for more than 20 years, with experience in both industrial and non-profit organizations. He will provide a high-level overview of products and services delivered by CMC to the research community, with additional detail about the industry-grade CAD tools and resources available to users across Canada.
14:50 to 15:00
Abstract: Previous industrial revolutions expanded human mechanical power. AI is expected to have a huge impact on our future, bringing another industrial revolution that will increasingly expand our mental power and cognitive abilities and create new solutions for problems that are hard or even impossible to solve today. For example, AI promises new solutions for presently intractable problems in areas like health (drug discovery, personalized medicine, medical consultation), education (personal tutor), environmental monitoring (prediction of disasters), and agriculture (optimization of production). With Moore’s law losing its steam and the growing reliance on the cloud, innovations in computing demand revolutionary changes at all levels of the computing stack: from compilers, applications and algorithms to the architecture of the datacenter, processors, microarchitectures and circuits. Some of the grand challenges for the next decade include investigating non-von Neumann architectures, reducing the gap between software and hardware development cycles, and in general empowering a broader community with the means to leverage application-specific computing hardware, bringing inference and even training for machine learning models to edge devices. This presentation describes the CMC infrastructure supporting AI acceleration from the cloud to the very edge, including:
  • CAD tools and flows for Processor design and prototyping for RISC-V and ASIPs
  • FPGA/GPU cluster for machine learning
Bio: Over 15 years of experience in advanced computing systems from the cloud to the very edge, with a focus on acceleration of artificial intelligence, computer vision, video, image and sensor fusion workloads, FPGA-based prototyping, software stack, and domain-specific hardware architectures. Currently leading projects related to the specification, development, implementation, deployment, and support of the next generation of advanced computing infrastructure, mainly FPGAs, GPUs, and custom hardware for AI applications. Dr. Hariri earned his B.A.Sc. in Computer Engineering from Ecole Marocaine des Sciences de l’ingénieur, Casablanca, Morocco, in 1998, and the M.S. and Ph.D. degrees from Ecole de Technologie Supérieure (ETS), Montreal, QC, Canada, in 2002 and 2008, respectively, all in electrical engineering.
15:00 to 16:00
Panel Session: Accelerating AI – Challenges and Opportunities in Cloud and Edge Computing
16:00 to 16:05
Closing Remarks
Yassine Hariri, CMC Microsystems

Why Attend

  1. To promote innovation, adoption and early access to advanced technologies including silicon and systems for accelerating AI workloads from cloud to the edge.
  2. To share insights and experiences with others; explore collaboration opportunities and connect leaders from industry to AI researchers and start-ups.
  3. To influence technology selection (roadmap) and development activities of emerging AI trends.
The workshop is open to professors, research associates and graduate students at Canadian universities as well as industrial attendees who wish to provide input and advice.