Workshop: Accelerating AI – Challenges and Opportunities in Cloud and Edge Computing, Montreal

CMC Microsystems is pleased to organize a one-day workshop highlighting the challenges and opportunities of AI acceleration from the cloud to the very edge. The same workshop is hosted twice on different days at different locations. You may attend either or both.

This workshop aims to bring together experts from industry and academia to share their latest achievements and innovations targeting both training and inference from cloud to edge, with a focus on:

  • new architectures and approaches to accelerate deep learning (DL) workloads,
  • software stack and deep learning frameworks,
  • the open-source RISC-V processor technology, customized with ultra-low-power, highly specialized computing engines for DL inference at the very edge, and
  • the latest trends in AI chip design and commercialization.


Presentation Slides
8:30 to 8:50
8:50 to 9:00
Welcome and Opening Remarks
Bio: Dr. Hariri has over 15 years of experience in advanced computing systems from the cloud to the very edge, with a focus on accelerating artificial intelligence, computer vision, video, image and sensor fusion workloads, FPGA-based prototyping, software stacks, and domain-specific hardware architectures. He currently leads projects related to the specification, development, implementation, deployment, and support of the next generation of advanced computing infrastructure, mainly FPGAs, GPUs, and custom hardware for AI applications. Dr. Hariri earned his B.A.Sc. in Computer Engineering from Ecole Marocaine des Sciences de l’ingénieur, Casablanca, Morocco, in 1998, and the M.S. and Ph.D. degrees from Ecole de Technologie Supérieure (ETS), Montreal, QC, Canada, in 2002 and 2008, respectively, all in electrical engineering.
9:00 to 9:30


The emergence of deep neural networks (DNNs) in recent years has enabled ground-breaking abilities and applications for modern intelligent systems. At the same time, the increasing complexity and sophistication of DNNs come at the cost of significant power consumption, model size and computing resources. These factors limit deep learning’s performance in real-time applications, in large-scale systems, and on low-power devices. Application developers, software engineers and algorithm architects must now create intelligent solutions that meet strict latency, power and computation constraints. In this talk, we will examine automated software solutions for Neural Architecture Search (NAS) and network compression that are suited to various target hardware platforms. We will look at promising new ways of using AI to help human experts design highly compact, high-performance deep neural networks for cloud and edge devices.


Ehsan is CTO and co-founder of Deeplite. He completed his Bachelor’s in Artificial Intelligence and his Ph.D. in Embedded Systems at Concordia University. During his Ph.D., he completed two internships, at SAP and Microsoft, to develop performance analysis tools and mobile platforms. He joined the incubator TandemLaunch Inc. as an Entrepreneur-in-Residence in 2017, where he assessed emerging challenges for deep learning and formed the technology vision that became the core of Deeplite.

9:30 to 10:00
Abstract: AIgean, pronounced like the sea, is an open framework to build and deploy machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). We leverage two open-source projects: Galapagos, for multi-FPGA deployment, and hls4ml, for generating machine learning kernels synthesizable using Vivado HLS. We use particle detection in the physics domain to provide the first driving applications that help us to characterize the framework. We define a flexible implementation stack for ML that includes the layers of Applications & Algorithms, Cluster Deployment & Communication, and Hardware. To use AIgean, the user provides a machine learning algorithm and the resources of their cluster. Then AIgean converts the algorithm into appropriate IP cores and provides the off-chip communication between devices. We demonstrate the effectiveness of AIgean with three use cases: a small network running on a single network-connected FPGA, an autoencoder running on three FPGAs, and ResNet-50 running on five FPGAs.


Paul Chow is a professor in The Edward S. Rogers Sr. Department of Electrical and Computer Engineering at the University of Toronto, where he holds the Dusan and Anne Miklas Chair in Engineering Design. He was a major contributor to the early RISC processor technology developed at Stanford University that helped spawn the rapid rise of computing performance in the past 30 years. Paul helped to establish the FPGA research group at UofT and did some of the early research in FPGA architectures, applications and reconfigurable computing. He authored two of the 25 papers selected as the most influential papers in the first 20 years of FCCM, the premier conference on reconfigurable computing. His current research focuses on reconfigurable computing with an emphasis on programming models, middleware to support programming and portability, and scaling to large-scale, distributed FPGA deployments.

10:00 to 10:30


We propose a generalized approach to accelerating convolutional neural network (CNN) architectures for image classification, based on the decomposition of the input image spectra into multiple subbands, and study its effect on the overall classification performance. We decompose the input image spectra into multiple critically-sampled subbands, then individually extract features per subband using CNNs and combine the subbands by fully connected layers in the overall classification. We do this by learning the subband decomposition filter from the dataset by back-propagating the error derivative through the subband decomposition filter structure. A form of structural regularization is imposed by decomposing the input image into subbands and then processing each of the subbands by a single CNN. This provides better generalization capability, achieving accuracy that surpasses the state-of-the-art results for the MNIST, CIFAR-10/100, Caltech-101 and ImageNet-2012 datasets. The subband decomposition provides a powerful tool for CNN acceleration, enabling over 90% reduction in computation cost in the inference path and approximately 75% reduction of computation in back-propagation (per iteration) with just a single layer of subband decomposition. We demonstrate a generalized technique for decomposing the input signal into various subbands and later extend the technique to a specific case where wavelet basis functions are used to decompose the input into orthogonal subbands, and study its properties. We show that the wavelet transform yields computational benefits at the cost of a minor compromise on classification accuracy. On the ImageNet-2012 dataset we achieve top-5 and top-1 validation set accuracy of 81.37% and 63.7%, respectively. With a two-layer subband decomposition, the achieved computational gains are even greater with comparable accuracy results. The subband-decomposed structure is also seen to be more robust than the regular full-band CNN architecture to weight-and-bias quantization noise and input quantization noise.
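
As a rough illustration of what a single layer of critically-sampled subband decomposition looks like, the sketch below splits an image into four half-resolution subbands using a fixed orthonormal Haar wavelet. This is an assumption made for illustration only: the abstract describes filters learned from the dataset, and the function name haar_subbands is hypothetical. Each subband has one quarter of the input samples, so a per-subband CNN would see a smaller input than the full-band image.

```python
def haar_subbands(image):
    """Split a 2-D image (list of equal-length rows with even height and
    width) into four critically-sampled subbands using a one-level
    orthonormal Haar transform. Returns (ll, lh, hl, hh)."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must be even")

    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled at this output position.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)  # local average (low-pass)
            lh_row.append((a - b + d - e) / 2)  # left minus right column
            hl_row.append((a + b - d - e) / 2)  # top minus bottom row
            hh_row.append((a - b - d + e) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    for name, band in zip(("LL", "LH", "HL", "HH"), haar_subbands(image)):
        print(name, band)
```

Because the Haar basis is orthogonal, the four subbands together carry the same energy as the input, so no information is lost by the split.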


Pavel Sinha has over 10 years of industry R&D experience. He worked in the R&D division of Qualcomm as a senior member of the team and Video ASIC Architect. Pavel also worked at Cadence as a Principal Engineer in their R&D division, designing ultra-high-speed on-silicon emulation processors. He holds Bachelor’s and Master’s degrees in Electrical and Computer Engineering and is pursuing a Ph.D. at McGill University, Montreal, Canada, specializing in Artificial Intelligence/Machine Learning and VLSI. Pavel is the chief scientist, CEO and founder of Aarish Technologies Inc., where he is involved in developing high-performance, low-power and low-cost Artificial Intelligence (AI) processors. Aarish was established in 2018 by founding members from Silicon Valley and the growing AI hub of Montreal. Pavel holds several patents and has been part of cutting-edge research teams in industry.

10:30 to 11:00
11:00 to 11:30


The rise of Deep Learning has enabled significant improvement in many AI applications, such as speech recognition, natural language processing and computer vision. Although a lot of research has been carried out in developing new deep learning models and techniques, making deep learning models computationally affordable and accessible is still a challenge. In this presentation, we will talk about how to accelerate computation in Deep Neural Networks. Specifically, we will talk about Quantization. Quantization in Deep Learning is a technique to reduce power, memory and computation time of deep neural networks. We will talk about how one can improve the performance of a DNN using both software and hardware solutions.
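
To make the idea of quantization concrete, here is a minimal sketch of uniform (affine) quantization of floating-point weights to 8-bit integers and back. The scheme, the function names quantize and dequantize, and the example weights are assumptions chosen for illustration; the talk itself may cover different quantization methods.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1] using a
    single scale and zero point for the whole list."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the integer codes."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
    codes, scale, zero_point = quantize(weights)
    restored = dequantize(codes, scale, zero_point)
    for w, q, r in zip(weights, codes, restored):
        print(f"{w:+.4f} -> {q:3d} -> {r:+.4f}")
```

Each weight is stored in one byte instead of four, at the cost of a rounding error on the order of half the scale step.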


MohammadHossein AskariHemmat is a second-year Ph.D. student in the Electrical Engineering department of Ecole Polytechnique Montreal. He is working under the supervision of Prof. Jean-Pierre David and Prof. Yvon Savaria. In his doctoral research, he investigates methods for making the computation of Deep Neural Networks more efficient. Before pursuing a Ph.D. degree, he worked for two years as an ASIC Verification Engineer for Microsemi and for a year as a Software Engineer at Tru Simulation.

11:30 to 12:00


This session will provide a brief overview of the RISC-V instruction set architecture and describe the CORE-V family of open-source cores that implement the RISC-V ISA.  RISC-V (pronounced “risk-five”) is an open, free ISA enabling a new era of processor innovation through open standard collaboration. Born in academia and research, RISC-V ISA delivers a new level of free, extensible software and hardware freedom on architecture, paving the way for the next 50 years of computing design and innovation.  Based on the original PULP Platform development at ETH Zurich, CORE-V is a series of RISC-V based open-source processor cores with associated processor subsystem IP, tools and software for electronic system designers. The CORE-V family provides quality core IP in line with industry best practices in both silicon and FPGA optimized implementations. These cores can be used to facilitate rapid design innovation and ensure effective manufacturability of production SoCs.  The session will describe barriers to the adoption of open-source IP and opportunities to overcome these barriers.


Rick O’Connor is Founder and serves as President & CEO of the OpenHW Group, a not-for-profit, global organization driven by its members and individual contributors, where hardware and software designers collaborate on open-source cores, related IP, tools and software projects. The OpenHW Group CORE-V family is a series of RISC-V based open-source cores with associated processor subsystem IP, tools and software for electronic system designers. Previously, Rick was Executive Director of the RISC-V Foundation. RISC-V (pronounced “risk-five”) is a free and open ISA enabling a new era of processor innovation through open standard collaboration. Founded by Rick in 2015 with the support of over 40 Founding Members, the RISC-V Foundation currently comprises more than 235 members building an open, collaborative community of software and hardware innovators powering processor innovation. Throughout his career, Rick has continued to be at the leading edge of technology and corporate strategy and has held executive positions in many industry standards bodies. With many years of executive-level management experience in semiconductor and systems companies, Rick possesses a unique combination of business and technical skills and was responsible for the development of dozens of products accounting for over $750 million in revenue. With very strong interpersonal skills, Rick is a regular speaker at key industry forums and has built a very strong professional network of key executives at many of the largest global technology firms including Altera (now part of Intel), AMD, ARM, Cadence, Dell, Ericsson, Facebook, Google, Huawei, HP, IBM, IDT, Intel, Microsoft, Nokia, NXP, RedHat, Synopsys, Texas Instruments, Western Digital, Xilinx and many more. Rick holds an Executive MBA degree from the University of Ottawa and is an honours graduate of the faculty of Electronics Engineering Technology at Algonquin College.

12:00 to 13:00
13:00 to 13:20


LACIME is a Level-1 research lab focusing on technologies for cyber-physical systems. Founded in 1993, it has experienced continuous growth and now brings together 15 professors and nearly 150 students and researchers. This talk will provide an overview of the exciting research currently happening at LACIME.


Professor Gagnon is strongly committed to research partnerships with industry, as shown by a series of important grants with industrial partners (Media5, SIGNUM preemptive healthcare, Ubisoft). He has been mandated as a technical expert in judicial cases pertaining to embedded systems and signal processing and acts as a senior advisor to the scientific committee of Quantum Numbers Corporation. He is currently a Board Member of ReSMiQ and the Director of the LACIME Research Laboratory, a group of 15 faculty members and 150 highly-dedicated students and researchers in microelectronics, optoelectronics, printable electronics and wireless communications. His contribution to management was recognized with the ÉTS Board of Directors’ Award of Excellence in 2016. He is now leading a cutting-edge research project to design a novel system for Capacitively-Coupled Electrocardiography involving a team of 15 researchers.

13:20 to 13:40


For several decades now, Moore’s Law and Dennard Scaling have allowed software developers to effortlessly satisfy the increasing demand for computing power. Recent years, however, have seen the dramatic slowdown of Moore’s law and the end of Dennard scaling. Chip designers have thus turned to application-specific accelerators to meet the processing demands imposed by IoT, big data and AI. Nowhere is this more prevalent than in machine learning, where a huge number of accelerators have been proposed recently. While these accelerators offer significant speed-ups, they are challenging for software developers to use as their use requires specialized knowledge of the underlying hardware of each chip to maximize performance. The difficulty is further exacerbated by the large number of software frameworks available for ML. With new chips being released on a frequent cadence, programmers and framework developers must scramble to maximize performance on the latest ML accelerators. This situation threatens to nullify all the speedup benefits offered by new ML accelerators. A fundamental shift is needed in how programmers work with accelerators to quickly and accurately program them to maximize performance. This is vital for ensuring we can keep pace with the rising demand for ML compute in the years to come.

Bio: Deborah is a co-founder and the machine learning lead of YetiWare. She holds a MASc degree from the University of Toronto, specializing in Operations Research and has more than 10 years of industry experience applying optimization and machine learning techniques.
13:40 to 14:00


DL relies on the storage of an ever-growing number of parameters, which consumes a large part of the system energy. Storage energy can be reduced by lowering the supply voltage of the memories, but at the cost of reduced storage reliability.  As the level of unreliability is voltage-dependent, we consider this as a variable which the network can optimize to find the best energy/accuracy compromise. We consider the case where the network learns an unreliability level on a per-layer basis, and show that optimal reliability assignment can lead to large energy savings at equal accuracy.
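
One simple way to picture per-layer storage unreliability is to simulate random bit flips in the stored weights, with a different flip probability for each layer. The sketch below does this for 8-bit weight codes. The independent bit-flip fault model, the function name store_unreliably, the layer names, and the probabilities are all assumptions for illustration; the abstract does not specify the fault model or how the per-layer levels are learned.

```python
import random


def store_unreliably(weights_u8, flip_prob, rng):
    """Return a copy of 8-bit weight codes in which every stored bit is
    flipped independently with probability flip_prob (assumed fault model)."""
    noisy = []
    for w in weights_u8:
        for bit in range(8):
            if rng.random() < flip_prob:
                w ^= 1 << bit  # flip this bit of the stored value
        noisy.append(w)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-layer flip probabilities: a lower supply voltage
    # would correspond to a higher probability of a faulty read.
    layers = {
        "conv1": ([12, 200, 37, 90], 0.001),
        "fc1": ([5, 250, 128, 64], 0.05),
    }
    for name, (weights, flip_prob) in layers.items():
        print(name, store_unreliably(weights, flip_prob, rng))
```

In a training loop, noise of this kind would be injected into each layer at its own level so that the accuracy impact of a given reliability assignment can be measured.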


Sébastien Henwood is a Ph.D. student at the Electrical Engineering Department of Polytechnique Montréal under the supervision of François Leduc-Primeau and Yvon Savaria with interests in machine learning, optimization and the design of efficient systems and algorithms. François Leduc-Primeau is an assistant Professor at Polytechnique Montréal with research interests in digital systems and telecommunications for the purpose of increasing the energy efficiency in signal processing and telecommunication systems. Yvon Savaria is a full professor at Polytechnique Montréal and is the director of the Microelectronics and Microsystems Research Group.

14:00 to 14:10
Abstract: A high-level overview of the extensive products and services delivered by the CAD, FAB, and LAB business units. There will be a deeper dive into the CAD offering and the infrastructure that is available to make industry-grade CAD tools and resources available to users across Canada.
Bio: Owain has worked in IT and Engineering at CMC Microsystems in Kingston, Ontario for 11 years. Prior to that, he worked at IBM Canada for 2 years. He earned a Master of Engineering in Electrical and Computer Engineering at Queen’s University and a Bachelor of Science in Computing Science at the University of Alberta. Owain holds a Certified Information Systems Security Professional (CISSP) designation.
14:10 to 14:20
14:20 to 14:30
Abstract: Previous industrial revolutions expanded humans’ mechanical power. AI is expected to have a huge impact on our future, bringing another industrial revolution that will increasingly expand our mental power and cognitive abilities and create new solutions for problems that are currently hard or even impossible to solve. For example, AI promises new solutions for presently intractable problems in areas like health (drug discovery, personalized medicine, medical consultation), education (personal tutor), environmental monitoring (prediction of disasters), and agriculture (optimization of production). With Moore’s Law losing steam and the growing reliance on the cloud, new innovations in computing demand revolutionary changes at all levels of the computing stack: from compilers, applications and algorithms to the architecture of the datacenter, processors, microarchitectures and circuits. Some of the grand challenges for the next decade include investigating non-von-Neumann architectures, reducing the gap between software and hardware development cycles and, in general, empowering a broader community with the means to leverage application-specific computing hardware, bringing inference and even training for machine learning models to edge devices. This presentation describes the CMC infrastructure that supports AI acceleration from the cloud to the very edge, including:
  • CAD tools and flows for Processor design and prototyping for RISC-V and ASIPs
  • FPGA/GPU cluster for machine learning
14:30 to 15:30
Panel Session: Accelerating AI – Challenges and Opportunities in Cloud and Edge Computing


  • Ehsan Saboori, CTO and Co-founder of Deeplite
  • Paul Chow, Professor at University of Toronto
  • Deborah Guillon, Co-founder and the Machine Learning Lead of YetiWare
  • Pavel Sinha, Ph.D. student at McGill University, Montreal, Canada
  • MohammadHossein AskariHemmat, Ph.D. Student at Ecole Polytechnique of Montréal
15:30 to 15:40
Closing Remarks
Yassine Hariri, CMC Microsystems

Why Attend

  1. To promote innovation, adoption and early access to advanced technologies including silicon and systems for accelerating AI workloads from cloud to the edge.
  2. To share insights and experiences with others; explore collaboration opportunities and connect leaders from industry to AI researchers and start-ups.
  3. To influence technology selection (roadmap) and development activities of emerging AI trends.
The workshop is open to professors, research associates and graduate students at Canadian universities as well as industrial attendees who wish to provide input and advice.
