[Tutorial #1] 09:00-10:30
Training Spiking Neural Networks Using Lessons from Deep Learning Dr. Jason K. Eshraghian Biography
Jason Eshraghian is a Post-Doctoral Researcher at the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, and a Forrest Research Fellow with the School of Computer Science, University of Western Australia. He received the Bachelor of Engineering (Electrical and Electronic) and Bachelor of Laws degrees from The University of Western Australia, WA, Australia in 2016, where he also completed his Ph.D. degree. He is the developer of snnTorch, an accelerated deep learning framework for spiking neural networks, and he was awarded the 2019 IEEE VLSI Best Paper Award, the Best Paper Award at the 2019 IEEE AICAS Conference, and the Best Live Demonstration Award at the 2020 IEEE ICECS for his work on neuromorphic vision and in-memory computing using RRAM. He currently serves as the secretary-elect of the IEEE Neural Systems and Applications Committee, and was a recipient of the Fulbright Future Fellowship (Australian-American Fulbright Commission), the Forrest Research Fellowship (Forrest Research Foundation), and the Endeavour Fellowship (Australian Government). Abstract
The brain is the perfect place to look for inspiration to develop more efficient neural networks. While the computational cost of deep learning exceeds millions of dollars to train large-scale language models, our brains are somehow equipped to process an abundance of signals from our sensory periphery, to provide feedback control to motor systems, and to ensure our involuntary biological processes do not shut down, all within a power budget of approximately 10-20 watts. One of the main differences from modern deep learning is that the brain encodes and processes information as spikes rather than continuous, high-precision activations. The dominant cost of deep learning hardware accelerators arises from regular memory access and data communication, but distributing information in the temporal domain in the form of sparse spiking activity has demonstrated 100-1000x reductions in energy consumption for deep learning workloads. This tutorial will dive into the intersection of neuroscience, deep learning, and hardware acceleration, and how spanning these layers of abstraction can drive forward the next generation of deep learning algorithms. We will explore how to adopt basic principles of neurobiology into modern deep learning algorithms trained via error backpropagation. Several practical case studies of spike-based processing will be explored, from front-end event-based sensing to back-end processing, and how all of these concepts come together to enable ultra-efficient cognitive computation. The tutorial will conclude with a hands-on coding session and an overview of best practices for ensuring optimal performance of spike-based algorithms on neuromorphic hardware.
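To give a flavor of the training approach the tutorial covers, the sketch below shows a leaky integrate-and-fire neuron with a surrogate gradient written in plain PyTorch. It is a minimal, hypothetical illustration only: the class names, constants, and placeholder loss are invented for this example, and the hands-on session uses the speaker's snnTorch framework rather than this code.

```python
import torch
import torch.nn as nn

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; smooth surrogate gradient in the backward pass."""
    @staticmethod
    def forward(ctx, mem_minus_thr):
        ctx.save_for_backward(mem_minus_thr)
        return (mem_minus_thr > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (mem_minus_thr,) = ctx.saved_tensors
        # Fast-sigmoid-style surrogate: d(spike)/d(mem) ~ 1 / (1 + |mem - thr|)^2
        return grad_output / (1.0 + mem_minus_thr.abs()) ** 2

class LIFNeuron(nn.Module):
    """Leaky integrate-and-fire layer: decays the membrane potential, adds input current,
    and emits a spike with a soft reset when the threshold is crossed."""
    def __init__(self, beta=0.9, threshold=1.0):
        super().__init__()
        self.beta, self.threshold = beta, threshold

    def forward(self, input_current, mem):
        mem = self.beta * mem + input_current          # leaky integration
        spk = SurrogateSpike.apply(mem - self.threshold)
        mem = mem - spk * self.threshold               # soft reset after a spike
        return spk, mem

# Unroll over time: a tiny fully connected spiking layer on random input spike trains.
fc, lif = nn.Linear(100, 10), LIFNeuron()
mem = torch.zeros(1, 10)
spikes_out = []
for t in range(25):                                    # 25 simulation time steps
    x_t = (torch.rand(1, 100) < 0.2).float()           # sparse binary input spikes
    spk, mem = lif(fc(x_t), mem)
    spikes_out.append(spk)
loss = torch.stack(spikes_out).sum()                   # placeholder loss; backprop through time works
loss.backward()
```

The surrogate gradient is what lets error backpropagation flow through the non-differentiable spiking threshold, which is the central trick that connects spiking networks to standard deep learning training.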
[Tutorial #2] 09:00-10:30
Accelerator System Design Challenges from Real-time and Multi-DNN Workloads Dr. Hyoukjun Kwon Biography
Hyoukjun Kwon is a research scientist at Reality Labs, Meta. He received his PhD degree in Computer Science from the Georgia Institute of Technology in 2020. His primary research area is computer architecture. His research interests include AI accelerator architecture, accelerator dataflow optimization and modeling, interconnection networks, AI model-compiler-accelerator co-design, and design automation. His flexible dataflow AI accelerator architecture, MAERI (ASPLOS 2018), received an honorable mention at the IEEE Top Picks in Computer Architecture Conferences in 2018. His data-centric approach to modeling and analyzing accelerator dataflow, MAESTRO, was selected as one of the IEEE Top Picks in Computer Architecture Conferences in 2019. His thesis work on a data-centric approach to designing AI accelerators was recognized with an honorable mention for the ACM SIGARCH/IEEE CS TCCA Outstanding Dissertation Award. Abstract
Many platforms, from data centers to edge devices, have started to deploy multiple DNNs to provide high-quality results for diverse applications in domains such as computer vision, speech, and language. Unlike the single-task applications that were dominant in the past, emerging applications such as AR/VR deploy multi-DNN pipelines that perform different tasks to enable more complex functionality. Such workloads are also often dynamic, where some models are activated based on another model’s output (e.g., a keyword detection to speech recognition pipeline). In addition, some models in these applications need to run constantly at a target processing rate (e.g., object detection on an autonomous vehicle). These features of emerging workloads introduce new design challenges for accelerator systems. This tutorial will introduce such real-time and multi-DNN workloads from emerging applications and discuss their unique features and the challenges they pose to accelerator systems. As a case study, the tutorial will also present our recent work, Herald, which proposed an accelerator architecture tailored for real-time multi-DNN workloads.
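To make the dynamic-pipeline behavior concrete, here is a minimal, hypothetical sketch of a keyword-detection stage gating a heavier speech-recognition model. The stage names, threshold, and lambda "models" are invented for illustration and are not taken from Herald or the tutorial.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    """One DNN in a multi-DNN pipeline, with a gate deciding whether it runs."""
    name: str
    run: Callable[[object], object]          # the model's inference function
    gate: Callable[[object], bool]           # activation condition on the previous output

def run_pipeline(stages: List[Stage], frame):
    """Run stages in order; later stages execute only if their gate fires.
    This data-dependent activation is what complicates static accelerator scheduling."""
    out = frame
    for stage in stages:
        if not stage.gate(out):
            return out                       # e.g., no keyword detected -> skip speech recognition
        out = stage.run(out)
    return out

# Illustrative stand-ins for real DNNs.
keyword_detector = Stage("kws", run=lambda audio: {"keyword_prob": 0.93, "audio": audio},
                         gate=lambda _: True)                       # always-on, must meet a frame rate
speech_recognizer = Stage("asr", run=lambda d: {"text": "turn on the lights"},
                          gate=lambda d: d["keyword_prob"] > 0.9)   # runs only when a keyword is likely

result = run_pipeline([keyword_detector, speech_recognizer], frame=b"\x00" * 320)
print(result)
```

Because the second stage may or may not run on any given frame, the accelerator's utilization and latency vary at run time, which is exactly the scheduling problem the tutorial discusses.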
[Tutorial #3] 10:45-12:15
Deep Neural Network Training Processor Design Prof. Dongsuk Jeon Biography
Dongsuk Jeon received the B.S. degree in electrical engineering from Seoul National University, Seoul, South Korea, in 2009, and the Ph.D. degree in electrical engineering from the University of Michigan at Ann Arbor, Ann Arbor, MI, USA, in 2014. From 2014 to 2015, he was a Postdoctoral Associate with the Massachusetts Institute of Technology, Cambridge, MA, USA. He is currently an Associate Professor with the Graduate School of Convergence Science and Technology, Seoul National University. His current research interests include hardware-oriented machine learning algorithms, hardware accelerators, and low-power circuits. Abstract
Deep learning algorithms have gathered significant attention due to their outstanding performance in various tasks. Their application areas are rapidly expanding from computer vision and speech recognition to multi-modal understanding. While power-saving techniques such as quantization, network compression, and pruning have been successfully adopted for pre-trained models, they are often far less effective when applied to the training process. This talk will discuss various algorithmic and hardware optimization techniques enabling energy-efficient training processors. While backpropagation is a very powerful tool for gradient computation, its computational graph does not match the human brain very well, and it requires high-precision computations for reliable training convergence. In the second part of the talk, alternative algorithms based on neuromorphic computing for training deep neural networks will be presented. Recent advances in neuromorphic computing will be discussed, followed by design examples of neuromorphic processors.
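The precision sensitivity of training can be illustrated with a toy experiment. The sketch below is hypothetical and not from the talk: the quantizer, bit-widths, and problem sizes are invented so that the same tiny regression model can be trained with full-precision and aggressively quantized gradients and the final losses compared.

```python
import torch

def quantize(t, bits=4):
    """Uniform symmetric quantization of a tensor to the given bit-width (illustrative only)."""
    scale = t.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    return torch.round(t / scale) * scale

# Tiny regression problem trained twice: full-precision vs. 4-bit quantized gradients.
torch.manual_seed(0)
X = torch.randn(256, 16)
y = X @ torch.randn(16, 1)                                 # a learnable linear target
for grad_bits in (32, 4):
    model = torch.nn.Linear(16, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(200):
        loss = torch.nn.functional.mse_loss(model(X), y)
        opt.zero_grad()
        loss.backward()
        if grad_bits < 32:
            for p in model.parameters():
                p.grad = quantize(p.grad, grad_bits)       # crush gradient precision before the update
        opt.step()
    print(f"{grad_bits}-bit gradients -> final loss {loss.item():.4f}")
```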
[Tutorial #4] 10:45-12:15
IEEE Low-Power Computer Vision Challenge Prof. Zhangyang “Atlas” Wang Biography
Professor Zhangyang “Atlas” Wang is currently the Jack Kilby/Texas Instruments Endowed Assistant Professor in the Department of Electrical and Computer Engineering at The University of Texas at Austin. He also holds a visiting researcher position at Amazon. He was an Assistant Professor of Computer Science and Engineering at Texas A&M University from 2017 to 2020. He received his Ph.D. degree in ECE from UIUC in 2016, advised by Professor Thomas S. Huang, and his B.E. degree in EEIS from USTC in 2012. Prof. Wang has broad research interests in machine learning, computer vision, optimization, and their interdisciplinary applications. Most recently, he studies automated machine learning (AutoML), learning to optimize (L2O), robust learning, efficient learning, and graph neural networks. His research is supported by NSF, DARPA, ARL, ARO, IARPA, and DOE, as well as dozens of industry and university grants. He is/was an elected technical committee member of IEEE MLSP and IEEE CI, and an associate editor of IEEE TCSVT (in which capacity he received the 2020 Best Associate Editor Award); he frequently serves as an area chair, guest editor, invited speaker, panelist, and reviewer. He has received many research awards and scholarships, including most recently an ARO Young Investigator Award, an IBM Faculty Research Award, a J.P. Morgan Faculty Research Award, an Amazon Research Award (AWS AI), an Adobe Data Science Research Award, a Young Faculty Fellow award from TAMU, and five research competition prizes from CVPR/ICCV/ECCV. Abstract
Low-Power Computer Vision: Algorithms and Practice. In this talk, I will introduce our team’s winning solution for the IEEE Low-Power Computer Vision (LPCV) Challenge 2021, Track 1 (UAV video). The track targets detecting and tracking multiple moving objects in video captured by an unmanned aerial vehicle (UAV), and the algorithmic solution needs to be both accurate and energy-efficient (as measured on a Raspberry Pi 3B+). The task poses challenges for reducing temporal-spatial redundancy at all three levels: data, model, and hardware. I will describe the efforts our team made to optimize each aspect, and how the three levels work together as an efficient and effective pipeline in the wild.
Abhinav Goel Purdue University, USA Biography
Abhinav Goel is a Senior Deep Learning Architect at NVIDIA. He expects to receive his PhD degree from the Elmore Family School of Electrical and Computer Engineering at Purdue University in May 2022. His primary research focus is efficient computer vision training and inference systems. Abstract
Realizing a 5X Computer Vision Inference Speedup with PyTorch. In this session, I will conduct an interactive demonstration of popular low-power computer vision techniques for embedded devices. The techniques covered will include (a) parameter quantization, (b) neural network pruning, and (c) knowledge distillation. Starting from a state-of-the-art “unoptimized” deep neural network, we will apply the different techniques in PyTorch to observe reductions in latency, energy, and memory requirements. I will also discuss how the various techniques can be combined and describe the accuracy-efficiency tradeoff. Before concluding, I will explain the open problems in the area and highlight directions for future research.
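The three techniques map onto standard PyTorch utilities. The sketch below is a minimal, hypothetical example on a toy network: the model, layer sizes, hyperparameters, and the `distillation_loss` helper are invented for illustration, whereas the session applies the same ideas to a state-of-the-art model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# A small stand-in network (the tutorial starts from a state-of-the-art model instead).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# (a) Parameter quantization: dynamic int8 quantization of the Linear layers.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# (b) Pruning: remove the 50% smallest-magnitude weights of the first Linear layer.
prune.l1_unstructured(model[1], name="weight", amount=0.5)
prune.remove(model[1], "weight")            # make the sparsity permanent

# (c) Knowledge distillation: train the small model to mimic a larger teacher.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend softened teacher targets with the ordinary hard-label loss."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
with torch.no_grad():
    t_logits = teacher(x)
loss = distillation_loss(model(x), t_logits, y)
loss.backward()
```

In practice the three steps are combined (e.g., distill into a smaller architecture, then prune and quantize it), which is where the accuracy-efficiency tradeoff discussed in the session arises.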
George K. Thiruvathukal Loyola University Chicago, USA Biography
George K. Thiruvathukal is a professor of computer science at Loyola University Chicago in Chicago, IL, USA and a visiting computer scientist at Argonne National Laboratory in the Leadership Computing Facility. He directs the Software and Systems Laboratory at Loyola University Chicago. His research areas include software engineering for science and machine learning; low-power computer vision; computational neuroscience; crowdsourcing to aid in classification and machine learning; platform studies/gaming studies; computing education; computing history; and digital music. Abstract
Machine Learning Reproducibility: Guidance for Practitioners. Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFlow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). This talk defines a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.
[Tutorial #5] 13:15-14:45
Ultra-Low Power Biomedical AI Processor Design for Wearable Intelligent Health Monitoring Devices Prof. Jun Zhou Biography
Dr. Jun Zhou is a professor and the director of the Smart ICs and Systems for AIoT group at the University of Electronic Science and Technology of China. He previously worked at IMEC and the Institute of Microelectronics, A*STAR, Singapore for nearly 10 years. His main research interest is low-power processor design for intelligent sensing applications. He has published more than 80 papers in prestigious conferences and journals including ISSCC, JSSC, CICC, TCAS-I and DAC, and has received several awards including the Seoul Chapter Award from the IEEE Circuits and Systems Society and the Award for Technological Invention from the Chinese Association for Artificial Intelligence. He is a senior member of IEEE and currently serves as the chair of the Digital Circuits & Systems Sub-committee of the Asian Solid-State Circuits Conference, an Associate Editor of the IEEE Transactions on Biomedical Circuits and Systems (TBioCAS), and an Associate Editor of the IEEE Transactions on VLSI Systems (TVLSI). Abstract
Wearable intelligent health monitoring devices can detect abnormalities in users’ biomedical signals and generate real-time alerts. Compared with general health monitoring equipment in hospitals, they can be used for long-term health monitoring with automatic abnormality detection that does not involve a doctor, which makes them well suited to health monitoring at home. A key component of a wearable intelligent health monitoring device is the biomedical AI processor. By embedding biomedical AI processing capability in the wearable device, real-time abnormality detection with low data transmission can be achieved. A major challenge of biomedical AI processor design is that the AI computation involves high computational complexity, and hence high power consumption, which is not suitable for size- and energy-constrained wearable devices. Reducing the power consumption of the biomedical AI processor while achieving high accuracy and real-time performance is therefore an urgent need. This talk introduces design techniques combining algorithm, architecture, circuit and device to address these challenges.
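As a rough illustration of why on-device detection reduces data transmission, the hypothetical sketch below transmits only the signal windows that a local classifier flags as abnormal. The window size, decision threshold, and the stand-in "model" are invented for this example and are not from the talk.

```python
import numpy as np

def detect_abnormal(window, model):
    """Run the on-device classifier on one signal window; return True if it looks abnormal."""
    prob_abnormal = model(window)
    return prob_abnormal > 0.8                      # illustrative decision threshold

def monitor(stream, model, transmit):
    """Only abnormal windows are transmitted, so the radio (a major power cost) stays mostly idle."""
    for window in stream:
        if detect_abnormal(window, model):
            transmit(window)                        # e.g., send an alert with the raw segment

# Toy stand-ins: 2-second windows at 250 Hz and a fake "model" that flags large excursions.
rng = np.random.default_rng(0)
stream = (rng.standard_normal(500) for _ in range(10))
fake_model = lambda w: float(np.abs(w).max() > 3.5)
monitor(stream, fake_model, transmit=lambda w: print("alert: abnormal segment transmitted"))
```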
Prof. Liang Chang University of Electronic Science and Technology of China, China Biography
Dr. Liang Chang received the Ph.D. degree from Beihang University, Beijing, China. Since 2020, he has been an Associate Professor at the School of Information and Communication Engineering, University of Electronic Science and Technology of China. His research interests include computing in emerging nonvolatile memory, advanced memory-centric computer architecture, and AI processors for intelligent detection. He has co-authored more than 30 scientific papers in venues including ISSCC, CICC, A-SSCC, MICRO, TCAS-I, TC, ICCAD, and DATE. He is a regular reviewer for IEEE TCAS-I/II, IEEE TBioCAS, IEEE TC, and IEEE TVLSI.
[Tutorial #6] 13:15-14:45
Compute-in-Memory Processors: A Cross-layer Approach Prof. Yongpan Liu Biography
Yongpan Liu received the B.S., M.S., and Ph.D. degrees from the Electronic Engineering Department, Tsinghua University, Beijing, China, in 1999, 2002, and 2007, respectively. He is currently a Full Professor with the Department of Electronic Engineering, Tsinghua University. Prof. Liu is a Program Committee Member for ISSCC, A-SSCC and DAC. He has received the Under-40 Young Innovators Award at DAC 2017, Best Paper/Poster Awards from ASP-DAC 2021 and 2017, Micro Top Picks 2016, HPCA 2015, and Design Contest Awards of ISLPED in 2012, 2013 and 2019. He served as General Secretary for ASP-DAC 2021 and Technical Program Chair for NVMSA 2019. He was an Associate Editor of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the IEEE Transactions on Circuits and Systems II, and IET Cyber-Physical Systems. He is an IEEE Senior Member. He was an A-SSCC 2020 tutorial speaker and an IEEE CASS Distinguished Lecturer in 2021. Abstract
Computing-in-memory (CIM) processors are quite promising for improving the energy efficiency of ML applications. Plenty of CIM circuit designs based on SRAM, DRAM and RRAM are emerging, demonstrating clear advantages over digital counterparts at the device and macro level. Recently, how to exploit CIM in system-level processor design has been attracting more and more attention. This tutorial will introduce CIM processor design from a holistic perspective, covering device-, macro- and system-level techniques for building programmable and scalable CIM processors. Hardware and software co-design techniques will also be covered to support realistic ML applications, including considerations of device/circuit non-idealities, system architecture, and model structures suitable for CIM processors with compression techniques such as pruning and quantization.
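As a back-of-the-envelope illustration of the non-idealities that such co-design has to absorb, the hypothetical NumPy sketch below compares an ideal digital matrix-vector product with one computed through quantized weights, conductance variation, and a finite-resolution ADC. The bit-widths, noise level, and function names are invented for this example and are not from the tutorial.

```python
import numpy as np

def quantize_to_levels(w, bits=4):
    """Map trained weights onto the discrete conductance levels a CIM cell can store."""
    half_levels = 2 ** (bits - 1) - 1
    w_max = np.abs(w).max()
    return np.round(w / w_max * half_levels) / half_levels * w_max

def cim_matvec(w, x, sigma=0.02, adc_bits=6):
    """Analog-style matrix-vector multiply: programmed weights carry device variation,
    and the per-column (bit-line) sums pass through a finite-resolution ADC."""
    w_prog = w * (1 + np.random.randn(*w.shape) * sigma)   # conductance variation
    acc = w_prog.T @ x                                     # bit-line current summation
    acc_max = np.abs(acc).max() + 1e-9
    codes = np.round(acc / acc_max * (2 ** (adc_bits - 1) - 1))
    return codes / (2 ** (adc_bits - 1) - 1) * acc_max     # quantized read-out

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 16)) * 0.1                   # one layer's weight tile
x = rng.standard_normal(128)
ideal = w.T @ x
cim = cim_matvec(quantize_to_levels(w), x)
print("relative error vs. digital:", np.linalg.norm(cim - ideal) / np.linalg.norm(ideal))
```

Pruning and quantization interact with these error sources, which is why the tutorial treats model compression and CIM macro design together rather than in isolation.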
[Tutorial #7] 15:00-16:30
Event-driven bio-inspired audio sensor front end for edge TinyML Prof. Shih-Chii Liu Biography
Shih-Chii Liu co-directs the Sensors group (http://sensors.ini.uzh.ch) at the Institute of Neuroinformatics, University of Zurich and ETH Zurich. Her group works on event-driven deep networks and bio-inspired auditory sensors, the real-time implementation of intelligent hardware systems with state-of-the-art power efficiency, latency, and throughput, and applications in machine learning tasks including speech recognition. Abstract
Voice-activated wake-up functions such as voice activity detection (VAD) and keyword spotting (KWS) are prevalent in always-on edge audio devices. This tutorial will cover design trends in power-efficient analog signal processing circuits for edge audio tasks. These continuous-time (CT) feature extraction circuits are inspired by the biological cochlea, building an architecture of frequency-selective channels, each comprising a band-pass filter (BPF), a rectifier and a spike generator. Starting with an early generation of bio-inspired audio feature extractors, we will introduce a new figure-of-merit (FoM) to compare state-of-the-art designs. We will give a demonstration example that combines a bio-inspired spiking cochlea front end with a deep-neural-network (DNN) classifier to show end-to-end audio inference.
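To sketch the per-channel signal chain in software terms, the hypothetical example below passes audio through a band-pass filter, half-wave rectifier, and integrate-and-fire spike generator. The filter order, threshold, and tone frequencies are invented for illustration; the actual front end is an analog circuit, not this code.

```python
import numpy as np
from scipy.signal import butter, lfilter

def cochlea_channel(audio, fs, f_low, f_high, thresh=0.01):
    """One frequency-selective channel: band-pass filter -> rectifier -> integrate-and-fire
    spike generator, mimicking the BPF/rectifier/spike-generator chain of a silicon cochlea."""
    b, a = butter(2, [f_low, f_high], btype="band", fs=fs)
    band = lfilter(b, a, audio)                 # band-pass filtering
    rectified = np.maximum(band, 0.0)           # half-wave rectification
    spikes, acc = [], 0.0
    for t, sample in enumerate(rectified):      # integrate-and-fire spike generation
        acc += sample / fs
        if acc >= thresh:
            spikes.append(t)                    # emit an event (spike time index)
            acc = 0.0
    return np.array(spikes)

# Toy input: a 1 kHz tone; a channel tuned around 1 kHz fires, an off-band channel barely does.
fs = 16000
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
print("in-band spikes:", len(cochlea_channel(tone, fs, 800, 1200)))
print("off-band spikes:", len(cochlea_channel(tone, fs, 3000, 4000)))
```

The resulting sparse event streams are what the back-end DNN classifier consumes in the end-to-end demonstration.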
Dr. Kwantae Kim University of Zurich and ETH Zurich, Switzerland Biography
Kwantae Kim received his B.S., M.S., and Ph.D. degrees from the School of Electrical Engineering, KAIST. He is currently a postdoctoral researcher at the Institute of Neuroinformatics, University of Zurich and ETH Zurich. His research interests include analog/mixed-signal integrated circuits (ICs) for bio-impedance sensors, neuromorphic audio sensors, and time-domain processing.
[Tutorial #8] 15:00-16:30
In-memory Computing Circuit Design for Neural Network Acceleration Prof. Shyh-Shyuan Sheu Biography
Shyh-Shyuan Sheu received the Ph.D. degree from the Department of Electrical Engineering, National Central University, Chungli, Taiwan, in 2012. He joined ITRI in 2003 to develop emerging semiconductor technologies and circuits. To date, he has published over 20 journal papers and 40 conference papers and holds 70 patents. He is currently the division director of the Chip Technology and Design Division, Electronics and Optoelectronic System Research Lab (EOSL), Industrial Technology Research Institute (ITRI) in Hsinchu, Taiwan. His research involves memory, emerging AI architectures and circuits (computing-in-memory), neuromorphic computing, 3D ICs, display drivers, and sensors. Abstract
AI-with-IoT (AIoT) applications are appearing widely in everyday life, such as self-driving cars and smart doorbells. These applications show that AI is moving from the cloud to edge devices. However, edge devices for AI face the challenges of low-to-medium computing power, small form factor, and long battery lifetime, all of which come down to the limits of computing energy efficiency. The conventional computer architecture (the von Neumann machine) is not efficient due to its memory bandwidth limit and frequent data movement. A new computing architecture, in-memory computing (IMC), has been proposed to improve computing efficiency. IMC can compute (multiply and accumulate) directly in memory to avoid data movement between the CPU and memory. However, IMC still has many challenges to overcome, such as linearity, throughput, and energy and area efficiency. The design challenges of IMC will be discussed first; these challenges also need to take system requirements into account. Then the characteristics of memory devices for IMC will be introduced; based on these characteristics, the designer can select a suitable memory device for weight storage. Finally, previous IMC macro designs will be analyzed and discussed.
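To make the "compute in memory to avoid data movement" idea concrete, the hypothetical NumPy sketch below models an IMC macro whose bit lines accumulate products of stored weights with binary word-line inputs, with multi-bit activations applied bit-serially. The array size, bit-widths, and function names are invented for this example and do not describe any particular macro.

```python
import numpy as np

def imc_macro_mac(weight_array, input_bits):
    """Conceptual IMC macro operation: binary inputs drive the word lines, stored weights act
    as cell conductances, and each bit line accumulates the products of its column in place."""
    activated_rows = weight_array[input_bits.astype(bool)]   # rows whose word line is ON
    return activated_rows.sum(axis=0)                        # per-bit-line (column) accumulation

def multibit_input_mac(weight_array, x, input_bits=4):
    """Multi-bit inputs applied bit-serially: one word-line pass per input bit, then shift-and-add."""
    x_int = np.clip(np.round(x * (2 ** input_bits - 1)), 0, 2 ** input_bits - 1).astype(int)
    result = np.zeros(weight_array.shape[1])
    for b in range(input_bits):
        bit_plane = (x_int >> b) & 1
        result += imc_macro_mac(weight_array, bit_plane) * (2 ** b)
    return result / (2 ** input_bits - 1)

rng = np.random.default_rng(1)
W = rng.integers(-7, 8, size=(64, 32)).astype(float)   # 4-bit signed weights stored in the array
x = rng.random(64)                                     # activations in [0, 1)
# The in-memory result matches a digital MAC on the quantized activations.
print(np.allclose(multibit_input_mac(W, x), W.T @ (np.round(x * 15) / 15)))
```

In a real macro the column accumulation happens as analog current or charge summation and must be digitized by an ADC, which is where the linearity, throughput, and energy/area challenges listed above enter.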