|Seoul National University
|Presenter : Jeongwoo Park
Title : A low power neural network training processor with 8-bit floating point with a shared exponent bias and fused multiply add trees
This paper presents an 8-bit floating point training processor for state-of-the-art non-sparse neural networks. The processor implements hardware design techniques such as n-way fused multiply-add trees with lossless representations, flexible 2-D routing schemes, and hardware-based memory prefetching. When trained from scratch, the implemented training scheme achieves 69.0% ResNet-18 Top-1 accuracy on ImageNet, and matches the performance of full-precision baseline models in other tasks such as image super-resolution and language modeling. In ResNet-18 training, the fabricated processor requires 43% less memory access because it accesses external memory with only 8-bit tensors, and requires only 37% of the DRAM memory space compared to prior work. Fabricated in 40 nm LPCMOS technology, the measured processor achieves 4.81 TFLOPS/W energy efficiency and 2.48× higher training efficiency than prior work. The functionality and merits of the processor will be shown in real time in this live demonstration, where 4 tasks will run on the fabricated processor, complete with a host PC controller, an FPGA serving as a memory bridge, and an interactive GUI.
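As a rough illustration of what an 8-bit floating point format with a shared exponent bias could look like, the sketch below quantizes a small tensor using one bias for all of its elements. The bit split (1 sign, 4 exponent, 3 mantissa), the bias selection rule, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the processor's actual number format.

```python
import math

# Assumed bit split (1 sign, 4 exponent, 3 mantissa); the abstract does not state it.
EXP_BITS = 4
MAN_BITS = 3
MAX_EXP_CODE = 2**EXP_BITS - 1


def floor_log2(magnitude):
    """Exponent e such that 2**e <= magnitude < 2**(e + 1), for magnitude > 0."""
    _, e = math.frexp(magnitude)  # magnitude = m * 2**e with 0.5 <= m < 1
    return e - 1


def choose_shared_bias(values):
    """Hypothetical rule: one bias per tensor, so the largest magnitude uses the top exponent code."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0
    return MAX_EXP_CODE - floor_log2(max_abs)


def quantize(value, bias):
    """Round value to the nearest number representable with the shared bias."""
    if value == 0.0:
        return 0.0
    sign = -1.0 if value < 0 else 1.0
    magnitude = abs(value)
    # Stored exponent codes 0..MAX_EXP_CODE map to real exponents code - bias.
    exp_min, exp_max = -bias, MAX_EXP_CODE - bias
    exp = max(exp_min, min(floor_log2(magnitude), exp_max))
    step = 2.0 ** (exp - MAN_BITS)  # spacing between neighbours at this exponent
    largest = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** exp_max  # saturation limit
    return sign * min(round(magnitude / step) * step, largest)


tensor = [0.5, -0.3, 0.01, 0.0]
bias = choose_shared_bias(tensor)
quantized = [quantize(v, bias) for v in tensor]
```

Because the bias is stored once per tensor rather than per element, each element still occupies only 8 bits while the representable range follows the tensor's own magnitude.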
|Infineon Technologies Dresden
|Presenter : Jiaxin Huang
Title : Spiking neural network based real-time radar gesture recognition
This live demo continuously classifies real-world radar gesture signals in real time on the neuromorphic hardware SpiNNaker 2 prototype, and uses the results to play a game. With a 10 MHz operating frequency on the SpiNNaker 2 FPGA, the closed-loop setup achieves a delay of around 35 ms from the PC sending input data to receiving the classification output, and testers perceive almost no delay while playing the game. Energy consumption per frame on SpiNNaker 2 is 3.29 µJ, and the operation takes fewer than 8 k cycles. Although our current middleware does not yet balance the workload among processing cores, the tightly coupled memory usage on the most heavily loaded processing element is less than half of the 128 kB of available memory, based on the directly trained gesture recognition spiking neural network (SNN) model with 2048 input neurons, 5 hidden neurons, and 4 classification outputs.
|Presenter : Donghyeon Han
Title : A DNN Training Processor for Robust Object Detection with Real-World Environmental Adaptation
Lightweight DNNs are essential for energy-efficient DNN execution on mobile/edge devices. However, they suffer from significant accuracy degradation when applied to a new environment. In other words, a lightweight DNN loses generality due to its low network capacity. In particular, mobile-oriented DNNs do not work properly in unexpected situations such as camera malfunction. Therefore, accuracy compensation for unpredictable accidents is important to prevent critical system damage. Real-time online DNN tuning is a promising way to compensate for the accuracy loss of the lightweight network while maintaining its hardware benefits. In this demonstration, we show online tuning-based lightweight object detection running on our proposed processor and system. The proposed processor achieves 46.6 FPS object detection with 0.95 mJ/frame energy consumption, which is state-of-the-art performance compared with existing processors.
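The two reported figures can be combined into an implied average power. The short calculation below is only a sketch: it assumes the processor runs continuously at the stated frame rate and that the energy figure covers the same workload, neither of which the abstract states. The function name is hypothetical.

```python
FPS = 46.6                  # reported frame rate
ENERGY_PER_FRAME_MJ = 0.95  # reported energy per frame, in millijoules


def average_power_mw(fps, energy_per_frame_mj):
    """(mJ / frame) * (frames / s) = mJ / s = mW, assuming continuous operation."""
    return fps * energy_per_frame_mj


power_mw = average_power_mw(FPS, ENERGY_PER_FRAME_MJ)
print(f"Implied average power: {power_mw:.2f} mW")
```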
|Nanyang Technological University
|Presenter : Yuncheng Lu
Title : A 181μW Real-Time 3-D Hand-Gesture Recognition System for Edge Applications
This demonstration presents an ultra-low-power real-time 3-D hand gesture recognition system for edge scenarios. The rotation-resistant features are extracted through bi-directional convolution, and the gestures are classified based on an iteration-free feature clustering scheme. The computing-intensive units for gesture recognition are adaptively gated according to the input data patterns to save power. The proposed system can recognize 9 static and 20 dynamic hand gestures with an average accuracy of 94.4% and 98.6%, respectively. It can also track the fingertips in real time. Measurement results show that the prototype chip achieves the lowest power of 181 μW at 0.6 V and 25 MHz.
|National Central University
|Presenter : Yi-Jhen Luo
Title : Home Appliance Control System with Dynamic Hand Gesture Recognition base on 3D Hand Skeletons
In this paper, we present a two-stage lightweight convolutional neural network architecture for hand gesture recognition in a home appliance control system. In the first stage, we use DetNet to detect the hand and generate 3D hand skeleton locations. In the second stage, a skeleton-based dynamic hand gesture recognition model is applied. The trained CNN model achieves 99.4% accuracy on our testing dataset. We also implement this system on the Nvidia Jetson AGX Xavier to switch a fan and a light on and off, and the overall system achieves 15 fps.
|National Central University
|Presenter : Chun-Lin Lee
Title : An Edge-Optimized Incremental Learning Algorithm For Audio Classification
In this demo, we show incremental learning for audio classification on an embedded system. Figure 1 shows an overview of the data processing based on our proposed incremental learning algorithm. The algorithm extends the capability of a DNN model to classify new audio sounds that are not included in the base model. In the proposed system, the audio data is gathered at the edge device, and the DNN model is trained using our proposed algorithm to learn the new classes while retaining its knowledge of the previous classes.
|Presenter : Suk-Ju Kang
Title : Deep Learning-based Real-time Segmentation for Edge Computing Devices
Recently, due to the rapid improvement of artificial intelligence technology, numerous studies have applied deep learning to a wide range of problems. Typical deep neural networks for semantic segmentation require heavy computation and a large model capacity to extract the abundant contextual information needed for accurate prediction. Our live demonstration will show real-time semantic segmentation on an NVIDIA Jetson-Xavier board with a BiSeNet-based method compressed using a novel knowledge distillation method.
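For readers unfamiliar with knowledge distillation, the sketch below shows the classic temperature-softened distillation loss, applied per pixel as one might for segmentation. It is the standard formulation, not the novel method mentioned in the abstract, whose details are not given here. The function names, the temperature value, and the toy logits are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature**2 * kl


def mean_pixel_loss(student_map, teacher_map, temperature=2.0):
    """Average the per-pixel loss over a flattened list of pixels."""
    losses = [distillation_loss(s, t, temperature)
              for s, t in zip(student_map, teacher_map)]
    return sum(losses) / len(losses)


# Toy example: two pixels, three classes (values are made up).
teacher = [[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]]
student = [[2.5, 1.5, 0.3], [0.4, 2.0, 0.9]]
loss = mean_pixel_loss(student, teacher)
```

In practice this term is usually added to the ordinary segmentation loss so that the compact student network is trained against both the labels and the larger teacher's outputs.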
|National Tsing Hua University
|Presenter : Chih-Tsun Huang
Title : Fast DNN-based Mechatronics Prototyping Platform on Robotic Arm Control
In industrial applications, a robotic controller requires low-latency computation to meet real-time constraints. Meanwhile, more controllers are designed with DNN-based reinforcement learning, which demands increasing computation power. In this demo, we present a fast prototyping infrastructure for AI-based mechatronics. Our software/hardware co-optimization incorporates a cyber-physical system (CPS), a host computer, and a DNN-based accelerator on an FPGA. The holistic accelerator is built upon the ESP SoC (System-on-Chip) platform with the high-level synthesis (HLS) technique and an improved interface. Our demonstration on an intelligent robotic arm showcases a 101× speedup over a CPU-based software implementation.
|Presenter : Chae Eun Rhee
Title : Memory-Efficient Hardware Design for a Real-Time Convolutional Encoder-Decoder Network
This work presents an FPGA-based convolutional-neural-network (CNN) encoder-decoder accelerator for interpolation of high-resolution images. The baseline model is DVF. The proposed system is demonstrated on the Virtex UltraScale+ HBM VCU128 evaluation kit. The proposed hardware delivers 1.4 TOPS at an operating clock frequency of 200 MHz and 75% PE utilization. This satisfies the (42 GOPS×31) = 1331 GOPS required to interpolate 2K videos from 30 fps to 60 fps.
|Presenter : Doyoung Kim
Title : Efficient Deep Learning Algorithm for Alzheimer’s Disease Diagnosis using Retinal Images
This live demonstration presents Alzheimer’s Disease (AD) detection using a deep-learning-based smartphone application. The deep learning model uses MobileNetV3 as its backbone and applies an Attention mechanism. We modified the backbone structure into a U-Net-like architecture to improve performance. Furthermore, we replaced the conventional Attention mechanism with a Weighted Attention mechanism. A masking-adding process is applied in the training of the model.