Live Demo

Seoul National University Presenter : Jeongwoo Park
Title : A low power neural network training processor with 8-bit floating point with a shared exponent bias and fused multiply add trees
Abstract
This paper presents an 8-bit floating-point training processor for state-of-the-art non-sparse neural networks. The processor implements hardware design techniques such as n-way fused multiply-add trees with lossless representations, flexible 2-D routing schemes, and hardware-based memory prefetching. When trained from scratch, the implemented training scheme achieves 69.0% Top-1 accuracy for ResNet-18 on ImageNet and matches the performance of full-precision baseline models in other tasks such as image super-resolution and language modeling. In ResNet-18 training, the fabricated processor requires 43% less memory access because it accesses external memory with only 8-bit tensors, and it requires only 37% of the DRAM memory space used by prior work. Fabricated in 40nm LPCMOS technology and measured, the processor achieves 4.81 TFLOPS/W energy efficiency and 2.48× higher training efficiency than prior work. This live demonstration shows the functionality and merits of the processor in real time: four tasks run on the fabricated processor, together with a host PC controller, an FPGA serving as a memory bridge, and an interactive GUI.
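To illustrate what a shared exponent bias buys an 8-bit float, here is a minimal fake-quantization sketch. It assumes a 1-4-3 (sign, exponent, mantissa) layout, subnormal support, no codes reserved for inf/NaN, round-half-to-even, and one bias per tensor chosen from its largest magnitude; the abstract does not state the processor's actual bit layout or bias-selection rule, so all of these, and the names `shared_bias` and `quantize`, are hypothetical.

```python
import math

EXP_BITS = 4                          # assumed layout: 1 sign, 4 exponent, 3 mantissa bits
MAN_BITS = 3
MAX_EXP_FIELD = (1 << EXP_BITS) - 1   # 15; assumes no codes are reserved for inf/NaN

def unbiased_exponent(mag):
    """floor(log2(mag)) for mag > 0, computed exactly with frexp."""
    return math.frexp(mag)[1] - 1

def shared_bias(tensor):
    """One exponent bias per tensor (hypothetical rule): the largest magnitude
    falls in the top binade of the 4-bit exponent field."""
    max_abs = max(abs(x) for x in tensor)
    if max_abs == 0.0:
        return 0
    return MAX_EXP_FIELD - unbiased_exponent(max_abs)

def quantize(x, bias):
    """Round x to the nearest FP8 value under `bias` (round-half-to-even,
    saturating) and return it as a Python float."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    min_exp = 1 - bias                        # exponent of the smallest normal value
    max_exp = MAX_EXP_FIELD - bias            # exponent of the largest binade
    e = max(unbiased_exponent(mag), min_exp)  # below min normal: subnormal spacing
    step = 2.0 ** (e - MAN_BITS)              # spacing between representable values
    q = round(mag / step) * step
    max_val = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** max_exp
    return sign * min(q, max_val)             # saturate instead of overflowing
```

For the tensor `[0.75, -3.0, 10.0, 0.001]` the bias is 12, so 10.0 and -3.0 are exact while 0.001 rounds to 2^-10. Shifting the bias per tensor, rather than fixing it globally, lets tensors with very different magnitude ranges (weights, activations, gradients) each use the full 4-bit exponent range.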

 

Infineon Technologies Dresden Presenter : Jiaxin Huang
Title : Spiking neural network based real-time radar gesture recognition
Abstract
This live demo continuously classifies real-world radar gesture signals in real time on a SpiNNaker 2 neuromorphic hardware prototype, and the classified gestures are used to play a game. Running at 10 MHz on the SpiNNaker 2 FPGA prototype, the closed-loop setup achieves a delay of around 35 ms from the PC sending input data to receiving the classification output, so testers perceive almost no delay while playing the game. Energy consumption per frame on SpiNNaker 2 is 3.29 µJ, and each frame takes fewer than 8k operation cycles. Although our current middleware does not yet balance the workload across processing cores, tightly coupled memory (TCM) usage on the most heavily loaded processing element is less than half of the 128 kB available, for a directly trained gesture recognition spiking neural network (SNN) with 2048 input neurons, 5 hidden neurons, and 4 classification outputs.
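A minimal sketch of inference for an SNN with the 2048-5-4 topology stated above is shown below. The leaky integrate-and-fire (LIF) neuron model, the leak factor of 0.9, the threshold of 1.0, reset-to-zero, and spike-count readout are illustrative assumptions; the abstract does not specify the neuron model or how the output is decoded, and this is not the SpiNNaker 2 implementation.

```python
N_IN, N_HID, N_OUT = 2048, 5, 4   # layer sizes stated in the abstract

def lif_step(v, i_syn, leak=0.9, threshold=1.0):
    """One discrete-time LIF update for a layer (leak/threshold are assumed).
    Returns (new membrane potentials, output spikes)."""
    new_v, spikes = [], []
    for vm, cur in zip(v, i_syn):
        vm = leak * vm + cur
        if vm >= threshold:
            spikes.append(1)
            vm = 0.0          # reset to zero after a spike
        else:
            spikes.append(0)
        new_v.append(vm)
    return new_v, spikes

def synaptic_input(spikes, weights):
    """Event-driven accumulation: only presynaptic neurons that spiked add their weights."""
    out = [0.0] * len(weights[0])
    for i, s in enumerate(spikes):
        if s:
            for j, w in enumerate(weights[i]):
                out[j] += w
    return out

def classify(input_spikes_per_step, w_ih, w_ho):
    """Run the network over T time steps; return the output neuron with the most spikes."""
    v_h, v_o = [0.0] * N_HID, [0.0] * N_OUT
    counts = [0] * N_OUT
    for spikes_in in input_spikes_per_step:
        v_h, s_h = lif_step(v_h, synaptic_input(spikes_in, w_ih))
        v_o, s_o = lif_step(v_o, synaptic_input(s_h, w_ho))
        counts = [c + s for c, s in zip(counts, s_o)]
    return max(range(N_OUT), key=lambda k: counts[k])
```

The event-driven accumulation is the reason such a small network suits neuromorphic hardware: work scales with the number of input spikes rather than with all 2048 × 5 synapses every step.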

 

KAIST Presenter : Donghyeon Han
Title : A DNN Training Processor for Robust Object Detection with Real-World Environmental Adaptation
Abstract
Lightweight DNNs are essential for energy-efficient DNN execution on mobile/edge devices. However, they suffer significant accuracy degradation when applied to a new environment. In other words, lightweight DNNs lose generality because of their low network capacity. In particular, mobile-oriented DNNs do not work properly in unexpected situations such as camera malfunction. Therefore, compensating for accuracy loss in unpredictable situations is important to prevent critical system damage. Real-time online DNN tuning is a promising way to recover the accuracy of a lightweight network while keeping its hardware benefits. In this demonstration, we show online-tuning-based lightweight object detection running on our proposed processor and system. The proposed processor achieves 46.6 FPS object detection with 0.95 mJ/frame energy consumption, which is state-of-the-art performance compared with existing processors.
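The two headline figures combine into an average power and a per-frame time budget. This small calculation assumes the 0.95 mJ/frame figure covers all processor activity at 46.6 FPS; the function names are illustrative.

```python
def average_power_mw(fps, energy_mj_per_frame):
    """Average power in mW: (mJ/frame) x (frames/s) = mJ/s = mW."""
    return fps * energy_mj_per_frame

def frame_time_ms(fps):
    """Time available per frame in milliseconds."""
    return 1000.0 / fps

print(round(average_power_mw(46.6, 0.95), 2))  # 44.27
print(round(frame_time_ms(46.6), 2))           # 21.46
```

Under that assumption, object detection with online tuning runs at roughly 44 mW with about 21 ms per frame.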

 

Nanyang Technological University Presenter : Yuncheng Lu
Title : A 181μW Real-Time 3-D Hand-Gesture Recognition System for Edge Applications
Abstract
This demonstration presents an ultra-low-power real-time 3-D hand gesture recognition system for edge scenarios. Rotation-resistant features are extracted through bi-directional convolution, and gestures are classified using an iteration-free feature clustering scheme. The compute-intensive units for gesture recognition are adaptively gated according to input data patterns to save power. The proposed system recognizes 9 static and 20 dynamic hand gestures with average accuracies of 94.4% and 98.6%, respectively. It can also track fingertips in real time. Measurement results show that the prototype chip achieves the lowest power consumption, 181 μW, at 0.6 V and 25 MHz.
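The abstract does not define its "iteration-free feature clustering" scheme. One plausible reading, sketched here purely as an assumption, is a nearest-centroid classifier: class centroids are computed in a single pass (no k-means-style reassignment loop), and a gesture is assigned to the nearest centroid. The gesture labels "fist" and "palm" and all function names are hypothetical.

```python
import math

def fit_centroids(features, labels):
    """Single pass: accumulate a per-class sum and count, then divide.
    Unlike k-means, there is no iterative reassignment step."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def classify(feature, centroids):
    """Assign the gesture whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(feature, centroids[y]))
```

With features `[0, 0]` and `[0, 1]` labeled "fist" and `[5, 5]` and `[6, 5]` labeled "palm", the point `[1, 1]` is classified as "fist". Inference cost is one distance per class, which suits a fixed, small gesture vocabulary.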

 

National Central University Presenter : Yi-Jhen Luo
Title : Home Appliance Control System with Dynamic Hand Gesture Recognition Based on 3D Hand Skeletons
Abstract
In this paper, we present a two-stage lightweight convolutional neural network architecture for hand gesture recognition in a home appliance control system. In the first stage, we use DetNet to detect the hand and generate 3D hand skeleton locations. In the second stage, a skeleton-based dynamic hand gesture recognition model classifies the gesture. The trained CNN model achieves 99.4% accuracy on our test dataset. We implement the system on an NVIDIA Jetson AGX Xavier to switch a fan and a light on and off, and the overall system runs at 15 fps.
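The control loop implied by the two-stage design can be sketched as follows. Stage 1 (DetNet in the abstract) and stage 2 are passed in as plain callables. The 15-frame sliding window (about one second at 15 fps), the gesture names, the gesture-to-appliance mapping, and the class name are hypothetical assumptions, not the authors' code.

```python
from collections import deque

class GestureController:
    """Two-stage loop sketch: stage 1 maps a camera frame to a 3D hand skeleton,
    stage 2 classifies a window of skeletons as a dynamic gesture."""

    def __init__(self, detect_skeleton, classify_window, window=15):
        self.detect_skeleton = detect_skeleton  # stage 1; returns None when no hand is found
        self.classify_window = classify_window  # stage 2; skeleton sequence -> gesture label
        self.frames = deque(maxlen=window)      # sliding window of recent skeletons
        self.state = {"fan": False, "light": False}
        # hypothetical mapping from gesture label to appliance
        self.commands = {"swipe_left": "fan", "swipe_right": "light"}

    def on_frame(self, image):
        skeleton = self.detect_skeleton(image)
        if skeleton is None:             # hand lost: a dynamic gesture cannot continue
            self.frames.clear()
            return None
        self.frames.append(skeleton)
        if len(self.frames) < self.frames.maxlen:
            return None                  # not enough history for a dynamic gesture yet
        gesture = self.classify_window(list(self.frames))
        appliance = self.commands.get(gesture)
        if appliance is not None:
            self.state[appliance] = not self.state[appliance]  # toggle on/off
            self.frames.clear()          # avoid re-triggering on the same motion
        return gesture
```

Clearing the window after a recognized command is one simple way to stop a single hand motion from toggling an appliance repeatedly on consecutive frames.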

 

National Central University Presenter : Chun-Lin Lee
Title : An Edge-Optimized Incremental Learning Algorithm For Audio Classification
Abstract
In this demo, we show incremental learning for audio classification on an embedded system. Figure 1 shows an overview of the data processing based on our proposed incremental learning algorithm. The proposed algorithm enables a DNN model to classify new audio sounds that are not included in the base model. In the proposed system, audio data is gathered at the edge device, and the DNN model is trained with our proposed algorithm to learn the new classes while retaining its knowledge of the previous classes.
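The abstract does not describe the proposed algorithm. For contrast, a common generic baseline for class-incremental learning is sketched here. The classifier head gains new output rows for the new classes, and only those rows are trained, while the rows for old classes stay frozen. This is a swapped-in illustrative technique with hypothetical function names. Practical methods usually add knowledge distillation or exemplar replay, because a softmax over all classes still shifts old-class predictions.

```python
import math
import random

def expand_head(weights, biases, n_new, dim, rng):
    """Append n_new output rows for new classes; existing rows are copied unchanged."""
    new_w = [row[:] for row in weights]
    new_w += [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(n_new)]
    return new_w, biases[:] + [0.0] * n_new

def softmax(z):
    m = max(z)                                  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sgd_step_new_classes(weights, biases, x, y, n_old, lr=0.1):
    """One softmax cross-entropy SGD step updating only rows >= n_old (new classes);
    old-class rows are frozen to retain prior knowledge."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    p = softmax(logits)
    for k in range(n_old, len(weights)):
        grad = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
        weights[k] = [w - lr * grad * xi for w, xi in zip(weights[k], x)]
        biases[k] -= lr * grad
```

Here `x` stands for an audio embedding produced by a frozen feature extractor. The choice of a frozen extractor is also an assumption, made because it keeps on-device training cheap.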

 

Sogang University Presenter : Suk-Ju Kang
Title : Deep Learning-based Real-time Segmentation for Edge Computing Devices
Abstract
Recently, driven by rapid advances in artificial intelligence, numerous studies have applied deep learning to a wide range of problems. Typical deep neural networks for semantic segmentation require heavy computation and large model capacity to extract the rich contextual information needed for accurate prediction. Our live demonstration shows real-time semantic segmentation on an NVIDIA Jetson Xavier board using a BiSeNet-based network compressed with a novel knowledge distillation method.
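The abstract calls its distillation method novel but does not describe it. As a reference point only, here is the conventional Hinton-style distillation loss applied per pixel. It averages the temperature-softened KL divergence KL(teacher || student) over pixels and scales it by T² (T = 4 is an arbitrary assumed default). This is the standard baseline, not the authors' method, and the function names are illustrative.

```python
import math

def softmax(logits, temperature):
    m = max(logits)                          # subtract max for numerical stability
    e = [math.exp((z - m) / temperature) for z in logits]
    s = sum(e)
    return [v / s for v in e]

def pixel_kd_loss(student, teacher, temperature=4.0):
    """Standard distillation loss averaged over pixels.
    student, teacher: lists of per-pixel class-logit lists with matching shapes."""
    total = 0.0
    for s_logits, t_logits in zip(student, teacher):
        p_t = softmax(t_logits, temperature)
        p_s = softmax(s_logits, temperature)
        # KL(p_t || p_s) for this pixel
        total += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return temperature ** 2 * total / len(student)
```

In training, this term is typically added to the usual per-pixel cross-entropy against ground-truth labels, so the compact student network learns from both the labels and the teacher's softened class distributions.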

 

National Tsing Hua University Presenter : Chih-Tsun Huang
Title : Fast DNN-based Mechatronics Prototyping Platform on Robotic Arm Control
Abstract
In industrial applications, a robotic controller requires low-latency computation to meet real-time constraints. Meanwhile, more controllers are being designed with DNN-based reinforcement learning, which demands increasing computation power. In this demo, we present a fast prototyping infrastructure for AI-based mechatronics. Our software/hardware co-optimization incorporates a cyber-physical system (CPS), a host computer, and a DNN-based accelerator on an FPGA. The holistic accelerator is built on the ESP SoC (System-on-Chip) platform using high-level synthesis (HLS) and an improved interface. Our demonstration on an intelligent robotic arm shows a 101× speedup over a CPU-based software implementation.

 

Inha University Presenter : Chae Eun Rhee
Title : Memory-Efficient Hardware Design for a Real-Time Convolutional Encoder-Decoder Network
Abstract
This work presents an FPGA-based accelerator for a convolutional neural network (CNN) encoder-decoder that interpolates high-resolution images. The baseline model is DVF (Deep Voxel Flow). The proposed system is demonstrated on a Virtex UltraScale+ HBM VCU128 evaluation kit. The proposed hardware achieves 1.4 TOPS at an operating clock frequency of 200 MHz with 75% PE utilization. This comfortably covers the throughput required to interpolate 2K@30fps video to 60fps, 42 GOPS × 31 ≈ 1.3 TOPS.
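The real-time check in this abstract reduces to comparing required and achieved throughput. This sketch takes the stated 42 GOPS and factor of 31 at face value (the abstract does not explain the factor) and treats 1.4 TOPS as the sustained throughput at 75% PE utilization; the function name is illustrative.

```python
def required_gops(gops_per_unit, units_per_second):
    """Required throughput in GOPS: per-unit cost times the number of units per second."""
    return gops_per_unit * units_per_second

required = required_gops(42, 31)   # figures stated in the abstract
available = 1400                   # 1.4 TOPS expressed in GOPS
print(required, available >= required, round(available / required, 2))  # 1302 True 1.08
```

Under these assumptions, the accelerator has roughly 8% throughput headroom over the 2K 30-to-60 fps interpolation requirement.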

 

Yale University Presenter : Abhishek Moitra
Title : Demonstration Proposal for RoEdge: When Adversarial Robustness meets the Edge
Abstract
Recently, adversarial attacks have been shown to fool deep learning models, seriously degrading their reliability. However, the repercussions of adversarial attacks on edge systems have received little study. In this work, we show that adversarial attacks also decrease the energy efficiency of edge systems by increasing the amount of wasted work spent transmitting and processing adversarial data that the classifier ultimately misclassifies. We propose integrating a standalone detector at the edge that detects adversarial samples and blocks their transmission to the classifier model. This improves the robustness and energy efficiency of the system by reducing data transmission and processing. Our adversarial detector has low resource overhead and is classifier-model agnostic, so it can be deployed standalone to defend a wide range of classifier models whose internals are unknown. To validate our methodology, we implement the adversarial detector on a Raspberry Pi 4. We report the ROC-AUC score, error, and accuracy for a wide range of gradient-, decision-, and score-based adversarial attacks across the CIFAR10, CIFAR100, and TinyImagenet datasets. Our detector achieves a ROC-AUC score above 0.9 while improving the energy efficiency of the edge system by about 25%.
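Two pieces of this setup lend themselves to a short sketch: the ROC-AUC metric used to score the detector, and the edge-side gate that withholds flagged samples. The sketch assumes the detector outputs a score where higher means "more likely adversarial". The gating threshold of 0.5 and the function names are hypothetical. The AUC is computed through its rank interpretation: the probability that a random adversarial sample scores higher than a random clean one, with ties counted as half.

```python
def roc_auc(scores_adv, scores_clean):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).
    O(n*m); fine for small evaluation sets."""
    wins = 0.0
    for a in scores_adv:
        for c in scores_clean:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(scores_adv) * len(scores_clean))

def forward_to_classifier(score, threshold=0.5):
    """Edge gate: transmit a sample to the classifier only if the detector
    score is below the (hypothetical) threshold; otherwise block it."""
    return score < threshold
```

For example, adversarial scores `[0.9, 0.8, 0.4]` against clean scores `[0.1, 0.3, 0.5]` give an AUC of 8/9. Energy is saved because every blocked sample skips both transmission and classification.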

 

Ajou University Presenter : Doyoung Kim
Title : Efficient Deep Learning Algorithm for Alzheimer’s Disease Diagnosis using Retinal Images
Abstract
This live demonstration presents Alzheimer’s Disease (AD) detection from retinal images using a deep-learning-based smartphone application. The deep learning model uses MobileNetV3 as its backbone and applies an attention mechanism. To improve performance, we modified the backbone into a U-Net-like architecture. Furthermore, we replaced the conventional attention mechanism with a Weighted Attention mechanism. A masking-adding process is applied in the training of the model.