AI Accelerators

“Design Next Generation
AI Hardware Platform”

Machine learning (ML), the study of algorithms that enable artificial intelligence (AI), has become a prominent computing paradigm, revolutionizing how computers handle cognitive tasks based on massive amounts of observed data. As more industries adopt this technology, demand is growing for hardware that delivers high-performance, energy-efficient processing for the required computations.

In our AI accelerator research, we perform design space exploration (DSE) of highly scalable heterogeneous architectures and system solutions, using a SW/HW co-design approach for next-generation AI/ML scenarios. We build a simulation framework for these solutions, then implement them on physical devices (i.e., FPGA and ASIC) for development, debugging, and deployment of AI accelerators.

Multi-Deep Neural Network

Recent AI applications, such as the metaverse and autonomous vehicles, use multiple deep neural network (multi-DNN) models to solve complex problems. Supporting efficient multi-DNN execution in datacenters is therefore essential.

Conventional AI accelerators that handle multi-DNN execution suffer from underutilization of processing elements and external memory bandwidth, because the hardware’s native capability does not match the requirements of the requested models. We propose a novel accelerator that devises optimized multi-DNN dataflows and scheduling methods to maximize hardware utilization and overcome this cost inefficiency in datacenters.
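The utilization mismatch described above can be illustrated with a small arithmetic sketch. This is a toy model, not the proposed accelerator's scheduler: the array size (`NUM_PES`), the per-layer demands (`LAYER_A`, `LAYER_B`), and the helper `pe_utilization` are all hypothetical, and memory bandwidth is ignored.

```python
def pe_utilization(pe_demands, num_pes):
    """Fraction of processing elements (PEs) kept busy while the given layers run together."""
    busy = min(sum(pe_demands), num_pes)  # the array cannot be more than fully occupied
    return busy / num_pes

NUM_PES = 256  # hypothetical PE array size
LAYER_A = 64   # PEs a layer of model A can keep busy (hypothetical)
LAYER_B = 96   # PEs a layer of model B can keep busy (hypothetical)

# Running one model's layer at a time leaves most of the array idle.
one_at_a_time = [pe_utilization([LAYER_A], NUM_PES), pe_utilization([LAYER_B], NUM_PES)]

# Co-scheduling layers from both models fills more of the array.
co_scheduled = pe_utilization([LAYER_A, LAYER_B], NUM_PES)

print(one_at_a_time)  # [0.25, 0.375]
print(co_scheduled)   # 0.625
```

In this toy setting, neither layer alone can occupy more than 37.5% of the array, while running both together raises utilization to 62.5%.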

Related Publications:

A Heterogeneous Vector-Array Architecture with Resource Scheduling for Multi-User/Multi-DNN Workloads (ACSMD 2021 MICRO Workshop)

Convolutional Neural Network

With the rapid growth of convolutional neural networks (CNNs), various applications such as image classification, object detection, and semantic segmentation have been developed. A CNN extracts features from an input image or text by passing it through several layers of mathematical operations between the input data and trained weights. The number of layers keeps increasing to achieve better accuracy and support more sophisticated applications, so a large amount of computation is required.

Since the operations performed in a CNN are relatively simple and repetitive, research is actively being conducted to accelerate them with optimized hardware architectures. Specifically, we combine a massively parallel dataflow with a SW/HW co-design approach to design an AI accelerator for CNNs that achieves both energy efficiency and high performance.
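To make "simple and repetitive" concrete, the sketch below computes a single-channel 2D convolution with plain nested loops and counts the multiply-accumulate (MAC) operations. It is a minimal illustration assuming stride 1 and no padding; the function name `conv2d` and the example values are hypothetical and do not describe the accelerator's actual dataflow.

```python
def conv2d(image, kernel):
    """Direct 2D convolution (cross-correlation form), stride 1, no padding.

    Returns the output feature map and the number of MAC operations performed.
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1

    output = [[0] * out_w for _ in range(out_h)]
    mac_count = 0
    for oy in range(out_h):              # every output row
        for ox in range(out_w):          # every output column
            acc = 0
            for ky in range(k_h):        # slide over the kernel window
                for kx in range(k_w):
                    acc += image[oy + ky][ox + kx] * kernel[ky][kx]
                    mac_count += 1
            output[oy][ox] = acc
    return output, mac_count

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

feature_map, macs = conv2d(image, kernel)
print(feature_map)  # [[6, 8], [12, 14]]
print(macs)         # 16
```

Every output element is an independent sum of products, so the two outer loops can in principle be spread across many processing elements, which is what makes this workload a natural target for parallel hardware.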

Reinforcement Learning

Deep reinforcement learning (DRL) is a promising area of machine learning that studies how an agent should take actions in an environment to maximize a long-term cumulative reward. DRL stands out in domains such as game playing, industrial control, and robotics because it supports autonomous adaptation to unknown environments.
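One common way to formalize a long-term cumulative reward is the discounted return, where later rewards are weighted by powers of a discount factor. The sketch below assumes that formulation; the discount factor `gamma`, the function name `discounted_return`, and the reward values are illustrative and are not taken from the text above.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

print(discounted_return([1, 1, 1], gamma=0.5))  # 1.75
```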

Unlike conventional supervised learning, reinforcement learning generates its training samples through its own inference. We address computationally expensive DRL training by applying lightweight deep learning techniques such as quantization, pruning, and knowledge distillation in the accelerator.
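As an illustration of one of these techniques, the sketch below applies symmetric 8-bit quantization to a list of weights. It is a minimal example under stated assumptions (one scale per tensor, signed range of -127 to 127) and is not the quantization scheme used in the accelerator; `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from their integer codes."""
    return [q * scale for q in quantized]

weights = [0.82, -1.0, 0.1, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Each restored weight differs from the original by at most half a quantization step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, worst_error <= scale / 2 + 1e-12)
```

Storing 8-bit codes instead of 32-bit floats cuts weight storage by roughly a factor of four and lets the hardware use integer arithmetic, at the cost of a bounded rounding error.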