CastLab

CAST Lab
Circuits, Architecture, Systems, Technology

We aim to advance modern computer systems based on specialized hardware in the post-Moore’s law era. We conduct research across hardware design, including computer architecture, VLSI, FPGA, hardware/software co-design, and processing-in-memory, taking a holistic design approach to improve overall system performance. Our current mission is to build a high-performance and scalable computing platform for future AI applications.

AI Accelerators

Machine learning (ML), the study of algorithms that enable artificial intelligence (AI), has become a prominent computing paradigm, as it revolutionizes how computers handle cognitive tasks based on massive amounts of observed data. With more industries adopting this technology, we face growing demand for hardware support that achieves high-performance...

Multi-FPGA Systems

Cloud computing is rapidly changing how enterprises run their services by offering a virtualized computing infrastructure over the internet. The data center is the powerhouse behind cloud computing, physically hosting millions of computer servers, communication cables, and data storage devices. Recently, as the number of services using AI in data centers is increasing...

Processing-in-Memory

Traditionally, the CPU is the center of a computing system, executing arithmetic and logic operations, while memory is built around it simply to load and store data. Today, due to technology scaling, the compute unit executes operations faster than the memory unit can load and store the required data. Therefore, the compute unit is no longer the most time-consuming...

Near-Data Processing

Near-data processing (NDP) is another approach to addressing the expensive data movement problem of the traditional compute-centric model. It refers to augmenting the memory or the storage with processing power. By placing computing capabilities directly on the memory or the storage, data can be processed in place, which significantly reduces data movement...




Latest News

[ESSERC 2025] Donghyuk Kim and In-Jun Jung’s paper on D3TA: 38.9TOPS/W Transformer Accelerator with Dual-Port 3T-eDRAM Digital Compute-In-Memory Using HyperAttention and Triple-Sparsity-Handling is accepted

[TC 2025] Wontak Han’s paper on SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation is accepted

[VLSI 2025] Jung-Hoon Kim’s paper on Adelia: A 4nm LLM Accelerator with Streamlined Dataflow and Dual-Mode Parallelization for Efficient Generative AI Inference is accepted

[ISCA 2025] Seungjae Moon and Junseo Cha’s paper on Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window is accepted

[ISCA 2025] Sungmin Hong’s paper on Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization is accepted

[ISCA 2025] Seunghee Han and Soongyu Choi’s paper on LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization is accepted

[ISPASS 2025] Junsoo Kim’s paper on ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput is accepted

[DAC 2025] Sungwoong Yune’s paper on ABC-FHE: A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption is accepted

[AAAI 2025] Yi Chen’s paper on AoP-SAM: Automation of Prompts for Efficient Segmentation is accepted

[JSSC 2025] Sukbin Lim’s paper on Hawkeye: A Point Cloud Neural Network Processor with Virtual Pillar and Quadtree-based Workload Management for Real-Time Outdoor BEV Detection is accepted

Latest Publications

“D3TA: 38.9TOPS/W Transformer Accelerator with Dual-Port 3T-eDRAM Digital Compute-In-Memory Using HyperAttention and Triple-Sparsity-Handling” European Solid-State Electronics Research Conference (ESSERC), 2025

“SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation” IEEE Transactions on Computers (TC), 2025

“Adelia: A 4nm LLM Accelerator with Streamlined Dataflow and Dual-Mode Parallelization for Efficient Generative AI Inference” IEEE Symposium on VLSI Technology and Circuits (VLSI), 2025

“Hybe: GPU-NPU Hybrid System for Efficient LLM Inference with Million-Token Context Window” ACM/IEEE International Symposium on Computer Architecture (ISCA), 2025

“Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization” ACM/IEEE International Symposium on Computer Architecture (ISCA), 2025

“LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization” ACM/IEEE International Symposium on Computer Architecture (ISCA), 2025

“ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput” IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2025

Selected Publications

Please see the following selected publications to learn more about CastLab’s research.

A Cloud-Scale Acceleration Architecture, International Symposium on Microarchitecture (MICRO), 2016

Toward Accelerating Deep Learning at Scale Using Specialized Logic, Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2015

A 201.4GOPS 496mW Real-Time Multi-Object Recognition Processor with Bio-Inspired Neural Perception Engine, IEEE Journal of Solid-State Circuits (JSSC), 2010

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, International Symposium on Computer Architecture (ISCA), 2014

Real-Time Object Recognition with Neuro-Fuzzy Controlled Workload-aware Task Pipelining, IEEE Micro, Vol. 29, No. 6, 2009
