Publication

Conference Papers

2025
63
ISSCC
TOP-TIER
HuMoniX: A 57.3 fps 12.8 TFLOPS/W Text-to-Motion Processor with Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity
Circuit
IEEE International Solid-State Circuits Conference (ISSCC), 2025
Jaehoon Heo, Adiwena Putra, Sungwoong Yune, Jieon Yoon, Hangyeol Lee, Jihoon Kim, Joo-Young Kim
2024
62
MICRO
TOP-TIER
AdapTiV: Sign-Similarity based Image-Adaptive Token Merging for Vision Transformer Acceleration
Architecture
ACM/IEEE International Symposium on Microarchitecture (MICRO), 2024
*Seungjae Yoo, *Hangyeol Kim, Joo-Young Kim (*equal contribution)
61
ICCAD
TOP-TIER
APINT: A Full-Stack Framework for Acceleration of Privacy-Preserving Inference of Transformers based on Garbled Circuits
Architecture, Automation
ACM/IEEE International Conference on Computer-Aided Design (ICCAD), 2024
Hyunjun Cho, Jaeho Jeon, Jaehoon Heo, Joo-Young Kim
60
ESSERC
MAJOR
A 28nm 4.96 TOPS/W End-to-End Diffusion Accelerator with Reconfigurable Hyper-Precision and Unified Non-Matrix Processing Engine
Circuit
European Solid-State Electronics Research Conference (ESSERC), 2024
Sungyeob Yoo, Geonwoo Ko, Seri Ham, Seeyeon Kim, Yi Chen, Joo-Young Kim
59
ESSERC
MAJOR
DPIM: A 19.36TOPS/W 2T1C eDRAM Transformer-in-Memory Chip with Sparsity-Aware Quantization and Heterogeneous Dense-Sparse Core
Circuit
European Solid-State Electronics Research Conference (ESSERC), 2024
Donghyuk Kim, Jae-Young Kim, Hyunjun Cho, Seungjae Yoo, Sukjin Lee, Sungwoong Yune, Hoichang Jeong, Keonhee Park, Ki-soo Lee, Jongchan Lee, Chanheum Han, Gunmo Koo, Yuli Han, Jaejin Kim, Jaemin Kim, Kyuho Lee, Joo-Hyung Chae, Kunhee Cho, Joo-Young Kim
58
HotChips
Picasso: An Area/Energy-Efficient End-to-End Diffusion Accelerator with Hyper-Precision Data Type
Circuit
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2024
Sungyeob Yoo, Geonwoo Ko, Seri Ham, Seeyeon Kim, Yi Chen, Joo-Young Kim
57
ISCA
TOP-TIER
BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing
Architecture
ACM/IEEE International Symposium on Computer Architecture (ISCA), 2024
*Seunghee Han, *Seungjae Moon, Teokkyu Suh, Jaehoon Heo, Joo-Young Kim (*equal contribution)
56
CICC
MAJOR
A 38.5TOPS/W Point Cloud Neural Network Processor with Virtual Pillar and Quadtree-based Workload Management for Real-Time Outdoor BEV Detection
Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2024
Sukbin Lim, Jaehoon Heo, Jinho Yang, Joo-Young Kim
55
HPCA
TOP-TIER
Morphling: A Throughput-Maximized TFHE-based Accelerator using Transform-domain Reuse
Architecture
IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024
Prasetiyo, Adiwena Putra, Joo-Young Kim
54
ASP-DAC
MAJOR
ACane: An Efficient FPGA-based Embedded Vision Platform with Accumulation-as-Convolution Packing for Autonomous Mobile Robots
Architecture
Asia and South Pacific Design Automation Conference (ASP-DAC), 2024
Jinho Yang, Sungwoong Yune, Sukbin Lim, Donghyuk Kim, Joo-Young Kim
2023
53
MICRO
TOP-TIER
Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping
Architecture
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023
Adiwena Putra, Prasetiyo, Yi Chen, John Kim, Joo-Young Kim
52
ICCAD
TOP-TIER
PRIMO: A Full-Stack Processing-in-DRAM Emulation Framework for Machine Learning Workloads
FPGA, Architecture, Automation
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023
Jaehoon Heo, Yongwon Shin, Sangjin Choi, Sungwoong Yune, Jung-Hoon Kim, Hyojin Sung, Youngjin Kwon, Joo-Young Kim
51
ESSCIRC
MAJOR
JNPU: A 1.04TFLOPS Joint-DNN Training Processor with Speculative Cyclic Quantization and Triple Heterogeneity on Microarchitecture / Precision / Dataflow
Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2023
Je Yang, Sukbin Lim, Sukjin Lee, Jae-Young Kim, Joo-Young Kim
50
HotChips
HyperAccel LPU: Accelerating Hyperscale Models for Generative AI
FPGA, Architecture, HyperAccel
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2023
Seungjae Moon, Junsoo Kim, Jung-Hoon Kim, Junseo Cha, Gyubin Choi, Seongmin Hong, Joo-Young Kim
49
VLSI
TOP-TIER
SP-PIM: A 22.41TFLOPS/W, 8.81Epochs/Sec Super-Pipelined Processing-In-Memory Accelerator with Local Error Prediction for On-Device Learning
Circuit
Symposium on VLSI Technology and Circuits (VLSI), 2023
*Jung-Hoon Kim, *Jaehoon Heo, Wontak Han, Jaeuk Kim, Joo-Young Kim (*equal contribution)
48
CICC
MAJOR
A 26.55TOPS/W Explainable AI Processor with Dynamic Workload Allocation and Heat Map Compression/Pruning
Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2023
Junsoo Kim, Geonwoo Ko, Ji-Hoon Kim, Changha Lee, Taewoo Kim, Chan-Hyun Yoon, Joo-Young Kim
47
HPCA
TOP-TIER
LightTrader: A Standalone AI-enabled High-Frequency Trading System with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators
Architecture, Rebellions
IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023
*Sungyeob Yoo, *Hyunsung Kim, Jinseok Kim, Sunghyun Park, Joo-Young Kim, Jinwook Oh (*equal contribution)
2022
46
FPT
MAJOR
LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning
Architecture, FPGA
The International Conference on Field Programmable Technology (FPT), 2022
Je Yang, Jaeuk Kim, Joo-Young Kim
45
ASSCC
MAJOR
A 409.6 GOPS and 204.8 GFLOPS Mixed-Precision Vector Processor System for General-Purpose Machine Learning Acceleration
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2022
Jung-Hoon Kim, Sukjin Lee, Seungjae Moon, Sungyeob Yoo, Joo-Young Kim
44
MICRO
TOP-TIER
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Architecture, FPGA, Naver
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim
43
HotChips
LightTrader: World's first AI-enabled High-Frequency Trading Solution with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators
Architecture, Rebellions
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2022
*Hyunsung Kim, *Sungyeob Yoo, Jaewan Bae, Kyeongryeol Bong, Yoonho Boo, Karim Charfi, Hyo-Eun Kim, Hyun Suk Kim, Jinseok Kim, Byungjae Lee, Jaehwan Lee, Myeongbo Shim, Sungho Shin, Jeong Seok Woo, Joo-Young Kim, Sunghyun Park, Jinwook Oh (*equal contribution)
42
HotChips
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
FPGA, Architecture, Naver
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2022
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim
41
HotChips
Trinity: End-to-End In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics
FPGA, Architecture, Microsoft, Samsung
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2022
Ji-Hoon Kim, Seunghee Han, Kwanghyun Park, Soo-Young Ji, Joo-Young Kim
40
FPL
MAJOR
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure
FPGA, Flapmax
IEEE International Conference on Field Programmable Logic and Applications (FPL), 2022
Yashael Faith Arthanto, David Ojika, Joo-Young Kim
39
FCCM
TOP-TIER
A Dual-Mode Similarity Search Accelerator based on Embedding Compression for Online Cross-Modal Image-Text Retrieval
FPGA, Amazon
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2022
Yeo-Reum Park, Ji-Hoon Kim, Jaeyoung Do, Joo-Young Kim
38
FCCM
TOP-TIER
An Open-Source Shell Generation Framework for High-Performance Design on Multi-Die FPGAs
FPGA
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2022
Gyeongcheol Shin, Junsoo Kim, Joo-Young Kim
37
CICC
MAJOR
T-PIM: A 2.21-to-161.08TOPS/W Processing-In-Memory Accelerator for End-to-End On-Device Training
Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2022
Jaehoon Heo, Junsoo Kim, Wontak Han, Sukbin Lim, Joo-Young Kim
2021
36
ACSMD
A Heterogeneous Vector-Array Architecture with Resource Scheduling for Multi-User/Multi-DNN Workloads
Architecture
Architecture, Compiler, and System Support for Multi-model DNN Workloads (ACSMD) Workshop, 2021 (MICRO Workshop)
Sungyeob Yoo, Jung-Hoon Kim, Joo-Young Kim
35
FCCM
TOP-TIER
Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device
FPGA, Samsung
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021
Ji-Hoon Kim, Yeo-Reum Park, Jaeyoung Do, Soo-Young Ji, Joo-Young Kim
34
DAC
TOP-TIER
FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism
Architecture, Automation
ACM/IEEE Design Automation Conference (DAC), 2021
Je Yang, Seongmin Hong, Joo-Young Kim
2020
33
VLSI
TOP-TIER
Z-PIM: An Energy-Efficient Sparsity-Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision
Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2020
Ji-Hoon Kim, Juhyoung Lee, Jinsu Lee, Hoi-Jun Yoo, Joo-Young Kim
Before 2020
32
MICRO
TOP-TIER
A Cloud-Scale Acceleration Architecture
Architecture
International Symposium on Microarchitecture (MICRO), 2016
Adrian Caulfield, Eric Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, Doug Burger
31
HotChips
Toward Accelerating Deep Learning at Scale Using Specialized Logic
Circuit
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2015
Kalin Ovtcharov, Olatunji Ruwase, Joo-Young Kim, Jeremy Fowers, Karin Strauss, Eric Chung
30
FCCM
TOP-TIER
A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs
FPGA
International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015
Jeremy Fowers, Joo-Young Kim, Scott Hauck, Doug Burger
29
ISCA
TOP-TIER
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
Architecture
International Symposium on Computer Architecture (ISCA), 2014
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James R. Larus, Eric Peterson, Gopi Prashanth, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger
28
ASAP
Energy Efficient Canonical Huffman Encoding
Architecture
International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2014
Janarbek Matai, Joo-Young Kim, Ryan Kastner
27
FCCM
TOP-TIER
A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer
FPGA
International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2014
Joo-Young Kim, Scott Hauck, Doug Burger
26
CICC
MAJOR
Intelligent NoC with Neuro-Fuzzy Bandwidth Regulation for a 51 IP Object Recognition Processor
Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2010
Seungjin Lee, Jinwook Oh, Minsu Kim, Junyoung Park, Joonsoo Kwon, Joo-Young Kim, Hoi-Jun Yoo
25
VLSI
TOP-TIER
A 1.2mW On-Line Learning Mixed Mode Intelligent Inference Engine for Robust Object Recognition
Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2010
Jinwook Oh, Seungjin Lee, Minsu Kim, Joonsoo Kwon, Junyoung Park, Joo-Young Kim, Hoi-Jun Yoo
24
COOLCHIPS
A 36 Heterogeneous Core Architecture with Resource-Aware Fine-grained Task Scheduling for Feedback Attention based Object Recognition
Circuit
IEEE Symposium on Low-Power and High-Speed Chips (COOLCHIPS), 2010
Seungjin Lee, Jinwook Oh, Minsu Kim, Joonyoung Park, Joonsoo Kwon, Joo-Young Kim, Hoi-Jun Yoo
Before 2010
23
ESSCIRC
MAJOR
A 118.4GB/s Multi-Casting Network-on-Chip for Real-Time Object Recognition Processor
Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2009
Joo-Young Kim, Kwanho Kim, Minsu Kim, Seungjin Lee, Jinwook Oh, Hoi-Jun Yoo
22
ISLPED
A 60fps 496mW Multi-Object Recognition Processor with Workload-Aware Dynamic Power Management
Circuit
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2009
Joo-Young Kim, Seungjin Lee, Jinwook Oh, Minsu Kim, Hoi-Jun Yoo
21
VLSI
TOP-TIER
A 22.8GOPS 2.83mW Neuro-fuzzy Object Detection Engine for Fast Multi-Object Recognition
Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2009
Minsu Kim, Joo-Young Kim, Seungjin Lee, Jinwook Oh, Hoi-Jun Yoo
20
COOLCHIPS
An Energy Efficient Real-Time Object Recognition Processor with Neuro-Fuzzy Controlled Task Pipelining
Circuit
IEEE Symposium on Low-Power and High-Speed Chips (COOLCHIPS), 2009
Joo-Young Kim, Minsu Kim, Seungjin Lee, Jinwook Oh, Kwanho Kim, Hoi-Jun Yoo
19
ISSCC
TOP-TIER
A 201.4GOPS 496mW Real-Time Multi-Object Recognition Processor with Bio-Inspired Neural Perception Engine
Circuit
IEEE International Solid-State Circuits Conference (ISSCC), 2009
Joo-Young Kim, Minsu Kim, Seungjin Lee, Jinwook Oh, Kwanho Kim, Sejong Oh, Jeong-Ho Woo, Donghyun Kim, Hoi-Jun Yoo
18
ASSCC
MAJOR
A 66fps 38mW Nearest Neighbor Matching Processor with Hierarchical VQ Algorithm for Real-Time Object Recognition
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2008
Joo-Young Kim, Kwanho Kim, Seungjin Lee, Minsu Kim, Hoi-Jun Yoo
17
ASSCC
MAJOR
A 76.8 GB/s 46 mW Low-latency Network-on-Chip for Real-time Object Recognition Processor
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2008
Kwanho Kim, Joo-Young Kim, Seungjin Lee, Minsu Kim, Hoi-Jun Yoo
16
ESSCIRC
MAJOR
A 211 GOPS/W Dual-Mode Real-Time Object Recognition Processor with Network-on-Chip
Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2008
Kwanho Kim, Joo-Young Kim, Seungjin Lee, Minsu Kim, Hoi-Jun Yoo
15
VLSI
TOP-TIER
The Brain Mimicking Visual Attention Engine: An 80x60 Digital Cellular Neural Network for Rapid Global Feature Extraction
Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2008
Seungjin Lee, Kwanho Kim, Minsu Kim, Joo-Young Kim, Hoi-Jun Yoo
14
DAC
TOP-TIER
Vision Platform for Mobile Intelligent Robots Based on 81.6 GOPS Objects Recognition Processor
Automation
ACM Design Automation Conference (DAC), 2008
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, Hoi-Jun Yoo
13
ISCAS
A 0.6pJ/b 3Gb/s/ch Transceiver in 0.18 um CMOS for 10mm On-chip interconnects
Circuit
IEEE International Symposium on Circuits and Systems (ISCAS), 2008
Joonsung Bae, Joo-Young Kim, Hoi-Jun Yoo
12
ISSCC
TOP-TIER
A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual Attention Engine
Circuit
IEEE International Solid-State Circuits Conference (ISSCC), 2008
Kwanho Kim, Seungjin Lee, Joo-Young Kim, Minsu Kim, Donghyun Kim, Jeong-Ho Woo, Hoi-Jun Yoo
11
ASSCC
MAJOR
Bitwise Competition Logic for Compact Digital Comparator
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2007
Joo-Young Kim, Hoi-Jun Yoo
10
ASSCC
MAJOR
Implementation of Memory-Centric NoC for 81.6 GOPS Object Recognition Processor
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2007
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, Hoi-Jun Yoo
9
ESSCIRC
MAJOR
Visual Image Processing RAM for Fast 2-D Data Location Search
Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2007
Joo-Young Kim, Donghyun Kim, Seungjin Lee, Kwanho Kim, Hoi-Jun Yoo
8
CICC
MAJOR
An 81.6 GOPS Object Recognition Processor Based on NoC and Visual Image Processing Memory
Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2007
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, Hoi-Jun Yoo
7
NOCS
Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric NoC
Circuit
IEEE International Symposium on Network-on-Chip (NOCS), 2007
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, Hoi-Jun Yoo
6
ISCAS
A 372ps 64-bit Adder using Fast Pull-up Logic in 0.18-um CMOS
Circuit
IEEE International Symposium on Circuits and Systems (ISCAS), 2006
Joo-Young Kim, Kangmin Lee, Hoi-Jun Yoo
5
ASSCC
MAJOR
A TCAM-based Periodic Event Generator for Multi-Node Management in the Body Sensor Network
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2006
Sungdae Choi, Kyomin Sohn, Jooyoung Kim, Jerald Yoo, Hoi-Jun Yoo
4
ASSCC
MAJOR
A 0.6-V, 6.8-uW Embedded SRAM for Ultra-low Power SoC
Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2006
Kyomin Sohn, Sungdae Choi, Jeong-Ho Woo, Jooyoung Kim, Hoi-Jun Yoo
3
ESSCIRC
MAJOR
A 24.2-uW Dual-Mode Human Body Communication Controller for Body Sensor Network
Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2006
Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim, Jooyoung Kim, Namjun Cho, Jeong-Ho Woo, Jerald Yoo, Hoi-Jun Yoo
2
CICC
MAJOR
A Multi-Nodes Human Body Communication Sensor Network Control Processor
Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2006
Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim, Jooyoung Kim, Namjun Cho, Jeong-Ho Woo, Jerald Yoo, Hoi-Jun Yoo
1
ISWC
A Low-power Star-topology Body Area Network Controller for Periodic Data Monitoring Around and Inside the Human Body
Circuit
IEEE International Symposium on Wearable Computers (ISWC), 2006
Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim, Jooyoung Kim, Jerald Yoo, Hoi-Jun Yoo