Publication

Conference Papers

2024
57
ISCA
TOP-TIER
BLESS: Bandwidth and Locality Enhanced SMEM Seeding Acceleration for DNA Sequencing Architecture
ACM/IEEE International Symposium on Computer Architecture (ISCA), 2024
*Seunghee Han, *Seungjae Moon, Teokkyu Suh, Jaehoon Heo, Joo-Young Kim (*equal contribution)
56
CICC
MAJOR
A 38.5TOPS/W Point Cloud Neural Network Processor with Virtual Pillar and Quadtree-based Workload Management for Real-Time Outdoor BEV Detection Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2024
Sukbin Lim, Jaehoon Heo, Jinho Yang, Joo-Young Kim
55
HPCA
TOP-TIER
Morphling: A Throughput-Maximized TFHE-based Accelerator using Transform-domain Reuse Architecture
IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024
Prasetiyo, Adiwena Putra, and Joo-Young Kim
54
ASP-DAC
MAJOR
ACane: An Efficient FPGA-based Embedded Vision Platform with Accumulation-as-Convolution Packing for Autonomous Mobile Robots Architecture
Asia and South Pacific Design Automation Conference (ASP-DAC), 2024
Jinho Yang, Sungwoong Yune, Sukbin Lim, Donghyuk Kim, and Joo-Young Kim
2023
53
MICRO
TOP-TIER
Strix: An End-to-End Streaming Architecture with Two-Level Ciphertext Batching for Fully Homomorphic Encryption with Programmable Bootstrapping Architecture
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023
Adiwena Putra, Prasetiyo, Yi Chen, John Kim, and Joo-Young Kim
52
ICCAD
TOP-TIER
PRIMO: A Full-Stack Processing-in-DRAM Emulation Framework for Machine Learning Workloads Circuit
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023
Jaehoon Heo, Yongwon Shin, Sangjin Choi, Sungwoong Yune, Jung-Hoon Kim, Hyojin Sung, Youngjin Kwon, and Joo-Young Kim
51
ESSCIRC
MAJOR
JNPU: A 1.04TFLOPS Joint-DNN Training Processor with Speculative Cyclic Quantization and Triple Heterogeneity on Microarchitecture / Precision / Dataflow Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2023
Je Yang, Sukbin Lim, Sukjin Lee, Jae-Young Kim and Joo-Young Kim
50
HotChips
HyperAccel LPU: Accelerating Hyperscale Models for Generative AI FPGA Architecture HyperAccel
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2023
Seungjae Moon, Junsoo Kim, Jung-Hoon Kim, Junseo Cha, Gyubin Choi, Seongmin Hong and Joo-Young Kim
49
VLSI
TOP-TIER
SP-PIM: A 22.41TFLOPS/W, 8.81Epochs/Sec Super-Pipelined Processing-In-Memory Accelerator with Local Error Prediction for On-Device Learning Circuit
Symposium on VLSI Technology and Circuits (VLSI), 2023
*Jung-Hoon Kim, *Jaehoon Heo, Wontak Han, Jaeuk Kim and Joo-Young Kim (*equal contribution)
48
CICC
MAJOR
A 26.55TOPS/W Explainable AI Processor with Dynamic Workload Allocation and Heat Map Compression/Pruning Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2023
Junsoo Kim, Geonwoo Ko, Ji-Hoon Kim, Changha Lee, Taewoo Kim, Chan-Hyun Yoon, Joo-Young Kim
47
HPCA
TOP-TIER
LightTrader: A Standalone AI-enabled High-Frequency Trading System with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators Architecture Rebellions
IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023
Sungyeob Yoo, Hyunsung Kim, Jinseok Kim, Sunghyun Park, Joo-Young Kim, and Jinwook Oh
2022
46
FPT
MAJOR
LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning Architecture FPGA
The International Conference on Field Programmable Technology (FPT), 2022
Je Yang, Jaeuk Kim and Joo-Young Kim
45
ASSCC
MAJOR
A 409.6 GOPS and 204.8 GFLOPS Mixed-Precision Vector Processor System for General-Purpose Machine Learning Acceleration Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2022
Jung-Hoon Kim, Sukjin Lee, Seungjae Moon, Sungyeob Yoo, and Joo-Young Kim
44
MICRO
TOP-TIER
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation Architecture FPGA Naver
IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, and Joo-Young Kim
43
HotChips
LightTrader: World's first AI-enabled High-Frequency Trading Solution with 16 TFLOPS / 64 TOPS Deep Learning Inference Accelerators Architecture Rebellions
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2022
Hyunsung Kim, Sungyeob Yoo, Jaewan Bae, Kyeongryeol Bong, Yoonho Boo, Karim Charfi, Hyo-Eun Kim, Hyun Suk Kim, Jinseok Kim, Byungjae Lee, Jaehwan Lee, Myeongbo Shim, Sungho Shin, Jeong Seok Woo, Joo-Young Kim, Sunghyun Park, and Jinwook Oh; Rebellions Inc.
42
HotChips
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation FPGA Architecture Naver
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2022
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, and Joo-Young Kim
41
HotChips
Trinity: End-to-End In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics FPGA Architecture Microsoft Samsung
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2022
Ji-Hoon Kim, Seunghee Han, Kwanghyun Park, Soo-Young Ji, and Joo-Young Kim
40
FPL
MAJOR
FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure FPGA Flapmax
IEEE International Conference on Field Programmable Logic and Applications (FPL), 2022
Yashael Faith Arthanto, David Ojika, and Joo-Young Kim
39
FCCM
TOP-TIER
A Dual-Mode Similarity Search Accelerator based on Embedding Compression for Online Cross-Modal Image-Text Retrieval FPGA Amazon
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2022
Yeo-Reum Park, Ji-Hoon Kim, Jaeyoung Do, and Joo-Young Kim
38
FCCM
TOP-TIER
An Open-Source Shell Generation Framework for High-Performance Design on Multi-Die FPGAs FPGA
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2022
Gyeongcheol Shin, Junsoo Kim, and Joo-Young Kim
37
CICC
MAJOR
T-PIM: A 2.21-to-161.08TOPS/W Processing-In-Memory Accelerator for End-to-End On-Device Training Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2022
Jaehoon Heo, Junsoo Kim, Wontak Han, Sukbin Lim, and Joo-Young Kim
2021
36
ACSMD
A Heterogeneous Vector-Array Architecture with Resource Scheduling for Multi-User/Multi-DNN Workloads Architecture
Architecture, Compiler, and System Support for Multi-model DNN Workloads (ACSMD) Workshop, 2021 (MICRO Workshop)
Sungyeob Yoo, Jung-Hoon Kim, and Joo-Young Kim
35
FCCM
TOP-TIER
Accelerating Large-Scale Nearest Neighbor Search with Computational Storage Device FPGA Samsung
IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021
Ji-Hoon Kim, Yeo-Reum Park, Jaeyoung Do, Soo-Young Ji, and Joo-Young Kim
34
DAC
TOP-TIER
FIXAR: A Fixed-Point Deep Reinforcement Learning Platform with Quantization-Aware Training and Adaptive Parallelism Architecture Automation
ACM/IEEE Design Automation Conference (DAC), 2021
Je Yang, Seongmin Hong, and Joo-Young Kim
2020
33
VLSI
TOP-TIER
Z-PIM: An Energy-Efficient Sparsity-Aware Processing-In-Memory Architecture with Fully-Variable Weight Precision Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2020
Ji-Hoon Kim, Juhyoung Lee, Jinsu Lee, Hoi-Jun Yoo, and Joo-Young Kim
Before 2020
32
MICRO
TOP-TIER
A Cloud-Scale Acceleration Architecture Architecture
International Symposium on Microarchitecture (MICRO), 2016
Adrian Caulfield, Eric Chung, Andrew Putnam, Hari Angepat, Jeremy Fowers, Michael Haselman, Stephen Heil, Matt Humphrey, Puneet Kaur, Joo-Young Kim, Daniel Lo, Todd Massengill, Kalin Ovtcharov, Michael Papamichael, Lisa Woods, Sitaram Lanka, Derek Chiou, and Doug Burger
31
HotChips
Toward Accelerating Deep Learning at Scale Using Specialized Logic Circuit
Hot Chips: A Symposium on High Performance Chips (HOTCHIPS), 2015
Kalin Ovtcharov, Olatunji Ruwase, Joo-Young Kim, Jeremy Fowers, Karin Strauss, and Eric Chung
30
FCCM
TOP-TIER
A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs FPGA
International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015
Jeremy Fowers, Joo-Young Kim, Scott Hauck, and Doug Burger
29
ISCA
TOP-TIER
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services Architecture
International Symposium on Computer Architecture (ISCA), 2014
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James R. Larus, Eric Peterson, Gopi Prashanth, Aaron Smith, Jason Thong, Phillip Yi Xiao, and Doug Burger
28
ASAP
Energy Efficient Canonical Huffman Encoding Architecture
International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2014
Janarbek Matai, Joo-Young Kim, and Ryan Kastner
27
FCCM
TOP-TIER
A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer FPGA
International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2014
Joo-Young Kim, Scott Hauck, and Doug Burger
26
CICC
MAJOR
Intelligent NoC with Neuro-Fuzzy Bandwidth Regulation for a 51 IP Object Recognition Processor Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2010
Seungjin Lee, Jinwook Oh, Minsu Kim, Junyoung Park, Joonsoo Kwon, Joo-Young Kim, and Hoi-Jun Yoo
25
VLSI
TOP-TIER
A 1.2mW On-Line Learning Mixed Mode Intelligent Inference Engine for Robust Object Recognition Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2010
Jinwook Oh, Seungjin Lee, Minsu Kim, Joonsoo Kwon, Junyoung Park, Joo-Young Kim, and Hoi-Jun Yoo
24
COOLCHIPS
A 36 Heterogeneous Core Architecture with Resource-Aware Fine-grained Task Scheduling for Feedback Attention based Object Recognition Circuit
IEEE Symposium on Low-Power and High-Speed Chips (COOLCHIPS), 2010
Seungjin Lee, Jinwook Oh, Minsu Kim, Junyoung Park, Joonsoo Kwon, Joo-Young Kim, and Hoi-Jun Yoo
Before 2010
23
ESSCIRC
MAJOR
A 118.4GB/s Multi-Casting Network-on-Chip for Real-Time Object Recognition Processor Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2009
Joo-Young Kim, Kwanho Kim, Minsu Kim, Seungjin Lee, Jinwook Oh, and Hoi-Jun Yoo
22
ISLPED
A 60fps 496mW Multi-Object Recognition Processor with Workload-Aware Dynamic Power Management Circuit
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2009
Joo-Young Kim, Seungjin Lee, Jinwook Oh, Minsu Kim, and Hoi-Jun Yoo
21
VLSI
TOP-TIER
A 22.8GOPS 2.83mW Neuro-fuzzy Object Detection Engine for Fast Multi-Object Recognition Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2009
Minsu Kim, Joo-Young Kim, Seungjin Lee, Jinwook Oh, and Hoi-Jun Yoo
20
COOLCHIPS
An Energy Efficient Real-Time Object Recognition Processor with Neuro-Fuzzy Controlled Task Pipelining Circuit
IEEE Symposium on Low-Power and High-Speed Chips (COOLCHIPS), 2009
Joo-Young Kim, Minsu Kim, Seungjin Lee, Jinwook Oh, Kwanho Kim, and Hoi-Jun Yoo
19
ISSCC
TOP-TIER
A 201.4GOPS 496mW Real-Time Multi-Object Recognition Processor with Bio-Inspired Neural Perception Engine Circuit
IEEE International Solid-State Circuits Conference (ISSCC), 2009
Joo-Young Kim, Minsu Kim, Seungjin Lee, Jinwook Oh, Kwanho Kim, Sejong Oh, Jeong-Ho Woo, Donghyun Kim, and Hoi-Jun Yoo
18
ASSCC
MAJOR
A 66fps 38mW Nearest Neighbor Matching Processor with Hierarchical VQ Algorithm for Real-Time Object Recognition Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2008
Joo-Young Kim, Kwanho Kim, Seungjin Lee, Minsu Kim, and Hoi-Jun Yoo
17
ASSCC
MAJOR
A 76.8 GB/s 46 mW Low-latency Network-on-Chip for Real-time Object Recognition Processor Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2008
Kwanho Kim, Joo-Young Kim, Seungjin Lee, Minsu Kim, and Hoi-Jun Yoo
16
ESSCIRC
MAJOR
A 211 GOPS/W Dual-Mode Real-Time Object Recognition Processor with Network-on-Chip Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2008
Kwanho Kim, Joo-Young Kim, Seungjin Lee, Minsu Kim, and Hoi-Jun Yoo
15
VLSI
TOP-TIER
The Brain Mimicking Visual Attention Engine: An 80x60 Digital Cellular Neural Network for Rapid Global Feature Extraction Circuit
IEEE Symposium on VLSI Circuits (VLSI), 2008
Seungjin Lee, Kwanho Kim, Minsu Kim, Joo-Young Kim, and Hoi-Jun Yoo
14
DAC
TOP-TIER
Vision Platform for Mobile Intelligent Robots Based on 81.6 GOPS Object Recognition Processor Automation
ACM Design Automation Conference (DAC), 2008
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, and Hoi-Jun Yoo
13
ISCAS
A 0.6pJ/b 3Gb/s/ch Transceiver in 0.18 um CMOS for 10mm On-chip interconnects Circuit
IEEE International Symposium on Circuits and Systems (ISCAS), 2008
Joonsung Bae, Joo-Young Kim, and Hoi-Jun Yoo
12
ISSCC
TOP-TIER
A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual Attention Engine Circuit
IEEE International Solid-State Circuits Conference (ISSCC), 2008
Kwanho Kim, Seungjin Lee, Joo-Young Kim, Minsu Kim, Donghyun Kim, Jeong-Ho Woo, and Hoi-Jun Yoo
11
ASSCC
MAJOR
Bitwise Competition Logic for Compact Digital Comparator Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2007
Joo-Young Kim and Hoi-Jun Yoo
10
ASSCC
MAJOR
Implementation of Memory-Centric NoC for 81.6 GOPS Object Recognition Processor Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2007
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, and Hoi-Jun Yoo
9
ESSCIRC
MAJOR
Visual Image Processing RAM for Fast 2-D Data Location Search Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2007
Joo-Young Kim, Donghyun Kim, Seungjin Lee, Kwanho Kim, and Hoi-Jun Yoo
8
CICC
MAJOR
An 81.6 GOPS Object Recognition Processor Based on NoC and Visual Image Processing Memory Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2007
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, and Hoi-Jun Yoo
7
NOCS
Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric NoC Circuit
IEEE International Symposium on Networks-on-Chip (NOCS), 2007
Donghyun Kim, Kwanho Kim, Joo-Young Kim, Seungjin Lee, and Hoi-Jun Yoo
6
ISCAS
A 372ps 64-bit Adder using Fast Pull-up Logic in 0.18-um CMOS Circuit
IEEE International Symposium on Circuits and Systems (ISCAS), 2006
Joo-Young Kim, Kangmin Lee, and Hoi-Jun Yoo
5
ASSCC
MAJOR
A TCAM-based Periodic Event Generator for Multi-Node Management in the Body Sensor Network Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2006
Sungdae Choi, Kyomin Sohn, Jooyoung Kim, Jerald Yoo, and Hoi-Jun Yoo
4
ASSCC
MAJOR
A 0.6-V, 6.8-uW Embedded SRAM for Ultra-low Power SoC Circuit
IEEE Asian Solid-State Circuits Conference (A-SSCC), 2006
Kyomin Sohn, Sungdae Choi, Jeong-Ho Woo, Jooyoung Kim, and Hoi-Jun Yoo
3
ESSCIRC
MAJOR
A 24.2-uW Dual-Mode Human Body Communication Controller for Body Sensor Network Circuit
IEEE European Solid-State Circuits Conference (ESSCIRC), 2006
Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim, Jooyoung Kim, Namjun Cho, Jeong-Ho Woo, Jerald Yoo, and Hoi-Jun Yoo
2
CICC
MAJOR
A Multi-Nodes Human Body Communication Sensor Network Control Processor Circuit
IEEE Custom Integrated Circuits Conference (CICC), 2006
Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim, Jooyoung Kim, Namjun Cho, Jeong-Ho Woo, Jerald Yoo, and Hoi-Jun Yoo
1
ISWC
A Low-power Star-topology Body Area Network Controller for Periodic Data Monitoring Around and Inside the Human Body Circuit
IEEE International Symposium on Wearable Computers (ISWC), 2006
Sungdae Choi, Seong-Jun Song, Kyomin Sohn, Hyejung Kim, Jooyoung Kim, Jerald Yoo, and Hoi-Jun Yoo