DPIM: A 19.36 TOPS/W 2T1C eDRAM Transformer-in-Memory Chip with Sparsity-Aware Quantization and Heterogeneous Dense-Sparse Core

This paper presents DPIM, the first 2T1C eDRAM
Transformer-in-memory chip. Its high-density eDRAM cell supports large-capacity processing-in-memory (PIM) macros of 1.38
Mb/mm², reducing external memory access. DPIM applies a
sparsity-aware quantization scheme to all layers of the Transformer,
quantizing the model to 8-bit integers (INT8) with a minimal
accuracy drop of 2% for the BERT-large model on the GLUE
benchmark while raising the bit-slice sparsity ratios of the originally
dense weight and activation matrices to 83.3% and 88.4%,
respectively. Its heterogeneous PIM macro supports both
compute-intensive dense matrix multiplications and sparse matrix
multiplications ranging from extreme to moderate sparsity, with a peak
throughput of 3.03-12.12 TOPS, improving energy efficiency to 4.84-19.36 TOPS/W
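
To make the notion of bit-slice sparsity concrete, the following minimal Python sketch counts the fraction of all-zero bit slices in a small INT8 matrix. It is an illustration under stated assumptions, not the chip's actual scheme: the 4-bit slice width, the sign-magnitude treatment (the sign is assumed to be handled separately), the helper names `bit_slices` and `bit_slice_sparsity`, and the toy values are all hypothetical and are not taken from the paper.

```python
def bit_slices(value, total_bits=8, slice_bits=4):
    """Split the magnitude of an integer into bit slices, least significant first.

    Assumption: the sign is handled separately and only the magnitude is sliced.
    """
    magnitude = abs(value)
    mask = (1 << slice_bits) - 1
    n_slices = total_bits // slice_bits
    return [(magnitude >> (i * slice_bits)) & mask for i in range(n_slices)]


def bit_slice_sparsity(matrix, total_bits=8, slice_bits=4):
    """Return the fraction of bit slices in the matrix that are entirely zero."""
    zero_slices = 0
    total_slices = 0
    for row in matrix:
        for value in row:
            for s in bit_slices(value, total_bits, slice_bits):
                total_slices += 1
                if s == 0:
                    zero_slices += 1
    return zero_slices / total_slices


# Toy INT8 values (hypothetical): only 3 of 8 elements are zero,
# but every small-magnitude element also has an all-zero upper slice.
weights = [[3, 0, -5, 16],
           [0, 1, 0, -2]]

print(bit_slice_sparsity(weights))  # 0.6875 (11 of 16 slices are zero)
```

In this toy case the element-wise sparsity is only 37.5%, while the slice-level sparsity is 68.75%. This suggests why a quantization scheme that favors small magnitudes can expose far more zero slices than the dense matrix would appear to offer.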