Gallery – CastLab

2025

HuMoniX: A 57.3fps 12.8TFLOPS/W Text-to-Motion Processor with Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity

We propose a novel text-to-motion processor called HuMoniX, enabling real-time human motion generation by integrating two heterogeneous engine clusters. The 12mm2 HuMoniX chip, fabricated using 14nm technology, operates at 50-600MHz with a supply voltage of 0.63 to 0.94V. Figure 23.10.6 shows its measurement results and comparison table. It demonstrates robust…Read More

2024

A 28nm 4.96 TOPS/W End-to-End Diffusion Accelerator with Reconfigurable Hyper-Precision and Unified Non-Matrix Processing Engine

This paper presents Picasso, an end-to-end diffusion accelerator. Picasso proposes a novel hyper-precision data type and reconfigurable architecture that can maximize hardware efficiency with extended dynamic range, with no compromise in accuracy. Picasso also proposes a unified engine operating all non-matrix operations in a streamlined processing flow and minimizes the…Read More

DPIM: A 19.36 TOPS/W 2T1C eDRAM Transformer-in-Memory Chip with Sparsity-Aware Quantization and Heterogeneous Dense-Sparse Core

This paper presents DPIM, the first 2T1C eDRAM Transformer-in-memory chip. Its high-density eDRAM cell supports large-capacity processing-in-memory (PIM) macros of 1.38 Mb/mm2, reducing external memory access. DPIM adopts a sparse-aware quantization scheme to entire layers of Transformer, which quantizes the model to 8-bit integer (INT8) with a minimal accuracy drop…Read More