HuMoniX: A 57.3fps 12.8TFLOPS/W Text-to-Motion Processor with Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity

We propose HuMoniX, a text-to-motion processor that enables
real-time human motion generation by integrating two heterogeneous engine
clusters. The 12mm² HuMoniX chip, fabricated in 14nm technology, operates at
50 to 600MHz with a supply voltage of 0.63 to 0.94V. Figure 23.10.6 shows the measurement
results and a comparison table. The chip demonstrates robust accuracy in its final 3D mesh
outputs, achieving PSNR values of 23.4 to 50.4dB. In single-batch experiments, HuMoniX, with
3.2GB/s external memory bandwidth (BW), achieves up to 57.3fps, making it 12.4 to 650.5× faster
than edge GPU systems [5] despite its lower BW. This performance is achieved by
skipping redundant computations and weight fetches in FFN layers and reducing the number
of frames requiring SMPL processing. By reducing total energy consumption by 72.3%, it
achieves an energy efficiency of 8.9 to 11.0TOPS/W for joint creation and 12.3 to
12.8TFLOPS/W for mesh construction. These results demonstrate that it is
1729 to 3572× more efficient than comparable GPUs and 2 to 6M× more efficient than
CPU-based executions. In conclusion, HuMoniX, a human motion generation processor,
achieves real-time generation speeds with high energy efficiency.
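
To illustrate why output sparsity in an FFN layer saves both computation and weight fetches, the following is a minimal sketch in plain Python. It is a generic illustration and not the HuMoniX dataflow: the ReLU activation, the function names `ffn_dense` and `ffn_skip_zeros`, and the toy weights are all assumptions made for the example. The idea shown is only that a hidden activation equal to zero contributes nothing to the output, so the matching row of the second weight matrix never needs to be read.

```python
def relu(v):
    # Assumed activation; the source does not name the one used on chip.
    return v if v > 0.0 else 0.0


def hidden_layer(x, w1):
    """First FFN projection followed by ReLU: hidden = relu(x @ w1)."""
    return [
        relu(sum(xi * w1[i][j] for i, xi in enumerate(x)))
        for j in range(len(w1[0]))
    ]


def ffn_dense(x, w1, w2):
    """Reference FFN that reads every row of w2, zero activation or not."""
    hidden = hidden_layer(x, w1)
    out = [
        sum(hidden[j] * w2[j][k] for j in range(len(hidden)))
        for k in range(len(w2[0]))
    ]
    return out, len(w2)  # (output, number of w2 rows fetched)


def ffn_skip_zeros(x, w1, w2):
    """Same FFN, but rows of w2 paired with a zero activation are skipped."""
    hidden = hidden_layer(x, w1)
    out = [0.0] * len(w2[0])
    rows_fetched = 0
    for j, h in enumerate(hidden):
        if h == 0.0:
            continue  # no multiply-accumulate and no weight fetch for row j
        rows_fetched += 1
        for k, w in enumerate(w2[j]):
            out[k] += h * w
    return out, rows_fetched


# Hypothetical toy data: 2 inputs, 4 hidden units, 2 outputs.
x = [1.0, 2.0]
w1 = [[1.0, -1.0, 0.5, -2.0],
      [0.5, -1.0, 1.0, 0.5]]
w2 = [[1.0, 0.0],
      [3.0, 4.0],
      [0.0, 2.0],
      [5.0, 6.0]]

dense_out, dense_rows = ffn_dense(x, w1, w2)
sparse_out, sparse_rows = ffn_skip_zeros(x, w1, w2)
print(dense_out, dense_rows)    # [2.0, 5.0] 4
print(sparse_out, sparse_rows)  # [2.0, 5.0] 2
```

In this toy case two of the four hidden activations are zero, so the skipping variant produces the same output while reading half of the second weight matrix. How HuMoniX identifies skippable outputs across iterations is not described in this section and is not modeled here.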