Google TPU v4
Realizes: dense matrix multiply and transformer attention pipelines
Google TPU v4 is a pod-scale accelerator from Google that realizes dense linear algebra and transformer attention on matrix multiply units (MXUs) built from systolic arrays. Each TPU v4 chip pairs 32 GiB of HBM2 memory with liquid cooling and links into an optically circuit-switched interconnect; a Cloud TPU v4 pod stitches up to 4,096 chips into a reconfigurable 3D torus for exaflop-scale training.
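The workloads named above reduce to a few large dense matmuls. A minimal NumPy sketch of scaled dot-product attention, the kernel the MXU systolic arrays accelerate (shapes and names here are illustrative, not a TPU API):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # first dense matmul
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # second dense matmul

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 64)) for _ in range(3))
out = attention(q, k, v)
```

Both matmuls stream operands through the systolic array, which is why attention maps so well onto this hardware.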
Examples
Cloud TPU v4 pod
Cloud TPU v4 pods link up to 4,096 TPU v4 chips, each with on-package HBM2, over an optically switched 3D torus interconnect to scale transformer training and inference.
MATRIX_MULTIPLY
CONVOLUTION
ATTENTION
ultra-high
cloud-scale
low