Google TPU v4
Realizes: dense matrix multiply and transformer attention pipelines
Google TPU v4 is a pod-scale accelerator from Google that realizes dense linear algebra and transformer attention on matrix multiply units (MXUs) built from systolic arrays. Each TPU v4 chip pairs 32 GiB of HBM2 memory with liquid cooling and links into an optically circuit-switched interconnect; a Cloud TPU v4 pod stitches up to 4,096 chips into a reconfigurable 3D torus for exaflop-scale training.
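The workloads named above reduce to a few large dense matmuls. A minimal NumPy sketch of scaled dot-product attention, the kernel the MXU systolic arrays accelerate (shapes and names here are illustrative, not a TPU API):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # first dense matmul
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # second dense matmul

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 64)) for _ in range(3))
out = attention(q, k, v)
```

Both matmuls stream operands through the systolic array, which is why attention maps so well onto this hardware.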
Examples
Cloud TPU v4 pod
Cloud TPU v4 pods link up to 4,096 TPU v4 chips, each with on-package HBM2, over an optically switched 3D torus interconnect to scale transformer training and inference.
MATRIX_MULTIPLY
CONVOLUTION
ATTENTION
ultra-high
cloud-scale
low