Google TPU v2

Realizes: AI training and inference acceleration

Google's second-generation TPU (TPU v2) is a datacenter-scale AI accelerator built around large systolic arrays, high-bandwidth memory, and bfloat16 matrix units; the chips are assembled into Cloud TPU v2 pods that deliver high-throughput training and inference for deep learning workloads.
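
As an illustration of how the bfloat16 matrix units are programmed, the following is a minimal JAX sketch (shapes and function names are hypothetical) of a matmul that multiplies in bfloat16 while accumulating in float32, the same numeric contract as the MXU's bfloat16-multiply, float32-accumulate datapath.

```python
import jax
import jax.numpy as jnp

def mxu_style_matmul(a: jax.Array, b: jax.Array) -> jax.Array:
    """Multiply in bfloat16, accumulate in float32 (MXU-style numerics)."""
    a16 = a.astype(jnp.bfloat16)
    b16 = b.astype(jnp.bfloat16)
    # preferred_element_type requests a float32 accumulator for the product.
    return jax.lax.dot(a16, b16, preferred_element_type=jnp.float32)

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (128, 128), dtype=jnp.float32)
b = jax.random.normal(key, (128, 128), dtype=jnp.float32)
out = jax.jit(mxu_style_matmul)(a, b)
print(out.dtype, out.shape)  # float32 (128, 128)
```

On TPU hardware, XLA maps the jitted dot onto the systolic MXU; on other backends the same program runs with emulated bfloat16.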

Examples

BERT pre-training on TPU v2

Pre-training the BERT-Large transformer with bfloat16 matrix operations across a 128-chip TPU v2 Pod slice for large-scale language-modeling benchmarks.

Batched matrix multiplications, convolutions, and layer-norm kernels are executed through the Matrix Multiply Unit (MXU) pipelines.

Compute intensity: high; scale: cluster; energy: ≈45 pJ per fused multiply-add (bfloat16)
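
To make the parallelism concrete, below is a hedged sketch of the data-parallel pattern a multi-chip pod slice implies, using jax.pmap with a stand-in linear model; BERT-Large itself, the optimizer, and all shapes here are illustrative assumptions, not the production pipeline. As a back-of-envelope check on the quoted figure, a single 1024x1024x1024 matmul (~1.07e9 fused multiply-adds) at ≈45 pJ each would cost roughly 48 mJ in the MXUs.

```python
import functools

import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    """Toy regression loss; the forward matmul runs in bfloat16 as on the MXU."""
    x, y = batch
    logits = x.astype(jnp.bfloat16) @ params["w"].astype(jnp.bfloat16)
    return jnp.mean((logits.astype(jnp.float32) - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    # Cross-replica gradient averaging; on a pod slice this all-reduce
    # travels over the inter-chip interconnect.
    grads = jax.lax.pmean(grads, axis_name="devices")
    params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
    return params, loss

n = jax.local_device_count()
params = {"w": jnp.zeros((512, 512), jnp.float32)}
# Replicate parameters across all visible devices (leading axis = device).
replicated = jax.tree_util.tree_map(
    lambda p: jnp.broadcast_to(p, (n,) + p.shape), params)
batch = (jnp.ones((n, 32, 512)), jnp.zeros((n, 32, 512)))
replicated, loss = train_step(replicated, batch)
print(loss)  # one loss value per device
```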

Cloud TPU v2 pod

Cloud TPU v2 pods link up to 256 Google TPU v2 chips through a dedicated high-bandwidth interconnect fabric for distributed training.

bfloat16 matrix-multiply and convolution pipelines.

Compute intensity: high; scale: cloud-scale; energy: low
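
A minimal sketch of the collective that the pod fabric exists to serve: jax.lax.psum all-reduces one shard per visible device. The axis name is arbitrary and no pod topology is assumed; on pod hardware, XLA lowers this to an all-reduce over the inter-chip links.

```python
import functools

import jax
import jax.numpy as jnp

@functools.partial(jax.pmap, axis_name="fabric")
def all_reduce_sum(x):
    # Every participant contributes its shard and receives the global sum.
    return jax.lax.psum(x, axis_name="fabric")

n = jax.local_device_count()
shards = jnp.arange(n, dtype=jnp.float32).reshape(n, 1)
print(all_reduce_sum(shards))  # each row holds 0 + 1 + ... + (n - 1)
```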