Google TPU v1
Realizes: AI accelerator
Google's first TPU, announced in 2016, pairs a 256×256 systolic array of 8-bit multiply-accumulate units, built for dense matrix multiplies, with on-chip weight and activation buffers, so inference workloads across Google's data centers run with deterministic, predictable throughput and latency from the ASIC's systolic-array hardware.
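A minimal sketch of the weight-stationary dataflow the description refers to: each processing element (PE) holds one weight, activations stream past, and partial sums accumulate as they flow through the array. This is an illustrative simulation of the technique, not Google's implementation; the `systolic_matmul` name and pure-Python loops are assumptions for clarity.

```python
def systolic_matmul(weights, acts):
    """Simulate a weight-stationary systolic array computing weights @ acts.

    In TPU v1 the array is 256x256 8-bit MAC units with 32-bit
    accumulation; here each PE's multiply-accumulate is modeled as one
    step of the inner loop, with the partial sum flowing down a column.
    """
    rows = len(weights)
    depth = len(weights[0])          # contraction dimension of the matmul
    n_cols = len(acts[0])
    out = [[0] * n_cols for _ in range(rows)]
    for c in range(n_cols):          # one activation column per wavefront
        for r in range(rows):
            acc = 0                  # hardware uses a 32-bit accumulator
            for k in range(depth):   # partial sum passes each PE in turn
                acc += weights[r][k] * acts[k][c]
            out[r][c] = acc
    return out
```

Because every PE performs one fixed multiply-accumulate per cycle with no caches or branches in the datapath, the cycle count depends only on the matrix shapes, which is the source of the deterministic latency noted above.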
Examples
TPU v1 Pod for inference
Pod-scale deployments in Google's data centers link dozens of these original TPU v1 boards to deliver aggregate inference throughput, keeping the deterministic systolic arrays busy with matrix multiplies for already-trained models.
[INFERENCE,MATRIX_MULTIPLY]
Latency: milliseconds
Scale: pod-scale
Energy: ≈65 pJ per multiply-accumulate