Google TPU v1
Realizes: AI accelerator
Google's first TPU, announced in 2016, pairs a 256×256 systolic array of 8-bit multiply-accumulate units, built for dense matrix multiplies, with on-chip weight and activation buffers, so inference workloads across Google's data centers run with deterministic, predictable throughput and latency from the ASIC's systolic-array hardware.
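A minimal sketch of the weight-stationary dataflow the description refers to: each processing element (PE) holds one weight, activations stream past, and partial sums accumulate as they flow through the array. This is an illustrative simulation of the technique, not Google's implementation; the `systolic_matmul` name and pure-Python loops are assumptions for clarity.

```python
def systolic_matmul(weights, acts):
    """Simulate a weight-stationary systolic array computing weights @ acts.

    In TPU v1 the array is 256x256 8-bit MAC units with 32-bit
    accumulation; here each PE's multiply-accumulate is modeled as one
    step of the inner loop, with the partial sum flowing down a column.
    """
    rows = len(weights)
    depth = len(weights[0])          # contraction dimension of the matmul
    n_cols = len(acts[0])
    out = [[0] * n_cols for _ in range(rows)]
    for c in range(n_cols):          # one activation column per wavefront
        for r in range(rows):
            acc = 0                  # hardware uses a 32-bit accumulator
            for k in range(depth):   # partial sum passes each PE in turn
                acc += weights[r][k] * acts[k][c]
            out[r][c] = acc
    return out
```

Because every PE performs one fixed multiply-accumulate per cycle with no caches or branches in the datapath, the cycle count depends only on the matrix shapes, which is the source of the deterministic latency noted above.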
Examples
TPU v1 Pod for inference
Pod-scale deployments in Google's data centers link dozens of these original TPU v1 boards to deliver aggregate inference throughput, keeping the deterministic systolic arrays busy with matrix multiplies for already-trained models.
[INFERENCE,MATRIX_MULTIPLY]
Latency: milliseconds
Scale: pod-scale
Energy: ≈65 pJ per multiply-accumulate