AWS Inferentia
Realizes: deep learning inference pipelines
An AWS-designed accelerator chip for deep learning inference, delivering high throughput at low latency; it powers EC2 Inf1 instances and is programmed through the Neuron SDK.
Examples
AWS EC2 Inf1 instances
Inf1 instances pair AWS Inferentia chips with the Neuron SDK to accelerate transformer and CNN inference workloads at high throughput and low latency.
Supported operators: convolution, attention, dense
Throughput: high
Latency: low
Scale: cloud-scale
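As a rough illustration of the Neuron SDK workflow mentioned above, the sketch below compiles a PyTorch model for Inferentia by tracing it with the `torch-neuron` package. This is a minimal sketch, not a definitive recipe: it assumes the Neuron SDK is installed, and the compiled artifact only runs on an Inf1 instance. The model and input shapes are placeholders.

```python
import torch
import torch_neuron  # AWS Neuron SDK PyTorch integration; must be installed separately

# Placeholder model: any traceable torchvision/CNN or transformer model works similarly.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
)
model.eval()

# Example input matching the shapes the model will see at inference time.
example = torch.rand(1, 3, 224, 224)

# Compile the model for Inferentia; unsupported operators fall back to CPU.
neuron_model = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact; load it with torch.jit.load() on an Inf1 instance.
neuron_model.save("model_neuron.pt")
```

Tracing ahead of time is what lets the Neuron compiler fuse operators and schedule them onto the chip's NeuronCores, which is where the throughput gains come from.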