AWS Inferentia

Realizes: deep learning inference pipelines

An AWS-designed chip for high-throughput, low-latency deep learning inference; it powers EC2 Inf1 instances and is programmed through the AWS Neuron SDK.

Examples

AWS EC2 Inf1 instances

Inf1 instances pair AWS Inferentia chips with the Neuron SDK, which compiles models from frameworks such as PyTorch and TensorFlow into a form the chips can execute, accelerating transformer and CNN inference.
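As a sketch of that workflow, the documented torch-neuron flow traces a PyTorch model into a Neuron-compiled artifact that runs on Inferentia. This assumes the AWS Neuron SDK (the `torch_neuron` package) is installed and compilation/inference happens on or for an Inf1 instance; the model choice and file name here are illustrative.

```python
import torch
import torch_neuron  # AWS Neuron SDK plugin; registers torch.neuron (assumed installed)
from torchvision import models

# Load any traceable PyTorch model in eval mode.
model = models.resnet50(pretrained=True)
model.eval()

# Trace/compile for Inferentia with a representative example input.
example = torch.rand(1, 3, 224, 224)
model_neuron = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled TorchScript module; on an Inf1 instance it can be
# reloaded with torch.jit.load and called like a normal model.
model_neuron.save("resnet50_neuron.pt")
```

Tracing with a fixed example input is what lets the compiler specialize operators (convolutions, attention, dense layers) for the chip; dynamic shapes generally need recompilation or bucketing.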
