Intel Xe-HPC
Realizes: Dense HPC GPU acceleration for AI training, scientific simulation, and matrix algebra
Ponte Vecchio GPUs combine HBM2e stacks, Xe cores with wide vector and XMX matrix engines, and a chiplet design that mixes Intel and TSMC process nodes across compute, base, and I/O tiles, packing thousands of wide SIMT lanes per tile and coordinating them through a scalable fabric designed for large-scale scientific and AI workloads.
Examples
Aurora HPC node (Ponte Vecchio GPUs)
Each Aurora supercomputer node couples six Ponte Vecchio GPUs, each with its own HBM2e, over a coherent fabric, delivering sustained throughput for HPC and AI training workloads at the Argonne Leadership Computing Facility.
AI training throughput of roughly 400 TFLOPS FP16 per GPU (2.4 PFLOPS FP16 per node) using SYCL/oneAPI kernels for transformer training, comparable to DGX-class systems in exascale experiments.
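The per-node figure above follows directly from the per-GPU number. A minimal sketch of that arithmetic, assuming the quoted ~400 TFLOPS FP16 per GPU and six GPUs per node (the node-count function is illustrative, not an official Aurora spec):

```python
# Back-of-envelope throughput arithmetic for an Aurora node, using the
# figures quoted above. Constants are assumptions taken from the text.

GPU_FP16_TFLOPS = 400  # per-GPU FP16 throughput (quoted estimate)
GPUS_PER_NODE = 6      # Ponte Vecchio GPUs per Aurora node

def node_fp16_pflops(per_gpu_tflops: float = GPU_FP16_TFLOPS,
                     gpus: int = GPUS_PER_NODE) -> float:
    """Aggregate FP16 throughput of one node, in PFLOPS."""
    return per_gpu_tflops * gpus / 1000.0

def nodes_for_exaflop(node_pflops: float) -> float:
    """Nodes needed to reach 1 EFLOPS (1000 PFLOPS) at peak FP16."""
    return 1000.0 / node_pflops

if __name__ == "__main__":
    node = node_fp16_pflops()
    print(f"Per-node FP16: {node:.1f} PFLOPS")  # 2.4 PFLOPS per node
    print(f"Nodes for 1 EFLOPS FP16: {nodes_for_exaflop(node):.0f}")
```

At these peak numbers, on the order of a few hundred such nodes would reach an exaflop of FP16 throughput, consistent with the exascale framing of the Aurora experiments.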
Timescale: hours · Scale: exascale · Energy: ~200 pJ/FLOP