Intel Xe-HPC
Realizes: Dense HPC GPU acceleration for AI training, scientific simulation, and matrix algebra
Ponte Vecchio GPUs combine HBM2e stacks, Xe cores with wide vector and XMX matrix engines, and a chiplet design that mixes Intel and TSMC process nodes across compute, base, and I/O tiles, packing thousands of wide SIMT lanes per tile and coordinating them through a scalable fabric designed for large-scale scientific and AI workloads.
Examples
Aurora HPC node (Ponte Vecchio GPUs)
Each Aurora supercomputer node couples six Ponte Vecchio GPUs, each with its own HBM2e, over a coherent fabric, delivering sustained throughput for HPC and AI training workloads at the Argonne Leadership Computing Facility.
AI training throughput of roughly 400 TFLOPS FP16 per GPU (2.4 PFLOPS FP16 per node) using SYCL/oneAPI kernels for transformer training, comparable to DGX-class systems in exascale experiments.
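The per-node figure above follows directly from the per-GPU number. A minimal sketch of that arithmetic, assuming the quoted ~400 TFLOPS FP16 per GPU and six GPUs per node (the node-count function is illustrative, not an official Aurora spec):

```python
# Back-of-envelope throughput arithmetic for an Aurora node, using the
# figures quoted above. Constants are assumptions taken from the text.

GPU_FP16_TFLOPS = 400  # per-GPU FP16 throughput (quoted estimate)
GPUS_PER_NODE = 6      # Ponte Vecchio GPUs per Aurora node

def node_fp16_pflops(per_gpu_tflops: float = GPU_FP16_TFLOPS,
                     gpus: int = GPUS_PER_NODE) -> float:
    """Aggregate FP16 throughput of one node, in PFLOPS."""
    return per_gpu_tflops * gpus / 1000.0

def nodes_for_exaflop(node_pflops: float) -> float:
    """Nodes needed to reach 1 EFLOPS (1000 PFLOPS) at peak FP16."""
    return 1000.0 / node_pflops

if __name__ == "__main__":
    node = node_fp16_pflops()
    print(f"Per-node FP16: {node:.1f} PFLOPS")  # 2.4 PFLOPS per node
    print(f"Nodes for 1 EFLOPS FP16: {nodes_for_exaflop(node):.0f}")
```

At these peak numbers, on the order of a few hundred such nodes would reach an exaflop of FP16 throughput, consistent with the exascale framing of the Aurora experiments.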
Timescale: hours · Scale: exascale · Energy: ~200 pJ/FLOP