Intel has broken its silence on its forthcoming Xeon Phi processor, which is supposed to hit the shops later this year. The company provided the first look inside the chip in a Hot Chips talk by George Chrysos, chief architect of Knights Corner.
The chip packs more than 50 quad-threaded Pentium-class cores with 512-bit vector units and about 25 Mbytes of cache around a 512-bit, three-ring interconnect. Xeon Phi is essentially an x86 symmetric multiprocessing system on a chip. It runs popular programming environments used in large server clusters and supercomputers, such as OpenMP, MPI, OpenCL and Pthreads, as well as Intel’s existing tools. Intel used Xeon Phi in an internal system called Discovery that delivers about 1,400 MFlops/watt, dissipating 72.5 kW and hitting number 150 on the latest version of the Top 500 supercomputers list. By contrast, one Nvidia-based system at number 177 on the list consumes 81.5 kW, Intel noted.
The Hot Chips talk revealed aspects of the Xeon Phi architecture. Intel won’t disclose product details or a road map until the first chip is announced later this year. The company is expected to roll out a family of products, eventually scaling to well beyond 50 cores. Cray said it will use Xeon Phi in its next supercomputer, called Cascade.
Each of the chip’s cores has one 512-bit vector unit, two scalar units and a private 512 Kbyte L2 cache. This is the widest vector unit Intel has developed, and each one can manage 8 double-precision or 16 single-precision SIMD operations per clock cycle. Xeon Sandy Bridge and AMD Bulldozer cores can only manage half of that, and since there will be 50-plus cores, Knights Corner can manage at least 400 double-precision flops per cycle. On a 2 GHz processor, that works out at 800 gigaflops, and since Intel is using its 22nm process technology, the real figure is likely to be much more than that.
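The arithmetic behind those numbers can be checked with a rough sketch. It uses only the figures quoted above (512-bit vectors, 50 cores as the lower bound, a hypothetical 2 GHz clock) and assumes one operation per vector lane per cycle; the function names are made up for illustration, and this is a back-of-the-envelope estimate rather than an Intel specification.

```python
# Back-of-the-envelope peak throughput from the figures quoted in the article.
# Assumption: one operation per vector lane per clock cycle.

VECTOR_BITS = 512   # width of each core's vector unit
CORES = 50          # lower bound; the article says "more than 50"
CLOCK_GHZ = 2.0     # hypothetical clock used in the article's example


def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def peak_gflops(cores, lanes_per_core, clock_ghz):
    """Operations per cycle across all cores, times cycles per nanosecond."""
    return cores * lanes_per_core * clock_ghz


dp_lanes = lanes(VECTOR_BITS, 64)   # double precision: 8 lanes
sp_lanes = lanes(VECTOR_BITS, 32)   # single precision: 16 lanes

print("DP ops per cycle, whole chip:", CORES * dp_lanes)
print("Peak DP gigaflops:", peak_gflops(CORES, dp_lanes, CLOCK_GHZ))
```

Plugging in a higher core count or a different clock shows how quickly the headline number moves, which is why the 800 gigaflops figure is best read as an illustration.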
Chrysos said that just two percent of the Knights Corner die is dedicated to decoding x86 instructions. He added that the design uses other HPC features. There is a math accelerator called the Extended Math Unit. This does polynomial approximations of transcendental functions, along with square roots, reciprocals and exponents, to speed up execution of these functions in hardware. It still lacks a device to remove stones from horses’ hooves or a divide by your shoe size capability.
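For readers wondering what a polynomial approximation looks like, here is a minimal software sketch. It is not how the Extended Math Unit works internally, which Intel has not detailed here; it simply approximates e^x on a small interval with a degree-6 Taylor polynomial evaluated by Horner's rule. The interval, the degree and the function names are all assumptions picked for illustration.

```python
import math

# Taylor coefficients 1/k! for k = 6 down to 0, highest degree first,
# which is the order Horner's rule consumes them in.
COEFFS = [1.0 / math.factorial(k) for k in range(6, -1, -1)]


def exp_poly(x):
    """Approximate e**x for x in [-0.5, 0.5] with a degree-6 polynomial."""
    result = 0.0
    for c in COEFFS:
        # Horner's rule: one multiply and one add per coefficient.
        result = result * x + c
    return result


def max_abs_error(samples=101):
    """Largest absolute error against math.exp on evenly spaced points."""
    xs = [-0.5 + i / (samples - 1) for i in range(samples)]
    return max(abs(exp_poly(x) - math.exp(x)) for x in xs)


print("max abs error on [-0.5, 0.5]:", max_abs_error())
```

The appeal for hardware is that the whole evaluation is a short chain of multiplies and adds, which maps neatly onto wide vector units.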
Intel hopes the large caches help propel the chip’s use in future exascale supercomputers. The wide vector units help crunch scientific workloads based on a variety of algorithms, including FFTs and Monte Carlo simulations. Xeon Phi began its life as Larrabee, a graphics chip made out of x86 cores. Seeing a narrowing opportunity to compete with the likes of AMD and Nvidia in mainstream graphics, Intel shifted its strategy to target massively parallel HPC systems, where it hopes the chip will be easier to use than competing GPUs.