HPC systems now exploit GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit the cores and specialized devices, the programmer may need to choose a programming model such as OpenMP or CPU threads in conjunction with an accelerator programming model to share and manage the different node resources. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. To offset the high development cost of creating CUDA or OpenCL kernels, directives have been proposed for programming accelerator devices, but their implications are not well understood. In this paper, we evaluate state-of-the-art accelerator directives to program several application kernels, explore transformations to achieve good performance, and examine the expressivity and performance penalty of using high-level directives versus CUDA. We also compare our results to OpenMP implementations to understand the benefits of running the kernels on the accelerator versus on CPU cores.
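To make the contrast between CPU directives and accelerator directives concrete, here is a minimal sketch of one loop written both ways. It assumes OpenACC syntax as a representative accelerator directive set (this summary does not name the two directive sets the paper evaluates), and the SAXPY kernel and function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* CPU version: OpenMP distributes the loop iterations across host cores.
   Without OpenMP support the pragma is ignored and the loop runs serially. */
void saxpy_omp(size_t n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Accelerator version (assumption: OpenACC syntax). The copyin/copy clauses
   describe host-to-device and device-to-host data movement for the arrays. */
void saxpy_acc(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Compiled without OpenMP or OpenACC support, both pragmas are ignored and the same source runs serially, which is part of the portability argument for directives. A native CUDA version would instead need a separate `__global__` kernel, an explicit launch configuration, and explicit `cudaMalloc`/`cudaMemcpy` calls.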
This paper explores GPU programming models and compares two sets of accelerator directives in two real-world application kernel studies. We explain the challenges and limitations encountered and, based on the lessons learned, reach initial conclusions on how to transform code to take advantage of accelerator directives. We also compare the performance of running the codes on the GPU versus the CPU, and find that the GPU yields significantly better performance in all cases. Using accelerator directives efficiently requires code transformations that close the performance gap with native CUDA implementations.
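The summary does not list the specific transformations the paper applies. One widely used transformation with accelerator directives is hoisting data movement out of a time-step loop so that arrays stay resident on the device. The sketch below assumes OpenACC syntax and a hypothetical 1D smoothing kernel (`smooth_naive`, `smooth_data_region`); it illustrates the general idea and is not one of the paper's kernels.

```c
#include <stddef.h>

/* Before: every parallel region carries its own data clauses, so u and tmp
   are transferred between host and device on every time step. */
void smooth_naive(size_t n, int steps, float *restrict u, float *restrict tmp)
{
    for (int s = 0; s < steps; ++s) {
        #pragma acc parallel loop copyin(u[0:n]) copyout(tmp[0:n])
        for (size_t i = 1; i + 1 < n; ++i)
            tmp[i] = 0.5f * (u[i - 1] + u[i + 1]);

        #pragma acc parallel loop copyin(tmp[0:n]) copy(u[0:n])
        for (size_t i = 1; i + 1 < n; ++i)
            u[i] = tmp[i];
    }
}

/* After: a data region moves u once in and once out for the whole time loop;
   tmp is allocated only on the device, and the loops reuse the resident copies. */
void smooth_data_region(size_t n, int steps, float *restrict u, float *restrict tmp)
{
    #pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (size_t i = 1; i + 1 < n; ++i)
                tmp[i] = 0.5f * (u[i - 1] + u[i + 1]);

            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (size_t i = 1; i + 1 < n; ++i)
                u[i] = tmp[i];
        }
    }
}
```

Both functions compute the same result; the difference is only in how often data crosses the host-device link, which is typically a large share of the gap between a naive directive port and a hand-written CUDA version.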
Oscar Hernandez, Wei Ding, Barbara Chapman, Christos Kartsaklis, Ramanan Sankaran and Richard Graham. Experiences with High-Level Programming Directives for Porting Applications to GPUs. Facing the Multicore – Challenge II. Lecture Notes in Computer Science, Volume 7174/2012, 96-107, 2012. [doi: 10.1007/978-3-642-30397-5_9] [Free PDF]