Implementation of a High Performance GPGPU Compiler
In this paper we present our experience in developing an optimizing compiler for general-purpose computation on graphics processing units (GPGPU), built on the Cetus source-to-source compiler infrastructure. The input to our compiler is a naive GPU kernel procedure, which is functionally correct but written without any consideration for performance. Our compiler applies a set of optimization techniques to the naive kernel and generates an optimized GPU kernel. It supports optimizations for GPU kernels that use either global memory or texture memory. In the Cetus framework, a code transformation is called a pass. We classify the passes used in our work into two categories: functional passes and optimization passes. The functional passes translate input kernels into the desired intermediate representation, which clearly represents memory access patterns and thread configurations. A series of optimization passes then improves the performance of the kernels by adapting them to the target GPGPU architecture. Our experiments show that the optimized code achieves very high performance, either superior or very close to that of highly fine-tuned libraries.
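As a rough illustration of the two pass categories described in the abstract, the following Python sketch models a pipeline in which functional passes run before optimization passes. It is a toy model only: it does not use the Cetus API, and the `Kernel` and `Pass` classes, the pass names, and the assumption that all functional passes run first are hypothetical choices made for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Kernel:
    """Toy stand-in for a kernel and its intermediate representation (hypothetical)."""
    name: str
    source: str
    applied: Tuple[str, ...] = ()  # names of the passes applied so far


@dataclass(frozen=True)
class Pass:
    """A code transformation, tagged as 'functional' or 'optimization'."""
    name: str
    kind: str
    transform: Callable[[Kernel], Kernel]


def make_pass(name: str, kind: str) -> Pass:
    """Build a placeholder pass that only records that it ran."""
    if kind not in ("functional", "optimization"):
        raise ValueError(f"unknown pass kind: {kind}")

    def transform(kernel: Kernel) -> Kernel:
        return replace(kernel, applied=kernel.applied + (name,))

    return Pass(name, kind, transform)


def run_pipeline(kernel: Kernel, passes: List[Pass]) -> Kernel:
    """Run functional passes first, then optimization passes (assumed ordering)."""
    functional = [p for p in passes if p.kind == "functional"]
    optimization = [p for p in passes if p.kind == "optimization"]
    for p in functional + optimization:
        kernel = p.transform(kernel)
    return kernel


# Hypothetical pass names, loosely following the abstract's wording.
PASSES = [
    make_pass("adapt_to_target_architecture", "optimization"),
    make_pass("build_access_pattern_ir", "functional"),
    make_pass("build_thread_config_ir", "functional"),
]

naive = Kernel(name="naive_kernel", source="/* functionally correct, untuned */")
optimized = run_pipeline(naive, PASSES)
print(optimized.applied)
```

Keeping the kernel immutable and returning a new value from each pass makes it easy to inspect which transformations were applied and in what order.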
Conclusion
In this paper, we present our experience in developing a compiler framework to optimize GPGPU programs using Cetus. As a source-to-source compiler framework, Cetus enables researchers like us to implement code optimizations in a high-level language without requiring knowledge of low-level languages such as assembly. Optimizations applied at the high-level language can be effective across different low-level implementations: as shown in our work, the optimized OpenCL kernels are effective on both NVIDIA and AMD platforms. To facilitate further development of our GPGPU compiler, we would like Cetus to provide OpenCL and CUDA support internally, or extension interfaces for parallel languages. Another important feature is static single assignment (SSA) form, which can simplify data dependency analysis.
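To make the remark about static single assignment concrete, here is a minimal Python sketch, assuming straight-line code only (no branches, so no phi functions are needed). The statement encoding, the `_n` version suffix, and the function names `to_ssa` and `flow_dependences` are assumptions for this example and are not part of Cetus. Once every variable is assigned exactly once, each use names its unique definition, so flow dependences can be read off with a single dictionary lookup.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, List[str]]  # (assigned variable, variables read)


def to_ssa(stmts: List[Stmt]) -> List[Stmt]:
    """Rename variables so that each one is assigned exactly once."""
    version: Dict[str, int] = {}
    out: List[Stmt] = []
    for target, operands in stmts:
        # A use refers to the latest version; never-assigned inputs get version 0.
        renamed = [f"{v}_{version.get(v, 0)}" for v in operands]
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", renamed))
    return out


def flow_dependences(ssa: List[Stmt]) -> List[Tuple[int, int]]:
    """Return (defining statement, using statement) index pairs."""
    def_site: Dict[str, int] = {}
    deps: List[Tuple[int, int]] = []
    for i, (target, operands) in enumerate(ssa):
        for op in operands:
            if op in def_site:  # unique definition, no reaching-definitions search
                deps.append((def_site[op], i))
        def_site[target] = i
    return deps


# a = f(x, y); b = f(a); a = f(b, x); c = f(a)
program = [("a", ["x", "y"]), ("b", ["a"]), ("a", ["b", "x"]), ("c", ["a"])]
ssa = to_ssa(program)
deps = flow_dependences(ssa)
for target, operands in ssa:
    print(target, "<-", operands)
print(deps)
```

In the original program the last statement reads `a`, which is assigned twice; after renaming it reads `a_2`, so the dependence on the third statement is explicit in the name itself.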
Yi Yang and Huiyang Zhou. The Implementation of a High Performance GPGPU Compiler. International Journal of Parallel Programming, pp. 1-14, 2012. [doi: 10.1007/s10766-012-0228-3]