Over the last decade, there has been growing interest in the use of graphics processing units (GPUs) for non-graphics applications. From early academic proof-of-concept papers around the year 2000, the use of GPUs has matured to the point where there are countless industrial applications. Alongside the expanding use of GPUs, programming languages and tools have also developed tremendously, and getting started with GPU programming has never been easier. However, whilst getting started with GPU programming can be simple, fully utilizing GPU hardware is an art that can take months or years to master. The aim of this article is to ease this process by giving an overview of current GPU programming strategies, profile-driven development, and an outlook on future trends.
In this article, we have given an overview of GPU hardware and traditional optimization techniques for the GPU. We have furthermore given a step-by-step guide to profile-driven development, in which bottlenecks and possible solutions are outlined. The focus is on state-of-the-art hardware with accompanying tools, and we have addressed the most prominent bottlenecks: memory, arithmetic, and latency.
André R. Brodtkorb, Trond R. Hagen, and Martin L. Sætra. Graphics processing unit (GPU) programming strategies and trends in GPU computing. Journal of Parallel and Distributed Computing, available online, 2012. doi: 10.1016/j.jpdc.2012.04.003. [Free PDF]