The geometric multigrid method (GMG) is one of the most efficient solving techniques for discrete algebraic systems arising from many types of partial differential equations. GMG utilizes a hierarchy of grids or discretizations and reduces the error at a number of frequencies simultaneously. Graphics processing units (GPUs) have recently burst onto the scientific computing scene as a technology that has yielded substantial performance and energy-efficiency improvements. A central challenge in implementing GMG on GPUs, though, is that computational work on coarse levels cannot fully utilize the capacity of a GPU. In this work, we perform numerical studies of GMG on CPU–GPU heterogeneous computers. Furthermore, we compare our implementation with an efficient CPU implementation of GMG and with the most popular fast Poisson solver, Fast Fourier Transform, in the cuFFT library developed by NVIDIA.
The main purpose of this paper is to consider the following important questions, all of which are central to understanding geometric multigrid methods on GPU architecture:
- Is it possible to achieve a satisfactory speedup on GPUs for multigrid algorithms?
- How does the performance of multigrid algorithms on GPUs compare with their performance on a single state-of-the-art CPU core?
- How much of the computational power of GPUs can be used for multigrid algorithms? How cost-effective are CPU–GPU systems?
- Compared with the optimized implementation of direct solvers based on FFT, does GMG have any advantages in addition to its generality?
In this work, we studied the performance of GMG on CPU-GPU heterogeneous computers. Our numerical results suggest that in the best-case scenario the GPU version of GMG can achieve 18.5 times speed-up in 2D and 16.0 times speed-up in 3D compared with an efficient implementation of multigrid methods on CPUs. When the problem is relatively small we found that the heterogeneous algorithm (0 < Lè < L) usually gives the best computational performance. On the other hand, when the problem size is large enough, then we concluded that it is generally preferable to do the computation on GPUs. We observed the smoothing and computing norm of the residual account for the low floating-point performance and low efficiency of GMG on GPUs. Furthermore, we compared our method with the Fast Fourier Transform in the state-of-the-art cuFFT library. For the test cases with 16 million unknowns (L = 12 in 2D and L = 8 in 3D), we showed that the optimal FMG method is 33% and 23% faster than FFT in 2D and 3D, respectively. Of at least equal importance is that GPU is more cost-effective (in terms of initial cost and daily energy consumption) than modern multicore CPUs for geometric multigrid methods.
Chunsheng Feng, Shi Shu, Jinchao Xu, Chen-Song Zhang. Numerical Study of Geometric Multigrid Methods on CPU-GPU Heterogeneous Computers. arXiv:1208.4247v1 [math.NA]