We present a GPU implementation of the Two-Level Preconditioned Conjugate Gradient Method. We investigate a preconditioner based on a Truncated Neumann Series in combination with deflation and compare it with Block Incomplete Cholesky schemes. This combination exhibits fine-grain parallelism, and hence we gain considerably in execution time. Its numerical performance is also comparable to that of the Block Incomplete Cholesky approach. Our method provides a speedup of up to 16 times for a system of one million unknowns when compared to an optimized CPU implementation.
Conclusions and Future Work
We have shown how two-level preconditioning can be adapted to the GPU for computational efficiency. To achieve this, we have investigated preconditioners suited to the GPU and developed new data structures to optimize the deflation operations. Our results demonstrate that the combination of Truncated Neumann Series preconditioning and deflation is computationally efficient on the GPU, while its numerical performance remains comparable to that of the established method of Block Incomplete Cholesky preconditioning.
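To illustrate why a Truncated Neumann Series preconditioner maps well onto the GPU, the sketch below applies the approximation M^{-1} ≈ (I + N + N^2) D^{-1} with N = I - D^{-1}A, so that the preconditioning step reduces to a few matrix-vector products, each of which exhibits fine-grain parallelism. This is a minimal dense NumPy sketch for exposition only, not our GPU implementation; the function name, the number of retained terms, and the dense matrix format are our own assumptions.

```python
import numpy as np

def neumann_preconditioner(A, r, terms=3):
    """Approximate z = A^{-1} r by a truncated Neumann series.

    With diagonal scaling D = diag(A), write D^{-1}A = I - N, so that
    A^{-1} = (I - N)^{-1} D^{-1} ~ (I + N + N^2 + ...) D^{-1}.
    Each retained term costs one matrix-vector product, which is the
    kind of fine-grain parallel kernel that performs well on a GPU.
    """
    d_inv = 1.0 / np.diag(A)
    N = np.eye(A.shape[0]) - d_inv[:, None] * A  # N = I - D^{-1}A
    z = d_inv * r                                # zeroth term: D^{-1} r
    term = z.copy()
    for _ in range(terms - 1):
        term = N @ term                          # next power of N applied to D^{-1} r
        z += term
    return z
```

For a diagonally dominant system the truncated series gives a markedly better approximation to A^{-1} r than diagonal scaling alone, while keeping the per-iteration work embarrassingly parallel.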
We have also evaluated two different approaches to implementing the deflation step. From the model problem we have learned that the choice of deflation implementation can be crucial to the overall run-time of the method.
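The trade-off between deflation implementations can be sketched as follows: applying the deflation operator P = I - A Z E^{-1} Z^T, with Galerkin matrix E = Z^T A Z, requires either multiplying by a precomputed explicit inverse of E or solving a small system with E in every iteration. The NumPy code below is an illustrative sketch under those standard definitions, not our GPU kernels; the function name and the flag distinguishing the two variants are our own assumptions.

```python
import numpy as np

def deflation_apply(A, Z, x, use_explicit_inverse=True):
    """Apply the deflation operator P = I - A Z E^{-1} Z^T to x.

    Z holds the deflation vectors (n-by-k), and E = Z^T A Z is the
    small k-by-k Galerkin matrix.  Recomputed here for clarity; in an
    iterative solver E (or its inverse/factorization) is precomputed.
    """
    E = Z.T @ (A @ Z)
    y = Z.T @ x                       # restrict to the coarse space
    if use_explicit_inverse:
        # Variant 1: multiply by an explicit inverse of E.  Cheap when
        # k is small and maps onto a dense, regular GPU kernel.
        s = np.linalg.inv(E) @ y
    else:
        # Variant 2: solve E s = y each application (e.g. via a
        # factorization), avoiding formation of the explicit inverse.
        s = np.linalg.solve(E, y)
    return x - A @ (Z @ s)
```

A defining property of the operator, P A Z = 0, makes for a quick correctness check: applying P to any vector of the form A z, with z a deflation vector, must give zero.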
Using this knowledge, we are now working on model problems with bubbles instead of simple interfaces. For these geometries we can use Level-Set Sub-domain based deflation vectors to capture and eliminate the small eigenvalues with considerably fewer deflation vectors. This allows us to use the explicit inverse calculation for E and to achieve double-digit speedups for larger problems.