We accelerated an ab initio molecular QMC calculation using GPGPU. Only the bottleneck part of the calculation is replaced by a CUDA subroutine and executed on the GPU. Compared with a single CPU core running in double precision, a single CPU core plus GPU runs 23.6 (11.0) times faster when the GPU part is computed in single (double) precision. The energy deviation caused by the single-precision treatment was found to be within the accuracy required in the calculation, ∼10⁻⁵ hartree. The GPU-equipped computational nodes are combined to form a hybrid MPI cluster, on which we confirmed that the performance scales linearly with the number of nodes.
We applied GPGPU to accelerate the single-core performance of a QMC code combined with a QM/MM treatment in the FMO method. Only the bottleneck subroutine of the code is rewritten in CUDA and executed on the GPU. A large-scale summation in this subroutine is divided into sub-summations distributed to threads running on the many cores of the GPU. Compared with a single CPU core in double precision, a single CPU core in double precision plus the GPU runs 23.6 (11.0) times faster when the GPU works in single (double) precision. The accuracy of the single-precision calculation was confirmed to stay within the required extent (chemical accuracy, ∼0.001 hartree in energy). Such accelerated nodes are combined to build an MPI cluster, on which we confirmed that the MPI performance scales linearly with the number of nodes up to four. The achieved acceleration factors are compared with the ideal limits, and possible reasons for the discrepancy are investigated, positioning the present work as a first step towards more efficient acceleration by the strategy of replacing only the most time-consuming subroutine with a CUDA-GPU one.
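The division of a large summation into per-thread sub-summations, and the size of the deviation introduced by single precision, can be illustrated with a minimal CPU-side sketch in Python. This is not the CUDA subroutine of the present work: the function names `to_f32` and `chunked_sum`, the thread count of 64, and the test terms are all hypothetical, and each strided slice merely stands in for the work of one GPU thread.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def chunked_sum(terms, n_threads, single_precision=False):
    """Sum `terms` as n_threads partial sums, then combine the partial sums.

    Each strided slice terms[t::n_threads] stands in for one GPU thread.
    With single_precision=True every intermediate result is rounded to
    float32 to emulate single-precision accumulation.
    """
    rnd = to_f32 if single_precision else (lambda v: v)
    partials = []
    for t in range(n_threads):
        acc = 0.0
        for x in terms[t::n_threads]:
            acc = rnd(acc + rnd(x))
        partials.append(acc)
    # Final combination of the partial sums (done serially here).
    total = 0.0
    for p in partials:
        total = rnd(total + p)
    return total


# Hypothetical smooth test terms, not taken from the QMC code.
terms = [math.exp(-0.001 * k) * math.cos(0.01 * k) for k in range(10000)]

reference = math.fsum(terms)  # accurately rounded double-precision sum
double_result = chunked_sum(terms, n_threads=64)
single_result = chunked_sum(terms, n_threads=64, single_precision=True)

print("double-precision deviation:", abs(double_result - reference))
print("single-precision deviation:", abs(single_result - reference))
```

In a sketch like this the single-precision deviation is typically several orders of magnitude larger than the double-precision one, so whether it is acceptable depends on the accuracy required of the summed quantity, which is the check reported above for the energy.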