We have improved our prior implementation of Strassen’s algorithm for high performance multiplication of very large integers on a general purpose graphics processor (GPU). A combination of algorithmic and implementation optimizations result in a factor of 2.3 speed improvement over our previous work, running on an NVIDIA 295. We have also reoptimized the implementation for an NVIDIA 480, from which we obtain a factor of up to 10 speedup in comparison with a Core i7 processor of the same technology generation. This paper discusses how we adapted the algorithm to operate within the limitations of the GPU and how we dealt with other issues encountered in the implementation process, as well as reporting performance results for a multiplications ranging from 255K bits, to 24.512M bits in size.
Niall Emmart and Charles Weems. High Precision Integer Multiplication with a GPU. IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. pp. 1781-1787, 2011. [doi: 10.1109/IPDPS.2011.336]