The vision of a supercomputer on every desk can be realized by powerful, highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications alone, are now used for computationally intensive general-purpose applications. GFLOP- and TFLOP-scale performance, once very expensive, has become cheap with GPGPUs. The current work focuses on a highly parallel implementation of matrix exponentiation, which is widely used across the scientific community, ranging from safety-critical flight and CAD simulations to financial and statistical applications. The proposed solution uses OpenCL to exploit the massive parallelism offered by many-core GPGPUs, and it employs both general GPU optimizations and architecture-specific optimizations. The experiments cover optimizations targeted specifically at scientific graphics cards (Tesla C2050). The heterogeneous, highly parallel matrix exponentiation method has been tested on matrices of different sizes and with different powers. The devised kernel has shown a 1000x speedup overall and a 44-fold speedup over the naive GPU kernel.
The current methodology is implemented in OpenCL, a heterogeneous language, so it can run on any compute device of any architecture. In this experiment we tested our algorithm on dense matrices of sizes up to 512 by 512 raised to powers up to 256 and evaluated the results. All results are strictly compared against the sequential code's results to check for precision problems. Our methodology preserves high precision while enabling supercomputing capability on relatively cheap GPUs.
Our solution gives more than a thousand-fold performance improvement on the high-end scientific graphics card Tesla C2050 for higher powers of larger matrices. The approach includes several architectural performance optimizations specific to the Tesla C2050, as well as general optimization techniques supported by all multi-core processors, including GPUs. In Fig. 5 to Fig. 12, our approach is compared with the naive GPU method, and it consistently outperforms the naive GPU approach.
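For context, a baseline of the kind the paper compares against can be sketched as a naive OpenCL matrix-multiply kernel, where one work-item computes one output element with no tiling or local-memory reuse. The kernel name and argument layout below are illustrative assumptions, not the paper's actual code:

```
// Illustrative naive OpenCL kernel: one work-item per output element,
// every operand read straight from global memory (no local-memory tiling).
__kernel void mat_mul_naive(__global const float *A,
                            __global const float *B,
                            __global float *C,
                            const int n)
{
    int row = get_global_id(1);
    int col = get_global_id(0);
    if (row >= n || col >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];
    C[row * n + col] = acc;
}
```

The architecture-specific optimizations mentioned above (e.g. exploiting the Tesla C2050's on-chip memory) target exactly the redundant global-memory traffic visible in this baseline's inner loop.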
Chittampally Vasanth Raja, Srinivas Balasubramanian and Prakash S. Raghavendra. "Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU." International Journal of Distributed and Parallel Systems, Vol. 3, No. 2, 2012. arXiv:1204.3052v1 [cs.DC]. doi:10.5121/ijdps.2012.3209.