With the rapid evolution of processor architectures, hardware-oriented numerical applications have attracted growing attention. Using the Fermi architecture, we investigate how to accelerate high performance computing (HPC) applications with concurrent kernels. We concentrate on two performance factors: the launch order of the concurrent kernels and the kernel granularity. Extensive experiments show that the launch order has little effect on application performance. For kernel granularity, we identify a heuristic that may yield the best performance: the occupancy of each kernel should lie in the interval [30%, 50%].
Fengshun Lu, Junqiang Song, Fukang Yin and Xiaoqian Zhu. GPU Computing Using Concurrent Kernels: A Case Study. Lecture Notes in Electrical Engineering, Volume 126, pp 173-181, 2012. [doi: 10.1007/978-3-642-25766-7_23]
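The occupancy heuristic in the abstract can be illustrated with a rough estimate. The sketch below is not taken from the paper. It assumes the per-multiprocessor limits of compute capability 2.x (48 resident warps, 8 resident blocks, 32768 registers, 48 KB shared memory) and ignores allocation granularity, so its numbers may differ from those of a real occupancy calculator. The function names `estimate_occupancy` and `in_suggested_interval` are hypothetical, and whether the authors measured occupancy in this way is an assumption.

```python
import math

# Per-multiprocessor limits for compute capability 2.x (Fermi).
WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024
MAX_WARPS_PER_SM = 48
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 32768
SHARED_MEM_PER_SM = 48 * 1024  # bytes, assuming the 48 KB shared-memory configuration


def estimate_occupancy(threads_per_block, regs_per_thread=0, smem_per_block=0):
    """Estimate the occupancy of one kernel as resident warps / maximum warps.

    Simplified: register and shared-memory allocation granularity is ignored.
    """
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in [1, 1024]")

    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)

    # Each resource caps the number of blocks that can be resident at once.
    limits = [MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block]
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * WARP_SIZE
        limits.append(REGISTERS_PER_SM // regs_per_block)
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)

    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / MAX_WARPS_PER_SM


def in_suggested_interval(occupancy, low=0.30, high=0.50):
    """Check an occupancy against the [30%, 50%] interval from the abstract."""
    return low <= occupancy <= high


if __name__ == "__main__":
    # Hypothetical launch configurations, chosen only for illustration.
    for threads, regs in [(64, 0), (256, 0), (256, 40)]:
        occ = estimate_occupancy(threads, regs_per_thread=regs)
        print(threads, regs, round(occ, 3), in_suggested_interval(occ))
```

Under these assumptions, a 256-thread block with no register pressure fills the multiprocessor and falls outside the interval, while the same block using 40 registers per thread is limited to three resident blocks and sits at the upper bound of 50%.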