In this work, we use the OpenCL framework to accelerate an EMRI modeling application on two hardware accelerators: the Cell BE and the Tesla CUDA GPU. We describe these compute technologies, present performance results, and compare them with those from our previous implementations based on the native CUDA and Cell SDKs. The OpenCL framework allowed us to execute identical source code on both architectures while obtaining strong performance gains, comparable to those derived from the native SDKs.
The main goal of this work is to evaluate an emerging computational platform, OpenCL, for scientific computation. OpenCL is potentially extremely important for all computational scientists because it is hardware- and vendor-neutral yet (as our results suggest) able to deliver strong performance; that is, it provides portability without sacrificing performance. In this work, we consider all major types of compute hardware (CPU, GPU, and even a hybrid architecture, the Cell BE) and provide comparative performance results based on a specific research code.
More specifically, we take an important NR application, the EMRI Teukolsky Code, and perform a low-level parallelization of its most computationally intensive part using the OpenCL framework, for optimized execution on the Cell BE and the Tesla CUDA GPU. We describe the parallelization approach taken, as well as the relevant aspects of the compute hardware, in some detail. In addition, we compare the performance gains obtained from our OpenCL implementation with those from the native Cell and CUDA SDK based implementations. The final outcome is very similar on both architectures: we obtain well over an order-of-magnitude gain in overall application performance. Our results also suggest that an OpenCL-based implementation delivers performance comparable to that of a native SDK implementation on both types of accelerator hardware. Moreover, the OpenCL source code is identical for both hardware platforms, which is a non-trivial benefit: it promises tremendous savings in parallel code-development and optimization effort.