Heterogeneous systems are becoming more common in High Performance Computing (HPC). Even with tools such as CUDA and OpenCL, obtaining optimal performance on the GPU remains a non-trivial task. Approaches to simplifying this task include Merge (a library-based framework for heterogeneous multi-core systems), Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a new programming language for general-purpose computation on the GPU), and CUDA-lite (an enhancement to CUDA that transforms code based on annotations). In addition, efforts are underway to improve compiler tools for automatic parallelization and optimization of affine loop nests for GPUs, and for automatic translation of OpenMP-parallelized codes to CUDA.
In this paper we present an alternative approach: a new computational framework, built upon the highly scalable Cactus framework, for the development of massively data parallel scientific applications suitable for petascale/exascale hybrid systems. As a first non-trivial demonstration of its usefulness, we successfully developed a new 3D CFD code that achieves improved performance.
In this paper we have presented an implementation of a new generic capability for computing on hybrid CPU/GPU architectures within the Cactus computational framework. The capability to manage data exchange between GPU and CPU address spaces and to deploy computations in the hybrid environment was implemented as a new thorn, "CaCUDA". Moreover, CaCUDA considerably eases the implementation process by generating templates for all declared kernel functions. Thanks to the flexibility and extensibility of the Cactus framework, no changes to the Cactus flesh were necessary, guaranteeing that existing features and user-implemented thorns are unaffected by this addition.
As a test-case application of these new framework features, an incompressible CFD code was implemented to evaluate overall performance and scalability, and the results presented demonstrate its usability. Our current effort is focused on minimizing the cost of data exchange between GPU and CPU and on optimizing the boundary exchange. Further work in this area may improve performance and scalability.
Marek Blazewicz, Steven R. Brandt, Peter Diener, David M. Koppelman, Krzysztof Kurowski, Frank Löffler, Erik Schnetter, Jian Tao. A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems. arXiv:1201.2118v1 [cs.DC].