The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography
In molecular imaging (MI), and in optical molecular imaging in particular, bioluminescence tomography (BLT) has emerged as an effective modality for small animal imaging. Finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. Although multi-thread CPU and multi-CPU technologies have already been applied, the processing speed of FEMs and the AFE framework still needs to be improved. In this paper, we introduce for the first time graphics processing unit (GPU) acceleration of the AFE framework for BLT. Besides raising processing speed, GPU technology offers a balance between cost and performance. CUBLAS and CULA are two important and powerful libraries for programming NVIDIA GPUs. With their help, it is easy to write code for an NVIDIA GPU without worrying about the hardware details of a specific device. Numerical experiments are designed to show the necessity, effect, and application of the proposed CUBLAS and CULA based GPU acceleration. The experimental results show that the proposed method substantially improves the processing speed of the AFE framework while maintaining a balance between cost and performance.
The CUBLAS and CULA based GPU acceleration technology has been proposed for the first time for the AFE framework in BLT, to balance cost and performance when handling parallelizable floating point operations. To evaluate the need for and feasibility of GPU acceleration, we carried out a set of experiments on the main time-consuming operations in the AFE framework. The results show that, besides the projection operation and the optimization operation, matrix inversion and matrix multiplication are the main time-consuming operations, and that these operations are all parallelizable floating point operations.
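To illustrate why matrix multiplication counts as a parallelizable floating point operation, the following minimal Python sketch computes each row of a product independently. It is not the implementation used in this work, which relies on CUBLAS and CULA on an NVIDIA GPU. The function names `row_times_matrix` and `parallel_matmul` are hypothetical, and the thread pool merely stands in for the GPU's parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, b):
    """Multiply one row vector by matrix b (given as a list of rows)."""
    n_cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(n_cols)]


def parallel_matmul(a, b, workers=4):
    """Compute the product a * b row by row.

    Each output row depends only on one row of a and on all of b,
    so the rows can be computed independently of one another.
    (Hypothetical helper for illustration, not the paper's code.)
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so result rows line up with rows of a
        return list(pool.map(lambda row: row_times_matrix(row, b), a))


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(parallel_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

In CPython the global interpreter lock prevents these threads from running the arithmetic concurrently, so this sketch shows only the independence of the rows, not a speedup. A GPU library exploits the same independence across many hardware threads.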
To evaluate the effect of GPU acceleration, we carried out a set of comparison experiments on single-thread operations, multi-thread accelerated operations, and GPU accelerated operations. The results show that GPU acceleration substantially improves the processing speed of the AFE framework.
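A comparison of this kind can be organized with a small timing harness. The sketch below is only an assumption about how such timings might be collected, not the procedure used in the experiments. The helpers `best_time` and `speedup` and the two stand-in workloads are hypothetical and would be replaced by the real single-thread, multi-thread, and GPU operations.

```python
import time


def best_time(fn, repeats=3):
    """Return the shortest wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_seconds / accelerated_seconds


if __name__ == "__main__":
    # Hypothetical stand-in workloads; replace with the operations under test.
    baseline = lambda: sum(i * i for i in range(200_000))
    accelerated = lambda: sum(i * i for i in range(50_000))
    ratio = speedup(best_time(baseline), best_time(accelerated))
    print(f"speedup: {ratio:.2f}x")
```

Taking the best of several runs reduces the influence of one-off delays such as cache warm-up. For GPU timings, the measured interval should also cover host-to-device transfers if those are part of the operation being compared.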
To show the application of the proposed GPU acceleration, we carried out a single-source reconstruction on a numerical phantom. Although we have also carried out real phantom and nude mouse experiments, those results are not shown in this paper because the memory of the graphics card we use is limited. In the future, we will investigate a Tesla GPU and carry out more biological experiments.
To sum up, the results of all the experiments performed show that GPU acceleration works very well in the AFE framework for BLT. The processing speed of the AFE framework has been improved substantially. GPU technology can work together with multi-thread CPU technology to deliver high performance at low cost.
B. Zhang, X. Yang, F. Yang, X. Yang, C. Qin, D. Han, X. Ma, K. Liu, and J. Tian, The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography, Opt. Express 18, 20201-20214 (2010). [doi: 10.1364/OE.18.020201]