This paper presents a comparison of OpenMP and OpenCL based on parallel implementations of algorithms from various fields of computer applications. Our study focuses on benchmark performance under the two programming models. We observed that the OpenCL programming model is a good option for mapping threads onto different processing cores. Balancing the load across all available cores and allocating a sufficient amount of work to every compute unit can lead to improved performance. In our simulation, we used the Fedora operating system on a system with an Intel Xeon dual-core processor with a thread count of 24, coupled with an NVIDIA Quadro FX 3800 graphics processing unit.
We studied the behavior of parallel algorithms under OpenMP and OpenCL. The initial results were not satisfactory, but as the input data size increased, OpenCL gave good performance.
Recent systems are equipped with multi-core architectures, so OpenMP is a viable option for workloads such as matrix multiplication and image convolution. OpenCL, however, scores well on matrix multiplication.
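To illustrate the kind of OpenMP parallelization discussed here, the following is a minimal sketch of a matrix multiplication in C. It is not the benchmark code from the paper; the function name `matmul_omp`, the row-major layout, and the static schedule are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply two n x n row-major matrices: c = a * b.
 * Hypothetical helper for illustration only.
 * Compiled with -fopenmp, the outer loop is divided among threads;
 * without it, the pragma is ignored and the loop runs sequentially. */
void matmul_omp(const double *a, const double *b, double *c, int n)
{
    assert(a && b && c && n > 0);

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;  /* declared inside the loop, so private per thread */
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;  /* each (i, j) is written by one thread only */
        }
    }
}
```

Each row of the result is computed independently, so no synchronization is needed inside the loop. An OpenCL version would typically assign one work-item per output element instead of one thread per block of rows.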
OpenCL involves a lot of background work: allocating memory, setting up and loading kernels, querying platform and device information, computing work-item sizes, and so on. All of this adds overhead. However, we find that OpenCL gives very good performance in spite of this overhead. OpenCL does fall short in applications that offer little work to parallelize, as the string reversal example shows.
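One way to reason about this trade-off is a simple cost model. The symbols below are assumptions introduced for illustration and do not come from the paper's measurements: $T_{\text{setup}}$ is the fixed cost of platform, device, and kernel setup, $T_{\text{transfer}}(n)$ is the cost of moving $n$ input elements to and from the device, and $T_{\text{kernel}}(n)$ is the kernel execution time.

$$
T_{\text{OpenCL}}(n) \approx T_{\text{setup}} + T_{\text{transfer}}(n) + T_{\text{kernel}}(n)
$$

Under this model, OpenCL outperforms OpenMP only when

$$
T_{\text{setup}} + T_{\text{transfer}}(n) + T_{\text{kernel}}(n) < T_{\text{OpenMP}}(n).
$$

For a workload such as string reversal, the work per element is very small, so $T_{\text{setup}}$ and $T_{\text{transfer}}(n)$ can plausibly dominate the left-hand side. For matrix multiplication, the computation grows faster than the data transferred, which is consistent with OpenCL improving as the input size increases.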
Another finding is that critical sections are too expensive. We implemented an OpenMP version of the N-Queens problem but found no improvement, because only one thread runs at a time. However, the “task” directive can be used to advantage in applications such as tree traversal.
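The two observations above can be sketched in C as follows. This is an illustrative example, not the authors' N-Queens or tree traversal code; the `node` type and the functions `sum_critical`, `sum_reduction`, and `tree_sum` are hypothetical names. The first function serializes every update through a critical section, the second shows a reduction as a common alternative, and the third uses the task directive for a recursive tree traversal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binary tree node, used only for this illustration. */
typedef struct node {
    long value;
    struct node *left, *right;
} node;

/* Sum with a critical section: only one thread at a time can update total. */
long sum_critical(const long *x, int n)
{
    assert(x && n >= 0);
    long total = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        total += x[i];
    }
    return total;
}

/* Same sum with a reduction: each thread accumulates a private partial sum. */
long sum_reduction(const long *x, int n)
{
    assert(x && n >= 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
        total += x[i];
    return total;
}

/* Recursive tree sum; each subtree is handed to a task. */
static long tree_sum_rec(const node *t)
{
    if (!t)
        return 0;
    long l = 0, r = 0;
    #pragma omp task shared(l)   /* shared so the task's result is visible here */
    l = tree_sum_rec(t->left);
    #pragma omp task shared(r)
    r = tree_sum_rec(t->right);
    #pragma omp taskwait         /* wait for both child tasks before combining */
    return t->value + l + r;
}

/* One thread starts the traversal; the team's threads execute the tasks. */
long tree_sum(const node *root)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = tree_sum_rec(root);
    return result;
}
```

All three functions return the same results with or without `-fopenmp`; only the degree of parallelism changes. In practice a task-based traversal would likely stop spawning tasks below some subtree size, since creating a task for every node has its own overhead.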
Overall, we sum up our conclusion as:
OpenCL > OpenMP > Sequential
where > indicates better performance. As future work, we will look for algorithms where OpenMP is preferable to OpenCL.
Further research is required in the following problem areas. First, given an application program, we must check how useful OpenMP or OpenCL is in a heterogeneous environment consisting of multiple GPUs and multi-core CPUs. Second, a library routine could be developed that ports an application program to the CPU using OpenMP, to the GPU using OpenCL, or to a combination of the two technologies.
Krishnahari Thouti and S. R. Sathe. Comparison of OpenMP & OpenCL Parallel Processing Technologies. International Journal of Advanced Computer Science and Applications (IJACSA), Volume 3, Issue 4, pp. 56-61, 2012. arXiv:1211.2038 [cs.DC]