We describe our experience in designing and implementing an efficient count sort algorithm on Compute Unified Device Architecture (CUDA) graphics processing units (GPUs). The novelty of this work is twofold. First, we propose a count sort algorithm for integers that needs no synchronization in its last step and thus offers superior performance. Second, this work contributes ad hoc techniques for optimizing the performance of the algorithm on CUDA-enabled GPUs.
Kolonias, V., Voyiatzis, A. G., Goulas, G. and Housos, E. (2011), Design and implementation of an efficient integer count sort in CUDA GPUs. Concurrency and Computation: Practice and Experience, 23: 2365–2381. [DOI: 10.1002/cpe.1776]
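The abstract does not spell out how the last step avoids synchronization. One common way to get this property when sorting bare integer keys (no satellite data) is to rebuild the output directly from the histogram. After an exclusive prefix sum over the counts, each key value `v` owns the disjoint output range `[offsets[v], offsets[v] + counts[v])`, so every value can be written by an independent thread without atomics or barriers. The following is a minimal sequential sketch of that idea. It assumes non-negative integer keys bounded by a known `max_key`. The names `count_sort`, `max_key` and `offsets` are hypothetical, and this is not the authors' CUDA implementation.

```python
from itertools import accumulate


def count_sort(keys, max_key):
    """Sort non-negative integer keys in [0, max_key] by counting (keys only)."""
    # Phase 1: histogram of key values. On a GPU this phase typically needs
    # atomic increments, because many threads may see the same key value.
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1

    # Phase 2: exclusive prefix sum, giving the first output index of each value.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Phase 3: each key value v fills its own range of the output.
    # The ranges are disjoint, so each outer iteration could run as an
    # independent GPU thread with no synchronization. This works only because
    # the keys carry no satellite data: the value alone reconstructs the output.
    out = [0] * len(keys)
    for v in range(max_key + 1):
        start = offsets[v]
        for i in range(start, start + counts[v]):
            out[i] = v
    return out
```

For example, `count_sort([3, 1, 4, 1, 5, 9, 2, 6], 9)` returns `[1, 1, 2, 3, 4, 5, 6, 9]`. The trade-off in this sketch is load balance in the last phase: a heavily repeated key value gives one thread a long range to fill, which a GPU implementation would need to address separately.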