We present a multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology. The model was derived from a multi-core CPU serial implementation, named GAME, already scientifically tested and validated on astrophysical massive data classification problems through a web application resource (DAMEWARE) specialized in data mining based on machine learning paradigms. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm allowed us to exploit the internal training features of the model, yielding a strong optimization in terms of processing performance and scalability.
We investigated state-of-the-art computing technologies and chose the one best suited to deal with a wide range of real physical problems. A multi-purpose genetic algorithm (GA) implemented with GPGPU/CUDA parallel computing technology has been designed and developed. The model follows the paradigm of supervised machine learning, addressing both classification and regression problems applied to massive data sets.
The model was derived from a serial implementation named GAME, deployed on the DAME Program hybrid distributed infrastructure and already scientifically tested and validated on astrophysical massive data set problems with successful results. Since GAs are inherently parallel, the parallel computing paradigm allowed us to exploit the internal training features of the model, yielding a strong optimization in terms of processing performance. We described our effort to adapt our genetic algorithm for general-purpose computing on the GPU, discussed the efficiency and computational costs of the various components of the algorithm, and presented several benchmark results. The use of CUDA translates into a 75x average speedup, successfully eliminating the largest bottleneck in the multi-core CPU code. Although a speedup of up to 200x over a modern CPU is impressive, it ignores the larger picture of using a genetic algorithm as a whole. In any real-world scenario the data set can be very large (what we have previously called Massive Data Sets), and this requires greater attention to GPU memory management, in terms of scheduling and of host-to-device data transfers and vice versa. Moreover, the identical results obtained in the classification test cases demonstrate the consistency of the implementation across the three different computing architectures, enhancing the scalability of the proposed GAME model when approaching massive data set problems.
Finally, the very encouraging results suggest investigating further optimizations, such as: (i) moving the formation of the population matrix and its evolution in place on the GPU. This approach has the potential to significantly reduce the number of operations in the core computation, at the cost of higher memory usage; (ii) exploring further improvements by mixing Thrust and CUDA C code, which should allow a modest speedup justifying development efforts at a lower level; (iii) using new features now available on the NVIDIA Fermi architecture, such as faster atomics, more robust thread synchronization, and multi-GPU capability.
Stefano Cavuoti, Mauro Garofalo, Massimo Brescia, Antonio Pescapé, Giuseppe Longo and Giorgio Ventre. Genetic Algorithm Modeling with GPU Parallel Computing Technology. Instrumentation and Methods for Astrophysics, 2012. arXiv:1211.5481 [astro-ph.IM]