In recent years, Independent Component Analysis (ICA) has become a standard method for identifying relevant dimensions of the data in neuroscience. ICA is a very reliable analysis method, but it is computationally very costly, which makes its use for online analysis of the data, as required in brain-computer interfaces, almost completely prohibitive. We show that ICA can be sped up by about 25-fold at almost no cost (a fast video card). EEG data, which consist of many independent signals repeated across multiple channels, are well suited to the vector processors included in graphics processing units. We profiled the implementation of this algorithm and found two main types of operations responsible for the processing bottleneck, taking almost 80% of the computing time: vector-matrix and matrix-matrix multiplications. Simply replacing the calls to basic linear algebra functions with the standard CUBLAS routines provided by GPU manufacturers does not increase performance, because of CUDA kernel launch overhead. Instead, we developed a GPU-based solution that, compared with the original BLAS and CUBLAS versions, achieves a 25x increase in performance for the ICA calculation.
ICA is one of the de facto standard methods for source separation and for the removal of noise and artifacts. In neuroscience, it has been widely used for EEG, fMRI and invasive electrophysiology. In all these neuroimaging methods, advances in technology have increased the data volume by improving spatial and temporal resolution. With current standards, analyzing data with ICA requires a vast, often intractable amount of computing power. In practical EEG analysis, these computing requirements impede the rapid exploration of different methods, since each ICA run takes overnight or even more than a day. Rapid iteration over and examination of different procedures becomes completely impractical. Moreover, ICA is difficult to use for online analysis of the data. In recent years there has been exponential growth in brain-computer interfaces (BCI), which require online access to the relevant dimensions of the data.
ICA constitutes a formidable tool for finding relevant directions, and BCI procedures that use ICA to present participants with different components, in order to determine which are easier to control, are a timely necessity. For this, it is imperative to run ICA much faster than is possible on current CPUs, and here we present a major advance in this direction.
Our aim here was purely methodological: to improve the speed of Infomax ICA by at least 10x. We performed detailed profiling and located the bottleneck in the calculation of independent components, showing that vector-matrix and matrix-matrix operations account for most of the computation time. Based on these results, we implemented a hybrid ad hoc solution optimized for GPUs: CUDAICA. We then compared CUDAICA with the original BLAS implementation (compiled with the standard ATLAS and the optimized MKL libraries) and with a CUBLAS implementation. We observed a 25x performance increase with CUDAICA over the standard ATLAS implementation, and a 4.5x performance increase over the MKL implementation.
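The bottleneck described above can be illustrated with a minimal sketch of one Infomax weight update, assuming the natural-gradient learning rule of the common runica-style formulation (bias terms, learning-rate annealing and the sphering step are left out). This is not the CUDAICA code: the helper names `matmul` and `infomax_block_update` are hypothetical, and plain Python lists stand in for the BLAS, CUBLAS or custom GPU routines that a real implementation would call.

```python
import math
import random


def matmul(a, b):
    """Naive matrix-matrix product of two row-major nested lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def infomax_block_update(weights, block, lrate):
    """Return the weights after one natural-gradient Infomax step.

    weights: n x n unmixing matrix; block: n channels x B samples.
    Rule (assumed): W <- W + lrate * (B*I + (1 - 2*y) u^T) W,
    with u = W x and y = logistic(u). Bias terms are omitted.
    """
    n = len(weights)
    n_samples = len(block[0])
    u = matmul(weights, block)                 # product 1: n x B activations
    # 1 - 2*logistic(v) equals -tanh(v / 2), which avoids overflow in exp()
    g = [[-math.tanh(v / 2.0) for v in row] for row in u]
    u_t = [list(col) for col in zip(*u)]       # B x n transpose of u
    grad = matmul(g, u_t)                      # product 2: n x n
    for i in range(n):
        grad[i][i] += n_samples                # add B times the identity
    delta = matmul(grad, weights)              # product 3: n x n
    return [[w + lrate * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weights, delta)]


# Toy usage on random data (illustrative sizes, not the ones from the paper)
random.seed(0)
n_channels, block_size = 4, 32
data = [[random.gauss(0.0, 1.0) for _ in range(block_size)]
        for _ in range(n_channels)]
weights = [[1.0 if i == j else 0.0 for j in range(n_channels)]
           for i in range(n_channels)]
weights = infomax_block_update(weights, data, lrate=1e-3)
```

Every step of this update except the element-wise nonlinearity is a matrix product, and the update is repeated for every block of samples in every training pass, which is consistent with these operations dominating the profile.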
At this speed, computing the independent components of a 1-hour experiment recorded with 128 EEG electrodes would take approximately 1500 seconds. This timing opens up new possibilities for the method, for instance in brain-computer interface applications: it becomes possible to think of an experiment in which independent components are calculated during the session and used as a feedback feature.
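As a rough back-of-the-envelope check, the figures quoted in the text can be combined to estimate what the same recording would cost on the CPU. The sketch below assumes that the reported speedups carry over unchanged to this dataset size, which the text does not state; the variable names are illustrative only.

```python
# Figures quoted in the text
cudaica_seconds = 1500.0     # approximate CUDAICA time for the 1-hour recording
speedup_vs_atlas = 25.0      # CUDAICA vs. BLAS compiled with ATLAS
speedup_vs_mkl = 4.5         # CUDAICA vs. BLAS compiled with MKL
recording_seconds = 3600.0   # 1-hour experiment

# Assumption: the speedups apply unchanged to this dataset size.
atlas_hours = cudaica_seconds * speedup_vs_atlas / 3600.0
mkl_hours = cudaica_seconds * speedup_vs_mkl / 3600.0

# Fraction of the recording time needed to compute the components
compute_to_recording_ratio = cudaica_seconds / recording_seconds

print(f"ATLAS estimate: {atlas_hours:.1f} h")
print(f"MKL estimate: {mkl_hours:.1f} h")
print(f"CUDAICA compute time / recording time: {compute_to_recording_ratio:.2f}")
```

Under that assumption the estimate is roughly ten hours with ATLAS and just under two hours with MKL, in line with the overnight run times mentioned earlier, while the GPU run takes less than half the duration of the recording itself.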
CUDAICA is released under the GNU General Public License and is freely available from our wiki, together with a description of application features, a FAQ and installation instructions (http://calamaro.exp.dc.uba.ar/cudaica/doku.php?id=start). CUDAICA works as a standalone application and also integrates with the EEGLAB Toolbox, adding an option to run ICA with CUDAICA just like any other ICA implementation. It was designed for standard EEGLAB users, with no extra effort needed to run this implementation. It runs on CUDA-enabled hardware, that is, on almost every modern NVIDIA graphics card, making CUDAICA widely available and easy to use.
Federico Raimondo, Juan E. Kamienkowski, Mariano Sigman and Diego Fernandez Slezak. CUDAICA: GPU Optimization of Infomax-ICA EEG Analysis. Comput Intell Neurosci, 2012. doi: 10.1155/2012/206972