CUDA–MEME: Accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units
Motif discovery in biological sequences is of prime importance and a major challenge in computational biology. Consequently, numerous motif discovery tools have been developed to date. However, the rapid growth of both genomic sequence and gene transcription data establishes the need for scalable motif discovery tools. One approach to improving the runtime of motif discovery by an order of magnitude without losing sensitivity is to employ emerging many-core architectures such as CUDA-enabled GPUs. In this paper, we present a highly parallel formulation and implementation of the MEME motif discovery algorithm using the CUDA programming model. To achieve high efficiency, we introduce two parallelization approaches: sequence-level and substring-level parallelization. Furthermore, a hybrid computing framework is described to take advantage of both CPU and GPU compute resources. Our performance evaluation on a GeForce GTX 280 GPU shows average runtime speedups of 21.4 (19.3) for the starting point search and 20.5 (16.4) for the overall runtime using the OOPS (ZOOPS) motif search model. The runtime speedups of CUDA–MEME on a single GPU are also comparable to those of ParaMEME running on 16 CPU cores of a high-performance workstation cluster. In addition to its speed, CUDA–MEME finds motif instances consistent with those found by the sequential MEME.
Yongchao Liu, Bertil Schmidt, Weiguo Liu, Douglas L. Maskell. CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units. Pattern Recognition Letters, 2010, 31(14): 2170–2177. [DOI: 10.1016/j.patrec.2009.10.009]
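The abstract names two parallelization granularities, sequence-level and substring-level, without spelling out how the work is divided. The Python sketch below is only one way to picture the difference and is not taken from the CUDA-MEME source. It assumes a toy OOPS-style starting point search in which every length-w substring of the dataset is a candidate seed, and it replaces MEME's likelihood-based score with a simple count of matching positions. All function names are hypothetical, and the thread pool shows how the work is split, not GPU performance.

```python
from concurrent.futures import ThreadPoolExecutor

def match_score(seed, site):
    # Toy stand-in for MEME's likelihood-based score: count identical positions.
    return sum(a == b for a, b in zip(seed, site))

def best_site_score(seed, seq):
    # OOPS assumption: exactly one site per sequence, so keep the best window.
    w = len(seed)
    return max(match_score(seed, seq[i:i + w]) for i in range(len(seq) - w + 1))

def seed_score(seed, seqs):
    # Score of one candidate seed: sum of its best site in every sequence.
    return sum(best_site_score(seed, s) for s in seqs)

def seeds_of(seq, w):
    # All length-w substrings of one sequence (assumes len(seq) >= w).
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def search_sequence_level(seqs, w, workers=4):
    # Coarse-grained: one work unit per input sequence. Each unit scores
    # every seed drawn from its own sequence and reports its best one.
    def unit(seq):
        return max((seed_score(sd, seqs), sd) for sd in seeds_of(seq, w))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(unit, seqs))

def search_substring_level(seqs, w, workers=4):
    # Fine-grained: one work unit per seed substring, so there are many
    # more (and smaller) units than in the sequence-level variant.
    seeds = [sd for s in seqs for sd in seeds_of(s, w)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(lambda sd: (seed_score(sd, seqs), sd), seeds))

if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
    print(search_sequence_level(seqs, 4))   # (12, 'ACGT')
    print(search_substring_level(seqs, 4))  # (12, 'ACGT')
```

Both variants visit the same set of seeds and therefore return the same best seed; they differ only in how many independent work units exist and how large each one is, which is the trade-off a GPU mapping has to balance.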
CUDA-MEME running on one Tesla C1060 GPU is up to 23x faster than MEME running on an x86 CPU. This cuts compute time from hours on CPUs to minutes on GPUs. The data in the chart below are for the OOPS (one occurrence per sequence) model on four datasets.