Tag: algorithm
Enhancing data parallelism for ant colony optimisation on GPUs
In this paper, we deal with a GPU implementation of Ant Colony Optimisation (ACO), a population-based optimisation method which comprises two major stages: Tour construction and Pheromone update.
9th Workshop on Practical Aspects of High-Level Parallel Programming (PAPP 2012)
The PAPP workshop focuses on practical aspects of high-level parallel programming: design, implementation and optimisation of high-level programming languages, semantics of parallel languages, formal verification, and design or certification of libraries, middleware and tools.
Efficiently Computing Tensor Eigenvalues on a GPU
In this paper we present an implementation of SS-HOPM targeted for a GPU. We describe how to exploit symmetry to save both storage and computation in the two main computational kernels of the algorithm, and for the case of solving many small tensor eigenproblems we show how to map the computation onto a GPU.
QCD simulations with staggered fermions on GPUs
We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator, but instead the whole Molecular Dynamics trajectory is performed on it.
The faster-than-fast Fourier transform
For a large range of practically useful cases, MIT researchers find a way to increase the speed of one of the most important algorithms in the information sciences.
CAMPAIGN: Library of GPU-accelerated data clustering algorithms
CAMPAIGN is a library of data clustering algorithms and tools, written in ‘C for CUDA’ for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource.
Efficient two-level preconditioned conjugate gradient method on the GPU
We present an implementation of the Two-Level Preconditioned Conjugate Gradient method for the GPU. We investigate a preconditioner based on a truncated Neumann series in combination with deflation and compare it with Block Incomplete Cholesky schemes.
Accelerating arithmetic coding on a graphic processing unit
We implement the block-parallel arithmetic encoder on an NVIDIA GPU using the Compute Unified Device Architecture (CUDA) programming model. The source data sequence is divided into small blocks.
FENZI: GPU-enabled Molecular Dynamics Simulations of Large Membrane Regions
This paper presents the design and implementation of an advanced GPU algorithm for Molecular Dynamics simulations of large membrane regions in the NVT, NVE, and NPT ensembles using explicit solvent and Particle Mesh Ewald (PME) method for treating the conditionally convergent electrostatic component of the classical force field.
GPApriori: GPU-Accelerated Frequent Itemset Mining
In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation with an Nvidia Tesla graphics processor and demonstrated up to 100x speedup compared with several state-of-the-art FIM algorithms on a CPU.






