In this paper, we deal with a GPU implementation of Ant Colony Optimisation (ACO), a population-based optimisation method which comprises two major stages: Tour construction and Pheromone update.
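The two stages named above can be illustrated with a minimal sequential sketch. It assumes the classic Ant System rules (random-proportional city selection, evaporation followed by a deposit of q / tour length), which the abstract does not specify, and it does not show the GPU mapping. The function names and the parameters `alpha`, `beta`, `rho` and `q` are hypothetical choices for illustration.

```python
import random


def construct_tour(dist, tau, alpha=1.0, beta=2.0, rng=random):
    """Stage 1: one ant builds a tour, picking each next city with
    probability proportional to tau^alpha * (1/distance)^beta."""
    n = len(dist)
    current = rng.randrange(n)
    tour = [current]
    unvisited = set(range(n)) - {current}
    while unvisited:
        cities = sorted(unvisited)
        weights = [
            (tau[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
            for j in cities
        ]
        current = rng.choices(cities, weights=weights, k=1)[0]
        tour.append(current)
        unvisited.remove(current)
    return tour


def tour_length(dist, tour):
    """Length of the closed tour (returns to the start city)."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def update_pheromone(tau, tours, lengths, rho=0.5, q=1.0):
    """Stage 2: evaporate all trails, then let every ant deposit
    q / length on each edge of its tour (symmetric problem assumed)."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= 1.0 - rho
    for tour, length in zip(tours, lengths):
        for k in range(len(tour)):
            i, j = tour[k], tour[(k + 1) % len(tour)]
            tau[i][j] += q / length
            tau[j][i] += q / length
    return tau
```

In a GPU setting, one would typically expect the per-ant work in `construct_tour` and the per-edge work in `update_pheromone` to be the parts distributed across threads, but that mapping is an assumption here rather than something the abstract states.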
The PAPP workshop focuses on practical aspects of high-level parallel programming: design, implementation and optimisation of high-level programming languages, semantics of parallel languages, formal verification, design or certification of libraries, middleware and tools.
In this paper we present an implementation of SS-HOPM targeted for a GPU. We describe how to exploit symmetry to save both storage and computation in the two main computational kernels of the algorithm, and for the case of solving many small tensor eigenproblems we show how to map the computation onto a GPU.
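As a rough illustration of how symmetry saves storage and work in the two kernels (the vector A x^(m-1) and the scalar A x^m), the sketch below runs the shifted power iteration on a symmetric order-3 tensor stored only by its unique entries with i <= j <= k. It is a plain CPU sketch under stated assumptions: the dictionary layout, the function names, and the fixed `shift` and `iters` values are hypothetical, the shift is assumed large enough for the iteration to converge, and nothing here reflects the paper's actual GPU mapping.

```python
import math
from itertools import permutations


def ax_m_minus_1(unique, x):
    """Kernel 1: y = A x^2 for a symmetric order-3 tensor.
    `unique` maps sorted index triples (i, j, k), i <= j <= k, to values;
    each stored value is expanded over its distinct index permutations."""
    y = [0.0] * len(x)
    for (i, j, k), a in unique.items():
        for p, q, r in set(permutations((i, j, k))):
            y[p] += a * x[q] * x[r]
    return y


def ax_m(unique, x):
    """Kernel 2: the scalar A x^3, computed as x . (A x^2)."""
    return sum(xi * yi for xi, yi in zip(x, ax_m_minus_1(unique, x)))


def ss_hopm(unique, x0, shift=1.0, iters=200):
    """Shifted symmetric higher-order power method with a nonnegative shift:
    x <- normalize(A x^2 + shift * x), then lambda = A x^3."""
    norm = math.sqrt(sum(v * v for v in x0))
    x = [v / norm for v in x0]
    for _ in range(iters):
        y = ax_m_minus_1(unique, x)
        z = [yi + shift * xi for yi, xi in zip(y, x)]
        norm = math.sqrt(sum(v * v for v in z))
        x = [v / norm for v in z]
    return ax_m(unique, x), x
```

For a dimension-n tensor of order 3, this layout stores n(n+1)(n+2)/6 values instead of n^3, which is the kind of saving that matters when many small eigenproblems must fit in fast GPU memory.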
We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator, but instead the whole Molecular Dynamics trajectory is performed on it.
CAMPAIGN is a library of data clustering algorithms and tools, written in ‘C for CUDA’ for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over the corresponding CPU-based clustering algorithms and is intended as an open-source resource.
We present an implementation of the Two-Level Preconditioned Conjugate Gradient Method for the GPU. We investigate a Truncated Neumann Series based preconditioner in combination with deflation and compare it with Block Incomplete Cholesky schemes.
We implement the block-parallel arithmetic encoder on NVIDIA GPUs using the Compute Unified Device Architecture (CUDA) programming model. The source data sequence is divided into small blocks.
This paper presents the design and implementation of an advanced GPU algorithm for Molecular Dynamics simulations of large membrane regions in the NVT, NVE, and NPT ensembles using explicit solvent and the Particle Mesh Ewald (PME) method for treating the conditionally convergent electrostatic component of the classical force field.
In this paper we describe GPApriori, a GPU-accelerated implementation of Frequent Itemset Mining (FIM). We tested our implementation on an Nvidia Tesla graphics processor and demonstrate up to 100x speedup compared with several state-of-the-art FIM algorithms on a CPU.