Tag: algorithm

Random Sampling for Short Lattice Vectors on Graphics Cards

| 19 October, 2011 | 0 Comments

We present a GPU implementation of the Simple Sampling Reduction (SSR) algorithm, which searches for short vectors in lattices. SSR builds on the well-known BKZ algorithm, complementing it with an exhaustive search in a suitable search region in order to insert random, short vectors into the lattice basis.

Special Issue on “Aspects of Numerical Algorithms, Parallelization and Applications”

| 17 October, 2011 | 0 Comments

Numerical algorithms, parallelization and their applications have been a major thrust of research and have applications throughout computational science and engineering. Numerical algorithms are widely used by scientists working in many scientific areas.

Accelerating floating-point fitness functions in evolutionary algorithms: a FPGA-CPU-GPU performance comparison

| 17 October, 2011 | 0 Comments

The main objective of the work presented in this paper is to compare implementations on FPGAs and CPUs of different fitness functions in evolutionary algorithms in order to study the performance of the floating-point arithmetic in FPGAs and CPUs that is often present in the optimization problems tackled by these algorithms.

Efficient GPU-based time domain solver for the acoustic wave equation

| 12 October, 2011 | 0 Comments

An efficient algorithm for time-domain solution of the acoustic wave equation for the purpose of room acoustics is presented. It is based on adaptive rectangular decomposition of the scene and uses analytical solutions within the partitions that rely on spatially invariant speed of sound.

Performance Characterization and Optimization of Atomic Operations on AMD GPUs

| 11 October, 2011 | 0 Comments

In this paper, we first quantify the performance impact of atomic instructions on application kernels on AMD GPUs. We then propose a novel software-based implementation of atomic operations that can significantly improve the overall kernel performance.

SIGGRAPH Asia 2011: GPU-efficient recursive filtering and summed-area tables

| 10 October, 2011 | 0 Comments

This video presents a new algorithmic framework for parallel evaluation. It partitions the image into 2D blocks, with a small band of additional data buffered along each block perimeter.
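As a rough illustration of the blocking idea for summed-area tables, the following Python sketch computes the table block by block. It is a sequential toy, not the parallel GPU framework from the video; the function names and the block size are assumptions made for this example. The point it shows is that a block only needs its own pixels plus the already-computed values along its top and left perimeter.

```python
def summed_area_table(img):
    """Reference version: sat[y][x] is the sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y][x] = row_sum + (sat[y - 1][x] if y > 0 else 0)
    return sat


def blocked_summed_area_table(img, block=2):
    """Same result, computed one 2D block at a time (hypothetical helper).

    Blocks are visited in raster order, so the row above and the column
    to the left of each block (its perimeter band) are already final.
    """
    h, w = len(img), len(img[0])
    sat = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    up = sat[y - 1][x] if y > 0 else 0
                    left = sat[y][x - 1] if x > 0 else 0
                    diag = sat[y - 1][x - 1] if y > 0 and x > 0 else 0
                    # Inclusion-exclusion over the three neighbours.
                    sat[y][x] = img[y][x] + up + left - diag
    return sat


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = blocked_summed_area_table(image, block=2)
```

For the 3x3 example, the bottom-right entry of `table` is the sum of all nine pixels, and the blocked result matches the reference version.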

Gemma in April: A Matrix-like Parallel Programming Architecture on OpenCL

| 6 October, 2011 | 0 Comments

In this article, we propose a novel parallel computing architecture. The architecture includes Gemma, a general parallel programming model, and April, a programming framework based on Gemma and OpenCL. Gemma uses matrix operations, especially matrix multiplication, to describe general computing tasks.

Fast multipole method on GPU

| 27 September, 2011 | 0 Comments

We propose GPU-friendly data structures and SIMD parallel algorithm flows to facilitate FMM-based 3-D capacitance extraction on GPUs. Effective GPU performance modeling methods are also proposed to properly balance the workload of each critical kernel in our FMMGpu implementation.

Unstructured grid applications on GPU

| 23 September, 2011 | 2 Comments

In this paper we analyze the algorithm for unstructured grid analysis on the basis of hardware occupancy and memory access efficiency. In general, the algorithm can be divided into three stages: cell-oriented analysis, edge-oriented analysis and information update, which present different memory access patterns.
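The three stages can be sketched on a toy mesh as follows. This is only an illustration under assumptions: the diffusion-style update, the `volumes` array, and the function name are invented for the example and are not taken from the paper. What it shows is the difference in access patterns: the cell and update stages walk arrays contiguously, while the edge stage gathers from and scatters to cells through an index list.

```python
def unstructured_step(u, volumes, edges, dt=0.25):
    """One explicit step on an unstructured mesh (hypothetical example).

    u       -- one value per cell
    volumes -- one volume per cell
    edges   -- list of (a, b) cell-index pairs, one per edge
    """
    # Stage 1, cell-oriented: independent, contiguous pass over cells.
    inv_volume = [1.0 / v for v in volumes]

    # Stage 2, edge-oriented: indirect reads (gather) and writes (scatter).
    # On a GPU the scatter needs atomics or edge colouring to avoid races.
    residual = [0.0] * len(u)
    for a, b in edges:
        flux = u[b] - u[a]
        residual[a] += flux
        residual[b] -= flux

    # Stage 3, information update: contiguous pass over cells again.
    return [ui + dt * iv * ri for ui, iv, ri in zip(u, inv_volume, residual)]


# Three cells connected in a triangle, all with unit volume.
state = unstructured_step([1.0, 0.0, 0.0], [1.0, 1.0, 1.0],
                          [(0, 1), (1, 2), (2, 0)])
```

Because every flux is added to one cell and subtracted from its neighbour, the total of `u` is conserved by the step.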

26th IEEE International Parallel & Distributed Processing Symposium

| 21 September, 2011 | 1 Comment

IPDPS is an international forum for engineers and scientists from around the world to present their latest research findings in all aspects of parallel computation. In addition to technical sessions of submitted paper presentations, the meeting offers workshops, tutorials, and commercial presentations and exhibits.
