Featured Articles

  • Towards Cost-Effective Bio-inspired Optimization: A Prospective Study on the GPU Architecture
  • Processing piecewise autoregressive model image interpolation algorithm on GPU with CUDA
  • Science Magazine Focus: What It’ll Take to Go Exascale
  • Using explicit platform descriptions to support programming of heterogeneous many-core systems
  • Optimizing the multipole-to-local operator in the fast multipole method for GPU
  • From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming
  • Enhancing data parallelism for ant colony optimisation on GPUs
  • A GPU-Enabled, High-Resolution Cosmological Microlensing Parameter Survey
  • Welcome to the Jungle: Future of Heterogeneous Supercomputing
  • BEAGLE: An Application Programming Interface and HPC Library for Statistical Phylogenetics

Featured Videos

  • Erlang and CUDA: Concurrent and Fast

    Erlang is ideal for writing robust, highly concurrent applications that take full advantage of today’s multi-core computers. NVIDIA’s CUDA API is great for using GPUs to rip through large datasets at incredible speeds. What happens when you put these two together?

  • GPU Hardware Rendering with 3ds Max and Quicksilver

    In this video, Gary Davis of Autodesk presents his 90-minute lecture from Autodesk University 2011, focusing on GPU-accelerated rendering within Autodesk 3ds Max 2012.

  • GTC Asia 2011 Closing Ceremony

    Featuring Steve Scott, CTO of Tesla at NVIDIA, and Michael Jong, Senior Marketing Director at NVIDIA.

News

New NVIDIA CUDA Release Makes It Faster and Easier To Accelerate Scientific Research with GPU

NVIDIA today released a new version of its CUDA parallel computing platform, which will make it easier for computational biologists, chemists, physicists, geophysicists, other researchers, and engineers to advance their simulations and computational work by using GPUs.

Nvidia 28nm GK104 “Kepler” GPU specs leaked

Nvidia’s next-generation 28nm GPU architecture, codenamed Kepler, is officially expected to launch in early Q2 2012 according to the latest schedule from the company.

Unified Processing Unit (UPU): A next-generation of CPU architecture?

ICube is a fabless semiconductor company developing System-on-Chip (SoC) solutions based on its Harmony Unified Processor Technology, which integrates two different processor types, a central processing unit (CPU) and a graphics processing unit (GPU), into one unified core.

Events & Training

Call For Papers: High Performance Simulation of biological systems

The goal of this track is to explore the use of emerging parallel computing architectures as well as High Performance Computing systems (Supercomputers, Clusters, Grids) for the simulation of relevant biological systems.

9th Workshop on Practical Aspects of High-Level Parallel Programming (PAPP 2012)

The PAPP workshop focuses on practical aspects of high-level parallel programming: design, implementation and optimisation of high-level programming languages, semantics of parallel languages, formal verification, and design or certification of libraries, middleware and tools.

ICCS 2012 Empowering Science through Computing

ICCS 2012 aims to bring together annually researchers and scientists from mathematics and computer science as basic computing disciplines, and researchers from various application areas who are pioneering advanced applications of computational methods in sciences such as physics, chemistry, life sciences, and engineering.

HPC

Science Magazine Focus: What It’ll Take to Go Exascale

Science magazine has published an interesting News Focus article on exascale computing. The article describes challenges in the way modern supercomputers are built and run.

Welcome to the Jungle: Future of Heterogeneous Supercomputing

In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters.

Massive Data Parallel Computational Framework for Exascale Hybrid Platforms

In this paper we present an alternative approach: a new computational framework, built upon the highly scalable Cactus framework, for developing massively data-parallel scientific applications suitable for use on such petascale/exascale hybrid systems.

Publications

Towards Cost-Effective Bio-inspired Optimization: A Prospective Study on the GPU Architecture

This paper studies the impact of varying the population’s size and the problem’s dimensionality in a parallel implementation, for an NVIDIA GPU, of a canonical genetic algorithm.

Processing piecewise autoregressive model image interpolation algorithm on GPU with CUDA

This paper presents a parallel implementation of a piecewise autoregressive model image interpolation algorithm, using CUDA (Compute Unified Device Architecture) on a GPU.

Science Magazine Focus: What It’ll Take to Go Exascale

Science magazine has published an interesting News Focus article on exascale computing. The article describes challenges in the way modern supercomputers are built and run.

Software

AIDA64: OpenCL GPGPU Stress Test

A new OpenCL-based, multi-threaded GPU and APU stress module for the AIDA64 System Stability Test that uses GPGPU computational tasks. It supports AMD CrossFireX, AMD Dual Graphics and NVIDIA SLI configurations, stressing all available GPU and APU devices simultaneously.

BEAGLE: An Application Programming Interface and HPC Library for Statistical Phylogenetics

BEAGLE is an API and library for high-performance evaluation of phylogenetic likelihoods. The API provides a uniform interface for performing calculations on an expanding variety of computer hardware platforms including GPUs, multicore CPUs, and SSE vectorization.

EpiGPU: exhaustive pairwise epistasis scans on consumer level graphics cards

The implementation presented uses OpenCL, an open standard for parallel programming that is designed to run on GPUs from multiple vendors and on multiple operating systems.

Code Example

Trip over threads to trap multicore bugs with Maze

What makes debugging multiprocess and multithreaded applications so difficult? The first thing that comes to mind for any concurrent programmer is the lack of reproducibility in program execution. The cause of this behavior is the preemptive scheduling employed by real-time operating systems.

Whitepaper: The Xcelerit Software Development Kit

The paper presents the Xcelerit SDK, a parallel programming toolkit that leverages the dataflow programming model to efficiently use multi-core CPUs, graphics processors (GPUs), and combinations of these in a cluster (or grid) from a single high-level source code.

Hands-on tutorial: An introduction to OpenCL for HPC programmers

This is a “programmer’s introduction” in which we cover the ideas behind OpenCL and also show how these ideas translate into source code. We will do this through a series of progressively more challenging examples.

Also Recently

fMRI analysis on the GPU – Possibilities and challenges

23 January, 2012

We describe how to perform preprocessing and statistical analysis of fMRI data on the GPU. Non-parametric tests of fMRI data become practically feasible by using the GPU. GPUs are required to handle the future increase in spatial and temporal resolution. GPUs enable more advanced real-time analysis.

GPU PRO 3: Advanced Rendering Techniques

23 January, 2012

GPU Pro 3, the third volume in the GPU Pro book series, offers practical tips and techniques for creating real-time graphics that are useful to beginners and seasoned game and graphics programmers alike.

Programming GPU Devices Using OpenACC Directives on the Cray XK6 Platform

21 January, 2012

Attendees of this HP2C training event will learn about the Cray XK6 hybrid multi-core and GPU architecture and its programming environment.

HPC GPU Meetup in Paris January 25th 2012

20 January, 2012

On 25 January 2012, a meetup of the HPC GPU supercomputing group will be held at the Ecole des Mines de Paris. Par4All and HPC Project will be represented with a talk by Ronan Keryell on automatic parallelization of C and Fortran for multicores and GPUs.

Efficiently Computing Tensor Eigenvalues on a GPU

20 January, 2012

In this paper we present an implementation of SS-HOPM targeted for a GPU. We describe how to exploit symmetry to save both storage and computation in the two main computational kernels of the algorithm, and for the case of solving many small tensor eigenproblems we show how to map the computation onto a GPU.
