Featured Videos

  • GPU programming with Python

    Why use GPUs from Python? This workshop will provide a brief introduction to GPU programming with Python, including run-time code generation and the use of high-level tools like PyCUDA, PyOpenCL, and Loo.py.

  • Introduction to pyOpenCL: python interface to the Open Computing Language

    In this talk we will provide an introduction to pyOpenCL, the Python interface to the Open Computing Language. OpenCL is a framework for executing parallel programs across heterogeneous platforms consisting of both CPUs and GPUs.

  • Allinea DDT to support CUDA 5 and Kepler K20

    Allinea Software today announced immediate availability of debugging support in Allinea DDT for the latest NVIDIA Tesla K20 family of GPU accelerators, based on the Kepler architecture, and the recently released NVIDIA CUDA 5 toolkit.

News

Pioneering Genomics Researcher to Give Keynote Addresses at GPU Technology Conference

NVIDIA today announced its lineup of world-class keynote speakers for the fourth-annual GPU Technology Conference (GTC), which will be held at the McEnery Convention Center in San Jose, Calif., March 18-21. NVIDIA CEO and co-founder Jen-Hsun Huang will discuss the profound and growing impact of GPU technology in gaming, science, industry, media and entertainment, design and other field

Largest known prime number (17M digits long) is discovered

The new prime number, 2 multiplied by itself 57,885,161 times, less one (2^57,885,161-1), has 17,425,170 digits. With 360,000 CPUs peaking at 150 trillion calculations per second, GIMPS, now in its 17th year, is the longest continuously running global supercomputing project in Internet history.
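
The digit count quoted above can be checked with a short calculation: because 2^p is never a power of ten, 2^p - 1 has floor(p * log10(2)) + 1 decimal digits. The Python sketch below applies that formula. The function name `mersenne_digits` is an illustrative choice and is not taken from the GIMPS software.

```python
import math


def mersenne_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1 for p >= 1 (hypothetical helper)."""
    # 2**p is never a power of 10, so 2**p - 1 has as many digits as 2**p.
    # Double precision is adequate as long as p * log10(2) is not within
    # rounding error of an integer, which holds for the exponent used below.
    return math.floor(p * math.log10(2)) + 1


# Exponent of the Mersenne prime described in the news item.
print(mersenne_digits(57_885_161))
```

For this exponent the formula gives 17,425,170, which agrees with the figure in the announcement.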

Colfax Developer Training: Parallel Programming for Intel Xeon Phi Coprocessors

Colfax Developer Training (CDT) will discuss the applicability of the Intel many-core technology, demonstrate the programming models for Intel Xeon Phi coprocessor including native execution and offload-based approaches, and provide extensive optimization techniques.

Intel Shevlin Park: Cross-Platform Implementation of C++ AMP in Clang/LLVM using OpenCL

Intel engineers implemented Microsoft's C++ AMP specification on top of OpenCL, using LLVM/Clang, so that it can be used cross-platform.

Allinea DDT to support CUDA 5 and Kepler K20

Allinea Software today announced immediate availability of debugging support in Allinea DDT for the latest NVIDIA Tesla K20 family of GPU accelerators, based on the Kepler architecture, and the recently released NVIDIA CUDA 5 toolkit.

OpenCL 1.2 support for Intel Xeon Phi coprocessor

Intel is extending open standards support to include OpenCL 1.2 for Intel Xeon Phi coprocessors. OpenCL broadens the parallel programming options from Intel and allows developers to maximize parallel application performance on Intel Xeon Phi coprocessors.

Software

OpenMM 5 Now Available

The OpenMM software package enables molecular dynamics (MD) simulations to be accelerated on high performance computer architectures, such as GPUs.

PARALUTION: a library for sparse iterative methods

PARALUTION is a library for sparse iterative methods with special focus on multi-core and accelerator technology such as GPUs. The software provides fine-grained parallel preconditioners which can utilize the modern multi-/many-core devices.

Alea.CUDA Framework and Development Process

Alea.CUDA is a dynamic GPU code generation framework, which programmatically generates GPU code with the full flexibility of CUDA.

AMGCL: Generic Algebraic Multigrid (AMG) Hierarchy Builder

amgcl is a simple and generic algebraic multigrid (AMG) hierarchy builder. The constructed hierarchy may be used as a standalone solver or as a preconditioner with some iterative solver. Conjugate Gradient and Stabilized BiConjugate Gradient iterative solvers are provided. It is also possible to use generic solvers from other libraries, e.g. ViennaCL.

AMD CodeXL: comprehensive debugging, profiling, and analysis tool for CPU, GPU and APU

AMD CodeXL is a new unified developer tool suite designed to improve productivity by enabling developers to quickly and easily identify performance issues and programming errors in their applications, without requiring source code modifications. The first version includes a GPU debugger, CPU and GPU profilers, and a static GPU performance analyzer.

Aparapi: Open source API for data parallel Java on GPU

Aparapi allows developers to code entirely in Java and at runtime is capable of converting Java bytecode to OpenCL so that it can be executed on the GPU.

HPC

Submissions for ISC’13 Research Papers, Tutorials is Now Open

The 2013 International Supercomputing Conference Research Paper and Tutorials Committees are now accepting abstracts and tutorial proposals for ISC’13 in Europe. The ISC’13 Call for Papers is supported by the IEEE Germany Section.

Nvidia: Examining The Threat From Intel’s Xeon Phi

Intel is a fierce player and represents the first legitimate competitive threat that Nvidia has seen in the space. I believe, however, that Intel’s competitive position in this nascent field is overstated in the near- to medium-term and that expanding TAM and strong leadership from Nvidia’s team should ensure that the segment’s sales and profitability continue to grow, even if Intel gets a piece of the action.

Heterogeneous System Architecture: Purpose and Outlook

The HSA Foundation, known as the “HSAF”, is an open, industry standard consortium founded to define and deliver open standards and tools for hardware and software to take full advantage of the high performance of parallel compute engines, and to do so in the lowest possible power envelope.

Keeneland Project Deploys Full Scale GPU Supercomputing System

The Keeneland Full Scale System is a 615 TFLOPS HP Proliant SL250-based supercomputer with 264 nodes, where each node contains two Intel Sandy Bridge processors, three NVIDIA M2090 GPU accelerators, 32 GB of host memory, and a Mellanox InfiniBand FDR interconnection network.

AMD announced FirePro S10000 dual-GPU card

Today at the SC12 conference, Advanced Micro Devices (AMD) introduced what it claims is the most powerful server graphics card, the AMD FirePro S10000.

Nvidia introduced K20 top of the line family of Tesla GPUs

Nvidia today formally introduced its top-of-the-line K20 family of GPUs. The Tesla K20 series consists of the K20X and the marginally slower K20.

Publications

High-Precision Numerical Simulations on a CUDA GPU: Kerr Black Hole Tails

In this paper, we describe such an application in the research area of black hole physics: studying the late-time behavior of decaying fields in Kerr black hole space-time

Accelerating numerical modeling of wave propagation through 2-D anisotropic materials using OpenCL

We present an implementation of the numerical modeling of elastic wave propagation in 2D anisotropic materials using the new parallel computing devices (PCDs).

Novel Dynamic Partial Reconfiguration Implementation of K-Means Clustering on FPGAs: Comparative Results with GPPs and GPUs

In this work, a parameterized implementation of the K-means clustering algorithm on a Field Programmable Gate Array (FPGA) is presented and compared with a previous FPGA implementation as well as recent implementations on Graphics Processing Units (GPUs) and general-purpose processors (GPPs).

Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing

In this paper we explore the role of hardware accelerators in hyperspectral remote sensing missions and further inter-compare two types of solutions: field programmable gate arrays (FPGAs) and graphics processing units (GPUs)

Research for Chinese Spam Filtering Based on GPU

To accelerate large-scale Chinese spam filtering, we propose a GPU-based spam filtering solution that also takes classical text classification algorithms into account.

GPU in Physics Computation: Case Geant4 Navigation

The goal was to find out whether Geant4 physics simulations could benefit from GPU acceleration and how difficult it is to modify Geant4 code to run on a GPU.

Tutorials

GPU programming with Python

Why use GPUs from Python? This workshop will provide a brief introduction to GPU programming with Python, including run-time code generation and the use of high-level tools like PyCUDA, PyOpenCL, and Loo.py.

Introduction to pyOpenCL: python interface to the Open Computing Language

In this talk we will provide an introduction to pyOpenCL, the Python interface to the Open Computing Language. OpenCL is a framework for executing parallel programs across heterogeneous platforms consisting of both CPUs and GPUs.

Portable LDPC Decoding on Multicores Using OpenCL

This article proposes to address, in a tutorial style, the benefits of using Open Computing Language (OpenCL) as a quick way to allow programmers to express and exploit parallelism in signal processing algorithms, such as those used in error-correcting code systems.

GPU Programming in MATLAB

This article demonstrates features in Parallel Computing Toolbox that enable you to run your MATLAB code on a GPU by making a few simple changes to your code.

CUDAfy Me: Traveling Salesman problem with CUDA from C#

CUDAfy is a set of libraries and tools that permit general purpose programming of CUDA Graphics Processing Units (GPUs) from within the Microsoft .NET framework. John Michael Hauck wrote an excellent article on how to transfer your CPU code to the GPU, using the Traveling Salesman problem as an example.

Tesla K20 GPU Quicksort with Dynamic Parallelism

In a recent blog post, NVIDIA discussed the new Dynamic Parallelism feature of the upcoming Kepler K20 GPU, using Quicksort as an example. Dynamic Parallelism allows the GPU to operate more autonomously from the CPU by generating new work for itself at run-time, from inside a kernel.

Events

AMD 2013 Developer Summit

For the third consecutive year, AMD invites computing innovators, developers and researchers in the rapidly growing field of heterogeneous computing to submit their latest work and research findings in the form of archival presentations for AMD’s annual developer summit.

ISC Big Data conference

ISC Events, the organizers of the International Supercomputing Conference (ISC) and ISC Cloud, proudly announce a new conference series: the ISC Big Data Conference. Over the course of two days, the forum will scrutinize all facets of big data in order to empower practitioners and vendors who want to learn more about this rapidly evolving set of technologies.

Workshop: Rapid Molecular Dynamics Prototyping and Simulations on GPUs with OpenMM

This workshop is designed for those interested in accelerating MD simulations on GPUs and/or developing new MD algorithms that can automatically be implemented and accelerated on GPUs. No programming background is required, though a programming track will be offered for those who are interested in integrating OpenMM into their code.

Webinar: CUDA 5 Math Library Performance Overview

Jonathan Cohen and the NVIDIA CUDA Library Team present the latest benchmark results using the extensive numerical libraries included with CUDA 5. This webinar will cover all the data points and the significance of the new Math Library Performance Report

Early Bird Discount for GTC 2013

The 2013 GPU Technology Conference (GTC) is the world’s premier event for accelerated computing. Learn tips and tricks while networking with fellow enthusiasts and NVIDIA engineers. Hosted by NVIDIA March 18-21 in San Jose, California.

International Workshop on Runtime and Operating Systems for Supercomputers

The International Workshop on Runtime and Operating Systems for Supercomputers provides a forum for researchers to exchange ideas and discuss research questions that are relevant to upcoming supercomputers.

Also recently

Implementation of a High Performance GPGPU Compiler

5 February, 2013

In this paper we present our experience in developing an optimizing compiler for general purpose computation on graphics processing units (GPGPU) based on the Cetus compiler framework.

An efficient implementation of Bailey and Borwein’s algorithm for parallel random number generation on graphics processing units

30 January, 2013

This paper investigates the serial and parallel implementation of a Linear Congruential Generator for Graphics Processing Units (GPU) based on the binary representation of the normal number

Parallelization of Dynamic Programming in Nussinov RNA Folding Algorithm on the CUDA GPU

28 January, 2013

In this work, the Nussinov algorithm is analyzed but from the CUDA GPU programming perspective. The algorithm is radically redesigned in order to utilize the highly parallel NUMA architecture of the GPU.
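
For readers unfamiliar with the algorithm, here is a minimal serial sketch of the classic Nussinov dynamic program in Python. It is the textbook CPU formulation, not the redesigned GPU version from the paper. The function name `nussinov`, the `min_loop` parameter, and the restriction to Watson-Crick pairs are assumptions made for illustration.

```python
# Watson-Crick pairs only; wobble pairs (G-U) are left out of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov(seq: str, min_loop: int = 0) -> int:
    """Maximum number of nested base pairs in seq (hypothetical helper)."""
    n = len(seq)
    if n == 0:
        return 0
    # score[i][j] holds the best pair count for the subsequence seq[i..j].
    score = [[0] * n for _ in range(n)]
    # Fill the table by increasing span. Cells with the same span do not
    # depend on each other, so each span could be computed in parallel.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: leave position i or position j unpaired.
            best = max(score[i + 1][j], score[i][j - 1])
            # Case 2: pair i with j, if they are complementary and at
            # least min_loop unpaired positions can sit between them.
            if span > min_loop and (seq[i], seq[j]) in PAIRS:
                inner = score[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            # Case 3: split the interval into two independent parts.
            for k in range(i + 1, j):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]


print(nussinov("GGGAAAUCC"))
```

The table is filled one span (anti-diagonal) at a time, which is the dependency pattern any parallel version has to respect.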

Realistic Simulation of Tomato Garden Based on GPU

24 January, 2013

The objective of this study is to present a fast algorithm to simulate a virtual tomato garden based on GPU acceleration. A parametric L-system is used to describe the topological structure of individual tomato plants.
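
As background on L-systems, the sketch below shows the core rewriting step in Python. It is a plain (non-parametric) L-system using Lindenmayer's well-known algae rules, not the parametric tomato model from the study. The function name `expand` and the rule table are assumptions chosen for illustration.

```python
def expand(axiom: str, rules: dict, steps: int) -> str:
    """Apply the production rules to every symbol in parallel, steps times."""
    current = axiom
    for _ in range(steps):
        # Symbols without a rule are copied through unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


# Lindenmayer's algae system: A -> AB, B -> A.
ALGAE_RULES = {"A": "AB", "B": "A"}

for step in range(6):
    print(step, expand("A", ALGAE_RULES, step))
```

A parametric L-system extends this idea by attaching numeric parameters (for example, segment length or branching angle) to each symbol and letting the rules update them.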

Progress towards accelerating HOMME on hybrid multi-core systems

22 January, 2013

We demonstrate that the most expensive kernel in the model executes more than three times faster on the GPU than the CPU. These improvements are expected to provide improved efficiency when incorporated into the full model that has been configured for the target problem
