NVIDIA today announced its lineup of world-class keynote speakers for the fourth annual GPU Technology Conference (GTC), which will be held at the McEnery Convention Center in San Jose, Calif., March 18-21. NVIDIA CEO and co-founder Jen-Hsun Huang will discuss the profound and growing impact of GPU technology in gaming, science, industry, media and entertainment, design and other fields.
The new prime number, 2 raised to the 57,885,161st power, minus one (2^57,885,161-1), has 17,425,170 digits. With 360,000 CPUs peaking at 150 trillion calculations per second, GIMPS, now in its 17th year, is the longest continuously running global supercomputing project in Internet history.
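The digit count can be checked without ever materializing the 17-million-digit number, using logarithms (a plain-Python sketch):

```python
import math

p = 57885161
# Decimal digits of 2**p; subtracting 1 cannot change the digit count,
# because 2**p is never a power of 10.
digits = math.floor(p * math.log10(2)) + 1
print(digits)  # 17425170
```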
Colfax Developer Training (CDT) will discuss the applicability of Intel many-core technology, demonstrate the programming models for the Intel Xeon Phi coprocessor, including native execution and offload-based approaches, and present extensive optimization techniques.
Intel engineers implemented the Microsoft C++ AMP specification on top of OpenCL, using LLVM/Clang, so that it can be used cross-platform.
Allinea Software today announced immediate availability of debugging support in Allinea DDT for the latest NVIDIA Tesla K20 family of GPU accelerators, based on the Kepler architecture, and the recently released NVIDIA CUDA 5 toolkit.
Intel is extending open standards support to include OpenCL 1.2 for Intel Xeon Phi coprocessors. OpenCL broadens the parallel programming options from Intel and allows developers to maximize parallel application performance on Intel Xeon Phi coprocessors.
The OpenMM software package enables molecular dynamics (MD) simulations to be accelerated on high performance computer architectures, such as GPUs.
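The inner loop that MD packages like OpenMM accelerate is a time-stepping integrator. A minimal velocity Verlet sketch in plain Python, with a unit-mass harmonic oscillator standing in for a real force field (an illustrative assumption, not OpenMM's API):

```python
def velocity_verlet(x, v, force, dt, steps):
    """Advance position x and velocity v with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * f * dt * dt   # position update
        f_new = force(x)                     # force at the new position
        v = v + 0.5 * (f + f_new) * dt       # velocity update, averaged force
        f = f_new
    return x, v

# Unit-mass harmonic oscillator, F(x) = -x (illustrative stand-in)
x, v = velocity_verlet(1.0, 0.0, lambda x: -x, dt=0.01, steps=1000)
energy = 0.5 * v * v + 0.5 * x * x
print(energy)  # stays close to the initial 0.5
```

On a GPU, the per-particle force evaluations inside `force` are what run in parallel; the integrator structure itself is unchanged.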
PARALUTION is a library for sparse iterative methods with a special focus on multi-core and accelerator technology such as GPUs. The software provides fine-grained parallel preconditioners that can exploit modern multi-/many-core devices.
Alea.CUDA is a dynamic GPU code generation framework, which programmatically generates GPU code with the full flexibility of CUDA.
amgcl is a simple and generic algebraic multigrid (AMG) hierarchy builder. The constructed hierarchy may be used as a standalone solver or as a preconditioner with some iterative solver. Conjugate Gradient and Stabilized BiConjugate Gradient iterative solvers are provided. It is also possible to use generic solvers from other libraries, e.g. ViennaCL.
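The Conjugate Gradient solver that amgcl pairs with its AMG preconditioner can be sketched in plain Python (unpreconditioned and dense here, purely for illustration of the iteration, not amgcl's API):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A (list of rows)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # residual b - A x, with x = 0
    p = list(r)                  # search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
# exact solution is (1/11, 7/11)
```

A preconditioner such as an AMG hierarchy replaces `r` with an approximately solved correction each iteration, which is where libraries like amgcl and ViennaCL do their work.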
AMD CodeXL is a new unified developer tool suite designed to improve productivity by enabling developers to quickly and easily identify performance issues and programming errors in their applications, without requiring source code modifications. The first version includes a GPU debugger, CPU and GPU profilers, and a static GPU performance analyzer.
Aparapi allows developers to code entirely in Java and at runtime is capable of converting Java bytecode to OpenCL so that it can be executed on the GPU.
The 2013 International Supercomputing Conference Research Paper and Tutorials Committees are now accepting abstracts and tutorial proposals for ISC’13 in Europe. The ISC’13 Call for Papers is supported by the IEEE Germany Section.
Intel is a fierce player and represents the first legitimate competitive threat that Nvidia has seen in the space. I believe, however, that Intel’s competitive position in this nascent field is overstated in the near- to medium-term and that expanding TAM and strong leadership from Nvidia’s team should ensure that the segment’s sales and profitability continue to grow, even if Intel gets a piece of the action.
The HSA Foundation, known as the “HSAF”, is an open, industry-standard consortium founded to define and deliver open standards and tools that let hardware and software take full advantage of the high performance of parallel compute engines, within the lowest possible power envelope.
The Keeneland Full Scale System is a 615 TFLOPS HP ProLiant SL250-based supercomputer with 264 nodes, where each node contains two Intel Sandy Bridge processors, three NVIDIA M2090 GPU accelerators, and 32 GB of host memory; the nodes are connected by a Mellanox FDR InfiniBand network.
Today at the SC12 conference, Advanced Micro Devices (AMD) introduced what it claims is the most powerful server graphics card, the AMD FirePro S10000.
NVIDIA today formally introduced its top-of-the-line Tesla K20 family of GPUs. The Tesla K20 series consists of the K20X and the marginally slower K20.
In this paper, we describe such an application in the research area of black hole physics: studying the late-time behavior of decaying fields in Kerr black hole space-time.
We present an implementation of the numerical modeling of elastic wave propagation in 2D anisotropic materials using the new parallel computing devices (PCDs).
Novel Dynamic Partial Reconfiguration Implementation of K-Means Clustering on FPGAs: Comparative Results with GPPs and GPUs
In this work, a parameterized implementation of the K-means clustering algorithm on a Field Programmable Gate Array (FPGA) is presented and compared with a previous FPGA implementation as well as with recent implementations on Graphics Processing Units (GPUs) and GPPs.
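The Lloyd iteration at the heart of K-means, which all three platforms parallelize, fits in a few lines of plain Python (1-D points for brevity; the per-point assignment step is what maps onto FPGA and GPU parallelism):

```python
def kmeans_1d(points, centers, iters=10):
    """Lloyd's algorithm on scalar data; returns the refined centers."""
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

centers = kmeans_1d([0.0, 0.5, 1.0, 9.0, 9.5, 10.0], [2.0, 8.0])
print(centers)  # [0.5, 9.5]
```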
In this paper we explore the role of hardware accelerators in hyperspectral remote sensing missions and further inter-compare two types of solutions: field programmable gate arrays (FPGAs) and graphics processing units (GPUs).
To accelerate large-scale Chinese spam filtering, we propose a GPU-based spam filtering solution that builds on classical text classification algorithms.
The goal was to find out whether Geant4 physics simulations could benefit from GPU acceleration and how difficult it is to modify Geant4 code to run on a GPU.
Why use GPUs from Python? This workshop will provide a brief introduction to GPU programming with Python, including run-time code generation and the use of high-level tools like PyCUDA, PyOpenCL, and Loo.py.
In this talk we will provide an introduction to PyOpenCL, a Python interface to the Open Computing Language. OpenCL is a framework for executing parallel programs across heterogeneous platforms consisting of both CPUs and GPUs.
This article proposes to address, in a tutorial style, the benefits of using Open Computing Language (OpenCL) as a quick way to allow programmers to express and exploit parallelism in signal processing algorithms, such as those used in error-correcting code systems.
This article demonstrates features in Parallel Computing Toolbox that enable you to run your MATLAB code on a GPU by making a few simple changes to your code.
CUDAfy is a set of libraries and tools that permit general-purpose programming of CUDA Graphics Processing Units (GPUs) from within the Microsoft .NET framework. John Michael Hauck wrote an excellent article on how to move your CPU code to the GPU, using the Traveling Salesman problem as an example.
In a recent blog post, NVIDIA discussed the new Dynamic Parallelism feature of the upcoming Kepler K20 GPU using Quicksort as an example. Dynamic Parallelism allows the GPU to operate more autonomously from the CPU by generating new work for itself at run time, from inside a kernel.
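The recursive structure NVIDIA's example exploits is easy to see in a plain Python quicksort; under Dynamic Parallelism, each recursive call below would instead be a child kernel launched from the GPU itself (an analogy for illustration, not CUDA code):

```python
def quicksort(a):
    """Recursive quicksort; each recursive call is the unit of work that
    Dynamic Parallelism would launch as a child kernel on the GPU."""
    if len(a) <= 1:
        return a
    pivot = a[len(a) // 2]
    left  = [x for x in a if x < pivot]
    mid   = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return quicksort(left) + mid + quicksort(right)

print(quicksort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```

Before Kepler, each of those recursive launches would have required a round trip through the CPU.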
For the third consecutive year, AMD invites computing innovators, developers and researchers in the rapidly growing field of heterogeneous computing to submit their latest work and research findings in the form of archival presentations for AMD’s annual developer summit.
ISC Events, the organizers of the International Supercomputing Conference (ISC) and ISC Cloud, proudly announce a new conference series: the ISC Big Data Conference. Over the course of two days, the forum will scrutinize all facets of big data in order to empower practitioners and vendors who want to learn more about this rapidly evolving set of technologies.
This workshop is designed for those interested in accelerating MD simulations on GPUs and/or developing new MD algorithms that can automatically be implemented and accelerated on GPUs. No programming background is required, though a programming track will be offered for those who are interested in integrating OpenMM into their code.
Jonathan Cohen and the NVIDIA CUDA Library Team present the latest benchmark results using the extensive numerical libraries included with CUDA 5. This webinar will cover all the data points and the significance of the new Math Library Performance Report.
The 2013 GPU Technology Conference (GTC) is the world's premier event for accelerated computing. Learn tips and tricks while networking with fellow enthusiasts and NVIDIA engineers. Hosted by NVIDIA March 18-21 in San Jose, California.
The International Workshop on Runtime and Operating Systems for Supercomputers provides a forum for researchers to exchange ideas and discuss research questions that are relevant to upcoming supercomputers.
In this paper we present our experience in developing an optimizing compiler for general purpose computation on graphics processing units (GPGPU) based on the Cetus compiler framework.
An efficient implementation of Bailey and Borwein’s algorithm for parallel random number generation on graphics processing units
This paper investigates the serial and parallel implementation of a Linear Congruential Generator for Graphics Processing Units (GPU) based on the binary representation of the normal number.
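A Linear Congruential Generator is a one-line recurrence, and a standard trick for parallelizing it on a GPU is an O(log k) jump-ahead so each thread can start k steps into the stream. A plain-Python sketch follows (the constants are the common Numerical Recipes values, an assumption for illustration, not necessarily those used in the paper):

```python
def lcg_step(x, a=1664525, c=1013904223, m=2**32):
    """One step of the LCG recurrence x -> (a*x + c) mod m."""
    return (a * x + c) % m

def lcg_jump(x, k, a=1664525, c=1013904223, m=2**32):
    """Advance the state k steps in O(log k) by repeatedly squaring the
    affine map, the usual way to give each GPU thread its own subsequence."""
    A, C = 1, 0                              # accumulated map: x -> A*x + C
    while k:
        if k & 1:
            A, C = (a * A) % m, (a * C + c) % m
        a, c = (a * a) % m, (a * c + c) % m  # square the base map
        k >>= 1
    return (A * x + C) % m

# Jumping 5 steps agrees with stepping 5 times
x = 12345
for _ in range(5):
    x = lcg_step(x)
assert x == lcg_jump(12345, 5)
```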
In this work, the Nussinov algorithm is analyzed but from the CUDA GPU programming perspective. The algorithm is radically redesigned in order to utilize the highly parallel NUMA architecture of the GPU.
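The dynamic program being redesigned is Nussinov's O(n^3) recurrence over sequence intervals. A straightforward CPU-side Python version (illustrative, without a minimum loop length) makes the structure explicit; cells on one anti-diagonal are independent, which is what a GPU implementation exploits:

```python
def nussinov(seq):
    """Maximum number of nested base pairs (Nussinov recurrence)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):          # one anti-diagonal of the table per span
        for i in range(n - span):     # cells on a diagonal are independent
            j = i + span
            best = max(N[i + 1][j], N[i][j - 1])      # i or j unpaired
            if (seq[i], seq[j]) in pairs:             # i pairs with j
                inner = N[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            for k in range(i + 1, j):                 # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]

print(nussinov("GGGAAACCC"))  # 3 nested G-C pairs
```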
The objective of this study is to present a fast algorithm to simulate a virtual tomato garden based on GPU acceleration. A parametric L-system is used to describe the topological structure of individual tomato plants.
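The rewriting core of an L-system is an iterated string substitution. A minimal Python sketch with a classic non-parametric bracketed branching rule (an illustrative stand-in for the paper's parametric system):

```python
def lsystem(axiom, rules, steps):
    """Iteratively rewrite the axiom; brackets [ ] delimit branches
    that a renderer would turn into plant topology."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Classic branching rule: F draws a segment, +/- turn, [ ] push/pop state
rules = {"F": "F[+F]F[-F]F"}
print(lsystem("F", rules, 1))       # F[+F]F[-F]F
print(len(lsystem("F", rules, 2)))  # 61 symbols after two rewrites
```

A parametric L-system extends this by attaching numeric arguments to each symbol (e.g. segment length and age), and the per-symbol rewriting is what a GPU can apply in parallel.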
We demonstrate that the most expensive kernel in the model executes more than three times faster on the GPU than on the CPU. These improvements are expected to yield improved efficiency when incorporated into the full model configured for the target problem.