CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU). CUDA was developed with several design goals in mind:
- Provide a small set of extensions to standard programming languages, like C, that enable a straightforward implementation of parallel algorithms. With CUDA C/C++, programmers can focus on the task of parallelization of the algorithms rather than spending time on their implementation.
- Support heterogeneous computation where applications use both the CPU and GPU. Serial portions of applications are run on the CPU, and parallel portions are offloaded to the GPU. As such, CUDA can be incrementally applied to existing applications. The CPU and GPU are treated as separate devices that have their own memory spaces. This configuration also allows simultaneous computation on the CPU and GPU without contention for memory resources.
CUDA-capable GPUs have hundreds of cores that can collectively run thousands of computing threads. These cores have shared resources including a register file and a shared memory. The on-chip shared memory allows parallel tasks running on these cores to share data without sending it over the system memory bus.
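The on-chip shared memory described above can be sketched in a short CUDA kernel. This is an illustrative example only; the kernel name, block size, and the use of a block-level sum are assumptions, not from the text:

```cuda
// Sketch: a block-level sum staged through on-chip shared memory.
// Assumes blockDim.x == 256 (a power of two); illustrative only.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float buf[256];             // on-chip shared memory, one tile per block
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;     // stage data into shared memory
    __syncthreads();

    // Tree reduction within the block; the data never travels
    // over the system memory bus while threads cooperate.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0]; // one partial sum per block
}
```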
New CUDA 5 Features
CUDA 5 enables developers to take full advantage of the performance of NVIDIA GPUs, including GPU accelerators based on the NVIDIA Kepler compute architecture — the fastest, most efficient, highest-performance computing architecture ever built. Key features include:
- Dynamic Parallelism: brings GPU acceleration to new algorithms
GPU threads can dynamically spawn new threads, allowing the GPU to adapt to the data. By minimizing the back and forth with the CPU, dynamic parallelism greatly simplifies parallel programming. And it enables GPU acceleration of a broader set of popular algorithms, such as those used in adaptive mesh refinement and computational fluid dynamics applications.
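As a sketch of the pattern described above (kernel names and launch sizes are assumptions), a parent kernel can launch a child kernel directly from the GPU. This requires a Kepler-class GPU (sm_35) and compilation with relocatable device code:

```cuda
// Sketch of CUDA Dynamic Parallelism; compile with:
//   nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt
// Names and grid sizes below are illustrative assumptions.
__global__ void childKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

__global__ void parentKernel(float *data, int n)
{
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        // Launch a child grid from device code -- no round trip to the CPU.
        childKernel<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();   // wait for the child grid to finish
    }
}
```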
- GPU-Callable Libraries: enables a third-party ecosystem
A new CUDA BLAS library allows developers to use dynamic parallelism for their own GPU-callable libraries. They can design plug-in APIs that allow other developers to extend the functionality of their kernels, and implement callbacks on the GPU to customize the functionality of third-party GPU-callable libraries. The “object linking” capability provides an efficient and familiar process for developing large GPU applications: developers can compile multiple CUDA source files into separate object files, then link them into larger applications and libraries.
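The object-linking workflow described above can be sketched with nvcc; the file names here are assumptions:

```shell
# Sketch of the CUDA 5 separate-compilation flow (file names are assumptions).
# Compile each source file to a relocatable object with -dc:
nvcc -arch=sm_35 -dc util.cu -o util.o
nvcc -arch=sm_35 -dc main.cu -o main.o

# Link device code across the objects and produce the final executable:
nvcc -arch=sm_35 util.o main.o -o app

# Or archive the device objects into a static library others can link against:
nvcc -lib util.o -o libutil.a
```

Because each object file is compiled independently, a library vendor can ship `libutil.a` without shipping source, which is what enables the closed-source GPU-callable library ecosystem.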
- GPUDirect Support for RDMA: minimizes system memory bottlenecks
GPUDirect technology enables direct communication between GPUs and other PCI-E devices, and supports direct memory access between network interface cards and the GPU. It also significantly reduces MPI send/receive latency between GPU nodes in a cluster and improves overall application performance.
- NVIDIA Nsight Eclipse Edition: generate CUDA code quickly and easily
NVIDIA Nsight Eclipse Edition enables programmers to develop, debug and profile GPU applications within the familiar Eclipse-based IDE on Linux and Mac OS X platforms. An integrated CUDA editor and CUDA samples speed the generation of CUDA code, and automatic code refactoring enables easy porting of CPU loops to CUDA kernels. An integrated expert analysis system provides automated performance analysis and step-by-step guidance to fix performance bottlenecks in the code, while syntax highlighting makes it easy to differentiate GPU code from CPU code.
- CUDA Dynamic Parallelism allows functions running on the GPU to launch kernels using the familiar <<<...>>> syntax and to directly call CUDA Runtime API routines (previously this ability was only available from the host).
- __device__ functions can now be separately compiled and linked using nvcc. This allows creation of closed-source static libraries of __device__ functions and the ability for these libraries to call user-defined callback functions. The linker support is considered to be a BETA feature in this release.
- Nsight Eclipse Edition for Linux and Mac OS is an integrated development environment that allows developing, debugging, and optimizing CUDA code.
- A new command-line profiler, nvprof, provides summary information about where applications spend the most time, so that optimization efforts can be properly focused.
- See also the New Features section of this document.
- This release contains the following:
- NVIDIA CUDA Toolkit documentation
- NVIDIA CUDA compiler (nvcc) and supporting tools
- NVIDIA CUDA runtime libraries
- NVIDIA CUDA-GDB debugger
- NVIDIA CUDA-MEMCHECK
- NVIDIA Visual Profiler (nvvp) and command-line profiler (nvprof)
- NVIDIA Nsight Eclipse Edition
- NVIDIA CUBLAS, CUFFT, CUSPARSE, CURAND, Thrust, and NVIDIA Performance Primitives (NPP) libraries
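As a sketch, the new command-line profiler (nvprof) can be pointed at an existing binary with no recompilation; the application name below is an assumption:

```shell
# Sketch: profile an application with nvprof (application name is an assumption).
# Prints a summary of time spent in GPU kernels and CUDA API calls:
nvprof ./my_cuda_app

# Once the summary identifies a hotspot, a per-launch timeline can help:
nvprof --print-gpu-trace ./my_cuda_app
```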
Download CUDA 5 at http://developer.nvidia.com/cuda/cuda-downloads