Accelerated Massive Parallelism with Microsoft Visual C++
Kate Gregory and Ade Miller
Capitalize on the faster GPU processors in today’s computers with the C++ AMP code library—and bring massive parallelism to your project. With this practical book, experienced C++ developers will learn parallel programming fundamentals with C++ AMP through detailed examples, code snippets, and case studies. Learn the advantages of parallelism and get best practices for harnessing this technology in your applications.
Discover how to:
- Gain greater code performance using graphics processing units (GPUs)
- Choose accelerators that enable you to write code for GPUs
- Apply thread tiles, tile barriers, and tile static memory
- Debug C++ AMP code with Microsoft Visual Studio®
- Use profiling tools to track the performance of your code
Download the case studies and sample code for each chapter
The N-body case study shows how to use C++ AMP to get the most out of your GPU hardware in a computational application. It contains several implementations of the classic n-body problem, which models particles moving under the influence of gravity. The code includes simple and tiled C++ AMP kernels as well as an implementation that runs on more than one GPU. The accompanying CPU-based sample includes single-core and multi-core implementations of the same algorithm. The case study also shows how to use interop with DirectX to minimize the overhead of displaying your application’s results.
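As a rough illustration of the algorithm the N-body case study parallelizes, here is a minimal single-core sketch in standard C++. It is not taken from the book's samples: the `Particle` struct, the `nbody_step` function, the unit masses, the gravitational constant of 1, and the softening term are all assumptions made for this example. The outer loop over `i` is the part a data-parallel kernel would typically run as one thread per particle.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle layout for illustration only.
struct Particle {
    float px, py, pz;  // position
    float vx, vy, vz;  // velocity
};

// Advance every particle by one time step using the all-pairs O(n^2) method.
// Assumes unit masses and G = 1. The softening term keeps the denominator
// away from zero when two particles are very close together.
void nbody_step(std::vector<Particle>& particles, float dt, float softening = 1e-3f) {
    const std::size_t n = particles.size();
    std::vector<float> ax(n, 0.0f), ay(n, 0.0f), az(n, 0.0f);

    // Accumulate the acceleration on particle i from every other particle j.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const float dx = particles[j].px - particles[i].px;
            const float dy = particles[j].py - particles[i].py;
            const float dz = particles[j].pz - particles[i].pz;
            const float distSqr = dx * dx + dy * dy + dz * dz + softening * softening;
            const float invDist = 1.0f / std::sqrt(distSqr);
            const float invDistCube = invDist * invDist * invDist;
            ax[i] += dx * invDistCube;
            ay[i] += dy * invDistCube;
            az[i] += dz * invDistCube;
        }
    }

    // Update velocities first, then positions (semi-implicit Euler).
    for (std::size_t i = 0; i < n; ++i) {
        particles[i].vx += ax[i] * dt;
        particles[i].vy += ay[i] * dt;
        particles[i].vz += az[i] * dt;
        particles[i].px += particles[i].vx * dt;
        particles[i].py += particles[i].vy * dt;
        particles[i].pz += particles[i].vz * dt;
    }
}
```

Calling `nbody_step(particles, 0.01f)` in a loop advances the simulation. Because each particle's acceleration depends only on the positions from the previous step, the work for different values of `i` is independent, which is what makes the problem a good fit for a GPU.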
The Cartoonizer case study demonstrates braided parallelism, using both the available cores on the CPU and any available GPU(s). It implements color simplification and edge detection algorithms using C++ AMP and orchestrates the processing of images using the Parallel Patterns Library and Asynchronous Agents Library. Single-accelerator implementations of simple, tiled, and texture-based algorithms are all shown. The case study also shows two approaches for dividing the cartoonizing workload across more than one accelerator: splitting images into subsections, or forking the pipeline, processing images on separate accelerators, and multiplexing them back into the correct sequence.
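The first of those approaches, splitting an image into subsections, can be sketched in standard C++ as a function that divides the rows of an image into contiguous bands, one per accelerator. This is only an illustration under assumed names (`RowBand` and `split_rows` are hypothetical and do not come from the case study), and it leaves out the overlapping border rows that a neighborhood filter such as edge detection would typically need at each band boundary.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical half-open row range [begin, end) assigned to one accelerator.
struct RowBand {
    std::size_t begin;
    std::size_t end;
};

// Divide `height` rows into `parts` contiguous bands whose sizes differ by
// at most one row. Returns an empty list when `parts` is zero.
std::vector<RowBand> split_rows(std::size_t height, std::size_t parts) {
    std::vector<RowBand> bands;
    if (parts == 0) return bands;

    const std::size_t base = height / parts;   // rows every band receives
    const std::size_t extra = height % parts;  // leftover rows, one each to the first bands
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t rows = base + (p < extra ? 1 : 0);
        bands.push_back(RowBand{begin, begin + rows});
        begin += rows;
    }
    return bands;
}
```

For example, `split_rows(10, 3)` yields the bands [0, 4), [4, 7), and [7, 10). Each band could then be handed to a different accelerator and the results copied back into the matching rows of the output image.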
The Reduction case study shows twelve different implementations of the reduce algorithm. Each takes a different approach, and the book discusses the performance characteristics and trade-offs of each one. Reduction is an important data-parallel operation, so it is worth considering its implementation in some detail.
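To give a sense of what a reduction looks like when structured for parallel execution, the following standard C++ sketch sums a sequence by repeatedly folding the upper half of the data onto the lower half. It is not one of the twelve implementations from the case study; the function name `tree_reduce_sum` and the choice of `int` addition are assumptions for illustration. Within each pass, every addition touches a different element, so a data-parallel version could run one pass as many independent threads.

```cpp
#include <cstddef>
#include <vector>

// Sum the elements with a tree-shaped reduction. The vector is taken by
// value so the caller's data is left untouched.
int tree_reduce_sum(std::vector<int> data) {
    std::size_t count = data.size();
    if (count == 0) return 0;

    while (count > 1) {
        // Round up so that, for odd counts, the middle element is carried over.
        const std::size_t half = (count + 1) / 2;
        // Each iteration of this loop is independent of the others.
        for (std::size_t i = 0; i + half < count; ++i) {
            data[i] += data[i + half];
        }
        count = half;
    }
    return data[0];
}
```

A sequential loop needs n - 1 dependent additions, whereas this arrangement needs only about log2(n) passes, each of which can be spread across many threads.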
You will need at least Visual Studio 2012 RC Professional to run the samples. However, you will need Visual Studio 2012 RC Ultimate to use some of the parallel diagnostic tools, such as the Concurrency Visualizer. The DirectX SDK (June 2010) is also required to build the N-body case study and Chapter 11 samples.
Demos and Talks
Daniel Moth gave two talks on C++ AMP at the GPU Technology Conference 2012. These included a demonstration of the Cartoonizer case study.