Tag: parallel programming
By studying many current GPU computing applications, we have learned that the limits of an application’s scalability are often related to some combination of memory bandwidth saturation, memory contention, imbalanced data distribution, or data structure/algorithm interactions.
With this practical book, experienced C++ developers will learn parallel programming fundamentals with C++ AMP through detailed examples, code snippets, and case studies.
This screencast assumes familiarity with the C++ AMP API, e.g. that you are comfortable with the matrix multiplication implementation in C++ AMP. Watch this screencast to learn which features Visual Studio 2012 offers for debugging C++ AMP code.
ISC 12 Tutorial: Relative, Reverse & CUDA Debugging for Computationally Intensive Application Development
A significant challenge in developing, maintaining and porting numerical simulations is avoiding subtle errors that undermine the validity of the results without causing an obvious failure. This tutorial will share experiences, best practices and debugging techniques for identifying and resolving such defects in parallel applications.
Developing parallel applications that take advantage of all the compute resources available on the underlying system is not a trivial task, and doing so across multiple devices in a standard manner is even more difficult.
New parallel objective function determination methods for the job shop scheduling problem are proposed in this paper, considering the makespan and the sum of job completion times as criteria; however, the proposed methods can also be applied to other popular objective functions such as job tardiness or flow time.
High-performance computing tools for the integrated assessment and modelling of social-ecological systems
Integrated spatio-temporal assessment and modelling of complex social–ecological systems is required to address global environmental challenges.
In this thesis, a novel three-dimensional anisotropic front propagation algorithm for the simulation of geological folds on parallel architectures is presented. The algorithm's abundant parallelism is demonstrated on multi-core CPU and GPU architectures.