Proven Algorithmic Techniques for Many-core Processors Workshop
From Monday, August 13, 2012
To Friday, August 17, 2012
By studying many current GPU computing applications, we have learned that an application's scalability is often limited by some combination of memory bandwidth saturation, memory contention, imbalanced data distribution, or interactions between data structures and algorithms. Successful GPU application developers often restructure their data and reformulate their problems specifically for massive threading, and they organize their threads to make effective use of shared on-chip memory resources. We looked for patterns among those transformations, and here present the seven most common and crucial algorithm and data optimization techniques we discovered.
Lecturers from the Virtual School of Computational Science and Engineering will present via videoconference to several university campuses across the USA.
Prerequisites:
- Experience working in a Unix environment
- Experience developing and running scientific codes written in C or C++
- Basic knowledge of CUDA (A short online course, Introduction to CUDA, is available to registered on-site students who need assistance in meeting this prerequisite)
- Recommended but not required: familiarity with the material from "Programming Heterogeneous Parallel Computing Systems," offered July 9-13, 2012
Instructors:
- Wen-Mei W. Hwu, professor of electrical and computer engineering and principal investigator of the CUDA Center of Excellence, University of Illinois at Urbana-Champaign
- David Kirk, NVIDIA fellow
- John Stratton, Ph.D. candidate in Electrical and Computer Engineering and author of the exercise solutions to “Programming Massively Parallel Processors – A Hands-on Approach”
Course outline:
- Introduction
  - why problem formulation and algorithm design choices can have a dramatic effect on performance
  - common algorithmic strategies for high performance
- Increasing locality in dense arrays
  - tiling of data access and layout
- Reducing output interference
  - conversion from scatter to gather
  - parallelizing reductions and histograms
- Dealing with non-uniform data
  - data sorting and binning
- Dealing with sparse data
  - sorting and compaction
- Dealing with dynamic data
  - parallel queue-based algorithms
- Improving data efficiency in large data traversal
  - stencil and other grid-based computation
- Case studies from application domains
  - molecular dynamics
  - computational fluid dynamics
  - medical imaging
  - computer vision
  - gene sequencing
- Hands-on Lab
Sites
- University of Illinois at Urbana-Champaign, National Center for Supercomputing Applications, Urbana, IL
- Harvard University, Cambridge, MA
- Michigan State University, Institute for Cyber Enabled Research, East Lansing, MI
- Pittsburgh Supercomputing Center, Pittsburgh, PA
- Pennsylvania State University, State College, PA
- Rutgers University, Piscataway, NJ
- University of California Los Angeles, Los Angeles, CA
- University of Oklahoma, Norman, OK
- University of South Carolina, Columbia, SC
- University of Tennessee Knoxville, Knoxville, TN
- University of Utah, Salt Lake City, UT
- Vanderbilt University, Nashville, TN
- Washington University in St. Louis, St. Louis, MO
Visit https://hub.vscse.org to register and choose an appropriate site. The workshop fee is $100.
Category: Training & Events