Heterogeneous Parallel Programming class at Coursera
This Coursera course teaches the use of CUDA/OpenCL, OpenACC, and MPI for programming heterogeneous parallel computing systems. It is application oriented and only introduces necessary technological knowledge to solidify understanding.
About the Course
The course is unique in that it is application oriented and introduces only the underlying computer science and computer engineering knowledge needed for understanding. It covers the data parallel execution model, memory models for locality, parallel algorithm patterns, overlapping computation with communication, and scalable programming using joint MPI-CUDA in large-scale computing clusters. It has been offered as a one-week intensive summer school for the past four years. In the past two years, there have been ten video-linked academic sites with a total of more than two hundred students each year.
About the Instructor(s)

Course Syllabus
- Week One: Introduction to Heterogeneous Computing and a Quick Overview of CUDA C and MPI, with lab setup and programming assignment of vector addition in CUDA C.
- Week Two: Kernel-Based Data Parallel Programming and Memory Model for Locality, with programming assignment of simple and tiled matrix multiplication.
- Week Three: Performance Considerations and Task Parallelism Model, with programming assignment in performance tuning.
- Week Four: Parallel Algorithm Patterns – Reduction/Scan, Stencil Computation, and Sparse Computation, with programming assignment of a reduction tree.
- Week Five: MPI in a Heterogeneous Computing Cluster: domain partitioning, data distribution, data exchange, and using heterogeneous computing nodes, with programming assignment of an MPI-CUDA application.
- Week Six: Related Programming Models – OpenACC, CUDA Fortran, C++ AMP, Thrust, and important trends in heterogeneous parallel computing, with final exam.
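To give a feel for the Week One assignment, here is a minimal sketch of vector addition in CUDA C. It is not the course's reference solution; the names `vecAddKernel` and `vecAdd`, the block size of 256 threads, and the error-handling style are assumptions made for illustration. It assumes `n > 0` and a CUDA-capable device.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Kernel: each thread computes one element of c = a + b.
// The bounds check handles the case where n is not a multiple of the block size.
__global__ void vecAddKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host wrapper (illustrative name): copies the inputs to the
// device, launches the kernel, and copies the result back to hC.
// Returns cudaSuccess or the first error encountered. Assumes n > 0.
cudaError_t vecAdd(const float *hA, const float *hB, float *hC, int n) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Allocate device memory and copy the inputs over.
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Assumed block size; round the grid size up so every element is covered.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vecAddKernel<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();  // reports launch configuration errors
    }

    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree accepts null pointers, so cleanup is safe on any path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Calling `vecAdd(a, b, c, n)` with three host arrays of length `n` fills `c` with the element-wise sums. The same allocate, copy, launch, copy-back pattern is the starting point that later topics such as tiling and overlapping computation with communication refine.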
Recommended Background
Suggested Readings
Category: Computer Science, Training & Events






