This Coursera course teaches the use of CUDA/OpenCL, OpenACC, and MPI for programming heterogeneous parallel computing systems. It is application oriented and only introduces necessary technological knowledge to solidify understanding.
Next session: 24 September 2012 (6 weeks long)
Workload: 6-8 hours/week
About the Course
All computing systems, from mobile to supercomputers, are becoming heterogeneous parallel computers using both multi-core CPUs and many-thread GPUs for higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries to ease the use of these heterogeneous parallel computing systems, effective and confident use of these systems will always require knowledge about the low-level programming interfaces in these systems. This course is designed for students in all disciplines to learn the essence of these programming interfaces (CUDA/OpenCL, OpenMP, and MPI) and how they should orchestrate the use of these interfaces to achieve application goals.
The course is unique in that it is application oriented and only introduces the necessary underlying computer science and computer engineering knowledge needed for understanding. It covers data parallel execution model, memory models for locality, parallel algorithm patterns, overlapping computation with communication, and scalable programming using joint MPI-CUDA in large scale computing clusters. It has been offered as a one-week intensive summer school for the past four years. In the past two years, there have been ten video-linked academic sides with a total of more than two hundred students each year.
About the Instructor(s)
Wen-mei W. Hwu is a Professor and holds the Sanders-AMD
Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are in the area of architecture
, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of Parallel Computing Institute and director of the IMPACT research group (www.crhc.uiuc.edu/Impact
). He is a co-founder and CTO of MulticoreWare. For his contributions in research and teaching, he received the ACM SigArch Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Eta Kappa Nu Holmes MacDonald Outstanding Teaching Award, the CAM/IEEE ISCA Influential Paper Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the $208M NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.
- Week One: Introduction to Heterogeneous Computing and a Quick Overview of CUDA C and MPI, with lab setup and programming assignment of vector addition in CUDA C
- Week Two: Kernel-Based Data Parallel Programming and Memory Model for Locality, with programming assignment of simple and tiled matrix multiplication.
- Week Three: Performance Considerations and Task Parallelism Model, with programming assignment in performance tuning.
- Week Four: Parallel Algorithm Patterns – Reduction/Scan, stencil computation and Sparse computation, with programming assignment of reduction tree.
- Week Five: MPI in a Heterogeneous Computing Cluster: domain partitioning, data distribution, data exchange, and using heterogeneous computing nodes, with programming assignment of a MPI-CUDA application.
- Week Six: Related Programming Models – OpenACC, CUDA FORTRAN, C++AMP, Thrust, and important trends in heterogeneous parallel computing, with final exam.
Programming experience in C/C++.
Category: Computer Science, Training & Events