CUDAfy is a set of libraries and tools that permit general purpose programming of NVIDIA CUDA Graphics Processing Units (GPUs) from within the Microsoft .NET framework. John Michael Hauck wrote an excellent article on how to transfer your CPU code to the GPU using Traveling Salesman problem as an example.
It may come as a surprise to many that we only use 10% of our brain capacity, regardless of what Snopes says. It is also true that we use only 10% of the computing power of our PCs. This is because our programs are not written to utilize our computer’s “Cudabral Cortex” (commonly called the GPGPU). It is time for us as developers to unleash 100% of our computer’s potential by tapping the GPGPU. By doing this, we can in some cases increase the performance of our programs by a factor of 10 or even 100.
Yet your hesitation to consider using the GPGPU may come from an innate fear of the learning curve involved. Yet the learning curve has recently been straightened out by a wonderful library called Cudafy. Cudafy allows methods written in C# to be executed GPGPU. This series of posts will meander down the path of creating a massively parallel program and explain some of the details along the way. After reading these posts, you will see the world as a massively parallel phenomena just begging to be modeled.
Before diving into GPGPU programming, I want to first present the challenge we will be solving. It is called the Traveling Salesman Problem. Our program will be given the coordinates of several cities. Our task is to find the shortest path that connects all the cities. To keep the problem simple we will assume a flat earth, that 1 degree of latitude equals 1 degree of longitude, and that we will not constrain which city we start in or end up at. To further simplify the task, we will use a brute force method. That is, we will test the length of every permutation of cities to find the shortest one. So, if we are given 5 cities, we will be computing the length of 120 (5!) different paths; if we are given 10 cities, we will be testing 3628800 (10!) paths.
We will review code that solves this problem in three ways:
- A single-threaded application on the CPU,
- A multi-threaded application on a multi-core CPU, and
- A massively parallel application on the GPGPU.
As you examine the CPU-based solutions, you will no doubt notice that parameters are passed to functions that seem unneeded and that the algorithm could be optimized in many ways. These may be legitimate concerns on your part, especially if you are itching to challenge my inevitable benchmark comparisons. Yet, my goal is not to compare a performance crafted CPU solution with a GPGPU solution. Instead, my goal is to show the same problem being solved in exactly the same way across all three solutions computing platforms. Where you take it from here is your business.
Category: Code Examples