
GPGPU Java Programming, Tutorial 2

29 September, 2011

Java Code Geeks has published its second GPGPU Java lesson. We repost it here:

Before we dive into coding, some background. There are two competing GPGPU SDKs: OpenCL and CUDA. OpenCL is an open standard supported by all major GPU vendors (namely AMD, NVIDIA and Intel), while CUDA is NVIDIA-specific and works only on NVIDIA cards. Both SDKs support C/C++ code, which of course leaves us Java developers out in the cold. So far there is no pure Java OpenCL or CUDA support, which is not much help for the Java programmer who needs to take advantage of a GPU's massive parallelism, unless she fiddles with the Java Native Interface. Fortunately, there are some Java tools out there that ease the pain of GPGPU Java programming.

This time I will take a look at JCuda and see how we can write a simple GPGPU program. Let’s start by setting up a CUDA GPGPU Linux development environment (although Windows and Mac environments shouldn’t be hard to set up either):

Step 1: Install an NVIDIA CUDA-enabled GPU in your computer. The NVIDIA Developers’ site has a list of CUDA-enabled GPUs. New NVIDIA GPUs are almost certainly CUDA-enabled but, just in case, check the card’s specification to make sure.

Step 2: Install the NVIDIA Driver and CUDA SDK. Download them and find installation instructions from here.

Step 3: Go to directory ~/NVIDIA_GPU_Computing_SDK/C/src/deviceQuery and run make.

Step 4: If the compilation was successful, go to directory ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release and run the deviceQuery executable. You will get lots of technical information about your card.

Step 5: Now that you have a CUDA environment, let’s write and compile a CUDA program in C. Write the following code and save it as multiply.cu:

#include <stdio.h>

__global__ void multiply(float a, float b, float* c)
{
  *c=a*b;
}

int main()
{
  float a, b, c;
  float *c_pointer;
  a=1.35;
  b=2.5;

  cudaMalloc((void**)&c_pointer, sizeof(float));
  multiply<<<1,1>>>(a, b, c_pointer);
  cudaMemcpy(&c, c_pointer, sizeof(float),cudaMemcpyDeviceToHost);
/*** This is C!!! You manage your garbage on your own! 	***/
  cudaFree(c_pointer);
  printf("Result = %f\n",c);
}

Compile it with the CUDA compiler (nvcc) and run it:

$ nvcc multiply.cu -o multiply
$ ./multiply
Result = 3.375000
$

So what does the above code do? The multiply function with the __global__ qualifier is called the kernel and is the actual code that will be executed on the GPU. The code in the main function is executed on the CPU as normal C code, although there are some differences:

  1. The multiply function is called with the <<<1,1>>> brackets. The two numbers inside the brackets tell CUDA how many times the code should be executed: the first is the number of thread blocks to launch and the second is the number of threads in each block. CUDA also enables us to create what are called one-, two-, or even three-dimensional thread blocks. The numbers in this example indicate a single one-dimensional thread block containing a single thread, thus our code will be executed 1×1=1 time.
  2. The cudaMalloc, cudaMemcpy, and cudaFree functions are used to handle the GPU memory in a similar fashion to the way we handle the computer’s normal memory in C. The cudaMemcpy function is important because the GPU has its own RAM: before we can process any data in the kernel, we need to copy it to GPU memory. Of course, we also need to copy the results back to normal memory when done.
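To make both points more concrete, here is a minimal sketch that launches more than one thread and copies data in both directions. It is an illustration, not part of the original tutorial: the names multiplyArrays, multiplyOnGpu, d_a, d_b and d_c, and the choice of 4 threads per block, are assumptions made for this example.

```cuda
#include <stdio.h>

// Hypothetical kernel: each GPU thread multiplies one pair of elements.
__global__ void multiplyArrays(const float* a, const float* b, float* c, int n)
{
  // Global index of this thread: block offset plus position inside the block.
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)  // the grid may hold more threads than there are elements
    c[i] = a[i] * b[i];
}

// Hypothetical host helper: copies the inputs to the GPU, runs the kernel,
// and copies the result back. Returns 0 on success, 1 on any CUDA error.
int multiplyOnGpu(const float* a, const float* b, float* c, int n)
{
  float *d_a = NULL, *d_b = NULL, *d_c = NULL;
  size_t bytes = n * sizeof(float);
  int threadsPerBlock = 4;  // arbitrary small value chosen for illustration
  int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
  int failed = 0;

  if (cudaMalloc((void**)&d_a, bytes) != cudaSuccess ||
      cudaMalloc((void**)&d_b, bytes) != cudaSuccess ||
      cudaMalloc((void**)&d_c, bytes) != cudaSuccess)
    failed = 1;

  if (!failed) {
    // Host -> device: the kernel can only read GPU memory.
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    // blocks x threadsPerBlock threads run the kernel, one element each.
    multiplyArrays<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

    // Device -> host: this copy waits for the kernel to finish first.
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
      failed = 1;
  }

  // Freeing a NULL pointer is a no-op, so this is safe after a failed malloc.
  cudaFree(d_a);
  cudaFree(d_b);
  cudaFree(d_c);
  return failed;
}

int main()
{
  const int n = 8;  // with 4 threads per block this launches 2 blocks
  float a[n], b[n], c[n];
  for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f; }

  if (multiplyOnGpu(a, b, c, n) != 0) {
    printf("CUDA call failed\n");
    return 1;
  }
  for (int i = 0; i < n; i++)
    printf("%.1f * %.1f = %.1f\n", a[i], b[i], c[i]);
  return 0;
}
```

It compiles with nvcc in the same way as multiply.cu. The bounds check inside the kernel matters because the number of blocks is rounded up, so the last block may contain threads with no element to process.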

Now that we have the basics of how to execute code on the GPU, let’s see how we can run GPGPU code from Java. Remember, the kernel code will still be written in C, but at least the main function is now Java code with the help of JCuda.

Continue tutorial at Java Code Geeks


Category: Code Examples