
Will OpenCL help displace GPGPU?

23 October, 2012

Yossi Kreinin, a software developer at Mobileye, expressed an interesting opinion on the future of GPUs:

OpenCL is usually perceived as a C dialect for GPGPU programming – doing “general-purpose” computations (not necessarily graphics) on GPU hardware. “It’s like Nvidia’s CUDA C, but portable”. However, OpenCL the language is not really tied to the GPU architecture. That is, hardware could run OpenCL programs and have an architecture very different from a GPU, resulting in a very different performance profile.

OpenCL is possibly the first programming language promising portability across accelerators: “OpenCL is for accelerators what C is for CPUs”. Portability is disruptive. When hardware vendor A displaces vendor B, portable software usually helps a great deal. Will OpenCL – “the GPGPU language” – eventually help displace GPGPU, by facilitating “GP-something-else” – “general-purpose” accelerators which aren’t like GPUs? Yossi discusses this question on general grounds and considers two specific examples of recent OpenCL accelerators: Adapteva’s Parallella and ST’s P2012.

Why displace GPGPU?

First of all, whether GPGPU is likely to be displaced or not – what could “GP-something-else” possibly give us that GPGPU doesn’t?

There are two directions from which benefits could come – you could call them two opposite directions:

  1. Let software (ab)use more types of special-purpose accelerators. GPGPU lets you utilize (abuse?) your GPU for general-purpose stuff. It could be nice to have “GPDSP” to utilize the DSPs in your phone, “GPISP” to utilize the ISP, “GPCVP” to utilize computer vision accelerators likely to emerge in the future, etc. From GPGPU to GP-everything.
  2. Give software accelerators which are more general-purpose to begin with. GPGPU means doing your general-purpose stuff under the constraints imposed by the GPU architecture. An OpenCL accelerator lifting some of these constraints could be very welcome.

Could OpenCL help us get benefits from either direction, (1) or (2)?

(1) is about making use of anal-retentive, efficiency-obsessed, weird, incompatible hardware. It’s rather hard, for OpenCL or for any other portable, reasonably “pretty” language.

OpenCL does provide constructs more or less directly mapping to some of the “ugly” features common to many accelerators, for example:

  • Explicitly addressed local memory (as opposed to cache)
  • DMA (bulk memory transfers)
  • Short vector data types to make use of SIMD opcodes
  • Light-weight threads and barriers

But even with GPUs, OpenCL can’t target all of the GPU’s resources. There’s the subset of the GPU accessible to GPGPU programs – and then there are the more idiosyncratic and less flexible parts used for actual graphics processing.



Category: Computer Science

  • Crni

    Yossi should try for himself to write a portable and efficient implementation of an algorithm of his choice in OpenCL. When he tries this, he will see that performance is not portable, and that for each different target he will have to spend very significant time on various optimizations; he will also see that these optimizations are often mutually exclusive. Then he will be able to appreciate that practitioners would always be the first to follow obvious ideas, if only they really were that obvious.