We collected notable GPU computing talks at upcoming SIGGRAPH 2012 meeting. All DOIs linked to the corresponding technical papers.
Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.
We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high-performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code.
We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.
Ragan-Kelley, J., Adams, A., Paris, S., Levoy, M., Amarasinghe, S., Durand, F. 2012. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. ACM Trans. Graph. 31 4, Article 32 (July 2012), 12 pages. [DOI: 10.1145/2185520.2185528].
Adaptive Manifolds for Real-Time High-Dimensional Filtering
We present a technique for performing high-dimensional filtering of images and videos in real time. Our approach produces high-quality results and accelerates filtering by computing the filter’s response at a reduced set of sampling points, and using these for interpolation at all N input pixels. We show that for a proper choice of these sampling points, the total cost of the filtering operation is linear both in N and in the dimension d of the space in which the filter operates. As such, ours is the first high-dimensional filter with such a complexity. We present formal derivations for the equations that define our filter, as well as for an algorithm to compute the sampling points. This provides a sound theoretical justification for our method and for its properties. The resulting filter is quite flexible, being capable of producing responses that approximate either standard Gaussian, bilateral, or non-local-means filters. Such flexibility also allows us to demonstrate the first hybrid Euclidean-geodesic filter that runs in a single pass. Our filter is faster and requires less memory than previous approaches, being able to process a 10-Megapixel full-color image at 50 fps on modern GPUs. We illustrate the effectiveness of our approach by performing a variety of tasks ranging from edge-aware color filtering in 5-D, noise reduction (using up to 147 dimensions), single-pass hybrid Euclidean-geodesic filtering, and detail enhancement, among others.
Gastal, E., Oliveira, M. 2012. Adaptive Manifolds for Real-Time High-Dimensional Filtering. ACM Trans. Graph. 31 4, Article 33 (July 2012), 13 pages. [DOI: 10.1145/2185520.2185529].
Diffusion Curve Textures for Resolution Independent Texture Mapping
We introduce a vector representation called diffusion curve textures for mapping diffusion curve images (DCI) onto arbitrary surfaces. In contrast to the original implicit representation of DCIs [Orzan et al. 2008], where determining a single texture value requires iterative computation of the entire DCI via the Poisson equation, diffusion curve textures provide an explicit representation from which the texture value at any point can be solved directly, while preserving the compactness and resolution independence of diffusion curves. This is achieved through a formulation of the DCI diffusion process in terms of Green’s functions. This formulation furthermore allows the texture value of any rectangular region (e.g. pixel area) to be solved in closed form, which facilitates anti-aliasing. We develop a GPU algorithm that renders anti-aliased diffusion curve textures in real time, and demonstrate the effectiveness of this method through high quality renderings with detailed control curves and color variations.
Sun, X., Xie, G., Dong, Y., Lin, S., Xu, W., Wang, W., Tong, X., Guo, B. 2012. Diffusion Curve Textures for Resolution Independent Texture Mapping. ACM Trans. Graph. 31 4, Article 74 (July 2012), 9 pages.
[DOI = 10.1145/2185520.2185570].
Point Sampling with General Noise Spectrum
Point samples with different spectral noise properties (often defined using color names such as white, blue, green, and red) are important for many science and engineering disciplines including computer graphics. While existing techniques can easily produce white and blue noise samples, relatively little is known for generating other noise patterns. In particular, no single algorithm is available to generate different noise patterns according to user-defined spectra. In this paper, we describe an algorithm for generating point samples that match a user-defined Fourier spectrum function. Such a spectrum function can be either obtained from a known sampling method, or completely constructed by the user. Our key idea is to convert the Fourier spectrum function into a differential distribution function that describes the samples’ local spatial statistics; we then use a gradient descent solver to iteratively compute a sample set that matches the target differential distribution function. Our algorithm can be easily modified to achieve adaptive sampling, and we provide a GPU-based implementation. Finally, we present a variety of different sample patterns obtained using our algorithm, and demonstrate suitable applications.
Zhou, Y., Huang, H., Wei, L., Wang, R. 2012. Point Sampling with General Noise Spectrum. ACM Trans. Graph. 31 4, Article 76 (July 2012), 11 pages. [DOI = 10.1145/2185520.2185572].
Tensor Displays: Compressive Light Field Synthesis using Multilayer Displays with Directional Backlighting
We introduce tensor displays: a family of compressive light field displays comprising all architectures employing a stack of timemultiplexed, light-attenuating layers illuminated by uniform or directional backlighting (i.e., any low-resolution light field emitter). We show that the light field emitted by an N-layer,M-frame tensor display can be represented by an Nth-order, rank-M tensor. Using this representation we introduce a unified optimization framework, based on nonnegative tensor factorization (NTF), encompassing all tensor display architectures. This framework is the first to allow joint multilayer, multiframe light field decompositions, significantly reducing artifacts observed with prior multilayer-only and multiframe-only decompositions; it is also the first optimization method for designs combining multiple layers with directional backlighting. We verify the benefits and limitations of tensor displays by constructing a prototype using modified LCD panels and a custom integral imaging backlight. Our efficient, GPU-based NTF implementation enables interactive applications. Through simulations and experiments we show that tensor displays reveal practical architectures with greater depths of field, wider fields of view, and thinner form factors, compared to prior automultiscopic displays.
Wetzstein, G., Lanman, D., Hirsch, M., Raskar, R. 2012. Tensor Displays: Compressive Light Field Synthesis
using Multilayer Displays with Directional Backlighting. ACM Trans. Graph. 31 4, Article 80 (July 2012), 11
pages. [DOI = 10.1145/2185520.2185576].
Adaptive Image-based Intersection Volume
A method for image-based contact detection and modeling, with guaranteed precision on the intersection volume, is presented. Unlike previous image-based methods, our method optimizes a nonuniform ray sampling resolution and allows precise control of the volume error. By cumulatively projecting all mesh edges into a generalized 2D texture, we construct a novel data structure, the Error Bound Polynomial Image (EBPI), which allows efficient computation of the maximum volume error as a function of ray density. Based on a precision criterion, EBPI pixels are subdivided or clustered. The rays are then cast in the projection direction according to the non-uniform resolution. The EBPI data, combined with ray-surface intersection points and normals, is also used to detect transient edges at surface intersections. This allows us to model intersection volumes at arbitrary resolution, while avoiding the geometric computation of mesh intersections. Moreover, the ray casting acceleration data structures can be reused for the generation of high quality images.
Wang, B., Faure, F., Pai, D. 2012. Adaptive Image-based Intersection Volume.
ACM Trans. Graph. 31 4, Article 97 (July 2012), 9 pages. [DOI = 10.1145/2185520.2185593].
Mass Splitting for Jitter-Free Parallel Rigid Body Simulation
We present a parallel iterative rigid body solver that avoids common artifacts at low iteration counts. In large or real-time simulations, iteration is often terminated before convergence to maximize scene size. If the distribution of the resulting residual energy varies too much from frame to frame, then bodies close to rest can visibly jitter. Projected Gauss-Seidel (PGS) distributes the residual according to the order in which contacts are processed, and preserving the order in parallel implementations is very challenging. In contrast, Jacobi-based methods provide order independence, but have slower convergence. We accelerate projected Jacobi by dividing each body mass term in the effective mass by the number of contacts acting on the body, but use the full mass to apply impulses. We further accelerate the method by solving contacts in blocks, providing wallclock performance competitive with PGS while avoiding visible artifacts. We prove convergence to the solution of the underlying linear complementarity problem and present results for our GPU implementation, which can simulate a pile of 5000 objects with no visible jittering at over 60 FPS.
Tonge, R., Benevolenski, F., Voroshilov, A. 2012. Mass Splitting for Jitter-Free Parallel Rigid Body
Simulation. ACM Trans. Graph. 31 4, Article 105 (July 2012), 8 pages. [DOI = 10.1145/2185520.2185601].