X-ray scattering is a valuable tool for measuring the structural properties of materials used in the design and fabrication of energy-relevant nanodevices (e.g., photovoltaic, energy storage, battery, fuel, and carbon capture and sequestration devices) that are key to reducing carbon emissions. Although today's ultra-fast X-ray scattering detectors can provide tremendous information on the structural properties of materials, a primary challenge remains in the analysis of the resulting data. We are developing novel high-performance computing algorithms, codes, and software tools for the analysis of X-ray scattering data. In this paper we describe two such HPC algorithm advances. First, we have implemented a flexible and highly efficient Grazing Incidence Small Angle X-ray Scattering (GISAXS) simulation code based on the Distorted Wave Born Approximation (DWBA) theory with C++/CUDA/MPI on a cluster of GPUs. Our code can compute the scattered light intensity from any given sample in all directions of space, thus allowing full construction of the GISAXS pattern. Preliminary tests on a single GPU show speedups of over 125x compared to the sequential code, and almost linear speedup when executing across a 42-node GPU cluster, yielding an additional 40x speedup over a single GPU node. Second, for the structural fitting problem in inverse modeling, we have implemented a Reverse Monte Carlo simulation algorithm with C++/CUDA on a single GPU. Because the X-ray scattering simulation model has a large number of fitting parameters, the earlier single-CPU code required weeks of runtime. Deploying the AccelerEyes Jacket/MATLAB wrapper to use the GPU gave around a 100x speedup over the pure CPU code. Our further C++/CUDA optimization delivered an additional 9x speedup.
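The reason the forward simulation maps so well onto GPUs is that the scattered intensity at each reciprocal-space point can be computed independently. The sketch below illustrates this in plain C++ under the simple Born approximation; the full DWBA used in the paper adds four reflection/refraction terms weighted by Fresnel coefficients, which are omitted here, and the names (`Vec3`, `formFactor`, `intensity`) are illustrative, not the paper's actual API.

```cpp
#include <cmath>
#include <complex>
#include <vector>

struct Vec3 { double x, y, z; };

// Form factor F(q) = sum over scatterers of exp(i q . r).
// Each q-point is independent, so a CUDA version simply assigns
// one thread (or one thread block) per q-point.
std::complex<double> formFactor(const std::vector<Vec3>& scatterers, const Vec3& q) {
    std::complex<double> F(0.0, 0.0);
    for (const auto& r : scatterers) {
        double phase = q.x * r.x + q.y * r.y + q.z * r.z;
        F += std::complex<double>(std::cos(phase), std::sin(phase));
    }
    return F;
}

// Scattered intensity is |F(q)|^2 (Born approximation; the DWBA
// replaces F with a sum of four such terms with Fresnel weights).
double intensity(const std::vector<Vec3>& scatterers, const Vec3& q) {
    return std::norm(formFactor(scatterers, q));
}
```

Evaluating `intensity` over a grid of q-points produces the full scattering pattern; it is this embarrassingly parallel loop over q that the C++/CUDA/MPI code distributes across GPU threads and cluster nodes.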
We have designed and implemented two classes of parallel algorithms to help the beamline scientists and users at the Advanced Light Source achieve real-time analysis of X-ray scattering data. Our new DWBA code for simulating GISAXS patterns has achieved more than a 125x speedup on one GPU card compared to the sequential CPU code. Further parallelization across multiple GPUs using MPI led to an additional 40x speedup on a 42-node GPU cluster. We also developed a new GPU-accelerated inverse modeling code based on the Reverse Monte Carlo method. We have demonstrated over 9x speedup over the previously developed Jacket-based GPU code, significantly reducing the fitting time for morphology prediction.
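For readers unfamiliar with the inverse-modeling side, a Reverse Monte Carlo fit repeatedly perturbs a random model parameter, re-simulates the pattern, and accepts or rejects the move by a Metropolis-style criterion on the fit error. The following is a minimal C++ sketch of that loop under assumed names (`simulate`, `rmcFit`, `chiSquared` are illustrative); in the actual code, the expensive `simulate` step is what runs on the GPU.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Sum-of-squares error between simulated and measured patterns.
double chiSquared(const std::vector<double>& sim, const std::vector<double>& meas) {
    double s = 0.0;
    for (size_t i = 0; i < sim.size(); ++i) {
        double d = sim[i] - meas[i];
        s += d * d;
    }
    return s;
}

// One Reverse Monte Carlo fitting loop: perturb a random parameter,
// re-simulate, and accept the move if the error decreases (or, with
// Metropolis probability exp(-dE/T), even if it increases slightly).
template <typename Simulate>
std::vector<double> rmcFit(std::vector<double> params,
                           const std::vector<double>& measured,
                           Simulate simulate, int steps, double temperature,
                           unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<size_t> pick(0, params.size() - 1);
    std::normal_distribution<double> move(0.0, 0.1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    double err = chiSquared(simulate(params), measured);
    for (int i = 0; i < steps; ++i) {
        size_t j = pick(rng);
        double old = params[j];
        params[j] += move(rng);                       // random trial move
        double trial = chiSquared(simulate(params), measured);
        if (trial <= err ||
            unit(rng) < std::exp((err - trial) / temperature)) {
            err = trial;                              // accept the move
        } else {
            params[j] = old;                          // reject: roll back
        }
    }
    return params;
}
```

With many parameters, nearly every iteration pays the full cost of a forward simulation, which is why accelerating the simulation on the GPU reduces the overall fitting time from weeks to hours.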
In addition to the tremendous runtime reduction, our new codes use memory more efficiently, allowing much larger samples to be simulated at higher resolutions than was previously possible with the old sequential code.
Abhinav Sarje. Large-scale Nanostructure Simulations from X-ray Scattering Data on Graphics Processor Clusters. Lawrence Berkeley National Laboratory Report, 2012.