Tag: linear algebra
A GPU application for high-order compact finite difference scheme
A high-order compact finite difference scheme for the solution of fluid flow problems is implemented to run on a GPU using CUDA.
Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware
This paper explores the effect of these different options on the performance of a routine that evaluates sparse matrix–vector products (SpMV) across three different generations of NVIDIA GPU hardware.
High Precision Integer Multiplication with a GPU
We have improved our prior implementation of Strassen’s algorithm for high performance multiplication of very large integers on a general purpose graphics processor (GPU).
Evaluation of the Energy Performance of Dense Linear Algebra Kernels on Multi-core and Many-Core Processors
The results of this study provide basic insights on the energy scalability of multi- and many-core designs and multi-threaded software as the building blocks of future EXAFLOPS systems.
MAGMA linear algebra library 1.1 is Released
The main focus is the development of a dense linear algebra library for hybrid systems of homogeneous x86-based multicores accelerated with GPUs. MAGMA is designed to be similar to LAPACK in functionality.
SpeedIT 2.0 library: Acceleration of Sparse Linear Algebra
The SpeedIT Tools library provides a set of accelerated solvers for sparse linear systems of equations. Acceleration is achieved with proprietary advanced optimization techniques on a single, reasonably priced NVIDIA Graphics Processing Unit (GPU) that supports CUDA.
CULA Sparse: a GPU Accelerated Sparse Linear Algebra Library
EM Photonics, the maker of CULA Dense, announced today the general availability of CULA Sparse, a GPU-accelerated library of sparse matrix solvers.
MAGMA: LAPACK for GPGPU
MAGMA 1.0 RC2 is now available. This release includes the MAGMA sources! MAGMA 1.0 RC2 is intended for a single CUDA-enabled NVIDIA GPU. It extends version 0.2 by adding support for Fermi GPUs.
An Improved MAGMA GEMM for Fermi GPUs
The authors present an improved matrix-matrix multiplication routine (General Matrix Multiply [GEMM]) in the MAGMA BLAS library that targets NVIDIA Fermi graphics processing units (GPUs) using Compute Unified Device Architecture (CUDA).
BLAS Comparison on FPGA, CPU and GPU
The authors present a comparison of BLAS Level 2 on CPU, FPGA and GPU platforms in terms of performance and energy efficiency. On the FPGA, they developed a custom implementation of Gaxpy (matrix-vector multiplication), which can be easily extended to matrix-matrix multiplication. They introduce a bank-interleaved vector memory and a novel matrix memory that supports two-dimensional burst access.





