Can anyone provide me with a parallel algorithm for calculating the sparse Cholesky factorization? It must be suitable for execution on a GPU. Any answers in CUDA, OpenCL, or even pseudo-code would be much appreciated.
Generally speaking, direct sparse methods are not a great fit for the GPU. The best direct solvers (I am thinking of packages like CHOLMOD, SuperLU and MUMPS here) use strategies that generate dense sub-blocks which can be processed using Level 3 BLAS, but the size and shape of those blocks mean they don’t tend to profit much from a GPU BLAS. That doesn’t mean it can’t be done, just that the performance improvement may not be worth the effort.
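To make "dense sub-blocks processed using Level 3 BLAS" concrete, here is a minimal pure-Python sketch of a blocked, right-looking dense Cholesky factorization. Each block step performs what a BLAS/LAPACK library would do with POTRF (diagonal block), TRSM (panel below it) and SYRK/GEMM (trailing update); on a GPU those would be cuSOLVER/cuBLAS calls. The function names and the block size `nb` are illustrative assumptions, and this is not how CHOLMOD or any other package is actually implemented.

```python
import math


def potrf_unblocked(A, k0, k1):
    """Factor the diagonal block A[k0:k1, k0:k1] in place (lower triangle), like POTRF."""
    for j in range(k0, k1):
        s = A[j][j] - sum(A[j][p] ** 2 for p in range(k0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        A[j][j] = math.sqrt(s)
        for i in range(j + 1, k1):
            A[i][j] = (A[i][j] - sum(A[i][p] * A[j][p] for p in range(k0, j))) / A[j][j]


def blocked_cholesky(A, nb=2):
    """Return lower-triangular L with A = L L^T. Only the lower triangle of A is read."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # 1) POTRF: factor the current diagonal block
        potrf_unblocked(A, k0, k1)
        # 2) TRSM: solve for the panel below it, L_panel = A_panel * L_kk^{-T}
        for i in range(k1, n):
            for j in range(k0, k1):
                s = A[i][j] - sum(A[i][p] * A[j][p] for p in range(k0, j))
                A[i][j] = s / A[j][j]
        # 3) SYRK/GEMM: subtract L_panel L_panel^T from the trailing submatrix
        for i in range(k1, n):
            for j in range(k1, i + 1):
                A[i][j] -= sum(A[i][p] * A[j][p] for p in range(k0, k1))
    # Keep only the lower triangle
    return [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

In a supernodal sparse factorization the same three steps are applied per supernode, with block dimensions dictated by the sparsity pattern; when those blocks are small, each GPU call has little work to amortize its launch overhead, which is the limitation described above.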
Seeing as you are asking about a sparse Cholesky factorization, I assume the matrix is symmetric positive definite. In that case you might consider an iterative solver instead: there are a number of good GPU implementations of Conjugate Gradient and other Krylov subspace methods with simple preconditioners that might be of some use. The Cusp library for CUDA is worth investigating if your problem is amenable to iterative methods, and the ViennaCL library offers something similar if you are looking for OpenCL.
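As a rough illustration of why the iterative route suits a GPU, here is a minimal pure-Python sketch of Conjugate Gradient with a Jacobi (diagonal) preconditioner, one of the "simple preconditioners" mentioned above, on a matrix stored in CSR form. It is not the API of Cusp or ViennaCL; the function names, the relative-residual stopping test and the default `tol`/`max_iter` values are assumptions chosen for the example. Apart from the sparse matrix-vector product, every step is a dot product or an element-wise vector update.

```python
import math


def csr_matvec(indptr, indices, data, x):
    """y = A x for A in CSR format. Rows are independent, so on a GPU
    each row (or each group of rows) could be handled by its own thread or warp."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (CSR). Returns (x, iterations)."""
    n = len(b)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return [0.0] * n, 0
    # Jacobi preconditioner: M^{-1} = inverse of the diagonal of A
    diag = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            if indices[k] == i:
                diag[i] = data[k]
    inv_d = [1.0 / d for d in diag]

    x = [0.0] * n
    r = b[:]                                 # r = b - A x, with x = 0
    z = [inv_d[i] * r[i] for i in range(n)]  # z = M^{-1} r
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it + 1
        z = [inv_d[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    # 5x5 1-D Laplacian (2 on the diagonal, -1 next to it) in CSR form
    indptr = [0, 2, 5, 8, 11, 13]
    indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4]
    data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    x, iters = jacobi_pcg(indptr, indices, data, [0.0, 0.0, 0.0, 0.0, 6.0])
    print(x, iters)
```

The right-hand side in the usage example is A applied to [1, 2, 3, 4, 5], so the returned x should be close to that vector. Because every operation is either a row-parallel product or an element-wise/reduction kernel, this kind of loop maps naturally onto a handful of GPU kernels, which is roughly why iterative solvers tend to fare better on GPUs than sparse direct factorizations.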