TY - RPRT

T1 - Clearer, Simpler and more Efficient LAPACK Routines for Symmetric Positive Definite Band Factorization

AU - Gustavson, Fred G.

AU - Quintana-Orti, Enrique S.

AU - Quintana-Orti, Gregorio

AU - Remon, Alfredo

AU - Wasniewski, Jerzy

PY - 2008

Y1 - 2008

N2 - We describe a minor format change for representing a symmetric band matrix AB using the same array space specified by LAPACK. In LAPACK, band codes operating on the lower part of a symmetric matrix reference matrix element (i, j) as $AB_{1+i-j,j}$. The format change we propose allows LAPACK band codes to reference the (i, j) element as $AB_{i,j}$. Doing this yields lower band codes that use standard matrix terminology, so that they become clearer and hence easier to understand. As a second contribution, we simplify the LAPACK Cholesky band factorization routine pbtrf by reducing from six to three the number of subroutine calls needed during a right-looking block factorization step. Our new routines perform exactly the same number of floating-point arithmetic operations as the current LAPACK routine pbtrf. Almost always they deliver higher performance. The experimental results show that this is especially true on SMP platforms where parallelism is obtained via the use of level-3 multi-threaded BLAS. We only consider the lower triangular case of the factorization here; the upper triangular case is currently under investigation.

M3 - Report

T3 - D T U Compute. Technical Report

BT - Clearer, Simpler and more Efficient LAPACK Routines for Symmetric Positive Definite Band Factorization

PB - Technical University of Denmark, DTU Informatics, Building 321

CY - Lyngby

ER -