\documentclass[11pt]{article} \batchmode \newcommand{\m}[1]{{\bf{#1}}} % for matrices and vectors \newcommand{\tr}{^{\sf T}} % transpose \topmargin 0in \textheight 9in \oddsidemargin 0pt \evensidemargin 0pt \textwidth 6.5in %------------------------------------------------------------------------------ \begin{document} %------------------------------------------------------------------------------ \title{User Guide for KLU and BTF} \author{ Timothy A. Davis\thanks{ DrTimothyAldenDavis@gmail.com, http://www.suitesparse.com. This work was supported by Sandia National Labs, and the National Science Foundation. Portions of the work were done while on sabbatical at Stanford University and Lawrence Berkeley National Laboratory (with funding from Stanford University and the SciDAC program). } \and Eka Palamadai Natarajan} \input{klu_version.tex} \maketitle %------------------------------------------------------------------------------ \begin{abstract} KLU is a set of routines for solving sparse linear systems of equations. It is particularly well-suited to matrices arising in SPICE-like circuit simulation applications. It relies on a permutation to block triangular form (BTF), several methods for finding a fill-reducing ordering (variants of approximate minimum degree, and nested dissection), and a sparse left-looking LU factorization method to factorize each block. A MATLAB interface is included. KLU appears as Collected Algorithm 907 of the ACM \cite{DavisNatarajan10}. \end{abstract} %------------------------------------------------------------------------------ \newpage \tableofcontents \newpage %------------------------------------------------------------------------------ \section{License and Copyright} %------------------------------------------------------------------------------ KLU, Copyright\copyright 2004-2023 University of Florida. All Rights Reserved. KLU is available under alternate licenses; contact T. Davis for details. 
{\bf KLU License:} see KLU/Doc/License.txt for the license. {\bf Availability:} {\tt http://www.suitesparse.com} {\bf Acknowledgments:} This work was supported by Sandia National Laboratories (Mike Heroux) and the National Science Foundation under grants 0203270 and 0620286. %------------------------------------------------------------------------------ \newpage \section{Overview} %------------------------------------------------------------------------------ KLU is a set of routines for solving sparse linear systems of equations. It first permutes the matrix into upper block triangular form, via the BTF package. This is done by first finding a permutation for a zero-free diagonal (a maximum transversal) \cite{Duff81}. If there is no such permutation, then the matrix is structurally rank-deficient, and is numerically singular. Next, Tarjan's method \cite{Duff78a,Tarjan72} is used to find the strongly-connected components of the graph. The block triangular form is essentially unique; any method will lead to the same number and sizes of blocks, although the ordering of the blocks may vary (consider a diagonal matrix, for example). Assuming the matrix has full structural rank, the permuted matrix has the following form: \[ PAQ = \left[ \begin{array}{ccc} A_{11} & \cdots & A_{1k} \\ & \ddots & \vdots \\ & & A_{kk} \\ \end{array} \right], \] where each diagonal block is square with a zero-free diagonal. Next, each diagonal block is factorized with a sparse left-looking method \cite{GilbertPeierls88}. The kernel of this factorization method is an efficient method for solving $Lx=b$ when $L$, $x$, and $b$ are all sparse. This kernel is used to compute each column of $L$ and $U$, one column at a time. The total work performed by this method is always proportional to the number of floating-point operations, something that is not true of any other sparse LU factorization method. Prior to factorizing each diagonal block, the blocks are ordered to reduce fill-in. 
By default, the symmetric approximate minimum degree (AMD) ordering is used on $A_{ii}+A_{ii}^T$ \cite{AmestoyDavisDuff96,AmestoyDavisDuff03}. Another ordering option is to find a column ordering via COLAMD \cite{DavisGilbertLarimoreNg00_algo,DavisGilbertLarimoreNg00}. Alternatively, a permutation can be provided by the user, or a pointer to a user-provided ordering function can be passed, which is then used to order each block. Only the diagonal blocks need to be factorized. Consider a linear system where the matrix is permuted into three blocks, for example:
\[
PAQ =
\left[
\begin{array}{ccc}
A_{11} & A_{12} & A_{13} \\
       & A_{22} & A_{23} \\
       &        & A_{33} \\
\end{array}
\right]
\left[
\begin{array}{c}
x_{1} \\
x_{2} \\
x_{3} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
b_{1} \\
b_{2} \\
b_{3} \\
\end{array}
\right].
\]
The non-singular system $A_{33} x_{3} = b_{3}$ can first be solved for $x_{3}$. After a block back substitution, the resulting system becomes
\[
\left[
\begin{array}{cc}
A_{11} & A_{12} \\
       & A_{22} \\
\end{array}
\right]
\left[
\begin{array}{c}
x_{1} \\
x_{2} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
b_{1} - A_{13} x_{3}\\
b_{2} - A_{23} x_{3}\\
\end{array}
\right]
=
\left[
\begin{array}{c}
b'_{1} \\
b'_{2} \\
\end{array}
\right]
\]
and the $A_{22} x_{2} = b'_{2}$ system can be solved for $x_{2}$. The primary advantage of this method is that no fill-in occurs in the off-diagonal blocks ($A_{12}$, $A_{13}$, and $A_{23}$). This is particularly critical for sparse linear systems arising in SPICE-like circuit simulation \cite{Kundert86,KundertSangiovanniVincentelli85,NagelPederson,Quarles:M89/42}. Circuit matrices are typically permutable into block triangular form, with many singletons (1-by-1 blocks). They also often have a handful of rows and columns with many nonzero entries, due to voltage and current sources.
These rows and columns are pushed into the upper block triangular form, and related to the singleton blocks (for example, $A_{33}$ in the above system is 1-by-1, and the column in $A_{13}$ and $A_{23}$ has many nonzero entries). Since these nearly-dense rows and columns do not appear in the LU factorization of the diagonal blocks, they cause no fill-in.

The structural rank of a matrix is based solely on the pattern of its entries, not their numerical values. The structural rank of any matrix is an upper bound on its numerical rank; with random numerical values, the two ranks are equal with probability one. The maximum transversal algorithm in the BTF package is useful in determining if a matrix is structurally rank deficient, and if so, it reveals a (non-unique) set of rows and columns that contribute to that rank deficiency. This is useful in determining what parts of a circuit are poorly formulated (such as dangling components).

When ordered and factorized with KLU, very little fill-in occurs in the resulting LU factors, which means that there is little scope for use of the BLAS \cite{ACM679a}. Sparse LU factorization methods that use the BLAS (such as SuperLU \cite{SuperLU99} and UMFPACK \cite{Davis03_algo,Davis03}) are slower than KLU when applied to sparse matrices arising in circuit simulation. Both KLU and SuperLU are based on Gilbert and Peierls' left-looking method \cite{GilbertPeierls88}. SuperLU uses supernodes, but KLU does not; thus the name {\em KLU} refers to a ``Clark Kent'' LU factorization algorithm (what SuperLU was before it became Super). For details of the permutation to block triangular form, left-looking sparse LU factorization, and approximate minimum degree, refer to \cite{Davis06book}. Concise versions of these methods can be found in the CSparse package. KLU is also the topic of a Master's thesis by Palamadai Natarajan \cite{Palamadai05}; a copy of the thesis can be found in the {\tt KLU/Doc} directory.
It includes a description of an earlier version of KLU; some of the function names and parameter lists have changed in this version. The descriptions of the methods used still apply to the current version of KLU, however. KLU appears as {\em Algorithm 907: KLU, a direct sparse solver for circuit simulation problems}, ACM Transactions on Mathematical Software, vol 37, no 3, 2010.

%------------------------------------------------------------------------------
\section{Availability}
%------------------------------------------------------------------------------

KLU and its required ordering packages (BTF, COLAMD, AMD, and SuiteSparse\_config) are available at \newline {\tt http://www.suitesparse.com}. In addition, KLU can make use of any user-provided ordering function. One such function is included, which provides KLU with an interface to the ordering methods used in CHOLMOD \cite{ChenDavisHagerRajamanickam06}, such as METIS, a nested dissection method \cite{KarypisKumar98e}. After permutation to block triangular form, circuit matrices have very good node separators, and are thus excellent candidates for nested dissection. The METIS ordering takes much more time to compute than the AMD ordering, but if the ordering is reused many times (typical in circuit simulation) the better-quality ordering can pay off in lower total simulation time.

To use KLU, you must obtain the KLU, BTF, SuiteSparse\_config, AMD, and COLAMD packages in the SuiteSparse suite of sparse matrix libraries. See {\tt http://www.suitesparse.com} for each of these packages. They are also all contained within the single {\tt SuiteSparse.zip} or {\tt SuiteSparse.tar.gz} distribution.
%------------------------------------------------------------------------------
\section{Using KLU and BTF in MATLAB}
%------------------------------------------------------------------------------

KLU has a single MATLAB interface which provides several options for factorizing a matrix and/or using the factors to solve a linear system. The following is a synopsis of its use. For more details, type {\tt help klu} in MATLAB.

{\footnotesize
\begin{verbatim}
LU = klu (A)           factorizes R\A(p,q) into L*U+F, returning a struct
x = klu (A,'\',b)      x = A\b
x = klu (b,'/',A)      x = b/A
x = klu (LU,'\',b)     x = A\b, where LU = klu(A)
x = klu (b,'/',LU)     x = b/A, where LU = klu(A)
\end{verbatim}
}

With a single input, {\tt klu(A)} returns a MATLAB struct containing the LU factors. The factorization is in the form \verb'L*U + F = R \ A(p,q)' where {\tt L*U} is the LU factorization of just the diagonal blocks of the block triangular form, {\tt F} is a sparse matrix containing the entries in the off-diagonal blocks, {\tt R} is a diagonal matrix containing the row scale factors, and {\tt p} and {\tt q} are permutation vectors. The {\tt LU} struct also contains a vector {\tt r} which describes the block boundaries (the same as the third output parameter of {\tt dmperm}). The {\tt k}th block consists of rows and columns {\tt r(k)} to {\tt r(k+1)-1} in the permuted matrix {\tt A(p,q)} and the factors {\tt L} and {\tt U}.

An optional final input argument ({\tt klu(A,opts)} for example) provides a way of modifying KLU's user-definable parameters, including a partial pivoting tolerance and ordering options. A second output argument ({\tt [LU,info] = klu ( ... )}) provides statistics on the factorization.

The BTF package includes three user-callable MATLAB functions which replicate most of the features of the MATLAB built-in {\tt dmperm} function, and provide an additional option which can significantly limit the worst-case time taken by {\tt dmperm}.
For more details, type {\tt help btf}, {\tt help maxtrans}, and {\tt help strongcomp} in MATLAB. Additional information about how these functions work can be found in \cite{Davis06book}.

{\footnotesize
\begin{verbatim}
[p,q,r] = btf (A)       similar to [p,q,r] = dmperm (A)
q = maxtrans (A)        similar to q = dmperm (A')
[p,r] = strongcomp (A)  similar to [p,q,r] = dmperm (A + speye(n))
\end{verbatim}
}

Both {\tt btf} and {\tt maxtrans} accept a second, optional input, {\tt maxwork}, which limits the total work performed in the maximum transversal to {\tt maxwork * nnz(A)}. The worst-case time taken by the algorithm is proportional to {\tt n * nnz(A)}, where the matrix {\tt A} is {\tt n}-by-{\tt n}, but this worst case is rarely reached in practice.

To use the KLU and BTF functions in MATLAB, you must first compile and install them. In the {\tt BTF/MATLAB} directory, type {\tt btf\_install}, and then type {\tt klu\_install} in the {\tt KLU/MATLAB} directory. Alternatively, if you have the entire SuiteSparse, simply run the {\tt SuiteSparse\_install} function in the {\tt SuiteSparse} directory. After running the installation scripts, type {\tt pathtool} and save your path for future MATLAB sessions. If you cannot save your path because of file permissions, edit your {\tt startup.m} by adding {\tt addpath} commands (type {\tt doc startup} and {\tt doc addpath} for more information).

%------------------------------------------------------------------------------
\section{Using KLU and BTF in a C program}
\label{Cversion}
%------------------------------------------------------------------------------

KLU and BTF include the following C-callable functions. Each function is available in two or four versions: with \verb'int32_t' or \verb'int64_t' integers, and (for functions that deal with numerical values) with {\tt double} or complex {\tt double} values.
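Before each function is described in detail, the following minimal sketch shows a complete real \verb'int32_t' solve sequence, in the style of the example programs that accompany KLU. The 5-by-5 matrix and right-hand side are illustrative values only, and all error checking (testing for NULL returns and inspecting {\tt Common.status}) is omitted for brevity.

```c
#include <stdio.h>
#include <stdint.h>
#include "klu.h"

int main (void)
{
    /* illustrative 5-by-5 system A*x = b, in compressed-column form */
    int32_t n = 5 ;
    int32_t Ap [ ] = {0, 2, 5, 9, 10, 12} ;
    int32_t Ai [ ] = {0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4} ;
    double  Ax [ ] = {2., 3., 3., -1., 4., 4., -3., 1., 2., 2., 6., 1.} ;
    double  b  [ ] = {8., 45., -3., 3., 19.} ;
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;

    klu_defaults (&Common) ;                    /* must be called first */
    Symbolic = klu_analyze (n, Ap, Ai, &Common) ;          /* order and analyze */
    Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ; /* numeric LU */
    klu_solve (Symbolic, Numeric, 5, 1, b, &Common) ;      /* b overwritten with x */
    for (int i = 0 ; i < n ; i++)
    {
        printf ("x [%d] = %g\n", i, b [i]) ;    /* solution is x = (1,2,3,4,5) */
    }
    klu_free_numeric (&Numeric, &Common) ;
    klu_free_symbolic (&Symbolic, &Common) ;
    return (0) ;
}
```

Compile and link this program with the KLU, BTF, AMD, COLAMD, and SuiteSparse\_config libraries.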
The usage of real and complex, and \verb'int32_t' and \verb'int64_t', must not be mixed, except that some functions can be used for both real and complex cases.

%------------------------------------------------------------------------------
\subsection{KLU Common object}
%------------------------------------------------------------------------------

The {\tt klu\_common} object ({\tt klu\_l\_common} for the \verb'int64_t' version) contains user-definable parameters and statistics returned from KLU functions. This object appears in every KLU function as the last parameter. Details are given in the {\tt klu.h} include file, which also appears below in Section~\ref{klu_include}. Primary parameters and statistics are summarized below. The defaults are chosen specifically for circuit simulation matrices.

\begin{itemize}
\item {\tt tol}: partial pivoting tolerance. If the diagonal entry has a magnitude greater than or equal to {\tt tol} times the largest magnitude of entries in the pivot column, then the diagonal entry is chosen. Default value: 0.001.
\item {\tt ordering}: which fill-reducing ordering to use: 0 for AMD, 1 for COLAMD, 2 for a user-provided permutation {\tt P} and {\tt Q} (or a natural ordering if {\tt P} and {\tt Q} are NULL), or 3 for the {\tt user\_order} function. Default: 0 (AMD).
\item {\tt scale}: whether or not the matrix should be scaled. If {\tt scale < 0}, then no scaling is performed and the input matrix is not checked for errors. If {\tt scale >= 0}, the input matrix is checked for errors. If {\tt scale=0}, then no scaling is performed. If {\tt scale=1}, then each row of {\tt A} is divided by the sum of the absolute values in that row. If {\tt scale=2}, then each row of {\tt A} is divided by the maximum absolute value in that row. Default: 2.
\item {\tt btf}: if nonzero, then BTF is used to permute the input matrix into block upper triangular form. This step is skipped if {\tt Common.btf} is zero. Default: 1.
\item {\tt maxwork}: sets an upper limit on the amount of work performed in {\tt btf\_maxtrans} to \newline {\tt maxwork*nnz(A)}. If the limit is reached, a partial zero-free diagonal might be found. This has no effect on whether or not the matrix can be factorized, since the matrix can be factorized with no BTF pre-ordering at all. This option provides a tradeoff between the effectiveness of the BTF ordering and the cost to compute it. A partial result can lead to fewer, and larger, blocks in the BTF form, and thus more work to factorize the matrix. No limit is enforced if {\tt maxwork <= 0}. Default: 0.
\item {\tt user\_order}: a pointer to a function that can be provided by the application that uses KLU, to redefine the fill-reducing ordering used by KLU for each diagonal block. The \verb'int32_t' and \verb'int64_t' prototypes must be as follows:

{\footnotesize
\begin{verbatim}
int32_t my_ordering_function (int32_t n, int32_t Ap [ ], int32_t Ai [ ],
    int32_t Perm [ ], klu_common *Common) ;

int64_t my_long_ordering_function (int64_t n, int64_t Ap [ ], int64_t Ai [ ],
    int64_t Perm [ ], klu_l_common *Common) ;
\end{verbatim}
}

The function should return 0 if an error occurred, and either -1 or a positive (nonzero) value if no error occurred. If greater than zero, then the return value is interpreted by KLU as an estimate of the number of nonzeros in $L$ or $U$ (whichever is greater), when the permuted matrix is factorized. Only an estimate is possible, since partial pivoting with row interchanges is performed during numerical factorization. The input matrix is provided to the function in the parameters {\tt n}, {\tt Ap}, and {\tt Ai}, in compressed-column form. The matrix {\tt A} is {\tt n}-by-{\tt n}. The {\tt Ap} array is of size {\tt n+1}; the {\tt j}th column of {\tt A} has row indices {\tt Ai[Ap[j] ... Ap[j+1]-1]}. The {\tt Ai} array is of size {\tt Ap[n]}. The first column pointer {\tt Ap[0]} is zero.
The row indices might not appear sorted in each column, but no duplicates will appear. The output permutation is to be passed back in the {\tt Perm} array, where {\tt Perm[k]=j} means that row and column {\tt j} of {\tt A} will appear as the {\tt k}th row and column of the permuted matrix factorized by KLU. The {\tt Perm} array is already allocated when it is passed to the user function. The user function may use, and optionally modify, the contents of the {\tt klu\_common Common} object. In particular, prior to calling KLU, the user application can set both {\tt Common.user\_order} and {\tt Common.user\_data}. The latter is a {\tt void *} pointer that KLU does not use, except to pass to the user ordering function pointed to by {\tt Common.user\_order}. This is a mechanism for passing additional arguments to the user function. An example user function is provided in the {\tt KLU/User} directory, which provides an interface to the ordering method in CHOLMOD. \end{itemize} %------------------------------------------------------------------------------ \subsection{KLU Symbolic object} %------------------------------------------------------------------------------ KLU performs its sparse LU factorization in two steps. The first is purely symbolic, and does not depend on the numerical values. This analysis returns a {\tt klu\_symbolic} object ({\tt klu\_l\_symbolic} in the \verb'int64_t' version). The {\tt Symbolic} object contains a pre-ordering which combines the block triangular form with the fill-reducing ordering, and an estimate of the number of nonzeros in the factors of each block. Its size is thus modest, only proportional to {\tt n}, the dimension of {\tt A}. It can be reused multiple times for the factorization of a sequence of matrices with identical nonzero pattern. Note: a {\em nonzero} in this sense is an entry present in the data structure of {\tt A}; such entries may in fact be numerically zero. 
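The reuse of the {\tt Symbolic} object described above can be sketched as follows. This fragment is a sketch, not a complete program: {\tt get\_next\_values} is a hypothetical application routine that overwrites the numerical values in place (leaving the pattern unchanged), and all error checking is omitted.

```c
#include <stdint.h>
#include "klu.h"

/* Hypothetical application routine: loads the next set of numerical
   values into Ax, keeping the same nonzero pattern. */
extern void get_next_values (double *Ax) ;

/* Analyze once, then factorize and solve a sequence of matrices that
   share a single nonzero pattern (typical in circuit simulation). */
void solve_sequence (int32_t n, int32_t *Ap, int32_t *Ai, double *Ax,
    double *b, int nmat)
{
    klu_symbolic *Symbolic ;
    klu_numeric *Numeric ;
    klu_common Common ;
    klu_defaults (&Common) ;
    /* symbolic analysis depends only on the pattern: done once */
    Symbolic = klu_analyze (n, Ap, Ai, &Common) ;
    for (int k = 0 ; k < nmat ; k++)
    {
        get_next_values (Ax) ;          /* new values, same pattern */
        Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ;
        /* b holds the current right-hand side; it is overwritten
           with the solution, so reload it for each new system */
        klu_solve (Symbolic, Numeric, n, 1, b, &Common) ;
        klu_free_numeric (&Numeric, &Common) ;
    }
    klu_free_symbolic (&Symbolic, &Common) ;
}
```

When the values change only modestly from one matrix to the next, the calls to {\tt klu\_factor} and {\tt klu\_free\_numeric} inside the loop can be replaced by {\tt klu\_refactor}, which reuses the {\tt Numeric} object and its pivot order.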
%------------------------------------------------------------------------------
\subsection{KLU Numeric object}
%------------------------------------------------------------------------------

The {\tt Numeric} object contains the numeric sparse LU factorization, including the final pivot permutations. To solve a linear system, both the {\tt Symbolic} and {\tt Numeric} objects are required.

%------------------------------------------------------------------------------
\subsection{A sparse matrix in KLU}
%------------------------------------------------------------------------------

The input matrix provided to KLU is in sparse compressed-column form, the same data structure used internally in MATLAB, except that here the row indices may appear in any order within each column, and explicit zero entries may appear in the data structure. The same data structure is used in CSparse, which is fully documented in \cite{Davis06book}. If you wish to use a simpler input data structure, consider creating a triplet matrix in CSparse (or CXSparse if you use the \verb'int64_t' and/or complex versions of KLU), and then convert it into a sparse compressed-column data structure, using the CSparse {\tt cs\_compress} and {\tt cs\_dupl} functions. A similar function, {\tt cholmod\_triplet\_to\_sparse}, is available in CHOLMOD. The matrix is described with four parameters:

\begin{itemize}
\item {\tt n}: an integer scalar. The matrix is {\tt n}-by-{\tt n}. Note that KLU only operates on square matrices.
\item {\tt Ap}: an integer array of size {\tt n+1}. The first entry is {\tt Ap[0]=0}, and the last entry {\tt nz=Ap[n]} is equal to the number of entries in the matrix.
\item {\tt Ai}: an integer array of size {\tt nz = Ap[n]}. The row indices of entries in column {\tt j} of {\tt A} are located in {\tt Ai [Ap [j] ... Ap [j+1]-1]}. The matrix is zero-based; row and column indices are in the range 0 to {\tt n-1}.
\item {\tt Ax}: a {\tt double} array of size {\tt nz} for the real case, or {\tt 2*nz} for the complex case. For the complex case, the real and imaginary parts are interleaved, compatible with Fortran and the ANSI C99 Complex data type. KLU does not rely on the ANSI C99 data type, however, for portability reasons. The numerical values in column {\tt j} of a real matrix are located in {\tt Ax [Ap [j] ... Ap [j+1]-1]}. For a complex matrix, they appear in {\tt Ax [2*Ap [j] ... 2*Ap [j+1]-1]}, as real/imaginary pairs (the real part appears first, followed by the imaginary part). \end{itemize} %------------------------------------------------------------------------------- \subsection{{\tt klu\_defaults}: set default parameters} %------------------------------------------------------------------------------- This function sets the default parameters for KLU and clears the statistics. It may be used for either the real or complex cases. A value of 0 is returned if an error occurs, 1 otherwise. This function {\bf must} be called before any other KLU function can be called. {\footnotesize \begin{verbatim} #include "klu.h" int ok ; klu_common Common ; ok = klu_defaults (&Common) ; /* real or complex */ #include "klu.h" klu_l_common Common ; int ok = klu_l_defaults (&Common) ; /* real or complex */ \end{verbatim} } %------------------------------------------------------------------------------- \subsection{{\tt klu\_analyze}: order and analyze a matrix} %------------------------------------------------------------------------------- The following usage returns a {\tt Symbolic} object that contains the fill-reducing ordering needed to factorize the matrix {\tt A}. A NULL pointer is returned if a failure occurs. The error status for this function, and all others, is returned in {\tt Common.status}. These functions may be used for both real and complex cases. 
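To make the compressed-column input format described above concrete, the following self-contained fragment builds the four parameters for a small real matrix and checks the basic invariants. The matrix values are illustrative only; no KLU functions are called here.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 3-by-3 matrix in compressed-column form:
       [ 4 0 1 ]
   A = [ 0 3 0 ]
       [ 2 0 5 ]
*/
#define N 3
static int32_t Ap [N+1] = {0, 2, 3, 5} ;        /* column pointers; Ap[0] = 0 */
static int32_t Ai [ ] = {0, 2, 1, 0, 2} ;       /* row indices, column by column */
static double  Ax [ ] = {4., 2., 3., 1., 5.} ;  /* values, in the same order as Ai */

/* check the invariants described above; returns nz = Ap [N] */
static int32_t check_csc (void)
{
    assert (Ap [0] == 0) ;                      /* first column pointer is zero */
    for (int j = 0 ; j < N ; j++)
    {
        assert (Ap [j] <= Ap [j+1]) ;           /* column pointers are monotone */
        for (int32_t p = Ap [j] ; p < Ap [j+1] ; p++)
        {
            assert (Ai [p] >= 0 && Ai [p] < N) ; /* zero-based row indices */
        }
    }
    return (Ap [N]) ;                           /* number of entries, here 5 */
}
```

In a real application these arrays would typically be built directly, or converted from triplet form as described in the previous subsection.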
The AMD ordering is used if {\tt Common.ordering = 0}, COLAMD is used if it is 1, the natural ordering is used if it is 2, and the user-provided {\tt Common.user\_order} function is used if it is 3.

{\footnotesize
\begin{verbatim}
#include "klu.h"
int32_t n, Ap [n+1], Ai [nz] ;
klu_symbolic *Symbolic ;
klu_common Common ;
Symbolic = klu_analyze (n, Ap, Ai, &Common) ;    /* real or complex */

#include "klu.h"
int64_t n, Ap [n+1], Ai [nz] ;
klu_l_symbolic *Symbolic ;
klu_l_common Common ;
Symbolic = klu_l_analyze (n, Ap, Ai, &Common) ;  /* real or complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_analyze\_given}: order and analyze a matrix}
%-------------------------------------------------------------------------------

In this routine, the fill-reducing ordering is provided by the user ({\tt Common.ordering} is ignored). Instead, the row permutation {\tt P} and column permutation {\tt Q} are used. These are integer arrays of size {\tt n}. If NULL, a natural ordering is used (so to provide just a column ordering, pass {\tt Q} as non-NULL and {\tt P} as NULL). A NULL pointer is returned if an error occurs. These functions may be used for both real and complex cases.
{\footnotesize \begin{verbatim} #include "klu.h" int32_t n, Ap [n+1], Ai [nz], P [n], Q [n] ; klu_symbolic *Symbolic ; klu_common Common ; Symbolic = klu_analyze_given (n, Ap, Ai, P, Q, &Common) ; /* real or complex */ #include "klu.h" int64_t n, Ap [n+1], Ai [nz], P [n], Q [n] ; klu_l_symbolic *Symbolic ; klu_l_common Common ; Symbolic = klu_l_analyze_given (n, Ap, Ai, P, Q, &Common) ; /* real or complex */ \end{verbatim} } %------------------------------------------------------------------------------- \subsection{{\tt klu\_factor}: numerical factorization} %------------------------------------------------------------------------------- The {\tt klu\_factor} function factorizes a matrix, using a sparse left-looking method with threshold partial pivoting. The inputs {\tt Ap} and {\tt Ai} must be unchanged from the previous call to {\tt klu\_analyze} that created the {\tt Symbolic} object. A NULL pointer is returned if an error occurs. {\footnotesize \begin{verbatim} #include "klu.h" int32_t Ap [n+1], Ai [nz] ; double Ax [nz], Az [2*nz] ; klu_symbolic *Symbolic ; klu_numeric *Numeric ; klu_common Common ; Numeric = klu_factor (Ap, Ai, Ax, Symbolic, &Common) ; /* real */ Numeric = klu_z_factor (Ap, Ai, Az, Symbolic, &Common) ; /* complex */ #include "klu.h" int64_t Ap [n+1], Ai [nz] ; double Ax [nz], Az [2*nz] ; klu_l_symbolic *Symbolic ; klu_l_numeric *Numeric ; klu_l_common Common ; Numeric = klu_l_factor (Ap, Ai, Ax, Symbolic, &Common) ; /* real */ Numeric = klu_zl_factor (Ap, Ai, Az, Symbolic, &Common) ; /* complex */ \end{verbatim} } %------------------------------------------------------------------------------- \subsection{{\tt klu\_solve}: solve a linear system} %------------------------------------------------------------------------------- Solves the linear system $Ax=b$, using the {\tt Symbolic} and {\tt Numeric} objects. The right-hand side {\tt B} is overwritten with the solution on output. 
The array {\tt B} is stored in column major order, with a leading dimension of {\tt ldim}, and {\tt nrhs} columns. Thus, the real entry $b_{ij}$ is stored in {\tt B [i+j*ldim]}, where {\tt ldim >= n} must hold. A complex entry $b_{ij}$ is stored in {\tt B [2*(i+j*ldim)]} and {\tt B [2*(i+j*ldim)+1]} (for the real and imaginary parts, respectively). Returns 1 if successful, 0 if an error occurs.

{\footnotesize
\begin{verbatim}
#include "klu.h"
int32_t ldim, nrhs ;
int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;    /* real */
ok = klu_z_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ; /* complex */

#include "klu.h"
int64_t ldim, nrhs ;
int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_solve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;   /* real */
ok = klu_zl_solve (Symbolic, Numeric, ldim, nrhs, Bz, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_tsolve}: solve a transposed linear system}
%-------------------------------------------------------------------------------

Solves the linear system $A^Tx=b$ or $A^Hx=b$. The {\tt conj\_solve} input (present only in the complex versions) is 0 for $A^Tx=b$, or nonzero for $A^Hx=b$. Otherwise, the function is identical to {\tt klu\_solve}.
{\footnotesize
\begin{verbatim}
#include "klu.h"
int32_t ldim, nrhs ;
int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;                /* real */
ok = klu_z_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ; /* complex */

#include "klu.h"
int64_t ldim, nrhs ;
int ok ;
double B [ldim*nrhs], Bz [2*ldim*nrhs] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_tsolve (Symbolic, Numeric, ldim, nrhs, B, &Common) ;               /* real */
ok = klu_zl_tsolve (Symbolic, Numeric, ldim, nrhs, Bz, conj_solve, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_refactor}: numerical refactorization}
%-------------------------------------------------------------------------------

The {\tt klu\_refactor} function takes as input the {\tt Numeric} object created by {\tt klu\_factor} (or as modified by a previous call to {\tt klu\_refactor}). It factorizes a new matrix with the same nonzero pattern as that given to the call to {\tt klu\_factor} which created it. The same pivot order is used. Since this can lead to numerical instability, the use of {\tt klu\_rcond}, {\tt klu\_rgrowth}, or {\tt klu\_condest} is recommended to check the accuracy of the resulting factorization. The inputs {\tt Ap} and {\tt Ai} must be unmodified since the call to {\tt klu\_factor} that first created the {\tt Numeric} object. This function is much faster than {\tt klu\_factor}, and requires no dynamic memory allocation.
{\footnotesize \begin{verbatim} #include "klu.h" int ok ; int32_t Ap [n+1], Ai [nz] ; double Ax [nz], Az [2*nz] ; klu_symbolic *Symbolic ; klu_numeric *Numeric ; klu_common Common ; ok = klu_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ; /* real */ ok = klu_z_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */ #include "klu.h" int ok ; int64_t Ap [n+1], Ai [nz] ; double Ax [nz], Az [2*nz] ; klu_l_symbolic *Symbolic ; klu_l_numeric *Numeric ; klu_l_common Common ; ok = klu_l_refactor (Ap, Ai, Ax, Symbolic, Numeric, &Common) ; /* real */ ok = klu_zl_refactor (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */ \end{verbatim} } %------------------------------------------------------------------------------- \subsection{{\tt klu\_free\_symbolic}: destroy the {\tt Symbolic} object} %------------------------------------------------------------------------------- It is the user's responsibility to destroy the {\tt Symbolic} object when it is no longer needed, or else a memory leak will occur. It is safe to pass a NULL {\tt Symbolic} pointer. These functions may be used for both real and complex cases. {\footnotesize \begin{verbatim} #include "klu.h" klu_symbolic *Symbolic ; klu_common Common ; klu_free_symbolic (&Symbolic, &Common) ; /* real or complex */ #include "klu.h" klu_l_symbolic *Symbolic ; klu_l_common Common ; klu_l_free_symbolic (&Symbolic, &Common) ; /* real or complex */ \end{verbatim} } %------------------------------------------------------------------------------- \subsection{{\tt klu\_free\_numeric}: destroy the {\tt Numeric} object} %------------------------------------------------------------------------------- It is the user's responsibility to destroy the {\tt Numeric} object when it is no longer needed, or else a memory leak will occur. It is safe to pass a NULL {\tt Numeric} pointer. 
{\footnotesize
\begin{verbatim}
#include "klu.h"
klu_numeric *Numeric ;
klu_common Common ;
klu_free_numeric (&Numeric, &Common) ;    /* real */
klu_z_free_numeric (&Numeric, &Common) ;  /* complex */

#include "klu.h"
klu_l_numeric *Numeric ;
klu_l_common Common ;
klu_l_free_numeric (&Numeric, &Common) ;  /* real */
klu_zl_free_numeric (&Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_sort}: sort the columns of L and U}
%-------------------------------------------------------------------------------

The {\tt klu\_factor} function creates a {\tt Numeric} object with factors {\tt L} and {\tt U} stored in a compressed-column form (not the same data structure as {\tt A}, but similar). The columns typically contain lists of row indices in unsorted order. This function sorts these indices, for two purposes: (1) to return {\tt L} and {\tt U} to MATLAB, which expects its sparse matrices to have sorted columns, and (2) to slightly improve the performance of subsequent calls to {\tt klu\_solve} and {\tt klu\_tsolve}. Except within a MATLAB mexFunction (see {\tt KLU/MATLAB/klu\_mex.c}), the use of this function is optional.
{\footnotesize
\begin{verbatim}
#include "klu.h"
int ok ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_sort (Symbolic, Numeric, &Common) ;     /* real */
ok = klu_z_sort (Symbolic, Numeric, &Common) ;   /* complex */

#include "klu.h"
int ok ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_sort (Symbolic, Numeric, &Common) ;   /* real */
ok = klu_zl_sort (Symbolic, Numeric, &Common) ;  /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_flops}: determine the flop count}
%-------------------------------------------------------------------------------

This function determines the number of floating-point operations performed when
the matrix was factorized by {\tt klu\_factor} or {\tt klu\_refactor}.  The
result is returned in {\tt Common.flops}.

{\footnotesize
\begin{verbatim}
#include "klu.h"
int ok ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_flops (Symbolic, Numeric, &Common) ;    /* real */
ok = klu_z_flops (Symbolic, Numeric, &Common) ;  /* complex */

#include "klu.h"
int ok ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_flops (Symbolic, Numeric, &Common) ;  /* real */
ok = klu_zl_flops (Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_rgrowth}: determine the pivot growth}
%-------------------------------------------------------------------------------

This function computes the reciprocal pivot growth,
$\mbox{\em rgrowth} = \min_j (( \max_i |c_{ij}| ) / ( \max_i |u_{ij}| ))$,
where $c_{ij}$ is a scaled entry in a diagonal block of the block triangular
form.  In MATLAB notation:
\begin{verbatim}
rgrowth = min (max (abs (R\A(p,q) - F)) ./ max (abs (U)))
\end{verbatim}
where the factorization is \verb'L*U + F = R \ A(p,q)'.
This function returns 0 if an error occurred, 1 otherwise.  If {\tt rgrowth} is
very small, an inaccurate factorization may have been performed.  The inputs
{\tt Ap}, {\tt Ai}, and {\tt Ax} ({\tt Az} in the complex case) must be
unchanged since the last call to {\tt klu\_factor} or {\tt klu\_refactor}.  The
result is returned in {\tt Common.rgrowth}.

{\footnotesize
\begin{verbatim}
#include "klu.h"
int ok ;
int32_t Ap [n+1], Ai [nz] ;
double Ax [nz], Az [2*nz] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;    /* real */
ok = klu_z_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ;  /* complex */

#include "klu.h"
int ok ;
int64_t Ap [n+1], Ai [nz] ;
double Ax [nz], Az [2*nz] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_rgrowth (Ap, Ai, Ax, Symbolic, Numeric, &Common) ;  /* real */
ok = klu_zl_rgrowth (Ap, Ai, Az, Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_condest}: accurate condition number estimation}
%-------------------------------------------------------------------------------

This function is essentially the same as the MATLAB {\tt condest} function.  It
computes an estimate of the 1-norm condition number, using Hager's method
\cite{Hager84} and the generalization by Higham and Tisseur
\cite{HighamTisseur00}.  The inputs {\tt Ap} and {\tt Ax} ({\tt Az} in the
complex case) must be unchanged since the last call to {\tt klu\_factor} or
{\tt klu\_refactor}.  The result is returned in {\tt Common.condest}.
{\footnotesize
\begin{verbatim}
#include "klu.h"
int ok ;
int32_t Ap [n+1] ;
double Ax [nz], Az [2*nz] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_condest (Ap, Ax, Symbolic, Numeric, &Common) ;    /* real */
ok = klu_z_condest (Ap, Az, Symbolic, Numeric, &Common) ;  /* complex */

#include "klu.h"
int ok ;
int64_t Ap [n+1] ;
double Ax [nz], Az [2*nz] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_condest (Ap, Ax, Symbolic, Numeric, &Common) ;  /* real */
ok = klu_zl_condest (Ap, Az, Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_rcond}: cheap reciprocal condition number estimation}
%-------------------------------------------------------------------------------

This function returns the smallest diagonal entry of {\tt U} divided by the
largest, which is a very crude estimate of the reciprocal of the condition
number of the matrix {\tt A}.  It is very cheap to compute, however.  In MATLAB
notation, {\tt rcond = min(abs(diag(U))) / max(abs(diag(U)))}.  If the matrix
is singular, {\tt rcond} will be zero.  The result is returned in
{\tt Common.rcond}.
{\footnotesize
\begin{verbatim}
#include "klu.h"
int ok ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_rcond (Symbolic, Numeric, &Common) ;    /* real */
ok = klu_z_rcond (Symbolic, Numeric, &Common) ;  /* complex */

#include "klu.h"
int ok ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_rcond (Symbolic, Numeric, &Common) ;  /* real */
ok = klu_zl_rcond (Symbolic, Numeric, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_scale}: scale and check a sparse matrix}
%-------------------------------------------------------------------------------

This function computes the row scaling factors of a matrix and checks to see if
it is a valid sparse matrix.  It can perform two kinds of scaling, computing
either the largest magnitude in each row, or the sum of the magnitudes of the
entries in each row.  KLU calls this function itself, depending upon the
{\tt Common.scale} parameter, where {\tt scale < 0} means no scaling,
{\tt scale=1} means the sum, and {\tt scale=2} means the maximum.  That is, in
MATLAB notation, {\tt Rs = sum(abs(A'))} or {\tt Rs = max(abs(A'))}.  KLU then
divides each row of {\tt A} by its corresponding scale factor.  The function
returns 0 if the matrix is invalid, or 1 otherwise.  A valid sparse matrix must
meet the following conditions:
\begin{enumerate}
\item {\tt n > 0}.  Note that KLU does not handle empty (0-by-0) matrices.
\item {\tt Ap}, {\tt Ai}, and {\tt Ax} ({\tt Az} for the complex case) must not
be NULL.
\item {\tt Ap[0]=0}, and {\tt Ap [j] <= Ap [j+1]} for all {\tt j} in the range
0 to {\tt n-1}.
\item The row indices in each column, {\tt Ai [Ap [j] ... Ap [j+1]-1]}, must be
in the range 0 to {\tt n-1}, and no duplicates can appear.  If the workspace
{\tt W} is NULL on input, the check for duplicate entries is skipped.
\end{enumerate}

{\footnotesize
\begin{verbatim}
#include "klu.h"
int scale, ok ;
int32_t n, Ap [n+1], Ai [nz], W [n] ;
double Ax [nz], Az [2*nz], Rs [n] ;
klu_common Common ;
ok = klu_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ;    /* real */
ok = klu_z_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ;  /* complex */

#include "klu.h"
int scale, ok ;
int64_t n, Ap [n+1], Ai [nz], W [n] ;
double Ax [nz], Az [2*nz], Rs [n] ;
klu_l_common Common ;
ok = klu_l_scale (scale, n, Ap, Ai, Ax, Rs, W, &Common) ;  /* real */
ok = klu_zl_scale (scale, n, Ap, Ai, Az, Rs, W, &Common) ; /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_extract}: extract the LU factorization}
%-------------------------------------------------------------------------------

This function extracts the LU factorization into a set of data structures
suitable for passing back to MATLAB, with matrices in conventional
compressed-column form.  The {\tt klu\_sort} function should be called first if
the row indices should be returned sorted.  The factorization is returned in
caller-provided arrays; if any of them are NULL, that part of the factorization
is not extracted (this is not an error).  Returns 1 if successful, 0 otherwise.

The arrays {\tt Li}, {\tt Lx}, and {\tt Lz} are of size {\tt Numeric->lnz};
{\tt Ui}, {\tt Ux}, and {\tt Uz} are of size {\tt Numeric->unz}; and {\tt Fi},
{\tt Fx}, and {\tt Fz} are of size {\tt Numeric->nzoff}.  Note that in the
complex versions, the real and imaginary parts are returned in separate arrays,
to be compatible with how MATLAB stores complex matrices.

This function is not required to solve a linear system with KLU.  KLU does not
itself make use of the extracted LU factorization returned by this function.
It is only provided to simplify the MATLAB interface to KLU, and it may be of
use to the end user who wishes to examine the contents of the LU factors.
{\footnotesize
\begin{verbatim}
#include "klu.h"
int ok ;
int32_t Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1], Fi [nzoff],
    P [n], Q [n], R [n] ;
double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff],
    Rs [n] ;
klu_symbolic *Symbolic ;
klu_numeric *Numeric ;
klu_common Common ;
ok = klu_extract (Numeric, Symbolic, Lp, Li, Lx, Up, Ui, Ux,
    Fp, Fi, Fx, P, Q, Rs, R, &Common) ;                     /* real */
ok = klu_z_extract (Numeric, Symbolic, Lp, Li, Lx, Lz, Up, Ui, Ux, Uz,
    Fp, Fi, Fx, Fz, P, Q, Rs, R, &Common) ;                 /* complex */

#include "klu.h"
int ok ;
int64_t Lp [n+1], Li [lnz], Up [n+1], Ui [unz], Fp [n+1], Fi [nzoff],
    P [n], Q [n], R [n] ;
double Lx [lnz], Lz [lnz], Ux [unz], Uz [unz], Fx [nzoff], Fz [nzoff],
    Rs [n] ;
klu_l_symbolic *Symbolic ;
klu_l_numeric *Numeric ;
klu_l_common Common ;
ok = klu_l_extract (Numeric, Symbolic, Lp, Li, Lx, Up, Ui, Ux,
    Fp, Fi, Fx, P, Q, Rs, R, &Common) ;                     /* real */
ok = klu_zl_extract (Numeric, Symbolic, Lp, Li, Lx, Lz, Up, Ui, Ux, Uz,
    Fp, Fi, Fx, Fz, P, Q, Rs, R, &Common) ;                 /* complex */
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt klu\_malloc}, {\tt klu\_free}, {\tt klu\_realloc}:
memory management}
%-------------------------------------------------------------------------------

KLU uses a set of wrapper routines for {\tt malloc}, {\tt free}, and
{\tt realloc}.  By default, these wrapper routines call the ANSI C versions of
these functions.  However, pointers to functions in {\tt Common} can be
modified after calling {\tt klu\_defaults} to allow the use of other memory
management functions (such as the MATLAB {\tt mxMalloc}, {\tt mxFree}, and
{\tt mxRealloc}).  These wrapper functions keep track of the current and peak
memory usage of KLU.  They can be called by the user.
{\tt klu\_malloc} is essentially the same as {\tt p = malloc (n * size)}.
{\tt klu\_free} is essentially the same as {\tt free (p)}, except that
{\tt klu\_free} returns NULL, which can then be assigned to {\tt p}.
{\tt klu\_realloc} is similar to {\tt realloc}, except that if the reallocation
fails, {\tt p} is returned unchanged.  Failure conditions are returned in
{\tt Common.status}.

{\footnotesize
\begin{verbatim}
#include "klu.h"
size_t n, nnew, nold, size ;
void *p ;
klu_common Common ;
p = klu_malloc (n, size, &Common) ;
p = klu_free (p, n, size, &Common) ;
p = klu_realloc (nnew, nold, size, p, &Common) ;

#include "klu.h"
size_t n, nnew, nold, size ;
void *p ;
klu_l_common Common ;
p = klu_l_malloc (n, size, &Common) ;
p = klu_l_free (p, n, size, &Common) ;
p = klu_l_realloc (nnew, nold, size, p, &Common) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_maxtrans}: maximum transversal}
%-------------------------------------------------------------------------------

The BTF package includes three user-callable functions (each with
\verb'int32_t' and \verb'int64_t' versions).  They do not need to be called
directly by an application that uses KLU.  KLU will call these functions to
perform the permutation into upper block triangular form.

The {\tt btf\_maxtrans} function finds a column permutation {\tt Q} that gives
{\tt A*Q} a zero-free diagonal, if one exists.  If row {\tt i} is matched to
column {\tt j}, then {\tt Match[i]=j}.  If the matrix is structurally singular,
there will be some unmatched rows.  If row {\tt i} is unmatched, then
{\tt Match[i]=-1}.  If the matrix is square and structurally non-singular, then
{\tt Q=Match} is the column permutation.  The {\tt btf\_maxtrans} function can
accept as input a rectangular matrix; it operates on the bipartite graph of
{\tt A}.  It returns the number of columns matched.
Unlike the KLU user-callable functions, the BTF functions do not check their
inputs at all; a segmentation fault will occur if any input pointers are NULL,
for example.  The function can require up to $O$({\tt n*nnz(A)}) time
(excluding the {\em cheap match} phase, which takes another $O$({\tt nnz(A)})
time).  If {\tt maxwork > 0} on input, the work is limited to
$O$({\tt maxwork*nnz(A)}) (excluding the cheap match), but the maximum
transversal might not be found if the limit is reached.  The {\tt Work} array
is workspace required by the methods; its contents are undefined on input and
output.

{\footnotesize
\begin{verbatim}
int32_t nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol],
    nmatch ;
double maxwork, work ;
nmatch = btf_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;

int64_t nrow, ncol, Ap [ncol+1], Ai [nz], Match [nrow], Work [5*ncol],
    nmatch ;
double maxwork, work ;
nmatch = btf_l_maxtrans (nrow, ncol, Ap, Ai, maxwork, &work, Match, Work) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_strongcomp}: strongly connected components}
%-------------------------------------------------------------------------------

The {\tt btf\_strongcomp} function finds the strongly connected components of a
directed graph, returning a symmetric permutation {\tt P}.  The matrix {\tt A}
must be square.  The diagonal of {\tt A} (or {\tt A*Q} if a column permutation
is given on input) is ignored.  If {\tt Q} is NULL on input, the matrix
{\tt P*A*P'} is in upper block triangular form.  Otherwise, {\tt Q} is modified
on output so that {\tt P*A*Q} is in upper block triangular form.  The vector
{\tt R} gives the block boundaries, where the {\tt k}th block consists of rows
and columns {\tt R[k]} through {\tt R[k+1]-1} in the permuted matrix.  The
function returns the number of strongly connected components found (the
diagonal blocks in the block triangular form).
{\footnotesize
\begin{verbatim}
int32_t n, Ap [n+1], Ai [nz], Q [n], P [n], R [n+1], Work [4*n], ncomp ;
ncomp = btf_strongcomp (n, Ap, Ai, Q, P, R, Work) ;

int64_t n, Ap [n+1], Ai [nz], Q [n], P [n], R [n+1], Work [4*n], ncomp ;
ncomp = btf_l_strongcomp (n, Ap, Ai, Q, P, R, Work) ;
\end{verbatim}
}

%-------------------------------------------------------------------------------
\subsection{{\tt btf\_order}: permutation to block triangular form}
%-------------------------------------------------------------------------------

The {\tt btf\_order} function combines the above two functions, first finding a
maximum transversal and then permuting the resulting matrix into upper block
triangular form.  Unlike {\tt dmperm} in MATLAB, it always reveals the maximum
matching along the diagonal, even if the matrix is structurally singular.  On
output, {\tt P} and {\tt Q} are the row and column permutations, where
{\tt i = P[k]} if row {\tt i} of {\tt A} is the {\tt k}th row of {\tt P*A*Q},
and {\tt j = BTF\_UNFLIP(Q[k])} if column {\tt j} of {\tt A} is the {\tt k}th
column of {\tt P*A*Q}.  If {\tt Q[k] < 0}, then the {\tt (k,k)}th entry in
{\tt P*A*Q} is structurally zero.  The vector {\tt R}, and the return value,
are the same as for {\tt btf\_strongcomp}.

{\footnotesize
\begin{verbatim}
int32_t n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n],
    ncomp ;
double maxwork, work ;
ncomp = btf_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;

int64_t n, Ap [n+1], Ai [nz], P [n], Q [n], R [n+1], nfound, Work [5*n],
    ncomp ;
double maxwork, work ;
ncomp = btf_l_order (n, Ap, Ai, maxwork, &work, P, Q, R, &nfound, Work) ;
\end{verbatim}
}

%------------------------------------------------------------------------------
\subsection{Sample C programs that use KLU}
%------------------------------------------------------------------------------

Here is a simple main program, {\tt klu\_simple.c}, that illustrates the basic
usage of KLU.
It uses KLU, and indirectly makes use of BTF and AMD.  COLAMD is required to
compile the demo, but it is not called by this example.  It uses statically
defined global variables for the sparse matrix {\tt A}, which would not be
typical of a complete application; it just makes for a simpler example.

{\footnotesize
\input{klu_simple_c.tex}
}

The {\tt Ap}, {\tt Ai}, and {\tt Ax} arrays represent the matrix
\[
A = \left[
\begin{array}{ccccc}
 2 &  3 &  0 & 0 & 0 \\
 3 &  0 &  4 & 0 & 6 \\
 0 & -1 & -3 & 2 & 0 \\
 0 &  0 &  1 & 0 & 0 \\
 0 &  4 &  2 & 0 & 1 \\
\end{array}
\right].
\]
The solution to $Ax=b$ is $x = [1 \, 2 \, 3 \, 4 \, 5]^T$.  The program uses
default control settings (no scaling, permutation to block triangular form, and
the AMD ordering).  It ignores the error codes in the return values and
{\tt Common.status}.

The block triangular form found by {\tt btf\_order} for this matrix is given
below:
\[
PAQ = \left[
\begin{array}{c|ccc|c}
2 & 0 & 0 & -1 & -3 \\
\hline
  & 2 & 0 &  3 &  0 \\
  & 3 & 6 &  0 &  4 \\
  & 0 & 1 &  4 &  1 \\
\hline
  &   &   &    &  1 \\
\end{array}
\right].
\]
This ordering is not modified by the AMD ordering, because the 3-by-3 matrix
$A_{22} + A_{22}^T$ happens to be a dense matrix.  No partial pivoting happens
to occur during LU factorization; all pivots are selected along the diagonal of
each block.  The matrix contains two singletons, which are the original entries
$a_{34}=2$ and $a_{43}=1$, and one 3-by-3 diagonal block (in which a single
fill-in entry occurs during factorization: the $u_{23}$ entry of this 3-by-3
matrix).

For a more complete program that uses KLU, see {\tt KLU/Demo/kludemo.c} for an
\verb'int32_t' version, and {\tt KLU/Demo/kluldemo.c} for a version that uses
\verb'int64_t' instead.  The top-level main routine uses CHOLMOD to read in a
compressed-column sparse matrix from a Matrix Market file, because KLU does not
include such a function.  Otherwise, no CHOLMOD functions are used.
Unlike {\tt klu\_simple.c}, CHOLMOD is required to run the {\tt kludemo.c} and
{\tt kluldemo.c} programs.

%------------------------------------------------------------------------------
\section{Installation}
\label{Install}
%------------------------------------------------------------------------------

Installation of the C-callable interface requires the {\tt cmake} utility.  The
MATLAB installation on any platform, including Windows, is simple; just type
{\tt klu\_install} in MATLAB to compile and install KLU, BTF, AMD, and COLAMD.
An optional \verb'Makefile' is provided to simplify the use of \verb'cmake'.

To compile and install the C-callable KLU, BTF, AMD, and COLAMD libraries, go
to the {\tt SuiteSparse} directory and type {\tt make}.  The KLU and BTF
libraries are placed in {\tt KLU/build/libklu.*} and {\tt BTF/build/libbtf.*}.
Two KLU demo programs will be compiled and tested in the {\tt KLU/Demo}
directory.  You can compare the output of {\tt make} with the results in the
KLU distribution, {\tt kludemo.out}.

Typing {\tt make clean} will remove all but the final compiled libraries and
demo programs.  Typing {\tt make purge} or {\tt make distclean} removes all
files not in the original distribution.

When you compile your program that uses the C-callable KLU library, you need to
add the {\tt KLU/build/libklu.*}, {\tt BTF/build/libbtf.*},
{\tt AMD/build/libamd.*}, and {\tt COLAMD/build/libcolamd.*} libraries, and you
need to tell your compiler to look in the {\tt KLU/Include} and
{\tt BTF/Include} directories for include files.  If using \verb'cmake', each
package includes scripts for \verb'find_library'.

Alternatively, do {\tt make install}, and KLU will be installed (on Linux/Mac)
in {\tt /usr/local/lib} and {\tt /usr/local/include}, and documentation is
placed in {\tt /usr/local/doc}.  These installation locations can be changed;
see {\tt SuiteSparse/README.txt} for details.  To install in
\verb'SuiteSparse/lib' and \verb'SuiteSparse/include', use
\verb'make local ; make install'.
If all you want to use is the KLU mexFunction in MATLAB, you can skip the use of the {\tt make} command entirely. Simply type {\tt klu\_install} in the MATLAB command window while in the {\tt KLU/MATLAB} directory. This works on any system with MATLAB, including Windows. %------------------------------------------------------------------------------ \newpage \section{The KLU routines} \label{klu_include} %------------------------------------------------------------------------------ The file {\tt KLU/Include/klu.h} listed below describes each user-callable routine in the C version of KLU, and gives details on their use. {\footnotesize \input{klu_h.tex} } %------------------------------------------------------------------------------ \newpage \section{The BTF routines} \label{btf_include} %------------------------------------------------------------------------------ The file {\tt BTF/Include/btf.h} listed below describes each user-callable routine in the C version of BTF, and gives details on their use. {\footnotesize \input{btf_h.tex} } %------------------------------------------------------------------------------ \newpage % References %------------------------------------------------------------------------------ \bibliographystyle{plain} \bibliography{KLU_UserGuide} \end{document}