\input texinfo @c -*-texinfo-*- @c %**start of header @setfilename gsl-design.info @settitle GNU Scientific Library @finalout @c -@setchapternewpage odd @c %**end of header @dircategory Scientific software @direntry * GSL-design: (GSL-design). GNU Scientific Library -- Design @end direntry @comment @include version-design.texi @set GSL @i{GNU Scientific Library} @ifinfo This file documents the @value{GSL}. Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2004 The GSL Project. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled ``GNU Free Documentation License''. @end ifinfo @titlepage @title GNU Scientific Library -- Design document @comment @subtitle Edition @value{EDITION}, for gsl Version @value{VERSION} @comment @subtitle @value{UPDATED} @author Mark Galassi Los Alamos National Laboratory @author James Theiler Astrophysics and Radiation Measurements Group, Los Alamos National Laboratory @author Brian Gough Network Theory Limited @page @vskip 0pt plus 1filll Copyright @copyright{} 1996,1997,1998,1999,2000,2001,2004 The GSL Project. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. @end titlepage @contents @node Top, Motivation, (dir), (dir) @top About GSL @ifinfo This file documents the design of @value{GSL}, a collection of numerical routines for scientific computing. More information about GSL can be found at the project homepage, @uref{http://www.gnu.org/software/gsl/}. @end ifinfo The @value{GSL} is a library of scientific subroutines. It aims to provide a convenient interface to routines that do standard (and not so standard) tasks that arise in scientific research. More than that, it also provides the source code. Users are welcome to alter, adjust, modify, and improve the interfaces and/or implementations of whichever routines might be needed for a particular purpose. GSL is intended to provide a free equivalent to existing proprietary numerical libraries written in C or Fortran, such as NAG, IMSL's CNL, IBM's ESSL, and SGI's SCSL. The target platform is a low-end desktop workstation. The goal is to provide something which is generally useful, and the library is aimed at general users rather than specialists. 
@menu * Motivation:: * Contributing:: * Design:: * Bibliography:: * Copying:: * GNU Free Documentation License:: @end menu @node Motivation, Contributing, Top, Top @chapter Motivation @cindex numerical analysis @cindex free software There is a need for scientists and engineers to have a numerical library that: @itemize @bullet @item is free (in the sense of freedom, not in the sense of gratis; see the GNU General Public License), so that people can use that library, redistribute it, modify it @dots{} @item is written in C using modern coding conventions, calling conventions, scoping @dots{} @item is clearly and pedagogically documented; preferably with TeXinfo, so as to allow online info, WWW and TeX output. @item uses top quality state-of-the-art algorithms. @item is portable and configurable using @emph{autoconf} and @emph{automake}. @item basically, is GNUlitically correct. @end itemize There are strengths and weaknesses with existing libraries: @emph{Netlib} (http://www.netlib.org/) is probably the most advanced set of numerical algorithms available on the net, maintained by AT&T. Unfortunately most of the software is written in Fortran, with strange calling conventions in many places. It is also not very well collected, so it is a lot of work to get started with netlib. @emph{GAMS} (http://gams.nist.gov/) is an extremely well organized set of pointers to scientific software, but like netlib, the individual routines vary in their quality and their level of documentation. @emph{Numerical Recipes} (http://www.nr.com, http://cfata2.harvard.edu/nr/) is an excellent book: it explains the algorithms in a very clear way. Unfortunately the authors released the source code under a license which allows you to use it, but prevents you from re-distributing it. Thus Numerical Recipes is not @emph{free} in the sense of @emph{freedom}. On top of that, the implementation suffers from @emph{fortranitis} and other limitations. [http://www.lysator.liu.se/c/num-recipes-in-c.html] @emph{SLATEC} is a large public domain collection of numerical routines in Fortran written under a Department of Energy program in the 1970's. The routines are well tested and have a reasonable overall design (given the limitations of that era). GSL should aim to be a modern version of SLATEC. @emph{NSWC} is the Naval Surface Warfare Center numerical library. It is a large public-domain Fortran library, containing a lot of high-quality code. Documentation for the library is hard to find, only a few photocopies of the printed manual are still in circulation. @emph{NAG} and @emph{IMSL} both sell high-quality libraries which are proprietary. The NAG library is more advanced and has wider scope than IMSL. The IMSL library leans more towards ease-of-use and makes extensive use of variable length argument lists to emulate "default arguments". @emph{ESSL} and @emph{SCSL} are proprietary libraries from IBM and SGI. @emph{Forth Scientific Library} [see the URL http://www.taygeta.com/fsl/sciforth.html]. Mainly of interest to Forth users. @emph{Numerical Algorithms with C} G. Engeln-Mullges, F. Uhlig. A nice numerical library written in ANSI C with an accompanying textbook. Source code is available but the library is not free software. @emph{NUMAL} A C version of the NUMAL library has been written by H.T. Lau and is published as a book and disk with the title "A Numerical Library in C for Scientists and Engineers". Source code is available but the library is not free software. @emph{C Mathematical Function Handbook} by Louis Baker. 
A library of function approximations and methods corresponding to those in the "Handbook of Mathematical Functions" by Abramowitz and Stegun. Source code is available but the library is not free software. @emph{CCMATH} by Daniel A. Atkinson. A C numerical library covering similar areas to GSL. The code is quite terse. Earlier versions were under the GPL but unfortunately it has changed to the LGPL in recent versions. @emph{CEPHES} A useful collection of high-quality special functions written in C. Not GPL'ed. @emph{WNLIB} A small collection of numerical routines written in C by Will Naylor. Public domain. @emph{MESHACH} A comprehensive matrix-vector linear algebra library written in C. Freely available but not GPL'ed (non-commercial license). @emph{CERNLIB} is a large high-quality Fortran library developed at CERN over many years. It was originally non-free software but has recently been released under the GPL. @emph{COLT} is a free numerical library in Java developed at CERN by Wolfgang Hoschek. It is under a BSD-style license. The long-term goal will be to provide a framework to which the real numerical experts (or their graduate students) will contribute. @node Contributing, Design, Motivation, Top @chapter Contributing This design document was originally written in 1996. As of 2004, GSL itself is essentially feature complete, the developers are not actively working on any major new functionality. The main emphasis is now on ensuring the stability of the existing functions, improving consistency, tidying up a few problem areas and fixing any bugs that are reported. Potential contributors are encouraged to gain familiarity with the library by investigating and fixing known problems listed in the @file{BUGS} file in the CVS repository. Adding large amounts of new code is difficult because it leads to differences in the maturity of different parts of the library. To maintain stability, any new functionality is encouraged as @dfn{packages}, built on top of GSL and maintained independently by the author, as in other free software projects (such as the Perl CPAN archive and TeX CTAN archive, etc). @menu * Packages:: @end menu @node Packages, , Contributing, Contributing @section Packages The design of GSL permits extensions to be used alongside the existing library easily by simple linking. For example, additional random number generators can be provided in a separate library: @example $ tar xvfz rngextra-0.1.tar.gz $ cd rngextra-0.1 $ ./configure; make; make check; make install $ ... $ gcc -Wall main.c -lrngextra -lgsl -lgslcblas -lm @end example The points below summarise the package design guidelines. These are intended to ensure that packages are consistent with GSL itself, to make life easier for the end-user and make it possible to distribute popular well-tested packages as part of the core GSL in future. @itemize @bullet @item Follow the GSL and GNU coding standards described in this document This means using the standard GNU packaging tools, such as Automake, providing documentation in Texinfo format, and a test suite. The test suite should run using @samp{make check}, and use the test functions provided in GSL to produce the output with @code{PASS:}/@code{FAIL:} lines. It is not essential to use libtool since packages are likely to be small, a static library is sufficient and simpler to build. @item Use a new unique prefix for the package (do not use @samp{gsl_} -- this is reserved for internal use). 
For example, a package of additional random number generators might use
the prefix @code{rngextra}.

@example
#include <rngextra.h>

gsl_rng * r = gsl_rng_alloc (rngextra_lsfr32);
@end example

@item Use a meaningful version number which reflects the state of development

Generally, @code{0.x} are alpha versions, which provide no guarantees.
Following that, @code{0.9.x} are beta versions, which should be
essentially complete, subject only to minor changes and bug fixes.  The
first major release is @code{1.0}.  Any version number of @code{1.0} or
higher should be suitable for production use with a well-defined API.

The API must not change within a major release series and should be
backwards-compatible in its behavior (excluding actual bug-fixes), so
that existing code does not have to be modified.  Note that the API
includes all exported definitions, including data-structures defined
with @code{struct}.  If you need to change the API in a package, it
requires a new major release (e.g. @code{2.0}).

@item Use the GNU General Public License (GPL)

Follow the normal procedures of obtaining a copyright disclaimer if you
would like to have the package considered for inclusion in GSL itself
in the future (@pxref{Legal issues}).
@end itemize

Post announcements of your package releases to
@email{gsl-discuss@@sourceware.org} so that information about them can
be added to the GSL webpages.  For security, sign your package with GPG
(@code{gpg --detach-sign @var{file}}).

An example package @samp{rngextra} containing two additional random
number generators can be found at
@url{http://www.network-theory.co.uk/download/rngextra/}.

@node Design, Bibliography, Contributing, Top
@chapter Design

@menu
* Language for implementation::
* Interface to other languages::
* What routines are implemented::
* What routines are not implemented::
* Design of Numerical Libraries::
* Code Reuse::
* Standards and conventions::
* Background and Preparation::
* Choice of Algorithms::
* Documentation::
* Namespace::
* Header files::
* Target system::
* Function Names::
* Object-orientation::
* Comments::
* Minimal structs::
* Algorithm decomposition::
* Memory allocation and ownership::
* Memory layout::
* Linear Algebra Levels::
* Error estimates::
* Exceptions and Error handling::
* Persistence::
* Using Return Values::
* Variable Names::
* Datatype widths::
* size_t::
* Arrays vs Pointers::
* Pointers::
* Constness::
* Pseudo-templates::
* Arbitrary Constants::
* Test suites::
* Compilation::
* Thread-safety::
* Legal issues::
* Non-UNIX portability::
* Compatibility with other libraries::
* Parallelism::
* Precision::
* Miscellaneous::
@end menu

@node Language for implementation, Interface to other languages, Design, Design
@section Language for implementation

@strong{One language only (C)}

Advantages: simpler, compiler available and quite universal.

@node Interface to other languages, What routines are implemented, Language for implementation, Design
@section Interface to other languages

Wrapper packages are supplied as "extra" packages, not as part of the
"core".  They are maintained separately by independent contributors.
Use standard tools to make wrappers: swig, g-wrap.

@node What routines are implemented, What routines are not implemented, Interface to other languages, Design
@section What routines are implemented

Anything which is in any of the existing libraries.  Obviously it makes
sense to prioritize and write code for the most important areas first.
@c @itemize @bullet @c @item Random number generators @c Includes both random number generators and routines to give various @c interesting distributions. @c @item Statistics @c @item Special Functions @c What I (jt) envision for this section is a collection of routines for @c reliable and accurate (but not necessarily fast or efficient) estimation @c of values for special functions, explicitly using Taylor series, asymptotic @c expansions, continued fraction expansions, etc. As well as these routines, @c fast approximations will also be provided, primarily based on Chebyshev @c polynomials and ratios of polynomials. In this vision, the approximations @c will be the "standard" routines for the users, and the exact (so-called) @c routines will be used for verification of the approximations. It may also @c be useful to provide various identity-checking routines as part of the @c verification suite. @c @item Curve fitting @c polynomial, special functions, spline @c @item Ordinary differential equations @c @item Partial differential equations @c @item Fourier Analysis @c @item Wavelets @c @item Matrix operations: linear equations @c @item Matrix operations: eigenvalues and spectral analysis @c @item Matrix operations: any others? @c @item Direct integration @c @item Monte Carlo methods @c @item Simulated annealing @c @item Genetic algorithms @c We need to think about what kinds of algorithms are basic generally @c useful numerical algorithms, and which ones are special purpose @c research projects. We should concentrate on supplying the former. @c @item Cellular automata @c @end itemize @node What routines are not implemented, Design of Numerical Libraries, What routines are implemented, Design @section What routines are not implemented @itemize @bullet @item anything which already exists as a high-quality GPL'ed package. @item anything which is too big -- i.e. an application in its own right rather than a subroutine For example, partial differential equation solvers are often huge and very specialized applications (since there are so many types of PDEs, types of solution, types of grid, etc). This sort of thing should remain separate. It is better to point people to the good applications which exist. @item anything which is independent and useful separately. Arguably functions for manipulating date and time, or financial functions might be included in a "scientific" library. However, these sorts of modules could equally well be used independently in other programs, so it makes sense for them to be separate libraries. @end itemize @node Design of Numerical Libraries, Code Reuse, What routines are not implemented, Design @section Design of Numerical Libraries In writing a numerical library there is a unavoidable conflict between completeness and simplicity. Completeness refers to the ability to perform operations on different objects so that the group is "closed". In mathematics objects can be combined and operated on in an infinite number of ways. For example, I can take the derivative of a scalar field with respect to a vector and the derivative of a vector field wrt.@: a scalar (along a path). There is a definite tendency to unconsciously try to reproduce all these possibilities in a numerical library, by adding new features one by one. After all, it is always easy enough to support just one more feature @dots{} so why not? 
Looking at the big picture, no-one would start out by saying "I want to be able to represent every possible mathematical object and operation using C structs" -- this is a strategy which is doomed to fail. There is a limited amount of complexity which can be represented in a programming language like C. Attempts to reproduce the complexity of mathematics within such a language would just lead to a morass of unmaintainable code. However, it's easy to go down that road if you don't think about it ahead of time. It is better to choose simplicity over completeness. In designing new parts of the library keep modules independent where possible. If interdependencies between modules are introduced be sure about where you are going to draw the line. @node Code Reuse, Standards and conventions, Design of Numerical Libraries, Design @section Code Reuse It is useful if people can grab a single source file and include it in their own programs without needing the whole library. Try to allow standalone files like this whenever it is reasonable. Obviously the user might need to define a few macros, such as GSL_ERROR, to compile the file but that is ok. Examples where this can be done: grabbing a single random number generator. @node Standards and conventions, Background and Preparation, Code Reuse, Design @section Standards and conventions The people who kick off this project should set the coding standards and conventions. In order of precedence the standards that we follow are, @itemize @bullet @item We follow the GNU Coding Standards. @item We follow the conventions of the ANSI Standard C Library. @item We follow the conventions of the GNU C Library. @item We follow the conventions of the glib GTK support Library. @end itemize The references for these standards are the @cite{GNU Coding Standards} document, Harbison and Steele @cite{C: A Reference Manual}, the @cite{GNU C Library Manual} (version 2), and the Glib source code. For mathematical formulas, always follow the conventions in Abramowitz & Stegun, the @cite{Handbook of Mathematical Functions}, since it is the definitive reference and also in the public domain. If the project has a philosophy it is to "Think in C". Since we are working in C we should only do what is natural in C, rather than trying to simulate features of other languages. If there is something which is unnatural in C and has to be simulated then we avoid using it. If this means leaving something out of the library, or only offering a limited version then so be it. It is not worthwhile making the library over-complicated. There are numerical libraries in other languages, and if people need the features of those languages it would be sensible for them to use the corresponding libraries, rather than coercing a C library into doing that job. It should be borne in mind at all time that C is a macro-assembler. If you are in doubt about something being too complicated ask yourself the question "Would I try to write this in macro-assembler?" If the answer is obviously "No" then do not try to include it in GSL. [BJG] It will be useful to read the following papers, @itemize @w{} @item Kiem-Phong Vo, ``The Discipline and Method Architecture for Reusable Libraries'', Software - Practice & Experience, v.30, pp.107-128, 2000. @end itemize @noindent It is available from @uref{http://www.research.att.com/sw/tools/sfio/dm-spe.ps} or the earlier technical report Kiem-Phong Vo, "An Architecture for Reusable Libraries" @uref{http://citeseer.nj.nec.com/48973.html}. 
There are associated papers on Vmalloc, SFIO, and CDT which are also relevant to the design of portable C libraries. @itemize @w{} @item Kiem-Phong Vo, ``Vmalloc: A General and Efficient Memory Allocator''. Software Practice & Experience, 26:1--18, 1996. @uref{http://www.research.att.com/sw/tools/vmalloc/vmalloc.ps} @item Kiem-Phong Vo. ``Cdt: A Container Data Type Library''. Soft. Prac. & Exp., 27:1177--1197, 1997 @uref{http://www.research.att.com/sw/tools/cdt/cdt.ps} @item David G. Korn and Kiem-Phong Vo, ``Sfio: Safe/Fast String/File IO'', Proceedings of the Summer '91 Usenix Conference, pp. 235-256, 1991. @uref{http://citeseer.nj.nec.com/korn91sfio.html} @end itemize Source code should be indented according to the GNU Coding Standards, with spaces not tabs. For example, by using the @code{indent} command: @example indent -gnu -nut *.c *.h @end example @noindent The @code{-nut} option converts tabs into spaces. @node Background and Preparation, Choice of Algorithms, Standards and conventions, Design @section Background and Preparation Before implementing something be sure to research the subject thoroughly! This will save a lot of time in the long-run. The two most important steps are, @enumerate @item to determine whether there is already a free library (GPL or GPL-compatible) which does the job. If so, there is no need to reimplement it. Carry out a search on Netlib, GAMs, na-net, sci.math.num-analysis and the web in general. This should also provide you with a list of existing proprietary libraries which are relevant, keep a note of these for future reference in step 2. @item make a comparative survey of existing implementations in the commercial/free libraries. Examine the typical APIs, methods of communication between program and subroutine, and classify them so that you are familiar with the key concepts or features that an implementation may or may not have, depending on the relevant tradeoffs chosen. Be sure to review the documentation of existing libraries for useful references. @item read up on the subject and determine the state-of-the-art. Find the latest review papers. A search of the following journals should be undertaken. @itemize @w{} @item ACM Transactions on Mathematical Software @item Numerische Mathematik @item Journal of Computation and Applied Mathematics @item Computer Physics Communications @item SIAM Journal of Numerical Analysis @item SIAM Journal of Scientific Computing @end itemize @end enumerate @noindent Keep in mind that GSL is not a research project. Making a good implementation is difficult enough, without also needing to invent new algorithms. We want to implement existing algorithms whenever possible. Making minor improvements is ok, but don't let it be a time-sink. @node Choice of Algorithms, Documentation, Background and Preparation, Design @section Choice of Algorithms Whenever possible choose algorithms which scale well and always remember to handle asymptotic cases. This is particularly relevant for functions with integer arguments. It is tempting to implement these using the simple @math{O(n)} algorithms used to define the functions, such as the many recurrence relations found in Abramowitz and Stegun. While such methods might be acceptable for @math{n=O(10-100)} they will not be satisfactory for a user who needs to compute the same function for @math{n=1000000}. Similarly, do not make the implicit assumption that multivariate data has been scaled to have components of the same size or O(1). 
Algorithms should take care of any necessary scaling or balancing internally, and use appropriate norms (e.g. |Dx| where D is a diagonal scaling matrix, rather than |x|). @node Documentation, Namespace, Choice of Algorithms, Design @section Documentation Documentation: the project leaders should give examples of how things are to be documented. High quality documentation is absolutely mandatory, so documentation should introduce the topic, and give careful reference for the provided functions. The priority is to provide reference documentation for each function. It is not necessary to provide tutorial documentation. Use free software, such as GNU Plotutils, to produce the graphs in the manual. Some of the graphs have been made with gnuplot which is not truly free (or GNU) software, and some have been made with proprietary programs. These should be replaced with output from GNU plotutils. When citing references be sure to use the standard, definitive and best reference books in the field, rather than lesser known text-books or introductory books which happen to be available (e.g. from undergraduate studies). For example, references concerning algorithms should be to Knuth, references concerning statistics should be to Kendall & Stuart, references concerning special functions should be to Abramowitz & Stegun (Handbook of Mathematical Functions AMS-55), etc. Wherever possible refer to Abramowitz & Stegun rather than other reference books because it is a public domain work, so it is inexpensive and freely redistributable. The standard references have a better chance of being available in an accessible library for the user. If they are not available and the user decides to buy a copy in order to look up the reference then this also gives them the best quality book which should also cover the largest number of other references in the GSL Manual. If many different books were to be referenced this would be an expensive and inefficient use of resources for a user who needs to look up the details of the algorithms. Reference books also stay in print much longer than text books, which are often out-of-print after a few years. Similarly, cite original papers wherever possible. Be sure to keep copies of these for your own reference (e.g. when dealing with bug reports) or to pass on to future maintainers. If you need help in tracking down references, ask on the @code{gsl-discuss} mailing list. There is a group of volunteers with access to good libraries who have offered to help GSL developers get copies of papers. @c [JT section: written by James Theiler @c And we furthermore promise to try as hard as possible to document @c the software: this will ideally involve discussion of why you might want @c to use it, what precisely it does, how precisely to invoke it, @c how more-or-less it works, and where we learned about the algorithm, @c and (unless we wrote it from scratch) where we got the code. @c We do not plan to write this entire package from scratch, but to cannibalize @c existing mathematical freeware, just as we expect our own software to @c be cannibalized.] To write mathematics in the texinfo file you can use the @code{@@math} command with @emph{simple} TeX commands. These are automatically surrounded by @code{$...$} for math mode. For example, @example to calculate the coefficient @@math@{\alpha@} use the function... @end example @noindent will be correctly formatted in both online and TeX versions of the documentation. 
Note that you cannot use the special characters @{ and @} inside the @code{@@math} command because these conflict between TeX and Texinfo. This is a problem if you want to write something like @code{\sqrt@{x+y@}}. To work around it you can precede the math command with a special macro @code{@@c} which contains the explicit TeX commands you want to use (no restrictions), and put an ASCII approximation into the @code{@@math} command (you can write @code{@@@{} and @code{@@@}} there for the left and right braces). The explicit TeX commands are used in the TeX output and the argument of @code{@@math} in the plain info output. Note that the @code{@@c@{@}} macro must go at the end of the preceding line, because everything else after it is ignored---as far as texinfo is concerned it's actually a 'comment'. The comment command @@c has been modified to capture a TeX expression which is output by the next @@math command. For ordinary comments use the @@comment command. For example, @example this is a test @@c@{$\sqrt@{x+y@}$@} @@math@{\sqrt@@@{x+y@@@}@} @end example @noindent is equivalent to @code{this is a test $\sqrt@{x+y@}$} in plain TeX and @code{this is a test @@math@{\sqrt@@@{x+y@@@}@}} in Info. It looks nicer if some of the more cryptic TeX commands are given a C-style ascii version, e.g. @example for @@c@{$x \ge y$@} @@math@{x >= y@} @end example @noindent will be appropriately displayed in both TeX and Info. @node Namespace, Header files, Documentation, Design @section Namespace Use @code{gsl_} as a prefix for all exported functions and variables. Use @code{GSL_} as a prefix for all exported macros. All exported header files should have a filename with the prefix @code{gsl_}. All installed libraries should have a name like libgslhistogram.a Any installed executables (utility programs etc) should have the prefix @code{gsl-} (with a hyphen, not an underscore). All function names, variables, etc.@: should be in lower case. Macros and preprocessor variables should be in upper case. Some common conventions in variable and function names: @table @code @item p1 plus 1, e.g. function @code{log1p(x)} or a variable like @code{kp1}, @math{=k+1}. @item m1 minus 1, e.g. function @code{expm1(x)} or a variable like @code{km1}, @math{=k-1}. @end table @node Header files, Target system, Namespace, Design @section Header files Installed header files should be idempotent, i.e. surround them by the preprocessor conditionals like the following, @example #ifndef __GSL_HISTOGRAM_H__ #define __GSL_HISTOGRAM_H__ ... #endif /* __GSL_HISTOGRAM_H__ */ @end example @node Target system, Function Names, Header files, Design @section Target system The target system is ANSI C, with a full Standard C Library, and IEEE arithmetic. @node Function Names, Object-orientation, Target system, Design @section Function Names Each module has a name, which prefixes any function names in that module, e.g. the module gsl_fft has function names like gsl_fft_init. The modules correspond to subdirectories of the library source tree. @node Object-orientation, Comments, Function Names, Design @section Object-orientation The algorithms should be object oriented, but only to the extent that is easy in portable ANSI C. The use of casting or other tricks to simulate inheritance is not desirable, and the user should not have to be aware of anything like that. This means many types of patterns are ruled out. However, this is not considered a problem -- they are too complicated for the library. 
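As a rough illustration of the level of object-orientation which is
acceptable, the following sketch shows a "type" expressed as a struct
of function pointers together with an opaque state pointer.  The names
here are hypothetical (this is not the actual @code{gsl_rng}
interface); the point is only that dispatch happens through ordinary
function pointers, with no casts and nothing hidden from the user.

@example
/* Hypothetical sketch, not the real GSL rng interface. */
typedef struct
@{
  const char *name;
  size_t size;                                 /* size of the state */
  void (*set) (void *state, unsigned long int seed);
  unsigned long int (*get) (void *state);
@} rng_type;

typedef struct
@{
  const rng_type *type;
  void *state;
@} rng;

unsigned long int
rng_get (const rng *r)
@{
  return r->type->get (r->state);   /* dispatch through the type struct */
@}
@end example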
Note: it is possible to define an abstract base class easily in C, using function pointers. See the rng directory for an example. When reimplementing public domain Fortran code, please try to introduce the appropriate object concepts as structs, rather than translating the code literally in terms of arrays. The structs can be useful just within the file, you don't need to export them to the user. For example, if a Fortran program repeatedly uses a subroutine like, @example SUBROUTINE RESIZE (X, K, ND, K1) @end example @noindent where X(K,D) represents a grid to be resized to X(K1,D) you can make this more readable by introducing a struct, @smallexample struct grid @{ int nd; /* number of dimensions */ int k; /* number of bins */ double * x; /* partition of axes, array of size x[k][nd] */ @} void resize_grid (struct grid * g, int k_new) @{ ... @} @end smallexample @noindent Similarly, if you have a frequently recurring code fragment within a single file you can define a static or static inline function for it. This is typesafe and saves writing out everything in full. @node Comments, Minimal structs, Object-orientation, Design @section Comments Follow the GNU Coding Standards. A relevant quote is, ``Please write complete sentences and capitalize the first word. If a lower-case identifier comes at the beginning of a sentence, don't capitalize it! Changing the spelling makes it a different identifier. If you don't like starting a sentence with a lower case letter, write the sentence differently (e.g., "The identifier lower-case is ...").'' @node Minimal structs, Algorithm decomposition, Comments, Design @section Minimal structs We prefer to make structs which are @dfn{minimal}. For example, if a certain type of problem can be solved by several classes of algorithm (e.g. with and without derivative information) it is better to make separate types of struct to handle those cases. i.e. run time type identification is not desirable. @node Algorithm decomposition, Memory allocation and ownership, Minimal structs, Design @section Algorithm decomposition Iterative algorithms should be decomposed into an INITIALIZE, ITERATE, TEST form, so that the user can control the progress of the iteration and print out intermediate results. This is better than using call-backs or using flags to control whether the function prints out intermediate results. In fact, call-backs should not be used -- if they seem necessary then it's a sign that the algorithm should be broken down further into individual components so that the user has complete control over them. For example, when solving a differential equation the user may need to be able to advance the solution by individual steps, while tracking a realtime process. This is only possible if the algorithm is broken down into step-level components. Higher level decompositions would not give sufficient flexibility. @node Memory allocation and ownership, Memory layout, Algorithm decomposition, Design @section Memory allocation and ownership Functions which allocate memory on the heap should end in _alloc (e.g. gsl_foo_alloc) and be deallocated by a corresponding _free function (gsl_foo_free). Be sure to free any memory allocated by your function if you have to return an error in a partially initialized object. Don't allocate memory 'temporarily' inside a function and then free it before the function returns. This prevents the user from controlling memory allocation. 
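Taken together, the @code{_alloc}/@code{_free} convention and the
INITIALIZE, ITERATE, TEST decomposition described above lead to calling
code with the following shape.  This is only a hedged sketch using a
hypothetical @code{gsl_foo} solver (the @code{gsl_foo_*} names are
invented for illustration); @code{GSL_CONTINUE} is the real status code
used by the iterative routines.

@example
gsl_foo_workspace * w = gsl_foo_alloc (n);      /* allocate workspace */
int status;

gsl_foo_set (w, f, x0);                         /* INITIALIZE */

do
  @{
    status = gsl_foo_iterate (w);               /* ITERATE: advance one step */

    if (status)
      break;                                    /* iteration failed */

    status = gsl_foo_test (w, epsabs, epsrel);  /* TEST: caller-chosen criterion */
  @}
while (status == GSL_CONTINUE);

gsl_foo_free (w);                               /* deallocate workspace */
@end example

@noindent
Because control returns to the caller after every step, intermediate
results can be printed or the iteration stopped early without any need
for call-backs.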
All memory should be allocated and freed through separate functions and passed around as a "workspace" argument. This allows memory allocation to be factored out of tight loops. To avoid confusion over ownership, workspaces should not own each other or contain other workspaces. For clarity and ease of use in different contexts, they should be allocated from integer arguments rather than derived from other structs. @node Memory layout, Linear Algebra Levels, Memory allocation and ownership, Design @section Memory layout We use flat blocks of memory to store matrices and vectors, not C-style pointer-to-pointer arrays. The matrices are stored in row-major order -- i.e. the column index (second index) moves continuously through memory. @node Linear Algebra Levels, Error estimates, Memory layout, Design @section Linear Algebra Levels Functions using linear algebra are divided into two levels: For purely "1d" functions we use the C-style arguments (double *, stride, size) so that it is simpler to use the functions in a normal C program, without needing to invoke all the gsl_vector machinery. The philosophy here is to minimize the learning curve. If someone only needs to use one function, like an fft, they can do so without having to learn about gsl_vector. This leads to the question of why we don't do the same for matrices. In that case the argument list gets too long and confusing, with (size1, size2, tda) for each matrix and potential ambiguities over row vs column ordering. In this case, it makes sense to use gsl_vector and gsl_matrix, which take care of this for the user. So really the library has two levels -- a lower level based on C types for 1d operations, and a higher level based on gsl_matrix and gsl_vector for general linear algebra. Of course, it would be possible to define a vector version of the lower level functions too. So far we have not done that because it was not essential -- it could be done but it is easy enough to get by using the C arguments, by typing v->data, v->stride, v->size instead. A gsl_vector version of low-level functions would mainly be a convenience. Please use BLAS routines internally within the library whenever possible for efficiency. @node Error estimates, Exceptions and Error handling, Linear Algebra Levels, Design @section Error estimates In the special functions error bounds are given as twice the expected ``Gaussian'' error, i.e.@: 2-sigma, so the result is inside the error 98% of the time. People expect the true value to be within +/- the quoted error (this wouldn't be the case 32% of the time for 1 sigma). Obviously the errors are not Gaussian but a factor of two works well in practice. @node Exceptions and Error handling, Persistence, Error estimates, Design @section Exceptions and Error handling The basic error handling procedure is the return code (see gsl_errno.h for a list of allowed values). Use the GSL_ERROR macro to mark an error. The current definition of this macro is not ideal but it can be changed at compile time. You should always use the GSL_ERROR macro to indicate an error, rather than just returning an error code. The macro allows the user to trap errors using the debugger (by setting a breakpoint on the function gsl_error). The only circumstances where GSL_ERROR should not be used are where the return value is "indicative" rather than an error -- for example, the iterative routines use the return code to indicate the success or failure of an iteration. 
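For an ordinary (non-iterative) function the convention looks like the
following sketch.  The function itself is hypothetical, but
@code{GSL_ERROR}, @code{GSL_EDOM} and @code{GSL_SUCCESS} are the real
definitions from @file{gsl_errno.h}.

@example
int
gsl_foo_eval (double x, double * result)   /* hypothetical function */
@{
  if (x < 0.0)
    @{
      /* calls the error handler and returns GSL_EDOM to the caller */
      GSL_ERROR ("x must be non-negative", GSL_EDOM);
    @}

  *result = sqrt (x);
  return GSL_SUCCESS;
@}
@end example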
By the nature of an iterative algorithm "failure" (a return code of
GSL_CONTINUE) is a normal occurrence and there is no need to use
GSL_ERROR there.

Be sure to free any memory allocated by your function if you return an
error (in particular for errors in partially initialized objects).

@node Persistence, Using Return Values, Exceptions and Error handling, Design
@section Persistence

If you make an object foo which uses blocks of memory (e.g. vector,
matrix, histogram) you can provide functions for reading and writing
those blocks,

@smallexample
int gsl_foo_fread (FILE * stream, gsl_foo * v);
int gsl_foo_fwrite (FILE * stream, const gsl_foo * v);
int gsl_foo_fscanf (FILE * stream, gsl_foo * v);
int gsl_foo_fprintf (FILE * stream, const gsl_foo * v, const char *format);
@end smallexample

@noindent
Only dump out the blocks of memory, not any associated parameters such
as lengths.  The idea is for the user to build higher level
input/output facilities using the functions the library provides.  The
fprintf/fscanf versions should be portable between architectures, while
the binary versions should be the "raw" version of the data.  Use the
functions

@smallexample
int gsl_block_fread (FILE * stream, gsl_block * b);
int gsl_block_fwrite (FILE * stream, const gsl_block * b);
int gsl_block_fscanf (FILE * stream, gsl_block * b);
int gsl_block_fprintf (FILE * stream, const gsl_block * b, const char *format);
@end smallexample

@noindent
or

@smallexample
int gsl_block_raw_fread (FILE * stream, double * b, size_t n, size_t stride);
int gsl_block_raw_fwrite (FILE * stream, const double * b, size_t n, size_t stride);
int gsl_block_raw_fscanf (FILE * stream, double * b, size_t n, size_t stride);
int gsl_block_raw_fprintf (FILE * stream, const double * b, size_t n, size_t stride, const char *format);
@end smallexample

@noindent
to do the actual reading and writing.

@node Using Return Values, Variable Names, Persistence, Design
@section Using Return Values

Always assign a return value to a variable before using it.  This
allows easier debugging of the function, and inspection and
modification of the return value.  If the variable is only needed
temporarily then enclose it in a suitable scope.

For example, instead of writing,

@example
a = f(g(h(x,y)))
@end example

@noindent
use temporary variables to store the intermediate values,

@example
@{
  double u = h(x,y);
  double v = g(u);
  a = f(v);
@}
@end example

@noindent
These can then be inspected more easily in the debugger, and
breakpoints can be placed more precisely.  The compiler will eliminate
the temporary variables automatically when the program is compiled with
optimization.

@node Variable Names, Datatype widths, Using Return Values, Design
@section Variable Names

Try to follow existing conventions for variable names,

@table @code
@item dim
number of dimensions
@item w
pointer to workspace
@item state
pointer to state variable (use @code{s} if you need to save characters)
@item result
pointer to result (output variable)
@item abserr
absolute error
@item relerr
relative error
@item epsabs
absolute tolerance
@item epsrel
relative tolerance
@item size
the size of an array or vector e.g. double array[size]
@item stride
the stride of a vector
@item size1
the number of rows in a matrix
@item size2
the number of columns in a matrix
@item n
general integer number, e.g.
number of elements of array, in fft, etc
@item r
random number generator (gsl_rng)
@end table

@node Datatype widths, size_t, Variable Names, Design
@section Datatype widths

Be aware that in ANSI C the type @code{int} is only guaranteed to
provide 16 bits.  It may provide more, but is not guaranteed to.
Therefore if you require 32 bits you must use @code{long int}, which
will have 32 bits or more.  Of course, on many platforms the type
@code{int} does have 32 bits instead of 16 bits but we have to code to
the ANSI standard rather than a specific platform.

@node size_t, Arrays vs Pointers, Datatype widths, Design
@section size_t

All objects (blocks of memory, etc) should be measured in terms of a
@code{size_t} type.  Therefore any iterations (e.g.
@code{for(i=0; i<N; i++)}) should also use a counter of type
@code{size_t}.  Keep in mind that @code{size_t} is unsigned, so loops
which count down to zero need care.  Instead of

@example
for (i = N; i >= 0; i--) @{ ... @}   /* DOESN'T WORK */
@end example

@noindent
use something like

@example
for (i = N; i-- > 0;) @{ ... @}
@end example

@noindent
to avoid problems with wrap-around at @code{i=0}.  Note that the
post-decrement ensures that the loop variable is tested before it
reaches zero.  Beware that @code{i} will wrap around on exit from the
loop.  (This could also be written as @code{for (i = N; i--;)} since
the test for @code{i>0} is equivalent to @code{i!=0} for an unsigned
integer.)

If you really want to avoid confusion use a separate variable to invert
the loop order,

@example
for (i = 0; i < N; i++)
  @{
    j = N - i;
    ...
  @}
@end example

Note (BJG).  Originally, I suggested using

@example
for (i = N; i > 0 && i--;) @{ ... @}
@end example

@noindent
which makes the test for @code{i>0} explicit and leaves @code{i=0} on
exit from the loop.  However, it is slower as there is an additional
branch which prevents unrolling.  Thanks to J. Seward for pointing this
out.

Note: As a matter of style, please use post-increment (@code{i++}) and
post-decrement (@code{i--}) operators by default and only use
pre-increment (@code{++i}) and pre-decrement (@code{--i}) operators
where specifically needed.

@node Arrays vs Pointers, Pointers, size_t, Design
@section Arrays vs Pointers

A function can be declared with either pointer arguments or array
arguments.  The C standard considers these to be equivalent.  However,
it is useful to distinguish between the case of a pointer, representing
a single object which is being modified, and an array which represents
a set of objects with unit stride (that are modified or not depending
on the presence of @code{const}).  For vectors, where the stride is not
required to be unity, the pointer form is preferred.

@smallexample
/* real value, set on output */
int foo (double * x);

/* real vector, modified */
int foo (double * x, size_t stride, size_t n);

/* constant real vector */
int foo (const double * x, size_t stride, size_t n);

/* real array, modified */
int bar (double x[], size_t n);

/* real array, not modified */
int baz (const double x[], size_t n);
@end smallexample

@node Pointers, Constness, Arrays vs Pointers, Design
@section Pointers

Avoid dereferencing pointers on the right-hand side of an expression
where possible.  It's better to introduce a temporary variable.  This
is easier for the compiler to optimise and also more readable since it
avoids confusion between the use of @code{*} for multiplication and
dereferencing.
@example while (fabs (f) < 0.5) @{ *e = *e - 1; f *= 2; @} @end example @noindent is better written as, @example @{ int p = *e; while (fabs(f) < 0.5) @{ p--; f *= 2; @} *e = p; @} @end example @node Constness, Pseudo-templates, Pointers, Design @section Constness Use @code{const} in function prototypes wherever an object pointed to by a pointer is constant (obviously). For variables which are meaningfully constant within a function/scope use @code{const} also. This prevents you from accidentally modifying a variable which should be constant (e.g. length of an array, etc). It can also help the compiler do optimization. These comments also apply to arguments passed by value which should be made @code{const} when that is meaningful. @node Pseudo-templates, Arbitrary Constants, Constness, Design @section Pseudo-templates There are some pseudo-template macros available in @file{templates_on.h} and @file{templates_off.h}. See a directory link @file{block} for details on how to use them. Use sparingly, they are a bit of a nightmare, but unavoidable in places. In particular, the convention is: templates are used for operations on "data" only (vectors, matrices, statistics, sorting). This is intended to cover the case where the program must interface with an external data-source which produces a fixed type. e.g. a big array of char's produced by an 8-bit counter. All other functions can use double, for floating point, or the appropriate integer type for integers (e.g. unsigned long int for random numbers). It is not the intention to provide a fully templated version of the library. That would be "putting a quart into a pint pot". To summarize, almost everything should be in a "natural type" which is appropriate for typical usage, and templates are there to handle a few cases where it is unavoidable that other data-types will be encountered. For floating point work "double" is considered a "natural type". This sort of idea is a part of the C language. @node Arbitrary Constants, Test suites, Pseudo-templates, Design @section Arbitrary Constants Avoid arbitrary constants. For example, don't hard code "small" values like '1e-30', '1e-100' or @code{10*GSL_DBL_EPSILON} into the routines. This is not appropriate for a general purpose library. Compute values accurately using IEEE arithmetic. If errors are potentially significant then error terms should be estimated reliably and returned to the user, by analytically deriving an error propagation formula, not using guesswork. A careful consideration of the algorithm usually shows that arbitrary constants are unnecessary, and represent an important parameter which should be accessible to the user. For example, consider the following code: @example if (residual < 1e-30) @{ return 0.0; /* residual is zero within round-off error */ @} @end example @noindent This should be rewritten as, @example return residual; @end example @noindent in order to allow the user to determine whether the residual is significant or not. The only place where it is acceptable to use constants like @code{GSL_DBL_EPSILON} is in function approximations, (e.g.@: Taylor series, asymptotic expansions, etc). In these cases it is not an arbitrary constant, but an inherent part of the algorithm. @node Test suites, Compilation, Arbitrary Constants, Design @section Test suites The implementor of each module should provide a reasonable test suite for the routines. 
The test suite should be a program that uses the library and checks the result against known results, or invokes the library several times and does a statistical analysis on the results (for example in the case of random number generators). Ideally the one test program per directory should aim for 100% path coverage of the code. Obviously it would be a lot of work to really achieve this, so prioritize testing on the critical parts and use inspection for the rest. Test all the error conditions by explicitly provoking them, because we consider it a serious defect if the function does not return an error for an invalid parameter. N.B. Don't bother to test for null pointers -- it's sufficient for the library to segfault if the user provides an invalid pointer. The tests should be deterministic. Use the @code{gsl_test} functions provided to perform separate tests for each feature with a separate output PASS/FAIL line, so that any failure can be uniquely identified. Use realistic test cases with 'high entropy'. Tests on simple values such as 1 or 0 may not reveal bugs. For example, a test using a value of @math{x=1} will not pick up a missing factor of @math{x} in the code. Similarly, a test using a value of @math{x=0} will not pick any missing terms involving @math{x} in the code. Use values like @math{2.385} to avoid silent failures. If your test uses multiple values make sure there are no simple relations between them that could allow bugs to be missed through silent cancellations. If you need some random floats to put in the test programs use @code{od -f /dev/random} as a source of inspiration. Don't use @code{sprintf} to create output strings in the tests. It can cause hard to find bugs in the test programs themselves. The functions @code{gsl_test_...} support format string arguments so use these instead. @node Compilation, Thread-safety, Test suites, Design @section Compilation Make sure everything compiles cleanly. Use the strict compilation options for extra checking. @smallexample make CFLAGS="-ansi -pedantic -Werror -W -Wall -Wtraditional -Wconversion -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wstrict-prototypes -fshort-enums -fno-common -Wmissing-prototypes -Wnested-externs -Dinline= -g -O4" @end smallexample @noindent Also use @code{checkergcc} to check for memory problems on the stack and the heap. It's the best memory checking tool. If checkergcc isn't available then Electric Fence will check the heap, which is better than no checking. There is a new tool @code{valgrind} for checking memory access. Test the code with this as well. Make sure that the library will also compile with C++ compilers (g++). This should not be too much of a problem if you have been writing in ANSI C. @node Thread-safety, Legal issues, Compilation, Design @section Thread-safety The library should be usable in thread-safe programs. All the functions should be thread-safe, in the sense that they shouldn't use static variables. We don't require everything to be completely thread safe, but anything that isn't should be obvious. For example, some global variables are used to control the overall behavior of the library (range-checking on/off, function to call on fatal error, etc). Since these are accessed directly by the user it is obvious to the multi-threaded programmer that they shouldn't be modified by different threads. There is no need to provide any explicit support for threads (e.g. 
locking mechanisms etc), just to avoid anything which would make it
impossible for someone to call a GSL routine from a multithreaded
program.

@node Legal issues, Non-UNIX portability, Thread-safety, Design
@section Legal issues

@itemize @bullet
@item
Each contributor must make sure their code is under the GNU General
Public License (GPL).  This means getting a disclaimer from their
employer.

@item
We must clearly understand ownership of existing code and algorithms.

@item
Each contributor can retain ownership of their code, or sign it over to
FSF as they prefer.  There is a standard disclaimer in the GPL (take a
look at it).  The more specific you make your disclaimer the more
likely it is that it will be accepted by an employer.  For example,

@smallexample
Yoyodyne, Inc., hereby disclaims all copyright interest in the
software `GNU Scientific Library - Legendre Functions' (routines for
computing Legendre functions numerically in C) written by James Hacker.

<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
@end smallexample

@item
Obviously: don't use or translate non-free code.  In particular don't
copy or translate code from @cite{Numerical Recipes} or @cite{ACM
TOMS}.

Numerical Recipes is under a strict license and is not free software.
The publishers Cambridge University Press claim copyright on all
aspects of the book and the code, including function names, variable
names and ordering of mathematical subexpressions.  Routines in GSL
should not refer to Numerical Recipes or be based on it in any way.

The ACM algorithms published in TOMS (Transactions on Mathematical
Software) are not public domain, even though they are distributed on
the internet -- the ACM uses a special non-commercial license which is
not compatible with the GPL.  The details of this license can be found
on the cover page of ACM Transactions on Mathematical Software or on
the ACM Website.

Only use code which is explicitly under a free license: GPL or Public
Domain.  If there is no license on the code then this does not mean it
is public domain -- an explicit statement is required.  If in doubt
check with the author.

@item
I @strong{think} one can reference algorithms from classic books on
numerical analysis (BJG: yes, provided the code is an independent
implementation and not copied from any existing software.  For example,
it would be ok to read the papers in ACM TOMS and make an independent
implementation from their description).
@end itemize

@node Non-UNIX portability, Compatibility with other libraries, Legal issues, Design
@section Non-UNIX portability

There is good reason to make this library work on non-UNIX systems.  It
is probably safe to ignore DOS and only worry about windows95/windowsNT
portability (so filenames can be long, I think).  On the other hand,
nobody should be forced to use non-UNIX systems for development.  The
best solution is probably to issue guidelines for portability, like
saying "don't use XYZ unless you absolutely have to".  Then the Windows
people will be able to do their porting.

@node Compatibility with other libraries, Parallelism, Non-UNIX portability, Design
@section Compatibility with other libraries

We do not regard compatibility with other numerical libraries as a
priority.  However, other libraries, such as Numerical Recipes, are
widely used.  If somebody writes the code to allow drop-in replacement
of these libraries it would be useful to people.  If it is done, it
would be as a separate wrapper that can be maintained and shipped
separately.
There is a separate issue of system libraries, such as BSD math library and functions like @code{expm1}, @code{log1p}, @code{hypot}. The functions in this library are available on nearly every platform (but not all). In this case, it is best to write code in terms of these native functions to take advantage of the vendor-supplied system library (for example log1p is a machine instruction on the Intel x86). The library also provides portable implementations e.g. @code{gsl_hypot} which are used as an automatic fall back via autoconf when necessary. See the usage of @code{hypot} in @file{gsl/complex/math.c}, the implementation of @code{gsl_hypot} and the corresponding parts of files @file{configure.in} and @file{config.h.in} as an example. @node Parallelism, Precision, Compatibility with other libraries, Design @section Parallelism We don't intend to provide support for parallelism within the library itself. A parallel library would require a completely different design and would carry overhead that other applications do not need. @node Precision, Miscellaneous, Parallelism, Design @section Precision For algorithms which use cutoffs or other precision-related terms please express these in terms of @code{GSL_DBL_EPSILON} and @code{GSL_DBL_MIN}, or powers or combinations of these. This makes it easier to port the routines to different precisions. @node Miscellaneous, , Precision, Design @section Miscellaneous Don't use the letter @code{l} as a variable name --- it is difficult to distinguish from the number @code{1}. (This seems to be a favorite in old Fortran programs). Final tip: one perfect routine is better than any number of routines containing errors. @node Bibliography, Copying, Design, Top @chapter Bibliography @section General numerics @itemize @item @cite{Numerical Computation} (2 Volumes) by C.W. Ueberhuber, Springer 1997, ISBN 3540620583 (Vol 1) and ISBN 3540620575 (Vol 2). @item @cite{Accuracy and Stability of Numerical Algorithms} by N.J. Higham, SIAM, ISBN 0898715210. @item @cite{Sources and Development of Mathematical Software} edited by W.R. Cowell, Prentice Hall, ISBN 0138235015. @item @cite{A Survey of Numerical Mathematics (2 vols)} by D.M. Young and R.T. Gregory, ISBN 0486656918, ISBN 0486656926. @item @cite{Methods and Programs for Mathematical Functions} by Stephen L. Moshier, Hard to find (ISBN 13578980X or 0135789982, possibly others). @item @cite{Numerical Methods That Work} by Forman S. Acton, ISBN 0883854503. @item @cite{Real Computing Made Real: Preventing Errors in Scientific and Engineering Calculations} by Forman S. Acton, ISBN 0486442217. @end itemize @section Reference @itemize @item @cite{Handbook of Mathematical Functions} edited by Abramowitz & Stegun, Dover, ISBN 0486612724. @item @cite{The Art of Computer Programming} (3rd Edition, 3 Volumes) by D. Knuth, Addison Wesley, ISBN 0201485419. @end itemize @section Subject specific @itemize @item @cite{Matrix Computations} (3rd Ed) by G.H. Golub, C.F. Van Loan, Johns Hopkins University Press 1996, ISBN 0801854148. @item @cite{LAPACK Users' Guide} (3rd Edition), SIAM 1999, ISBN 0898714478. @item @cite{Treatise on the Theory of Bessel Functions 2ND Edition} by G N Watson, ISBN 0521483913. @item @cite{Higher Transcendental Functions satisfying nonhomogeneous linear differential equations} by A W Babister, ISBN 1114401773. 
@end itemize @node Copying, GNU Free Documentation License, Bibliography, Top @unnumbered Copying The subroutines and source code in the @value{GSL} package are "free"; this means that everyone is free to use them and free to redistribute them on a free basis. The @value{GSL}-related programs are not in the public domain; they are copyrighted and there are restrictions on their distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of these programs that they might get from you. Specifically, we want to make sure that you have the right to give away copies of the programs that relate to @value{GSL}, that you receive source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things. To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of the @value{GSL}-related code, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights. Also, for our own protection, we must make certain that everyone finds out that there is no warranty for the programs that relate to @value{GSL}. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation. The precise conditions of the licenses for the programs currently being distributed that relate to @value{GSL} are found in the General Public Licenses that accompany them. @node GNU Free Documentation License, , Copying, Top @unnumbered GNU Free Documentation License @include fdl.texi @c @printindex cp @c @node Function Index @c @unnumbered Function Index @c @printindex fn @c @node Variable Index @c @unnumbered Variable Index @c @printindex vr @c @node Type Index @c @unnumbered Type Index @c @printindex tp @bye