Automatic differentiation

30 May, 2005

Upgrading from a previous version? - see History for changes.

We want to calculate a value f(t1,...,tn) that depends on parameters t1,...,tn. But, in addition, we want to calculate the first and possibly second derivatives of f(t1,...,tn) with respect to t1,...,tn. For example, we want to find the maximum or minimum of f(t1,...,tn) and want to use an optimisation method involving derivatives.

The aim of the automatic differentiation package is to allow us to calculate the derivates using a program that is just a minor modification of the program that calculates f. This greatly reduces the work involved and greatly increases the chances of getting the correct answer.

The present version of the automatic differentiation library is a concept testing version. It really works, but is neither complete nor very efficient.

The major reference about Automatic Differentiation is

Griewank, Andreas (2000) Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, Philadelphia.

The VWD and VWD2 classes in my library represent what Griewank describes as forward mode. I think RVWD represents reverse mode although the organisation is slightly different.

The general approach

At the beginning of the program you need to define the parameter set. Suppose you have three parameters: alpha, beta and gamma.

Then you define a parameter set object, PS, as follows.

ParameterSet PS;
PS.AddParameter("alpha");
PS.AddParameter("beta");
PS.AddParameter("gamma"); 

The parameter set holds the names of the parameters and is used for simplifying moving the values in and out of arrays, printing results and making sure the user doesn't confuse two different parameter sets.

Do not try to define the same parameter set twice as the program will not recognise these as the same parameter set. But you can copy parameter sets either with = or as a parameter in a function.

Now consider the calculation loops of the program. Suppose, following the convention of newmat10, you have Real variables alpha, beta and gamma. Then, you need to define corresponding VWD objects (assuming you want just first derivatives)

Real alpha, beta, gamma;

... put values in alpha, beta and gamma

VWD Alpha(PS, alpha, "alpha");
VWD Beta(PS, beta, "beta");
VWD Gamma(PS, gamma, "gamma");

If you also want second derivatives use VWD2 objects.

If you want to use reverse mode, use RVWD objects - only first derivatives at present.

Then write your program at though you were just calculating f, but use Alpha, Beta and Gamma in place of alpha, beta and gamma and declare any intermediate variables to be of type VWD. But be careful not to use alpha when you mean to use Alpha.

Then both the value and the derivatives can be extracted from the result of your calculation.

At present you can use the following operators and functions with VWD, VWD2 and RVWD objects: =, +, -, *, /, +=, -=, *=, /=, pow, exp, log, sin, cos, tan.

You must not add parameters to the parameter set after you have defined VWD, VWD2 or RVWD objects based on it.

The classes and functions

ParameterSet

ParameterSet();

the main constructor.

int AddParameter(const String& name);

add another parameter to the set.

int LocateParameter(const String& name) const;

find the index number of a parameter (the first one has value 1).

String& operator()(int i) const;

find the name corresponding to an index.

int Size() const;

the number of parameters.

friend bool operator==(const ParameterSet& ps1, const ParameterSet& ps2);

check that two parameter sets are copies of the same set.

friend bool operator!=(const ParameterSet& ps1, const ParameterSet& ps2);

check that two parameter sets are not copies of the same set.

friend void AssertEqual(const ParameterSet& ps1, const ParameterSet& ps2);

throw an exception if two parameter sets are not copies of the same set.

The ParameterSet object is really just a pointer to a ParameterSetClass object that contains the real information. The ParameterSetClass object contains a reference counter to see whether anything is referencing it and will destroy itself when nothing references it. Users should never refer directly to the ParameterSetClass object.

VWD

VWD();

default constructor. You can't use a VWD object created with the default constructor since the ParameterSet is not defined. It must first be set equal to some other VWD object. You should be wary of using the default constructor in a global statement. This is intended to allow one to build arrays or lists of VWDs.

VWD(Real v);

construct a VWD object with value v, derivatives = 0. This version creates an object that initially can be used only with +=. The parameter set value is set on the first use of +=.

VWD(const ParameterSet& ps, Real v);

construct a VWD object with parameter set ps and value v, derivatives = 0.

VWD(const ParameterSet& ps, Real v, const RowVector& d);

construct a VWD object with parameter set ps and value v, derivatives d.

VWD(const ParameterSet& ps, Real v, int k);

construct a VWD object with value v, derivative with k-th index = 1, other derivatives = 0.

VWD(const ParameterSet& ps, Real v, const String& name);

construct a VWD object with value v, derivative with respect to parameter name = 1, other derivatives = 0.

Real GetValue() const;

return the value.

ReturnMatrix GetDerivatives() const;

return the derivatives (as RowVector).

Real GetDerivative(int i) const;

return the i-th derivative.

ParameterSet GetParameterSet();

return the parameter set.

VWD2

See the corresponding functions under VWD.

VWD2();
VWD2(Real v);
VWD2(const ParameterSet& ps, Real v);
VWD2(const ParameterSet& ps, Real v, const RowVector& d, const Matrix& d2);
VWD2(const ParameterSet& ps, Real v, int k);
VWD2(const ParameterSet& ps, Real v, const String& name);
Real GetValue() const;
ReturnMatrix GetDerivatives() const;         // as RowVector
ReturnMatrix GetSecondDerivatives() const;   // as Matrix
Real GetDerivative(int i) const;
Real GetSecondDerivative(int i, int j) const;
ParameterSet GetParameterSet();

TAPE

In order to use the reverse mode one first sets up a TAPE object.

TAPE(const ParameterSet& ps, int n, int m);

The parameter n defines the length of the tape and must be large enough to contain details of each calculation (+, -, *, /, sin, cos, etc) carried out in reverse mode, but not so large that there is insufficient memory to store the tape. The parameter m should be set equal to or larger than the number of VWDs to be turned into RVWDs or RVWDs built directly - see next section. (Ideally, the TAPE object should be replaced by something like the STL vector class that can increase its length during the course of a calculation.)

Note that you can add additional parameters to the parameter set after defining the TAPE up until the first RVWD is defined.

The TAPE object is really just a pointer to a TAPE_Class object that contains the real information. The TAPE_Class object contains a reference counter to see whether anything is referencing it and will destroy itself when nothing references it. Users should never refer directly to the TAPE_Class object.

RVWD

You use reverse mode by doing your calculations with RVWD objects rather than VWD objects and then converting to a VWD object at the end of the calculation. You start by setting up the initial VWD objects as before and then building RVWD objects from them.

You can define a RVWD based on a VWD with the constructor:

RVWD(const TAPE& t, const VWD& vwd);

or you can build it directly:

RVWD(const TAPE& t, Real v, const String& name);
RVWD(const TAPE& t, Real v, int k);

You can also initialise a RVWD to a constant

RVWD(const TAPE& t, Real v);

There is also the default constructor, which like the default constructor for VWD cannot be used until set equal to a valid value. There is also a constructor RVWD(Real) which like VWD(Real) must initially be used with +=.

Then RVWDs can be manipulated in the same way as Reals, just as VWDs can be manipulated in the same way as Reals. Because the information needed to recover the derivatives is stored on the TAPE rather than being manipulated at each operation there is the potential for a much faster operation at the expense of the TAPE requiring a large amount of storage.

You can convert a RVWD back to a VWD with the constructor (or automatic conversion)

VWD(const RVWD& rvwd);

Use this to get the final result of a calculation. This conversion is slow; comparable in speed to the original calculation since it requires a sweep through the TAPE. Hence the use of reverse mode is most effective when you want only one or two numbers (and their derivatives) from your calculation. Be careful not to cause accidental conversions from RVWD to VWD through including VWD and RVWD objects in the same formula.

Alternatively you can recover the value and derivatives from a RVWD with

Real GetValue() const;
ReturnMatrix GetDerivatives() const;    // as RowVector

GetDerivatives uses the sweep through the TAPE so is slow in the same way that converting to a VWD is slow. 

When a calculation is complete you can delete all the information on the TAPE object with the TAPE::reset() member function. Before this function is called all RVWDs referring to the TAPE object must either be destroyed or assigned to a constant.

Alternatively use the TAPE::purge() member function to delete obsolete material following a calculation. Use at the beginning of a new C++ block rather than at the end of an old one so that all unwanted RVWDs will have been destroyed. Alternatively you can deactivate an unwanted RVWD with a command like rvwd = RVWD(Tape, 0).

If you are running more than one calculation in a program it is important to run reset or purge or start a new tape between calculations. Otherwise the reverse sweep operation will waste time by repeating the previous calculations as well as the current ones.

Use  TAPE::purge(RVWD& rvwd) when you have just calculated a value for rvwd and a number of  RVWDs required to build rvwd have been destroyed. For example, rvwd has just been returned from a function. This will recover the space on the tape used by the destroyed RVWDs. To recover space you need to have tape-elements corresponding to destroyed RVWDs followed just by the tape-elements currently corresponding to rvwd.

The essence of effective use of reverse mode seems to be to avoid requiring too much TAPE space and currently purge is the main tool in this library for doing this.

You can see roughly how reverse mode - as I have implemented it -  works by looking at the corresponding code for *, +, sin etc for the VWD and RVWD classes. In the VWD class the derivatives are calculated immediately and involve linear operations on the row vectors of derivatives; in the RVWD class the multipliers of these derivatives are simply loaded onto the TAPE. When the time comes to extract the derivative the TAPE_Class::reverse_sweep() function is able to amalgamate the results with a minimum of vector calculation.

Numerical integration

Suppose the calculation of f involves a one dimensional numerical integration. This package enables you simultaneously to calculate the integrals of the derivatives. For example, we might want to find

       2
   Integral exp(ax + b) dx
      x=1

together with its derivatives with respect to a and b. (Of course, in this case, we can do it all analytically).

At present I use 32 bit Gaussian numerical integration. This works remarkably well in many cases. However, it is up to you to tell whether it is appropriate for your problem (and you need to consider both the values and the derivatives).

Derive the function you need to integrate from the class VWDOfReal, VWD2OfReal or RVWDOfReal. You will probably need a new constructor and you will need to override operator()().

Then the numerical integration is carried out by

VWD GaussianIntegration32(VWDOfReal& function, Real Lower, Real Upper);

or

VWD2 GaussianIntegration32(VWD2OfReal& function, Real Lower, Real Upper);

or

RVWD GaussianIntegration32(RVWDOfReal& function, Real Lower, Real Upper);

Compiling

The program requires newmat and my string library.

You will need to #include vwd.h or rvwd.h.

The program has been tested under Borland C++ version 5.0, 5.5, 5.6  Microsoft C++ version 6,7, Gnu G++ 3.3, 3.4, Intel compilers, 8.1, for Windows and Linux, and Sun C++ version 6.

I have included make files for compiling the test program with a variety of compilers. You can use the genmake program for generating make files for a number of other compilers.

I have not implemented the clean-up functions required by my simulated exceptions package so if you are using my simulated exceptions there may be memory leaks if you throw an exception. 

I have not set up the namespace options used in newmat.

Testing

A test program, test_vwd.cpp, is included. This uses templates so won't work with compilers that don't allow simple templates. With templates, we can use the same source code with Real, VWD, VWD2 and RVWD. The test program calculates the same expression using two rather different functions and we can check that the answers are the same. Output from Borland 5.6 is in test_vwd.txt. Note that some compilers will give different printouts of the contents of the tape, presumably because of a different order in which expressions are evaluated.

History

Files

vwd.h c++ definition file
vwd.cpp c++ bodies
rvwd.h c++ definition file - reverse AD
rvwd.cpp c++ bodies - reverse AD
test_vwd.cpp test program
test_vwd.txt output from test program
autodiff.htm this file
rbd.css style sheet used by autodiff.htm
vwd_b55.mak make file for use with Borland C++ 5.5
vwd_b56.mak make file for use with Borland C++ 5.6
vwd_gnu.mak make file for use with GCC
vwd_cc.mak make file for use with CC
vwd_m6.mak make file for use with Microsoft VC++ 6 and 7
vwd_i8.mak make file for use with Intel C++ 8.1 for Windows
vwd_il8.mak make file for use with Intel C++ 8.1 for Linux
vwd_ow.mak make file for use with Open Watcom