HELP MLP                                                 David Young
                                                         August 1998

MULTI-LAYER PERCEPTRONS

This help file describes facilities in LIB * MLP, which implements
multi-layer perceptrons, a class of artificial neural network
popularised by the book "Parallel Distributed Processing" (D.E.
Rumelhart, J.L. McClelland & the PDP Research Group, MIT Press, 1986),
Vol. 1, Chapter 8 (referred to below as the PDP book). The
back-propagation algorithm is used to carry out gradient descent
training.

For a tutorial introduction with examples, see TEACH * MLP.

Where the terms "lower" and "higher" are used, the net is to be
pictured with the inputs at the bottom.

         CONTENTS - (Use <ENTER> g to access required sections)

  1   Net records
  1.1   Net creation and resetting
  1.2   Simple weight and bias access and update
  1.3   Net parameter access and update
  1.4   Net array access
  1.5   Transfer function specification

  2   Data records
  2.1   Record creation
  2.2   Data record access

  3   Net training and testing
  3.1   Training
  3.2   Testing

  4   Other facilities
  4.1   Printing
  4.2   Copying
  4.3   Random number generation

--------------
1  Net records
--------------

1.1  Net creation and resetting
-------------------------------

mlp_makenet(nin, nunits, wtrange, eta) -> net                [procedure]
mlp_makenet(nin, nunits, wtrange, eta, alpha) -> net
mlp_makenet(nin, nunits, wtrange, eta, alpha, decay) -> net
        Creates and returns a network record. The arguments are:

        nin     An integer giving the number of inputs.

        nunits  A vector giving the number of units in each layer of
                the network, starting with the lowest hidden layer (if
                there is one) and finishing with the output layer. If
                all the units are to use the logistic output function
                as in the PDP book, then each entry in nunits can simply
                be an integer.

                Alternative output functions can be specified.
                If the entry for a layer is a 2-element vector, then the
                first element of this inner vector must be an integer
                giving the number of units in the layer. The second
                element may be either a word giving the transfer
                function to be used by every unit in the layer, or a
                vector of words giving the individual transfer function
                for each unit in turn (see the examples in TEACH * MLP).
                In the latter case, the length of the vector of words
                must equal the number of units specified.

                The words available are

                    o "identity" for the identity function
                    o "tanh" for the hyperbolic tangent
                    o "logistic" for the logistic function
                    o "logistic_fast" for a fast approximation to the
                      logistic function

                The "logistic_fast" option gives a substantial speedup
                over "logistic". It does this by using an approximation
                to the exponential function that relies on the hardware
                representation of floating-point quantities. See N.N.
                Schraudolph (1998), 'A fast, compact approximation of
                the exponential function', Technical Report IDSIA-07-98,
                IDSIA, Lugano, Switzerland, submitted to Neural
                Computation. The option is known to work on Sun Solaris
                machines with the gcc compiler, and should be tested
                before it is used on other machines. The approximation
                is fairly close, but on delicate problems the results
                should be compared with those from the ordinary logistic
                function.

        wtrange A number specifying the range of the initial random
                weights and biases. A uniform distribution from
                -wtrange/2 to +wtrange/2 is used. If wtrange is
                negative, the weights and biases are set to zero.

        eta     A number specifying the learning rate as in the PDP
                book. The basic learning rule for a given weight or
                bias (provided it is unclamped) is

                    w' = w - eta * Ew

                where w is a weight value, w' is its value after the
                update, and Ew is an estimate of the derivative of the
                error with respect to this weight. In continuous
                training, this equation is applied after every example.
                In batch training, it is applied after each batch of
                examples.
                See the note in TEACH * MLP about how the effective
                value of eta is affected by alpha.

        alpha   A number specifying the momentum as in the PDP book,
                when training is continuous. If omitted, it defaults to
                0.0, i.e. no momentum. In continuous training, the
                derivative estimate used to update a weight is updated
                after each example using

                    Ew' = Ewc + alpha * Ew

                where Ewc is the derivative estimate for the current
                example, obtained using the back-propagation algorithm.
                This amounts to passing the Ewc values through a
                smoothing filter with exponentially decaying
                coefficients to obtain Ew. In batch training, alpha is
                ignored, and Ew is obtained by averaging the values of
                Ewc over the batch.

        decay   A number specifying the amount of weight decay to carry
                out after each example is presented during learning. If
                decay is specified, then alpha must also be specified.
                If decay is omitted or negative, no weight decay occurs.
                If it is non-negative, then after an example has been
                presented and the weights and biases have all been
                updated, every unclamped weight and bias is multiplied
                by decay.


wtrange -> mlp_resetnet(net)                                 [procedure]
        Resets a net to an untrained state, as if it had just been
        created. The weights and biases are set to random values from a
        uniform distribution from -wtrange/2 to +wtrange/2, unless
        wtrange is negative, in which case they are set to zero. The
        activation and rate-of-change arrays are all set to zero.


1.2  Simple weight and bias access and update
---------------------------------------------

mlp_weight(level, unit1, unit2, net) -> float                [procedure]
num -> mlp_weight(level, unit1, unit2, net)
        Returns or updates a specified weight or bias. level is an
        integer: 1 means the weights from the inputs to the lowest
        hidden layer, 2 means those from the lowest hidden layer to the
        next layer, and so on.
        unit1 and unit2 are integers specifying the unit in the lower
        layer and the unit in the upper layer respectively. unit1 may
        be 0 or <false> to specify the bias for unit2. net is a network
        record.

        For frequent or large-scale setting or access of weights, the
        weight arrays can be accessed directly using the procedures in
        section 1.4.

mlp_clamp(level, unit1, unit2, net) -> bool                  [procedure]
bool -> mlp_clamp(level, unit1, unit2, net)
        Returns or updates the clamped status of a weight or bias.
        level, unit1, unit2 and net are as for mlp_weight. The value
        <true> means that the weight or bias is clamped, i.e. it is not
        changed during training. (When this procedure is used, there is
        no need to assign to mlp_clamped, described below. That is only
        needed if the clamped status is changed by direct access to the
        arrays, as described under mlp_etas and mlp_etbs below.)


1.3  Net parameter access and update
------------------------------------

mlp_nlevels(net) -> int                                      [procedure]
        Returns the number of layers of units, not including the
        "input units". For the most common architecture of one hidden
        layer and one output layer, this returns 2.

mlp_ninunits(net) -> int                                     [procedure]
        Returns the number of inputs.

mlp_nhunits(net) -> vec                                      [procedure]
        Returns a vector of integers giving the number of units in each
        layer, starting with the lowest hidden layer and ending with
        the output layer.

mlp_noutunits(net) -> int                                    [procedure]
        Returns the number of output units (the same as
        last(mlp_nhunits(net))).

mlp_ntunits(net) -> int                                      [procedure]
        Returns the total number of units in the net, not including the
        "input units".

mlp_nweights(net) -> int                                     [procedure]
        Returns the total number of weights in the net (not counting
        biases).

mlp_eta(net) -> num                                          [procedure]
num -> mlp_eta(net)
        The forward procedure returns the global value of the learning
        rate.
        This value may have been overridden for individual biases or
        weights. The updater sets the learning rate for all biases and
        weights that have not been clamped.

mlp_alpha(net) -> num                                        [procedure]
num -> mlp_alpha(net)
        Returns or updates the momentum constant for learning. This
        applies to all unclamped weights and biases.

mlp_clamped(net) -> bool_or_word                             [procedure]
bool_or_word -> mlp_clamped(net)
        This is used to record whether weights or biases have been
        clamped. Calling the updater does not actually cause anything
        to be clamped. Returns <true> if it is certain that a weight or
        bias has been clamped, <false> if it is certain that none has,
        and "maybe" if it is possible that a weight or bias has been
        clamped somewhere in the network. Users who clamp or unclamp
        weights or biases by direct access to the etas and etbs arrays
        must also assign the correct value to mlp_clamped. It is always
        safe to assign the word "maybe" using this updater.


1.4  Net array access
---------------------

mlp_activs(net) -> vec                                       [procedure]
        Returns a vectorclass object of arrays. After a call to
        mlp_response, these contain the current activations of the
        units; after a call to mlp_learn, they contain the current
        errors. vec(1) is the array for the lowest hidden layer (if
        there is one), and so on up to last(vec), which is for the
        output layer. For a given array arr in the vector, arr(i) is
        the activation or error of the i'th unit in the layer, numbered
        from 1. The elements of vec cannot be updated, though the
        elements of the arrays can.

mlp_actvec(net) -> floatvec                                  [procedure]
floatvec -> mlp_actvec(net)
        floatvec is a vector of single precision floats which combines
        all the arrays in the vector returned by mlp_activs. In effect
        it consists of the arrayvectors of the individual level arrays,
        concatenated in order.
        Thus the first element stores the activation of the first unit
        in the lowest layer, and the last element is the activation of
        the last unit in the output layer. The length of the vector is
        the number returned by mlp_ntunits. If the updater is used, the
        new vector must be a packed vector of floats (not a full
        vector) and must be the correct length. The same restriction
        applies to the updaters of all the combined vector access
        procedures below, unless otherwise stated.

mlp_biases(net) -> list                                      [procedure]
mlp_bsvec(net) -> floatvec                                   [procedure]
floatvec -> mlp_bsvec(net)
        As mlp_activs and mlp_actvec, but containing the unit biases.

mlp_bschange(net) -> list                                    [procedure]
mlp_bschvec(net) -> floatvec                                 [procedure]
floatvec -> mlp_bschvec(net)
        As mlp_activs and mlp_actvec, but containing the most recent
        adjustments made by the learning algorithm to the unit biases.

mlp_etbs(net) -> list                                        [procedure]
mlp_etbvec(net) -> floatvec                                  [procedure]
floatvec -> mlp_etbvec(net)
        As mlp_activs and mlp_actvec, but containing the learning rates
        associated with the unit biases. If a learning rate is made
        negative, then the associated bias is clamped, i.e. it is not
        changed during training. If the sign of a learning rate is
        changed, the appropriate value must be assigned to
        mlp_clamped(net).

mlp_tranfns(net) -> list                                     [procedure]
mlp_tranfnvec(net) -> intvec                                 [procedure]
intvec -> mlp_tranfnvec(net)
        As mlp_activs and mlp_actvec, but containing the integers that
        index the transfer (output) functions associated with the
        units. See mlp_transfuncs below for the meaning of the integer
        codes. If the updater is used, the vector must be an integer
        vector, and every element must be one of the codes returned by
        mlp_transfuncs.

mlp_weights(net) -> vec                                      [procedure]
        Returns a vectorclass object of weight arrays.
        vec(1) has the weights from the inputs to the lowest hidden
        layer (if there is one), vec(2) those from the lowest hidden
        layer to the second hidden layer, and so on up to last(vec),
        which has the weights from the top hidden layer to the output
        layer. For a given array wts in the vector, wts(i, j) is the
        weight from the i'th unit in the lower layer to the j'th unit
        in the higher layer. This is the opposite of the subscript
        ordering used in the matrix notation of the PDP book and other
        books. The elements of vec cannot be updated, though the
        elements of the arrays can.

mlp_wtvec(net) -> floatvec                                   [procedure]
floatvec -> mlp_wtvec(net)
        floatvec contains all the weights in the network. In effect it
        is the concatenation, in order, of the arrayvectors of the
        arrays in the vector returned by mlp_weights. Its length is the
        value returned by mlp_nweights.

mlp_wtchange(net) -> list                                    [procedure]
mlp_wtchvec(net) -> floatvec                                 [procedure]
floatvec -> mlp_wtchvec(net)
        As mlp_weights and mlp_wtvec, but containing the latest
        adjustments made to the weights by a call to mlp_learn.

mlp_etas(net) -> list                                        [procedure]
mlp_etavec(net) -> floatvec                                  [procedure]
floatvec -> mlp_etavec(net)
        As mlp_weights and mlp_wtvec, but containing the learning rates
        for the individual weights. If a learning rate is negative, the
        corresponding weight is clamped, i.e. it is not changed during
        training. If the sign of a learning rate is changed, the
        appropriate value must be assigned to mlp_clamped(net).


1.5  Transfer function specification
------------------------------------

mlp_transfuncs(word) -> int                                   [constant]
        This is a property which returns the integer code corresponding
        to a word used to specify a transfer (output) function. See
        TEACH * MLP for how to find out which words are available.
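The transfer functions named above can be pictured as ordinary scalar functions. The Python sketch below is not the code of LIB MLP. It is a minimal illustration, assuming IEEE 754 doubles, of the idea in Schraudolph's report on which "logistic_fast" relies. A scaled and shifted argument is written straight into the high 32 bits of a double, so that the exponent field does most of the work of exp. The constants are the ones published by Schraudolph. The names fast_exp and TRANSFER are hypothetical, and TRANSFER merely stands in for the word-to-code mapping held by mlp_transfuncs.

```python
import math
import struct

# Schraudolph's constants for IEEE 754 doubles (assumed here, not taken
# from LIB MLP): the argument is scaled by 2**20/ln 2, shifted by the
# exponent bias, and written into the high 32 bits of a double.
EXP_A = 1048576 / math.log(2)   # 2**20 / ln 2
EXP_B = 1072693248              # 1023 * 2**20 (exponent bias in place)
EXP_C = 60801                   # correction reducing the RMS error

def fast_exp(y):
    """Approximate exp(y); only meaningful for roughly -700 < y < 700."""
    high = int(EXP_A * y + (EXP_B - EXP_C))
    # Put `high` in the upper 32 bits, zero the lower 32 bits, and
    # reinterpret the 64-bit pattern as a little-endian double.
    return struct.unpack("<d", struct.pack("<q", high << 32))[0]

def identity(x):
    return x

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_fast(x):
    # Same formula as logistic, with exp replaced by the approximation.
    return 1.0 / (1.0 + fast_exp(-x))

# Hypothetical stand-in for the words accepted in the nunits argument.
TRANSFER = {
    "identity": identity,
    "tanh": math.tanh,
    "logistic": logistic,
    "logistic_fast": logistic_fast,
}

if __name__ == "__main__":
    xs = [i / 10 for i in range(-100, 101)]
    worst = max(abs(logistic_fast(x) - logistic(x)) for x in xs)
    print("largest difference on [-10, 10]:", worst)
```

To make the comparison recommended above for delicate problems, run logistic and logistic_fast over a grid of inputs and take the largest difference.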
---------------
2  Data records
---------------

2.1  Record creation
--------------------

mlp_makedata(data, niter, ransel) -> datarec                 [procedure]
mlp_makedata(data, mask, pstart, pinc, pend, niter, ransel) -> datarec
mlp_makedata(data, nunits, nstart, ninc, nend, niter, ransel) -> datarec
        This procedure produces a data record suitable for training or
        testing the networks defined above. The same procedure is used
        to produce records for the inputs, targets and outputs of a
        network. All the arguments except data are optional. Most uses
        of the procedure do not require the full complexity, and
        examples are given in TEACH * MLP.

        In the first form, the arguments are as follows:

        data    An array of data. Suppose this was created with
                *newsfloatarray or some other procedure that returns a
                packed array of single precision floats, and the array
                is not offset in its arrayvector. Then mlp_makedata puts
                a pointer to the array into datarec. Otherwise a new
                array is created and the data are copied to it. Putting
                a pointer into the record uses less memory and is
                faster. It also means that the results of testing can be
                accessed by reference to the original variable.

                If data is 1-dimensional, then it is taken to hold a
                single example at any time (of input data, targets or
                output results). The value in data(i) corresponds to the
                i'th input or output unit (depending on whether datarec
                is used as input, or as target or output). The length of
                data should therefore match the number of inputs or the
                number of outputs of the net.

                If data is 2-dimensional, then each column (in the
                matrix convention that the first subscript is the row
                number and the second is the column number) holds one
                example. That is, data(i, j) is the value for the i'th
                input or output unit and the j'th example. The array's
                dimensions are (no. of input or output units) x (no. of
                examples).

        niter   This is optional, but if present then ransel must also
                be given. When datarec is used as a target record, niter
                sets the number of iterations for which the net is to be
                trained. It defaults to 1.

                If niter is an integer or <false>, then continuous
                learning is used (the weights and biases are updated
                after every example), and niter examples are shown to
                the network on a single call to mlp_learn. The value
                <false> means that only a single back propagation of
                errors is carried out.

                If niter is a structure (such as a pair, a vector or a
                list), batch learning is carried out. The first value in
                niter gives the number of batches to be presented, and
                the second gives the number of examples in each batch.
                If the second element is <false>, the number of examples
                in each batch is set equal to the number of examples in
                the training data. Weight updating occurs at the end of
                each batch.

        ransel  This is optional, but if present then niter must also
                be given. When datarec is used as a target record, the
                boolean ransel specifies whether training examples are
                selected from the example set in a random order
                (<true>) or cyclically (<false>).

        In the second form, the arguments are:

        data    As for the first form, but the array may have any number
                of dimensions. Each example is represented by a group of
                elements in the array. These elements have a fixed set
                of offsets relative to one another, but may have a
                variety of absolute coordinates in the array.

        mask    This is a list of vectors. The length of the list is
                equal to the number of input or output units, and the
                length of each vector is equal to the number of
                dimensions of data. Each vector gives the offset of an
                array element relative to an arbitrary origin. A set of
                elements of data defined relative to a valid origin
                forms a single example.
                If mask is omitted, it defaults to the set of offsets
                which allows data to be tessellated without overlap,
                given the set of origins specified by pinc. In that case
                pinc must not be omitted.

        pstart  This is a vector of integers, with length equal to the
                number of dimensions of data. It gives the coordinates
                of the origin for the example with the lowest values for
                its coordinates. The element specified by pstart must
                lie inside the bounds of data. If pstart is omitted, it
                defaults to the smallest possible values given the
                boundslist of data and the offsets in mask. If pstart is
                omitted, pend must also be omitted.

        pinc    This is a vector of positive integers which specifies
                the amount to jump along each dimension of data to move
                the origin from one valid example to the next. If pinc
                is omitted, it defaults to {1 1 1 ...}, with as many 1's
                as data has dimensions. That is, the data are sampled as
                densely as possible, and if the examples involve more
                than a single element then they overlap. If pinc is
                omitted, then pstart and pend must also be omitted. If
                mask is omitted, then the product of the elements of
                pinc must equal the number of input or output units.

        pend    Like pstart, but giving the origin position with the
                highest-valued coordinates. If omitted, it defaults to
                the highest values possible, and pstart must also be
                omitted. Each element of pend must be greater than or
                equal to the corresponding element of pstart.

        niter, ransel
                As for the first form.

        The third form is shorthand for a particular case of the second
        form, for simple time series data. The arguments are:

        data    As for the first form, but must be a 1-dimensional
                array. Examples are taken from contiguous sections of
                the array.

        nunits  An integer giving the number of input or output units.
        nstart  An integer giving the start point in the array of the
                example with the lowest coordinates. If omitted, it
                defaults to the lower array bound, and nend must also be
                omitted.

        ninc    An integer giving the amount to move along the array
                between examples. If this is equal to 1, the examples
                overlap as much as possible. If it equals nunits, then
                the examples abut one another but do not overlap.

        nend    An integer giving the start point in the array of the
                example with the highest coordinates. If omitted, it
                defaults to the upper array bound minus nunits, and
                nstart must also be omitted.

        niter, ransel
                As for the first form.

mlp_fullindex -> bool                                         [variable]
bool -> mlp_fullindex
        This variable determines how mlp_makedata encodes the example
        positions in the data array. If it is <true>, then mlp_makedata
        builds index arrays. These substantially increase the size of
        the data record, but let the training and response procedures
        run as fast as possible. If it is <false>, then a more concise
        but slightly slower representation is used. This variable must
        have the same value when each of the records used in a single
        call to mlp_learn, mlp_response or mlp_target is created.


2.2  Data record access
-----------------------

mlpdata_data(datarec) -> arr                                 [procedure]
arr -> mlpdata_data(datarec)
        Returns or updates the data array held in the record. On
        update, arr must have the same *boundslist as the original data
        argument to mlp_makedata. If arr is a packed single float array
        (i.e. one created by *newsfloatarray or the like) and is not
        offset in its arrayvector, then a pointer to it is placed in
        datarec. Otherwise its contents are copied to a new array.

mlpdata_niter(datarec) -> int                                [procedure]
int -> mlpdata_niter(datarec)
        Returns or updates the number of iterations to be used if the
        record is a target for training.
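The following Python sketch illustrates how the second and third forms of mlp_makedata pick examples out of an array, and the trade-off controlled by mlp_fullindex. It is a hypothetical illustration, not LIB MLP's internal coding, and rests on three assumptions:

- It uses 0-based nested lists rather than Poplog arrays with arbitrary bounds.
- It numbers examples with the last dimension varying fastest. The order used by LIB MLP is not stated in this file.
- It models the <true> setting of mlp_fullindex as a precomputed table of element coordinates, and the <false> setting as computing origin + offset on demand.

```python
from itertools import product

def origins(pstart, pinc, pend):
    """All example origins from pstart to pend, stepping by pinc in each dimension."""
    ranges = [range(s, e + 1, step) for s, e, step in zip(pstart, pend, pinc)]
    return list(product(*ranges))

def get(data, coords):
    """Read one element of a nested-list array at 0-based coordinates."""
    for c in coords:
        data = data[c]
    return data

def example_on_the_fly(data, mask, origin):
    """Concise representation (assumed): compute origin + offset when needed."""
    return [get(data, tuple(o + d for o, d in zip(origin, off))) for off in mask]

def build_index(mask, origs):
    """Full index (assumed): store the coordinates of every element of every example."""
    return [[tuple(o + d for o, d in zip(orig, off)) for off in mask]
            for orig in origs]

def example_from_index(data, index, k):
    """Fetch example k by looking its coordinates up in the stored index."""
    return [get(data, coords) for coords in index[k]]

def time_series_examples(series, nunits, nstart, ninc, nend):
    """Third form expressed as the second form: a 1-D mask of nunits offsets."""
    mask = [(u,) for u in range(nunits)]
    return [example_on_the_fly(series, mask, orig)
            for orig in origins((nstart,), (ninc,), (nend,))]

if __name__ == "__main__":
    series = [10, 11, 12, 13, 14, 15]
    print(time_series_examples(series, 3, 0, 1, 3))   # overlapping windows
    print(time_series_examples(series, 3, 0, 3, 3))   # abutting windows
```

In this sketch, the precomputed table stores one coordinate tuple per element per example. This shows why a full index can make a data record substantially larger.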
mlpdata_nbatch(datarec) -> int                               [procedure]
int -> mlpdata_nbatch(datarec)
        Returns or updates the number of examples to be taken in a batch
        if the record is a target for training. The value 1 means that
        continuous learning is to be used.

mlpdata_datvec(datarec) -> arr                               [procedure]
        Returns the arrayvector of the data array held in the record.

mlpdata_ransel(datarec) -> int                               [procedure]
bool -> mlpdata_ransel(datarec)
        Returns or updates the flag that determines whether examples
        are selected randomly when the record is a target for training.
        The value returned is 0 for false or 1 for true.

mlpdata_nunits(datarec) -> int                               [procedure]
        Returns the number of input or output units that the net is
        expected to have when the data are used for training or
        testing.

mlpdata_negs(datarec) -> int                                 [procedure]
        Returns the number of different examples of data held in the
        record.

mlpdata_offset_mask                                          [procedure]
mlpdata_mask_origs                                           [procedure]
mlpdata_ndim                                                 [procedure]
        These refer to the fields that carry the internal coding of the
        positions of the examples in the data array. They are not
        useful to users.


---------------------------
3  Net training and testing
---------------------------

3.1  Training
-------------

(input_rec, target_rec) -> mlp_learn(net) -> (err, errvar)   [procedure]
mlp_learn(input_rec, target_rec, nunits, wtrange, eta, alpha, decay)
                                                -> (net, err, errvar)
        Trains a network.

        In the first form, input_rec and target_rec must have been
        created with mlp_makedata, and net with mlp_makenet. The
        parameters for training are as set up by those procedures. The
        number of units for input_rec must match the number of inputs
        for net, and the number of units for target_rec must match the
        number of outputs for net.
        The number of examples in input_rec and target_rec must match.
        The weights and biases of net are updated. Repeated calls can be
        used to continue training until the error is acceptable, and
        training parameters and data can be changed between calls. The
        results err and errvar are numbers which record the mean error
        and the variance of the error during the training run.

        In the second form, a net is created, then trained and returned.
        nunits, wtrange, eta, alpha and decay are as for mlp_makenet,
        and the other arguments and results are as above.

        The number of training iterations (i.e. the number of batches
        presented to the network) is given by the niter and nbatch
        fields of the target record. Each example involves a forward
        pass of data through the network followed by back-propagation
        of errors. If nbatch is 1, training is continuous and weight
        updating occurs after every example. Otherwise, the weight
        changes are averaged over nbatch examples and then the weights
        are updated. In either case, nbatch*niter examples are
        presented to the network in a single call to mlp_learn, and a
        total of niter weight updates takes place.

        If niter is 0, however, a single backward pass but no forward
        pass is carried out. It is assumed that the activations have
        been set by a previous call to mlp_response, and the weights
        are updated in continuous mode. This is useful in conjunction
        with mlp_target; see the example in TEACH * MLP.

net -> mlp_target(datarec)                                   [procedure]
        This allows separate nets to be cascaded for training. A call to
        this procedure must be preceded by a call to mlp_learn. datarec
        must be a data record suitable for use as an input record for
        net, containing a single example. It is updated so that it
        provides suitable training data for any network which supplied
        the most recent input to net.
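The training schedule of mlp_learn and the update rules given under mlp_makenet can be combined into a short Python sketch. The sketch works on a flat list of weights. It takes a caller-supplied function grad that returns the per-example derivatives Ewc, so back-propagation itself is left out. The names make_schedule, train and grad are hypothetical. Random selection is assumed to draw examples independently (with replacement). Weight decay is assumed to be applied after every weight update. That matches the text for continuous training but is a guess for batch training.

```python
import random

def make_schedule(n_examples, niter, nbatch, ransel, rng):
    """Example indices for each of the niter weight updates.

    ransel True: each example drawn independently at random (assumption).
    ransel False: examples taken cyclically 0, 1, 2, ..., wrapping round.
    """
    schedule, nxt = [], 0
    for _ in range(niter):
        batch = []
        for _ in range(nbatch):
            if ransel:
                batch.append(rng.randrange(n_examples))
            else:
                batch.append(nxt)
                nxt = (nxt + 1) % n_examples
        schedule.append(batch)
    return schedule

def train(w, etas, grad, schedule, alpha=0.0, decay=None):
    """One weight update per batch; a batch of size 1 is continuous training.

    grad(w, k) returns the list of per-example derivatives Ewc for example k.
    A negative entry in etas clamps that weight, as described for mlp_etas.
    """
    ew = [0.0] * len(w)                       # derivative estimates Ew
    for batch in schedule:
        ewc = [0.0] * len(w)                  # Ewc averaged over the batch
        for k in batch:
            for i, g in enumerate(grad(w, k)):
                ewc[i] += g / len(batch)
        continuous = len(batch) == 1
        for i in range(len(w)):
            if etas[i] < 0:                   # clamped: never changed
                continue
            # Momentum only in continuous training: Ew' = Ewc + alpha * Ew
            ew[i] = ewc[i] + alpha * ew[i] if continuous else ewc[i]
            w[i] -= etas[i] * ew[i]           # w' = w - eta * Ew
            if decay is not None and decay >= 0:
                w[i] *= decay                 # weight decay (assumed per update)
    return w

if __name__ == "__main__":
    # Fit one weight to the mean of three targets with batch training.
    targets = [1.0, 2.0, 3.0]
    sched = make_schedule(3, 50, 3, False, random.Random(0))
    print(train([0.0], [0.5], lambda w, k: [w[0] - targets[k]], sched))
```

With nbatch equal to 1, the loop makes niter continuous updates with momentum. With a larger nbatch, it averages Ewc over each batch and ignores alpha. Either way, niter*nbatch examples are presented and niter updates are made.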
3.2  Testing
------------

mlp_response(input_rec, net) -> output_rec                   [procedure]
(input_rec, net) -> mlp_response(output_rec)
        This applies the net to each example in input_rec and stores the
        results in the corresponding part of output_rec.

        In the first form, an appropriate output record is created and
        returned. It has the form produced by the first kind of call to
        mlp_makedata. In the updating form, the contents of output_rec
        are updated, and output_rec can be produced by any kind of call
        to mlp_makedata.

        The number of units for input_rec must match the number of
        inputs for net. For the updater, the number of units for
        output_rec must match the number of outputs for net, and the
        number of examples in input_rec and output_rec must match.


-------------------
4  Other facilities
-------------------

4.1  Printing
-------------

mlp_printactivs(net)                                         [procedure]
        Prints the current activation values of the net. This should be
        called after a call to mlp_response, because after a call to
        mlp_learn the activation arrays store errors rather than
        activations.

mlp_printweights(net)                                        [procedure]
        Prints the current weights and biases of the net.


4.2  Copying
------------

mlp_copypart(net1, level, unit1, unit2) -> net2              [procedure]
(net1, level, unit1, unit2) -> mlp_copypart(net2, net2_unit)
        The forward procedure copies the subtree of net1 that lies below
        the units from unit1 to unit2 in the given level into a new net,
        which is returned. The updater copies the same subtree of net1
        into net2, updating the weights and biases that lie below the
        units from net2_unit onwards in the given level.
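The following Python sketch illustrates the forward pass performed by mlp_response, and one reading of what mlp_copypart copies. It rests on three assumptions:

- Every unit uses the logistic function.
- wts[i][j] is the weight from unit i of the lower layer to unit j of the upper layer, as for mlp_weights.
- In a fully connected net, the subtree below units unit1 to unit2 at a given level consists of all the lower levels, together with the weights and biases feeding those units.

The last assumption is an interpretation, not something stated in this file. forward and copypart are hypothetical names, and levels are numbered from 1 for the lowest hidden layer.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(net, inputs):
    """Activations of every layer, lowest hidden layer first.

    net is a list of (wts, biases) pairs, one per level; wts[i][j] is the
    weight from unit i of the lower layer to unit j of the upper layer.
    """
    acts, x = [], inputs
    for wts, biases in net:
        x = [logistic(biases[j] + sum(x[i] * wts[i][j] for i in range(len(x))))
             for j in range(len(biases))]
        acts.append(x)
    return acts

def copypart(net1, level, unit1, unit2):
    """Copy the part of net1 below units unit1..unit2 (1-based) at level.

    Assumed reading: all lower levels are copied whole, and only the
    weights and biases feeding the selected units are kept at `level`.
    """
    lower = [([row[:] for row in wts], biases[:])
             for wts, biases in net1[:level - 1]]
    wts, biases = net1[level - 1]
    top = ([row[unit1 - 1:unit2] for row in wts], biases[unit1 - 1:unit2])
    return lower + [top]

if __name__ == "__main__":
    net1 = [([[0.5, -1.0, 0.3], [0.2, 0.4, -0.6]], [0.1, 0.0, -0.2]),
            ([[1.0], [-1.0], [0.5]], [0.05])]
    x = [0.7, -0.3]
    print(forward(net1, x)[0][1:3])
    print(forward(copypart(net1, 1, 2, 3), x)[-1])
```

Under that reading, the outputs of the copied net equal the activations of the selected units in the original. This gives a simple check on a copy.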
4.3  Random number generation
-----------------------------

(int_1, int_2, int_3) -> mlp_random_seed               [active variable]
mlp_random_seed -> (int_1, int_2, int_3)
        Returns or updates the three values which represent the state of
        the random number generator. The generator is used to produce
        random weights on net creation, and also to select examples at
        random from the training set. Saving and restoring these values
        allows training runs to be repeated exactly.

        If any of the values assigned to mlp_random_seed is <false>,
        then the seeds are taken from varying system variables such as
        the real-time clock. This is done automatically when the random
        number generator is first used. A user's program should not
        normally do it, because the distribution is better if the
        generator is allowed simply to continue running.

--- $popvision/help/mlp
--- Copyright University of Sussex 1998. All rights reserved.