TEACH MLP                                              David Young
                                                       August 1998

MULTI-LAYER PERCEPTRONS

This teach file describes facilities in LIB * MLP, which implements
multi-layer perceptrons, a class of artificial neural network
popularised by the book "Parallel Distributed Processing" (D.E.
Rumelhart, J.L. McClelland & the PDP Research Group, MIT Press, 1986),
Vol. 1, Chapter 8 (referred to below as the PDP book). The
back-propagation algorithm is used to carry out gradient descent
training.

For a more systematic description, see HELP * MLP.


         CONTENTS - (Use <ENTER> g to access required sections)

  1   Introduction

  2   Loading the library

  3   A simple example
      3.1   Creating a net
      3.2   A note on momentum
      3.3   Creating an input array
      3.4   Converting the input array to MLP format
      3.5   Creating targets
      3.6   Training the net
      3.7   Testing the net
      3.8   Printing the net
      3.9   Untraining the net

  4   Some further basic facilities
      4.1   Updating and non-updating procedures
      4.2   The error results
      4.3   Changing net and data parameters
      4.4   Batch learning
      4.5   Access to weights and biases
      4.6   Training and testing on single examples

  5   Examples using different data formats
      5.1   Time series data
      5.2   Image data
      5.3   Saving space with large data sets

  6   More facilities
      6.1   Saving and restoring nets on disc
      6.2   Different transfer functions
      6.3   Clamping weights and biases
      6.4   Accessing and updating weight and bias arrays
      6.5   Back propagation between networks
      6.6   Calculation precision
      6.7   Repeatable training runs

---------------
1  Introduction
---------------

The basic computational unit of a perceptron takes data from an input
vector and computes a single number. The unit's internal data consists
of a weight vector and a bias. It forms the dot product of the weights
and the inputs and adds the bias. A function, usually nonlinear, is then
applied to this value to give the output. In classical perceptrons, the
nonlinear function is the step function, but in perceptrons trained
using gradient descent a smooth sigmoidal function is used.
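
As a rough illustration of the unit just described, the following sketch
computes one unit's output in plain Pop-11. It does not use LIB * MLP:
the procedure names unit_sum, step_fn and logistic_fn are made up for
this example, and the logistic function is used as the smooth sigmoid
only because it is the library's default transfer function.

```pop11
;;; Weighted sum for one unit: dot product of weights and inputs, plus
;;; the bias. (Illustrative helper, not part of LIB * MLP.)
define unit_sum(weights, bias, inputs) -> total;
    lvars i;
    bias -> total;
    for i from 1 to length(weights) do
        total + weights(i) * inputs(i) -> total
    endfor
enddefine;

;;; Step function, as used in a classical perceptron.
define step_fn(x) -> y;
    if x > 0 then 1 else 0 endif -> y
enddefine;

;;; Logistic function, a smooth sigmoid suitable for gradient descent.
define logistic_fn(x) -> y;
    1.0 / (1.0 + exp(-x)) -> y
enddefine;

vars s = unit_sum({0.5 -0.25}, 0.1, {1 1});
s =>                    ;;; the weighted sum, 0.1 + 0.5 - 0.25
step_fn(s) =>           ;;; 1, because the sum is positive
logistic_fn(s) =>       ;;; a value a little above 0.5
```

The step function throws away how far the sum is from zero, whereas the
logistic output changes smoothly with the weights, which is what makes
gradient descent possible.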

Multi-layer perceptrons are feedforward (i.e. nonrecurrent) artificial
neural networks made of layers of these computational units. Supervised
learning can be carried out through gradient descent implemented via the
backpropagation algorithm.

LIB * MLP provides a procedural interface to an implementation of
multi-layer perceptrons. The training algorithm is the standard
backpropagation algorithm with no frills other than momentum and weight
decay, with a choice of continuous or batch mode. There is a small
choice of output functions, and weights can be trained at different
rates. A particular strength of the library is the set of facilities for
operating on time series and image data.

This file introduces the library through examples. These can be run by
marking them in Ved and then loading the marked range (for example with
the Ved command lmr). They are intended to be executed in order. They
can be modified for incorporation into your own programs, usually by
building your own procedures round these fragments rather than by using
them at top level. Inside procedures, variables which are here declared
as "vars" should probably be declared "lvars".

----------------------
2  Loading the library
----------------------

You should have the popvision libraries available on your machine. To
load mlp, use the commands

    uses popvision      ;;; access to the popvision directories
    uses mlp

You will also need a library for creating arrays of floating point
numbers, loaded with

    uses newsfloatarray

-------------------
3  A simple example
-------------------

3.1  Creating a net
-------------------

A net is initially built using mlp_makenet by specifying:

    o The number of input units.

    o The number of units in each of the higher layers, from the first
      hidden layer to the output, as a vector.

    o The range of the initial weights, wtrange. The weights are
      initially set at random, and are uniformly distributed from
      -wtrange/2 to +wtrange/2.

    o The learning rate, eta.

    o The momentum, alpha.

Thus to create a little network with 2 inputs, one hidden layer of two
units, and a single output unit, the following call can be given:

    vars net;
    mlp_makenet(2, {2 1}, 2.0, 1.4, 0.6) -> net;

Here the weights and biases range from -1 to +1, the learning rate is
1.4 and the momentum 0.6.

3.2  A note on momentum
-----------------------

The last two arguments to mlp_makenet correspond to eta and alpha as
used in the PDP book, p. 330. It is worth noting that a nonzero value
for the momentum alpha increases the effective value of the learning
rate eta. On a smooth error surface, the effective learning rate is
given by eta/(1-alpha). For the net created above, the effective rate is
1.4/(1-0.6) = 3.5. You may wish to specify the effective learning rate
eta_eff and use eta_eff*(1-alpha) as the argument to mlp_makenet.

3.3  Creating an input array
----------------------------

The input data must be initially stored in an array. The array should
normally be created using *newsfloatarray (or some other procedure that
produces packed floating point arrays), though other array types will be
copied if necessary. Arrays are needed for the inputs, the targets, and
the outputs of the network.

There are various ways to lay out the data in the array. The simplest is
to use a 2-D array, with one column for each example and one row for
each input unit. For example, suppose we wish to set up four input
patterns, like this:

        input pattern 1:    0   0
        input pattern 2:    0   1
        input pattern 3:    1   0
        input pattern 4:    1   1

The input array should look like:

                    col 1   col 2   col 3   col 4
            row 1     0       0       1       1
            row 2     0       1       0       1

So there's a column for each pattern and a row for each input unit.

The convention used in the program is that the first index of an array
identifies the row, the second index the column, of the data as laid out
on the page. (This is the same as for mathematical matrix notation.
Unfortunately in image processing the opposite convention is adopted, so
you should always check which one is being followed.)

So to actually create a suitable array and fill it, do:

    vars inputs;
    newsfloatarray([1 2 1 4]) -> inputs;    ;;; array with 2 rows, 4 cols
    0 -> inputs(1,1); 0 -> inputs(1,2); 1 -> inputs(1,3); 1 -> inputs(1,4);
    0 -> inputs(2,1); 1 -> inputs(2,2); 0 -> inputs(2,3); 1 -> inputs(2,4);

Of course, you may be able to find neater ways to fill this array with
these patterns, and for most applications you will wish to write a
specialised program to generate the arrays, or use a higher-level
interface.

3.4  Converting the input array to MLP format
---------------------------------------------

To provide flexibility for other formats, the array just created cannot
be used directly with the net, but must first be incorporated into an
MLP data record. This is done with a call to mlp_makedata, thus:

    vars input_rec;
    mlp_makedata(inputs) -> input_rec;

Note that a new copy of the array has not been made. It was simply
incorporated into the record, because having been created with
newsfloatarray it was already of the right type.

3.5  Creating targets
---------------------

The process for creating the targets for supervised learning is similar.
The target array needs one column for each example, and must have one
row for each output unit. For the XOR test the targets go 0, 1, 1, 0 for
the four input patterns respectively, and so can be set up as follows:

    vars targets;
    newsfloatarray([1 1 1 4]) -> targets;   ;;; array with 1 row, 4 cols
    0->targets(1,1); 1->targets(1,2); 1->targets(1,3); 0->targets(1,4);

The array now needs to be incorporated into an MLP record. This will
also include information about how to train the net. It is necessary to
specify how many presentations of the patterns to give, and also whether
to select examples randomly from the set of patterns, or whether to
cycle through them.
Suppose we wish to train for 2000 presentations of individual patterns
and to select patterns randomly. This is set up as follows:

    vars target_rec;
    mlp_makedata(targets, 2000, true) -> target_rec;

The number of iterations and whether to apply random selection can be
changed later if necessary.

3.6  Training the net
---------------------

One procedure, mlp_learn, is used to train the network. As it updates
values inside the network structure, it is called as an updater. It
needs the inputs, the targets, and the net, and is used like this:

    (input_rec, target_rec) -> mlp_learn(net) -> (,);

Loading this line will cause the network to be trained on the examples
above, updating its weights and biases. Loading it again will do another
2000 presentations, and so on. The procedure also returns two results.
As these are not important now, they are assigned to an empty expression
(i.e. they are ignored).

There is a short cut to creating and training new networks with one
procedure call - see below.

3.7  Testing the net
--------------------

We now wish to see how the trained network behaves. We can apply it to
each of the set of input patterns, using mlp_response. This creates and
returns a data record containing the responses of the net to each
pattern.

    vars output_rec;
    mlp_response(input_rec, net) -> output_rec;

The result, output_rec, is one of the special data records, and as such
the results are not immediately accessible. We need, in effect, to
reverse mlp_makedata. This is done with mlpdata_data (an odd name, but
it is one of a family of mlpdata_ routines).

    vars outputs;
    mlpdata_data(output_rec) -> outputs;

Now outputs is just an array with the same layout as the targets (one
row for the single output unit and 4 columns for the 4 examples):

    outputs =>      ;;; this prints ** <array [1 1 1 4]>

and printing its contents gives the results of applying the trained net
to each of the 4 input patterns:

    outputs(1, 1) =>    ;;; normally you would use a loop for this
    outputs(1, 2) =>
    outputs(1, 3) =>
    outputs(1, 4) =>

which when this file was produced gave

    ** 0.032794         ;;; see below if your results look
    ** 0.960624         ;;; different!
    ** 0.97033
    ** 0.028014

We see that the network has learnt the rule on the occasion this teach
file was produced - the outputs correspond closely to the targets.

If you try running these examples yourself (as you should), you will
find that the results are different each time, as the random
initialisation of the weights affects the success of the learning. In
particular, the net will quite often get stuck in a local minimum of the
error, and the result will not look at all like the targets. It is well
worth creating and training a network several times over to see how the
behaviour varies. It gets stuck less often if you use -1 instead of 0 as
the "off" value on the inputs, although -1 cannot be produced as an
output using the default activation function.

3.8  Printing the net
---------------------

The procedure mlp_printweights allows you to inspect the biases and
weights of the network:

    mlp_printweights(net);

which printed

    WEIGHTS
                    bias        1        2
    Level 2
    unit  1:        3.75    -8.00     8.20

    Level 1
    unit  1:        2.54     5.33    -5.13
    unit  2:       -3.37     6.16    -6.41

This shows the network trained when this file was created. Your network
will have different values. Level 1 is the hidden layer, and level 2 is
the output layer. The value -8.00, for example, is the weight from unit
1 in the hidden layer to the output unit, whilst -5.13 is the weight
from the second input to the first hidden unit.
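
The printed values can be checked by hand. The sketch below is plain
Pop-11 and does not call LIB * MLP; it simply copies the weights and
biases from the printout above into a forward pass. It assumes that
every unit uses the logistic function 1/(1+exp(-x)), the library's
default, and the names logistic_fn and xor_net are invented for this
example.

```pop11
;;; Logistic transfer function (assumed to be what the net used).
define logistic_fn(x) -> y;
    1.0 / (1.0 + exp(-x)) -> y
enddefine;

;;; Forward pass through the 2-2-1 net, using the weights printed above.
define xor_net(a, b) -> out;
    lvars h1, h2;
    logistic_fn( 2.54 + 5.33 * a - 5.13 * b) -> h1;   ;;; hidden unit 1
    logistic_fn(-3.37 + 6.16 * a - 6.41 * b) -> h2;   ;;; hidden unit 2
    logistic_fn( 3.75 - 8.00 * h1 + 8.20 * h2) -> out ;;; output unit
enddefine;

xor_net(0, 0) =>        ;;; low, roughly 0.03
xor_net(0, 1) =>        ;;; high, roughly 0.96
xor_net(1, 0) =>        ;;; high, roughly 0.97
xor_net(1, 1) =>        ;;; low, roughly 0.03
```

The results agree with the outputs listed in section 3.7 only to about
two decimal places, because the printed weights are themselves rounded.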

Incidentally, we can easily see how this particular net has learnt to do
the task. If we call the inputs A and B, then in the hidden layer unit 1
is biased on, and turns off for A=0 and B=1, and for no other
combination of inputs, so it implements (A or not B). Hidden unit 2 is
biased off, and comes on only for A=1 and B=0, that is (A and not B).
The output unit is biased on, and turns off only if hidden unit 1 is on
and hidden unit 2 is off, that is it does (not H1 or H2). Thus overall
the net implements

    output = not (A or not B) or (A and not B)

which is just a formula for XOR. On other training runs different
schemes might be found, for example

    output = (A and not B) or (B and not A)

It is also possible to look at the current state of the activations
using mlp_printactivs. This must be used after a call to mlp_response,
and not right after a call to mlp_learn, as at that point the
activations are replaced by error values.

3.9  Untraining the net
-----------------------

To start again, you can simply create a new network with mlp_makenet -
as the inputs and targets have not been updated, you can re-use them. If
you do not want to change the architecture, you can reset the existing
net to a new random state with a call like this

    2.0 -> mlp_resetnet(net);

The value assigned to this updater sets the range of the random weights
and biases, like the third argument in mlp_makenet.

If you now return to the call to mlp_response above, and look at the
results, you will find that the net no longer works, and you need to
call mlp_learn again to train it.

--------------------------------
4  Some further basic facilities
--------------------------------

4.1  Updating and non-updating procedures
-----------------------------------------

We have seen an updating version of mlp_learn. There is a non-updating
version, which creates a network from scratch and then trains it.
It needs all the information given to mlp_makenet, so to create and
train a net as we did above, the following call will work:

    mlp_learn(input_rec, target_rec, {2 1}, 2.0, 1.4, 0.6) -> (net,,);

Again, there are a couple of extra results, which we are ignoring. The
variable net receives the trained net. More training on this net could
be done using the updater of mlp_learn, described above.

We have seen a non-updating version of mlp_response. There is a version
which updates an existing record to avoid creating new arrays and
records. To demonstrate it at this point, we could re-use the output
record returned from the earlier call to mlp_response. However, we start
from scratch and make a new structure, by creating an array the right
size and using mlp_makedata.

    newsfloatarray([1 1 1 4]) -> outputs;
    mlp_makedata(outputs) -> output_rec;

Normally we would do this once, then re-use the output record many
times. Here is the updater of mlp_response in use:

    (input_rec, net) -> mlp_response(output_rec);

and the outputs can then be inspected. There is no need to call
mlpdata_data to get the output array, because we already have a
reference to it, and output_rec has a pointer to it, not a copy of it.

    outputs(1,1) =>     ;;; which prints, for example, ** 0.051058

etc.

4.2  The error results
----------------------

The two mysterious results returned by mlp_learn, and so far thrown
away, are actually the mean error over the training session, and its
variance. These can be useful in assessing the progress of training.

The error is defined as half the sum of the squares of the differences
between the net's outputs and the targets, averaged over all the
examples presented during the call to mlp_learn. (For a net with a
single output unit, of course, there is just one output-target
difference, and so the error is just half the average square of this.)
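
To make the definition concrete, here is a small sketch in plain Pop-11
that applies it to the four outputs listed in section 3.7. The procedure
mean_half_sq_error is a hypothetical helper, not part of LIB * MLP, and
it takes the examples as vectors of vectors purely for convenience. Note
also that mlp_learn averages over the presentations made during
training, while the weights are still changing, so the value computed
here is not one that mlp_learn itself returned.

```pop11
;;; Mean over examples of half the summed squared output-target
;;; differences. outs and targs are vectors holding one vector per
;;; example. (Hypothetical helper, for illustration only.)
define mean_half_sq_error(outs, targs) -> err;
    lvars i, j, d, total = 0.0;
    for i from 1 to length(outs) do
        for j from 1 to length(outs(i)) do
            outs(i)(j) - targs(i)(j) -> d;
            total + 0.5 * d * d -> total
        endfor
    endfor;
    total / length(outs) -> err
enddefine;

vars xor_err = mean_half_sq_error(
    {{0.032794} {0.960624} {0.97033} {0.028014}},   ;;; outputs, section 3.7
    {{0} {1} {1} {0}});                             ;;; XOR targets
xor_err =>      ;;; a little over 0.0005
```

A value this small is what a trained net should give; the first calls to
mlp_learn on an untrained net typically report errors a couple of orders
of magnitude larger.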

We can see the use of these variables if we train the small network as
before, but this time we do the training in smaller bursts, looking at
the error each time. First, the number of presentations per call to
mlp_learn needs to be reduced, say to 100. This is done like this:

    100 -> target_rec.mlpdata_niter;

Now create a new network, and train it with 100 examples at a time,
printing out the error and its variance each time.

    vars err errvar;
    mlp_makenet(2, {2 1}, 2.0, 1.4, 0.6) -> net;    ;;; new network

    repeat 12 times
        (input_rec, target_rec) -> mlp_learn(net) -> (err,errvar);
        [error is ^err, with variance ^errvar] =>
    endrepeat;

which prints for example

    ** [error is 0.140667 , with variance 0.008352]
    ** [error is 0.130057 , with variance 0.008421]
    ** [error is 0.090799 , with variance 0.013267]
    ** [error is 0.104345 , with variance 0.014969]
    ** [error is 0.108968 , with variance 0.012709]
    ** [error is 0.072952 , with variance 0.014788]
    ** [error is 0.078096 , with variance 0.010277]
    ** [error is 0.043794 , with variance 0.004719]
    ** [error is 0.019029 , with variance 0.000763]
    ** [error is 0.006743 , with variance 0.000031]
    ** [error is 0.003847 , with variance 0.000003]
    ** [error is 0.002659 , with variance 0.000002]

Note how the mean error and the error variance decrease (mostly) as
training proceeds over the 1200 presentations. This can be useful in
deciding when to stop training and how to set eta and alpha.

4.3  Changing net and data parameters
-------------------------------------

We have already seen how to change the number of presentations built
into the targets data structure, with the call

    100 -> target_rec.mlpdata_niter;

The decision as to whether to select examples at random or to cycle
through them can also be changed. For example, to switch to cyclic
sampling, do

    false -> target_rec.mlpdata_ransel;

You can look at the current values with the same procedures, e.g.

    target_rec.mlpdata_niter =>     ;;; which prints ** 100

The network's alpha and eta parameters can likewise be inspected and
changed.

    net.mlp_eta =>          ;;; which prints ** 1.4
    net.mlp_alpha =>        ;;; which prints ** 0.6
    0.2 -> net.mlp_eta;
    0.9 -> net.mlp_alpha;

or if you prefer to use the effective learning rate

    0.9 -> net.mlp_alpha;
    2.0*(1-net.mlp_alpha) -> net.mlp_eta;

4.4  Batch learning
-------------------

So far, continuous training has been used - that is, the weights have
been updated immediately after every example has been presented to the
net. Batch training, in which the weight adjustments from a set of
examples are combined before being applied, can be done by modifying the
target record creation, thus:

    mlp_makedata(targets, {500 ^true}, false) -> target_rec;

The new argument "{500 ^true}" means that the net should carry out 500
iterations through the whole training set (in this case, the 4
examples). The weights will be updated once on each iteration, after
averaging the weight changes from the different examples. Since all 4
examples should be included in each batch, the final argument is <false>
to indicate that cyclical rather than random selection should be used.

As the averaged changes are more reliable, we can increase eta when the
net is created, which we will do along with training it:

    mlp_learn(input_rec, target_rec, {2 1}, 2.0, 12.0, 0.0) -> (net,,);

and the results are obtained as before with

    (input_rec, net) -> mlp_response(output_rec);
    arrayvector(outputs) =>     ;;; quick alternative to printing elements

which when run twice during creation of this file gave

    **
    **

where the second result shows the network getting stuck in the wrong
place.

4.5  Access to weights and biases
---------------------------------

You can access and update specific weights and biases using mlp_weight.
This requires you to specify three things: the level of the net (with
level 1 referring to the weights from the inputs to the lowest hidden
layer), the unit where the connection starts, and the unit where the
connection ends. Units are numbered starting at 1 within each layer. So
to get the weight from input 1 to hidden unit 2, you would do

    mlp_weight(1, 1, 2, net) =>

and for the weight from hidden unit 2 to the output unit, you would do

    mlp_weight(2, 2, 1, net) =>

To update these weights, simply use the updater, e.g.

    -7 -> mlp_weight(2, 2, 1, net);

Biases are specified by giving the unit from which the signal is coming
as 0 or <false>. So this gets the bias for the first unit in the hidden
layer:

    mlp_weight(1, false, 1, net) =>

For another method of accessing weights and biases, which is more
efficient if you need to access or update many at one time, see the
section on access to weight and bias arrays below.

4.6  Training and testing on single examples
--------------------------------------------

If you wish to generate patterns one at a time to test or train a
network, you can do so simply by applying mlp_makedata to a 1-D array.
This will give a data record containing a single example, rather than a
set as above. If you update the original array, the contents of the
record will also be updated, provided that the array was created using
*newsfloatarray or a similar procedure.
For example, to test the net developed above on two particular examples,
you could do this:

    vars input_array, output_array;
    newsfloatarray([1 2]) -> input_array;
    newsfloatarray([1 1]) -> output_array;
    mlp_makedata(input_array) -> input_rec;
    mlp_makedata(output_array) -> output_rec;

    ;;; set up a pattern
    0 -> input_array(1); 1 -> input_array(2);
    (input_rec, net) -> mlp_response(output_rec);
    output_array(1) =>

    ;;; and another
    1 -> input_array(1); 1 -> input_array(2);
    (input_rec, net) -> mlp_response(output_rec);
    output_array(1) =>

Note that whilst it is the arrays that are updated and accessed, it is
the data records that are passed to mlp_response. This works because the
data records contain pointers to the arrays. Clearly care is needed in
keeping track of the variables if you use this technique.

It is generally much less efficient to generate individual patterns and
call mlp_response or mlp_learn for each one, than it is to generate a
whole set of patterns at once and call the routines to process the whole
set, as was done in the earlier sections.

----------------------------------------
5  Examples using different data formats
----------------------------------------

5.1  Time series data
---------------------

The mlp_makedata procedure allows data to be passed to a net in a
greater variety of ways than the format used above. For full details see
HELP * MLP. This section gives one example, and the next section
another.

Suppose you have a time series, and you wish to train a network to
predict each point in it from the N preceding points. If you used the
2-D format above, each example would contain much the same data as the
preceding one, only shifted slightly. This would use a lot of memory for
a large series. You can avoid this with a different way of using
mlp_makedata, in which you provide a 1-D array, and some further
information about how it should be handled.

As an example, we will first generate a time series consisting of a sine
wave plus random noise (not very interesting, but it will do).

    vars data;
    false -> popradians;    ;;; just to be sure
    newsfloatarray([1 1000],
        procedure(i);
            0.45 + 0.25 * sin(3*i) + random(0.1)
        endprocedure) -> data;

If you are operating in an X-windows environment you can use
*rc_graphplot to look at this as follows:

    uses rc_graphplot
    uses rci_show
    1 -> rci_show_scale;
    rci_show([1 500 1 300]) -> rc_window;
    rc_graphplot(1, 1, 1000, 't', data, 'f(t)') -> rcg_usr_reg;

Click on the graph if you want to get rid of the graphics window.

Now suppose we want to predict each point from the preceding 10 points.
The first chunk of input data will be points 1-10, the next will be
points 2-11, and the last chunk will be points 990-999 in the array. We
set up the inputs record by specifying the number of inputs, the start
point of the first chunk, the step to move to get to the next data
chunk, and the start point of the last chunk.

    mlp_makedata(data, 10, 1,1,990) -> input_rec;

Note the order of the last 3 arguments - it's like the from...by...to
...do sequence in a Pop-11 numerical for loop.

For the targets, we use the same data array, but now there is only one
unit. The first point we want to predict is the 11th and the last is the
1000th.

    mlp_makedata(data, 1, 11,1,1000) -> target_rec;

And as this example is about data formats and not networks, we will just
have the simplest possible network, which has just one unit above the
inputs. (Putting in a hidden layer would make no difference to the rest
of the example, but the network has to have 10 inputs and 1 output.) We
set the weight range to -0.005 to +0.005, eta to 0.01 and alpha to 0.9.

    mlp_makenet(10, {1}, 0.01, 0.01, 0.9) -> net;

And train it!

    2000 -> target_rec.mlpdata_niter;
    true -> target_rec.mlpdata_ransel;

    repeat 8 times
        (input_rec, target_rec) -> mlp_learn(net) -> (err, errvar);
        err =>
    endrepeat;

which prints something like

    ** 0.007553
    ** 0.002451
    ** 0.001488
    ** 0.00125
    ** 0.001173
    ** 0.00101
    ** 0.000977
    ** 0.000936

The error has decreased satisfactorily. If we take the square root of
the last value, we obtain the root mean square error as about 0.03.
Since random noise uniformly distributed between 0 and 0.1 was added,
and this has a standard deviation of about 0.03 and is clearly
unpredictable, the result is about as good as we can expect.

We can see the results if we create a suitable data record and put the
responses into it.

    newsfloatarray([1 1000]) -> outputs;
    mlp_makedata(outputs, 1,11,1,1000) -> output_rec;

Note that the arguments are as for the targets. Now fill it.

    (input_rec, net) -> mlp_response(output_rec);

If you plot the results with *rc_graphplot, you will get a smoothed
version of the original data, as expected:

    false -> rcg_newgraph;
    'red' -> rc_window("foreground");
    rc_graphplot(1, 1, 1000, false, outputs, false) -> ;

An interesting exercise is to get this "network" to predict a point some
distance ahead of the input region, by changing the start points in the
calls to mlp_makedata.

The inputs to the net do not have to be sequences of consecutive
points - they could be every second point, for instance. For how to set
this and more complex inputs and targets up, see HELP * MLP.

5.2  Image data
---------------

The second example of more complex data is in 2-D pattern recognition.
The problem is to train a network to recognise a fragment of a straight
line in a binary image, given a 3x3 patch of the image as input.
In other words, the network will have 9 input units, which we imagine as
laid out in a square, and it is to respond with 1 whenever it is
presented with one of these four patterns:

        010     000     100     001
        010     111     010     010
        010     000     001     100

and with 0 for any other pattern.

We will train the net by presenting it with a binary image as input, and
another image in which the points corresponding to the patterns above
are marked as target. There will be considerable preliminaries setting
all this up, before we can train the network, so you can go quickly
through the procedures that follow, as for any real application you will
have your own way of establishing the input and target arrays.

For the inputs, we create an array with a pattern that happens to
contain a lot of lines, so there are plenty of positive and negative
examples for the net to learn from. We throw in a little random noise
for variety, and load a useful library first.

    uses boundslist_utils

    vars arrsize = 70;
    newsfloatarray([1 ^arrsize 1 ^arrsize],
        procedure(x,y) -> result;
            if x mod 8 == 0 or
                y mod 8 == 0 or
                (x+y) mod 8 == 0 or
                (x-y) mod 8 == 0 then
                1.0
            else
                0.0
            endif -> result;
            if random(1.0) < 0.05 then 1.0 - result -> result endif
        endprocedure) -> inputs;

If you are working in an X-windows environment you can inspect this
image with:

    3 -> rci_show_scale;        ;;; see HELP *rci_show
    rci_show(inputs) -> ;

To set up the targets, we start from a mechanical way of doing the task.
The following procedure does the work, taking as arguments a 2-D array
and the coordinates of a position in it, and returning 1 if the position
is at the centre of one of the patterns above, and 0 otherwise. We allow
the data arrays to be floating point rather than integer arrays, and
round in case the values are not exact.

    define isonline(x, y, arr) -> result;
        lconstant
            patts = [{0 0 0 1 1 1 0 0 0}    ;;; each pattern on one line
                     {0 1 0 0 1 0 0 1 0}
                     {1 0 0 0 1 0 0 0 1}
                     {0 0 1 0 1 0 1 0 0}],
            arrpatt = initv(9),
            reglist = initl(4);
        (x-1, x+1, y-1, y+1) -> explode(reglist);
        lvars d;
        for d in_array arr in_region reglist do round(d) endfor
            -> explode(arrpatt);
        if member(arrpatt, patts) then 1 else 0 endif -> result
    enddefine;

To generate a targets array, we simply apply this procedure (which our
net is going to try to learn) to each position of the input array. It is
convenient to make the target array slightly smaller than the input
array, as the 3x3 region of interest means that we cannot usefully
define a target adjacent to the edge of the input array. This is easily
done with an adjustment to the boundslist. A closure of the answer
procedure given above provides initialisation.

    newsfloatarray(region_expand(inputs, -1), isonline(%inputs%) )
        -> targets;

This can be inspected with

    rci_show(targets) -> ;

While we're creating arrays, we can make an output array too.

    newsfloatarray(boundslist(targets)) -> outputs;

Now we need to convert the arrays into records using mlp_makedata. This
is really the point of the example; everything so far has just been
setting up some arrays to work with, and that could have been done in a
large variety of ways.

To do the conversion, we need to specify what part of the array the net
is to be presented with on each example. This is done using lists of
vector offsets, relative to some arbitrary point in the array. For the
3x3 region we want to use for each input, the mask is defined like this:

    vars inmask;
    [ {-1 -1} { 0 -1} { 1 -1}
      {-1  0} { 0  0} { 1  0}
      {-1  1} { 0  1} { 1  1} ] -> inmask;

Each vector gives an offset relative to the centre of the region of
interest.

The corresponding mask for the targets is simpler, as there is only one
output from the net in this case, and it is at the centre of the region
of interest.

    vars outmask;
    [ {0 0} ] -> outmask;

Now we can pass these masks to mlp_makedata to build the records. We
will also set the number of iterations and choose random selection, by
passing extra arguments at this stage.

    mlp_makedata(inputs, inmask) -> input_rec;
    mlp_makedata(targets, outmask, 10000, true) -> target_rec;
    mlp_makedata(outputs, outmask) -> output_rec;

And at last, a network to work on it all. It has to have 9 inputs and
one output; we will try using one hidden layer of 3 units.

    mlp_makenet(9, {3 1}, 0.1, 0.25, 0.9) -> net;

And train it:

    repeat 5 times
        (input_rec, target_rec) -> mlp_learn(net) -> (err, errvar);
        err =>
    endrepeat;

and check it out:

    (input_rec, net) -> mlp_response(output_rec);

Did it work? A simple visual check suggests the trained net works
correctly on the training data. There is no need to call mlpdata_data to
get the output array, because we already have a reference to it.

    rci_show(outputs) -> ;

We can test it further with a simple procedure, which just counts the
number of times the net was on the correct side of 0.5 (a rather weak
criterion).

    define binary_check(out_arr, targ_arr) -> no_correct;
        0 -> no_correct;
        lvars output, target;
        for output, target in_array out_arr, targ_arr do
            if (output-0.5) * (target-0.5) > 0 then
                no_correct + 1 -> no_correct
            endif
        endfor
    enddefine;

    binary_check(outputs, targets) =>

which on the occasion this file was made printed

    ** 4606

Since there are 4624 possible targets, the net didn't do too badly.
Training it some more might make it perfect. You could try different
numbers of hidden units, and different eta and alpha to try to speed it
up.

Note that the input mask can define any set of offsets into the image,
so you are not restricted to looking at square regions. And the method
is not restricted to 2-D arrays - it generalises naturally to 1-D and to
higher dimensions. All you have to worry about are the offsets, and the
boundslists of the arrays.
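
One way to picture what a mask means is the sketch below, which collects
the array values found at each offset from a given centre. It is plain
Pop-11 and only an illustration: gather_patch is a made-up name, the
library's own indexing is not shown here, and it is an assumption that
the first number in each offset applies to the first array index and the
second to the second.

```pop11
;;; Return a vector of the values of arr at each mask offset from (x, y).
;;; (Hypothetical helper, not part of LIB * MLP.)
define gather_patch(x, y, arr, mask) -> patch;
    lvars off;
    {% for off in mask do arr(x + off(1), y + off(2)) endfor %} -> patch
enddefine;

;;; A small test array whose elements encode their own indices.
vars demo_arr = newarray([1 5 1 5], procedure(x, y); 10 * x + y endprocedure);

;;; A diagonal mask of three offsets, centred on position (3, 3).
gather_patch(3, 3, demo_arr, [{-1 -1} {0 0} {1 1}]) =>
;;; prints ** {22 33 44}
```

Seen this way, the centre must stay far enough from the array edges for
every offset to fall inside the boundslist, which is why the target
array above was made one element smaller on each side than the input
array.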

More complex sampling patterns are possible - see HELP * MLP.

5.3  Saving space with large data sets
--------------------------------------

By default, mlp_makedata builds special index arrays that point to all
the starting points of patterns in the data array. For a densely sampled
array like the one above, the index array will be almost as large as the
data array itself. If this is a problem, you can avoid this behaviour,
at a small cost in speed, by changing the variable mlp_fullindex to
<false>:

    false -> mlp_fullindex;

You need to do this before calling mlp_makedata, and it must not be
changed between creating the input, target and output records for a set
of patterns. However, a given network does not mind whether the data it
is passed for training or responding has been created with mlp_fullindex
true or false - indeed the format can be changed in mid-training, as
long as the targets are consistent with the inputs.

------------------
6  More facilities
------------------

6.1  Saving and restoring nets on disc
--------------------------------------

Nets and data records can be saved and restored using the * DATAINOUT
library.

6.2  Different transfer functions
---------------------------------

It is possible to specify alternative transfer functions when the net is
set up. The default is the "logistic" function 1/(1+exp(-x)). For a list
of the functions currently available, load the following line:

    appproperty(mlp_transfuncs, erase <> npr);

which will print their names.

The transfer function can be specified in the list of units passed to
mlp_makenet, either by level or by individual unit.
The following example sets up a network with a hidden layer of 4
logistic function units, and an output layer of 2 linear units (which
simply pass out the dot product of their weights and inputs):

    mlp_makenet(9, {{4 logistic} {2 identity}}, 0.1, 0.25, 0.9) -> net;

To specify the functions by individual unit, the word for a level is
replaced by a vector equal in length to the number of units for that
level, with a word for each unit. For instance to have 2 linear and 2
logistic units in the hidden layer, we could do:

    mlp_makenet(9, {{4 {logistic logistic identity identity}}
            {2 identity}}, 0.1, 0.25, 0.9) -> net;

6.3 Clamping weights and biases
--------------------------------

Weights and biases can be protected from training. To do this, update
the network by assigning <true> to mlp_clamp applied to a particular
weight, specified as in mlp_weight above. For example, to clamp the bias
for the third unit in the lowest hidden layer, do:

    true -> mlp_clamp(1, false, 3, net);

To unclamp it again, assign false to this:

    false -> mlp_clamp(1, false, 3, net);

You can clamp and unclamp as many weights as you wish at any stage of
training. This may be particularly useful when weights have been set
explicitly using mlp_weight. For instance, you can effectively remove a
connection by setting a weight to zero and clamping it.

See the next section for another way of clamping weights and biases.

6.4 Accessing and updating weight and bias arrays
--------------------------------------------------

It may sometimes be more efficient to get hold of the array which
contains the weights or biases for a whole layer. The procedures
mlp_weights and mlp_biases return vectors containing such arrays. Each
entry in the vector relates to one level of the network.
Thus for the last net created,

    net.mlp_weights =>

shows that the weights vector contains 2-D arrays

    ** {<array [1 9 1 4]> <array [1 4 1 2]>}

and

    net.mlp_biases =>

shows that the biases vector contains 1-D arrays

    ** {<array [1 4]> <array [1 2]>}

The first entry in the weights vector is an array of the weights from
the input units to the first hidden layer. Within this array, the first
index refers to position in the input layer, the second to position in
the hidden layer. So

    (net.mlp_weights)(1)(1,2) = mlp_weight(1, 1, 2, net) =>

prints

    ** <true>

The second array in the vector contains the weights from the hidden
layer to the output layer. The biases vector is similar, except that it
contains 1-D arrays.

You can update the weights in the net by updating elements of the arrays
in the vector. You can do this even if you have assigned the array to
another variable, because assigning an array to a variable does not copy
it. However, there are no updaters for mlp_weights and mlp_biases
themselves, and you cannot assign arrays to the elements of the weights
and biases vectors.

You should not try referring to the arrayvector of the weights and
biases arrays (unless you are careful to check the arrayvector bounds).
It will probably not be what you expect, since the arrays that appear in
the vectors all share a single *arrayvector, which is passed out to
external procedures.

If you use mlp_weights and mlp_biases rather than mlp_weight to access
the net's parameters, you may wish also to access the clamping control
arrays explicitly in the same way. You can do this with the procedures
mlp_etas and mlp_etbs, which return vectors of weight and bias learning
rates in the same format as the weights and biases vectors themselves.
If you assign a negative value to an element of an array in this vector,
the corresponding weight or bias will be clamped. To unclamp it, assign
the current value of eta to the array element.
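
As a rough sketch of clamping through the learning-rate arrays, the
fragment below clamps and then unclamps the weight from input unit 1 to
unit 2 of the first hidden layer. It assumes that net is the last net
created above, that mlp_eta(net) returns the current value of eta when
used as an accessor, and that the arrays hold floating point values;
eta_arr is a name invented for the example. The assignments to
mlp_clamped follow the bookkeeping rule described in the next
paragraph.

    vars eta_arr = (net.mlp_etas)(1);   ;;; rates for input -> hidden 1

    -1.0 -> eta_arr(1, 2);              ;;; any negative value clamps
    true -> net.mlp_clamped;            ;;; record that something is clamped

    ;;; ... training here leaves this weight unchanged ...

    mlp_eta(net) -> eta_arr(1, 2);      ;;; restore the rate to unclamp
    "maybe" -> net.mlp_clamped;         ;;; others may still be clamped

The same pattern should apply to biases through mlp_etbs, using a single
index into the 1-D array for the level concerned.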
If you use this method, then after clamping a weight or bias you should
assign <true> to mlp_clamped applied to the net (unless you know it is
already true); if you unclamp all the weights and biases assign <false>
to this field; and if you unclamp a weight or bias but don't know if any
others are clamped or not, assign "maybe" to this field, e.g.

    "maybe" -> net.mlp_clamped;

If you do not do this, updating eta using mlp_eta may result in a change
to the clamped status of the weights.

You can assign individual learning rates to weights and biases by
assigning positive values to the elements of the arrays in the vectors
returned by mlp_etas and mlp_etbs, but using this flexibility to good
effect is well beyond the scope of this teach file. At the time of
writing, the momentum alpha must be set globally for the network.

6.5 Back propagation between networks
--------------------------------------

Occasionally, you may wish to stack one network above another, and train
the lower one on the basis of errors back propagated through the upper
one. This is useful, for example, if you wish to set up an architecture
where two networks feed into a single upper layer, or vice versa.

The routine mlp_target allows you to do this, though with some cost in
speed and an increase in programming complexity. mlp_target updates an
input vector with new values which would have produced a lower error at
the output of the net. If the input to the higher net is the output from
the lower net, then the updated input provides a suitable target for
training the lower net. mlp_target must be called after mlp_learn, and
never right after mlp_response.

As an example, a network which is split into separate lower and upper
parts will be trained on the XOR problem. This is a complex example and
uses some slightly obscure constructs to set up the data records
concisely.
The crucial line is "net2 -> mlp_target(intermediate)", where the
intermediate data record is updated to provide a target for the lower
network.

    /* Set up individual data records for each pattern. Note number of
    iterations for each target is 1 */

    vars input_recs, intermediate, output_rec, target_recs;
    {%
        mlp_makedata(newanyarray([1 2], {0 0})),
        mlp_makedata(newanyarray([1 2], {0 1})),
        mlp_makedata(newanyarray([1 2], {1 0})),
        mlp_makedata(newanyarray([1 2], {1 1}))
    %} -> input_recs;
    {%
        mlp_makedata(newanyarray([1 1], {0}), 1, false),
        mlp_makedata(newanyarray([1 1], {1}), 1, false),
        mlp_makedata(newanyarray([1 1], {1}), 1, false),
        mlp_makedata(newanyarray([1 1], {0}), 1, false)
    %} -> target_recs;

    /* Set up a single data record for the middle output/input.
    Note no. of iterations is false - this gives a backward pass
    only */

    mlp_makedata(newsfloatarray([1 2]), false, false) -> intermediate;
    mlp_makedata(newsfloatarray([1 1])) -> output_rec;

    /* Two networks */

    vars net1, net2;
    mlp_makenet(2, {2}, 2.0, 1.4, 0.6) -> net1;     ;;; lower net
    mlp_makenet(2, {1}, 1.0, 1.4, 0.6) -> net2;     ;;; upper net

    /* Train one example at a time */

    vars eg;        ;;; the no of the current pattern

    repeat 2000 times
        random(4) -> eg;
        ;;; forward through net1
        (input_recs(eg), net1) -> mlp_response(intermediate);
        ;;; forward and back through net2
        (intermediate, target_recs(eg)) -> mlp_learn(net2) -> (,);
        ;;; provide a target for net1
        net2 -> mlp_target(intermediate);
        ;;; back through net1
        (input_recs(eg), intermediate) -> mlp_learn(net1) -> (,);
    endrepeat;

    /* Check the results */

    for eg from 1 to 4 do
        (input_recs(eg), net1) -> mlp_response(intermediate);
        (intermediate, net2) -> mlp_response(output_rec);
        (output_rec.mlpdata_data)(1) =>
    endfor;

This is, of course, slower than training the combined net with a single
call to mlp_learn, so the method should only be used when the
architecture demands it.
At present, propagation across networks assumes continuous rather than
batch training.

6.6 Calculation precision
--------------------------

The current version uses single precision floating point arithmetic. A
change to double precision requires editing and recompilation of the C
sources, as well as modification of the Pop-11 code.

6.7 Repeatable training runs
-----------------------------

The routines all generate their random numbers from an externally loaded
pseudo-random number generator (see LIB * MLP.C for details). This is
initialised from varying system variables such as the real-time clock
after loading LIB * MLP. You can obtain the current state of the
generator by accessing mlp_random_seed, an active variable which returns
3 integers, and you can set the generator to a given state by assigning
3 integers to the same variable. (Note that the generator runs
independently of the one used by *array_random, and they have separate
seeds.)

This example creates a net with random weights, then a second net with
new random weights, then recreates the first net by resetting the state
of the random number generator.

    vars (s1, s2, s3) = mlp_random_seed;    ;;; save current state
    mlp_makenet(2, {2 1}, 0.1, 0.25, 0.9) -> net;
    mlp_printweights(net);                  ;;; random weights
    mlp_makenet(2, {2 1}, 0.1, 0.25, 0.9) -> net;
    mlp_printweights(net);                  ;;; different weights

    (s1, s2, s3) -> mlp_random_seed;        ;;; restore state
    mlp_makenet(2, {2 1}, 0.1, 0.25, 0.9) -> net;
    mlp_printweights(net);                  ;;; original net recreated

If the creation of each net was followed by some training, then this
would still result in identical results for the first and final nets,
since all random variability is obtained from the same generator.
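
The claim about training can be checked with a fragment along the
following lines. This is only a sketch: it assumes that input_rec and
target_rec from the image example in section 5.2 are still available
(so the net needs 9 inputs and one output), and err1 and err2 are names
invented here. Because target_rec was built with random selection, the
order of training examples also depends on the generator.

    vars err1, err2;
    vars (s1, s2, s3) = mlp_random_seed;    ;;; save current state

    mlp_makenet(9, {3 1}, 0.1, 0.25, 0.9) -> net;
    (input_rec, target_rec) -> mlp_learn(net) -> (err1, errvar);

    (s1, s2, s3) -> mlp_random_seed;        ;;; restore state
    mlp_makenet(9, {3 1}, 0.1, 0.25, 0.9) -> net;
    (input_rec, target_rec) -> mlp_learn(net) -> (err2, errvar);

    err1 = err2 =>      ;;; expected to be true if the runs are identical

Comparing the two error values is a cheap test; printing both nets with
mlp_printweights would be a more thorough one.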
Although assigning 3 <false> values to mlp_random_seed will in fact set
the seed using values from system variables such as the real-time clock,
do not do this to try to make the tests "more random" - it will actually
make the results less well distributed. The only sensible reason to
access or update mlp_random_seed is to get repeatable results.


--- $popvision/teach/mlp
--- Copyright University of Sussex 1998. All rights reserved.