Standard backpropagation vs. batch backpropagation

Introduction

The standard way to train a feed-forward neural network with backpropagation is online learning: the weights are updated after each pattern, using an error estimate derived from that pattern alone. This introduces noise into the learning process, because the error gradient can be calculated accurately only once all the training patterns have been presented. Here we look at that latter approach, 'batch training', and compare it with online training.
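The difference between the two update schemes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code used for the experiments: it uses a single linear unit with a squared-error loss instead of a full multi-layer network, and the function names (predict, pattern_gradient, online_epoch, batch_epoch, total_error) are hypothetical.

```python
def predict(w, x):
    # Output of a single linear unit: dot product of weights and inputs.
    return sum(wi * xi for wi, xi in zip(w, x))


def pattern_gradient(w, x, target):
    # Gradient of 0.5 * (prediction - target)**2 with respect to w,
    # computed from one pattern only.
    err = predict(w, x) - target
    return [err * xi for xi in x]


def online_epoch(w, patterns, lr):
    # Online (standard) mode: update the weights after every pattern,
    # so each step follows a noisy, single-pattern gradient estimate.
    w = list(w)
    for x, target in patterns:
        g = pattern_gradient(w, x, target)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def batch_epoch(w, patterns, lr):
    # Batch mode: accumulate the gradient over all patterns, then make
    # exactly one weight update per pass through the training set.
    total = [0.0] * len(w)
    for x, target in patterns:
        g = pattern_gradient(w, x, target)
        total = [ti + gi for ti, gi in zip(total, g)]
    return [wi - lr * gi for wi, gi in zip(w, total)]


def total_error(w, patterns):
    # Sum of squared errors over the whole training set.
    return 0.5 * sum((predict(w, x) - t) ** 2 for x, t in patterns)


if __name__ == "__main__":
    # Toy data (an assumption for illustration): target = 1 + 2 * x1,
    # with a constant 1.0 as the bias input.
    patterns = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
    w_online = [0.0, 0.0]
    w_batch = [0.0, 0.0]
    for epoch in range(50):
        w_online = online_epoch(w_online, patterns, lr=0.1)
        w_batch = batch_epoch(w_batch, patterns, lr=0.1)
    print("online error:", total_error(w_online, patterns))
    print("batch error: ", total_error(w_batch, patterns))
```

In this sketch, one online epoch makes as many weight updates as there are patterns, while one batch epoch makes a single update. The per-pattern gradient calculation is the same in both cases; only the point at which the weights change differs.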

Learning rates: 0.01, 0.1, 0.2

Learning rate: 0.01

 

Learning rate: 0.1

Learning rate: 0.2

Conclusion

N.B. No statistics were calculated as the results are qualitatively clear just from the training error graphs.

Batch mode backpropagation takes much longer to converge: in each of the graphs, the top five error curves belong to the batch mode runs. This is because batch mode takes the total training error over all the patterns into account before making a single weight update, whereas standard backprop updates the weights from error estimates based on individual training patterns. The latter results in faster convergence in most cases; batch mode convergence is very smooth but slow.

Interestingly, note the case where the learning rate is 0.2. In standard mode, once the error plateaus at a good minimum there is considerable 'thrashing', in which the error rate rises and falls. This tells us that the learning rate is far too big at this point, or that we have trained for too long. A strategy for varying the learning rate would be useful here if we wish to seek global minima. Alternatively, we could simply stop training when the error rate plateaus, at around 150 cycles.
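Both remedies mentioned above can be sketched briefly. The code below is a hedged illustration, not the method used in these experiments: the inverse-time decay schedule is just one common choice, and the names decayed_rate and plateau_epoch, along with the decay, patience and min_improvement parameters, are assumptions introduced for this example.

```python
def decayed_rate(initial_rate, epoch, decay=0.01):
    # Inverse-time decay (one common schedule, assumed here): the rate
    # starts at initial_rate and shrinks as training proceeds, which can
    # damp the 'thrashing' seen once the error has plateaued.
    return initial_rate / (1.0 + decay * epoch)


def plateau_epoch(errors, patience=10, min_improvement=1e-4):
    # Return the first epoch at which the error has failed to improve on
    # its best value by more than min_improvement for `patience`
    # consecutive epochs, or None if that never happens.
    best = float("inf")
    stale = 0
    for epoch, err in enumerate(errors):
        if best - err > min_improvement:
            best = err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


if __name__ == "__main__":
    # Made-up error curve for illustration: it improves, then flattens.
    errors = [1.0, 0.5, 0.25] + [0.25] * 5
    print("stop at epoch:", plateau_epoch(errors, patience=3))
    print("rate at epoch 100:", decayed_rate(0.2, 100))
```

In a training loop, decayed_rate would be called once per cycle to set the learning rate, and training would stop once plateau_epoch reports a plateau in the recorded error values. The patience and min_improvement thresholds would need tuning against the actual error curves.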
