Several problems exist with the original back-propagation algorithm:
Fixed learning rate
Large learning rates are useful in the early training cycles for fast convergence but cause 'thrashing' (oscillation) later on. Conversely, small learning rates converge slowly but can potentially reach better optima. It has therefore been suggested that varying the learning rate during training would be beneficial.
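The trade-off can be seen on a toy problem. The following is a minimal sketch, assuming a one-weight error function E(w) = w**2 rather than a real network; the function name `descend` and the two rates are illustrative choices, not values from the experiments reported here.

```python
def descend(rate, w=1.0, steps=20):
    """Fixed-rate gradient descent on the toy error E(w) = w**2.

    Returns the list of weights visited, starting with the initial one.
    """
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w           # dE/dw for E(w) = w**2
        w = w - rate * grad      # standard fixed-rate update
        path.append(w)
    return path


# Illustrative rates only (assumptions for this toy problem).
large = descend(rate=0.95)   # every step overshoots the minimum at w = 0
small = descend(rate=0.01)   # every step stays on the same side of the minimum

if __name__ == "__main__":
    print("large rate:", [round(w, 3) for w in large[:6]])
    print("small rate:", [round(w, 3) for w in small[:6]])
```

With the large rate the weight changes sign on every step, which is the oscillation described above; with the small rate it approaches the minimum smoothly but has covered only part of the distance after 20 steps.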
Method of estimating error function
Standard back-propagation works by estimating the first partial derivative of the overall error with respect to each weight. Although this has been shown to reach good minima, it is neither an ideal way of estimating the local shape of the error function nor a fast route to convergence. Several other techniques have therefore been used to get a better picture of the local shape of the error function. In particular, use of the second derivative has been shown to improve on the original technique (Fahlaman88). Here we will look at two other techniques:
1. Resilient Backpropagation, or 'Rprop' (Riedmiller93), which uses information about how the direction (sign) of the error gradient of each weight changes over time. The update rule for each weight is therefore allowed to evolve over the course of training.
2. A more intuitive strategy, based on that presented by Silva and Almeida (Fernando90), which looks at previous changes in the error. If the error has decreased over the last two measurements, the learning rate is increased by 0.05. Conversely, if the direction of the gradient of the mean-squared error changes (fluctuating behaviour), the learning rate is decreased by 0.05.
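The two schemes above can be sketched as follows. This is a rough illustration under stated assumptions, not the code used for the experiments: `rprop_step` is a simplified Rprop variant that omits the weight-backtracking step of the original paper, its default factors (1.2, 0.5) and step bounds are the commonly quoted values and are assumed here, and `adapt_learning_rate` is one reading of the second rule in which "decreased over the last two measurements" means the error fell at both of the last two measurements. The `min_rate` floor is a hypothetical safeguard that is not mentioned in the text.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One simplified Rprop update (no weight backtracking).

    All four lists have one entry per weight and are modified in place.
    Only the sign of each gradient is used, never its magnitude.
    """
    for i, g in enumerate(grads):
        agreement = g * prev_grads[i]
        if agreement > 0:       # same sign as last time: grow this weight's step
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif agreement < 0:     # sign flipped, a minimum was overshot: shrink it
            steps[i] = max(steps[i] * eta_minus, step_min)
        direction = (g > 0) - (g < 0)        # sign of the gradient: -1, 0 or +1
        weights[i] -= direction * steps[i]
        prev_grads[i] = g


def adapt_learning_rate(rate, errors, delta=0.05, min_rate=0.0):
    """Adjust a global learning rate from the three most recent error values."""
    if len(errors) < 3:
        return rate                          # not enough history yet
    e0, e1, e2 = errors[-3:]
    d1, d2 = e1 - e0, e2 - e1
    if d1 < 0 and d2 < 0:                    # error fell twice in a row
        return rate + delta
    if d1 * d2 < 0:                          # error changed direction (fluctuating)
        return max(rate - delta, min_rate)   # min_rate: hypothetical floor
    return rate


# Toy usage on E(w) = w**2 (gradient 2*w); an assumption for illustration only.
w, prev, step = [3.0], [0.0], [0.1]
for _ in range(100):
    rprop_step(w, [2.0 * w[0]], prev, step)
```

In the toy run the per-weight step grows while the gradient keeps its sign and is halved each time the weight crosses the minimum, so the weight settles close to zero without any global learning rate being set.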
| Method | Measure | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Mean | S.D. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Standard Backprop | MSE (Training) | 0.3898 | 0.2734 | 0.1506 | 0.1090 | 0.1059 | 0.0497 | 0.0456 | 0.0398 | 0.0364 | 0.0404 | 0.0525 | 0.1191 |
| 2. R-prop | MSE (Training) | 0.0411 | 0.0674 | 0.0542 | 0.0486 | 0.0527 | 0.0571 | 0.0467 | 0.0464 | 0.0496 | 0.0809 | 0.0545 | 0.0117 |
| 3. Custom (Fernando90) | MSE (Training) | 0.05941 | 0.0440 | 0.0436 | 0.0409 | 0.0649 | 0.0314 | 0.1471 | 0.0531 | 0.0477 | 0.0763 | 0.0608 | 0.0329 |
Mean MSE graph with standard deviation ranges for each learning method

Both methods 2 and 3 appear more successful at speeding up convergence over a fixed time period than standard backpropagation (method 1). This is not statistically conclusive, though: a t-test analysis shows that the probability that any two sets of results come from the same distribution is never less than 9%. The graph shows that although methods 2 and 3 give similarly good performance on every run, their means are still only comparable to that of standard backprop. The results could possibly be improved if more runs were carried out or if the network were trained for longer under each of the schemes.
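As a rough check on the summary statistics, the sketch below recomputes the mean and sample standard deviation of the R-prop and custom runs from the table, along with a two-sample t statistic. The text does not say which form of t-test was used, so Welch's unequal-variance form is an assumption here, and no p-value is computed because the standard library has no t-distribution; the helper name `welch_t` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Training MSE for the ten runs of each method, copied from the table.
rprop = [0.0411, 0.0674, 0.0542, 0.0486, 0.0527,
         0.0571, 0.0467, 0.0464, 0.0496, 0.0809]
custom = [0.05941, 0.0440, 0.0436, 0.0409, 0.0649,
          0.0314, 0.1471, 0.0531, 0.0477, 0.0763]


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom.

    Assumption: an unequal-variance two-sample test; the text only
    says "a t-test analysis".
    """
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, dof = welch_t(rprop, custom)

if __name__ == "__main__":
    print(f"R-prop: mean={mean(rprop):.4f}  sd={stdev(rprop):.4f}")
    print(f"Custom: mean={mean(custom):.4f}  sd={stdev(custom):.4f}")
    print(f"Welch t={t_stat:.3f}  df={dof:.1f}")
```

A t statistic that is small relative to the degrees of freedom is consistent with the observation above that the difference between the two schemes is not statistically conclusive.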
(Roberts88) : Jacobs, Robert A. : Increased Rates of Convergence Through Learning Rate Adaptation, Neural Networks, Vol. 1, pages 295-307, 1988
(Fahlaman88) : Fahlman, Scott E. : An Empirical Study of Learning Speed in Back-Propagation Networks, Technical Report CMU-CS-88-162, September 1988
(Riedmiller93) : Riedmiller, Martin and Braun, Heinrich : A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm, Proc. of the IEEE Intl. Conf. on Neural Networks, pages 586-591, 1993
(Fernando90) : Silva, Fernando M. and Almeida, Luis B. : Speeding up Backpropagation, Advanced Neural Computers, Eckmiller R. (Editor), pages 151-158, 1990