Standard backpropagation: Variation of Momentum Rate

Method

Experiments were repeated with different momentum rates, ten runs per rate, to ascertain whether momentum speeds up training. A small learning rate (0.03) was chosen for the experiments, one that would not normally result in much convergence after 100 cycles. The hypothesis is that adding a momentum term speeds up convergence.
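
The effect of a momentum term can be illustrated with a minimal sketch. It assumes the common form of the update, in which each weight change is the negative gradient scaled by the learning rate plus the previous change scaled by the momentum rate; the exact formula used in the experiments is not stated here. The function name train_toy, the one-weight quadratic error and the starting weight are hypothetical stand-ins for the real network.

```python
def train_toy(learning_rate, momentum, cycles=100, start_weight=1.0):
    """Gradient descent with momentum on the toy error E(w) = 0.5 * w**2.

    Hypothetical one-weight example; not the network used in the experiments.
    Returns the error after the given number of cycles.
    """
    w = start_weight
    previous_change = 0.0
    for _ in range(cycles):
        gradient = w  # dE/dw for E(w) = 0.5 * w**2
        # Assumed update rule: new change = -lr * gradient + momentum * old change
        change = -learning_rate * gradient + momentum * previous_change
        w += change
        previous_change = change
    return 0.5 * w ** 2


if __name__ == "__main__":
    for rate in (0.0, 0.2, 0.5):
        error = train_toy(0.03, rate)
        print(f"momentum {rate}: error after 100 cycles = {error:.6f}")
```

On this toy surface, with the learning rate fixed at 0.03, a larger momentum rate leaves a smaller error after 100 cycles. The surface is too simple to show the degradation that the results report at rates of 0.7 and 0.8.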

Results

Baseline experiment (null hypothesis: no momentum term)

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.3490 0.2107 0.3518 0.2074 0.2480 0.3668 0.2612 0.3411 0.2409 0.3539 0.3271 0.0649

Momentum rate: 0.01

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.0994 0.1090 0.1888 0.1181 0.1208 0.1058 0.3322 0.0996 0.1068 0.1544 0.1434 0.0720

Momentum rate: 0.02

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.1507 0.3321 0.1313 0.1014 0.1424 0.1179 0.3321 0.0838 0.1193 0.1192 0.1630 0.0911

Momentum rate: 0.03

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.1470 0.1209 0.1120 0.1156 0.1622 0.1300 0.1011 0.1108 0.099 0.0946 0.1193 0.0216

Momentum rate: 0.04

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.1640 0.0767 0.0911 0.0970 0.1118 0.1653 0.0843 0.1062 0.0718 0.2947 0.1262 0.0676

Momentum rate: 0.05

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.1221 0.2592 0.1240 0.1152 0.1300 0.0994 0.1170 0.1363 0.1050 0.1202 0.1427 0.0456

Momentum rate: 0.1

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.1860 0.1247 0.0712 0.1089 0.0838 0.0893 0.1104 0.1338 0.1270 0.1593 0.1194 0.0349

Momentum rate: 0.2

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.0781 0.1108 0.1051 0.1590 0.0946 0.0901 0.1334 0.0972 0.0994 0.1066 0.1074 0.0232

Momentum rate: 0.3

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.1015 0.1054 0.0984 0.3368 0.0827 0.0964 0.0633 0.0730 0.0933 0.0951 0.1145 0.0791

Momentum rate: 0.4

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.0609 0.2346 0.0894 0.1314 0.1213 0.0649 0.1129 0.1302 0.1023 0.1027 0.1150 0.04742

Momentum rate: 0.5

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.0680 0.1019 0.1035 0.1190 0.0925 0.0894 0.1473 0.0970 0.1168 0.1548 0.1090 0.0235

Momentum rate: 0.6

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.0741 0.1312 0.1798 0.1232 0.0877 0.0754 0.0713 0.0767 0.1159 0.2331 0.1168 0.0535

Momentum rate: 0.7

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.0834 0.2212 0.2166 0.1010 0.0971 0.1028 0.2451 0.3718 0.09540 0.2311 0.1765 0.0955

Momentum rate: 0.8

Experiment Number 1 2 3 4 5 6 7 8 9 10 Mean Standard Deviation
Mean Squared Error (M.S.E) 0.2985 0.3345 0.0969 0.0923 0.1272 0.7081 0.3726 0.1343 0.1752 0.1633 0.2502 0.1898

Summary

Momentum rate   Mean MSE   Standard Deviation of MSE
None            0.3271     0.0649
0.01            0.1434     0.0720
0.02            0.1630     0.0911
0.03            0.1193     0.0216
0.04            0.1262     0.0676
0.05            0.1427     0.0456
0.1             0.1194     0.0349
0.2             0.1074     0.0232
0.3             0.1145     0.0791
0.4             0.1150     0.04742
0.5             0.1090     0.0235
0.6             0.1168     0.0535
0.7             0.1765     0.0955
0.8             0.2502     0.1898
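
The Mean and Standard Deviation figures can be recomputed from the ten runs. The sketch below does so for momentum rate 0.03, assuming the sample (n - 1) standard deviation, which is the form that reproduces the tabulated 0.0216. The variable names are illustrative.

```python
import statistics

# MSE values of the ten runs at momentum rate 0.03, copied from the results above
mse_runs = [0.1470, 0.1209, 0.1120, 0.1156, 0.1622,
            0.1300, 0.1011, 0.1108, 0.099, 0.0946]

mean_mse = statistics.mean(mse_runs)
# statistics.stdev uses the n - 1 (sample) denominator
std_mse = statistics.stdev(mse_runs)

print(f"mean = {mean_mse:.4f}, standard deviation = {std_mse:.4f}")
# prints: mean = 0.1193, standard deviation = 0.0216
```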

Mean MSE (training set) graph for each momentum value with standard deviation ranges.

Conclusion

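Momentum clearly speeds up convergence when training a feed-forward neural network with backpropagation: every momentum rate tested gave a lower mean MSE than the 0.3271 obtained without momentum. Momentum prevents the large oscillations that can lead to slow convergence when the error gradient is steep. Several momentum rates appear favourable, particularly 0.03 and 0.2, which combine a low mean MSE with the two smallest standard deviations (0.0216 and 0.0232). T-tests show no significant difference between the results obtained for these two rates (at the 5% level). At the highest rates tested (0.7 and 0.8), the mean MSE and its spread rise again. More runs could help produce a more decisive result.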
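
The comparison between momentum rates 0.03 and 0.2 can be checked from the summary statistics. The sketch below is one way to do it, not necessarily the test that was run: it assumes Welch's unequal-variance t-test, ten runs per rate, and a two-tailed test at 5%. The function name welch_t and the constant T_CRITICAL are illustrative.

```python
import math


def welch_t(mean_a, sd_a, mean_b, sd_b, n_a=10, n_b=10):
    """Return (t statistic, Welch degrees of freedom) from summary statistics."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Mean and standard deviation of MSE for momentum rates 0.03 and 0.2 (from the Summary)
t_stat, dof = welch_t(0.1193, 0.0216, 0.1074, 0.0232)

# Two-tailed 5% critical value of Student's t for 18 degrees of freedom,
# taken from standard tables (assumption: a two-tailed test was used)
T_CRITICAL = 2.101

print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
print("significant at 5%" if abs(t_stat) > T_CRITICAL else "not significant at 5%")
```

Under these assumptions t is about 1.19, well below the critical value, which agrees with the finding of no significant difference between the two rates.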
