Standard backpropagation: Variation of Momentum Rate
The aim is to run repeated experiments with different momentum rates in order to ascertain whether momentum speeds up training. A small learning rate (0.03) was chosen deliberately: on its own it would normally not result in much convergence after 100 cycles. The hypothesis is that adding momentum speeds up convergence.
- Learning rate chosen throughout: 0.03
- Momentum rates chosen: 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
- Network architecture: 4 input units, 4 hidden units, 3 output units.
- 100 training cycles per experiment.
- Each experiment repeated 10 times.
- Training set of 100 instances.
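The page does not spell out the update rule, so the following is only a minimal sketch of the usual momentum update, in which each weight change is the gradient step plus a fraction (the momentum rate) of the previous change. It uses a hypothetical one-weight quadratic error surface rather than the 4-4-3 network, and the names `train_with_momentum` and `curvature` are illustrative assumptions, not part of the original experiments.

```python
def train_with_momentum(momentum, learning_rate=0.03, cycles=100):
    """Minimise the toy error E(w) = 0.5 * curvature * w**2; return the final error."""
    curvature = 0.5        # hypothetical curvature of the toy error surface
    w = 1.0                # starting weight
    previous_step = 0.0    # no previous weight change before the first cycle
    for _ in range(cycles):
        gradient = curvature * w  # dE/dw for the toy error
        # Momentum update: gradient step plus a fraction of the previous step.
        step = -learning_rate * gradient + momentum * previous_step
        w += step
        previous_step = step
    return 0.5 * curvature * w ** 2


if __name__ == "__main__":
    for rate in (0.0, 0.2, 0.5):
        print(f"momentum {rate}: final error {train_with_momentum(rate):.6f}")
```

On this toy surface, with the same learning rate of 0.03 and 100 cycles, a momentum rate of 0.5 ends at a lower error than no momentum, which is the effect the experiments set out to test on the real network.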
Base (null hypothesis) experiment (no momentum term)
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.3490 | 0.2107 | 0.3518 | 0.2074 | 0.2480 | 0.3668 | 0.2612 | 0.3411 | 0.2409 | 0.3539 | 0.3271 | 0.0649 |
Momentum rate: 0.01
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.0994 | 0.1090 | 0.1888 | 0.1181 | 0.1208 | 0.1058 | 0.3322 | 0.0996 | 0.1068 | 0.1544 | 0.1434 | 0.0720 |
Momentum rate: 0.02
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.1507 | 0.3321 | 0.1313 | 0.1014 | 0.1424 | 0.1179 | 0.3321 | 0.0838 | 0.1193 | 0.1192 | 0.1630 | 0.0911 |
Momentum rate: 0.03
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.1470 | 0.1209 | 0.1120 | 0.1156 | 0.1622 | 0.1300 | 0.1011 | 0.1108 | 0.099 | 0.0946 | 0.1193 | 0.0216 |
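The Mean and Standard Deviation columns can be recomputed from the ten runs. The sketch below does so for the momentum rate 0.03 runs above; it assumes the sample (n - 1) form of the standard deviation, which the page does not state explicitly but which reproduces the reported 0.0216.

```python
from statistics import mean, stdev

# Per-run MSE values for momentum rate 0.03, copied from the table above.
runs = [0.1470, 0.1209, 0.1120, 0.1156, 0.1622,
        0.1300, 0.1011, 0.1108, 0.099, 0.0946]

run_mean = mean(runs)
run_std = stdev(runs)  # sample standard deviation (divides by n - 1)

print(f"mean = {run_mean:.4f}, standard deviation = {run_std:.4f}")
```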
Momentum rate: 0.04
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.1640 | 0.0767 | 0.0911 | 0.0970 | 0.1118 | 0.1653 | 0.0843 | 0.1062 | 0.0718 | 0.2947 | 0.1262 | 0.0676 |
Momentum rate: 0.05
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.1221 | 0.2592 | 0.1240 | 0.1152 | 0.1300 | 0.0994 | 0.1170 | 0.1363 | 0.1050 | 0.1202 | 0.1427 | 0.0456 |
Momentum rate: 0.1
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.1860 | 0.1247 | 0.0712 | 0.1089 | 0.0838 | 0.0893 | 0.1104 | 0.1338 | 0.1270 | 0.1593 | 0.1194 | 0.0349 |
Momentum rate: 0.2
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.0781 | 0.1108 | 0.1051 | 0.1590 | 0.0946 | 0.0901 | 0.1334 | 0.0972 | 0.0994 | 0.1066 | 0.1074 | 0.0232 |
Momentum rate: 0.3
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.1015 | 0.1054 | 0.0984 | 0.3368 | 0.0827 | 0.0964 | 0.0633 | 0.0730 | 0.0933 | 0.0951 | 0.1145 | 0.0791 |
Momentum rate: 0.4
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.0609 | 0.2346 | 0.0894 | 0.1314 | 0.1213 | 0.0649 | 0.1129 | 0.1302 | 0.1023 | 0.1027 | 0.1150 | 0.04742 |
Momentum rate: 0.5
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.0680 | 0.1019 | 0.1035 | 0.1190 | 0.0925 | 0.0894 | 0.1473 | 0.0970 | 0.1168 | 0.1548 | 0.1090 | 0.0235 |
Momentum rate: 0.6
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.0741 | 0.1312 | 0.1798 | 0.1232 | 0.0877 | 0.0754 | 0.0713 | 0.0767 | 0.1159 | 0.2331 | 0.1168 | 0.0535 |
Momentum rate: 0.7
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.0834 | 0.2212 | 0.2166 | 0.1010 | 0.0971 | 0.1028 | 0.2451 | 0.3718 | 0.09540 | 0.2311 | 0.1765 | 0.0955 |
Momentum rate: 0.8
| Experiment Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | Standard Deviation |
| Mean Squared Error (M.S.E) | 0.2985 | 0.3345 | 0.0969 | 0.0923 | 0.1272 | 0.7081 | 0.3726 | 0.1343 | 0.1752 | 0.1633 | 0.2502 | 0.1898 |
Summary of results for all momentum rates
| Momentum | None | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
| Average MSE | 0.3271 | 0.1434 | 0.1630 | 0.1193 | 0.1262 | 0.1427 | 0.1194 | 0.1074 | 0.1145 | 0.1150 | 0.1090 | 0.1168 | 0.1765 | 0.2502 |
| Standard Deviation of MSE | 0.0649 | 0.0720 | 0.0911 | 0.0216 | 0.0676 | 0.0456 | 0.0349 | 0.0232 | 0.0791 | 0.04742 | 0.0235 | 0.0535 | 0.0955 | 0.1898 |
Mean MSE (training set) graph for each momentum value with standard deviation ranges.
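The results indicate that momentum speeds up the convergence of a feed-forward neural network trained using backpropagation: after 100 cycles the reported mean MSE falls from 0.3271 without momentum to between 0.1074 and 0.1630 for momentum rates from 0.01 to 0.6. Momentum damps the large oscillations that can occur when the error gradient is steep and that lead to slow convergence. Several momentum rates appear favourable, particularly 0.03 and 0.2, which combine a low mean MSE with a small standard deviation. T-tests show no significant difference between the results for these two rates at the 5% level. At the highest rates tested (0.7 and 0.8), both the mean MSE and its standard deviation rise again. More tests could help produce a more decisive result.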
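The comparison between momentum rates 0.03 and 0.2 can be checked against the tabulated runs. The page does not say which variant of the t-test was used, so the sketch below assumes a two-sided, pooled-variance two-sample Student's t-test. The helper name `pooled_t_statistic` is illustrative, and 2.101 is the standard two-sided 5% critical value of the t distribution for 18 degrees of freedom.

```python
import math
from statistics import mean, variance

# Per-run MSE values copied from the tables for momentum rates 0.03 and 0.2.
mse_003 = [0.1470, 0.1209, 0.1120, 0.1156, 0.1622,
           0.1300, 0.1011, 0.1108, 0.099, 0.0946]
mse_02 = [0.0781, 0.1108, 0.1051, 0.1590, 0.0946,
          0.0901, 0.1334, 0.0972, 0.0994, 0.1066]


def pooled_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error


T_CRITICAL_5PCT_DF18 = 2.101  # two-sided 5% critical value, 10 + 10 - 2 = 18 d.o.f.

t_stat = pooled_t_statistic(mse_003, mse_02)
significant = abs(t_stat) > T_CRITICAL_5PCT_DF18
print(f"t = {t_stat:.3f}, significant at 5%: {significant}")
```

Under these assumptions the statistic comes out at roughly 1.2, well below the critical value, which agrees with the finding that the two rates are not significantly different.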