A range of learning rates at roughly uniform intervals was examined, in order to give a good idea of which learning rates are most suitable for training a FFNN on the iris dataset using back-propagation. The test set error was also examined to see how well the trained network generalises. This is a fairly broad investigation, however: although the intention was to find good learning rates, this is not essential, and the following data also show other characteristics resulting from training.

Learning rate: 0.01

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.3663 + 0.4267 + 0.3835 + 0.3594 + 0.4129) / 5 = 0.3898
Standard deviation on training set after 100 cycles = 0.0292
Average M.S.E. on test set after 100 cycles = (0.3068 + 0.3670 + 0.3288 + 0.2769 + 0.3190) / 5 = 0.3197
Standard deviation on test set after 100 cycles = 0.0328
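
The summary lines in each block can be reproduced from the five per-run values. The sketch below uses Python's standard `statistics` module to recompute the training-set figures for the block above. It assumes the reported standard deviation is the sample standard deviation (n - 1 denominator), which is consistent with the value given; the variable names are illustrative only.

```python
import statistics

# Final training-set MSE of the five runs in the block above.
train_mse_runs = [0.3663, 0.4267, 0.3835, 0.3594, 0.4129]

mean_mse = statistics.mean(train_mse_runs)
# Assumption: sample standard deviation (n - 1 denominator).
sd_mse = statistics.stdev(train_mse_runs)

print(f"mean = {mean_mse:.4f}, sd = {sd_mse:.4f}")
```

The same two calls apply to the test-set values and to every other learning rate.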

Learning rate: 0.03

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.3490 + 0.2107 + 0.3518 + 0.2074 + 0.2480) / 5 = 0.2734
Standard deviation on training set after 100 cycles = 0.0721
Average M.S.E. on test set after 100 cycles = (0.3404 + 0.1456 + 0.3412 + 0.1458 + 0.1621) / 5 = 0.2270
Standard deviation on test set after 100 cycles = 0.1041

Learning rate: 0.06

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.1189 + 0.3354 + 0.080 + 0.1544 + 0.0645) / 5 = 0.1506
Standard deviation on training set after 100 cycles = 0.1189
Average M.S.E. on test set after 100 cycles = (0.1221 + 0.3638 + 0.0865 + 0.1465 + 0.0824) / 5 = 0.1603
Standard deviation on test set after 100 cycles = 0.1168

Learning rate: 0.10

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.0642 + 0.0474 + 0.0760 + 0.3028 + 0.0548) / 5 = 0.1090
Standard deviation on training set after 100 cycles = 0.1088
Average M.S.E. on test set after 100 cycles = (0.0881 + 0.0716 + 0.1169 + 0.4889 + 0.0744) / 5 = 0.1680
Standard deviation on test set after 100 cycles = 0.1803

Learning rate: 0.13

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.3177 + 0.0530 + 0.0597 + 0.0480 + 0.0512) / 5 = 0.1059
Standard deviation on training set after 100 cycles = 0.1185
Average M.S.E. on test set after 100 cycles = (0.2092 + 0.1003 + 0.0785 + 0.0834 + 0.1001) / 5 = 0.1143
Standard deviation on test set after 100 cycles = 0.0539

Learning rate: 0.16

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.0557 + 0.0466 + 0.0500 + 0.0501 + 0.0460) / 5 = 0.0497
Standard deviation on training set after 100 cycles = 0.0038
Average M.S.E. on test set after 100 cycles = (0.0839 + 0.0895 + 0.0671 + 0.0879 + 0.0856) / 5 = 0.0828
Standard deviation on test set after 100 cycles = 0.00903

Learning rate: 0.20

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.0450 + 0.0441 + 0.0507 + 0.0514 + 0.0366) / 5 = 0.0456
Standard deviation on training set after 100 cycles = 0.0060
Average M.S.E. on test set after 100 cycles = (0.1099 + 0.1129 + 0.1181 + 0.1282 + 0.0752) / 5 = 0.1087
Standard deviation on test set after 100 cycles = 0.0201

Learning rate: 0.23

^ Error graph for training set runs.
Average M.S.E. on training set after 100 cycles = (0.0394 + 0.0365 + 0.0372 + 0.0430 + 0.0429) / 5 = 0.0398
Standard deviation on training set after 100 cycles = 0.0031
Average M.S.E. on test set after 100 cycles = (0.0779 + 0.0624 + 0.0590 + 0.1065 + 0.0843) / 5 = 0.0780
Standard deviation on test set after 100 cycles = 0.0191

Learning rate: 0.26

Average M.S.E. on training set after 100 cycles = (0.0484 + 0.0314 + 0.0328 + 0.0408 + 0.0286) / 5 = 0.0364
Standard deviation on training set after 100 cycles = 0.0081
Average M.S.E. on test set after 100 cycles = (0.107 + 0.1515 + 0.1205 + 0.1407 + 0.1119) / 5 = 0.1263
Standard deviation on test set after 100 cycles = 0.0191

Learning rate: 0.3

Average M.S.E. on training set after 100 cycles = (0.0346 + 0.0534 + 0.0353 + 0.0316 + 0.0473) / 5 = 0.0404
Standard deviation on training set after 100 cycles = 0.0094
Average M.S.E. on test set after 100 cycles = (0.1289 + 0.1006 + 0.0958 + 0.1133 + 0.2141) / 5 = 0.1305
Standard deviation on test set after 100 cycles = 0.0484

Learning rate: 0.4

Average M.S.E. on training set after 100 cycles = (0.0602 + 0.0705 + 0.0283 + 0.0378 + 0.0661) / 5 = 0.0525
Standard deviation on training set after 100 cycles = 0.0185
Average M.S.E. on test set after 100 cycles = (0.0602 + 0.1078 + 0.2016 + 0.1813 + 0.0516) / 5 = 0.1205
Standard deviation on test set after 100 cycles = 0.0686

| Learning rate | 0.01 | 0.03 | 0.06 | 0.10 | 0.13 | 0.16 | 0.20 | 0.23 | 0.26 | 0.3 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Average MSE (Training) | 0.3898 | 0.2734 | 0.1506 | 0.1090 | 0.1059 | 0.0497 | 0.0456 | 0.0398 | 0.0364 | 0.0404 | 0.0525 |
| Average MSE (Testing) | 0.3197 | 0.2270 | 0.1603 | 0.1680 | 0.1143 | 0.0828 | 0.1087 | 0.0780 | 0.1263 | 0.1305 | 0.1205 |
| S.D. MSE (Training) | 0.0292 | 0.0721 | 0.1189 | 0.1088 | 0.1185 | 0.0038 | 0.0060 | 0.0031 | 0.0081 | 0.0094 | 0.0185 |
| S.D. MSE (Testing) | 0.0328 | 0.1041 | 0.1168 | 0.1803 | 0.0539 | 0.00903 | 0.0201 | 0.0191 | 0.0191 | 0.0484 | 0.0686 |
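
As a small aid to reading the table, the following sketch picks out the learning rate with the lowest mean MSE on each dataset and lists the gap between test and training error. The numbers are copied from the table; the variable names are illustrative. With only five runs per setting, the lowest mean is not by itself evidence that a rate is the best one.

```python
learning_rates = [0.01, 0.03, 0.06, 0.10, 0.13, 0.16, 0.20, 0.23, 0.26, 0.3, 0.4]
mean_train = [0.3898, 0.2734, 0.1506, 0.1090, 0.1059, 0.0497,
              0.0456, 0.0398, 0.0364, 0.0404, 0.0525]
mean_test = [0.3197, 0.2270, 0.1603, 0.1680, 0.1143, 0.0828,
             0.1087, 0.0780, 0.1263, 0.1305, 0.1205]

# Learning rate giving the lowest mean MSE on each dataset.
best_train_lr = min(zip(mean_train, learning_rates))[1]
best_test_lr = min(zip(mean_test, learning_rates))[1]

# Test-minus-training gap: a positive gap means the network does
# worse on unseen data than on the data it was trained on.
gaps = [test - train for train, test in zip(mean_train, mean_test)]

print("lowest mean training MSE at learning rate", best_train_lr)
print("lowest mean test MSE at learning rate", best_test_lr)
for lr, gap in zip(learning_rates, gaps):
    print(f"learning rate {lr}: test - train = {gap:+.4f}")
```
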
Mean MSE on training set (with standard deviations)
Mean MSE on test set (with standard deviations)
At very low learning rates, the error on both datasets is still quite high after 100 cycles. The error tends to converge more quickly as the learning rate is increased, although the deviation of the results also increases for learning rates around 0.1. The results for a learning rate of 0.13 show a local minimum being reached in one run (a training MSE of 0.3177, against roughly 0.05 for the other four runs). At higher learning rates the MSE increases early on in some runs, but by 100 cycles the training error is relatively stable at around 0.05. Towards 0.3, and particularly at 0.4, learning is less stable while converging and the error fluctuates greatly in some runs. However, a t-test analysis shows no statistically significant difference between any of the sets of results obtained with a learning rate of 0.16 or above (the probability of the null hypothesis is always greater than 10%).
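
The text does not say which variant of the t-test was used or which pairs of results were compared. As a sketch under those caveats, the code below computes a pooled-variance (Student's) two-sample t statistic for one pair only, the training-set results at learning rates 0.26 and 0.3, and compares it with the two-sided 10% critical value for 8 degrees of freedom taken from standard tables. The function and variable names are hypothetical, and this single comparison does not reproduce the full analysis.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns the statistic and its degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


# Final training-set MSE of the five runs at each learning rate.
train_lr_026 = [0.0484, 0.0314, 0.0328, 0.0408, 0.0286]
train_lr_030 = [0.0346, 0.0534, 0.0353, 0.0316, 0.0473]

t_stat, df = pooled_t_statistic(train_lr_026, train_lr_030)

# Two-sided 10% critical value of Student's t for 8 degrees of freedom.
T_CRIT_10PCT_DF8 = 1.860
significant = abs(t_stat) > T_CRIT_10PCT_DF8

print(f"t = {t_stat:.3f}, df = {df}, significant at 10%: {significant}")
```

For this pair the statistic falls well inside the critical value, so the difference is not significant at the 10% level. Other pairs would need to be checked separately.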
The results could be made more statistically robust by performing more runs of the network for each learning rate, and by examining more learning rates within particular ranges; the experiments here deliberately covered a fairly broad range of values. Additionally, it would be interesting to examine quantitatively what happens after 100 cycles. It is assumed that overtraining would occur in many cases and that, with large learning rates, 'thrashing' (oscillation) would be exhibited.
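
The effect of the learning rate on convergence and oscillation can be illustrated with a toy problem. The sketch below is not the network used in these experiments: it runs plain gradient descent on the one-dimensional function f(w) = w**2, and the learning rates at which it oscillates or diverges are specific to that function. It is meant only to show the qualitative behaviour: slow progress for a very small rate, fast convergence for a moderate one, and sign-flipping or divergence for a large one.

```python
def descend(learning_rate, steps=100, start=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2 * w.

    Returns the list of weights visited, including the starting point.
    """
    w = start
    history = [w]
    for _ in range(steps):
        w -= learning_rate * 2 * w
        history.append(w)
    return history


for lr in (0.01, 0.4, 0.9, 1.1):
    history = descend(lr)
    # A sign change between consecutive steps indicates oscillation
    # around the minimum at w = 0.
    oscillates = history[1] * history[2] < 0
    print(f"lr={lr}: |w| after 100 steps = {abs(history[-1]):.3g}, "
          f"oscillates: {oscillates}")
```

In this toy setting a rate of 0.01 is still far from the minimum after 100 steps, 0.9 overshoots the minimum on every step while still shrinking, and 1.1 overshoots by more each time and diverges.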