The activation function of a neuron introduces non-linearity into a neural network and determines the elasticity of weight changes during training, so the choice of function can affect how well training converges. Standard back-propagation uses the logistic sigmoid function. Here three activation functions are compared experimentally:
1. Logistic sigmoid(x) ::= $\frac{1}{1 + e^{-x}}$
2. Tanh(x) ::= $\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
3. Elliot(x) ::= $\frac{x}{1 + |x|}$
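
As a rough illustration, the three functions listed above can be sketched in Python using their standard textbook forms. The function names are hypothetical, and the Elliot form `x / (1 + |x|)` in particular is an assumption, since scaled and shifted variants also exist. The sketch is not numerically hardened for very large negative inputs.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid: squashes x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes x into the open interval (-1, 1)."""
    return math.tanh(x)


def elliot(x: float) -> float:
    """Elliot function, ASSUMED here to be x / (1 + |x|), range (-1, 1)."""
    return x / (1.0 + abs(x))


if __name__ == "__main__":
    # Print each function at a few sample inputs to compare output ranges.
    print("    x  logistic     tanh   elliot")
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"{x:5.1f}  {logistic(x):8.4f}  {tanh(x):7.4f}  {elliot(x):7.4f}")
```

The logistic sigmoid is centred on 0.5, whereas tanh and the assumed Elliot form are centred on 0. The Elliot form avoids the exponential altogether, which makes it cheaper to evaluate.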

| Method | Measure | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Mean | S.D. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Logistic sigmoid | MSE (Training) | 0.0352 | 0.0391 | 0.1593 | 0.0330 | 0.0528 | 0.0390 | 0.0293 | 0.0353 | 0.0300 | 0.0305 | 0.0483 | 0.0396 |
| 2. Tanh | MSE (Training) | 0.0286 | 0.0616 | 0.0321 | 0.0402 | 0.0358 | 0.0343 | 0.0302 | 0.0485 | 0.3468 | 0.1805 | 0.0838 | 0.1030 |
| 3. Elliot | MSE (Training) | 0.0664 | 0.0855 | 0.2522 | 0.1832 | 0.1024 | 0.0618 | 0.2642 | 0.0589 | 0.0671 | 0.1457 | 0.1287 | 0.0792 |
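
The Mean and S.D. columns can be checked directly from the ten per-run values. The sketch below uses only the Python standard library and assumes the S.D. column is a sample standard deviation (n - 1 in the denominator). The dictionary names are illustrative.

```python
from statistics import mean, stdev

# Training MSE for runs 1-10, copied from the table above.
mse_runs = {
    "Logistic sigmoid": [0.0352, 0.0391, 0.1593, 0.0330, 0.0528,
                         0.0390, 0.0293, 0.0353, 0.0300, 0.0305],
    "Tanh": [0.0286, 0.0616, 0.0321, 0.0402, 0.0358,
             0.0343, 0.0302, 0.0485, 0.3468, 0.1805],
    "Elliot": [0.0664, 0.0855, 0.2522, 0.1832, 0.1024,
               0.0618, 0.2642, 0.0589, 0.0671, 0.1457],
}

# (Mean, S.D.) as reported in the last two columns of the table.
reported = {
    "Logistic sigmoid": (0.0483, 0.0396),
    "Tanh": (0.0838, 0.1030),
    "Elliot": (0.1287, 0.0792),
}

# stdev() is the sample standard deviation (divides by n - 1).
summary = {name: (mean(runs), stdev(runs)) for name, runs in mse_runs.items()}

for name, (m, s) in summary.items():
    rep_m, rep_s = reported[name]
    print(f"{name:17s} mean={m:.4f} (reported {rep_m:.4f})  "
          f"sd={s:.4f} (reported {rep_s:.4f})")
```

Under that assumption the recomputed values agree with the reported columns to within rounding. Most of the spread for tanh comes from runs 9 and 10.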

The results show that, on average, the logistic sigmoid function reaches a lower training MSE than the other two functions (tanh and Elliot). The difference between the two best functions, logistic sigmoid and tanh, is not statistically significant, however: a two-tailed t-test gives a probability of 56%, whereas 5% or less is generally accepted as significant. In some runs tanh reached a lower MSE than the other functions, but its variance is high and its difference from the other functions is not statistically significant. Some researchers, however, have reported that tanh can provide faster convergence because its output range (-1 to 1) differs from that of the logistic sigmoid (0 to 1) (Kalman & Kwasny, 1992).
Kalman, B. L., & Kwasny, S. C. (1992). Why Tanh: Choosing a Sigmoidal Function. Proceedings of the International Joint Conference on Neural Networks. Baltimore: Morgan Kaufmann.