- Linear separability and inseparability
- The Perceptron learning rule
- The Least Mean Squares (LMS) learning rule

We can extend this idea to any number of dimensions, though 4 or 5 dimensions are harder to visualise! Straight lines and planes are just hyperplanes of dimension 1 and 2 respectively, and there are hyperplanes of higher dimension too. In general, two classes in an N-dimensional space are termed "linearly separable" if they can be separated by a hyperplane of dimension N-1.

More precisely a linearly separable decision problem is one for which there exists a solution of the form

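$$
y = \begin{cases} +1 & \text{if } \sum_{i=1}^{N} w_i x_i \ge \theta \\ -1 & \text{otherwise} \end{cases} \tag{3}
$$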
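A minimal Python sketch of the TLU decision rule in Eq. (3), assuming outputs of +1 and -1 and that a pattern lying exactly on the boundary is assigned to the +1 class. The helper names `tlu_output` and `separates` are hypothetical. Note that `separates` only tests one candidate set of weights and threshold; a failed test does not prove that a problem is linearly inseparable.

```python
def tlu_output(weights, threshold, x):
    """Eq. (3): +1 if the weighted sum of the inputs reaches the threshold, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= threshold else -1


def separates(weights, threshold, patterns, targets):
    """True if the hyperplane sum_i w_i x_i = threshold puts every pattern
    on the side given by its target (+1 or -1)."""
    return all(tlu_output(weights, threshold, x) == d
               for x, d in zip(patterns, targets))


# Two inputs: logical AND (targets in {+1, -1}) is separated by a line.
and_patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
print(separates([1.0, 1.0], 1.5, and_patterns, and_targets))  # True

# Three inputs: "at least two inputs on" is separated by a plane (dimension N - 1 = 2).
maj_patterns = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
maj_targets = [1 if sum(p) >= 2 else -1 for p in maj_patterns]
print(separates([1.0, 1.0, 1.0], 1.5, maj_patterns, maj_targets))  # True
```

The same function works unchanged for any N, which is the point of the hyperplane view: only the length of the weight vector grows.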

The solution is simply the combination of weights and threshold that minimises the error function that has been defined. The error function can be defined in a number of ways. In the simplest case the error could simply be the difference between the actual value of the TLU's output, y, and its desired output, d.
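A minimal sketch of that simplest error measure, assuming that both the output y and the desired output d take values in {+1, -1} (the convention used by the Perceptron rule below). The per-pattern error d - y is then 0 for a correct pattern and +2 or -2 for a wrong one, so a training-set error E can be taken as the number of misclassified patterns. The function names are hypothetical.

```python
def pattern_error(d, y):
    """Simplest per-pattern error: desired output minus actual output.
    With d, y in {+1, -1} this is 0 (correct), +2 or -2 (wrong)."""
    return d - y


def training_set_error(desired, actual):
    """Training-set error E as a count of misclassified patterns:
    each mistake contributes |d - y| = 2, so the total is halved."""
    return sum(abs(pattern_error(d, y)) for d, y in zip(desired, actual)) // 2


desired = [1, -1, -1, 1]
actual = [1, 1, -1, -1]
print([pattern_error(d, y) for d, y in zip(desired, actual)])  # [0, -2, 0, 2]
print(training_set_error(desired, actual))                      # 2
```

The sign of d - y records which way the output was wrong, which is exactly the distinction the two update cases of the Perceptron rule make.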

So we can see that a TLU implements a linear decision boundary. In the previous tutorial we saw that by changing the weights and the threshold of the TLU we can move the decision boundary. For the TLU to solve a decision problem all the patterns from one class should be on one side of the decision boundary, and all the patterns from the other class should be on the other side.
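For two inputs the boundary can be written out explicitly, which shows how the weights and threshold move it. Assuming the TLU fires when $w_1 x_1 + w_2 x_2 \ge \theta$ and that $w_2 \neq 0$, the boundary is the line on which the weighted sum equals the threshold:

$$
w_1 x_1 + w_2 x_2 = \theta
\quad\Longrightarrow\quad
x_2 = -\frac{w_1}{w_2}\,x_1 + \frac{\theta}{w_2}
$$

Changing the ratio $w_1/w_2$ changes the slope and so rotates the line, while changing $\theta$ alone shifts the line parallel to itself; the weight vector $(w_1, w_2)$ is always perpendicular to it. For example, $w_1 = w_2 = 1$ and $\theta = 1.5$ give the line $x_2 = 1.5 - x_1$, which separates the input $(1, 1)$ from $(0, 0)$, $(0, 1)$ and $(1, 0)$.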

Learning in a TLU is concerned with using an automatic procedure for adjusting the weights and threshold so that the decision boundary minimises the error function.

If you don't understand the concept of linear separability and inseparability, try plotting points in the input space in the demonstration below. Then try to separate the two classes by moving the decision boundary by hand.

(4) |

where

- Loop until the error on the training set is E = 0:
  - Take the next training example.
  - Calculate the output (classification) **y** of the TLU.
  - Compare it to the desired output **d**.
  - If d = 1 and y = -1, adjust each weight $w_i$ using Eq. (5).
  - If d = -1 and y = 1, adjust each weight $w_i$ using Eq. (6).

$$
w_i \leftarrow w_i + \alpha x_i \tag{5}
$$

$$
w_i \leftarrow w_i - \alpha x_i \tag{6}
$$

The training set is simply the set of input patterns that the TLU is learning to classify correctly. The learning rate α lies in the interval (0, 1] and is often set to 1. If the problem is linearly separable, the algorithm will converge to a correct solution in a finite number of steps for any positive value of α. The same procedure can also adjust the threshold: treat the threshold as an extra weight whose input is always -1, since $\sum_{i=1}^{N} w_i x_i \ge \theta$ is equivalent to $\sum_{i=0}^{N} w_i x_i \ge 0$ with $w_0 = \theta$ and $x_0 = -1$. This is how it is implemented in the demonstration below.
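The learning loop above can be sketched in Python as follows. This is a minimal sketch, not the code behind the demonstration: it assumes targets of +1/-1, stores the threshold as `w[0]` on a constant input of -1 as just described, and adds a hypothetical `max_epochs` limit so that the loop also terminates on linearly inseparable data. All function names are illustrative.

```python
def augment(x):
    """Prepend the constant input x0 = -1 so the threshold becomes weight w[0]."""
    return [-1.0] + list(x)


def output(w, x_aug):
    """TLU output with the threshold folded in: +1 if sum_i w_i x_i >= 0, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x_aug)) >= 0 else -1


def train_perceptron(patterns, targets, w, alpha=1.0, max_epochs=100):
    """Incremental Perceptron rule. Returns (weights, epochs_run, converged).
    max_epochs is an assumed safety limit for linearly inseparable data."""
    w = list(w)
    for epoch in range(max_epochs):
        mistakes = 0
        for x, d in zip(patterns, targets):
            x_aug = augment(x)
            y = output(w, x_aug)
            if d == 1 and y == -1:       # Eq. (5): w_i <- w_i + alpha * x_i
                w = [wi + alpha * xi for wi, xi in zip(w, x_aug)]
                mistakes += 1
            elif d == -1 and y == 1:     # Eq. (6): w_i <- w_i - alpha * x_i
                w = [wi - alpha * xi for wi, xi in zip(w, x_aug)]
                mistakes += 1
        if mistakes == 0:                # E = 0 over a whole pass: stop learning
            return w, epoch + 1, True
    return w, max_epochs, False


and_patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
w, epochs, converged = train_perceptron(and_patterns, and_targets, [0.0, 0.0, 0.0])
print(w, epochs, converged)  # [3.0, 2.0, 1.0] 6 True  (w[0] is the threshold)
```

Here the learned threshold is 3 with input weights 2 and 1: only the input (1, 1) reaches the threshold. Learning stops as soon as a full pass produces no mistakes, so the final boundary depends on the starting weights and the order of presentation.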

To see how the Perceptron learning rule works, use the demonstration. Set the training regime to incremental: the weights and threshold will then be updated after each input pattern. In the batch training regime, weight changes are saved up and only made at the end of an epoch, where an epoch is one complete presentation of all the input patterns in the training set. The error signals for a sequence of patterns will differ between the two training regimes. Can you explain why?
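A minimal sketch of one epoch under each training regime, assuming targets of +1/-1 and the threshold stored as `w[0]` on a constant input of -1. The function name `perceptron_epoch` and its `batch` flag are hypothetical. It records the error signal d - y for each pattern so the two regimes can be compared on the same data and starting weights.

```python
def perceptron_epoch(w, patterns, targets, alpha=1.0, batch=False):
    """One epoch of the Perceptron rule. Returns (new_weights, error_signals),
    where error_signals holds d - y for each pattern in presentation order.
    The threshold is w[0], on a constant input of -1."""
    w = list(w)
    saved = [0.0] * len(w)          # batch mode: weight changes are saved up here
    signals = []
    for x, d in zip(patterns, targets):
        x_aug = [-1.0] + list(x)
        y = 1 if sum(wi * xi for wi, xi in zip(w, x_aug)) >= 0 else -1
        signals.append(d - y)
        if d != y:
            # alpha * d * x_i is +alpha*x_i when d = 1 and -alpha*x_i when d = -1
            step = [alpha * d * xi for xi in x_aug]
            if batch:
                saved = [s + c for s, c in zip(saved, step)]
            else:
                w = [wi + c for wi, c in zip(w, step)]   # update immediately
    if batch:
        w = [wi + s for wi, s in zip(w, saved)]         # update at end of epoch
    return w, signals


and_patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
start = [0.0, 0.0, 0.0]
print(perceptron_epoch(start, and_patterns, and_targets, batch=False)[1])  # [-2, 0, 0, 2]
print(perceptron_epoch(start, and_patterns, and_targets, batch=True)[1])   # [-2, -2, -2, 0]
```

In batch mode every pattern in the epoch is judged against the same weights, whereas in incremental mode each pattern sees weights already changed by the patterns before it.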

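The LMS rule adjusts each weight using

$$
w_i \leftarrow w_i + \alpha \, \delta \, x_i \tag{7}
$$

where $\delta$ is the error between the desired output and the activation,

$$
\delta = d - a \tag{8}
$$

and $a$ is the weighted sum of the inputs before thresholding (with the threshold included as $w_0$ on a constant input $x_0 = -1$):

$$
a = \sum_{j=0}^{N} w_j x_j \tag{9}
$$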
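A minimal sketch of the incremental LMS (Widrow-Hoff) rule, assuming the standard form in which the error is measured on the unthresholded activation rather than on the TLU's +1/-1 output, with the threshold again folded in as `w[0]` on an input of -1. The decaying learning-rate schedule `alpha0 / (1 + epoch / decay)` and all function names are assumptions chosen for illustration, not taken from the demonstration.

```python
def lms_incremental(patterns, targets, w, alpha0=0.1, decay=50.0, epochs=500):
    """Incremental LMS rule with the threshold as w[0] on a constant input of -1.
    The schedule alpha = alpha0 / (1 + epoch / decay) is an assumed example of
    gradually reducing the learning rate so that the weights converge."""
    w = list(w)
    for epoch in range(epochs):
        alpha = alpha0 / (1.0 + epoch / decay)
        for x, d in zip(patterns, targets):
            x_aug = [-1.0] + list(x)
            a = sum(wi * xi for wi, xi in zip(w, x_aug))   # linear activation, no threshold step
            delta = d - a                                   # error on the activation
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x_aug)]
    return w


def classify(w, x):
    """Threshold the trained unit's activation to get a +1/-1 class."""
    return 1 if sum(wi * xi for wi, xi in zip(w, [-1.0] + list(x))) >= 0 else -1


and_patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
w = lms_incremental(and_patterns, and_targets, [0.0, 0.0, 0.0])
print([classify(w, x) for x in and_patterns])  # [-1, -1, -1, 1]
```

Because delta is computed from the activation a, it is almost never exactly zero, so the weights keep changing even when every pattern is already on the correct side of the boundary.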

For the LMS rule, if the weights are updated after every input pattern, then with a fixed learning rate the decision boundary will keep moving around the position of lowest error. If batch training is used (with a small enough learning rate), the procedure will settle on the solution that gives the minimum error. Strictly, both the Perceptron learning rule and the LMS rule were designed to operate only incrementally. In this tutorial we've added batch versions to help you understand how the original (incremental) rules work.
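A minimal sketch of a batch version of the LMS rule, assuming the usual approach of summing the per-pattern changes over an epoch and applying them once at the end, with the threshold as `w[0]` on a constant input of -1. The function name and default parameters are hypothetical. With a fixed, small learning rate the weights stop moving at the minimum squared-error solution.

```python
def lms_batch(patterns, targets, w, alpha=0.1, epochs=500):
    """Batch LMS: per-pattern changes alpha * (d - a) * x_i are summed over the
    epoch and applied once at the end. Threshold is w[0] on a constant input -1."""
    w = list(w)
    for _ in range(epochs):
        total = [0.0] * len(w)
        for x, d in zip(patterns, targets):
            x_aug = [-1.0] + list(x)
            a = sum(wi * xi for wi, xi in zip(w, x_aug))     # unthresholded activation
            total = [t + (d - a) * xi for t, xi in zip(total, x_aug)]
        w = [wi + alpha * t for wi, t in zip(w, total)]       # one update per epoch
    return w


and_patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [-1, -1, -1, 1]
w = lms_batch(and_patterns, and_targets, [0.0, 0.0, 0.0])
print([round(v, 3) for v in w])  # [1.5, 1.0, 1.0]  i.e. threshold 1.5, w1 = w2 = 1
```

Since the batch update uses the same weights for the whole epoch, it is plain gradient descent on the summed squared error, which is why it settles. Trying a much larger `alpha` (such as 0.5 on this data) shows the instability that the exercises below ask about.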

- Place two clusters of red and blue points in two diagonally
opposite corners of the input space. Now, setting the training regime
to incremental, train the TLU using the Perceptron rule. Record the final
position of the weight vector. Repeat your experiment from several
different starting positions. Why do the finishing positions of the
decision boundary differ? When does the TLU stop learning, and why?
- Now move the weights so that the decision boundary no longer
correctly classifies all the exemplars. Apply the LMS rule, again
using the incremental training regime. You will have to gradually
reduce the learning rate to obtain convergence. What is the final
position of the weight vector? Reset the weights and repeat the
experiment. Why does the LMS rule keep on adjusting the weights even
after the patterns appear to be correctly classified?
- Explain why the final position of the decision boundary differs
between the perceptron learning rule and the LMS rule. What does this
tell you about the likely generalisation abilities of the two rules?
- Now run the batch version of the LMS rule on the same
data. Explain why the decision boundary now settles to a steady
position.
- Explain why the LMS rule is unstable if the learning rate is set
high.
- Now devise an experiment to compare the convergence
times of the rules. Write up your experiment, reporting its aim,
method, results and conclusions.
- Now reset the training data and create two classes that are
linearly separable but close to one another. Devise experiments to
compare the behaviour of the two learning rules in this
situation. Make sure you examine the behaviour of each rule under each
training regime (incremental and batch).
- Now devise a linearly inseparable set of training data, and devise experiments to investigate how the learning rules behave on it. Explain why the Perceptron learning rule doesn't allow the decision boundary to settle, whereas the LMS rule does (in the batch case), but in a position that misclassifies some of the patterns.