Learning in TLUs
Introduction
In this tutorial you will learn about:
- Linear separability and inseparability
- The Perceptron learning rule
- The Least Mean Squares (LMS) learning rule
Linear separability
The simplest case of a linearly separable decision problem is one
consisting of two sets of points (patterns) in a 2-d vector space
that belong to different classes, where the two classes can be
separated by a straight line. In 3 dimensions the equivalent problem
is one where the points are separable by a plane.
We can extend this idea to any number of dimensions, though 4 or 5
dimensions is harder to visualise! Planes and straight lines are just
hyperplanes of dimension 1 and 2. There are hyperplanes of higher
dimension. In higher dimensional cases two classes in an N dimensional
space are termed "linearly separable" if they are separable by a
hyperplane of dimension N-1.
More precisely a linearly separable decision problem is one for which
there exists a solution of the form
 |
(3) |
The solution is simply the combination of weights and threshold which
minimises the error function that has been defined. The error function
can be defined in a number of ways. In the simplest case the error
could simply be the difference between the actual value of a
for each pattern x and the desired value of a . In a
TLU a simple way of defining the error is by thresholding a ,
and defining the output of the threshold function as the
classification of the input pattern. So our single TLU can distinguish
between two classes. The error measure now could simply be the number
of incorrectly classified patterns.
So we can see that a TLU implements a linear decision boundary. In the
previous tutorial we saw that by changing the weights and the
threshold of the TLU we can move the decision boundary. For the TLU to
solve a decision problem all the patterns from one class should be on
one side of the decision boundary, and all the patterns from the other
class should be on the other side.
Learning in a TLU is concerned with using an automatic procedure for
adjusting the weights and threshold so that the decision boundary
minimises the error function.
Linear inseparability
Clearly not all decision problems are linearly separable: they cannot
be solved using a linear decision boundary. Problems like these are
termed linearly inseparable. XOR is a linearly inseparable problem.
If you don't understand the concept of linear separability and
inseparability try plotting points in the input space in the demonstration below. Then try and separate the two
classes by altering the position of the decision boundary by hand.
The Perceptron learning rule
Now we know what sort of problem can be solved using a TLU let's think
about how a TLU can learn to solve such a problem. A simple learning
procedure for a TLU is called the Perceptron learning rule. First of
all we need to define an error function precisely. The error function
used for the perceptron training rule is based on the number of
incorrectly classified points in the training set.
 |
(4) |
where M is the set of misclassified input patterns, and d
is the desired output of the TLU. This is the perceptron error
criterion. If all input vectors are correctly classified the error
E=0. The contribution of a misclassified input pattern to the
error is its absolute distance from the decision boundary. The
perceptron learning procedure minimises this error function. The
learning procedure is as follows:
- Loop until the error on the training set E=0
- Take the next training example
- Calculate the output (classification) y of the TLU
- Compare it to the desired output d
- If d=1 and y=-1 then adjust each weight wi using Eq.(5)
- If d=-1 and y=1 then adjust each weight wi using Eq.(6)
Where Eqs.(5) and (6) are
 |
(5) |
 |
(6) |
The training set is simply the set of input patterns that the TLU is
learning to classify correctly. The value of alpha is in the interval
(0,1], and is often set to 1. The algorithm will converge to a correct
solution for any positive value of alpha if the problem is linearly
separable. This procedure can be used to adjust the threshold as well
by turning the threshold into another weight with a input value of
-1. This is the way it is implemented in the demonstration below.
To see how the Perceptron learning rule works use the
Demonstration. Set the training regime to be incremental. This
means that the weights and threshold will be updated after each input
pattern. The batch training regime means that weight changes are saved
up and only made at the end of an epoch. An epoch is a complete
presentation of all the input patterns in the training set. The error
signals for a sequence of patterns will be different for the two
training regimes. Can you explain why?
The LMS learning rule
The Least Mean Squares rule minimises an error measure called the mean
squared error.
 |
(7) |
Where e_x is the error for the input pattern x . In
the TLU implementation of this rule the error for each pattern is
calculated by finding the difference between the desired and actual
activation, i.e. the error is calculated using the activation of the
TLU before it is passed through the threshold function.
 |
(8) |
Where d_x is the desired output (+1 or -1) and a_x
is the actual activation. The update procedure is simple. The weights
are updated for every input pattern using the rule
 |
(9) |
If the weights are updated after every input pattern then the decision
boundary will continue to move around the point with the lowest
error. If batch training is used then the procedure will settle on the
solution providing the minimum error. Strictly both the perceptron
learning rules and the LMS rule were only designed to operate
incrementally. In this tutorial we've added batch versions to help you
understand the way in the which the original (incremental) rules work.
Demonstration
Exercises
- Place two clusters of red and blue points in two diagonally
opposite corners of the input space. Now setting the training regime
to incremental, learn using the Perceptron rule. Record the final
position of the weight vector. Repeat your experiment from several
different starting positions. Why do the finishing positions of the
decision boundary differ? When does the TLU stop learning. Why?
- Now move the weights so that the decision boundary no longer
correctly classifies all the exemplars. Apply the LMS rule, again
using the incremental training regime. You will have to gradually
decline the learning rate to obtain convergence. What is the final
position of the weight vector? Reset the weights and repeat the
experiment. Why does the LMS rule keep on adjusting the weights even
after the patterns appear to be correctly classified?
- Explain why the final position of the decision boundary differs
between the perceptron learning rule and the LMS rule. What does this
tell you about the likely generalisation abilities of the two rules?
- Now run the batch version of the LMS rule on the same
data. Explain why the decision boundary now settles to a steady
position.
- Explain why the LMS rule is unstable if the learning rate is set
high.
- Now devise an experiment to compare the different convergence
times of the rules. Write up your experiment, reporting the aim,
method, results and your conclusions.
- Now reset the training data and create two classes that are
linearly separable but close to one another. Devise experiments to
compare the behaviour of the two learning rules in this
situation. Make sure you examine the behaviour of each rule under each
training regime (incremental and batch).
- Now devise a linearly inseparable set of training data. Devise
experiments to investigate the behaviour of the learning rules now.
Explain why the perceptron learning rule doesn't allow the decision
boundary to settle, whereas the LMS rule does (in the batch case), but
to an incorrect classification.