TEACH RECURSE1 Steven Hardy August 1978
Updated A.Sloman Nov 1989
RECURSIVE PROCEDURE DEFINITIONS
(Illustrated with POP11 examples)
CONTENTS - (Use g to access required sections)
-- Introduction, and prerequisites
-- The example of factorial
-- Why are circular definitions permitted?
-- Example: adding the numbers in a list
-- Expressing sum in Pop-11
-- Finding the largest number in a list
-- A Pop-11 definition of largest(list)
-- An inefficient definition of "largest"
-- Introduction, and prerequisites ------------------------------------
This teach file discusses 'recursive' definitions, that is defining some
concept in terms of itself (or some group of concepts in terms of each
other).
For example, does society condition the way people think or is it the
way people think that shapes society? Both may be true, but many people
feel intuitively suspicious of such circular arguments.
The examples below assume that you are familiar with procedure
definitions in POP-11 (see TEACH * DEFINE), including the definition of
procedures with inputs and outputs.
You should also be familiar with the basic arithmetical operations
+ (plus)
- (minus)
* (multiplied by)
and the elements of list processing, as introduced in TEACH * LISTS.
In particular, you need to know that
(a) [] is an empty list
(b) HD is a built-in operation to find the first element of a list
(c) TL is a built-in operation to find the tail of a list, where
the tail of L is a list containing all the elements of L except
the first.
-- The example of factorial -------------------------------------------
One way of defining things recursively is to treat them numerically. An
example which may be familiar to you is the definition of a factorial, a
concept often used in mathematics. The factorial of a number is got by
multiplying together all the numbers from 1 up to that number.
So (using "*" for multiplication) we have the following examples:
Factorial of 5 is
5 * 4 * 3 * 2 * 1
which is 120. Factorial of 9 is
9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1
which is 362880!
It is easy to give examples of this concept, but how can we define it
in general for any number. Here is one way:
(a) factorial 1 is 1.
(b) factorial of a number n greater than 1 is n multiplied by the
factorial of n-1,
We represent the factorial of n as "factorial(n)", though mathematicians
often represent it as "n!" and read it as "n factorial".
From (a) and (b) you can work out the factorial of any number. For
example, because 3 is greater than 1, definition (b) implies that
factorial of 3 is :
3 * factorial(2)
Because 2 is also greater than 1, factorial(2) is 2 * factorial(2 - 1),
so we can replace "factorial(2)" above giving:
3 * 2 * factorial(1)
and, because factorial 1 is 1 (using definition (a)), that is
3 * 2 * 1
Algebraically the definition reads:
(a) factorial(1) = 1
(b) if n > 1 then factorial(n) = n * factorial(n-1)
-- Why are circular definitions permitted? ----------------------------
That seond clause, (b) defines 'factorial' in terms of factorial - how
is that possible without endless circularity?
Modern programming languages permit the programmer to define operations
and predicates in terms of themselves. Some people find this a little
disturbing (or at least confusing) and so it might be instructive to
consider how a computer can do this. The key idea, which is often hard
to remember, is that once a procedure P has been defined and the
definition has been compiled, any procedure can use P. I.e. any
procedure can "call" P, provided that it gives P the appropriate inputs.
The caller will simply wait till P's instructions have been obeyed, and
then continue.
If ANY procedure can use P, then P can also use P. I.e. it can use
itself to solve some sub-problems. While writing the instructions for P
you may find it hard to believe that P can be used in the middle,
because the definition has not yet been completed. However, once the
procedure has been compiled, all the instructions for doing P are in the
computer. So when P calls P, the computer will do exactly the same as if
anything else calls P: i.e. it will run the instructions for doing P.
-- Example: adding the numbers in a list ------------------------------
We'll use a slightly different example, as some readers will be
unfamiliar with factorial and may find the concept unmotivated.
You often have to add up a set of numbers. Suppose we represent a set
of numbers by a list, e.g.
[3 5 13 6 22 4]
You probably know how to add up the numbers presented in such a list,
but can you tell a computer how to do it?
Consider the following definition of the 'the sum of a set of numbers':
(a) The sum of a set of numbers is zero if the set is empty
(b) otherwise it is the first element of the set plus the
sum of the rest of the set.
(b) Implies that the sum of the numbers in [3 5 13 6 22 4], which we
can represent as sum([3 5 13 6 22 4]) is
3 + sum([5 13 6 22 4])
since [5 13 6 22 4] is the tail of the list. Once again we can use
the definition (b) to "expand this" giving
3 + 5 + sum([13 6 22 4])
using (b) repeatedly we eventually get to
3 + 5 + 13 + 6 + 22 + sum([4])
and then, (since [] is the tail of [4]):
3 + 5 + 13 + 6 + 22 + 4 + sum([])
To finish this off, we have to use definition (a), giving
3 + 5 + 13 + 6 + 22 + 4 + 0
We can define "sum" more formally thus:
(a) sum([]) = 0
(b) if L is not empty then
sum(L) = hd(L) + sum(tl(L))
where "hd" is the Pop-11 procedure that gives the first element of the
list, and "tl" is the procedure that gives the tail, i.e. a list
containing all elements apart from the first. Notice that hd and tl are
not defined for empty list: these Pop-11 procedures give an error if
applied to []. But this is not a problem, since case (b) excludes the
possibility of L being the empty list.
-- Expressing sum in Pop-11 -------------------------------------------
When translated into POP11 the above definition becomes:
define sum(set) -> total;
if set = [] then
0 -> total
else
hd(set) + sum(tl(set)) -> total
endif
enddefine;
You should be able to test that procedure on a number of examples.
sum([]) =>
** 0
sum([15]) =>
** 15
sum([1 3 5 7]) =>
** 16
Use the Pop-11 tracer to help you see what is going on when sum runs.
Trace "sum" and "tl" and "+".
trace sum tl +;
(Becuase "tl" and "+" are "system" procedures you will have to
recompile "sum" to make the tracing work for them.
sum([1 3 5]) =>
> sum [1 3 5] ;;; Going into sum with [1 3 5]
!> tl [1 3 5] ;;; Calling tl to get the tail
!< tl [3 5] ;;; tl returns [3 5], for the recursive call
!> sum [3 5] ;;; Calling sum recursively with the tail
!!> tl [3 5] ;;; The new call of sum needs the tail of [3 5]
!!< tl [5] ;;; tl returns [5]
!!> sum [5] ;;; Call sum recursively with the new tail [5]
!!!> tl [5] ;;; The new call of sum needs tail of [5]
!!!< tl [] ;;; which is the empty list
!!!> sum [] ;;; Call sum recursively with the empty list
!!!< sum 0 ;;; At last sum knows what to return - 0
!!!> + 5 0 ;;; Call + with the previous head and 0
!!!< + 5 ;;; Plus returns the result 5
!!< sum 5 ;;; which is then used as the result of sum
!!> + 3 5 ;;; Call plus with previous head and previous
!!< + 8 ;;; result: Plus returns 8
!< sum 8 ;;; which is then returned by sum
!> + 1 8 ;;; At last the inital call of sum can
!< + 9 ;;; call + with 1 and 8, getting back 9
< sum 9 ;;; which is the final result of sum
** 9 ;;; And "=>" prints it out
Notice how NO addition is done until after the recursion has got to the
end of the list, and produced the first result, the number 0.
We can see the same thing without the tracing of "+" and "tl".
untrace + tl;
sum([1 3 5]) =>
> sum [1 3 5]
!> sum [3 5]
!!> sum [5]
!!!> sum []
!!!< sum 0
!!< sum 5
!< sum 8
< sum 9
** 9
-- Finding the largest number in a list -------------------------------
A second example is the definition of the largest number in a list of
numbers. We can specify what largest means along the same lines as
before, except that now we say that if the list is empty there is no
largest number. If there's only one element then that is the largest.
Otherwise, find the largest element in the tail of the list, and compare
that with the head of the list. The larger of the two is the largest in
the original list.
We can use the procedure "max" to specify this more precisely. max(x,y),
where x and y are numbers, is always one of x and y, the larger one if
one is larger, or x if they are the same number E.g.
max(3, 55) =>
** 55
max(55, 3) =>
** 55
max(55, 55) =>
** 55
max(-3, -2) =>
** -2
We therfore have three cases to consider:
(a) If L = [] then largest(L) is undefined, and produces an error
(b) If L has only one element, then largest(L) is that element
(c) If L has more than one element then
largest(L) = max( hd(L) , largest( tl(L) ) )
Notice that because "max" takes two numbers as input, this definition
assumes that hd(L) is always a number and that largest(tl(L)) is always
a number, which it must be if "largest" is defined properly!
-- A Pop-11 definition of largest(list) -------------------------------
One way of saying this in POP11 is:
define largest(list) -> result;
if tl(list) = [] then
hd(list) -> result
elseif length(list) > 1 then
max( hd(list), largest(tl(list)) ) -> result
endif
enddefine;
Test this definition:
largest([33]) =>
** 33
largest([2 66 4 8]) =>
** 66
vars ages;
[23 42 17 9 16 32 15 12 7] -> ages;
largest(ages) =>
** 42
Try some of your own examples.
What happens if you give it the empty list?
largest([]) =>
If the list given is empty, then the call of "tl" in the first condition
will produce an mishap, with the error message
;;; MISHAP - NON-EMPTY LIST NEEDED
;;; INVOLVING: []
;;; DOING : tl largest ...
You can try compiling that procedure and testing it on some examples of
lists of numbers, including the empty list. If you trace it, and trace
max, then you can see how it works.
-- Does largest really need to test if the length exceeds 1? ----------
If you think hard about it, you should be able to work out that
the test for the length being more than 1 is redundant. The length
cannot be 0 at that stage because an empty list would not have got
past the call of "tl" in the first test: it would have produced an
error. The length cannot be 1 because that would involve having a tail
that is empty, so the first condition
tl(list) = []
would be true, and the first action would be done, givng the result
hd(list).
So if the length is not 0 and not 1, then it must be greater than 1,
by the time it gets to the "elseif" instruction. Hence the test is
redundant.
We can therefore have a more economical definition without the second
condition, thus:
define largest(list) -> result;
if tl(list) = [] then
hd(list) -> result
else
max( hd(list), largest(tl(list)) ) -> result
endif
enddefine;
-- An inefficient definition of "largest" -----------------------------
We can also define largest differently, without using "max". Instead
we can use the arithmetic operator ">", i.e. "greater than", thus:
define largest(list) -> result;
if tl(list) = [] then
hd(list) -> result
elseif hd(list) > largest(tl(list)) then
hd(list) -> result
else
largest(tl(list)) -> result
endif
enddefine;
Try compiling and testing that one, to make sure it works (using trace
if necessary). Notice that if the "elseif" condition is FALSE, then
it will call largest on the tail TWICE. So you may find, if you trace
this, that it has more recursive calls than you expect. This means
that it is a wasteful definition. Look back at the previous definition
to see why it did not need to call largest(tl(list)) twice.
We can avoid calling it twice by calling it once and storing the result
temporarily in the output local variable "result", and use that to
compare with the head of the list. So
define largest(list) -> result;
if tl(list) = [] then
hd(list) -> result
else
largest(tl(list)) -> result;
;;; now check whether to use this as the final result or not
if hd(list) > result then
;;; The head is bigger, so replace the result
hd(list) -> result
endif
endif
enddefine;
Compile and test that. Notice that unlike the procedure sum, defined
above, the procedure largest sometimes DISCARDS the result of the
recursive call on the tail of list, after using the result to compare
with the head of the list.
Previously, in the definition of "sum", we added the result from the
recursive call to the head of the list. This is because the task of SUM
was different: it must add up all the numbers, not just find one number
that is bigger than the rest.
-- The importance of recursion ----------------------------------------
In each of these procedure definitions - SUM and LARGEST - we see a
mention of the procedure being defined. It is this feature that makes
the definition recursive. Definitions of procedures, like definitions of
almost anything else typically involve mentioning something that isn't
part of the definition. Dictionary definitions always use words that
aren't defined in that definition - how could they avoid it?
Recursive definitions are just a very special case ot this more general
phenomenon - a phenomenon bound up with our capacity to NAME things -
objects, procedures, events and so on.
What we find however is that there are always special cases e.g.
factorial(1), sum([]), or largest([3]) that don't involve recursion.
That is how the recursion avoids going on forever. All recursive
procedures must have a "stopping condition".
What the stopping condition is depends on the problem. Often it is the
empty list, but we saw with "largest" that it was a list of length 1.
Often the stopping condition in numerical recursive procedures is an
input of 0, but with factorial, as we defined it, the stopping
condition was an input value of 1.
The significance of recursion lies in its economy, a fact of great
significance not just in programming but in many intelligent processes.
For example, one approach to a definition of complex sentences uses
recursion in such a way as to 'embed' simple sentences inside another.
(See TEACH *ISASENT, and John Lyons, CHOMSKY, Fontana Modern Masters
paperback).
Such an approach to language structures makes it possible to encompass a
knowledge of potentially infinitely many sentences in a brain of only
finite capacity.
-- How recursion works ------------------------------------------------
A metaphor can be used to explain how the computer is able to understand
these recursive definitions.
Imagine that the inside of the computer houses a library of procedure
definitions and other such things. Close by the library is a waiting
room full of men sitting around doing nothing. When I type in a request
to run a procedure, for example:
sum(ages) =>
one of the men in the waiting room is instructed by a monitoring process
to work this out for me. He gets from the library a photocopy of the
instructions defining the procedure SUM, and a copy of the list AGES
(which has been previously stored in the library). This list is then
taken to be what is named by "set" in the instructions for SUM
define sum(set) -> total;
As the man in the computer works through the definition of SUM he gets a
friend (from the waiting room) to handle any subsidiary procedure calls
including calls of HD, TL, + and SUM so that when evaluating the line
hd(set) + sum(tl(set))
he asks for a HD to be done (passing over the collection of ages he
associates with SET to the man performing the HD operation). He then
asks for a TL to be done and then passes the result of that 'call' of TL
(a diminished set of ages) over to a man doing a second SUM process.
Our man has to WAIT while the other men do their task, one at a time.
Some of them may have to call other men to do further sub-tasks.
Eventually the recursive call of SUM is finished, and the second little
man hands back the result obtained. Our first man does not know or care
how it is done, as long as the instructions have been followed properly.
When the result comes back, he can give it, with the result produced by
HD, to a little man to run the instructions for "+", and hand back the
number to be used for the final result of the first little man's task.
Notice that our first man does not get involved with evaluating the
recursive call. A second man comes from the waiting room, with his own
photocopy of the SUM instructions and his own idea of the value
associated with the name SET. This second man will in turn ask a third
man to perform a SUM operation. At one stage, there will be several men
working on slightly different versions of a SUM process, though most of
them will be waiting patiently for the next one along the chain to
finish his task, as they cannot get on until that is done.
The last little man will associated the name SET with the empty list [],
and by following his copy of the instructions for SUM, he will be able
to 'return' the result zero without 'invoking' yet another SUM
operation. Once this last man has finished the previous one continues
(by getting a friend to do the actual addition) and then returns his
result to the second last man and so on until the whole process is
complete.
Here's an interaction with POP-11 in which the TRACE facility was used to
show the behaviour of SUM. Each line with a ">" represents a new little
man starting his job with a certain value of SET -- a list of numbers.
Each line with a "<" represents a man finishing his task of carrying out
the SUM instructions, and it shows the result he achieved, which will then
be used by the man who asked him to do it. The vertical lines help you
relate the start and the finish of the job done by each man.
trace sum; ;;; make pop11 show all the "activations"
sum(ages) =>
>sum [23 42 17 9 16 32 15 12 7]
!>sum [42 17 9 16 32 15 12 7]
!!>sum [17 9 16 32 15 12 7]
!!!>sum [9 16 32 15 12 7]
!!!!>sum [16 32 15 12 7]
!!!!!>sum [32 15 12 7]
!!!!!!>sum [15 12 7]
!!!!!!!>sum [12 7]
!!!!!!!!>sum [7]
!!!!!!!!!>sum []
!!!!!!!!!"
in the original command.
The advantage of this approach to evaluating recursive definitions is
that each of the men involved has a straightforward task - merely to
remember what his 'local variables' are and to remember how far through
the procedure definition he is. He doesn't have to understand the
process as a whole - just his part in it (which is simple, and
completely specified by HIS copy of the instructions).
Each little man who waits for others will have to remember where he had
got to. So he keeps his finger on the next instruction he has to obey as
soon as the next one has finished. At any one time different little men
may have their fingers waiting on different instructions.
-- The use of "local" variables ---------------------------------------
The man carrying out an operation can be asked to remember values other
than the 'arguments' (or 'inputs'). Consider the following:
define largest(set) -> result;
vars temp;
if tl(set) = [] then
hd(set) -> result
else
largest(tl(set)) -> temp;
if hd(set) > temp then
hd(set) -> result
else
temp -> result
endif
endif
enddefine;
In this example, the man performing a LARGEST process has been asked to
remember an additional local variable' called TEMP (that's what the VARS
statement is for). Initially the man doesn't know what value to associate
with this name, but if he has to obey the instructions on the sixth line
of the definition he encounters an 'assignment statement'
largest(tl(set)) -> temp;
which can be roughly taken to mean 'get another little man to work out
the largest element of the rest of the set, and when he has finished and
given back a result remember it by the name TEMP'.
Different little men will have different values for TEMP, just has they
have different values for SET, and RESULT. These are called "local"
variables because their values are private to each activation, or each
little man. You can think of this by supposing that when each little man
gets his sheet of paper with the set of instructions he has to obey,
there are some named boxes on the sheet, and what is written into the
boxes may be different for all of them.
So each activation of a procedure needs a special workspace for its
"local" variables, including the input locals and output locals.
-- Further work -------------------------------------------------------
This teach file by no means exhausts what can be accomplished using
recursive definitions. For example 'mutually recursive' definitions -
where two or more operations are defined in terms of each other - can be
handled by the POP11 system as easily as the more straightforward
'simple recursion' shown above.
Moreover, the examples given given above all relied on your
understanding of numbers. POP11 has many structure building and
structure editing facilities. Recursive definitions are often the ONLY
sensible way of manipulating the complex and varied objects one can
create using these facilities.
-- Some things to do: -------------------------------------------------
(a) Write out English versions of SUM and LARGEST, and
get some friends to help you enact their behaviour.
(b) Write a POP11 definition of your own to find the
SMALLEST number in some set, and run it on the computer.
(c) What does the following procedure do:
define sillyprint(set);
if set = [] then
"finished" =>
else
hd(set) =>
sillyprint(tl(set));
hd(set) =>
endif
enddefine;
sillyprint([a b c d]);
(d) Try TEACH * SETS, TEACH * SETS2
The file TEACH * RECURSE2 discusses some recursive procedures for drawing
pictures.
TEACH * RECURSE3 analyses some of the problems of dealing with the
"terminating" condition, e.g. recursing down a list till you
reach the empty list.
--- C.all/teach/recurse1
--- Copyright University of Sussex 1990. All rights reserved. ----------