Introduction to AI - Worksheet 3

In this tutorial we are going to explore our second spam filtering technique: decision tree classifiers. There are several algorithms that automatically generate decision trees, of which ID3 and C4.5 are the most well-known. We are going to compare a hand-generated decision tree against the ID3 algorithm. This tutorial builds on the work done in the previous tutorial. The rules you created last week will form the set of predicates from which nodes of the decision tree will be created. If you didn't get very far last week some rules are provided for you. You will probably do better if you create your own rules. The installation instructions and code are in the same place as last week.

There are many online resources for decision tree-based spam filtering, as a search will show. The commercial applications show the utility of this approach, but the more useful for our purpose are research papers, as they actually give details of the implementation. A good place to start in the literature is By following the bibliographic links you will find much of the existing literature on spam filtering.


Your first task is to get the decision tree spam filter up and running. To do so, change to the directory where you installed the software and run SISC:

  > sisc
  SISC (1.8.8) - main
Now load the file spam-filter.scm, which runs the decision tree spam filter. You should see output similar to that below:

  #;> (load "spam-filter.scm")
  Classifying Spam
  Spam 1: Classified by (Example Tree)
  Spam 2: Classified by (Example Tree)
  Spam 3: Not classified (spam #f) 
  Spam 4: Classified by (Example Tree)
  spam classification done
  Classifying email
You should recognise this from last week - the basic framework hasn't changed. We're now ready to start extending the decision tree spam filter.


Your next task is to create a new decision tree. Take a look at the example decision tree (or example-tree.scm in Scheme). Notice that the class extends AbstractTree. Your class must do the same. Now look at the value assigned to head - this is the actual tree of nodes that defines the decision tree. The actual decision is made by recursively calling the Nodes that make up the decision tree till a boolean value is returned. If true is returned the email is classified as spam; otherwise it is classified as genuine email. A BinaryNode is constructed from 3 arguments: a Rule that forms the predicate of the node, a Node that is called if the predicate returns true, and a Node that is called if the predicate returns false. A TrueNode returns true in all situations. A FalseNode returns false in all situations. Now write your own decision tree using the rules we have supplied and the rules you have created on your own. As before you simply need to compile the file and re-load spam-filter.scm for the decision tree to be loaded.


Compare your decision tree to that created by ID3. To do this un-comment the line (load "id3.scm") in spam-filter.scm and re-load spam-filter.scm. How does your decision tree compare to ID3? Can you improve your decision tree to equal or better the performance of ID3? Can you create a smaller tree than the one created by ID3 that performs just as well?


The way that data is represented is critical to the performance of any machine learning algorithm. We have a very flexible representation, as our rules can match just about anything. However it requires substantial human effort to create the rules. The more common representation is the "bag of words" representation. In this representation each email is represented by a vector of booleans. The vector is as long as the number of words in the corpus - 30,000 words is not unusual. An element is true if the word corresponding to that word is present in the email, and false otherwise. Implement the ID3 algorithm using this representation. How does this compare to the decision trees using our hand crafted rules?


A decision tree is a conjunction of disjunctions of unary predicates. This representation is quite limiting. The expert system from last week allows a much more flexible representation - in fact any logical form is allowed. Can you use this extra flexibility to out-perform the decision tree? How could a machine learning algorithm make use of the extra flexibility?

© Noel Welsh & Manfred Kerber, 2004, Introduction to AI
The URL of this page is
URL of module