Introduction to AI - Week 3

Learning
Decision Trees
Reinforcement Learning

Decision Trees


Example

Problem: should we wait for a table at a restaurant? A decision tree decides whether to wait or not in a given situation.

Attributes:

  1. Alternate: alternative restaurant nearby
  2. Bar: bar area to wait in
  3. Fri/Sat: true on Fridays and Saturdays
  4. Hungry: whether we are hungry
  5. Patrons: how many people are in the restaurant (None, Some, or Full)
  6. Price: price range (£, ££, £££)
  7. Raining: whether it is raining outside
  8. Reservation: whether we made a reservation
  9. Type: kind of restaurant (French, Italian, Thai, or Burger)
  10. WaitEstimate: estimated wait (<10, 10-30, 30-60, >60 minutes)

Original Decision Tree


Expressiveness of Decision Trees


Some Examples

 Ex   Attributes                                                  Goal
      Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    Wait
 ----------------------------------------------------------------------
 X1   Yes  No   No   Yes  Some  £££    No    Yes  French   0-10   Yes
 X2   Yes  No   No   Yes  Full  £      No    No   Thai     30-60  No
 X3   No   Yes  No   No   Some  £      No    No   Burger   0-10   Yes
 X4   Yes  No   Yes  Yes  Full  £      Yes   No   Thai     10-30  Yes
 X5   Yes  No   Yes  No   Full  £££    No    Yes  French   >60    No
 X6   No   Yes  No   Yes  Some  ££     Yes   Yes  Italian  0-10   Yes
 X7   No   Yes  No   No   None  £      Yes   No   Burger   0-10   No
 X8   No   No   No   Yes  Some  ££     Yes   Yes  Thai     0-10   Yes
 X9   No   Yes  Yes  No   Full  £      Yes   No   Burger   >60    No
 X10  Yes  Yes  Yes  Yes  Full  £££    No    Yes  Italian  10-30  No
 X11  No   No   No   No   None  £      No    No   Thai     0-10   No
 X12  Yes  Yes  Yes  Yes  Full  £      No    No   Burger   30-60  Yes

Different Solutions


Selecting Best Attributes


Selecting Best Attributes (Cont'd)


Recursive Algorithm


DecTreeLearning ID3

function DecTreeL(ex,attr,default) ;;; returns a decision tree
   ;;; ex: set of examples, attr: set of attributes, default: default value for the goal predicate
  if ex is empty then return default
  elseif all ex have the same classification then return it
  elseif attr is empty then return MajVal(ex)
  else ChooseAttribute(attr,ex) -> best1
       new decision tree with root test best1 -> tree1
       for each value vi of best1 do
         {elements of ex with best1=vi} -> exi
         DecTreeL(exi,attr-{best1},MajVal(ex)) -> subtree1
         add a branch to tree1 with label vi and subtree subtree1
       end
  return tree1
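
For readers who prefer runnable code, here is a minimal Python sketch of the same recursive procedure. It assumes that examples are given as dictionaries mapping attribute names to values, that the goal attribute is passed by name (e.g. "Wait" as in the restaurant table), and that ChooseAttribute is supplied as a function (for instance the information-gain heuristic defined later); these representational choices are illustrations, not prescribed by the notes.

from collections import Counter

def majority_value(examples, goal):
    """MajVal: the most common classification among the examples."""
    return Counter(e[goal] for e in examples).most_common(1)[0][0]

def dec_tree_learn(examples, attributes, default, goal, choose):
    """Recursive decision-tree learner corresponding to DecTreeL above.

    examples   -- list of dicts mapping attribute names to values
    attributes -- list of attribute names still available for testing
    default    -- classification returned for an empty example set
    goal       -- name of the goal attribute
    choose     -- ChooseAttribute: picks the attribute to test next
    A tree is either a classification or a pair (attribute, {value: subtree}).
    """
    if not examples:
        return default
    classes = {e[goal] for e in examples}
    if len(classes) == 1:
        return classes.pop()
    if not attributes:
        return majority_value(examples, goal)
    best = choose(attributes, examples)
    branches = {}
    # simplification: branch only on the attribute values that occur in the examples
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        branches[value] = dec_tree_learn(subset,
                                         [a for a in attributes if a != best],
                                         majority_value(examples, goal),
                                         goal, choose)
    return (best, branches)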

Generated Decision Tree


Discussion of the Result

Comparison of original tree and learned tree:


Assessing the Performance


Assessing the Performance (Cont'd)

The learning curve shows how the quality of the prediction increases as the training set grows.
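
The methodology behind such a curve can be sketched as follows: repeatedly split the examples into a training set and a test set, learn on training sets of increasing size, and measure how many test examples are classified correctly. The sketch below is a minimal illustration; the learn and classify functions and the goal attribute name "Wait" are assumptions (for instance the ID3 sketch above together with a tree-walking classifier), not part of these notes.

import random

def learning_curve(examples, learn, classify, sizes, goal="Wait", trials=20):
    """Average test-set accuracy for each training-set size in `sizes`.

    Assumes every size m is smaller than len(examples), so the test set is non-empty."""
    curve = []
    for m in sizes:
        accuracy = 0.0
        for _ in range(trials):
            random.shuffle(examples)
            training, test = examples[:m], examples[m:]
            hypothesis = learn(training)
            correct = sum(classify(hypothesis, e) == e[goal] for e in test)
            accuracy += correct / len(test)
        curve.append((m, accuracy / trials))
    return curve   # list of (training-set size, average accuracy) points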


Applications


Finding Best Attributes

In order to build small decision trees, select the best attributes first (best = most informative).

Logarithm

ld(x) (the dual logarithm, i.e. the logarithm to base 2) is defined for every positive real number x by
2^ld(x) = x

Logarithm (Cont'd)

Some values:

 x      1   2   4   8   10    16   1/2  1/4  1/8
 ld(x)  0   1   2   3   3.32  4    -1   -2   -3

lim_{x→0+} ld(x) = -∞          lim_{x→0+} x * ld(x) = 0

Remember:

log_10(x) = log_10(2) * ld(x)
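
A quick numerical check of the table above and of the conversion rule, using Python's math.log2 for ld:

import math

for x in (1, 2, 4, 8, 10, 16, 1/2, 1/4, 1/8):
    print(x, round(math.log2(x), 2))    # 0, 1, 2, 3, 3.32, 4, -1, -2, -3

x = 42                                  # any positive number will do
print(math.isclose(math.log10(x), math.log10(2) * math.log2(x)))   # True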


Calculations in the Examples

 I(1/2, 1/2)
 =   SUM_{i=1}^{2} -P(ui) * ld(P(ui))
 =   -1/2 * ld(1/2) - 1/2 * ld(1/2)
 =   -1/2 * (-1) - 1/2 * (-1)
 =   1/2 + 1/2
 =   1

with P(ui) = 1/2


Calculations in the Examples (Cont'd)

 I(0/100, 100/100)
 =   SUM_{i=1}^{2} -P(ui) * ld(P(ui))
 =   -0/100 * ld(0/100) - 100/100 * ld(100/100)
 *=  -0 * (-∞) - 1 * 0
 *=  0 + 0
 =   0

with P(u1) = 0/100 and P(u2) = 100/100
(*) Strictly, one has to use lim_{x→0+} x * ld(x) = 0 here.


Calculations in the Examples (Cont'd)

 I(1/100, 99/100)
 =   SUM_{i=1}^{2} -P(ui) * ld(P(ui))
 =   -1/100 * ld(1/100) - 99/100 * ld(99/100)
 =   -1/100 * (-6.64386) - 99/100 * (-0.0145)
 =   0.066439 + 0.014355
 =   0.080794

with P(u1) = 1/100 and P(u2) = 99/100

ld(1/100) = -6.64386 and ld(99/100) = -0.0145
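
These three calculations can be checked in a few lines of Python; the helper I below simply implements SUM_i -P(ui) * ld(P(ui)), treating terms with P(ui) = 0 as 0, in line with the limit noted above.

import math

def I(*probabilities):
    """Information content: sum of -P(ui) * ld(P(ui)), with 0 * ld(0) taken as 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(I(1/2, 1/2))        # 1.0
print(I(0/100, 100/100))  # 0.0
print(I(1/100, 99/100))   # ≈ 0.0808 (the 0.080794 above comes from the rounded ld values)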


Applied to Attributes

p := number of positive examples
n := number of negative examples

Information contained in a correct answer:
I(p/(p+n), n/(p+n)) = -p/(p+n) * ld(p/(p+n)) - n/(p+n) * ld(n/(p+n))

Restaurant example: p = n = 6, hence we need 1 bit of information. A test of a single attribute A will usually not give all of this information, but only some of it. A divides the example set E into subsets E1, ..., Eu. Each subset Ei has pi positive and ni negative examples, so in this branch we need an additional I(pi/(pi+ni), ni/(pi+ni)) bits of information, weighted by
(pi+ni)/(p+n) (the probability that a random example falls into Ei).

HENCE the information gain:
Gain(A) = I(p/(p+n), n/(p+n)) - SUM_{i=1}^{u} (pi+ni)/(p+n) * I(pi/(pi+ni), ni/(pi+ni))
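
The formula can be sketched in Python as follows; applied to counts read off the restaurant table, it reproduces two of the entries in the table under "Heuristics" below. Representing an attribute by its list of (pi, ni) subset counts is an assumption made here for brevity.

import math

def I(*probabilities):
    """I(P(u1), ..., P(un)), with 0 * ld(0) taken as 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

def gain(p, n, subsets):
    """Information gain of an attribute splitting (p, n) into (pi, ni) subsets."""
    remainder = sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
                    for pi, ni in subsets)
    return I(p / (p + n), n / (p + n)) - remainder

# Patrons splits the 12 examples into None (0+, 2-), Some (4+, 0-), Full (2+, 4-):
print(gain(6, 6, [(0, 2), (4, 0), (2, 4)]))           # ≈ 0.540852
# Type splits them into French (1,1), Italian (1,1), Thai (2,2), Burger (2,2):
print(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]))   # ≈ 0.0 (up to rounding)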


Heuristics

Choose attribute with largest information gain.

In the restaurant example, initially:

 Attribute    Gain        Attribute    Gain
 --------------------------------------------------
 alternative  0.0         price        0.195709
 bar          0.0         rain         0.0
 friday       0.020721    reservation  0.020721
 hungry       0.195709    type         0.0
 patrons      0.540852    estimate     0.207519

Hence: choose "patrons"


Noise and Overfitting


Reinforcement Learning

Assume the following stochastic environment
Each training sequence has the form:

Rewards

The probability of a transition to a neighbouring state is the same for all possibilities, i.e. from a state with k neighbours each neighbour is reached with probability 1/k.

Assume the utility function is additive, i.e.
U([s0, s1, ..., sn]) = reward(s0) + U([s1, ..., sn])
The expected utility of a state is the expected reward-to-go of that state.

Utility to be Learned

The utilities can be learned with the Least Mean Squares (LMS) approach (also known from adaptive control theory). It assumes that the observed reward-to-go on a sequence provides direct evidence for the actual expected reward-to-go. At the end of each sequence: calculate the reward-to-go for each state visited and update the utility estimate accordingly.


Passive Reinforcement Learning

vars U ;;; table of utility estimates
vars N ;;; table of frequencies for states
vars M ;;; table of transition probabilities from state to state
vars percepts ;;; percept sequence, initially empty

function Passive-RL-Agent(e) ;;; returns an action
  add e to percepts
  increment N(State(e))
  LMS-Update(U,e,percepts,M,N) -> U
  if Terminal?(e) then nil -> percepts
  return the action Observe

function LMS-Update(U,e,percepts,M,N) ;;; returns updated utility table U
  if Terminal?(e) then
    0 -> reward-to-go
    for each ei in percepts (starting at the end) do
      reward-to-go + Reward(ei) -> reward-to-go
      Running-Average(U(State(ei)), reward-to-go, N(State(ei)))
        -> U(State(ei))
    end
  return U
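
A compact Python sketch of the same idea, assuming that each completed training sequence is available as a list of (state, reward) pairs and that utilities are kept as running averages; this data representation and the example states and rewards are illustrations, not taken from the notes.

from collections import defaultdict

def lms_update(U, N, sequence):
    """LMS update for one completed training sequence.

    U -- running-average utility estimate per state
    N -- number of times each state has been updated
    sequence -- list of (state, reward) pairs observed by the agent
    """
    reward_to_go = 0.0
    for state, reward in reversed(sequence):
        reward_to_go += reward
        N[state] += 1
        # running average: U <- U + (sample - U) / N
        U[state] += (reward_to_go - U[state]) / N[state]

# usage with two short sequences through hypothetical states a, b, goal:
U, N = defaultdict(float), defaultdict(int)
lms_update(U, N, [("a", -0.04), ("b", -0.04), ("goal", 1.0)])
lms_update(U, N, [("a", -0.04), ("goal", 1.0)])
print(dict(U))   # reward-to-go averages, e.g. U["a"] = (0.92 + 0.96) / 2 = 0.94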

Summary - Decision Tree Learning


Summary - Reinforcement Learning


Further Reading    



© Manfred Kerber, 2004, Introduction to AI
24.4.2005
The URL of this page is http://www.cs.bham.ac.uk/~mmk/Teaching/AI/Teaching/AI/l3.html.
URL of module http://www.cs.bham.ac.uk/~mmk/Teaching/AI/