Introduction to AI - Week 3

Learning: Decision Trees, Reinforcement Learning

* Inductive learning involves making uncertain inferences that go beyond our direct experience. [Anderson95]
* Scientific discovery by proof is impossible, unless one knows the "first" primary premises ... We must obtain these premises by induction. [Aristotle ~330 BC]
_________________________________________________________________________________________________________
Decision Trees

* A decision tree takes as input an object described by a set of properties and outputs a Boolean value (yes/no decision). Each internal node in the tree corresponds to a test of one of the properties; branches are labelled with the possible values of the test.
* Aim: learn a goal concept (goal predicate) from examples.
* Learning element: the algorithm that builds up the decision tree.
* Performance element: the decision procedure given by the tree.
_________________________________________________________________________________________________________
Example

Problem: wait for a table at a restaurant. A decision tree decides whether to wait or not in a given situation.
Attributes:
1. Alternate: alternative restaurant nearby
2. Bar: bar area to wait in
3. Fri/Sat: true on Fridays and Saturdays
4. Hungry: whether we are hungry
5. Patrons: how many people are in the restaurant (None, Some, or Full)
6. Price: price range (£, ££, £££)
7. Raining: raining outside
8. Reservation: whether we made a reservation
9. Type: kind of restaurant (French, Italian, Thai, or Burger)
10. 
WaitEstimate: estimated wait (<10, 10-30, 30-60, >60)
_________________________________________________________________________________________________________
Original Decision Tree

[Figure: the original, hand-built decision tree. The root tests Patrons (None -> No; Some -> Yes; Full -> test WaitEstimate). The WaitEstimate branches (<10, 10-30, 30-60, >60) lead either to a leaf or to further tests on Alternate, Hungry, Reservation, Fri/Sat, Bar, and Raining; all leaves are Yes or No.]
_________________________________________________________________________________________________________
Expressiveness of Decision Trees

* Each path through the tree corresponds to an implication like
  FORALL r Patrons(r,Full) & WaitEstimate(r,0-10) & Hungry(r,N)
-> WillWait(r)
  Hence a decision tree corresponds to a conjunction of implications.
* Cannot express tests that refer to two different objects, like:
  EXISTS r2 Nearby(r2) & Price(r,p) & Price(r2,p2) & Cheaper(p2,p)
* Expressiveness is essentially that of propositional logic (no function symbols, no existential quantifier).
* For n attributes there are 2^(2^n) different Boolean functions, since each function is defined by its values on the 2^n possible inputs (e.g. for n=6 there are about 2 x 10^19 different functions).
* Functions like the parity function (1 for an even number of 1-inputs, 0 otherwise) or the majority function (1 if more than half of the inputs are 1) lead to large decision trees.
_________________________________________________________________________________________________________
Some Examples

          Attributes                                                  Goal
  Ex    Alt  Bar  Fri  Hun  Pat    Price  Rain  Res  Type     Est    Wait
  ----  ---  ---  ---  ---  ----   -----  ----  ---  -------  -----  ----
  X1    Yes  No   No   Yes  Some   £££    No    Yes  French   0-10   Yes
  X2    Yes  No   No   Yes  Full   £      No    No   Thai     30-60  No
  X3    No   Yes  No   No   Some   £      No    No   Burger   0-10   Yes
  X4    Yes  No   Yes  Yes  Full   £      Yes   No   Thai     10-30  Yes
  X5    Yes  No   Yes  No   Full   £££    No    Yes  French   >60    No
  X6    No   Yes  No   Yes  Some   ££     Yes   Yes  Italian  0-10   Yes
  X7    No   Yes  No   No   None   £      Yes   No   Burger   0-10   No
  X8    No   No   No   Yes  Some   ££     Yes   Yes  Thai     0-10   Yes
  X9    No   Yes  Yes  No   Full   £      Yes   No   Burger   >60    No
  X10   Yes  Yes  Yes  Yes  Full   £££    No    Yes  Italian  10-30  No
  X11   No   No   No   No   None   £      No    No   Thai     0-10   No
  X12   Yes  Yes  Yes  Yes  Full   £      No    No   Burger   30-60  Yes
_________________________________________________________________________________________________________
Different Solutions

* Trivial solution: construct a decision tree that has one path to a leaf for each example. It fits the given examples, but generalises badly to new ones.
* Occam's razor: the most likely hypothesis is the simplest one that is consistent with all observations.
* Finding the smallest decision tree is intractable, hence a heuristic decision: test the most important attribute first. Most important = makes the most difference to the classification of an example.
This gives short paths in the tree, i.e. small trees.
* Compare splitting the examples by testing on different attributes (cf. Patrons vs. Type below).
_________________________________________________________________________________________________________
Selecting Best Attributes

+: X1,X3,X4,X6,X8,X12
-: X2,X5,X7,X9,X10,X11

First test: Patrons -> good classification
  None:  +: -            -: X7,X11         (all negative)
  Some:  +: X1,X3,X6,X8  -: -              (all positive)
  Full:  +: X4,X12       -: X2,X5,X9,X10

First test: Type -> bad classification
  French:   +: X1        -: X5
  Italian:  +: X6        -: X10
  Thai:     +: X4,X8     -: X2,X11
  Burger:   +: X3,X12    -: X7,X9
_________________________________________________________________________________________________________
Selecting Best Attributes (Cont'd)

After the first test on Patrons, only the Full branch is still mixed:
  Full:  +: X4,X12       -: X2,X5,X9,X10

Second test: Hungry
  Yes:  +: X4,X12        -: X2,X10
  No:   +: -             -: X5,X9          (all negative)
_________________________________________________________________________________________________________
Recursive Algorithm

* If there are positive and negative examples, choose the "best" attribute to split on (i.e., Patrons in the example above).
* If all remaining examples are positive (or all negative), then done.
* If no examples are left: no information, hence return a default value.
* If no attributes are left, but both positive and negative examples remain, then there is a problem:
Examples have the same description but different classification, due to incorrect data (noise), not enough information, or a nondeterministic domain. Then take a majority vote.
_________________________________________________________________________________________________________
DecTreeLearning ID3

function DecTreeL(ex, attr, default) ;;; returns a decision tree
  ;;; ex: set of examples, attr: set of attributes,
  ;;; default: default value for the goal predicate
  if ex empty then return default
  elseif all ex have the same classification then return it
  elseif attr empty then return MajVal(ex)
  else
    ChooseAttribute(attr, ex) -> best
    new decision tree with root test best -> tree
    for each value v[i] of best do
      {elements of ex with best = v[i]} -> ex[i]
      DecTreeL(ex[i], attr - {best}, MajVal(ex)) -> subtree
      add branch to tree with label v[i] and subtree subtree
    end
    return tree
_________________________________________________________________________________________________________
Generated Decision Tree

[Figure: the decision tree learned from the 12 examples.
  Patrons: None -> No; Some -> Yes; Full -> test Hungry.
  Hungry: No -> No; Yes -> test Type.
  Type: French -> Yes; Italian -> No; Burger -> Yes; Thai -> test Fri/Sat.
  Fri/Sat: No -> No; Yes -> Yes.]
_________________________________________________________________________________________________________
Discussion of the Result

Comparison of the original tree and the learned tree:
* The trees differ.
* The learned tree is smaller (no tests for Raining and Reservation, since all examples can be classified without them).
* It detects regularities (waiting for Thai food on weekends).
* It can make mistakes (e.g. the case where the wait is <10, the restaurant is full, and we are not hungry).
* Question: if the tree is consistent with the examples but incorrect, how correct is it?
_________________________________________________________________________________________________________
Assessing the Performance

1. Collect a large set of examples.
2. Divide it into two disjoint sets: a training set and a test set.
3. Run the learning algorithm on the training set, generating a hypothesis H.
4. Measure the percentage of examples in the test set that are correctly classified by H.
5. Repeat steps 1 to 4 for different sets.
_________________________________________________________________________________________________________
Assessing the Performance (Cont'd)

The learning curve shows the increase in the quality of the prediction as the training set grows.

[Figure: learning curve; x-axis: number of examples, y-axis: percentage of correctly classified test examples, rising from chance level towards 100%.]
_________________________________________________________________________________________________________
Applications

* ID3 was used to classify boards in a chess endgame.
ID3 had to recognise boards that led to a loss within 3 moves: a classification of half a million positive situations out of 1.4 million different possible boards, with a typical learning curve as result.
* Building an expert system for designing gas-oil separation systems for oil platforms: GASOIL, an expert system of BP with 2500 rules. Building it by hand was estimated at 10 person-years; using decision-tree learning it took 100 person-days.
* Learning to fly: a flight-simulator controller, generated by watching three skilled human pilots. 90,000 examples with 20 state variables, each labelled by the action taken. The extracted decision tree was translated into C code. The program could fly better than its teachers.
_________________________________________________________________________________________________________
Finding Best Attributes

In order to build small decision trees: select the best attributes first (best = most informative).
* Information is measured in bits.
* One bit of information is enough to answer a yes/no question about which one has no idea (e.g. the flip of a fair coin).
* If the possible answers u[i] have probabilities P(u[i]), then the information content of the actual answer is
  I(P(u[1]),...,P(u[n])) = SUM[i=1..n] -P(u[i]) * ld(P(u[i]))
* e.g., fair coin: I(½,½) = 1 (1 bit)
* e.g., if we already know the outcome to 99%, learning the real outcome carries an expected information of I(1/100,99/100) = 0.08 bits. If we know the outcome to 100%, there is no additional information: I = 0.
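The information formula above is easy to check numerically. A minimal Python sketch (the function name `information` is my own choice, not from the lecture); terms with probability 0 are skipped, anticipating the limit argument used later:

```python
from math import log2

def information(probs):
    """I(P(u1),...,P(un)) = SUM_i -P(ui) * ld(P(ui)).

    Terms with P(ui) = 0 contribute nothing, using lim x->0+ x*ld(x) = 0.
    """
    return sum(-p * log2(p) for p in probs if p > 0)

print(information([0.5, 0.5]))    # fair coin: 1.0 bit
print(information([0.01, 0.99]))  # outcome known to 99%: about 0.081 bits
print(information([0.0, 1.0]))    # outcome certain: 0 bits
```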
_________________________________________________________________________________________________________
Logarithm

[Figure: plot of y = ld(x); the curve passes through (1,0), (2,1), (8,3) and tends to -infinity as x -> 0+, e.g. ld(0.125) = -3.]

ld(x) (the dual logarithm, i.e. the logarithm to base 2) is defined for every positive real number x such that 2^ld(x) = x.
_________________________________________________________________________________________________________
Logarithm (Cont'd)

Some values:
  x       1   2   4   8   10     16   1/2   1/4   1/8
  ld(x)   0   1   2   3   3.32   4    -1    -2    -3

lim[x->0+] ld(x) = -infty
lim[x->0+] x*ld(x) = 0

Remember: log[10](x) = log[10](2) * ld(x)
_________________________________________________________________________________________________________
Calculations in the Examples

I(½,½) = SUM[i=1..2] -P(u[i]) * ld(P(u[i]))
       = -½*ld(½) - ½*ld(½)
       = -½*(-1) - ½*(-1)
       = ½ + ½
       = 1
with P(u[i]) = ½.
_________________________________________________________________________________________________________
Calculations in the Examples (Cont'd)

I(0/100,100/100) = SUM[i=1..2] -P(u[i]) * ld(P(u[i]))
                 = -0/100*ld(0/100) - 100/100*ld(100/100)
             (*) = -0*(-infty) - 1*0
             (*) = 0 + 0
                 = 0
with P(u[1]) = 0/100 and P(u[2]) = 100/100.
(*) Strictly, one has to use here lim[x->0+] x*ld(x) = 0.
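This information measure is what lets us compare attribute tests numerically, as in the Patrons vs. Type splits of the 12 restaurant examples seen earlier. A hedged Python sketch (function names `information` and `gain` are my own; the split counts are read off the earlier "Selecting Best Attributes" section):

```python
from math import log2

def information(probs):
    # I(p1,...,pn) = SUM -pi * ld(pi), with 0*ld(0) taken as 0
    return sum(-p * log2(p) for p in probs if p > 0)

def gain(pos, neg, splits):
    """Information gain of an attribute test.

    pos, neg: positive/negative example counts before the test;
    splits: one (p_i, n_i) pair of counts per attribute value.
    """
    total = pos + neg
    before = information([pos / total, neg / total])
    remainder = sum((p + n) / total * information([p / (p + n), n / (p + n)])
                    for p, n in splits if p + n > 0)
    return before - remainder

# split counts from the 12 restaurant examples
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
rtype   = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(round(gain(6, 6, patrons), 6))  # 0.540852 bits
print(round(gain(6, 6, rtype), 6))    # 0.0 bits
```

The numbers reproduce the intuition that Patrons is a good first test and Type a useless one.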
_________________________________________________________________________________________________________
Calculations in the Examples (Cont'd)

I(1/100,99/100) = SUM[i=1..2] -P(u[i]) * ld(P(u[i]))
                = -1/100*ld(1/100) - 99/100*ld(99/100)
                = -1/100*(-6.64386) - 99/100*(-0.0145)
                = 0.066439 + 0.014355
                = 0.080794
with P(u[1]) = 1/100 and P(u[2]) = 99/100,
ld(1/100) = -6.64386 and ld(99/100) = -0.0145.
_________________________________________________________________________________________________________
Applied to Attributes

p := number of positive examples
n := number of negative examples

Information contained in the correct answer:
  I(p/(p+n), n/(p+n)) = -(p/(p+n))*ld(p/(p+n)) - (n/(p+n))*ld(n/(p+n))

Restaurant example: p = n = 6, hence we need 1 bit of information.

A test of one single attribute A will usually not give all of this, but only some of it. A divides the example set E into subsets E[1],...,E[u]. Each subset E[i] has p[i] positive and n[i] negative examples, so in this branch we still need an additional I(p[i]/(p[i]+n[i]), n[i]/(p[i]+n[i])) bits of information, weighted by (p[i]+n[i])/(p+n) (the probability that a random example falls into this branch). Hence the information gain:

  Gain(A) = I(p/(p+n), n/(p+n))
            - SUM[i=1..u] (p[i]+n[i])/(p+n) * I(p[i]/(p[i]+n[i]), n[i]/(p[i]+n[i]))
_________________________________________________________________________________________________________
Heuristics

Choose the attribute with the largest information gain.
In the restaurant example, the initial gains are:

  Alternate  0.0        Price        0.195709
  Bar        0.0        Rain         0.0
  Fri/Sat    0.020721   Reservation  0.020721
  Hungry     0.195709   Type         0.0
  Patrons    0.540852   Estimate     0.207519

Hence: choose Patrons.
_________________________________________________________________________________________________________
Noise and Overfitting

* Overfitting: the problem of finding meaningless "regularity" in the data. (Example: rolls of a die characterised by attributes like the hour, day, and month of the roll result in a perfect decision tree as long as no two examples have identical descriptions.)
* One possibility: decision-tree pruning, by detecting irrelevant attributes. Irrelevant = no information gain for an infinitely large sample. The null hypothesis assumes that there is no underlying pattern; an attribute is only considered if the observed gain deviates significantly from the null hypothesis (e.g. at the 5% level).
* Alternative: cross-validation, i.e. take only part of the data for learning and the rest for testing the prediction performance. Repeat with different subsets and select the best tree.
(Cross-validation can be combined with pruning.)
_________________________________________________________________________________________________________
Reinforcement Learning

Assume the following stochastic environment:

[Figure: a 4x3 grid world. The agent starts at START = (1,1); square (2,2) is blocked; (4,3) is a terminal state with reward +1 and (4,2) a terminal state with reward -1.]

Each training sequence has the form:
* (1,1)->(1,2)->(1,3)->(2,3)->(1,3)->(2,3)->(3,3)->(4,3)   reward +1
* (1,1)->(2,1)->(1,1)->(2,1)->(3,1)->(3,2)->(4,2)          reward -1
_________________________________________________________________________________________________________
Rewards

The probability of a transition to a neighbouring state is equal among all possibilities, i.e.
[Figure: the grid world annotated with transition probabilities: from a state with two reachable neighbours each is taken with probability 0.5, from a state with three neighbours each with probability 0.33, and a single possible transition has probability 1.0.]

Assume the utility function is additive, i.e.

  U([s[0],s[1],...,s[n]]) = reward(s[0]) + U([s[1],...,s[n]])

The expected utility of a state is the expected reward-to-go of that state.
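With an additive utility function, the reward-to-go of every state in a training sequence can be computed in one backward pass. A minimal Python sketch, assuming (as the sample sequences suggest) a reward of 0 in nonterminal states and +1/-1 in the terminal states:

```python
def reward_to_go(sequence, reward):
    """For each state, the sum of rewards from that state to the end:
    U([s0,...,sn]) = reward(s0) + U([s1,...,sn])."""
    togo, total = [], 0.0
    for state in reversed(sequence):
        total += reward(state)
        togo.append(total)
    return list(reversed(togo))

# first training sequence, ending in the +1 terminal state (4,3)
seq = [(1,1), (1,2), (1,3), (2,3), (1,3), (2,3), (3,3), (4,3)]
reward = lambda s: {(4, 3): 1.0, (4, 2): -1.0}.get(s, 0.0)
print(reward_to_go(seq, reward))  # every state has reward-to-go 1.0
```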
_________________________________________________________________________________________________________
Utility to be Learned

[Figure: the grid world with the utilities to be learned:
  (1,3): -0.0380   (2,3):  0.0886   (3,3):  0.2152   (4,3): +1
  (1,2): -0.4430   blocked          (3,2): -0.1646   (4,2): -1
  (1,1): -0.2911   (2,1): -0.380    (3,1): -0.5443   (4,1): -0.7722 ]

These can be learned by the Least Mean Squares approach, LMS for short (which stems from adaptive control theory). It assumes that the observed reward-to-go on a sequence provides direct evidence of the actual expected reward-to-go.
At the end of each sequence: calculate the reward-to-go for each state and update the utility estimates.
_________________________________________________________________________________________________________
Passive Reinforcement Learning

vars U        ;;; table of utility estimates
vars N        ;;; table of frequencies for states
vars M        ;;; table of transition probabilities from state to state
vars percepts ;;; percept sequence, initially empty

function Passive-RL-Agent(e) ;;; returns an action
  add e to percepts
  increment N(State(e))
  LMS-Update(U, e, percepts, M, N) -> U
  if Terminal?(e) then nil -> percepts
  return the action Observe

function LMS-Update(U, e, percepts, M, N) ;;; returns updated U
  if Terminal?(e) then
    0 -> reward-to-go
    for each e[i] in percepts (starting at the end) do
      reward-to-go + Reward(e[i]) -> reward-to-go
      Running-Average(U(State(e[i])), reward-to-go, N(State(e[i]))) -> U(State(e[i]))
    end
_________________________________________________________________________________________________________
Summary - Decision Tree Learning

* Decision tree learning is a very efficient way of searching the hypothesis space non-incrementally.
  + It adds a subtree to the current tree and continues its search.
  + It does not backtrack.
  + It is highly dependent upon the criteria for selecting properties to test.
  + It can be extended to allow more than two values as the result of the classification.
  + It can be extended to deal with noise.
_________________________________________________________________________________________________________
Summary - Reinforcement Learning

* Reinforcement learning is an incremental learning approach.
  + We could only give a glimpse of reinforcement learning.
  + We looked only at the example of a passive agent, which observes the world. Typically you will have an active agent, which can make decisions based on its partial knowledge of the world.
  + An active agent has to decide whether it should exploit its current knowledge or explore the world.
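The LMS update from the passive-learning pseudocode above can be sketched in Python. This is a hedged reconstruction, not the lecture's code: the two sample sequences come from the grid-world example, and terminal rewards of +1/-1 (0 elsewhere) are assumed:

```python
from collections import defaultdict

def lms_update(U, N, sequence, reward):
    """After a terminated sequence, update each visited state's utility
    estimate with the running average of its observed rewards-to-go."""
    reward_to_go = 0.0
    for state in reversed(sequence):
        reward_to_go += reward(state)
        N[state] += 1
        # running average: U_new = U_old + (observation - U_old) / N
        U[state] += (reward_to_go - U[state]) / N[state]

U, N = defaultdict(float), defaultdict(int)
reward = lambda s: {(4, 3): 1.0, (4, 2): -1.0}.get(s, 0.0)

# the two training sequences from the example
seq1 = [(1,1), (1,2), (1,3), (2,3), (1,3), (2,3), (3,3), (4,3)]  # reward +1
seq2 = [(1,1), (2,1), (1,1), (2,1), (3,1), (3,2), (4,2)]         # reward -1
lms_update(U, N, seq1, reward)
lms_update(U, N, seq2, reward)

print(U[(1, 3)])  # visited twice, reward-to-go +1 each time: estimate 1.0
print(U[(1, 1)])  # visited three times (+1, -1, -1): estimate -1/3
```

With many more sequences these running averages converge towards the utility values shown in the "Utility to be Learned" figure.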
_________________________________________________________________________________________________________
Further Reading

* S. Russell, P. Norvig. Artificial Intelligence - A Modern Approach. 2nd Edition, Pearson Education, 2003. Sections 18.3 & 21.2.
* G. Luger, W. Stubblefield. Artificial Intelligence - Structures and Strategies for Complex Problem Solving. 2nd Edition, The Benjamin/Cummings Publishing Company, 1993.
* J.R. Quinlan. Induction of Decision Trees. Machine Learning, 1(1):81-106, 1986.
* J.R. Quinlan. The Effect of Noise on Concept Learning. In R.S. Michalski et al., eds., Machine Learning: An Artificial Intelligence Approach, Vol. 2. Morgan Kaufmann, 1986.
_________________________________________________________________________________________________________
© Manfred Kerber, 2004, Introduction to AI. 22.12.2004
The URL of this page is http://www.cs.bham.ac.uk/~mmk/Teaching/AI/Teaching/AI/l3.html.
URL of the module: http://www.cs.bham.ac.uk/~mmk/Teaching/AI/