Introduction to AI - Week 2 Expert Systems * Domain vs case knowledge * Knowledge represented in rules * Recognise-act-cycle * Matching * Forward vs Backward Chaining * Rete algorithm * Conflict resolution * Uncertainty _________________________________________________________________________________________________________ Domain Knowledge vs Case Knowledge Expert knowledge mainly expressed by rules like: (1) Interest rate is low & (2) Demand in ?x is high & (3) Economy is doing well => Price of ?x is high with strong evidence (0.8) Case specific knowledge by facts like knowledge about House prices: Interest rate is low with 0.9 Demand in houses is high with 0.8 Economy is doing well with 0.9 _________________________________________________________________________________________________________ Inference Rules Deductive inference rule: Forward Chaining: Conclude from "A" and "A implies B" to "B". A A -> B -------- ------------- ------------- B Example: It is raining. If it is raining, the street is wet. -------- ------------- ------------- The street is wet. Abductive inference rule: Backward Chaining: Conclude from "B" and "A implies B" to "A". B A -> B -------- ------------- ------------- A Example: The street is wet. If it is raining, the street is wet. -------- ------------- ------------- It is raining. _________________________________________________________________________________________________________ Recognise-Act Cycle A Rule Interpreter can be described as a recognise-act cycle 1. Match the premise patterns of rules against elements in the working memory 2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the conflict resolution. If no rule applicable, stop. 3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If termination condition fulfilled stop, else go to step 1. The termination condition is either defined by a goal state or by a cycle condition (e.g. maximally 100 steps) _________________________________________________________________________________________________________ Matching In general there are variables in the rules, which have to be matched when the rule is applied. Variables stand for arbitrary objects. Consider: a rule (about the value of horses): ?x is-a horse ?x is-a-parent-of ?y ?y is fast => ?x is valuable and the facts Comet is-a horse Prancer is-a horse Comet is-a-parent-of Dasher Comet is-a-parent-of Prancer Prancer is fast Dasher is-a-parent-of Thunder Thunder is fast Thunder is-a horse Dasher is-a horse Aslan is-a lion Find bindings for ?x & ?y so that rule applicable _________________________________________________________________________________________________________ Matching (Cont'd) The bindings for ?x is-a horse: ?x -> Comet, ?x -> Prancer, ?x -> Thunder, and ?x -> Dasher The bindings for ?y is fast ?y -> Prancer and ?y -> Thunder The bindings for ?x and ?y in ?x is-a-parent-of ?y are: ?x -> Comet,?y -> Dasher; ?x -> Comet,?y -> Prancer; and ?x -> Dasher,?y -> Thunder I.e., "Comet is-a horse" matches "?x is-a horse" but "Aslan is-a lion" does not match "?x is-a horse" Rule applicable with bindings ?x -> Comet,?y -> Prancer and ?x -> Dasher,?y -> Thunder That is new facts that can be derived are: valuable(Comet) and valuable(Dasher) _________________________________________________________________________________________________________ Matching (Cont'd) Assume variable ?x, ?y, and ?z and the rule ?x is parent of ?y ?y is parent of ?z -------------------- ?x is grandparent of ?z and the facts: Lydia is parent of Tony's mother Sam's uncle is parent of Anne Anne's cousin is parent of Lydia Sam is parent of John Anne is parent of Luke John is not parent of Barbara Which facts match the preconditions of the rule? What can be derived? _________________________________________________________________________________________________________ Forward Chaining Forward chaining or data-driven inference works from an initial state, and by looking at the premises of the rules (IF-part), perform the actions (THEN-part), possibly updating the knowledge base or working memory. This continues until no more rules can be applied or some cycle limit is met, e.g. forward-rules.jpg] _________________________________________________________________________________________________________ Forward Chaining (Cont'd) In example: no more rules, that is, inference chain for this is: forward-ex.jpg] Problem with forward chaining: many rules may be applicable. The whole process is not directed towards a goal. _________________________________________________________________________________________________________ Backward Chaining Backward chaining or goal-driven inference works towards a final state, and by looking at the working memory to see if goal already there. If not look at the actions (THEN-parts) of rules that will establish goal, and set up subgoals for achieving premises of the rules (IF-part). This continues until some rule can be applied, apply to achieve goal state. Advantage of backward chaining: search is directed Disadvantage of backward chaining: goal has to be known Now look at the example from above with backward chaining _________________________________________________________________________________________________________ Backward Chaining (Cont'd) backward-rules.jpg] _________________________________________________________________________________________________________ Forward or Backward Reasoning? Four major factors More possible start states or goal states? Move from smaller set of states to the larger Assume you are at home and you need some bread. You've one initial state and many possible goal states where to get bread from (Sainsbury's, Tesco, Aldi, Morrison, ASDA, CornerShop, Bakery). That is, in such a situation it is probably a good approach to search in forward chaining fashion, since you have many chances to hit a goal. If you choose backward chaining you would have to commit to one of the goals, e.g. the Bakery and then search backwardly how the get there from home. If, however, there are 5 people Alice, Bob, Claire, Dora, and Edgar at 5 different places A, B, C, D, and E, and one of them should get the Tesco home brand of ketchup, then it would be better to start backward chaining from the Tesco store and stop search when one of the places A, B, C, D, or E is reached. _________________________________________________________________________________________________________ Forward or Backward Reasoning? (Cont'd) * Has program to justify reasoning? Prefer direction that corresponds more closely to the way users think. * What kind of events triggers problem-solving? If it is arrival of a new fact, forward chaining makes sense. If it is a query to which a response is required, backward chaining is more natural. * In which direction is branching factor greatest? Go in direction with lower branching factor, backward chaining better: forward-branching.jpg] forward chaining better: backward-branching.jpg] _________________________________________________________________________________________________________ The Rete-Algorithm Remember: Efficiency problem with matching: Naive approach of recognise-act cycle: In order to find all applicable rules, try to match all elements in the working memory against all premises in all rules, i.e. for n rules with m premises on average and k elements in the working memory, it is necessary to check n* m* k possible matches in each single cycle. Observation: * Rules have parts of the conditions in common. (called "structural similarity") * By the application of a rule the working memory remains almost unchanged (called "temporal redundancy") Method: Compile condition parts of the rule in a net (Italian for net: "rete") _________________________________________________________________________________________________________ The Rete-Algorithm (Cont'd) rete.jpg] * The net encodes the condition parts (IF-parts) of the rules. * The input are the changes of the working memory (i.e. new elements or deleted elements. Modification of elements is simulated by first delete then add modified version) * The output is the conflict set (i.e., the applicable rules). _________________________________________________________________________________________________________ The Rete-Algorithm (Cont'd) - A Simple Example * [Grandfather-Rule:] father(?a,?x) father(?x,?b) -------------------- assert (grandfather(?a,?b)) rete-ex.jpg] _________________________________________________________________________________________________________ The Rete-Algorithm (Cont'd) General Procedure: * Represent all rules in one net (as above) * newly generated working memory elements infiltrate the net from above. If they arrive at the bottom, they enlarge the conflict set with the rule instance, else they are added as internal nodes. * newly deleted working memory elements are treated analogously, but in this case the conflict set is possibly narrowed. Summary: By sharing and considering updates of the working memory elements only, the efficiency problem can be mainly solved. _________________________________________________________________________________________________________ Conflict Resolution recognise-act cycle 1. Match the premise patterns of rules against elements in the working memory 2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the conflict resolution. If no rule applicable, stop. 3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If termination condition fulfilled stop, else go to step 1. Decisions in conflict resolution are crucial since they can dramatically afflict the solution. Conflict set is a set of pairs of the form { production rule, matching working memory elements} Distinguish: general conflict resolution and domain-specific conflict resolution _________________________________________________________________________________________________________ Example * [R1:] engine does not turn battery is not flat -------------------- ask user to test starter motor * [R2:] there is no spark -------------------- ask user to check the points * [R3:] engine turns engine does not start -------------------- ask user to check the spark * [R4:] engine does not turn -------------------- ask user to check the battery * [R5:] battery is flat -------------------- ask user to charge the battery EXIT Initial facts: engine does not turn, battery is not flat Conflict set: {{R1, engine does not turn, battery is not flat}, {R4, engine does not turn}} _________________________________________________________________________________________________________ General Conflict Resolution Strategies in Ops5 MEA: 1. Delete instantiations of rules if they have already fired. If conflict set empty, exit. 2. Compare generation age of elements in working memory, which match the first condition of rules. Prefer younger ones. 3. Order the instantiations by the generation age of all the elements. 4. Prefer the most specific rules (i.e. those with most preconditions) 5. Random choice LEX: as MEA, but without point 2. _________________________________________________________________________________________________________ General Conflict Resolution Strat. in Ops5 (Cont'd) The rationale behind these strategies is: * "1" prevents obvious endless loops (application of the same rule with the same elements from the working memory) * "2" and "3" prefer new elements in the working memory. New elements most likely describe the actual situation. * "2" in MEA gives a particular semantics to the first conditions (efficiency). * "4" preference to most specific rules can be used for modelling exceptions. _________________________________________________________________________________________________________ Problem-Specific Conflict Resolution Standard way in Ops5 : add extra conditions to rules to avoid conflicts. These extra conditions can be related to strategies, e.g. what currently is searched for Example: * [R1:] interested_in starter motor engine does not turn battery is not flat -------------------- ask user to test starter motor Disadvantage: Mixture of heuristic knowledge and factual knowledge. Large knowledge bases are not maintainable. _________________________________________________________________________________________________________ Problem-Specific Conflict Resolution (Cont'd) Hence separate object level and meta-level knowledge Example: * [R1:] engine does not turn battery is not flat -------------------- ask user to test starter motor * [META-RULE-R1:] interested_in starter motor -------------------- select-rule R1 _________________________________________________________________________________________________________ How to model uncertainty? * Randomness: In x out of 100 cases (probability), if the three preconditions are given, the conclusion follows. (Sources: random events: rolling dice; sensor fault; complexity of the domain) * Vagueness: Whenever the preconditions are given, the action part follows, but is appropriate to a degree x only. (Sources: imprecision of the representation language, e.g. "tall" person) * Adequacy: In order to model the expert behaviour appropriately, the rule should be weighted with x. Mere probability may not be adequate, e.g. if a certain infection is unlikely but fatal, it must be adequately considered, the certainty value should be higher than the probability. _________________________________________________________________________________________________________ Uncertainty in Rules and Facts * Uncertainty in facts is expressed by uncertainty values between -1 and 1. Fever(mary) with 0.7 and Spots(mary) with 0.6 * Each rule has a certainty value. Fever(?x) Spots(?x) -------------------- Measles(?x) with 0.95 * When applying a rule multiply the certainty value of the condition which closest to 0 with the certainty value of the rule Assert Measles(mary) with 0.6 * 0.95 = 0.57 _________________________________________________________________________________________________________ Examples Facts: * It is warm * John has a height of 180cm * The power is switched off * The door is open Rules: * IF the temperature is more than 21^oC THEN switch the central heating off. * IF a person is tall THEN he can get the book from the shelf. * IF the power is switched off THEN the machine does not work. * IF the door is open THEN close it. _________________________________________________________________________________________________________ Probability (In finite domains) the probability can be measured as the limit of the ratio of the positive events to all possible events, e.g. with a fair die and positive outcome 5 or 6, in the long run, there is a positive outcome in 1 out of 3 cases, i.e. Probability P(5 or 6)=1/ 3. Probability may involve random variables, e.g. the probability that it will be at least 23^oC on New Year's Eve in Birmingham can be expressed as P(t>= 23^oC,Birmingham, New_Years_Eve)=0.001 Each random variable has a domain of possible values that it can take on. _________________________________________________________________________________________________________ Conditional Probabilities P(a|b) is the conditional probability that a is true given that it is known that b is true. E.g. P( Measles| Spots)=0.94 is the probability that a patient has measles given that the only symptom is spots. If other symptoms like fever, then P( Measles| Spots & Fever) Conditional probabilities can be defined in terms of unconditional probabilities: P(a|b)=P(a & b)/P(b) a and b are called independent if P(a|b)=P(a). In this case P(a & b)=P(a)P(b). _________________________________________________________________________________________________________ Bayes' Rule P(a|b)=P(b|a) P(a)/ P(b) can be used to calculate conditional probabilities, e.g. with a= myocardial infarction and b= chest pain Problem: Where to get the numbers from, e.g. a doctor may not know the probability that a myocardial infarction causes chest pain. Solution: Take subjective rather than objective numbers. _________________________________________________________________________________________________________ Bayesian Belief Networks A Bayesian Belief Network is a directed acyclic graph that represents causality relationships between domain variables which consist of * A set of random variables. * A set of directed arcs linking pairs of nodes; an arc from a node X to a node Y means that X has a direct influence on Y. * Each node has a conditional probability table that quantifies the effects of the parents. bayes.jpg] _________________________________________________________________________________________________________ Fuzzy Logic The value of a proposition like "John is tall" takes truth value in the interval [0;1] (no just 0 or 1). E.g.: At what temperature can a person be said to have a fever? If a person has a temperature of 38^oC then the truth value of Fever is T(Fever)=0.5 fuzzy.jpg] _________________________________________________________________________________________________________ Fuzzy Logic (Cont'd) The truth value of more complex formulae is calculated by: T(p & q) min(T(p),T(q)) T(p | q) max(T(p),T(q)) T(~ p) 1-T(p) T(p -> q) min(1,1-T(p)+T(q)) Advantages: * very successful in commercial applications (control systems) Problems: * not always intuitive, e.g. T(p | ~ p) =/= T( True) _________________________________________________________________________________________________________ The MYCIN Approach Certainty factor CF in range [-1, 1], x=1 (or x=-1) means that the conclusion is certain to be true (or false, respectively) if the conditions are completely satisfied. (Acquired from the expert) This means, in order to derive a fact A, all rules which generate A have to be investigated. For efficiency reasons, values between -0.2 and 0.2 are ignored. If a fact can be derived by independent rules, the certainty is calculated by combining the different certainties: If X and Y are certainties then combined certainty CC is: X+Y-X* Y if X,Y>0 CC = { X+Y+X* Y if X,Y<0 (X+Y)/(1-min(abs(X),abs(Y))) else _________________________________________________________________________________________________________ Certainty Factors to Reduce Search Let's assume single chain of reasoning: certainty0.jpg] For backward chaining, if B were known for sure, the certainty value of E would be 0.4* 0.7* 0.7=0.19<0.2, hence be ignored, so backward chaining can stop here. If this is the an isolated chain, the whole chain needs never to be considered. Different for certainty.jpg] _________________________________________________________________________________________________________ Discussion of Certainty Factors Advantages: * Successful in application * Stable against changes (e.g. in MYCIN four different values sufficient) Problems: * Increased search (efficiency) * What is the meaning (semantics)? * Human-like reasoning (adequacy)? _________________________________________________________________________________________________________ Further Reading books-shelf1.jpg] * F. van Harmelen: Meta-level Inference Systems. Pitman & Morgan Kaufmann Publishers, 1991. * Thomas A. Cooper & Nancy Wogrin, Rule-based Programming with OPS5, Morgan Kaufmann, 1988. * Hubert L. Dreyfus & Stuart E. Dreyfus, Mind over Machine, The Free Press, 1986. * Richard A. Frost, Introduction to Knowledge Base Systems, Collins, 1986. * Peter Jackson, Introduction to Expert Systems, Addison-Wesley, 1990, Second Edition. * Elaine Rich & Kevin Knight, Artificial Intelligence, McGraw Hill, Second Edition, 1991. * Many more in the library and on-line, start e.g., [1]http://www.aaai.org/AITopics/html/expert.html. * Many tools available on-line, e.g. CLIPS, [2]http://www.ghg.net/clips/CLIPS.html. _________________________________________________________________________________________________________ _________________________________________________________________________________________________________ © Manfred Kerber, 2004, Introduction to AI 12.4.2005 The URL of this page is http://www.cs.bham.ac.uk/~mmk/Teaching/AI/Teaching/AI/l2.html. URL of module [3]http://www.cs.bham.ac.uk/~mmk/Teaching/AI/ References 1. http://www.aaai.org/AITopics/html/expert.html 2. http://www.ghg.net/clips/CLIPS.html 3. http://www.cs.bham.ac.uk/~mmk/Teaching/AI/