Introduction to AI - Week 2

Expert Systems

Domain Knowledge vs Case Knowledge

Expert knowledge mainly expressed by rules like:

    (1) Interest rate is low &
    (2) Demand in ?x is high &
    (3) Economy is doing well
  =>   Price of ?x is high with strong evidence (0.8)

Case specific knowledge by facts like knowledge about House prices:
  Interest rate is low with 0.9
  Demand in houses is high with 0.8
  Economy is doing well with 0.9

Inference Rules

Deductive inference rule:
Forward Chaining: Conclude from "A" and "A implies B" to "B".
A ->  B
-------- ------------- -------------


It is raining.
If it is raining, the street is wet.
The street is wet.
-------- ------------- -------------
Abductive inference rule:
Backward Chaining: Conclude from "B" and "A implies B" to "A".
A ->  B
-------- ------------- -------------


The street is wet.
If it is raining, the street is wet.
It is raining.
-------- ------------- -------------

Recognise-Act Cycle

A Rule Interpreter can be described as a recognise-act cycle

  1. Match the premise patterns of rules against elements in the working memory

  2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the conflict resolution. If no rule applicable, stop.

  3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If termination condition fulfilled stop, else go to step 1.

The termination condition is either defined by a goal state or by a cycle condition (e.g. maximally 100 steps)


In general there are variables in the rules, which have to be matched when the rule is applied. Variables stand for arbitrary objects. Consider:
a rule (about the value of horses):

   ?x is-a horse
   ?x is-a-parent-of ?y
   ?y is fast
  =>  ?x is valuable

and the facts
  Comet  is-a  horse
  Prancer  is-a  horse
  Comet  is-a-parent-of  Dasher
  Comet  is-a-parent-of  Prancer
  Prancer  is  fast
  Dasher  is-a-parent-of  Thunder
  Thunder  is  fast
  Thunder  is-a  horse
  Dasher  is-a  horse
  Aslan  is-a  lion

Find bindings for ?x & ?y so that rule applicable

Matching (Cont'd)

The bindings for ?x is-a horse:
?x -> Comet,
?x -> Prancer,
?x -> Thunder, and
?x -> Dasher               

The bindings for ?y is fast
?y -> Prancer and
?y -> Thunder

The bindings for ?x and ?y in ?x is-a-parent-of ?y are:
?x -> Comet,?y -> Dasher; ?x -> Comet,?y -> Prancer; and
?x -> Dasher,?y -> Thunder
I.e., "Comet is-a horse" matches "?x is-a horse" but
"Aslan is-a lion" does not match "?x is-a horse"
Rule applicable with bindings
?x -> Comet,?y -> Prancer and ?x -> Dasher,?y -> Thunder
That is new facts that can be derived are:
valuable(Comet) and valuable(Dasher)

Matching (Cont'd)

Assume variable ?x, ?y, and ?z and the rule

?x is parent of ?y
?y is parent of ?z
?x is grandparent of ?z

and the facts:
Lydia is parent of Tony's mother
Sam's uncle is parent of Anne
Anne's cousin is parent of Lydia
Sam is parent of John
Anne is parent of Luke
John is not parent of Barbara

Which facts match the preconditions of the rule?
What can be derived?

Forward Chaining

Forward chaining or data-driven inference works from an initial state, and by looking at the premises of the rules (IF-part), perform the actions (THEN-part), possibly updating the knowledge base or working memory.
This continues until no more rules can be applied or some cycle limit is met, e.g.

Forward Chaining (Cont'd)

In example: no more rules, that is, inference chain for this is:

Problem with forward chaining:
many rules may be applicable. The whole process is not directed towards a goal.

Backward Chaining

Backward chaining or goal-driven inference works towards a final state, and by looking at the working memory to see if goal already there. If not look at the actions (THEN-parts) of rules that will establish goal, and set up subgoals for achieving premises of the rules (IF-part).

This continues until some rule can be applied, apply to achieve goal state.

Advantage of backward chaining:
search is directed
Disadvantage of backward chaining:
goal has to be known

Now look at the example from above with backward chaining

Backward Chaining (Cont'd)

Forward or Backward Reasoning?

Four major factors
  • More possible start states or goal states?
    Move from smaller set of states to the larger
    Assume you are at home and you need some bread. You've one initial state and many possible goal states where to get bread from (Sainsbury's, Tesco, Aldi, Morrison, ASDA, CornerShop, Bakery). That is, in such a situation it is probably a good approach to search in forward chaining fashion, since you have many chances to hit a goal. If you choose backward chaining you would have to commit to one of the goals, e.g. the Bakery and then search backwardly how the get there from home.
    If, however, there are 5 people Alice, Bob, Claire, Dora, and Edgar at 5 different places A, B, C, D, and E, and one of them should get the Tesco home brand of ketchup, then it would be better to start backward chaining from the Tesco store and stop search when one of the places A, B, C, D, or E is reached.

    Forward or Backward Reasoning? (Cont'd)

    The Rete-Algorithm

    Remember: Efficiency problem with matching:
    Naive approach of recognise-act cycle:
    In order to find all applicable rules, try to match all elements in the working memory against all premises in all rules,
    i.e. for n rules with m premises on average and k elements in the working memory, it is necessary to check
    n* m* k

    possible matches in each single cycle.

    Method: Compile condition parts of the rule in a net
    (Italian for net: "rete")

    The Rete-Algorithm (Cont'd)

    The Rete-Algorithm (Cont'd) - A Simple Example

    The Rete-Algorithm (Cont'd)

    General Procedure:

    Summary: By sharing and considering updates of the working memory elements only, the efficiency problem can be mainly solved.

    Conflict Resolution

    recognise-act cycle
    1. Match the premise patterns of rules against elements in the working memory

    2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the conflict resolution. If no rule applicable, stop.

    3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If termination condition fulfilled stop, else go to step 1.

    Decisions in conflict resolution are crucial since they can dramatically afflict the solution.
    Conflict set is a set of pairs of the form
    { production rule, matching working memory elements}
    Distinguish: general conflict resolution and
    domain-specific conflict resolution


    Initial facts: engine does not turn, battery is not flat

    Conflict set: {{R1, engine does not turn, battery is not flat},
    {R4, engine does not turn}

    General Conflict Resolution Strategies in Ops5


    1. Delete instantiations of rules if they have already fired. If conflict set empty, exit.

    2. Compare generation age of elements in working memory, which match the first condition of rules. Prefer younger ones.

    3. Order the instantiations by the generation age of all the elements.

    4. Prefer the most specific rules (i.e. those with most preconditions)

    5. Random choice

    LEX: as MEA, but without point 2.

    General Conflict Resolution Strat. in Ops5 (Cont'd)

    The rationale behind these strategies is:

    Problem-Specific Conflict Resolution

    Standard way in Ops5 : add extra conditions to rules to avoid conflicts. These extra conditions can be related to strategies, e.g. what currently is searched for


    Disadvantage: Mixture of heuristic knowledge and factual knowledge. Large knowledge bases are not maintainable.

    Problem-Specific Conflict Resolution (Cont'd)

    Hence separate object level and meta-level knowledge


    How to model uncertainty?

    Mere probability may not be adequate, e.g. if a certain infection is unlikely but fatal, it must be adequately considered, the certainty value should be higher than the probability.

    Uncertainty in Rules and Facts




    (In finite domains) the probability can be measured as the limit of the ratio of the positive events to all possible events, e.g. with a fair die and positive outcome 5 or 6, in the long run, there is a positive outcome in 1 out of 3 cases, i.e. Probability P(5 or 6)=1/ 3.

    Probability may involve random variables, e.g. the probability that it will be at least 23oC on New Year's Eve in Birmingham can be expressed as
    P(t>= 23oC,Birmingham, New_Years_Eve)=0.001
    Each random variable has a domain of possible values that it can take on.

    Conditional Probabilities

    P(a|b) is the conditional probability that a is true given that it is known that b is true. E.g. P( Measles| Spots)=0.94 is the probability that a patient has measles given that the only symptom is spots.

    If other symptoms like fever, then P( Measles| Spots &  Fever)

    Conditional probabilities can be defined in terms of unconditional probabilities: P(a|b)=P(a &  b)/P(b)

    a and b are called independent if P(a|b)=P(a). In this case P(a &  b)=P(a)P(b).

    Bayes' Rule

    P(a|b)=P(b|a) P(a)/ P(b)
    can be used to calculate conditional probabilities, e.g. with
    a= myocardial infarction and
    b= chest pain

    Where to get the numbers from, e.g. a doctor may not know the probability that a myocardial infarction causes chest pain.

    Take subjective rather than objective numbers.

    Bayesian Belief Networks

    A Bayesian Belief Network is a directed acyclic graph that represents causality relationships between domain variables which consist of

    Fuzzy Logic

    The value of a proposition like "John is tall" takes truth value in the interval [0;1] (no just 0 or 1).

    E.g.: At what temperature can a person be said to have a fever?

    If a person has a temperature of 38oC then the truth value of Fever is T(Fever)=0.5

    Fuzzy Logic (Cont'd)

    The truth value of more complex formulae is calculated by:

     T(p &  q)  min(T(p),T(q))
     T(p |  q)  max(T(p),T(q))
     T(~ p)  1-T(p)
     T(p ->  q)  min(1,1-T(p)+T(q))



    The MYCIN Approach

    Certainty factor CF in range [-1, 1], x=1 (or x=-1) means that the conclusion is certain to be true (or false, respectively) if the conditions are completely satisfied. (Acquired from the expert)
    This means, in order to derive a fact A, all rules which generate A have to be investigated.
    For efficiency reasons, values between -0.2 and 0.2 are ignored.
    If a fact can be derived by independent rules, the certainty is calculated by combining the different certainties:
    If X and Y are certainties then combined certainty CC is:
       X+Y-X* Y   if X,Y>0
     CC = {  X+Y+X* Y   if X,Y<0
       (X+Y)/(1-min(abs(X),abs(Y)))   else

    Certainty Factors to Reduce Search

    Let's assume single chain of reasoning:

    For backward chaining, if B were known for sure, the certainty value of E would be 0.4* 0.7* 0.7=0.19<0.2, hence be ignored, so backward chaining can stop here. If this is the an isolated chain, the whole chain needs never to be considered.
    Different for

    Discussion of Certainty Factors



    Further Reading    

    © Manfred Kerber, 2004, Introduction to AI
    The URL of this page is
    URL of module