### Introduction to AI - Week 2

Expert Systems
• Domain vs case knowledge
• Knowledge represented in rules
• Recognise-act cycle
• Matching
• Forward vs Backward Chaining
• Rete algorithm
• Conflict resolution
• Uncertainty

# Domain Knowledge vs Case Knowledge

Expert knowledge is mainly expressed by rules such as:

 (1) Interest rate is low &
 (2) Demand in ?x is high &
 (3) Economy is doing well
 => Price of ?x is high, with strong evidence (0.8)

Case-specific knowledge is expressed by facts, e.g. knowledge about house prices:
 Interest rate is low with 0.9
 Demand in houses is high with 0.8
 Economy is doing well with 0.9

# Inference Rules

Deductive inference rule:
Forward Chaining: Conclude from "A" and "A implies B" to "B".
A
A ->  B
--------
B

Example:

It is raining.
If it is raining, the street is wet.
--------
The street is wet.

Abductive inference rule:
Backward Chaining: Conclude from "B" and "A implies B" to "A".
B
A ->  B
--------
A

Example:

The street is wet.
If it is raining, the street is wet.
--------
It is raining.
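The two inference directions above can be sketched as follows (a minimal illustration with made-up fact names; `forward` is sound modus ponens, `backward` is unsound abduction):

```python
rule = ("raining", "street_wet")  # (premise, conclusion): raining -> street_wet

def forward(facts, rule):
    """Modus ponens: from the premise and the rule, conclude the conclusion."""
    premise, conclusion = rule
    return facts | {conclusion} if premise in facts else facts

def backward(observations, rule):
    """Abduction: from the conclusion and the rule, hypothesise the premise.
    Unlike modus ponens this is not logically sound -- the street
    could be wet for other reasons."""
    premise, conclusion = rule
    return observations | {premise} if conclusion in observations else observations

print(sorted(forward({"raining"}, rule)))      # ['raining', 'street_wet']
print(sorted(backward({"street_wet"}, rule)))  # ['raining', 'street_wet']
```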

# Recognise-Act Cycle

A rule interpreter can be described by a recognise-act cycle:

1. Match the premise patterns of rules against elements in the working memory

2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the conflict resolution. If no rule applicable, stop.

3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If termination condition fulfilled stop, else go to step 1.

The termination condition is either defined by a goal state or by a cycle condition (e.g. at most 100 steps).
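The three steps above can be sketched as a small interpreter (illustrative rule and fact names; conflict resolution here is simply "take the first applicable rule"):

```python
def recognise_act(wm, rules, max_cycles=100):
    """Run the recognise-act cycle on a working memory of ground facts.
    Rules are (premises, conclusion) pairs; max_cycles is the cycle
    condition used as a termination condition."""
    wm = set(wm)
    for _ in range(max_cycles):
        # 1. Match: rules whose premises are all in working memory
        #    and whose conclusion is not yet derived
        conflict_set = [(ps, c) for ps, c in rules
                        if set(ps) <= wm and c not in wm]
        if not conflict_set:          # 2. No applicable rule: stop
            break
        premises, conclusion = conflict_set[0]   # 2. Conflict resolution
        wm.add(conclusion)            # 3. Act: add a new item, then repeat
    return wm

rules = [(["raining"], "street_wet"),
         (["street_wet"], "slippery")]
print(sorted(recognise_act({"raining"}, rules)))
# ['raining', 'slippery', 'street_wet']
```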

# Matching

In general there are variables in the rules, which have to be matched when the rule is applied. Variables stand for arbitrary objects. Consider
a rule (about the value of horses):

 ?x is-a horse
 ?x is-a-parent-of ?y
 ?y is fast
 --------------------
 ?x is valuable

and the facts:
 Comet is-a horse
 Prancer is-a horse
 Comet is-a-parent-of Dasher
 Comet is-a-parent-of Prancer
 Prancer is fast
 Dasher is-a-parent-of Thunder
 Thunder is fast
 Thunder is-a horse
 Dasher is-a horse
 Aslan is-a lion

Find bindings for ?x and ?y so that the rule is applicable.

# Matching (Cont'd)

The bindings for ?x in ?x is-a horse are:
?x -> Comet,
?x -> Prancer,
?x -> Thunder, and
?x -> Dasher

The bindings for ?y in ?y is fast are:
?y -> Prancer and
?y -> Thunder

The bindings for ?x and ?y in ?x is-a-parent-of ?y are:
?x -> Comet,?y -> Dasher; ?x -> Comet,?y -> Prancer; and
?x -> Dasher,?y -> Thunder
I.e., "Comet is-a horse" matches "?x is-a horse", but
"Aslan is-a lion" does not match "?x is-a horse".
The rule is applicable with the bindings
?x -> Comet, ?y -> Prancer and ?x -> Dasher, ?y -> Thunder.
That is, the new facts that can be derived are:
valuable(Comet) and valuable(Dasher)
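The matching process above can be sketched as code. This is an illustrative matcher, not the algorithm of any particular system: terms starting with `?` are variables, and premises are matched left to right while bindings accumulate:

```python
facts = [("Comet", "is-a", "horse"), ("Prancer", "is-a", "horse"),
         ("Comet", "is-a-parent-of", "Dasher"),
         ("Comet", "is-a-parent-of", "Prancer"),
         ("Prancer", "is", "fast"),
         ("Dasher", "is-a-parent-of", "Thunder"),
         ("Thunder", "is", "fast"), ("Thunder", "is-a", "horse"),
         ("Dasher", "is-a", "horse"), ("Aslan", "is-a", "lion")]

def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None."""
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):           # a variable: bind or check binding
            if b.get(p, f) != f:
                return None
            b[p] = f
        elif p != f:                    # a constant: must match exactly
            return None
    return b

def match_all(premises, facts, bindings=None):
    """All variable bindings under which every premise matches some fact."""
    if bindings is None:
        bindings = {}
    if not premises:
        return [bindings]
    results = []
    for fact in facts:
        b = match(premises[0], fact, bindings)
        if b is not None:
            results.extend(match_all(premises[1:], facts, b))
    return results

premises = [("?x", "is-a", "horse"),
            ("?x", "is-a-parent-of", "?y"),
            ("?y", "is", "fast")]
for b in match_all(premises, facts):
    print(b)
# {'?x': 'Comet', '?y': 'Prancer'}
# {'?x': 'Dasher', '?y': 'Thunder'}
```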

# Matching (Cont'd)

Assume variables ?x, ?y, and ?z and the rule

?x is parent of ?y
?y is parent of ?z
--------------------
?x is grandparent of ?z

and the facts:
Lydia is parent of Tony's mother
Sam's uncle is parent of Anne
Anne's cousin is parent of Lydia
Sam is parent of John
Anne is parent of Luke
John is not parent of Barbara

Which facts match the preconditions of the rule?
What can be derived?
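One way to check the answer mechanically is to join the positive parent facts on the shared middle person (a sketch; phrases like "Tony's mother" are treated as opaque constants that only match themselves):

```python
parent = [("Lydia", "Tony's mother"),
          ("Sam's uncle", "Anne"),
          ("Anne's cousin", "Lydia"),
          ("Sam", "John"),
          ("Anne", "Luke")]
# "John is not parent of Barbara" is a negative fact and never
# matches the positive premise "?x is parent of ?y".

# ?x is parent of ?y and ?y is parent of ?z => ?x is grandparent of ?z
grandparent = [(x, z) for x, y1 in parent for y2, z in parent if y1 == y2]
print(grandparent)
# [("Sam's uncle", 'Luke'), ("Anne's cousin", "Tony's mother")]
```

So two new facts can be derived: Sam's uncle is grandparent of Luke, and Anne's cousin is grandparent of Tony's mother.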

# Forward Chaining

Forward chaining or data-driven inference works from an initial state: whenever the premises (IF-part) of a rule match the working memory, its actions (THEN-part) are performed, possibly updating the knowledge base or working memory.
This continues until no more rules can be applied or some cycle condition is met (e.g. a maximal number of cycles).

# Forward Chaining (Cont'd)

In the example, at some point no more rules are applicable; the rule applications up to that point form the inference chain.

Problem with forward chaining:
many rules may be applicable. The whole process is not directed towards a goal.

# Backward Chaining

Backward chaining or goal-driven inference works towards a final state: first look at the working memory to see whether the goal is already there. If not, look at the actions (THEN-parts) of rules that would establish the goal, and set up subgoals for achieving the premises (IF-parts) of those rules.

This continues until some rule can be applied; apply it to achieve the goal state.

• the search is directed
• the goal has to be known

Now look at the example from above with backward chaining

# Forward or Backward Reasoning?

Four major factors
• More possible start states or goal states?
Move from the smaller set of states to the larger.
Assume you are at home and you need some bread. You have one initial state and many possible goal states where to get bread from (Sainsbury's, Tesco, Aldi, Morrison, ASDA, CornerShop, Bakery). In such a situation it is probably a good approach to search in a forward chaining fashion, since you have many chances to hit a goal. If you chose backward chaining, you would have to commit to one of the goals, e.g. the Bakery, and then search backwards how to get there from home.
If, however, there are 5 people Alice, Bob, Claire, Dora, and Edgar at 5 different places A, B, C, D, and E, and one of them should get the Tesco home brand of ketchup, then it would be better to start backward chaining from the Tesco store and stop the search when one of the places A, B, C, D, or E is reached.

# Forward or Backward Reasoning? (Cont'd)

• Does the program have to justify its reasoning?
Prefer the direction that corresponds more closely to the way users think.
• What kind of event triggers problem-solving?
If it is arrival of a new fact, forward chaining makes sense.
If it is a query to which a response is required, backward chaining is more natural.
• In which direction is the branching factor greater?
Go in the direction with the lower branching factor: if the branching factor is high in the forward direction, backward chaining is better; if it is high in the backward direction, forward chaining is better.

# The Rete-Algorithm

Remember: Efficiency problem with matching.
Naive approach of the recognise-act cycle:
In order to find all applicable rules, try to match all elements in the working memory against all premises in all rules, i.e. for n rules with m premises on average and k elements in the working memory, it is necessary to check

 n * m * k

possible matches in each single cycle.
Observation:

• Rules have parts of the conditions in common. (called "structural similarity")
• By the application of a rule the working memory remains almost unchanged (called "temporal redundancy")

Method: Compile condition parts of the rule in a net
(Italian for net: "rete")

# The Rete-Algorithm (Cont'd)

• The net encodes the condition parts (IF-parts) of the rules.

• The input consists of the changes of the working memory (i.e. new or deleted elements; modification of an element is simulated by first deleting it and then adding the modified version).

• The output is the conflict set (i.e., the applicable rules).

# The Rete-Algorithm (Cont'd) - A Simple Example

• [Grandfather-Rule:]

father(?a,?x)
father(?x,?b)
--------------------
assert (grandfather(?a,?b))

# The Rete-Algorithm (Cont'd)

General Procedure:
• Represent all rules in one net (as above)

• Newly generated working memory elements infiltrate the net from above. If they arrive at the bottom, they enlarge the conflict set with the corresponding rule instance; else they are stored at internal nodes as partial matches.

• Deleted working memory elements are treated analogously, but in this case the conflict set is possibly narrowed.

Summary: By sharing common conditions and considering only the updates of the working memory, the efficiency problem can largely be solved.
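A highly simplified sketch of this idea for the grandfather rule (illustrative names, not the actual Rete data structures): partial matches are kept in a memory, and each new fact is joined only against the facts already stored, instead of rematching everything every cycle:

```python
memory = []        # memory of facts matching the pattern father(?, ?)
conflict_set = []  # complete instantiations grandfather(?a, ?b)

def add_father(a, b):
    """A new fact father(a, b) infiltrates the net: join it against the
    stored partial matches only (exploiting temporal redundancy)."""
    for x, c in memory:
        if x == b:                       # father(a,b), father(b,c)
            conflict_set.append((a, c))
    for c, x in memory:
        if x == a:                       # father(c,a), father(a,b)
            conflict_set.append((c, b))
    memory.append((a, b))

add_father("Abe", "Homer")
add_father("Homer", "Bart")
add_father("Homer", "Lisa")
print(conflict_set)   # [('Abe', 'Bart'), ('Abe', 'Lisa')]
```

Each new fact causes work proportional to the size of the memory, rather than a full rematch of all rules against all working memory elements.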

# Conflict Resolution

recognise-act cycle
1. Match the premise patterns of rules against elements in the working memory

2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the conflict resolution. If no rule applicable, stop.

3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If termination condition fulfilled stop, else go to step 1.

Decisions in conflict resolution are crucial since they can dramatically affect the solution.
The conflict set is a set of pairs of the form
 (production rule, matching working memory elements)
Distinguish general conflict resolution from domain-specific conflict resolution.

# Example

• [R1:]
engine does not turn
battery is not flat
--------------------
ask user to test starter motor

• [R2:]
there is no spark
--------------------
ask user to check the points

• [R3:]
engine turns
engine does not start
--------------------
ask user to check the spark

• [R4:]
engine does not turn
--------------------
ask user to check the battery

• [R5:]
battery is flat
--------------------
ask user to charge the battery
EXIT

Initial facts: engine does not turn, battery is not flat

Conflict set: {{R1, engine does not turn, battery is not flat},
{R4, engine does not turn}
}
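The conflict set computation for this example can be sketched as follows (rule names and fact strings taken from the slide; facts are treated as opaque strings):

```python
rules = {
    "R1": (["engine does not turn", "battery is not flat"],
           "ask user to test starter motor"),
    "R2": (["there is no spark"], "ask user to check the points"),
    "R3": (["engine turns", "engine does not start"],
           "ask user to check the spark"),
    "R4": (["engine does not turn"], "ask user to check the battery"),
    "R5": (["battery is flat"], "ask user to charge the battery"),
}
wm = {"engine does not turn", "battery is not flat"}

# A rule is in the conflict set if all its premises are in working memory.
conflict_set = {name: premises for name, (premises, action) in rules.items()
                if all(p in wm for p in premises)}
print(sorted(conflict_set))   # ['R1', 'R4']
```

Note that a specificity strategy (prefer the rule with the most preconditions) would choose R1 over R4 here.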

# General Conflict Resolution Strategies in Ops5

MEA:

1. Delete instantiations of rules if they have already fired. If conflict set empty, exit.

2. Compare generation age of elements in working memory, which match the first condition of rules. Prefer younger ones.

3. Order the instantiations by the generation age of all the elements.

4. Prefer the most specific rules (i.e. those with most preconditions)

5. Random choice

LEX: as MEA, but without point 2.

# General Conflict Resolution Strat. in Ops5 (Cont'd)

The rationale behind these strategies is:

• "1" prevents obvious endless loops (application of the same rule with the same elements from the working memory)

• "2" and "3" prefer new elements in the working memory. New elements most likely describe the actual situation.

• "2" in MEA gives a particular semantics to the first conditions (efficiency).

• "4" preference to most specific rules can be used for modelling exceptions.

# Problem-Specific Conflict Resolution

The standard way in Ops5: add extra conditions to rules to avoid conflicts. These extra conditions can be related to strategies, e.g. what is currently being searched for.

Example:

• [R1:]

interested_in starter motor
engine does not turn
battery is not flat
--------------------
ask user to test starter motor

Disadvantage: Mixture of heuristic knowledge and factual knowledge. Large knowledge bases are not maintainable.

# Problem-Specific Conflict Resolution (Cont'd)

Hence separate object level and meta-level knowledge

Example:

• [R1:]

engine does not turn
battery is not flat
--------------------
ask user to test starter motor

• [META-RULE-R1:]

interested_in starter motor
--------------------
select-rule R1

# How to Model Uncertainty?

• Randomness: In x out of 100 cases (probability), if the three preconditions are given, the conclusion follows.
(Sources: random events: rolling dice; sensor fault; complexity of the domain)
• Vagueness: Whenever the preconditions are given, the action part follows, but is appropriate to a degree x only.
(Sources: imprecision of the representation language, e.g. "tall" person)
• Adequacy: In order to model the expert behaviour appropriately, the rule should be weighted with x.
Mere probability may not be adequate, e.g. if a certain infection is unlikely but fatal, it must be adequately considered; the certainty value should be higher than the probability.

# Uncertainty in Rules and Facts

• Uncertainty in facts is expressed by uncertainty values between -1 and 1.
Fever(mary) with 0.7 and Spots(mary) with 0.6
• Each rule has a certainty value.

Fever(?x)
Spots(?x)
--------------------
Measles(?x) with 0.95

• When applying a rule, multiply the certainty value of the condition which is closest to 0 with the certainty value of the rule:
Assert Measles(mary) with 0.6 * 0.95 = 0.57
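This propagation step can be sketched as follows (an illustrative implementation: "closest to 0" is taken as the smallest absolute value among the premise certainties):

```python
facts = {"Fever(mary)": 0.7, "Spots(mary)": 0.6}

def apply_rule(premises, conclusion, rule_cf, facts):
    """Derive the conclusion with certainty = (premise certainty closest
    to 0) * (rule certainty)."""
    weakest = min((facts[p] for p in premises), key=abs)
    facts[conclusion] = weakest * rule_cf

apply_rule(["Fever(mary)", "Spots(mary)"], "Measles(mary)", 0.95, facts)
print(round(facts["Measles(mary)"], 2))   # 0.57
```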

# Examples

Facts:
• It is warm
• John has a height of 180cm
• The power is switched off
• The door is open
Rules:
• IF the temperature is more than 21°C
THEN switch the central heating off.
• IF a person is tall
THEN he can get the book from the shelf.
• IF the power is switched off
THEN the machine does not work.
• IF the door is open
THEN close it.

# Probability

(In finite domains) the probability can be measured as the limit of the ratio of positive events to all possible events, e.g. with a fair die and positive outcome 5 or 6, in the long run there is a positive outcome in 1 out of 3 cases, i.e. probability P(5 or 6) = 1/3.
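The limiting relative frequency can be illustrated by simulation (a sketch with a fixed random seed; the exact frequency varies with the seed):

```python
import random

random.seed(0)
n = 100_000
# Count rolls of a fair die with outcome 5 or 6:
positive = sum(1 for _ in range(n) if random.randint(1, 6) >= 5)
print(positive / n)   # close to 1/3 for large n
```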

Probability may involve random variables, e.g. the probability that it will be at least 23°C on New Year's Eve in Birmingham can be expressed as
P(t >= 23°C, Birmingham, New_Years_Eve) = 0.001
Each random variable has a domain of possible values that it can take on.

# Conditional Probabilities

P(a|b) is the conditional probability that a is true given that it is known that b is true. E.g. P(Measles | Spots) = 0.94 is the probability that a patient has measles given that the only symptom is spots.

If there are other symptoms like fever, then P(Measles | Spots & Fever) is used.

Conditional probabilities can be defined in terms of unconditional probabilities: P(a|b) = P(a & b) / P(b)

a and b are called independent if P(a|b) = P(a). In this case P(a & b) = P(a) P(b).
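These definitions can be checked on a small made-up joint distribution (two fair coin flips, so the variables are independent by construction):

```python
# Joint probabilities P(a, b) for two fair coin flips:
# a = "first flip heads", b = "second flip heads"
joint = {(True, True): 0.25, (True, False): 0.25,
         (False, True): 0.25, (False, False): 0.25}

p_a = sum(p for (a, b), p in joint.items() if a)   # P(a) = 0.5
p_b = sum(p for (a, b), p in joint.items() if b)   # P(b) = 0.5
p_ab = joint[(True, True)]                         # P(a & b) = 0.25

p_a_given_b = p_ab / p_b                           # P(a|b) = P(a & b)/P(b)
print(p_a_given_b == p_a)                          # True: a, b independent
print(p_ab == p_a * p_b)                           # True: P(a & b) = P(a)P(b)
```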

# Bayes' Rule

P(a|b) = P(b|a) P(a) / P(b)
can be used to calculate conditional probabilities, e.g. with
a= myocardial infarction and
b= chest pain

Problem:
Where to get the numbers from, e.g. a doctor may not know the probability that a myocardial infarction causes chest pain.

Solution:
Take subjective rather than objective numbers.

# Bayesian Belief Networks

A Bayesian Belief Network is a directed acyclic graph that represents causality relationships between domain variables which consist of
• A set of random variables.
• A set of directed arcs linking pairs of nodes; an arc from a node X to a node Y means that X has a direct influence on Y.
• Each node has a conditional probability table that quantifies the effects of the parents.

# Fuzzy Logic

The value of a proposition like "John is tall" takes a truth value in the interval [0,1] (not just 0 or 1).

E.g.: At what temperature can a person be said to have a fever?

If a person has a temperature of 38°C, then the truth value of Fever is T(Fever) = 0.5

# Fuzzy Logic (Cont'd)

The truth value of more complex formulae is calculated by:

 T(p & q)  = min(T(p), T(q))
 T(p | q)  = max(T(p), T(q))
 T(~p)     = 1 - T(p)
 T(p -> q) = min(1, 1 - T(p) + T(q))

• very successful in commercial applications (control systems)

Problems:

• not always intuitive, e.g. T(p | ~p) =/= T(True)
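The combinators and the non-intuitive case can be sketched directly (function names are illustrative):

```python
def t_and(p, q): return min(p, q)          # T(p & q)
def t_or(p, q):  return max(p, q)          # T(p | q)
def t_not(p):    return 1 - p              # T(~p)
def t_imp(p, q): return min(1, 1 - p + q)  # T(p -> q)

fever = 0.5                       # T(Fever) at 38°C
# p | ~p is a classical tautology with T(True) = 1, yet:
print(t_or(fever, t_not(fever)))  # 0.5
```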

# The MYCIN Approach

A certainty factor CF lies in the range [-1, 1]; x = 1 (or x = -1) means that the conclusion is certain to be true (or false, respectively) if the conditions are completely satisfied. (Acquired from the expert.)
This means that in order to derive a fact A, all rules which generate A have to be investigated.
For efficiency reasons, values between -0.2 and 0.2 are ignored.
If a fact can be derived by independent rules, the certainty is calculated by combining the different certainties. If X and Y are certainties, then the combined certainty CC is:

 CC = X + Y - X*Y                       if X, Y > 0
 CC = X + Y + X*Y                       if X, Y < 0
 CC = (X + Y) / (1 - min(|X|, |Y|))     otherwise
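The combination function CC above translates directly into code (a sketch; the case split follows the three lines of the definition):

```python
def combine(x, y):
    """MYCIN-style combination of certainties x, y in [-1, 1] obtained
    from independent rules deriving the same fact."""
    if x > 0 and y > 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

print(combine(0.6, 0.4))    # positive certainties reinforce each other
print(combine(0.6, -0.4))   # mixed evidence: (0.6 - 0.4) / (1 - 0.4)
```

Note that combining any certainty with 1 yields 1: certain evidence dominates.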

# Certainty Factors to Reduce Search

Let's assume single chain of reasoning:

For backward chaining, if B were known for sure, the certainty value of E would be 0.4 * 0.7 * 0.7 = 0.196 < 0.2 and hence be ignored, so backward chaining can stop here. If this is an isolated chain, the whole chain never needs to be considered.
The situation is different for chains that are not isolated.

# Discussion of Certainty Factors

• Successful in application
• Stable against changes (e.g. in MYCIN four different values were sufficient)

Problems:

• Increased search (efficiency)
• What is the meaning (semantics)?

# Literature

• F. van Harmelen: Meta-level Inference Systems. Pitman & Morgan Kaufmann Publishers, 1991.

• Thomas A. Cooper & Nancy Wogrin, Rule-based Programming with OPS5, Morgan Kaufmann, 1988.

• Hubert L. Dreyfus & Stuart E. Dreyfus, Mind over Machine, The Free Press, 1986.

• Richard A. Frost, Introduction to Knowledge Base Systems, Collins, 1986.

• Peter Jackson, Introduction to Expert Systems, Addison-Wesley, 1990, Second Edition.

• Elaine Rich & Kevin Knight, Artificial Intelligence, McGraw Hill, Second Edition, 1991.

• Many more in the library and on-line, start e.g.,
http://www.aaai.org/AITopics/html/expert.html.

• Many tools available on-line, e.g. CLIPS,
http://www.ghg.net/clips/CLIPS.html.

© Manfred Kerber, 2004, Introduction to AI
24.4.2005