Introduction to AI, Week 2
Expert Systems
- Domain vs case knowledge
- Knowledge represented in rules
- Recognise-act cycle
- Matching
- Forward vs backward chaining
- Rete algorithm
- Conflict resolution
- Uncertainty
Domain Knowledge vs Case Knowledge
Expert knowledge is mainly expressed by rules such as:
(1) Interest rate is low &
(2) Demand in ?x is high &
(3) Economy is doing well
=>
Price of ?x is high, with strong evidence (0.8)
Case-specific knowledge is expressed by facts, e.g. knowledge about house prices:
Interest rate is low with 0.9
Demand in houses is high with 0.8
Economy is doing well with 0.9
Inference Rules
Deductive inference rule:
Forward chaining:
From "A" and "A implies B", conclude "B".
A
A => B
------
B
Example:
It is raining.
If it is raining, the street is wet.
Therefore, the street is wet.
Abductive inference rule:
Backward chaining:
From "B" and "A implies B", conclude "A".
B
A => B
------
A
Example:
The street is wet.
If it is raining, the street is wet.
It is raining.
(Note that abduction is not sound: the street could be wet for another reason.)
Recognise-Act Cycle
A rule interpreter can be described as a recognise-act cycle:
1. Match the premise patterns of the rules against elements in the working memory.
2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in conflict resolution. If no rule is applicable, stop.
3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If the termination condition is fulfilled, stop; else go to step 1.
The termination condition is either defined by a goal state or by a cycle condition (e.g. at most 100 steps).
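The cycle above can be sketched for simple propositional rules. This is an illustrative sketch only: the rule format, the facts, and the helper names are assumptions, not part of OPS5 or any particular shell.

```python
# Hedged sketch of the recognise-act cycle for propositional rules.
# Rules are (premises, action) pairs; the working memory is a set of strings.

rules = [
    (["engine does not turn"], "ask user to check the battery"),
    (["engine does not turn", "battery is not flat"],
     "ask user to test starter motor"),
]
working_memory = {"engine does not turn", "battery is not flat"}

def recognise_act(rules, wm, max_cycles=100):
    for _ in range(max_cycles):                       # cycle termination condition
        # Step 1: match -- collect all applicable rule instances
        # (crude refractoriness: skip rules whose action is already in memory).
        conflict_set = [(prem, act) for prem, act in rules
                        if all(p in wm for p in prem) and act not in wm]
        if not conflict_set:                          # step 2: no rule applicable
            return wm
        # Step 2: conflict resolution (here simply: most specific rule first).
        _, action = max(conflict_set, key=lambda r: len(r[0]))
        wm.add(action)                                # step 3: apply chosen rule
    return wm

print(recognise_act(rules, working_memory))
```

With these facts, the more specific starter-motor rule fires before the battery rule, mirroring the specificity strategy discussed later.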
Matching
In general, rules contain variables, which have to be matched when the rule is applied. Variables stand for arbitrary objects.
Consider a rule (about the value of horses):
?x isa horse &
?x isaparentof ?y &
?y is fast
=>
?x is valuable
and the facts:
Comet isa horse
Prancer isa horse
Comet isaparentof Dasher
Comet isaparentof Prancer
Prancer is fast
Dasher isaparentof Thunder
Thunder is fast
Thunder isa horse
Dasher isa horse
Aslan isa lion
Find bindings for ?x and ?y so that the rule is applicable.
Matching (Cont'd)
The bindings for "?x isa horse" are:
?x -> Comet, ?x -> Prancer, ?x -> Thunder, and ?x -> Dasher
The bindings for "?y is fast" are:
?y -> Prancer and ?y -> Thunder
The bindings for ?x and ?y in "?x isaparentof ?y" are:
?x -> Comet, ?y -> Dasher;
?x -> Comet, ?y -> Prancer; and
?x -> Dasher, ?y -> Thunder
That is, "Comet isa horse" matches "?x isa horse", but "Aslan isa lion" does not match "?x isa horse".
The rule is applicable with the bindings
?x -> Comet, ?y -> Prancer and
?x -> Dasher, ?y -> Thunder.
That is, the new facts that can be derived are:
valuable(Comet) and valuable(Dasher)
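The matching process can be sketched as a naive backtracking matcher over the horse facts. The function names and the triple representation are made up for illustration; a real shell would match far more efficiently.

```python
# Hedged sketch: matching rule premises with variables against ground facts.
# Facts and premises are (subject, relation, object) triples; "?" marks variables.

facts = [("Comet", "isa", "horse"), ("Prancer", "isa", "horse"),
         ("Comet", "isaparentof", "Dasher"), ("Comet", "isaparentof", "Prancer"),
         ("Prancer", "is", "fast"), ("Dasher", "isaparentof", "Thunder"),
         ("Thunder", "is", "fast"), ("Thunder", "isa", "horse"),
         ("Dasher", "isa", "horse"), ("Aslan", "isa", "lion")]

premises = [("?x", "isa", "horse"), ("?x", "isaparentof", "?y"),
            ("?y", "is", "fast")]

def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None."""
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:
                return None          # variable already bound to something else
        elif p != f:
            return None              # constants must agree exactly
    return b

def all_bindings(premises, facts, bindings=None):
    """All variable bindings that satisfy every premise (naive backtracking)."""
    if not premises:
        return [bindings or {}]
    results = []
    for fact in facts:
        b = match(premises[0], fact, bindings or {})
        if b is not None:
            results += all_bindings(premises[1:], facts, b)
    return results

print(all_bindings(premises, facts))
# [{'?x': 'Comet', '?y': 'Prancer'}, {'?x': 'Dasher', '?y': 'Thunder'}]
```

The two bindings returned are exactly those derived above, yielding valuable(Comet) and valuable(Dasher).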
Matching (Cont'd)
Assume variables ?x, ?y, and ?z and the rule
?x is parent of ?y &
?y is parent of ?z
=>
?x is grandparent of ?z
and the facts:
Lydia is parent of Tony's mother
Sam's uncle is parent of Anne
Anne's cousin is parent of Lydia
Sam is parent of John
Anne is parent of Luke
John is not parent of Barbara
Which facts match the preconditions of the rule?
What can be derived?
Forward Chaining
Forward chaining, or data-driven inference, works from an initial state: by looking at the premises of the rules (IF-parts), it performs the actions (THEN-parts), possibly updating the knowledge base or working memory.
This continues until no more rules can be applied or some cycle limit is met.
Forward Chaining (Cont'd)
In the example, at some point no more rules are applicable; this yields the inference chain for the derived facts.
Problem with forward chaining:
many rules may be applicable. The whole process is not directed
towards a goal.
Backward Chaining
Backward chaining, or goal-driven inference, works towards a final state: look at the working memory to see whether the goal is already there. If not, look at the actions (THEN-parts) of rules that would establish the goal, and set up subgoals for achieving the premises of those rules (IF-parts).
This continues until some rule can be applied; apply it to achieve the goal state.
Advantage of backward chaining: the search is directed.
Disadvantage of backward chaining: the goal has to be known.
Now look at the example from above with backward chaining
Backward Chaining (Cont'd)
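The goal-driven procedure can be sketched for propositional rules using the rain/street example. The second (sprinkler) rule and all helper names are invented for illustration.

```python
# Hedged sketch of backward chaining over propositional (premises, conclusion)
# rules: check the working memory first, else set up the premises as subgoals.

rules = [
    (["it is raining"], "the street is wet"),
    (["the sprinkler is on"], "the street is wet"),   # invented extra rule
]
facts = {"it is raining"}

def backward_chain(goal, rules, facts, depth=10):
    """Is the goal derivable? depth bounds the recursion (cycle condition)."""
    if goal in facts:
        return True                       # goal already in working memory
    if depth == 0:
        return False
    # Try every rule whose THEN-part establishes the goal; its IF-parts
    # become subgoals.
    return any(all(backward_chain(p, rules, facts, depth - 1) for p in premises)
               for premises, conclusion in rules if conclusion == goal)

print(backward_chain("the street is wet", rules, facts))  # True
```

The search is directed: only rules concluding "the street is wet" are ever considered.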
Forward or Backward Reasoning?
Four major factors:
- Are there more possible start states or goal states? Move from the smaller set of states to the larger.
Assume you are at home and you need some bread. You have one initial state and many possible goal states where to get bread from (Sainsbury's, Tesco, Aldi, Morrison, ASDA, corner shop, bakery). In such a situation it is probably a good approach to search in forward-chaining fashion, since you have many chances to hit a goal. If you chose backward chaining, you would have to commit to one of the goals, e.g. the bakery, and then search backwards how to get there from home.
If, however, there are 5 people Alice, Bob, Claire, Dora, and Edgar at 5 different places A, B, C, D, and E, and one of them should get the Tesco home brand of ketchup, then it is better to start backward chaining from the Tesco store and stop the search when one of the places A, B, C, D, or E is reached.
Forward or Backward Reasoning? (Cont'd)
- Does the program have to justify its reasoning? Prefer the direction that corresponds more closely to the way users think.
- What kind of event triggers problem-solving? If it is the arrival of a new fact, forward chaining makes sense. If it is a query to which a response is required, backward chaining is more natural.
- In which direction is the branching factor greatest? Go in the direction with the lower branching factor: if branching is greater forwards, backward chaining is better; if it is greater backwards, forward chaining is better.
The Rete Algorithm
Remember the efficiency problem with matching. In the naive approach of the recognise-act cycle, in order to find all applicable rules, all elements in the working memory are matched against all premises in all rules, i.e. for n rules with m premises on average and k elements in the working memory, it is necessary to check
n * m * k
possible matches in each single cycle.
Observations:
- Rules have parts of their conditions in common (called "structural similarity").
- By the application of a rule the working memory remains almost unchanged (called "temporal redundancy").
Method: compile the condition parts of the rules into a net (Italian for net: "rete").
The Rete Algorithm (Cont'd)
- The net encodes the condition parts (IF-parts) of the rules.
- The input consists of the changes to the working memory (i.e. new or deleted elements; modification of an element is simulated by deleting it and then adding the modified version).
- The output is the conflict set (i.e., the applicable rules).
The Rete Algorithm (Cont'd): A Simple Example
The Rete Algorithm (Cont'd)
General procedure:
- Represent all rules in one net (as above).
- Newly generated working memory elements infiltrate the net from above. If they arrive at the bottom, they enlarge the conflict set with the rule instance; else they are stored at internal nodes.
- Newly deleted working memory elements are treated analogously, but in this case the conflict set is possibly narrowed.
Summary: by sharing, and by considering only updates of the working memory elements, the efficiency problem can largely be solved.
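The temporal-redundancy idea can be sketched as follows. This is a heavily simplified illustration, not the real Rete algorithm: it keeps one "alpha memory" per condition pattern and omits the beta/join network entirely, so conditions are not joined on shared variables. The rule and all names are made up.

```python
# Hedged sketch of Rete-style alpha memories: working-memory CHANGES propagate
# into per-condition memories, instead of rematching everything each cycle.

rules = {
    "R-valuable": [("?x", "isa", "horse"), ("?x", "is", "fast")],
}

# One "alpha memory" per condition pattern: the WMEs matching that pattern.
alpha = {cond: set() for conds in rules.values() for cond in conds}

def matches(pattern, wme):
    return all(p.startswith("?") or p == w for p, w in zip(pattern, wme))

def add_wme(wme):
    """Propagate a single NEW working-memory element through the net."""
    for cond, mem in alpha.items():
        if matches(cond, wme):
            mem.add(wme)

def remove_wme(wme):
    """Deleted elements are treated analogously, narrowing the memories."""
    for mem in alpha.values():
        mem.discard(wme)

for fact in [("Comet", "isa", "horse"), ("Comet", "is", "fast"),
             ("Aslan", "isa", "lion")]:
    add_wme(fact)

# Simplified conflict set: rules all of whose alpha memories are non-empty
# (a real Rete would also join the memories on shared variables like ?x).
conflict_set = [r for r, conds in rules.items()
                if all(alpha[c] for c in conds)]
print(conflict_set)  # ['R-valuable']
```

The payoff is that each cycle only processes the few changed elements, not all n * m * k pattern/element pairs.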
Conflict Resolution
Recall the recognise-act cycle: (1) match the premise patterns of the rules against elements in the working memory; (2) if more than one rule can be fired, choose one to apply in conflict resolution; (3) apply the chosen rule and repeat.
Decisions in conflict resolution are crucial, since they can dramatically affect the solution.
The conflict set is a set of pairs of the form
{production rule, matching working memory elements}
Distinguish general conflict resolution from domain-specific conflict resolution.
Example
- R1: engine does not turn & battery is not flat => ask user to test starter motor
- R2: there is no spark => ask user to check the points
- R3: engine turns & engine does not start => ask user to check the spark
- R4: engine does not turn => ask user to check the battery
- R5: battery is flat => ask user to charge the battery; EXIT
Initial facts: engine does not turn; battery is not flat
Conflict set:
{ {R1, engine does not turn, battery is not flat},
  {R4, engine does not turn} }
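Computing this conflict set can be sketched directly, since the car-diagnosis rules are propositional. The rule names R1-R5 come from the text; the helper function is made up.

```python
# Hedged sketch: computing the conflict set for the car-diagnosis rules above.

rules = {
    "R1": (["engine does not turn", "battery is not flat"],
           "ask user to test starter motor"),
    "R2": (["there is no spark"], "ask user to check the points"),
    "R3": (["engine turns", "engine does not start"],
           "ask user to check the spark"),
    "R4": (["engine does not turn"], "ask user to check the battery"),
    "R5": (["battery is flat"], "ask user to charge the battery"),
}

working_memory = {"engine does not turn", "battery is not flat"}

def conflict_set(rules, wm):
    """All rules whose premises all occur in the working memory."""
    return {name for name, (premises, _) in rules.items()
            if all(p in wm for p in premises)}

print(sorted(conflict_set(rules, working_memory)))  # ['R1', 'R4']
```

Both R1 and R4 match the initial facts; conflict resolution must pick one of them to fire.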
General Conflict Resolution Strategies in OPS5
MEA:
1. Delete instantiations of rules that have already fired. If the conflict set is empty, exit.
2. Compare the generation age of the elements in the working memory which match the first condition of the rules. Prefer younger ones.
3. Order the instantiations by the generation age of all their elements.
4. Prefer the most specific rules (i.e. those with the most preconditions).
5. Random choice.
LEX: as MEA, but without point 2.
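Two of these ingredients, refractoriness (point 1) and recency (points 2/3), can be sketched as follows. The representation of instantiations as (rule, timestamps) pairs is an assumption for illustration, not OPS5's actual data structure.

```python
# Hedged sketch of two MEA/LEX ingredients: refractoriness (drop instantiations
# that already fired) and recency (prefer the youngest matched elements).
# An instantiation is (rule name, tuple of generation timestamps of its WMEs).

fired = set()          # instantiations that have already fired (refractoriness)

def choose(conflict_set):
    """Pick an instantiation, or None if the filtered conflict set is empty."""
    candidates = [inst for inst in conflict_set if inst not in fired]
    if not candidates:
        return None                      # empty conflict set: exit
    # Recency: order by generation ages, youngest (largest timestamp) first.
    best = max(candidates, key=lambda inst: sorted(inst[1], reverse=True))
    fired.add(best)
    return best

cs = [("R1", (3, 7)), ("R4", (3,))]
print(choose(cs))   # prefers R1: its youngest matched element (7) is newer
```

Calling choose(cs) again would skip R1 (already fired) and return R4, and a third call would return None, triggering the exit in point 1.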
General Conflict Resolution Strategies in OPS5 (Cont'd)
The rationale behind these strategies:
- "1" prevents obvious endless loops (application of the same rule with the same elements from the working memory).
- "2" and "3" prefer new elements in the working memory; new elements most likely describe the current situation.
- "2" in MEA gives a particular semantics to the first conditions (efficiency).
- "4": preference for the most specific rules can be used for modelling exceptions.
Problem-Specific Conflict Resolution
Standard way in OPS5: add extra conditions to the rules to avoid conflicts. These extra conditions can be related to strategies, e.g. what is currently being searched for.
Example:
Disadvantage: mixture of heuristic knowledge and factual knowledge; large knowledge bases become unmaintainable.
Problem-Specific Conflict Resolution (Cont'd)
Hence, separate object-level and meta-level knowledge.
Example:
How to model uncertainty?
- Randomness: in x out of 100 cases (probability), if the preconditions are given, the conclusion follows. (Sources: random events such as rolling dice; sensor faults; complexity of the domain.)
- Vagueness: whenever the preconditions are given, the action part follows, but it is appropriate only to a degree x. (Source: imprecision of the representation language, e.g. a "tall" person.)
- Adequacy: in order to model the expert's behaviour appropriately, the rule should be weighted with x. Mere probability may not be adequate: e.g. if a certain infection is unlikely but fatal, it must be adequately considered, so the certainty value should be higher than the probability.
Uncertainty in Rules and Facts
- Uncertainty in facts is expressed by uncertainty values between -1 and 1, e.g.
Fever(mary) with 0.7 and Spots(mary) with 0.6
- Each rule has a certainty value, e.g.
Fever(?x) & Spots(?x) => Measles(?x) with 0.95
- When applying a rule, multiply the certainty value of the condition which is closest to 0 with the certainty value of the rule:
Assert Measles(mary) with 0.6 * 0.95 = 0.57
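The rule-application step can be sketched in a few lines; the function name is assumed for illustration.

```python
# Hedged sketch: the certainty of a conclusion is the condition certainty
# closest to 0, multiplied by the rule's certainty value.

def apply_rule(condition_certainties, rule_certainty):
    weakest = min(condition_certainties, key=abs)  # value closest to 0
    return weakest * rule_certainty

# Fever(mary) with 0.7, Spots(mary) with 0.6, rule certainty 0.95:
print(apply_rule([0.7, 0.6], 0.95))  # ~0.57 -> Measles(mary) with 0.57
```

Using `key=abs` means a negative certainty close to 0 (weak counter-evidence) would also be the one propagated, consistent with the [-1, 1] range.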
Examples
Facts:
- It is warm.
- John has a height of 180cm.
- The power is switched off.
- The door is open.
Rules:
- IF the temperature is more than 21°C THEN switch the central heating off.
- IF a person is tall THEN he can get the book from the shelf.
- IF the power is switched off THEN the machine does not work.
- IF the door is open THEN close it.
Probability
(In finite domains) the probability can be measured as the limit of the ratio of the positive events to all possible events, e.g. with a fair die and positive outcome 5 or 6: in the long run, there is a positive outcome in 1 out of 3 cases, i.e. the probability P(5 or 6) = 1/3.
Probability may involve random variables, e.g. the probability that it will be at least 23°C on New Year's Eve in Birmingham can be expressed as
P(t >= 23°C, Birmingham, New_Years_Eve) = 0.001
Each random variable has a domain of possible values that it can take on.
Conditional Probabilities
P(a|b) is the conditional probability that a is true given that b is known to be true. E.g.
P(Measles | Spots) = 0.94
is the probability that a patient has measles given that the only symptom is spots. If there are other symptoms like fever, consider
P(Measles | Spots & Fever)
Conditional probabilities can be defined in terms of unconditional probabilities:
P(a|b) = P(a & b) / P(b)
a and b are called independent if P(a|b) = P(a). In this case P(a & b) = P(a) P(b).
Bayes' Rule
P(a|b) = P(b|a) P(a) / P(b)
can be used to calculate conditional probabilities, e.g. with
a = myocardial infarction and b = chest pain.
Problem: where to get the numbers from; e.g. a doctor may not know the probability that a myocardial infarction causes chest pain.
Solution: take subjective rather than objective numbers.
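A numeric sketch of Bayes' rule for the infarction example. The three probabilities below are invented purely for illustration (the "subjective numbers" the slide mentions), not medical data.

```python
# Hedged sketch of Bayes' rule: P(a|b) = P(b|a) * P(a) / P(b).

def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# a = myocardial infarction (MI), b = chest pain; illustrative figures only:
p_pain_given_mi = 0.9   # subjective estimate: MI usually causes chest pain
p_mi = 0.01             # prior probability of MI
p_pain = 0.1            # overall probability of chest pain

print(bayes(p_pain_given_mi, p_mi, p_pain))  # ~0.09 = P(MI | chest pain)
```

Note how the rule turns the causal direction P(pain | MI), which is easier to estimate, into the diagnostic direction P(MI | pain).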
Bayesian Belief Networks
A Bayesian belief network is a directed acyclic graph that represents causality relationships between domain variables. It consists of:
- A set of random variables as nodes.
- A set of directed arcs linking pairs of nodes; an arc from a node X to a node Y means that X has a direct influence on Y.
- For each node, a conditional probability table that quantifies the effects of its parents.
Fuzzy Logic
The value of a proposition like "John is tall" takes a truth value in the interval [0, 1] (not just 0 or 1).
E.g.: at what temperature can a person be said to have a fever? If a person has a temperature of 38°C, then the truth value of Fever is T(Fever) = 0.5.
Fuzzy Logic (Cont'd)
The truth value of more complex formulae is calculated by:
T(p & q) = min(T(p), T(q))
T(p v q) = max(T(p), T(q))
T(~p) = 1 - T(p)
T(p => q) = min(1, 1 - T(p) + T(q))
Advantages:
- very successful in commercial applications (control systems)
Problems:
- not always intuitive, e.g. T(p v ~p) ≠ T(True)
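The truth-value combinations, and the non-intuitive behaviour of T(p v ~p), can be sketched directly (function names assumed):

```python
# Hedged sketch of the fuzzy truth-value combinations above.

def t_and(p, q): return min(p, q)
def t_or(p, q): return max(p, q)
def t_not(p): return 1 - p
def t_implies(p, q): return min(1, 1 - p + q)

tall = 0.5                      # T("John is tall") = 0.5
print(t_or(tall, t_not(tall)))  # 0.5, not 1: T(p v ~p) != T(True)
```

With T(p) = 0.5, both p and ~p have value 0.5, so their disjunction also gets 0.5 rather than the classical tautology value 1.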
The MYCIN Approach
A certainty factor CF lies in the range [-1, 1]; x = 1 (or x = -1) means that the conclusion is certain to be true (or false, respectively) if the conditions are completely satisfied. (Acquired from the expert.)
This means that, in order to derive a fact A, all rules which generate A have to be investigated.
For efficiency reasons, values between -0.2 and 0.2 are ignored.
If a fact can be derived by independent rules, the certainty is calculated by combining the different certainties. If X and Y are certainties, then the combined certainty CC is:
CC = X + Y - X*Y                      if X, Y > 0
CC = X + Y + X*Y                      if X, Y < 0
CC = (X + Y) / (1 - min(|X|, |Y|))    otherwise
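The combination function can be sketched directly from the three cases above (function name assumed):

```python
# Hedged sketch of the MYCIN combination function CC for two certainties.

def combine(x, y):
    if x > 0 and y > 0:
        return x + y - x * y                      # evidence reinforces
    if x < 0 and y < 0:
        return x + y + x * y                      # counter-evidence reinforces
    return (x + y) / (1 - min(abs(x), abs(y)))    # mixed evidence partly cancels

print(combine(0.6, 0.4))   # ~0.76: two positive pieces of evidence reinforce
print(combine(0.6, -0.4))  # ~0.33: mixed evidence partly cancels
```

Note that combine is symmetric and stays within [-1, 1], so the order in which independent rules are combined does not matter.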
Certainty Factors to Reduce Search
Let us assume a single chain of reasoning.
For backward chaining: if B were known for sure, the certainty value of E would be 0.4 * 0.7 * 0.7 = 0.196 < 0.2, and hence would be ignored, so backward chaining can stop here.
If this is an isolated chain, the whole chain need never be considered.
This is different for forward chaining.
Discussion of Certainty Factors
Advantages:
- successful in applications
- stable against changes (e.g. in MYCIN four different values sufficed)
Problems:
- increased search (efficiency)
- what is the meaning (semantics)?
- human-like reasoning (adequacy)?
Further Reading
- F. van Harmelen: Meta-level Inference Systems. Pitman & Morgan Kaufmann, 1991.
- Thomas A. Cooper & Nancy Wogrin: Rule-based Programming with OPS5. Morgan Kaufmann, 1988.
- Hubert L. Dreyfus & Stuart E. Dreyfus: Mind over Machine. The Free Press, 1986.
- Richard A. Frost: Introduction to Knowledge Base Systems. Collins, 1986.
- Peter Jackson: Introduction to Expert Systems. Addison-Wesley, 2nd edition, 1990.
- Elaine Rich & Kevin Knight: Artificial Intelligence. McGraw-Hill, 2nd edition, 1991.
- Many more in the library and online, e.g. http://www.aaai.org/AITopics/html/expert.html.
- Many tools are available online, e.g. CLIPS, http://www.ghg.net/clips/CLIPS.html.
© Manfred Kerber, 2004, Introduction to AI
24.4.2005
The URL of this page is http://www.cs.bham.ac.uk/~mmk/Teaching/AI/Teaching/AI/l2.html.
URL of module http://www.cs.bham.ac.uk/~mmk/Teaching/AI/