Introduction to AI - Week 2
Expert Systems
* Domain vs case knowledge
* Knowledge represented in rules
* Recognise-act cycle
* Matching
* Forward vs Backward Chaining
* Rete algorithm
* Conflict resolution
* Uncertainty
_________________________________________________________________________________________________________
Domain Knowledge vs Case Knowledge
Expert (domain) knowledge is mainly expressed by rules like:
(1) Interest rate is low &
(2) Demand in ?x is high &
(3) Economy is doing well
=> Price of ?x is high with strong evidence (0.8)
Case-specific knowledge is expressed by facts, e.g. knowledge about house prices:
Interest rate is low with 0.9
Demand in houses is high with 0.8
Economy is doing well with 0.9
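In code, the two kinds of knowledge might be represented as follows. This is a minimal sketch with a hypothetical representation (not taken from any particular expert-system shell), and the variable ?x is already instantiated to houses for simplicity:

```python
# Case knowledge: facts with certainty values (hypothetical representation).
facts = {
    "Interest rate is low": 0.9,
    "Demand in houses is high": 0.8,
    "Economy is doing well": 0.9,
}

# Domain knowledge: a rule with premises, a conclusion, and a rule certainty.
rule = {
    "premises": ["Interest rate is low",
                 "Demand in houses is high",
                 "Economy is doing well"],
    "conclusion": "Price of houses is high",
    "certainty": 0.8,
}

# The rule is applicable if every premise is known as a fact.
applicable = all(p in facts for p in rule["premises"])
print(applicable)  # True
```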
_________________________________________________________________________________________________________
Inference Rules
Deductive inference rule:
Forward Chaining: Conclude from "A" and "A implies B" to "B".
A
A -> B
--------
B
Example:
It is raining.
If it is raining, the street is wet.
--------
The street is wet.
Abductive inference rule:
Backward Chaining: Conclude from "B" and "A implies B" to "A".
B
A -> B
--------
A
Example:
The street is wet.
If it is raining, the street is wet.
--------
It is raining.
(Note that abduction is not a logically sound inference: the street might be wet for some other reason.)
_________________________________________________________________________________________________________
Recognise-Act Cycle
A rule interpreter can be described as a recognise-act cycle:
1. Match the premise patterns of rules against elements in the working memory
2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the
conflict resolution. If no rule applicable, stop.
3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If
termination condition fulfilled stop, else go to step 1.
The termination condition is either defined by a goal state or by a cycle condition (e.g. maximally 100 steps)
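The three steps above can be sketched as follows. This is a minimal sketch over propositional string facts; the rule format and the trivial conflict-resolution choice are simplifying assumptions:

```python
# A minimal sketch of the recognise-act cycle.
# Hypothetical rule format: each rule is a pair (premises, conclusion).

def recognise_act(rules, working_memory, max_cycles=100):
    """Run the cycle until no rule is applicable or the cycle limit is hit."""
    wm = set(working_memory)
    for _ in range(max_cycles):
        # 1. Match: rules whose premises are all in working memory and
        #    whose conclusion is not yet there (to avoid refiring).
        conflict_set = [(prem, concl) for prem, concl in rules
                        if set(prem) <= wm and concl not in wm]
        if not conflict_set:        # 2. No applicable rule: stop.
            break
        chosen = conflict_set[0]    # 2. Conflict resolution: here, simply take the first.
        wm.add(chosen[1])           # 3. Act: add the conclusion to working memory.
    return wm

rules = [(["it is raining"], "the street is wet"),
         (["the street is wet"], "the street is slippery")]
print(recognise_act(rules, ["it is raining"]))
```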
_________________________________________________________________________________________________________
Matching
In general there are variables in the rules, which have to be matched when the rule is applied. Variables
stand for arbitrary objects. Consider:
a rule (about the value of horses):
?x is-a horse
?x is-a-parent-of ?y
?y is fast
=> ?x is valuable
and the facts
Comet is-a horse
Prancer is-a horse
Comet is-a-parent-of Dasher
Comet is-a-parent-of Prancer
Prancer is fast
Dasher is-a-parent-of Thunder
Thunder is fast
Thunder is-a horse
Dasher is-a horse
Aslan is-a lion
Find bindings for ?x and ?y so that the rule is applicable.
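Such bindings can be computed mechanically. The following is a minimal sketch; representing facts and patterns as tuples, with variables marked by a leading "?", is an assumption for illustration, not the format of any particular shell:

```python
# A sketch of pattern matching with variables (tokens starting with "?").
facts = [("Comet", "is-a", "horse"), ("Prancer", "is-a", "horse"),
         ("Comet", "is-a-parent-of", "Dasher"), ("Comet", "is-a-parent-of", "Prancer"),
         ("Prancer", "is", "fast"), ("Dasher", "is-a-parent-of", "Thunder"),
         ("Thunder", "is", "fast"), ("Thunder", "is-a", "horse"),
         ("Dasher", "is-a", "horse"), ("Aslan", "is-a", "lion")]

def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on failure."""
    env = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if env.setdefault(p, f) != f:   # variable already bound differently
                return None
        elif p != f:                        # constants must match exactly
            return None
    return env

def match_rule(premises, facts):
    """All consistent bindings that satisfy every premise of the rule."""
    envs = [{}]
    for pattern in premises:
        envs = [e2 for env in envs for fact in facts
                if (e2 := match(pattern, fact, env)) is not None]
    return envs

premises = [("?x", "is-a", "horse"),
            ("?x", "is-a-parent-of", "?y"),
            ("?y", "is", "fast")]
for env in match_rule(premises, facts):
    print(env)
```

This prints exactly the two successful bindings (?x -> Comet, ?y -> Prancer and ?x -> Dasher, ?y -> Thunder) worked out on the next slide.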
_________________________________________________________________________________________________________
Matching (Cont'd)
The bindings for ?x is-a horse:
?x -> Comet,
?x -> Prancer,
?x -> Thunder, and
?x -> Dasher
The bindings for ?y is fast
?y -> Prancer and
?y -> Thunder
The bindings for ?x and ?y in ?x is-a-parent-of ?y are:
?x -> Comet, ?y -> Dasher; ?x -> Comet, ?y -> Prancer; and
?x -> Dasher, ?y -> Thunder
I.e., "Comet is-a horse" matches "?x is-a horse" but
"Aslan is-a lion" does not match "?x is-a horse"
Rule applicable with bindings
?x -> Comet, ?y -> Prancer and ?x -> Dasher, ?y -> Thunder
That is, the new facts that can be derived are:
valuable(Comet) and valuable(Dasher)
_________________________________________________________________________________________________________
Matching (Cont'd)
Assume variable ?x, ?y, and ?z and the rule
?x is parent of ?y
?y is parent of ?z
--------------------
?x is grandparent of ?z
and the facts:
Lydia is parent of Tony's mother
Sam's uncle is parent of Anne
Anne's cousin is parent of Lydia
Sam is parent of John
Anne is parent of Luke
John is not parent of Barbara
Which facts match the preconditions of the rule?
What can be derived?
_________________________________________________________________________________________________________
Forward Chaining
Forward chaining (or data-driven inference) works from an initial state: it looks at the premises of the
rules (IF-parts) and, where they match, performs the actions (THEN-parts), possibly updating the knowledge
base or working memory. This continues until no more rules can be applied or some cycle limit is met, e.g.
[forward-rules.jpg]
_________________________________________________________________________________________________________
Forward Chaining (Cont'd)
In the example no more rules are applicable, that is, the inference chain is:
[forward-ex.jpg]
Problem with forward chaining:
many rules may be applicable. The whole process is not directed towards a goal.
_________________________________________________________________________________________________________
Backward Chaining
Backward chaining (or goal-driven inference) works towards a final state: first look at the working memory
to see whether the goal is already there. If not, look at the actions (THEN-parts) of rules that would
establish the goal, and set up the premises of those rules (IF-parts) as subgoals.
This continues until some rule can be applied; apply it to achieve the goal state.
Advantage of backward chaining:
search is directed
Disadvantage of backward chaining:
goal has to be known
Now look at the example from above with backward chaining
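Backward chaining can be sketched as follows. This is a minimal recursive version for propositional rules; it assumes the rule set contains no cycles (a real interpreter would guard against repeated subgoals):

```python
# A minimal sketch of backward chaining.
# Hypothetical rule format: each rule is a pair (premises, conclusion).

def backward_chain(goal, rules, facts):
    """Return True if goal is a known fact or can be established via some rule."""
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal:
            # Set up the premises as subgoals and try to prove them all.
            if all(backward_chain(p, rules, facts) for p in premises):
                return True
    return False

rules = [(["engine turns", "engine does not start"], "check the spark"),
         (["it is raining"], "the street is wet")]
facts = {"it is raining"}
print(backward_chain("the street is wet", rules, facts))  # True
```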
_________________________________________________________________________________________________________
Backward Chaining (Cont'd)
[backward-rules.jpg]
_________________________________________________________________________________________________________
Forward or Backward Reasoning?
Four major factors
* More possible start states or goal states?
Move from the smaller set of states towards the larger one.
Assume you are at home and you need some bread. You have one initial state and many possible goal states where
to get bread from (Sainsbury's, Tesco, Aldi, Morrison, ASDA, CornerShop, Bakery). That is, in such a situation
it is probably a good approach to search in forward chaining fashion, since you have many chances to hit a
goal. If you choose backward chaining you would have to commit to one of the goals, e.g. the Bakery, and then
search backwards how to get there from home.
If, however, there are 5 people Alice, Bob, Claire, Dora, and Edgar at 5 different places A, B, C, D, and E,
and one of them should get the Tesco home brand of ketchup, then it would be better to start backward chaining
from the Tesco store and stop search when one of the places A, B, C, D, or E is reached.
_________________________________________________________________________________________________________
Forward or Backward Reasoning? (Cont'd)
* Does the program have to justify its reasoning?
Prefer the direction that corresponds more closely to the way users think.
* What kind of event triggers problem-solving?
If it is arrival of a new fact, forward chaining makes sense.
If it is a query to which a response is required, backward chaining is more natural.
* In which direction is the branching factor greatest?
Go in the direction with the lower branching factor:
backward chaining better:
[forward-branching.jpg]
forward chaining better:
[backward-branching.jpg]
_________________________________________________________________________________________________________
The Rete-Algorithm
Remember: Efficiency problem with matching:
Naive approach of recognise-act cycle:
In order to find all applicable rules, try to match all elements in the working memory against all premises in
all rules,
i.e. for n rules with m premises on average and k elements in the working memory, it is necessary to check
n * m * k
possible matches in each single cycle.
Observation:
* Rules have parts of the conditions in common. (called "structural similarity")
* By the application of a rule the working memory remains almost unchanged (called "temporal redundancy")
Method: Compile the condition parts of the rules into a net
("rete" is Italian for net)
_________________________________________________________________________________________________________
The Rete-Algorithm (Cont'd)
[rete.jpg]
* The net encodes the condition parts (IF-parts) of the rules.
* The inputs are the changes of the working memory (i.e. new elements or deleted elements; modification of
an element is simulated by first deleting it and then adding the modified version).
* The output is the conflict set (i.e., the applicable rules).
_________________________________________________________________________________________________________
The Rete-Algorithm (Cont'd) - A Simple Example
* [Grandfather-Rule:]
father(?a,?x)
father(?x,?b)
--------------------
assert (grandfather(?a,?b))
[rete-ex.jpg]
_________________________________________________________________________________________________________
The Rete-Algorithm (Cont'd)
General Procedure:
* Represent all rules in one net (as above)
* Newly generated working memory elements infiltrate the net from above. If they arrive at the bottom, they
enlarge the conflict set with the rule instance; otherwise they are stored at internal nodes.
* Deleted working memory elements are treated analogously, but in this case the conflict set is possibly
narrowed.
Summary: By sharing condition tests and considering only updates of the working memory, the efficiency
problem can be largely solved.
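The following toy sketch illustrates only the alpha part of these ideas: each distinct pattern test is stored once (shared across rules), has its own memory, and is fed only by changes to the working memory. The beta (join) part of Rete, which combines partial matches across premises, is omitted here:

```python
# A toy illustration of Rete's alpha network (NOT a full Rete implementation).

# Each distinct pattern is represented once and has its own memory.
patterns = {"is-a horse": lambda f: f[1:] == ("is-a", "horse"),
            "is fast":    lambda f: f[1:] == ("is", "fast")}
alpha_memory = {name: set() for name in patterns}

def add_fact(fact):
    """Only the new element is tested -- not the whole working memory."""
    for name, test in patterns.items():
        if test(fact):
            alpha_memory[name].add(fact)

def remove_fact(fact):
    """Deletions are treated analogously: memories shrink."""
    for mem in alpha_memory.values():
        mem.discard(fact)

add_fact(("Comet", "is-a", "horse"))
add_fact(("Prancer", "is", "fast"))
add_fact(("Aslan", "is-a", "lion"))   # matches no pattern, stored nowhere
print(alpha_memory["is-a horse"])
```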
_________________________________________________________________________________________________________
Conflict Resolution
recognise-act cycle
1. Match the premise patterns of rules against elements in the working memory
2. If there is more than one rule that can be applied (i.e. that can be "fired"), choose one to apply in the
conflict resolution. If no rule applicable, stop.
3. Apply the chosen rule, perhaps by adding a new item to the working memory or deleting an old one. If
termination condition fulfilled stop, else go to step 1.
Decisions in conflict resolution are crucial since they can dramatically affect the solution.
The conflict set is a set of pairs of the form
(production rule, matching working memory elements)
Distinguish: general conflict resolution and
domain-specific conflict resolution
_________________________________________________________________________________________________________
Example
* [R1:]
engine does not turn
battery is not flat
--------------------
ask user to test starter motor
* [R2:]
there is no spark
--------------------
ask user to check the points
* [R3:]
engine turns
engine does not start
--------------------
ask user to check the spark
* [R4:]
engine does not turn
--------------------
ask user to check the battery
* [R5:]
battery is flat
--------------------
ask user to charge the battery
EXIT
Initial facts: engine does not turn, battery is not flat
Conflict set: {{R1, engine does not turn, battery is not flat},
{R4, engine does not turn}}
_________________________________________________________________________________________________________
General Conflict Resolution Strategies in Ops5
MEA:
1. Delete instantiations of rules if they have already fired. If conflict set empty, exit.
2. Compare the generation age of the working memory elements that match the first condition of the rules.
Prefer younger ones.
3. Order the instantiations by the generation age of all the elements.
4. Prefer the most specific rules (i.e. those with most preconditions)
5. Random choice
LEX: as MEA, but without point 2.
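A simplified sketch of these strategies follows: refraction (step 1), recency (steps 2-3) and specificity (step 4). It is not a faithful MEA/LEX implementation; in particular, step 2's special treatment of the first condition is omitted, and random choice (step 5) is left out because max already breaks remaining ties deterministically:

```python
# Simplified OPS5-style conflict resolution (illustrative assumptions only).

def resolve(conflict_set, fired, timestamps):
    """conflict_set: list of (rule_name, premises, matched_elements).
    fired: set of (rule_name, matched_elements) instantiations already fired.
    timestamps: generation age of each working memory element (higher = younger)."""
    # 1. Refraction: drop instantiations that have already fired.
    candidates = [inst for inst in conflict_set
                  if (inst[0], tuple(inst[2])) not in fired]
    if not candidates:
        return None  # conflict set empty: exit
    def key(inst):
        name, premises, elements = inst
        # 2./3. Recency: prefer instantiations matching younger elements.
        recency = sorted((timestamps[e] for e in elements), reverse=True)
        # 4. Specificity: prefer rules with more preconditions.
        return (recency, len(premises))
    return max(candidates, key=key)

timestamps = {"engine does not turn": 1, "battery is not flat": 2}
conflict_set = [
    ("R1", ["engine does not turn", "battery is not flat"],
           ["engine does not turn", "battery is not flat"]),
    ("R4", ["engine does not turn"], ["engine does not turn"]),
]
print(resolve(conflict_set, fired=set(), timestamps=timestamps)[0])  # R1
```

R1 wins both on recency (it matches the younger element "battery is not flat") and on specificity (it has more preconditions).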
_________________________________________________________________________________________________________
General Conflict Resolution Strat. in Ops5 (Cont'd)
The rationale behind these strategies is:
* "1" prevents obvious endless loops (application of the same rule with the same elements from the working
memory)
* "2" and "3" prefer new elements in the working memory. New elements most likely describe the actual
situation.
* "2" in MEA gives a particular semantics to the first conditions (efficiency).
* "4" preference to most specific rules can be used for modelling exceptions.
_________________________________________________________________________________________________________
Problem-Specific Conflict Resolution
Standard way in Ops5: add extra conditions to rules to avoid conflicts. These extra conditions can be related
to strategies, e.g. what is currently being searched for.
Example:
* [R1:]
interested_in starter motor
engine does not turn
battery is not flat
--------------------
ask user to test starter motor
Disadvantage: Mixture of heuristic knowledge and factual knowledge. Large knowledge bases become
unmaintainable.
_________________________________________________________________________________________________________
Problem-Specific Conflict Resolution (Cont'd)
Hence separate object level and meta-level knowledge
Example:
* [R1:]
engine does not turn
battery is not flat
--------------------
ask user to test starter motor
* [META-RULE-R1:]
interested_in starter motor
--------------------
select-rule R1
_________________________________________________________________________________________________________
How to model uncertainty?
* Randomness: In x out of 100 cases (probability), if the three preconditions are given, the conclusion
follows.
(Sources: random events: rolling dice; sensor fault; complexity of the domain)
* Vagueness: Whenever the preconditions are given, the action part follows, but is appropriate to a degree x
only.
(Sources: imprecision of the representation language, e.g. "tall" person)
* Adequacy: In order to model the expert behaviour appropriately, the rule should be weighted with x.
Mere probability may not be adequate; e.g. if a certain infection is unlikely but fatal, it must still be
adequately considered, so the certainty value should be higher than the probability.
_________________________________________________________________________________________________________
Uncertainty in Rules and Facts
* Uncertainty in facts is expressed by uncertainty values between -1 and 1.
Fever(mary) with 0.7 and Spots(mary) with 0.6
* Each rule has a certainty value.
Fever(?x)
Spots(?x)
--------------------
Measles(?x) with 0.95
* When applying a rule, multiply the certainty value of the condition that is closest to 0 (i.e. the minimum
for positive values) with the certainty value of the rule.
Assert Measles(mary) with 0.6 * 0.95 = 0.57
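The rule-application step can be sketched as:

```python
# Certainty of a rule application: the certainty of the conjunction of
# conditions is the value closest to 0, multiplied by the rule's certainty.

def apply_rule(condition_certainties, rule_certainty):
    weakest = min(condition_certainties, key=abs)  # value closest to 0
    return weakest * rule_certainty

# Fever(mary) with 0.7, Spots(mary) with 0.6, rule certainty 0.95:
print(round(apply_rule([0.7, 0.6], 0.95), 2))  # 0.57
```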
_________________________________________________________________________________________________________
Examples
Facts:
* It is warm
* John has a height of 180cm
* The power is switched off
* The door is open
Rules:
* IF the temperature is more than 21°C
THEN switch the central heating off.
* IF a person is tall
THEN he can get the book from the shelf.
* IF the power is switched off
THEN the machine does not work.
* IF the door is open
THEN close it.
_________________________________________________________________________________________________________
Probability
(In finite domains) the probability can be measured as the limit of the ratio of the positive events to all
possible events, e.g. with a fair die and positive outcome 5 or 6, in the long run, there is a positive
outcome in 1 out of 3 cases, i.e. probability P(5 or 6) = 1/3.
Probability may involve random variables, e.g. the probability that it will be at least 23°C on New Year's
Eve in Birmingham can be expressed as
P(t >= 23°C, Birmingham, New_Years_Eve) = 0.001
Each random variable has a domain of possible values that it can take on.
_________________________________________________________________________________________________________
Conditional Probabilities
P(a|b) is the conditional probability that a is true given that it is known that b is true. E.g.
P(Measles|Spots) = 0.94 is the probability that a patient has measles given that the only symptom is spots.
If there are other symptoms like fever, consider P(Measles|Spots & Fever).
Conditional probabilities can be defined in terms of unconditional probabilities: P(a|b)=P(a & b)/P(b)
a and b are called independent if P(a|b)=P(a). In this case P(a & b)=P(a)P(b).
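A worked check of the definition with a fair die; the events a and b here are chosen purely for illustration:

```python
# Check P(a|b) = P(a & b) / P(b) for a fair die:
# a = "outcome is even", b = "outcome is at least 4".
from fractions import Fraction

outcomes = range(1, 7)
a = {n for n in outcomes if n % 2 == 0}   # {2, 4, 6}
b = {n for n in outcomes if n >= 4}       # {4, 5, 6}

P = lambda event: Fraction(len(event), 6)
p_a_given_b = P(a & b) / P(b)             # (2/6) / (3/6) = 2/3
print(p_a_given_b)  # 2/3

# a and b are independent iff P(a|b) = P(a); here P(a) = 1/2 != 2/3.
print(p_a_given_b == P(a))  # False
```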
_________________________________________________________________________________________________________
Bayes' Rule
P(a|b) = P(b|a) P(a) / P(b)
can be used to calculate conditional probabilities, e.g. with
a= myocardial infarction and
b= chest pain
Problem:
Where to get the numbers from, e.g. a doctor may not know the probability that a myocardial infarction causes
chest pain.
Solution:
Take subjective rather than objective numbers.
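Bayes' rule in code, with made-up subjective numbers purely for illustration (the three estimates below are assumptions, not medical data):

```python
# Bayes' rule: P(a|b) = P(b|a) P(a) / P(b), with
# a = myocardial infarction and b = chest pain.

def bayes(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

# Hypothetical subjective estimates, for illustration only:
p_pain_given_mi = 0.9   # P(chest pain | myocardial infarction)
p_mi = 0.01             # prior P(myocardial infarction)
p_pain = 0.1            # P(chest pain)
print(round(bayes(p_pain_given_mi, p_mi, p_pain), 2))  # 0.09
```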
_________________________________________________________________________________________________________
Bayesian Belief Networks
A Bayesian Belief Network is a directed acyclic graph that represents causality relationships between domain
variables. It consists of:
* A set of random variables.
* A set of directed arcs linking pairs of nodes; an arc from a node X to a node Y means that X has a direct
influence on Y.
* Each node has a conditional probability table that quantifies the effects of the parents.
[bayes.jpg]
_________________________________________________________________________________________________________
Fuzzy Logic
The value of a proposition like "John is tall" takes a truth value in the interval [0,1] (not just 0 or 1).
E.g.: At what temperature can a person be said to have a fever?
If a person has a temperature of 38°C then the truth value of Fever is T(Fever) = 0.5
[fuzzy.jpg]
_________________________________________________________________________________________________________
Fuzzy Logic (Cont'd)
The truth value of more complex formulae is calculated by:
T(p & q)  = min(T(p), T(q))
T(p | q)  = max(T(p), T(q))
T(~p)     = 1 - T(p)
T(p -> q) = min(1, 1 - T(p) + T(q))
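The truth-value combinations above as functions; the value for "tall" is a hypothetical example:

```python
# Fuzzy connectives as given in the table above.
t_and = lambda p, q: min(p, q)
t_or  = lambda p, q: max(p, q)
t_not = lambda p: 1 - p
t_imp = lambda p, q: min(1, 1 - p + q)

fever = 0.5   # T(Fever) at 38°C, as in the example
tall  = 0.7   # hypothetical T("John is tall")
print(t_and(fever, tall))         # 0.5
print(t_or(fever, t_not(fever)))  # 0.5 -- illustrates T(p | ~p) =/= T(True)
```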
Advantages:
* very successful in commercial applications (control systems)
Problems:
* not always intuitive, e.g. T(p | ~p) =/= T(True)
_________________________________________________________________________________________________________
The MYCIN Approach
Certainty factors CF lie in the range [-1, 1]; CF = 1 (or CF = -1) means that the conclusion is certainly
true (or false, respectively) if the conditions are completely satisfied. (Acquired from the expert.)
This means, in order to derive a fact A, all rules which generate A have to be investigated.
For efficiency reasons, values between -0.2 and 0.2 are ignored.
If a fact can be derived by independent rules, the certainty is calculated by combining the different
certainties:
If X and Y are certainties, then the combined certainty CC is:
          { X + Y - X*Y                      if X, Y > 0
CC(X,Y) = { X + Y + X*Y                      if X, Y < 0
          { (X + Y) / (1 - min(|X|, |Y|))    otherwise
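The combination function as code:

```python
# Combine certainties of two independent derivations of the same fact (MYCIN).

def combine(x, y):
    if x > 0 and y > 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))

print(round(combine(0.6, 0.4), 2))    # 0.76
print(round(combine(-0.6, -0.4), 2))  # -0.76
print(round(combine(0.6, -0.4), 3))   # 0.333
```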
_________________________________________________________________________________________________________
Certainty Factors to Reduce Search
Let's assume a single chain of reasoning:
[certainty0.jpg]
For backward chaining, if B were known for sure, the certainty value of E would be 0.4 * 0.7 * 0.7 = 0.196 < 0.2,
and hence be ignored, so backward chaining can stop here. If this is an isolated chain, the whole chain never
needs to be considered.
Different for
[certainty.jpg]
_________________________________________________________________________________________________________
Discussion of Certainty Factors
Advantages:
* Successful in application
* Stable against changes (e.g. in MYCIN four different values were sufficient)
Problems:
* Increased search (efficiency)
* What is the meaning (semantics)?
* Human-like reasoning (adequacy)?
_________________________________________________________________________________________________________
Further Reading
* F. van Harmelen: Meta-level Inference Systems. Pitman & Morgan Kaufmann Publishers, 1991.
* Thomas A. Cooper & Nancy Wogrin, Rule-based Programming with OPS5, Morgan Kaufmann, 1988.
* Hubert L. Dreyfus & Stuart E. Dreyfus, Mind over Machine, The Free Press, 1986.
* Richard A. Frost, Introduction to Knowledge Base Systems, Collins, 1986.
* Peter Jackson, Introduction to Expert Systems, Addison-Wesley, 1990, Second Edition.
* Elaine Rich & Kevin Knight, Artificial Intelligence, McGraw Hill, Second Edition, 1991.
* Many more in the library and on-line, start e.g.,
[1]http://www.aaai.org/AITopics/html/expert.html.
* Many tools available on-line, e.g. CLIPS,
[2]http://www.ghg.net/clips/CLIPS.html.
_________________________________________________________________________________________________________
© Manfred Kerber, 2004, Introduction to AI
24.4.2005
The URL of this page is http://www.cs.bham.ac.uk/~mmk/Teaching/AI/Teaching/AI/l2.html.
URL of module [3]http://www.cs.bham.ac.uk/~mmk/Teaching/AI/
References
1. http://www.aaai.org/AITopics/html/expert.html
2. http://www.ghg.net/clips/CLIPS.html
3. http://www.cs.bham.ac.uk/~mmk/Teaching/AI/