Introduction to AI - Week 5 Genetic Algorithms Evolutionary Computation (also called evolutionary programming) copied from nature: * evolving successful organisms by reproduction of fit elements. * Fittest elements reproduce most. * random mutation _________________________________________________________________________________________________________ Principles of Darwinism * Set of individuals which can reproduce themselves * Offsprings are similar to their parents. * Each new generation has organisms that are similar to the fit members of the previous generation. Hence adaptation to slowly changing environment * random mutations _________________________________________________________________________________________________________ Why Genetic Algorithms * "Evolution is known to be a successful, robust method for adaptation within biological systems. * GAs can search spaces of hypotheses containing complex interacting parts, where the impact of each part on overall hypothesis fitness may be difficult to model. * Genetic algorithms are easily parallelized and can take advantage of the decreasing costs of powerful computer hardware." Taken from [Tom Mitchell, p.250] (from where also most of the rest of this lecture is adapted.) _________________________________________________________________________________________________________ Genetic Algorithms * measurement of the behaviour by a fitness function * can be seen as form of reinforcement learning (reward = offspring), but search in the space of individuals, no learning of relationship between rewards and actions/states of the environment. * form of hill-climbing, risk getting stuck at local maxima, hence spend only most (but not all) of the resources to the most promising individuals (random selection, mutation) In order to apply genetic algorithms, answer * What is the fitness function? * How is an individual represented? * How are the individuals selected? * How do individuals reproduce? _________________________________________________________________________________________________________ Components of Genetic Algorithms * Fitness function is problem dependent, maps individual to real number. * Representation of individuals as genes, i.e., string over a finite alphabet, e.g., alphabet {A,G,T,C} or {0,1}. * Selection for reproduction is randomised, probability of selection is proportional to the fitness. Usually, selection is done with replacement, so that very fit individuals are selected several times. * Reproduction is accomplished by cross-over and mutation. All selected individuals are randomly paired. For each pair, a cross-over point is randomly chosen. One offspring gets the first part from the first parent and the second part from the second, for the second offspring it is the other way around. Then each gene can be randomly mutated with small independent probability. _________________________________________________________________________________________________________ Pseudo Code for Genetic Algorithms GA(Fitness,Threshold,p,r,m) ;;; Fitness: a function that assign a score given a hypothesis ;;; Threshold: specifies the termination criterion ;;; p: the population size ;;; r: fraction of population replaced at each step ;;; m: mutation rate Randomly initialize population P; While max[h in P] fitness(h) < Threshold do Create new generation P[S] end_do Return h from P with highest fitness. _________________________________________________________________________________________________________ How to create the next generation Create new generation P[S] 1. Probabilistically select (1-r)p members of P to add to P[S]. Probability of an element to be selected in one step is proportional to its fitness, i.e. ratio of its fitness divided by the sum of all fitness values. 2. Probabilistically select [r* p]/[2] pairs of P for crossover. Probability of an element to be selected in one step is proportional to its fitness. For each pair generate two offsprings. Add all offsprings to P[S]. 3. Randomly flip each bit in P[S] in the representation with (small) probability m. 4. Update P as P[S] _________________________________________________________________________________________________________ How to create the next generation (Cont'd) 1. The survival from one generation to the next is proportional to the fitness of an element. (A fit gazelle is less likely to be eaten by a lion than a less fit one.) 2. Propagation of elements is proportional to fitness. Fit elements are more likely to succeed in reproduction than non-fit ones. 3. Totally new features can be generated by random mutations, which occur only with very small probability. They are important in order to avoid local maxima. _________________________________________________________________________________________________________ Example, Key Steps in Genetic Algorithms 11 \/ 000110010111 1 8 32% 11 111010101100 0 10 111010010111 ------> 111010010111 11 1 /\ 11 1 11 1 1 1 1 1 11 1 111 11 1 1 1 11 \/ 0 1 111010101100 1 6 24% 1 000110010111 1 11 000110101100 ------> 000110101100 1 /\ 111 111 1 11 \/ 001110101001 1 6 24% 1101 111010101100 1 111010101001 ------> 111110101001 1 11 /\ 10 ^ 111 1 1 111 1 1 1 11 1 111011011100 1 \/ 01 1 1 5 20% 001110101001 0 01 001110101100 ------> 001110101101 1 /\ 1 1 1 ^ INITIAL POPULATION FITNESS FUNCTION SELECTION CROSS-OVER MUTATION _________________________________________________________________________________________________________ Example Learn decision list for the restaurant waiting problem * Fitness function: number of examples an individual is consistent with. * representation of the attribute/value pairs by 5 bits: 00000: Alternate(x) 00001: ~ Alternate(x) . . . 10111: WaitEstimate(x,0-10) 11000: WaitEstimate(x,10-30) 11001: WaitEstimate(x,30-60) 11010: WaitEstimate(x,60+) One further bit for outcome. Representation of decision lists of length k with up to t tests, we need t(5k+1) bits. Standard selection and reproduction approaches. _________________________________________________________________________________________________________ Genetic Operators Single-point crossover 11101001000 11101010101 M. S. MMa 8MM2 MMa SMMMMM MM8 2MMZ0MM@ MMM MM M0 MMM MM M, MMM MMi MMM ,MM 0MM MM MMM :MM, 8MM XMM SMM ,MM SMM XMM rMM XMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMr MMi MMM MM MMM MM aMM iMM 0MM MM ZMM rMM ;MM rMM rMM :MM ,MM SMM .MM 2M 8MM MM M ZMM MMM2MMMX 0MM XMMMMM MMM @MMi .W 8 00001010101 00001001000 The procedure takes two bitstrings of length n, randomly selects a number i between 1 and n and breaks up the strings at position i and puts together the first part of the first string with the second part of the second as well as the first part of the second string with the second part of the first string. _________________________________________________________________________________________________________ Genetic Operators (Cont'd) Two-point crossover 11101001000 11001011000 M. S. MMa 8MM2 MMa SMMMMM MM8 2MMZ0MM@ MMM MM M0 MMM MM M, MMM MMi MMM ,MM 0MM MM MMM :MM, 8MM XMM SMM ,MM SMM XMM rMM XMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMr MMi MMM MM MMM MM aMM iMM 0MM MM ZMM rMM ;MM rMM rMM :MM ,MM SMM .MM 2M 8MM MM M ZMM MMM2MMMX 0MM XMMMMM MMM @MMi .W 8 00001010101 00101000101 The procedure takes two bitstrings of length n, randomly selects two numbers i and j between 1 and n and breaks up the strings at positions at i and j and puts together the first and last part of the first string with the middle part of the second as well as the first and last part of the second string with the middle part of the first string. _________________________________________________________________________________________________________ Genetic Operators (Cont'd) Uniform Crossover 11101001000 10001000100 M. S. MMa 8MM2 MMa SMMMMM MM8 2MMZ0MM@ MMM MM M0 MMM MM M, MMM MMi MMM ,MM 0MM MM MMM :MM, 8MM XMM SMM ,MM SMM XMM rMM XMM MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMr MMi MMM MM MMM MM aMM iMM 0MM MM ZMM rMM ;MM rMM rMM :MM ,MM SMM .MM 2M 8MM MM M ZMM MMM2MMMX 0MM XMMMMM MMM @MMi .W 8 00001010101 01101011001 The procedure takes two bitstrings of length n, randomly decides for each position of whether the bit of the first or the second should go in the first or in the second child. Point Mutation 11101001000 MMX MMX ,M@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@Zi.WMMMMMMM7, 022222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222; .2MMMMMMW' MM0, MM0, 11101011000 _________________________________________________________________________________________________________ Further Aspects * How should the fitness function look like? E.g. number of e-mails classified correctly compared to all: [correct]/[all]. Or a more complicated measure such as: [correctly-classified-spam]/[spam] - 100*[wrongly-classified-spam]/[genuine]. In general, difficult question how to choose a fitness function. * Elitism: Protect fittest elements from death in order not to lose the fittest elements in the population. _________________________________________________________________________________________________________ Genetic Programming * Extension of genetic algorithms by data structures and general program structures (i.e., not just bit strings). * Automated generation of programs by similar techniques (i.e., randomly generated initial population and evolution by cross-over and mutation). * In principle it is possible to generate automatically arbitrary programs. _________________________________________________________________________________________________________ Tree Representation of Programs Example: Representation of sin(x)+sqrt(x^2+y) in form of a tree. General programs can be represented in a similar form, assumed the corresponding primitives are available, e.g., conditionals, loops, boolean expressions. 100000001 00000 00000 0001 000 001 100 001 100 00 00 100 + 10 100 101 00 00 00 00 001 000 000 000 00001 00000 10000000001 0100 000 00 100 00 000 00 100 00 100 00 100 00 00 00 100 00 00 00 0 1000000000 0000000000000 000001 000001 1001 1000 1001 000 000 000 001 001 00 000 00 001 001 00 00 00 00 00 10 00 100 00 10 sin 00 00 sqrt 00 001 00 0010 00 00 100 00 100 00 100 000 000 000 000 0001 000 10001 10000 1000000000000 0000000000000 0001 001 00 000 00 000 00 001 00 001 00 00 001 001 00 00 001 00 001 00 111111 00 00 1000000000000 000000001 0000 0000 10000 1 1100001 000 000 000 0001 001 00 00 001 00 001 00 00 10 00 001 00 00 x 00 00 00 10 00 00 + 00 00 001 00 00 001 100 00 001 000 000 00 000 0001 1000 000 000 10000000000001 00000 1000000 1111 00 0000000000 000 00 000 00 000 00 000 00 000 00 001 00 000 00 001 00 001 00 00 1001 100000000001 0000000000000 0000 0000 000 0000 000 0001 000 100 000 001 000 00 100 00 00 01 00 01 10 00 00 00 00 00 00 ^ 00 10 y 00 00 101 00 101 00 00 000 00 00 100 000 000 000 1000 10001 0000 00000111100000 0000000000001 0010000001 00 101 00 100 00 00 00 100 00 00 001 100 001 000 001 000 000 000 001 000 100000000001 00000000000 100001 00000 10000 00000 000 000 000 1001 00 00 00 000 00 00 00 000 00 00 101 00 00 00 00 00 00 x 00 00 2 00 00 00 100 00 00 00 00 001 100 001 00 000 0000 0001 000 0001 00001 10000 00001 100000 1000000001 100000011 _________________________________________________________________________________________________________ Crossover in Genetic Programming 10000 000001 0 00 10 0 0 + 0 0 + 0 01 0 10 0 00 001 1001100 100 01 10 0 0 100 0 0 0 10 00 0 0 0 0 00 11 1 01001 000000 0000000 00 0011001 10 00 0 0 0 0 00 0 01 0 01 10 SIN 0 10 ^ 10 10 10 SIN 0 0 SQRT 10 0 01 0 0 01 0 0 01 0 001 001 000000 10 000000 100001 0 1 0 0 1 00 11 0 0 0 1001 01 10 0 0 0 001 0 01 0 0 0010 0001 0 10 00000 0101 00 00 00 101 1001000 0 00 0 00 101 0 0 0 0 00 0 0 0 X 0 01 0 0 X 0 0 2 0 0 + 01 0 0 0 0 + 0 00 01 01 00 10 10 0 01 00 0 00 00000 10000 00000001 0 1000 01000010 01 01 0 1001 0 01 01 0 1000 0 0 0 0 1 1 1 0 00000 1000000 0 1000000 001 100 0 00 00 00 10 0 00 0 01 0 0 0 X 0 0 Y 0 11 0 ^ 0 0 Y 0 0 01 0 01 0 0 00 10 10 00 0 001 00 0001000 0 1000000 000001 0 111 0 0 0 0 01 0 0 01 0 0 0 0 0 0 00000 00001 01 0 0 10 10 10 01 0 0 0 1 0 X 0 0 2 0 01 0 0 0 0 0 0 0 0 01 01 10 101 10 100 10 00 01 00 00 10001 10001 0 01 10 01 10 01 0 0 0 01 00 10 01 00000 1001 0 100 01 0 1000 001 00 0000 0011 001 00 10 10 0 01 100 10 0 00 00 10 + 11 00 01 + 01 00 01 00 10 1 0 10 00 0 0000 000001 00 10 1 0 1 11 10 00 10 0 10 00 10 0 0 1 10 01 10 0 00000 000000 10000 000000 10 00 10 01 00 01 101 00 00 10 10 01 0 0 00 00 0 0 0 0 0 0 0 SIN 0 01 SQRT 00 0 SIN 0 0 ^ 10 00 0 10 0 00 00 0 101 00 01 0 110001 1110 000001 00000 01 0 11 0 01 01 10 0 0 01 10 0 00 01 10 0 1 0 11 1 0 01 0 1000000 100000 0000001 1000000 100000 0 0 00 0 0 0 00 00 10 0 01 10 000 0 0 X 0 10 + 10 0 X 0 0 2 0 00 ^ 00 001 0 10 0 0 01 0 10 10 0 00 01 00 0 1001000 0011000 0001000 1000000 001 000 0 0 11 10 1100 1 0 0 11 01 01 0 0 01 01 0 0 000 000001 1111 0111 01 001 00 01 00 001 100 001 001 01 0 0 11 01 01 0 01 0 010 0 X 0 10 Y 01 0 X 0 0 2 0 000 00 01 00 00 10 00 0 00 00 0000000 1001 000001 000001 00 0 01 0 0 0 0 0000001 0001000 01 00 0 00 0 X 0 0 Y 0 0 00 01 00 0010101 100100 _________________________________________________________________________________________________________ Example: Stacking Blocks Z M M7S22222S7B B M @ SX2SrMrS2Xi M M M M M N M M M M M X S M ,. , M M ::::::: M M @ M M M E M @ @ MiSSXr7XSiM @ @ M M M S M M M W @ MiSSSSSSSiM MrSSSSSSrM 872S22222rM MrS22222S7B MrS22222XBi iBX22222SrM @ @ @ @ @ @ @ @ @ i; ;i @ M M M M M M M M M r; ;r M M R M M V M M U M M L M M A r; ;r I M @ W @ W W W W @ W ii ii @ WSSSSSSSSSW WSSSSSSSSW WSSSSSSSSSW WSSSSSSSSSW WSSSSSSSS77 77SSSSSSSSW MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM Stack blocks which are on a table to spell in a tower the word "universal." Use a set of 166 such initial configurations to evaluate the fitness of a candidate program. _________________________________________________________________________________________________________ Program primitives * CS, current stack, refers to the top block on the stack, or F if there is no current stack. * TB, top correct block, under which all blocks are correct. NN next necessary block to put on the stack. MS(x), move x to the current stack. MT(x), move x to the table. EQ(x,y), x is equal to y. NOT(x), swaps T and F. DU(x,y): Do x until y. _________________________________________________________________________________________________________ Generated Program * The system was initialised with 300 random programs. * After 10 generations the system came up with a program that solves all 166 reference problems. * Generated program: EQ(DU(MT(CS),NOT(CS)), DU(MS(NN),NOT(NN))) The program reads as follows: First move repeatedly the current top of the stack onto the table, until the stack becomes empty. The second "Do Until" statement repeatedly moves the next necessary block form the table onto the stack. EQ is only used to form a sequence of the programs. _________________________________________________________________________________________________________ Co-Evolution * The examples assumed that a fixed environment is provided, with respect to which an optimal member of the population is generated. * There is typically not one single optimal species. In a complicated system different species have different niches. E.g., the lion and the gazelle are very well adapted and evolved. * In co-evolution not only one species is evolved, but different ones simultaneously. * In the context of e-mail spam filtering, we can imagine that a spam filter evolves and simultaneously evolves a system to trick the spam filter. _________________________________________________________________________________________________________ Further Issues * Lamarck had the idea that the learning an individual does during its lifetime shouldn't be lost, but should be inherited to its offsprings. Scientific evidence speaks for the fact that this is not the case in biological evolution. Still it can be useful in artificial evolution. * Genotype vs Phenotype: The mapping from the genotype (the bitstrings/trees on which the evolutionary operations act) to the phenotype (that is the actual individual) is assumed to be trivial in the examples above, but can be complicated in general. _________________________________________________________________________________________________________ Summary Important decisions for a genetic approach: * Size of the population * Reproduction rate * mutation rate Wide range of application, for certain problems very good results (poor on others). Can be computationally very expensive, but is also be rather robust, e.g. the fitness function is not necessarily invariant over time. _________________________________________________________________________________________________________ Further Reading books-shelf1.jpg] * Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997, Chapter 9. * John Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975. * John Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992. * S. Russell, P. Norvig. Artificial Intelligence - A Modern Approach. 2nd Edition, Pearson Education, 2003. pp. 116-120. _________________________________________________________________________________________________________ _________________________________________________________________________________________________________ © Manfred Kerber, 2004, Introduction to AI 22.12.2004 The URL of this page is http://www.cs.bham.ac.uk/~mmk/Teaching/AI/Teaching/AI/l5.html. URL of module [1]http://www.cs.bham.ac.uk/~mmk/Teaching/AI/ References 1. http://www.cs.bham.ac.uk/~mmk/Teaching/AI/