Schema Theorem - Why Genetic Algorithm is working? :: Jan Górkiewicz

Overview

Theorem explains why the GA is working.
Schema - a template that identifies a subset of strings with similarities at certain string position.
* the wildcard means 0 or 1 (Doesn’t matter)

S1 = (*01*00110) - this schema is represented by 4 strings (only changes filled):
V1 = (0  0     )
V2 = (0  1     )
V3 = (1  0     )
V4 = (1  1     )

Schema Order o(S) is defined as the number of fixed positions in the template:

S1 =  (*1***1**0) o(S1) = 3
S2 = (*1**01*111) o(S2) = 6

Schema Length δ(S) is the distance between first and last fixed position.

S1 =  (*1***1**0) δ(S1) = 9 - 2 = 7
S2 = (*1**01*111) δ(S2) = 10 - 2 = 8

Schema Fitness is the average fitness of all strings matching the schema.

f(V) - Fitness of individual
f(S1) - Fitness of schema1

S1 = (*01*00110)
V1 = (001000110) f(V1) = 0.2
V2 = (001100110) f(V2) = 0.5
V3 = (101000110) f(V3) = 0.8
V4 = (101100110) f(V4) = 0.5

f(schema) = sum(fitness variables from schema) / (num variables from schema) 
f(S1) = [f(V1) + f(V2) + f(V3) + f(V4)] / 4 = 4 / 4 = 1

Effect of GA on a schema

Effect of Selection on Schema

assume fitness proportional selection

f(S1) - Fitness of schema1
f(all) - Average fitness of all individuals
M(S1) - Excepected number of individuals from schema

M(S1) = f(S1) / f(all)

Schemas with fitness f(S1) > f(all) are likely to appear more in the next generation

Effect of Crossover on Schema

assume the single-point crossover

Schema S1 survives crossover operation if at least one from a parent and offspring is an instance of the schema S1

Crossover survival example:

S1 = (* 1 0  * *)

P1 = (1 1 0| 1 0) ∈ S1 
P2 = (1 0 1| 1 1) ∉ S1
V1 = (1 1 0  1 1) ∈ S1 => Schema SURVIVED
V2 = (1 0 1  1 0) ∉ S1

--------------------------------------

P1 = (1 1| 0 1 0) ∈ S1 
P2 = (1 0| 1 1 1) ∉ S1
V1 = (1 1  1 1 1) ∉ S1 
V2 = (1 0  0 1 0) ∉ S1
Schema DESTROYED

Calculate probability if crossover occurs within the defining length bits:

m - total length 
δ(S1) - Schema defining length of schema S1
Pd(S1) - Probability that crossover happen within defining length of schema S1

Pd(S1) = δ(S1) / (m - 1)


S1 = (*10**)
l = 5
δ(S1) = 5 - 2 = 3

Pd(S1) = 3 / (5 - 1) = 3/4

Schemas with low order (number of fixed positions) are more likely to survive.

Effect of Mutation on Schema

assume mutation is applied gene by gene

Pm - probability of particular gene mutation
o(S1) - Order of schema S1
Ps(S1) - Probability that schema S1 survives mutation

Ps(S1) = (1 - Pm)^o(S1)

S1 = (*10**)
o(S1) = 2
Pm = 0.01 - Choosen by me

Ps(S1) = (1 - 0.01)^2 = 0.9801

For schema S1 to survive all fixed bits must remain unchanged.

Conclusion

Definitions

Schema theorem - schema with above-average fitness, short defining length and lower order is more likely to survive

Building-block hypothesis - GA works well when short, low-order, highly-fit schemas recombine to form more highly fit, higher-order schemas.

Criticism

Schema theorem assumes that GA maintains an infinitely large population.

Sources

Z. Michalewicz: Genetic Algorithms + Data Structures =Evolution Programs, Third Edition, Springer-Verlag, Berlin, 1996.
https://en.wikipedia.org/wiki/Schema_(genetic_algorithms)

Representations and Crossovers - Traveling Salesman Problem

Schema Theorem - Why Genetic Algorithm is working?