Schema Theorem - Why Genetic Algorithm is working?
Overview
- Theorem explains why the GA is working.
- Schema - a template that identifies a subset of strings with similarities at certain string position.
*
the wildcard means 0 or 1 (Doesn’t matter)
S1 = (*01*00110) - this schema is represented by 4 strings (only changes filled):
V1 = (0 0 )
V2 = (0 1 )
V3 = (1 0 )
V4 = (1 1 )
- Schema Order
o(S)
is defined as the number of fixed positions in the template:
S1 = (*1***1**0) o(S1) = 3
S2 = (*1**01*111) o(S2) = 6
- Schema Length
δ(S)
is the distance between first and last fixed position.
S1 = (*1***1**0) δ(S1) = 9 - 2 = 7
S2 = (*1**01*111) δ(S2) = 10 - 2 = 8
- Schema Fitness is the average fitness of all strings matching the schema.
f(V) - Fitness of individual
f(S1) - Fitness of schema1
S1 = (*01*00110)
V1 = (001000110) f(V1) = 0.2
V2 = (001100110) f(V2) = 0.5
V3 = (101000110) f(V3) = 0.8
V4 = (101100110) f(V4) = 0.5
f(schema) = sum(fitness variables from schema) / (num variables from schema)
f(S1) = [f(V1) + f(V2) + f(V3) + f(V4)] / 4 = 4 / 4 = 1
Effect of GA on a schema
Effect of Selection on Schema
- assume fitness proportional selection
f(S1) - Fitness of schema1
f(all) - Average fitness of all individuals
M(S1) - Excepected number of individuals from schema
M(S1) = f(S1) / f(all)
Schemas with fitness f(S1) > f(all)
are likely to appear more in the next generation
Effect of Crossover on Schema
- assume the single-point crossover
Schema S1
survives crossover operation if at least one from a parent and offspring is an instance of the schema S1
Crossover survival example:
S1 = (* 1 0 * *)
P1 = (1 1 0| 1 0) ∈ S1
P2 = (1 0 1| 1 1) ∉ S1
V1 = (1 1 0 1 1) ∈ S1 => Schema SURVIVED
V2 = (1 0 1 1 0) ∉ S1
--------------------------------------
P1 = (1 1| 0 1 0) ∈ S1
P2 = (1 0| 1 1 1) ∉ S1
V1 = (1 1 1 1 1) ∉ S1
V2 = (1 0 0 1 0) ∉ S1
Schema DESTROYED
Calculate probability if crossover occurs within the defining length bits:
m - total length
δ(S1) - Schema defining length of schema S1
Pd(S1) - Probability that crossover happen within defining length of schema S1
Pd(S1) = δ(S1) / (m - 1)
S1 = (*10**)
l = 5
δ(S1) = 5 - 2 = 3
Pd(S1) = 3 / (5 - 1) = 3/4
Schemas with low order (number of fixed positions) are more likely to survive.
Effect of Mutation on Schema
- assume mutation is applied gene by gene
Pm - probability of particular gene mutation
o(S1) - Order of schema S1
Ps(S1) - Probability that schema S1 survives mutation
Ps(S1) = (1 - Pm)^o(S1)
S1 = (*10**)
o(S1) = 2
Pm = 0.01 - Choosen by me
Ps(S1) = (1 - 0.01)^2 = 0.9801
For schema S1
to survive all fixed bits must remain unchanged.
Conclusion
Definitions
Schema theorem - schema with above-average fitness, short defining length and lower order is more likely to survive
Building-block hypothesis - GA works well when short, low-order, highly-fit schemas recombine to form more highly fit, higher-order schemas.
Criticism
Schema theorem assumes that GA maintains an infinitely large population.
Sources
- Z. Michalewicz: Genetic Algorithms + Data Structures =Evolution Programs, Third Edition, Springer-Verlag, Berlin, 1996.
- https://en.wikipedia.org/wiki/Schema_(genetic_algorithms)
Maybe you want to share? :)