The conclusion of a manuscript that I recently submitted for review
The biosphere is replete with organisms that are exquisitely well adapted to the environmental niches they inhabit. Natural sexual evolution has been crucial to the generation of what are arguably the most highly adapted of these organisms — cheetahs, owls, humans etc. A deeply intriguing idea is that we can build adaptation algorithms which, at an abstract level, mimic the behavior of natural sexual evolution, and in doing so, “harness” something of the adaptive power of this incredibly effective process. But what is the abstract level at which natural sexual evolution should be mimicked? In other words given everything we know about natural sexual evolutionary systems, how do we distinguish between aspects of these systems which are essential to their adaptive power, and those which can be viewed as “mere biological detail” and need not be simulated, especially when taking a first swipe at building an artificial evolutionary systems which harnesses the power of natural sexual evolution? For instance is it necessary for such “firstorder” artificial evolutionary systems to simulate hydrogen bonding between the bases of DNA strands? What about diploidy, or the way genomes of organisms are comprised of multiple chromosomes? And how crucial is the fact that crossover takes place between homologous chromosomes during meiosis?
Engineers of yesteryear faced a similar quandary when trying to ascertain just what it is about birds that gives them their capacity for flight? Stories of inventors in feathered suits jumping to their deaths off cliffs and buildings bear testament to the fact that that our initial answers to such questions are often incorrect. It was only after the realization that birds were “using” Bernoulli’s principle to stay aloft that researchers began to make any real progress towards building machines that successfully harnessed the principle underlying birds’ capacity for soaring. One can infer the following general rule from this example: without a good understanding of exactly why a natural system exhibits a certain useful phenomenon, efforts to build artificial systems that exhibit the same phenomenon by mimicking the natural system will be misguided and are unlikely to be successful.
The field of Population Genetics stems from the efforts of its founders — Fisher, Wright and Haldane — to reconcile Darwin’s theory of adaptation by natural selection with Mendel’s theory of genetics (odd as it may now seem, these two theories were once thought to contradict each other. See Okasha, 2006). The literature of this field seems like the most reasonable place to look for answers to questions about how and why adaptation occurs in natural sexually reproducing populations. Unfortunately Population Genetics does not hold ready answers to these questions. Indeed the differing theories of Fisher and Wright regarding exactly this issue has been the subject of a longstanding and ongoing debate (Wade and Goodnight, 1998; Brodie III, 2000). Significant empirical evidence has been gathered by both sides in support of their respective positions yet a definitive answer has not emerged. The absence of a definitive answer makes it difficult to make principled decisions about the level of detail which must be present in an artificial system which seeks, through mimicry, to harness something of the adaptive power of natural sexual evolution.
Let us return to the analogy with the field of aviation that we introduced above. For the sake of argument let us suppose that in the age before the discovery of Bernoulli’s principle someone had, for some reason or the other, succeeded in inventing a simple winged machine which a) mimicked birds at some relatively abstract level, and b) was capable of soaring long distances (even if slowly, or inefficiently). Such a machine would immediately be incredibly interesting because when compared to the complex body of a bird, such a machine would be much more amenable to analysis. The principle underlying this machine’s ability to soar, once derived, could then be used to build “better” soaring machines. This underlying principle would also play a very important part in the development of an accurate theory of why birds can soar (If the inventor of the winged machine gives an incorrect reason for why his machine can soar, that would probably slow down the progression described above, but it would take away nothing from the importance of the winged machine itself.). The implications of this vignette for the importance of the SGA to the fields of Evolutionary Computation and Evolutionary Biology should be evident. The SGA should thus be viewed as a ‘lucky break’, one that can and should be exploited for its potential to advance theories and applications of the adaptive capacity of sexual evolution.
Let us spell out the importance of studying the SGA’s capacity for adaptation. As models of sexual evolutionary systems go, the SGA is arguably the simplest one which has regularly been observed to adapt high-quality solutions despite the almost certain presence of non-trivial epistasis between genomic loci, in other words, despite its application to problems whose representations are in all likelihood riddled with local optima. Because of its effectiveness in spite of its simplicity, the SGA is a model of sexual evolution that is highly likely to a) yield an explanation for the incredible adaptive capacity of sexual evolution and b) precipitate the identification of classes of non-trivial epistasis which do not pose much of a problem for sexual evolution. The SGA is of course not the last word in evolution inspired adaptive systems. Efforts to extend this algorithm in ways that “increase its adaptive power” should and have been made. However if, while attempting to extend the SGA, one works within a flawed paradigm, one is unlikely to capitalize on, and may even compromise, whatever “power” the SGA derives, by virtue of imitation, from natural sexual evolution. A non-dogmatic study of how SGAs perform adaptation has the potential to yield a theory which accurately explains the reasons behind the SGA’s frequent success. Such a theory will probably usher in a new paradigm within which fruitful research into the construction of more “powerful” evolutionary algorithms can proceed. If such a theory differs significantly from those of Fisher and Wright it is likely to have deep implications for the field of Population Genetics and the larger field of Evolutionary Biology (within which several basic questions — why sex? Why punctuated equilibrium? Why diploidy and polyploidy? Why speciation? What is the unit of selection? — have yet to receive satisfactory answers). Finally, a study that reveals the SGA’s capacity for adaptation will also undoubtedly reveal classes of problems that SGAs can efficiently solve. These classes of problems may prove useful to machine learning researchers in their efforts to find semiprincipled reductions of difficult learning problems to problems for which robust and efficient solvers exist.
This paper makes two concrete contributions. Firstly, in sections 2–10 we have derived results which we believe are relevant to the riddle of the SGA’s capacity for adaptation. Secondly in section 11 we have presented results which show that an oft perceived shortcoming of the SGA is misplaced. The following two paragraphs expand upon these contributions.
As we discussed in section 1.2, a promising way to understand the effect of selection and variation on the composition of the evolving population of an SGA is by understanding the multi-generational effect of these operations on the search distribution of the SGA. One way to study an evolving high-dimensional distribution is to study its evolving multivariate marginals. In this paper we derive conditions under which a multivariate marginal of the search distribution of an SGA, with an infinite population of long genomes, can be approximated over multiple generations. In other words we derive conditions under which the frequency dynamics of some family of schemata under the action of an SGA, with an infinite population of long genomes, can be approximated over multiple generations. The conditions we derived in this paper are much weaker than those derived by Wright et al. (2003). This makes our result more useful. The conclusions reached in section 9 are reached by making small leaps of intuition. In section 10 we experimentally validated these conclusions. Our validation, though indirect, is based on assumptions and modeling decisions which are, in our opinion, uncontroversial.
Besides being incorrectly used to support an outlandish hypothesis about what the SGA can do (hierarchical building block assembly), Holland’s Schema Theorem has also heavily shaped opinion about what the SGA cannot do. Following the experiments by Mitchell et al. (1992), and Forrest and Mitchell (1993), the perceived abilities of the SGA stand compromised, yet the perceived limitations of the SGA have remained unchanged. The SGA is currently thought to be incapable of increasing the frequency of a low-order schema with higher than average fitness when the defining length of that schema is large (Goldberg et al., 1989; Goldberg, 2002). In section 11 we argued that this perception is just plain wrong — an SGA can increase the frequency of a low-order schema with higher than average fitness even when the defining length of that schema is large.
In closing we briefly mention that we have recently obtained a new, simple, and (in our opinion) satisfying theory which explains the SGA’s remarkable capacity for adaptation. We have also identified a class of hard statistical problems such that a) the problems in this class can be solved efficiently and robustly by an SGA, and b) this class is likely to be a useful target of machine learning reductions. All of this will soon appear in our forthcoming dissertation. Our theory relies crucially on the SGA’s ability to increase the frequency of schemata of low order and above average fitness, even when the defining bits of those schemata are widely dispersed.