FOGA 2013 Slides

Optimization by Genetic Algorithms with uniform crossover is one of the deep mysteries of Evolutionary Computation. At first glance, the efficacy of uniform crossover, an extremely disruptive form of variation, makes no sense. Yet, genetic algorithms with uniform crossover often outperform genetic algorithms with variation operators that induce tight linkage between genetic loci.

At the Foundations of Genetic Algorithms Conference XII (January 16-20, Adelaide), I proposed an explanation for the efficacy of uniform crossover. The hypothesis presented posits that genetic algorithms with uniform crossover implement an intuitive, general purpose, global optimization heuristic called Hyperclimbing extraordinarily efficiently. The final version of the paper is available here. A generalization of the Hyperclimbing Hypothesis that explains optimization by genetic algorithms with less disruptive forms of crossover appears in my dissertation.

The presentation contains several animations, such as the one below, that serve as proof of concept for the Hyperclimbing Hypothesis. To view the animations, click on the “Watch On YouTube” links in the slides above or just download the presentation (93 Mb) and view it in Adobe Acrobat Reader. Note: the pdf reader in Chrome will not display the animations.

Download SpeedyGApy to run the experiments yourself (easy, and highly recommended).

FOGA 2013 Slides

SpeedyGA Ported to Python

Today, I ported SpeedyGA from Matlab to Python. SpeedyGApy is a configurable, single-file, barebones, vectorized, numpy + matplotlib based genetic algorithm that rips.

SpeedyGApy Github Repo

The following video shows the kind of animation that gets displayed when speedyGA is run on a staircase function. Animations like this one serve as proof-of-concept for the Hyperclimbing Hypothesis, a new explanation for optimization in genetic algorithms with uniform crossover. Once you’ve downloaded speedyGApy, you can run this, and other experiments for yourself with the following command:

$ python --fitnessFunction staircase

SpeedyGA Ported to Python

Generative Fixation

One of the mainstays of human engineering is the idea of hierarchical assembly—the assembly of useful low level modules (a.k.a. building blocks) into useful modules at higher levels. Artifacts ranging from nuclear submarines to enterprise software are all constructed in this fashion. The building block hypothesis holds that genetic algorithms also construct solutions using hierarchical assembly, and that the basic building blocks used by these algorithms are short chromosomal snippets that confer above average fitness. Tantalizing as this hypothesis may be, it is based on strong assumptions about the distribution of fitness over a search space. I’ve criticized these assumptions in my dissertation and have proposed a different hypothesis based on weaker assumptions. This alternate hypothesis is grounded in a different metaphor—generative fixation.

Though not as ubiquitously recognizable as hierarchical assembly, generative fixation underlies progress in many areas. Like in the video industry for instance, which only really took off once the VHS/Betamax war had run its course. The “fixation” of VHS within this industry in essence generated new opportunities for advancement. For example, it permitted the development of the video rental business, which brought films from the back rooms of studio houses into living rooms everywhere. Had the tussle between VHS and Betamax continued, many of these opportunities might not have presented themselves. The economics of supporting two formats simultaneously would have been economically crippling for the fledgling industry.

VHS v. Betamax was a case where two contestants competed for dominance along a single dimension—the format accepted by video players. Consider a scenario consisting of multiple dimensions and multiple contestants in each dimension. Suppose, moreover, that no contestant in any dimension is outright superior. That is, the superioriority/non-superiority of the contestants in each dimension is dependent on the state of the contests in the other dimensions. This scenario describes the most commonly arising situation in natural evolution. Here, the “dimensions” are genetic loci, and the “contestants” are alleles. (Statistically, the scenario just described is more likely than one where one or more locus has an allele that is superior regardless of the frequencies of the alleles at other loci.)

The danger in such cases is that progress will stall because no contestant will come to dominate its dimension. The generative fixation hypothesis holds that in a system undergoing recombinative evolutionary dynamics, progress will continue. Although no locus has an allele that is outright superior, a small number of  alleles belonging to different dimensions that play nice together (i.e. confer above average fitness as a group) will come to dominate their respective dimensions. In doing so, they will set the stage for the next multi-dimensional contest over the remaining dimensions. And so on.

I’ve demonstrated the genetic algorithm’s ability to scaleably identify and fix synergistic sets of unlinked non-competing alleles in a recent manuscript and in my dissertation.

Generative Fixation

More Debate, Please!

“… there are many issues in computing that inspire differing
opinions. We would be better off highlighting the differences
rather than pretending they do not exist”
–Moshe Y. Vardi

In an article entitled “More Debate, Please!”, in the January, 2010 issue of Communications of the ACM, Moshe Y. Vardi, editor-in-chief of Communications, writes:

‘Vigorous debate, I believe, exposes all sides of an issue—their strengths and weaknesses. It helps us reach more knowledgeable conclusions. To quote Benjamin Franklin: “When Truth and Error have fair play, the former is always an overmatch for the latter.”’[1]

Vardi goes on to say that as he solicited ideas for the 2008 relaunch of Communications, he was frequently told to keep controversial topics front and center. “Let blood spill over the pages of Communications,” a member of a focus group colorfully urged [1].

When attempting to publish my doctoral research in evolutionary computation journals, I found the sentiments expressed by Vardi to be in short supply. The reviewers seemed much more invested in not rocking the boat than in fostering a climate in which prevailing assumptions can be challenged, and alternate ideas expressed transparently. They seemed, in short, to be inured to the poverty of the field’s foundations, and, for the most part, had little tolerance for someone with a bone to pick with the status quo. “Fall in line, or have your work be rejected,” was the overarching message.

One way this unfortunate state of affairs may be addressed is through the institution of a forum like the Point/Counterpoint section introduced to Communications by Vardi in 2008—a forum where the various controversies that mark our field are periodically featured, and the different sides of each controversy given, as Benjamin Franklin put it, “fair play”. There are several contentious topics in EC. Tapped correctly, many of  these topics can be powerful vehicles for learning—not just about the workings of evolutionary algorithms, but, also, about the workings of a vibrant intellectual community. Right now, instead of vigorous, open, ongoing debates in the EC literature, uneasy truces prevail. The community, by and large, steps around the the really big points of contention. Researchers talk past each other to niche audiences. And, if my experience is anything to go by, new lines of criticism, and new modes of analysis are hastily dismissed.

In the absence of a written record of ongoing controversies, new entrants to the field will not have access to the various positions involved. Pressed for time, and confronting the reality of “publish or perish”, most will fall back on the opinions and practices of their advisors. It doesn’t take much to see that in an environment like this, opportunities for learning and advancement will frequently be missed.

A forum for open, ongoing, collegial debate would  bring awareness, and transparency to the controversies in our field. It would also (one hopes) inculcate a more welcoming attitude toward alternate approaches, conclusions, and critiques.

Two topics for debate:

EC Theory and First Hitting Time:  Is it problematic that so much contemporary theoretical  work in EC focuses on “first hitting time”, i.e., the number of fitness evaluations required to find a global optimum? Do we look at first hitting time only because there currently isn’t a well developed, and generally accepted theoretical framework for examining adaptation (the generation of fitter points over time)? If so, isn’t the study of first hitting time a lot  like the proverbial search for one’s house keys under the light of a street lamp just because it happens to be dark in one’s house?

The Building Block Hypothesis: Can the building block hypothesis be reconciled with the widely reported utility of uniform crossover? If yes, how? If no, can we—more to the point, should we—be comfortable with this knowledge given the considerable influence of the building block hypothesis on contemporary evolutionary computation research?

What other topics have been under-addressed in the evolutionary computation literature? Leave a comment with your opinion, or a link to your own blog post.

[1] Moshe Y. Vardi. More debate please!, In Communications of the ACM 53(1):5, 2010

More Debate, Please!

Hyperclimbing and Decimation

In recent years, probabilistic inference algorithms such as survey propagation and belief propagation have been shown to be remarkably effective at tackling large, random instances of SAT, and other combinatorial optimization problems that lie beyond the reach of previous approaches. These inference algorithms belong to a class of techniques called decimation strategies. Decimation strategies monotonically reduce the size of a problem instance by iteratively fixing partial solutions (partial variable assignments in the case of SAT).

The generative fixation hypothesis essentially states that genetic algorithms work by efficiently implementing a decimation strategy called hyperclimbing.

Hyperclimbing and Decimation