Part 3: Variation Operators, Move Rules, and Another Dialogue
This is the big-money question, the dream: to understand the system well enough to make a model of its long-term evolution. But it is not an easy task.
This formidable challenge has led those who yearn for a precise model of evolutionary change to simplify the system and make assumptions, taming the problem until it is workable. These efforts have been successful in a number of ways, but even their weaknesses proved useful by shining a sharp light on areas in which more empirical research is needed.
From the perspective of the adaptive landscape concept, it is somewhat easy to put these models into two rough categories: bottom-up and top-down. In both, the questions we are asking are how can the system change, and how does the system change. Along the way, we'll have to make assumptions that prompt more questions about evolution and biological systems. What is the nature of variation in populations? Is evolution limited by the supply of mutations? What matters more in understanding evolutionary change: selection on phenotypic variation, or the sources of that variation (the survival of the fittest, or the arrival of the fittest)?
But first, a dialogue:
Tortoise: Achilles! It seems like it has been years!
Achilles: I think it has, my dear Tortoise. I haven't seen you since that trip up Mt. Fuji a while back.
Tortoise: Oh yes, how could I have forgotten. So what have you been up to?
Achilles: Well, I was inspired by all of our talk about evolution, so I decided to start studying it myself. Come and take a look (waving Tortoise over towards a flat-file in the corner of the room) ... at these!
Achilles pulled out one of the drawers, which was topped with glass. Underneath the glass were dozens of pictures of field mice, each with a little number scrawled in the bottom corner.
Tortoise: Oh dear me, what are these?
Achilles: It's my data. You see, I've been studying the evolution of the huge mouse population in the fields around my house. I've collected data from hundreds of mice.
Tortoise: That seems like quite a bit of work.
Achilles: You have no idea. For each mouse I have to measure their weight and their tail length.
Tortoise: Just those two things?
Achilles: Do you see how many I have to measure?!? Just two things...
Tortoise: Okay, so are they evolving?
Achilles: Yes. At least, I think so. Here, let me show you the adaptive landscape I made. A fellow named Simpson first put me onto the idea that I could make a landscape like this, with "phenotypes" on the axes instead of allele frequencies or gene combinations.
Achilles leads Tortoise over to the table in the middle of the room. On it is a large square matte board with two edges that share a corner labeled "body weight" and "tail length." On top of the board are hundreds of tiny metal rods - like pins or needles but of all different heights.
Achilles: So after I get the data on a mouse's physical characteristics, I ask them about their fitness, about how many kids they've had.
Tortoise: You just ask them?
Achilles: Mice are known for being very honest. So then I put a rod on the board, in a position based on the weight and tail length - see? And the rod's height in inches is how many kids the mouse had - my measure of its "fitness."
Achilles, smiling proudly, grabs the table cloth from the counter and throws it over the hundreds of rods on the board. The heavy cloth falls over the rods, making a kind of cloth topography.
Achilles: And there it is - my very own adaptive landscape! So see up here, with small tail and large body weight, that's the fittest place. But I checked and the mean individual in the population is over here, with a big tail and a small body weight, so - they are going to evolve along this ridge here, and in a few years there will be bigger mice with smaller tails!
Tortoise stands for a moment, staring down at Achilles' cloth landscape.
Tortoise: I think you're making some bold assumptions here my friend. How do you even know that these characteristics are heritable - that they get passed down to the kids of these mice? Even if you do understand the heritability, can you really say that the reason those mice (Tortoise gestures towards the "peak" on the landscape) are having more kids is specifically because they are bigger and have shorter tails? Couldn't something else be causing both?
Achilles: I don't see how that would change anything though - I'd still end up with bigger mice with shorter tails.
Tortoise: Well don't you want to understand what causes the changes? What is truly responsible for the variation you are seeing?
Achilles: Well I suppose I should make sure the variation is heritable, like you said. But past that ... it seems like there is plenty of variation, so I don't think it's really important - what is important is this (gesturing again to the cloth landscape).
Tortoise: Well maybe for just those two traits you mentioned - but what about other variation - like the number of toes on each mouse, or the side of the body the heart is on, or - things like that! Those won't fit into your simple continuous axes. Those are due to specific mutations - the real driver behind evolution.
Achilles: Don't be so silly Tortoise, mutations just provide the raw material for natural selection to act on, which there is clearly plenty of. And I'm not convinced by your odd examples either - most evolutionary change can be described by selection on continuous traits.
Tortoise: And what about all of the genetics I taught you? What about DNA, and genotype space?
Achilles: But all of those concepts are so distant from what I care about (pulling out a few pictures of the mice) - the actual animals, and how they change! Isn't that what we want to understand anyways?
Tortoise, fuming now, grabs her jacket from the floor and hurries to the door.
Tortoise: I'll be back.
Achilles: Fine.
Tortoise: Fine.
A month later, after an apologetic but simultaneously intense call from Tortoise, Achilles comes over to her house. Achilles arrives, as usual, a few minutes late, and pulls his armor off as he stoops to fit through the door.
Achilles: My dear friend, I'm glad you invited me over.
Tortoise: I'm glad you came. I'm ever so sorry for getting so upset at your house -
Achilles: I'm sorry too. I was out of line.
Tortoise: Anyways, I have something to show you.
Achilles: What's that?
Mrs. Tortoise leads Achilles over to her computer, and opens a program. On the screen are 8 rows of letters. Each row resembles the others with one or two different letters.
Tortoise: These are variants of the DNA sequence for a small microbe that I've been raising here in my kitchen. It's the same thing that I use to make my bread.
Achilles: That stuff is alive?
Tortoise: I've been using the latest technology to read DNA sequences from it, and I've been able to take clones of this microbe and introduce specific mutations into it. Look, this is the library of 8 sequences that represent all of the possible combinations of 3 different mutations in this one gene.
Tortoise walks across the room to the table on the other side, where she has a little wire cube with tags at each corner.
Tortoise: So for each of the eight combinations, I make a strain of the microbe with that combination and compete it against the original strain in my batch of dough for the day. At the end of the rising, I see what percentage of the total population each strain is. Those are the numbers I've written on the tags on the corners of the cube. They represent the fitness of the strain, so the cube is really a fitness landscape.
Achilles: It doesn't look like a landscape at all!
Tortoise: I know, but it's the concept. Look, this shows exactly how these mutations affect the strain's success - this is the true landscape in which change happens, it's exactly accurate to reality.
Achilles: But it has nothing to do with reality! Look at it. I could describe how the animals will actually change with my landscape. All this shows is a measly three mutations. There's no animals, no population, no variation!
Tortoise: But look at this: mutation 1 causes a fitness increase on its own. So does mutation 2. But together, they cause a decrease in fitness. This means that if a strain has mutation 1, it won't get mutation 2, and vice versa. This could be the basis for speciation - for how one species splits into two!
Achilles: Whoa, I think you're getting a bit ahead of yourself here.
Tortoise: I think you're getting a little jealous.
Achilles: (walking for the door and picking up his armor) Me, jealous? This is ridiculous, I'm leaving.
Tortoise: Fine.
Achilles: Fine.
One month later, the two meet halfway between their houses. Uncharacteristically, they are both right on time. Characteristically, they are both smiling.
Tortoise: Good day.
Achilles: And good day to you too.
Tortoise: Let's skip right to it -
Achilles: Yes, let's. I've fully and completely described the evolution of the mice.
Tortoise: Well I've fully and completely described the evolution of the yeast!
Achilles: Let me explain. I went back out and measured all of the phenotypic characters of the mice. Professor Crab helped me find some college undergraduates who helped me measure the mice - AND we also measured the heritability of each trait. In fact, we measured the additive genetic variance underlying each trait, and the covariances underlying combinations of traits. I couldn't use a cloth anymore, because of all the dimensions, but I used a computer program to simulate the evolution of the population mean value through high-dimensional phenotype space. You see, natural selection pushes the population in one direction on this landscape, "uphill" if you will, and then the population will move generally in that direction - it actually depends on the additive genetic variance and covariances, because if two traits are affected by the same genetic factors then selection on one phenotypic trait can also lead to a change in the other, even in the absence of selection on that trait. It's kind of complicated, I'll get my buddies Lande and Arnold to explain it to you later.
Tortoise: So you did have to look at the genetics?
Achilles: Well I looked at them ... statistically, based on how traits were inherited.
Tortoise: So you couldn't tell me about any of the specific mutations underlying the changes?
Achilles: It's not important. All we need to know is how much additive genetic variation -
Tortoise: But what about the non-additive genetic variation - like I was showing you last time with the three mutations - certain combinations are fit and others aren't. You can't just ignore that - what if one mutation has a really profound effect?
Achilles: Hey, I've heard that most mutations have small effects. And like I said, what is important to selection, and what changes the statistical features of the phenotypes in the population, has to do with additive genetic variation. Which I'm sure there is plenty of. The point is, I can tell you what the mice are going to look like in 10 years!
Tortoise: Well while you were busy measuring your mice, I was busy with the yeast genome. I used a little trick to measure the fitness of every possible sequence for the 12 million bases in the yeast genome.
Achilles: That's impossible. There are about 10^7,200,000 possible combinations (4 letters at each of 12 million positions). That is BY FAR more than the number of atoms in the universe. Even if you tested one every millisecond since the beginning of the universe in every corner of the universe you still wouldn't get it done.
Tortoise: If I was just doing my tests in this 3-dimensional universe that would be true. But I have my ways. So I've created a 12-million-dimensional hypercube that describes all of the possible yeast sequences. With it I can simulate the path a strain will take as it traverses the hypercube during evolution. At each step I choose a mutation randomly and accept or reject it based on the fitness change.
Achilles: Each step? What does that mean? And how do you track the population mean on your hypercube?
Tortoise: Oh, well I've assumed that at each step, a mutation arises randomly. If it leads to an increase in fitness, it spreads through the entire population, or "fixes," and if not, it is eliminated. Then another mutation arises. Like this, the population takes "steps" around the landscape. I can simulate these steps and discover principles of evolution - like how channelized or repeatable it is. These simulations can reveal to us the best conditions for evolutionary innovations and breakthroughs, and maybe even the details of how speciation happens!
Achilles: Well that's cool, but how can you say that only one mutation arises at a time? You saw how much variation there is in my population. Why would you think that mutation, and not selection, is limiting the process?
Tortoise: Well why do you insist that selection is limiting your process? Couldn't there be changes in the mice that depend on specific combinations of mutations that are not yet present in the population?
Achilles: Well... I don't know.
Tortoise: Me neither... Hey, you know what? I've got an idea. What if I checked all the possible genomes for the mice, but instead of just measuring fitness, I sent the mice to your undergrads -
Achilles: I only have about 100 undergrads -
Tortoise: Okay, I'll have to find some undergrads in high-dimensional space. Anyways, then we'll measure the phenotypes and construct a full genotype to phenotype map. Then we can combine our simulations to look at the evolution of a population in genotype space based on selection at the phenotypic level.
Achilles: Alright. So we would look into the actual genetic variation in the population as a starting point. And we could do a full population simulation, instead of assuming that only one mutation exists per step.
Tortoise: Fine!
Achilles: Fine!
And the two old friends embraced, happy to be collaborating at last.
Bottom-up vs. Top-down Landscapes
Movement through a space of possibilities
The first question these models ask is how you can move between states. One answer is simple - at each step you can change one trait by either flipping it on or off or changing the letter at a specific position in a sequence. In the light-switch example, the edges I drew on the cube represent one-step changes, or one-step neighbors. How I define what is allowed in a step defines the structure of the space. These rules are called "variation operators," and if I wanted to I could define one that would change the diagram and the way states are connected. Say you can only flip two switches at once. Each state still has three one-step neighbors, but the connections have changed.
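Here is a minimal sketch of that idea in Python, assuming switch states are written as strings of 0s and 1s. The function names (neighbors, reachable) and the flips_per_step parameter are made up for illustration; the point is only that the variation operator decides which states count as one-step neighbors.

```python
from itertools import combinations

def neighbors(state, flips_per_step):
    """All states reachable in one step by flipping exactly `flips_per_step` switches."""
    result = []
    for positions in combinations(range(len(state)), flips_per_step):
        bits = list(state)
        for i in positions:
            bits[i] = "1" if bits[i] == "0" else "0"
        result.append("".join(bits))
    return result

def reachable(start, flips_per_step):
    """Every state that can be reached from `start` by applying the operator repeatedly."""
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for nxt in neighbors(current, flips_per_step):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

one_flip = neighbors("000", 1)  # ['100', '010', '001']
two_flip = neighbors("000", 2)  # ['110', '101', '011']
```

Both operators give each state three neighbors, but the structure of the space is different. In this sketch, reachable("000", 1) covers all eight corners of the cube, while reachable("000", 2) only ever visits four of them, because flipping two switches at once never changes whether the number of lit switches is odd or even.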
The three-switch situation. We will describe the state of each switch by a 1 (for on) or a 0 (for off), and put them together in a state string like 000 (all off) or 100 (just the first light on).
A nucleotide sequence. This is how we usually write DNA or RNA sequences anyways, like ATGCGTATGC.
A set of specific mutations. If we know 5 mutations are involved in an adaptation, we might want to test combinations of these specific mutations. 10010 would mean that mutation 1 and mutation 4 are present but the others are not.
Side-note-game: Below are the criteria for a measure of distance that defines a metric space. Can you come up with an example of a variation operator (or list of variation operators/rules) that violates one of these criteria? (I can make up ones that violate #3; let me know if you think of ways to break the others!)
1. Non-negativity: the distance between A and B is not negative. d(A, B) ≥ 0
2. Identity of Indiscernibles (good band name?): d(A, B) = 0 if and only if A=B
3. Symmetry: d(A, B) = d(B, A)
4. Triangle inequality: d(A, B) ≤ d(A, C) + d(C, B)
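One possible way to break #3, sketched in Python: a made-up variation operator that can turn a switch on but never off. Everything here (on_only_neighbors, step_distance) is a hypothetical illustration, with distance taken to be the fewest steps needed to get from one state to another.

```python
from collections import deque

def on_only_neighbors(state):
    """Hypothetical operator: a step may turn one switch on, never off."""
    return [state[:i] + "1" + state[i + 1:] for i, bit in enumerate(state) if bit == "0"]

def step_distance(start, goal, neighbor_fn):
    """Fewest steps from start to goal (breadth-first search); None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for nxt in neighbor_fn(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None

forward = step_distance("000", "110", on_only_neighbors)   # 2
backward = step_distance("110", "000", on_only_neighbors)  # None: no way back
```

Under this rule d(000, 110) is 2, but d(110, 000) is not defined at all, so symmetry fails.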
Back to biology. If you've taken a genetics class you might object to the point mutation variation operator, pointing out that larger mutations like deletions, insertions, and inversions should be included as additional variation operators.
To address that point, let's first turn to the first real application of our notion of distance in genotype space: determining the evolutionary distance between two DNA sequences with a common ancestor. Measures of this distance rely on first aligning the two sequences (a step necessary precisely because of insertions and deletions) and then scoring the differences between them. This scoring will have to take into account both differences in single nucleotides and gaps in the alignment caused by deletions or insertions. By coming up with a way to add "gap penalties" into the distance measure, these methods acknowledge those kinds of mutations while still basing the main distance measure on point mutations alone.
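To make the gap-penalty idea concrete, here is a bare-bones Python sketch of an alignment-based distance. It is a simple dynamic-programming alignment, not the scoring scheme of any particular tool, and the cost values (mismatch_cost=1, gap_cost=2) are arbitrary assumptions chosen for illustration.

```python
def alignment_distance(seq_a, seq_b, mismatch_cost=1, gap_cost=2):
    """Minimum total cost of aligning seq_a to seq_b end to end.

    Each mismatched pair costs `mismatch_cost`; each position aligned
    against a gap (an insertion or deletion) costs `gap_cost`.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing means every position is a gap.
    for i in range(1, rows):
        cost[i][0] = i * gap_cost
    for j in range(1, cols):
        cost[0][j] = j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            substitution = cost[i - 1][j - 1] + (0 if same else mismatch_cost)
            gap_in_b = cost[i - 1][j] + gap_cost  # seq_a letter aligned to a gap
            gap_in_a = cost[i][j - 1] + gap_cost  # seq_b letter aligned to a gap
            cost[i][j] = min(substitution, gap_in_b, gap_in_a)
    return cost[-1][-1]

point_change = alignment_distance("ATGC", "ATCC")  # 1: one mismatch
deletion = alignment_distance("ATGC", "ATC")       # 2: one gap
```

With these made-up costs a single deleted base counts as twice as "far" as a single point change. Real alignment methods choose these weights much more carefully, but the structure is the same: point differences and gaps both feed into one number.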
For these reasons, when people talk about a fitness landscape, it's interesting to ask 1) what is a "trait" in the system? and 2) what are the variation operators? For example, a study might be looking at how differences in the DNA sequence for a particular protein affect its function. In this case, the "traits" are the individual nucleotide positions in the sequence, and the variation operator is probably a single point mutation. Distance is measured as the number of positions at which two sequences differ. In another study, researchers might be looking at how the different possible combinations of gene knockouts affect some phenotype. In this case, the "traits" are individual genes that, like the light switches, can be either on (1) or off (0). The variation operator is turning a single gene on or off, so if we represent the state of the system by a string of 1's and 0's based on which genes are on or off, distance is again the number of positions at which the strings differ (Hamming distance).
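Both of those distance measures are the same small function. A minimal Python sketch, where the example strings are made up:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

# Nucleotide sequences that differ by one point mutation.
dna = hamming_distance("ATGCGTATGC", "ATGCGTATGA")  # 1

# Knockout strings: genes 1 and 5 differ between the two strains.
knockouts = hamming_distance("10010", "00011")      # 2
```

The length check is there because this distance only makes sense when the only variation operator is a change at a fixed position. Sequences of different lengths fall outside it.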
You can see how adding variation operators for deletions and insertions would complicate our idea of distance and structure in our space of possibilities, which explains why deletions, insertions, and inversions are often ignored in these theoretical models.
How do we move?
In each case, I think it is useful to think of the set of rules defining this evolution, the "algorithm" of change. And this brings us back to the mapping and its purpose. In the usual case, the algorithm of change in these systems "sees" the mapping, and its rules are designed to move through the space in a way that leads to an increase (or decrease in some systems) in the value of the mapping. This is the popular view of biological evolution by natural selection, a "hill-climbing" algorithm that involves a population of agents where those with a higher mapping (fitness) reproduce more. A simpler example would be for my whiskey drink recipe, where every night I change the amount of one ingredient by one milliliter from my "top recipe." If the drink tastes better than the night before (the mapping here is to taste), I set it as my new top recipe, and if not, I keep the same top recipe, essentially taking a step back.
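The whiskey version of this algorithm fits in a few lines of Python. This is only a sketch: the taste function, its "ideal" recipe, and the ingredient names are all invented stand-ins for whatever my tongue is actually doing.

```python
import random

def taste(recipe):
    """Hypothetical taste score: higher is better, best at a made-up ideal recipe."""
    ideal = {"whiskey": 60, "vermouth": 30, "bitters": 2}
    return -sum((recipe[name] - ideal[name]) ** 2 for name in ideal)

def hill_climb(top_recipe, nights, rng):
    """Each night, change one ingredient by 1 ml and keep the change only if it tastes better."""
    top_recipe = dict(top_recipe)
    for _ in range(nights):
        trial = dict(top_recipe)
        ingredient = rng.choice(sorted(trial))
        # Amounts are in milliliters and cannot go below zero.
        trial[ingredient] = max(0, trial[ingredient] + rng.choice([-1, 1]))
        if taste(trial) > taste(top_recipe):
            top_recipe = trial  # new top recipe
        # otherwise keep the old top recipe (the "step back")
    return top_recipe

start = {"whiskey": 50, "vermouth": 35, "bitters": 5}
result = hill_climb(start, nights=500, rng=random.Random(0))
```

Because a change is only ever kept when the score goes up, the top recipe can never get worse. Whether it ends up at the best possible recipe depends entirely on the shape of the mapping, which is the landscape question all over again.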
There are two questions that we might ask now about these algorithms for change. 1) For a given system, what is the best algorithm, the one that will most often result in the highest mapping value? 2) What is the algorithm that is actually running in these systems? Importantly, the two questions are not the same. The algorithm for biological evolution may not be the "best" one that could be imagined. And even more importantly, for biological evolution at least, we don't really know the answer to either question (largely because we don't know what complete adaptive landscapes are like).
For the bottom-up discrete landscapes, all of the examples I have seen that involve simulating "evolution" make the same general assumption - that evolution is limited by mutations. This idea is often written as the strong-selection-weak-mutation condition (SSWM). SSWM has one huge strength for these bottom-up models: it makes things simple. Under SSWM, the model can move through the high-dimensional landscape one step at a time. A mutation arises (take a step), selection either fixes it in the population (ok good), or it removes it completely from the population (reverse the step). Then another mutation arises. Rinse and repeat. This is the kind of algorithm that is easily run by a computer and in some cases, is easy to analyze with mathematics as well. The situation in which many mutations are at different frequencies in the population at once complicates the algorithm greatly. This assumption may, in some cases (particularly in clonal populations), be reasonable. But a good deal of recent work in experimental evolution has confirmed the idea that the actual "algorithm" is not simple - competing mutations and linkage lead to a considerably different vision of the process. These observations are beginning to be integrated into the mathematical theory but haven't (that I've seen) made their way into a landscape model or any model of long-term evolution.
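A rough Python sketch of that one-step-at-a-time loop, run on a three-mutation cube like Tortoise's. The fitness numbers are invented for illustration (they are set up so that mutations 1 and 2 each help alone but hurt together), and the accept/reject rule is the simplified one described above: a beneficial mutation always fixes and anything else is always lost.

```python
import random

# Hypothetical relative fitness for all 8 combinations of 3 mutations.
FITNESS = {
    "000": 1.00, "100": 1.10, "010": 1.08, "001": 0.95,
    "110": 0.90, "101": 1.15, "011": 1.12, "111": 1.05,
}

def sswm_walk(start, steps, rng):
    """Simulate an adaptive walk: one random mutation per step, fixed only if fitter."""
    genotype = start
    path = [genotype]
    for _ in range(steps):
        site = rng.randrange(len(genotype))           # a mutation arises
        flipped = "1" if genotype[site] == "0" else "0"
        mutant = genotype[:site] + flipped + genotype[site + 1:]
        if FITNESS[mutant] > FITNESS[genotype]:       # selection fixes it
            genotype = mutant
            path.append(genotype)
        # otherwise the mutation is removed and the population stays put
    return path

path = sswm_walk("000", steps=200, rng=random.Random(1))
```

On this made-up cube there are two local peaks, 101 and 011, and which one a walk from 000 ends on depends on whether mutation 1 or mutation 2 happens to fix first. Nothing in this loop tracks more than one genotype at a time, which is exactly the simplification that competing mutations and linkage break.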
So to summarize / over-generalize this type of model:
Variation operators precisely defined as point mutations or binary changes
Fitness mapping constructed empirically (for relatively low-dimensional subspaces) or theoretically by assuming some general structure and drawing from some fitness distribution
SSWM assumption for simulating evolution or doing mathematical analysis
How do we move in phenotype space?
The second objection is about discontinuity in phenotype space. The basic difficulty is that we are dealing with a continuous system state that is caused by a discrete system state. We saw how Tortoise could define distance in her landscape. How can Achilles define it in his? What he would like to do is define it based on the phenotype values, using Euclidean distance, or basically the length of the diagonal in phenotype space, where phenotype values might be scaled by standard deviations, for example. In order to make this idea work in a rigorous sense, we need to assume that the genetic factors underlying phenotypic traits are responsible for relatively small changes in the phenotypic values, and in order to apply the breeder's equation to making long-term predictions, we'll need to assume that these effects are additive (I believe the breeder's equation does not need this assumption to make statements about the immediate effect of selection, but does need it to make a longer term prediction like "the population will move to that peak").
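Here is what Achilles' distance might look like as a small Python sketch. The mouse measurements (body weight in grams, tail length in millimeters) are invented, and scaling each trait by its sample standard deviation is just one of the possible scalings mentioned above.

```python
import math
import statistics

def scaled_distance(mouse_a, mouse_b, population):
    """Euclidean distance between two phenotypes, with each trait difference
    divided by that trait's sample standard deviation in the population."""
    total = 0.0
    for trait in range(len(mouse_a)):
        sd = statistics.stdev([row[trait] for row in population])
        total += ((mouse_a[trait] - mouse_b[trait]) / sd) ** 2
    return math.sqrt(total)

# Hypothetical (body weight in g, tail length in mm) measurements.
mice = [(20.0, 80.0), (22.0, 90.0), (18.0, 70.0), (24.0, 100.0), (16.0, 60.0)]

d = scaled_distance(mice[0], mice[1], mice)
```

Without the scaling, the tail-length axis would dominate simply because millimeters give bigger numbers than grams. With it, a difference of one standard deviation counts the same on either axis.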
A very interesting paper by Walter Fontana and his colleagues describes some of the issues with trying to define phenotype space in this way. Essentially, if there is redundancy in the genotype-phenotype map, neutral networks of genotypes can lead to the same phenotype. There is good experimental evidence that this is the case in some systems - Fontana particularly cites the map between RNA sequence and RNA molecule shape. These neutral networks of genotypes can lead to situations in which the transition from one phenotypic state A to state B is more likely to occur than the transition from B to A. This type of asymmetry in the move rules in phenotype space makes it impossible to really define distance.
That said, there is also good evidence that some phenotypic traits are controlled by many additive genetic factors. In some cases it may be quite reasonable to assume that phenotype space is truly continuous and that there is a supply of mutations to respond to any selection pressure. And this way of thinking, this model of the landscape, does not rely on the SSWM assumption. In fact, it explicitly relies on the variation in the population. Still, if a novel mutation arises in the population that interacts in a non-additive way with the other genetic factors controlling the phenotype, the model can break down.
So to summarize / over-generalize this type of model:
Variation operators defined as continuous change, contingent on the G matrix, which describes genetic variance and covariance
Fitness mapping investigated empirically, and created using some kind of surface-fitting / regression
Generally assumes a supply of additive genetic factors with small effects
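The move rule for this type of model is commonly written as the multivariate breeder's equation, change in mean phenotype = G times the selection gradient. A small Python sketch with two traits follows; the numbers in G and in the selection gradient are made up, and selection is assumed to act on body weight only.

```python
def response_to_selection(g_matrix, selection_gradient):
    """Predicted one-generation change in the mean of each trait: G times beta."""
    return [
        sum(g * beta for g, beta in zip(row, selection_gradient))
        for row in g_matrix
    ]

# Hypothetical G matrix for (body weight, tail length):
# variances on the diagonal, genetic covariance off the diagonal.
G = [
    [1.0, 0.6],
    [0.6, 2.0],
]

# Hypothetical selection gradient: selection on body weight, none on tail length.
beta = [0.5, 0.0]

delta_mean = response_to_selection(G, beta)  # approximately [0.5, 0.3]
```

Tail length shifts even though nothing is selecting on it directly, purely because of the covariance term. That is the effect Achilles was describing: traits that share genetic factors get dragged along together.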
General Landscape Movement
As an aside, we're pretty much always dealing with discrete time steps here. What if instead we were considering, say, the position and speed of a falling object? Then it might make more sense to use continuous time dynamics to describe the system, and the move rules might depend on gravity. Here, we've actually stumbled into the realm of dynamics and classical mechanics... which means I've gotten way off track.
So is there any hope for the evolutionary biologists?
Where I'm at right now (and my viewpoint is definitely still evolving on this) is that the first type of thinking is more useful for understanding the origins of novelty in biological systems, and the second is more useful for understanding the shorter term evolution of some phenotypic traits. If you're wondering about antibiotic resistance in bacteria, you're going to be in the first mindset. If you're wondering about the evolution of beak size in birds faced with long-term drought, you're going to be in the second mindset.
But part of my goal in outlining these two different types of landscapes, and how they differ in their variation operators and move rules, is to try to illuminate the need for some crossover, or at least some breaking of the assumptions. Obviously it would be lovely if Tortoises and Achilles-es could team up and figure out the whole genotype»phenotype»fitness map. This is the ultimate goal. Unfortunately, combinatorial explosion makes it difficult / impossible to chart out these maps entirely, at least without Tortoise's high-dimensional tricks. But that makes this an exciting problem! Can we find some high-throughput way to explore more of genotype space and measure some phenotype or fitness for each combination? Can we use different evolution algorithms in microbial evolution experiments to discover general characteristics of the genotype»fitness map? And can we safely generalize any of these results to say something interesting about long-term evolutionary dynamics? As one of my old TA's used to tell us: "Hard saying, not knowing." But I'd say it's worth a shot.