Hereditary Speciation and Stochastic Evolutionary Models

Note: I haven't figured out how to make MathJax render, so my equations are written as raw MathJax/LaTeX. If you are not used to seeing math written like this, apologies.

Phylogenetic trees describe the ways in which the evolutionary tree of life branches out to create the immense diversity that exists today. Contained within these structures is an entire history of the past: what existed when, which species the things alive today descend from, and how long ago these events occurred. Along with revealing the evolutionary history of organisms (or genes, or viruses, or the many other objects phylogenetic trees are applied to), studying the trees can shed light on the processes that drove this diversification. First, I will digress briefly to illustrate what sorts of processes may be interesting and relevant.

Imagine starting from a single origin species, a mythical Adam and Eve from which all future life sprouted. The species multiplies, grows, and persists until something mysterious happens: it splits into two separate populations that are different enough that they no longer interbreed. Perhaps they are geographically separated, and enough mutations accumulate that they can no longer reproduce together. Alternatively, they may not be physically separated at all but instead diverge in the ecological niche they occupy. Picture a population of salamanders in a narrow mountain ravine. Some salamanders are more prone to hunting in the leaf litter, while others become arboreal and spend their time in the moist moss and cavities of trees. These creatures seldom interact in the first place, mating only once a year. If a difference in behavior among some members of the population were enough to partition the breeding population in a correlated way (so that arboreal salamanders tend to mate with each other, and leaf-litter salamanders tend to mate with each other), then it is easy to envision how these creatures could speciate even while living in the same valley. These processes, or alternative mechanisms of speciation, have occurred millions of times over the history of life on Earth, perhaps hundreds of millions of times. Every phylogenetic tree we have today covers only a tiny fragment of this incredible tree of life. But even these small fragments can offer intuition about why these speciation events happen.

What can a tree tell us?

Let's take a look at a phylogenetic tree. In particular, I am considering what is called an unlabeled history: unlabeled, because I don't care which species each leaf of the tree (each extant species) is, and a history, meaning that we know the order in which the branching events happened. I assume that we have no information on the exact times at which these splitting events happened. So what we have is the order of past events, along with what still exists today. (I typically look at settings in which extinction is assumed not to happen, but let's not worry about that for this post.) Now what? Well, let's return to our first species. At some point, it splits into two species (either it truly splits in two, or a group diverges from it; for our purposes these are the same thing). Let's call that time $t=1$. Then, at some later time, which we'll call $t=2$, one of these two species splits in two. Naturally, this process continues: at a given time $t$, one of the $t$ currently existing species splits in two, so that we have $t+1$ species in total. Which of the existing species is the one that splits? Here we get to the question of the mode of speciation.
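To make this discrete-time picture concrete, here is a minimal Python sketch. It assumes species are tracked by integer IDs and that a split replaces the parent with two fresh daughters. The names `simulate_history` and `pick` are hypothetical: `pick` is a placeholder for the mode of speciation, i.e. the rule that decides which of the $t$ extant species splits at time $t$.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical "mode of speciation": given the extant species IDs and a random
# generator, return the index (into that list) of the species that splits next.
PickRule = Callable[[List[int], random.Random], int]


def simulate_history(n_species: int, pick: PickRule,
                     seed: int = 0) -> List[Tuple[int, int, int, int]]:
    """Grow a ranked history until n_species species exist.

    Returns events (t, parent, left_child, right_child): at time t, one of
    the t extant species (parent) is replaced by two new species.
    """
    rng = random.Random(seed)
    extant = [0]      # species 0 is the single origin species
    next_id = 1       # fresh IDs for newly created species
    events: List[Tuple[int, int, int, int]] = []
    t = 1
    while len(extant) < n_species:
        assert len(extant) == t          # sanity check: t species exist at time t
        i = pick(extant, rng)
        parent = extant.pop(i)
        left, right = next_id, next_id + 1
        next_id += 2
        extant.extend([left, right])
        events.append((t, parent, left, right))
        t += 1
    return events
```

For example, `simulate_history(5, lambda extant, rng: rng.randrange(len(extant)))` chooses the splitting species uniformly at random, which is the Yule model described below; any other rule can be swapped in without touching the rest of the loop.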

A natural model to consider is the Yule model of speciation, in which one of the existing species is picked uniformly at random (each with equal probability) to be the next species to split. The Yule model can be thought of as entirely neutral: speciation is a purely random process that does not depend on anything about the extant species. However, studies in the literature typically find that real phylogenetic trees do not look like trees generated under the Yule model; in particular, real trees tend to be more imbalanced (Blum & François).
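As a small sanity check of what the Yule model implies, consider trees with four species. After the second split, the two clades below the root contain 2 and 1 species; the final tree is balanced (two cherries) only if the third split hits the lone species, which under the Yule model happens with probability $1/3$. The Python sketch below estimates this by simulation; the function names are hypothetical.

```python
import random


def yule_is_balanced_4(rng: random.Random) -> bool:
    """Grow a Yule history to 4 species and report whether it is balanced.

    Each species is tagged by which of the root's two daughter clades it
    belongs to (0 or 1). With 4 species, the tree is balanced (two cherries)
    exactly when each root clade ends up with 2 species.
    """
    clades = [0, 1]                       # after t = 1: one species in each root clade
    while len(clades) < 4:
        i = rng.randrange(len(clades))    # Yule: every extant species equally likely
        clades.append(clades[i])          # the split adds one species to that clade
    return clades.count(0) == 2


def estimate_balanced(trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(balanced) for 4-species Yule histories."""
    rng = random.Random(seed)
    hits = sum(yule_is_balanced_4(rng) for _ in range(trials))
    return hits / trials
```

Running `estimate_balanced()` should return a value close to $1/3$, so under the Yule model the unbalanced (caterpillar) shape is about twice as likely as the balanced one.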

What is an alternative? Well, it makes sense that some lineages are more likely to diversify than others. Viruses evolve on entirely different timescales than vertebrates, and it's likely that certain traits (being an r-selected species, reproducing sexually rather than asexually, occupying a particular position on the specialist-generalist tradeoff, etc.) make it far more likely for a given species to speciate. While exploring the mechanisms behind phylogenetic tree patterns would be a highly interesting (and probably well-explored) area of research, I want to work at a level of generality beyond individual taxa, where something more can be said. So, instead of having one mechanism of speciation in mind, let us think about a more general one: perhaps lineages that have already speciated are more (or less) likely to speciate again in the future. In other words, the rate or likelihood at which an individual species speciates may be hereditary: speciation rate can be considered a trait just like any other trait that gets passed on.

Two possible models, and their biological implications

There are two basic ways to build forward-in-time models of evolution with variable speciation rates. The first approach, modelled in papers such as (Blum & François) and (Sainudiin & Véber), is as follows. The original species speciates with probability 1. Its two children then get speciation probabilities $\lambda$ and $1 - \lambda$, respectively, where $\lambda$ is a random variable taking values in the interval $[0,1]$. At each later time, every existing species carries some probability $p$ (the probabilities of all existing species sum to 1), a species is chosen to branch with probability equal to its own $p$, and its children then have probabilities $\lambda p$ and $(1 - \lambda) p$ of being chosen at a later time (with a new draw of $\lambda$). The biological intuition behind this approach is that each clade has a fixed, intrinsic speciation rate relative to all other clades. Suppose, for instance, that the original species (the common ancestor) was some early reptile, that the left branch contains today's birds, and that the right branch contains today's lizards. Then the probability that the next species to evolve is a bird or a lizard is fixed, no matter how many species are in each branch. While this seems quite artificial when the tree is zoomed out to the resolution of classes, it could certainly have merit over shorter timescales (e.g., one genus of fish may be much more likely to produce new species than another genus of fish, even though it is uncertain which member of the clade will speciate).
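A minimal Python sketch of this clade-based scheme follows. It assumes, for illustration only, that $\lambda \sim \mathrm{Beta}(a, a)$ is drawn independently at every split, in the spirit of beta-splitting models; the distribution, the parameter `a`, and the function name `simulate_clade_weights` are my assumptions, not code from the cited papers.

```python
import random
from typing import List


def simulate_clade_weights(n_species: int, a: float = 1.0,
                           seed: int = 0) -> List[float]:
    """Sketch of the clade-based model; returns the final species weights.

    Each extant species carries a weight p (its probability of being the
    next to split), and the weights always sum to 1. The chosen species is
    replaced by two children with weights lam * p and (1 - lam) * p.
    ASSUMPTION: lam ~ Beta(a, a), drawn afresh at each split (illustrative).
    """
    rng = random.Random(seed)
    weights = [1.0]                      # the origin species splits with probability 1
    while len(weights) < n_species:
        # choose a species with probability equal to its current weight
        i = rng.choices(range(len(weights)), weights=weights)[0]
        p = weights.pop(i)
        lam = rng.betavariate(a, a)
        weights.extend([lam * p, (1.0 - lam) * p])
    return weights
```

Because a split only divides a species' weight between its two daughters, the total weight of each of the root's two clades stays at $\lambda$ and $1-\lambda$ forever, which is exactly the birds-versus-lizards property described above.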

The other natural way to model an evolutionary splitting process is as a branching Markov process. Imagine that we have a pool of $n$ species. Each of them has some intrinsic speciation rate $\gamma_i$ (and potentially an intrinsic extinction rate). The process then plays out independently for each species: each has its own running (Poisson) clock that determines when it branches into a new species. Compared to the first approach, this says that speciation proceeds independently for each species rather than for each clade. This adds the following piece of realism: suppose species $n$ splits into two species. Then the probability that species 1 is the next to speciate should be slightly lower, since there are now more species that could split. However, the clade-level speciation rate may be more applicable in the context of ecological niche construction: some groups of organisms are simply more readily able to diversify because of tangible characteristics they possess (e.g., beetles, viruses).
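The competing-clocks picture can be simulated directly. If species $i$ has an exponential clock with rate $\gamma_i$, the time to the next speciation is exponential with rate $\sum_j \gamma_j$, and species $i$ is the one that splits with probability $\gamma_i / \sum_j \gamma_j$; adding species enlarges the denominator, which is why species 1 becomes less likely to be next. The Python sketch below assumes a pure-birth process (no extinction) and, as a purely hypothetical inheritance rule, multiplies the parent's rate by an independent lognormal factor for each daughter; the function name and the parameters `gamma0` and `sigma` are illustrative.

```python
import random
from typing import List, Tuple


def simulate_markov_branching(n_species: int, gamma0: float = 1.0,
                              sigma: float = 0.0,
                              seed: int = 0) -> Tuple[List[float], List[float]]:
    """Pure-birth branching process with per-species exponential clocks.

    Species i speciates at rate rates[i]. By memorylessness we can restart
    all clocks after each event: the waiting time is Exponential(sum(rates))
    and species i fires with probability rates[i] / sum(rates).
    ASSUMPTION (hypothetical): each daughter inherits its parent's rate times
    exp(N(0, sigma^2)); sigma = 0 keeps all rates equal (the Yule case).
    Returns (final per-species rates, speciation event times).
    """
    rng = random.Random(seed)
    rates = [gamma0]
    times: List[float] = []
    t = 0.0
    while len(rates) < n_species:
        t += rng.expovariate(sum(rates))           # time until the first clock rings
        i = rng.choices(range(len(rates)), weights=rates)[0]
        parent = rates.pop(i)
        # two daughters, each with a (possibly perturbed) copy of the parent's rate
        rates.extend(parent * rng.lognormvariate(0.0, sigma) for _ in range(2))
        times.append(t)
    return rates, times
```

With `sigma = 0` every species has the same rate, so the order of splits is exactly a Yule history; a positive `sigma` lets fast-speciating lineages pass their speed on to their descendants.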

There are certainly many more models to explore; these are merely crude idealizations from which some intuition can be gained. Still, I believe these two classes of models, and the assumptions behind them, are the two most natural ways to think about these evolutionary processes. Much of my current work focuses on developing techniques to efficiently compute statistics of trees generated by broader classes of phylogenetic models, including those above. While the research is hard at times, every time I pick my head up and start reading what other people have written, I get new ideas. Funny how that works!


Blum, Michael G. B., and Olivier François. "Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance." Systematic Biology 55.4 (2006): 685-691.

Sainudiin, Raazesh, and Amandine Véber. "A Beta-splitting model for evolutionary trees." Royal Society Open Science 3.5 (2016): 160016.