Stochastic Models for the Inference of Life Evolution



SMILE is an interdisciplinary research group gathering mathematicians, bio-informaticians and biologists.
SMILE is affiliated to the Institut de Biologie de l'ENS, in Paris.
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Visit also our homepage at CIRB.


SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (stations Luxembourg or Saint-Michel on RER B).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 is directly in front of you as you exit the hall that passes behind Champollion's statue.


You can reach us by email (amaury.lambert - at - ; (guillaume.achaz - at - or (smile - at -

Light on



The genealogical decomposition of a matrix population model with applications to the aggregation of stages

Matrix projection models are a central tool in many areas of population biology. In most applications, one starts from the projection matrix to quantify the asymptotic growth rate of the population (the dominant eigenvalue), the stable stage distribution, and the reproductive values (the dominant right and left eigenvectors, respectively). Any primitive projection matrix also has an associated ergodic Markov chain that contains information about the genealogy of the population. In this paper, we show that these facts can be used to specify any matrix population model as a triple consisting of the ergodic Markov matrix, the dominant eigenvalue and one of the corresponding eigenvectors. This decomposition of the projection matrix separates properties associated with lineages from those associated with individuals. It also clarifies the relationships between many quantities commonly used to describe such models, including the relationship between eigenvalue sensitivities and elasticities. We illustrate the utility of such a decomposition by introducing a new method for aggregating classes in a matrix population model to produce a simpler model with fewer classes. Unlike the standard method, our method preserves reproductive values and elasticities. It also has conceptually satisfying properties, such as commuting with changes of units.
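As a rough illustration of the decomposition described above, here is a minimal Python sketch. It assumes one common convention for the Markov chain associated with a primitive matrix, namely P[i][j] = A[i][j] * w[j] / (lam * w[i]), where lam is the dominant eigenvalue and w the stable stage distribution; the paper may use a different but equivalent convention. The example matrix and all function names are hypothetical and chosen for illustration only.

```python
def dominant_eigen(A, iters=2000):
    """Power iteration: dominant eigenvalue and right eigenvector (scaled to sum to 1).
    Converges for primitive non-negative matrices."""
    n = len(A)
    w = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Aw)              # valid because w sums to 1
        w = [x / lam for x in Aw]
    return lam, w


def transpose(A):
    return [list(row) for row in zip(*A)]


def genealogical_decomposition(A):
    """Return (P, lam, w, v): Markov matrix, growth rate, stable stage
    distribution w (right eigenvector), reproductive values v (left eigenvector)."""
    lam, w = dominant_eigen(A)
    _, v = dominant_eigen(transpose(A))
    n = len(A)
    # Assumed convention: P[i][j] = A[i][j] * w[j] / (lam * w[i]); rows sum to 1.
    P = [[A[i][j] * w[j] / (lam * w[i]) for j in range(n)] for i in range(n)]
    return P, lam, w, v


def rebuild(P, lam, w):
    """Recover the projection matrix from the triple (P, lam, w)."""
    n = len(P)
    return [[lam * w[i] * P[i][j] / w[j] for j in range(n)] for i in range(n)]


def elasticities(A, lam, w, v):
    """Elasticity of lam to A[i][j]: v[i] * A[i][j] * w[j] / (lam * <v, w>)."""
    n = len(A)
    vw = sum(v[k] * w[k] for k in range(n))
    return [[v[i] * A[i][j] * w[j] / (lam * vw) for j in range(n)] for i in range(n)]


# Hypothetical 3-stage projection matrix (illustrative values only).
A = [[0.0, 1.5, 2.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.8, 0.9]]
P, lam, w, v = genealogical_decomposition(A)
E = elasticities(A, lam, w, v)
```

Under this convention, the stationary distribution of P is proportional to v[i] * w[i], and each elasticity equals the stationary probability of class i times P[i][j], which is one way to see the link between the lineage-level chain and the elasticities.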



Exchangeable coalescents, ultrametric spaces, nested interval-partitions: A unifying approach

Kingman's representation theorem (Kingman 1978) states that any exchangeable partition of $\mathbb{N}$ can be represented as a paintbox based on a random mass-partition. Similarly, any exchangeable composition (i.e. ordered partition of $\mathbb{N}$) can be represented as a paintbox based on an interval-partition (Gnedin 1997). Our first main result is that any exchangeable coalescent process (not necessarily Markovian) can be represented as a paintbox based on a random non-decreasing process valued in interval-partitions, called a nested interval-partition, generalizing the notion of comb metric space introduced by Lambert & Uribe Bravo (2017) to represent compact ultrametric spaces. As a special case, we show that any $\Lambda$-coalescent can be obtained from a paintbox based on a unique random nested interval-partition called the $\Lambda$-comb, which is Markovian with explicit semi-group. This nested interval-partition directly relates to the flow of bridges of Bertoin & Le Gall (2003). We also display a particularly simple description of the so-called evolving coalescent by a comb-valued Markov process. Next, we prove that any measured ultrametric space $U$, under mild measure-theoretic assumptions on $U$, is the leaf set of a tree composed of a separable subtree called the backbone, on which are grafted additional subtrees, which act as star-trees from the standpoint of sampling. Displaying this so-called weak isometry requires us to extend the Gromov-weak topology, which was initially designed for separable metric spaces, to non-separable ultrametric spaces. It allows us to show that for any such ultrametric space $U$, there is a nested interval-partition which is 1) indistinguishable from $U$ in the Gromov-weak topology; 2) weakly isometric to $U$ if $U$ has complete backbone; 3) isometric to $U$ if $U$ is complete and separable.
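To make the paintbox construction concrete, here is a small Python sketch of the classical case only: sampling a partition of the first n integers from a fixed (deterministic) mass-partition. Each integer draws a uniform variable; integers whose variables fall in the same sub-interval share a block, and those falling in the leftover "dust" become singletons. The function name and arguments are hypothetical, and the nested interval-partitions of the paper are not implemented here.

```python
import random


def paintbox_partition(masses, n, rng=None):
    """Sample a partition of {0, ..., n-1} by Kingman's paintbox.

    masses: non-negative block masses with total at most 1; any leftover
            mass (1 - sum(masses)) is dust and produces singleton blocks.
    """
    if any(m < 0 for m in masses) or sum(masses) > 1 + 1e-12:
        raise ValueError("masses must be non-negative and sum to at most 1")
    rng = rng or random.Random()
    blocks = {}       # index of sub-interval -> integers painted with that colour
    singletons = []   # integers that fell in the dust
    for i in range(n):
        u = rng.random()
        acc = 0.0
        for k, m in enumerate(masses):
            acc += m
            if u < acc:
                blocks.setdefault(k, []).append(i)
                break
        else:
            # u landed beyond all sub-intervals: i is alone in its block
            singletons.append([i])
    return sorted(list(blocks.values()) + singletons)


# Hypothetical example: two macroscopic blocks plus 20% dust.
partition = paintbox_partition([0.5, 0.3], 10, random.Random(0))
```

Because the uniform variables are i.i.d., the law of the sampled partition is invariant under relabelling of the integers, which is the exchangeability property the theorem characterizes.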



The species problem from the modeler’s point of view

How to define and delineate species is a long-standing question sometimes called the species problem. In modern systematics, species should be groups of individuals sharing characteristics inherited from a common ancestor which distinguish them from other such groups. A good species definition should thus satisfy the following three desirable properties: (A) Heterotypy between species, (B) Homotypy within species and (E) Exclusivity, or monophyly, of each species. In practice, systematists seek to discover the very traits for which these properties are satisfied, without a priori knowledge of the traits responsible for differentiation and speciation, or of the true ancestral relationships between individuals. Here, by contrast, we focus on individual-based models of macro-evolution, where both the differentiation process and the population genealogies are explicitly modeled, and we ask: How and when is it possible, with this significant information, to delineate species in a way satisfying most or all of the three desirable properties (A), (B) and (E)? Surprisingly, despite the popularity of this modeling approach in the last two decades, there has been little progress or agreement on answers to this question. We prove that the three desirable properties are not in general satisfied simultaneously, but that any two of them can be. We show mathematically the existence of two natural species partitions: the finest partition satisfying (A) and (E) and the coarsest partition satisfying (B) and (E). For each of them, we propose a simple algorithm to build the associated phylogeny. We stress that these two procedures can readily be used at a higher level, namely to cluster species into monophyletic genera. The ways we propose to phrase the species problem and to solve it should further refine models and our understanding of macro-evolution.
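The two partitions can be illustrated on a toy example. The Python sketch below is not the paper's algorithm; it is one straightforward reading of the definitions under stated assumptions: the genealogy is a rooted tree given as nested tuples with leaf names as strings, each individual carries a discrete phenotype label, (E) means every species is the full leaf set of some subtree, (B) means all members of a species share one phenotype, and (A) means individuals in different species never share a phenotype. All names are hypothetical.

```python
def leaves(tree):
    """Leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return [x for child in tree for x in leaves(child)]


def coarsest_homotypic_clades(tree, phenotype):
    """Coarsest partition into clades that are phenotypically uniform: (B) and (E).
    Keeps a clade whole as soon as all its leaves share one phenotype."""
    tips = leaves(tree)
    if len({phenotype[t] for t in tips}) == 1:
        return [tips]
    return [b for child in tree for b in coarsest_homotypic_clades(child, phenotype)]


def finest_heterotypic_clades(tree, phenotype):
    """Finest partition into clades with no phenotype shared across species: (A) and (E).
    Splits a clade only if its child subtrees have disjoint phenotype sets."""
    if isinstance(tree, str):
        return [[tree]]
    seen = set()
    for child in tree:
        types = {phenotype[t] for t in leaves(child)}
        if seen & types:
            # a phenotype spans two subtrees: the clade cannot be split
            return [leaves(tree)]
        seen |= types
    return [b for child in tree for b in finest_heterotypic_clades(child, phenotype)]


# Hypothetical genealogy and phenotypes: c and d share phenotype "y"
# although they sit on opposite sides of the root.
tree = ((("a", "b"), "c"), ("d", "e"))
phenotype = {"a": "x", "b": "x", "c": "y", "d": "y", "e": "z"}

by_BE = coarsest_homotypic_clades(tree, phenotype)   # [['a', 'b'], ['c'], ['d'], ['e']]
by_AE = finest_heterotypic_clades(tree, phenotype)   # [['a', 'b', 'c', 'd', 'e']]
```

In this example the (B)+(E) partition separates c from d even though they look alike, violating (A), while the (A)+(E) partition lumps everything into one heterogeneous species, violating (B). This is a small instance of the three properties failing to hold simultaneously.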



The impact of selection, gene conversion, and biased sampling on the assessment of microbial demography

Recent studies have linked demographic changes and epidemiological patterns in bacterial populations using coalescent-based approaches. We identified 26 studies using skyline plots and found that 21 inferred overall population expansion. This surprising result led us to analyze the impact of natural selection, recombination (gene conversion), and sampling biases on demographic inference using skyline plots and site frequency spectra (SFS). Forward simulations based on biologically relevant parameters from Escherichia coli populations showed that theoretical arguments on the detrimental impact of recombination and especially natural selection on the reconstructed genealogies cannot be ignored in practice. In fact, both processes systematically lead to spurious interpretations of population expansion in skyline plots (and in SFS for selection). Weak purifying selection, and especially positive selection, had important effects on skyline plots, showing patterns akin to those of population expansions. State-of-the-art techniques to remove recombination further amplified these biases. We simulated three common sampling biases in microbiological research: uniform, clustered, and mixed sampling. Alone, or together with recombination and selection, they further misled demographic inferences, producing almost any possible skyline shape or SFS. Interestingly, sampling sub-populations also affected skyline plots and SFS, because the coalescent rates of populations and their sub-populations had different distributions. This study suggests that extreme caution is needed when inferring demographic changes solely from reconstructed genealogies. We suggest that the development of novel sampling strategies and the joint analysis of diverse population genetic methods are strictly necessary to estimate demographic changes in populations where selection, recombination, and biased sampling are present.
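For readers unfamiliar with the site frequency spectrum, the following Python sketch shows how an unfolded SFS can be computed from a small alignment and compared with its expectation under the standard neutral model with constant population size, where the expected number of sites carrying k derived copies is proportional to 1/k. The input format (rows are samples, columns are sites, 1 marks the derived allele), the toy data and the function names are assumptions made for illustration; this is not the simulation pipeline used in the study.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k-1 counts sites where exactly k of the n samples
    carry the derived allele (1 <= k <= n-1). Monomorphic sites are ignored."""
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):   # iterate over columns (sites)
        k = sum(site)
        if 0 < k < n:
            sfs[k - 1] += 1
    return sfs


def neutral_expectation(n, segregating_sites):
    """Expected SFS under the standard neutral coalescent with constant size,
    scaled to a given number of segregating sites: proportional to 1/k."""
    harmonic = sum(1.0 / k for k in range(1, n))
    return [segregating_sites / (k * harmonic) for k in range(1, n)]


# Hypothetical toy alignment: 4 samples, 5 sites.
haplotypes = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
observed = site_frequency_spectrum(haplotypes)          # [2, 2, 0]
expected = neutral_expectation(len(haplotypes), sum(observed))
```

An excess of low-frequency variants relative to the 1/k expectation is the classical SFS signature of population expansion, which is why processes that distort genealogies in the same direction can be mistaken for demographic growth.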
