Stochastic Models for the Inference of Life Evolution

SMILE | Stochastic Models for the Inference of Life Evolution | Collège de France


SMILE is an interdisciplinary research group gathering mathematicians, bioinformaticians and biologists.
SMILE is affiliated to the Institut de Biologie de l'ENS, in Paris.
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Also visit our homepage at CIRB.


SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (stations Luxembourg or Saint-Michel on RER B).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 is directly in front of you as you exit the pass-through hall behind Champollion's statue.


You can reach us by email (amaury.lambert - at - ; (guillaume.achaz - at - or (smile - at -

Light on



A minimal yet flexible likelihood framework to assess correlated evolution

An evolutionary process is reflected in the sequence of changes of any trait (e.g. morphological, molecular) through time. Yet a better understanding of evolution would come from characterizing correlated evolution, that is, cases where two or more evolutionary processes interact. Many previously developed parametric methods require significant computing time because they rely on the estimation of many parameters. Here we propose a minimal likelihood framework modelling the joint evolution of two traits on a known phylogenetic tree. The type and strength of correlated evolution are characterized by a few parameters tuning the mutation rates of each trait and the interdependencies between these rates. The framework can be applied to study any discrete trait or character, ranging from nucleotide substitution to the gain or loss of a biological function. More specifically, it can be used to 1) test for independence between two evolutionary processes, 2) identify the type of interaction between them and 3) estimate parameter values of the most likely model of interaction. In its current implementation, the method takes as input a phylogenetic tree together with discrete evolutionary events mapped onto it, and then maximizes the likelihood for one or several chosen scenarios. The strengths and limits of the method, as well as its relative power when compared to a few other methods, are assessed using both simulations and data from 16S rRNA sequences in a sample of 54 γ-enterobacteria. We show that even with datasets of fewer than 100 species, the method performs well in parameter estimation and in the selection of the evolutionary scenario.
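
To give a flavour of what testing for independence by maximum likelihood can look like, here is a deliberately simplified toy sketch in Python. It is not the implementation described in the abstract. It assumes (our assumption) that events of the second trait have already been mapped onto branches, that branches have been split into those lying before and those lying after an event of the first trait, and that events occur as a Poisson process along branches. The function names and the summary inputs (event counts and total branch lengths) are hypothetical.

```python
import math

def poisson_loglik(n_events, rate, length):
    """Log-likelihood of observing n_events from a Poisson process
    with the given rate over a total branch length."""
    mean = rate * length
    if n_events == 0:
        return -mean
    if mean <= 0.0:
        return float("-inf")
    return n_events * math.log(mean) - mean - math.lgamma(n_events + 1)

def independence_lrt(n_before, len_before, n_after, len_after):
    """Likelihood-ratio test of 'one rate everywhere' (independence)
    against 'a different rate after an event of the other trait'.

    Returns (statistic, p_value, rate_before, rate_after).
    """
    # Null model: a single rate, estimated from all branches together.
    rate_null = (n_before + n_after) / (len_before + len_after)
    ll_null = (poisson_loglik(n_before, rate_null, len_before)
               + poisson_loglik(n_after, rate_null, len_after))

    # Alternative model: one maximum-likelihood rate per branch class.
    rate_before = n_before / len_before
    rate_after = n_after / len_after
    ll_alt = (poisson_loglik(n_before, rate_before, len_before)
              + poisson_loglik(n_after, rate_after, len_after))

    # The models differ by one parameter, so the statistic is compared
    # to a chi-square distribution with 1 degree of freedom, whose
    # survival function is erfc(sqrt(x / 2)).
    statistic = max(0.0, 2.0 * (ll_alt - ll_null))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, rate_before, rate_after

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(independence_lrt(2, 20.0, 12, 10.0))
```

The chi-square approximation is only asymptotic, so with few events the p-value of this toy test should be read with caution.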



The reconstructed tree in the lineage-based model of protracted speciation

A popular line of research in evolutionary biology is the use of time-calibrated phylogenies for the inference of diversification processes. This requires computing the likelihood of a given ultrametric tree as the reconstructed tree produced by a given model of diversification. Etienne and Rosindell (Syst Biol 61(2):204–213, 2012) proposed a lineage-based model of diversification, called protracted speciation, where species remain incipient for a random duration before becoming good species, and showed that this can explain the slowdown in lineage accumulation observed in real phylogenies. However, they were unable to provide a general likelihood formula. Here, we present a likelihood formula for protracted speciation models, where the rates at which species turn good or become extinct can depend both on their age and on time. Our only restrictive assumption is that the speciation rate does not depend on species status. Our likelihood formula uses a new technique, based on the contour of the phylogenetic tree and first developed by Lambert (Ann Probab 38(1):348–395, 2010). We consider the reconstructed trees spanned by all extant species, by all good extant species, or by all representative species, which are either good extant species or incipient species representative of some good extinct species. Specifically, we prove that each of these trees is a coalescent point process, that is, a planar, ultrametric tree where the coalescence times between two consecutive tips are independent, identically distributed random variables. We characterize the common distribution of these coalescence times in some biologically meaningful special cases for which the likelihood reduces to an elegant analytical formula or becomes numerically tractable.
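
The definition of a coalescent point process given above can be illustrated with a short Python sketch. It draws independent, identically distributed coalescence times between consecutive tips until one of them exceeds the stem age, then recovers the coalescence time of any pair of tips as the maximum of the values lying between them. The exponential distribution used in the example is a placeholder chosen for illustration only. It is not the distribution derived in the paper, and the function names are hypothetical.

```python
import random

def sample_cpp(stem_age, draw_depth, rng):
    """Sample a coalescent point process with the given stem age.

    draw_depth(rng) returns one coalescence time. Draws are independent
    and are kept until the first one that exceeds stem_age, which ends
    the tree. depths[k] is the coalescence time between tips k and k+1,
    so the tree has len(depths) + 1 tips.
    """
    depths = []
    while True:
        h = draw_depth(rng)
        if h > stem_age:
            return depths
        depths.append(h)

def coalescence_time(depths, i, j):
    """Coalescence time between tips i and j (0-indexed): the largest
    of the consecutive coalescence times lying between the two tips."""
    if i == j:
        return 0.0
    lo, hi = min(i, j), max(i, j)
    return max(depths[lo:hi])

if __name__ == "__main__":
    rng = random.Random(2024)
    # Placeholder distribution (an assumption): exponential with mean 2.
    depths = sample_cpp(5.0, lambda r: r.expovariate(0.5), rng)
    n_tips = len(depths) + 1
    print("number of tips:", n_tips)
    if n_tips > 1:
        print("first and last tips coalesce at:",
              coalescence_time(depths, 0, n_tips - 1))
```

Because the tree is fully described by the sequence of consecutive coalescence times, its likelihood factorizes over those times, which is what makes this representation convenient.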

Upcoming seminars


Room schedule of the Collège de France.
Collège de France intranet.