Stochastic Models for the Inference of Life Evolution



SMILE is an interdisciplinary research group bringing together mathematicians, bioinformaticians and biologists.
SMILE is affiliated with the Institut de Biologie de l'ENS, in Paris.
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Also visit our homepage at CIRB.


SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (stations Luxembourg or Saint-Michel on RER B).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 faces you as you exit the passage hall behind Champollion's statue.


You can reach us by email (amaury.lambert - at - ; (guillaume.achaz - at - or (smile - at -

Light on



MOLD, a novel software to compile accurate and reliable DNA diagnoses for taxonomic descriptions

DNA data are increasingly being used for phylogenetic inference and for taxon delimitation and identification, but scarcely for the formal description of taxa, despite their indisputable merits in taxonomy. The uncertainty regarding the robustness of DNA diagnoses, however, remains a major impediment to their use. We have developed a new program, mold, that identifies diagnostic nucleotide combinations (DNCs) in DNA sequence alignments for selected taxa, which can be used to provide formal diagnoses of these taxa. To test the robustness of DNA diagnoses, we carry out iterated haplotype subsampling for selected query species in published DNA data sets of varying complexity. We quantify the reliability of a diagnosis by diagnosing each query subsample and then checking whether this diagnosis remains valid against the entire data set. We demonstrate that widely used types of diagnostic DNA characters are often absent for a query taxon or are not sufficiently reliable. We thus propose a new type of DNA diagnosis, termed "redundant DNC" (or rDNC), which takes into account unsampled genetic diversity and constitutes a much more reliable descriptor of a taxon. mold successfully retrieves rDNCs for all but two species in the analysed data sets, even in those comprising hundreds of species. mold shows unparalleled efficiency in large DNA data sets and is the only available software capable of compiling DNA diagnoses that suit predefined criteria of reliability.
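The notions of a diagnostic nucleotide combination and of subsampling-based reliability can be illustrated with a short Python sketch. This is not the mold implementation: the function names (`fixed_sites`, `find_dnc`, `subsample_reliability`), the exhaustive search capped at `max_size`, and the toy alignment are all assumptions made for illustration, and rDNCs are not covered. A combination of (site, nucleotide) pairs is treated here as diagnostic when every query sequence carries it and no other sequence in the alignment does.

```python
from itertools import combinations
import random

def fixed_sites(query_seqs):
    """Return {site: nucleotide} for sites where all query sequences agree."""
    first = query_seqs[0]
    return {
        i: first[i]
        for i in range(len(first))
        if all(s[i] == first[i] for s in query_seqs)
    }

def is_diagnostic(combo, other_seqs):
    """True if no non-query sequence matches the combination at every site."""
    return not any(all(s[i] == nt for i, nt in combo) for s in other_seqs)

def find_dnc(query_seqs, other_seqs, max_size=3):
    """Smallest diagnostic combination of (site, nucleotide) pairs, or None.

    Exhaustive search over the sites fixed in the query (hypothetical
    strategy chosen for clarity, not for efficiency).
    """
    candidates = sorted(fixed_sites(query_seqs).items())
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            if is_diagnostic(combo, other_seqs):
                return combo
    return None

def subsample_reliability(query_seqs, other_seqs, sample_size, n_iter=100, seed=0):
    """Fraction of query subsamples whose diagnosis still holds for all query sequences."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(n_iter):
        sub = rng.sample(query_seqs, sample_size)
        combo = find_dnc(sub, other_seqs)
        # The diagnosis fails if some unsampled query haplotype violates it.
        if combo is not None and all(s[i] == nt for i, nt in combo for s in query_seqs):
            valid += 1
    return valid / n_iter

# Toy aligned sequences (made up for this example).
query = ["ACGTA", "ACGTC"]
others = ["ACTTA", "GCGTA", "ACGAA"]
print(find_dnc(query, others))
print(subsample_reliability(query, others, sample_size=1))
```

In the toy data, a subsample containing only the second query haplotype yields a one-site diagnosis that the first haplotype violates, which is the kind of failure the subsampling test is meant to expose.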



Coagulation-transport equations and the nested coalescents

The nested Kingman coalescent describes the dynamics of particles (called genes) contained in larger components (called species), where pairs of species coalesce at constant rate and pairs of genes coalesce at constant rate provided they lie within the same species. We prove that starting from $rn$ species, the empirical distribution of species masses (numbers of genes$/n$) at time $t/n$ converges as $n\to\infty$ to a solution of the deterministic coagulation-transport equation

$$ \partial_t d \ = \ \partial_x ( \psi d ) \ + \ a(t)\left(d\star d - d \right), $$

where $\psi(x) = cx^2$, $\star$ denotes convolution and $a(t)= 1/(t+\delta)$ with $\delta=2/r$. The most interesting case, $\delta =0$, corresponds to an infinite initial number of species. This equation describes the evolution of the distribution of species of mass $x$, where pairs of species can coalesce and each species' mass evolves like $\dot x = -\psi(x)$. We provide two natural probabilistic solutions of the latter IPDE and address in detail the case when $\delta=0$. The first solution is expressed in terms of a branching particle system where particles carry masses behaving as independent continuous-state branching processes. The second one is the law of the solution to the following McKean-Vlasov equation

$$ dx_t \ = \ - \psi(x_t) \,dt \ + \ v_t\,\Delta J_t $$

where $J$ is an inhomogeneous Poisson process with rate $1/(t+\delta)$ and $(v_t; t\geq0)$ is a family of independent random variables such that $\mathcal L(v_t) = \mathcal L(x_t)$. We show that there is a unique solution to this equation and we construct this solution with the help of a marked Brownian coalescent point process. When $\psi(x)=x^\gamma$, we show the existence of a self-similar solution for the PDE which, when $\gamma=2$, relates to the speed of coming down from infinity of the nested Kingman coalescent.
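The finite-size process described at the start of this abstract can be simulated directly. The sketch below is a minimal Gillespie-type simulation in Python, given only as an illustration: the function name `simulate_nested_kingman`, its parameters, the default rates of 1 per pair, and the choice of starting every species with the same number of gene lineages are assumptions not specified above. The rescaling of masses by $1/n$ and of time by $1/n$ is left to the caller.

```python
import random

def simulate_nested_kingman(n_species, genes_per_species, t_max,
                            species_rate=1.0, gene_rate=1.0, seed=0):
    """Return the number of gene lineages in each species at time t_max.

    Each pair of species merges at rate `species_rate`; each pair of gene
    lineages inside the same species merges at rate `gene_rate`.
    """
    rng = random.Random(seed)
    genes = [genes_per_species] * n_species  # gene lineages per species
    t = 0.0
    while True:
        s = len(genes)
        rate_species = species_rate * s * (s - 1) / 2
        gene_rates = [gene_rate * g * (g - 1) / 2 for g in genes]
        total = rate_species + sum(gene_rates)
        if total == 0:
            break  # one species holding one lineage: nothing left to merge
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_max:
            break
        if rng.random() * total < rate_species:
            # Species coalescence: pool the gene lineages of two species.
            i, j = sorted(rng.sample(range(s), 2))
            genes[i] += genes[j]
            genes.pop(j)
        else:
            # Gene coalescence within a species chosen proportionally to its rate.
            k = rng.choices(range(s), weights=gene_rates)[0]
            genes[k] -= 1
    return genes

# Example: 50 species with 20 gene lineages each, observed at a small time.
masses = simulate_nested_kingman(50, 20, t_max=0.05)
print(len(masses), sum(masses))
```

Running the function for increasing `n_species` and plotting a histogram of the rescaled masses is one way to get a feel for the empirical distribution whose limit the coagulation-transport equation describes.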

Upcoming seminars


Room schedule of the Collège de France.
Intranet of the Collège de France.