SMILE

Stochastic Models for the Inference of Life Evolution

Presentation

SMILE is an interdisciplinary research group bringing together mathematicians, bioinformaticians and biologists.
SMILE is affiliated with the Institut de Biologie de l'ENS, in Paris.
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
Also visit our homepage at CIRB.

Directions

SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (stations Luxembourg or Saint-Michel on RER B).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 faces you as you exit the walk-through hall behind Champollion's statue.

Contact

You can reach us by email at (amaury.lambert - at - college-de-france.fr), (guillaume.achaz - at - college-de-france.fr) or (smile - at - listes.upmc.fr).

Light on

Publication

2018

Ranked Tree Shapes, Nonrandom Extinctions, and the Loss of Phylogenetic Diversity

Phylogenetic diversity (PD) is a measure of the evolutionary legacy of a group of species, which can be used to define conservation priorities. It has been shown that a large loss of species diversity can sometimes lead to a much smaller loss of PD, depending on the topology of the species tree and on the distribution of its branch lengths. However, the rate of decrease of PD strongly depends on the relative depths of the nodes in the tree and on the order in which species become extinct. We introduce a new, sampling-consistent, three-parameter model generating random trees with covarying topology, clade relative depths and clade relative extinction risks. This model can be seen as an extension of Aldous' one-parameter splitting model ($\beta$, which controls tree balance) with two additional parameters: a new parameter $\alpha$ quantifying the correlation between the richness of a clade and its relative depth, and a parameter $\eta$ quantifying the correlation between the richness of a clade and its frequency (relative abundance or range), taken herein as a proxy for its overall extinction risk. We show on simulated phylogenies that loss of PD depends on the combined effect of all three parameters, $\beta$, $\alpha$ and $\eta$. In particular, PD may decrease as fast as species diversity when high extinction risks are clustered within small, old clades, corresponding to a parameter range that we term the 'thin ice zone' ($\beta<-1$ or $\alpha<0$; $\eta>1$). Besides, when high extinction risks are clustered within large clades, the loss of PD can be higher in trees that are more balanced ($\beta>0$), in contrast to the predictions of earlier studies based on simpler models. We propose a Monte-Carlo algorithm, tested on simulated data, to infer all three parameters. Applying it to a real dataset comprising 120 bird clades (class Aves) with known range sizes, we show that parameter estimates fall close to the 'thin ice zone': the combination of their ranked tree shape and nonrandom extinction risks makes them prone to a sudden collapse of PD.
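
To make the notion of PD loss concrete, here is a minimal Python sketch. It is an illustration only, not the model or the inference algorithm of the paper. It assumes the common rooted definition of PD (total length of the branches connecting the surviving species to the root), and the toy tree `TREE` and the function name `phylogenetic_diversity` are hypothetical.

```python
# Hypothetical toy tree: each node maps to (parent, length of the branch to its parent).
# The root has parent None. Species A and B form a young clade; C sits on a long branch.
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0),
    "A": ("X", 1.0),
    "B": ("X", 1.0),
    "C": ("root", 2.0),
}


def phylogenetic_diversity(tree, species):
    """Total length of the branches connecting `species` to the root.

    Assumption: rooted PD, where the path up to the root is always counted.
    """
    kept = set()  # nodes whose parent branch belongs to the spanning subtree
    for node in species:
        # Walk up towards the root, stopping early on an already counted branch.
        while tree[node][0] is not None and node not in kept:
            kept.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in kept)


full = phylogenetic_diversity(TREE, ["A", "B", "C"])
loss_if_A_extinct = full - phylogenetic_diversity(TREE, ["B", "C"])
loss_if_C_extinct = full - phylogenetic_diversity(TREE, ["A", "B"])
print(full, loss_if_A_extinct, loss_if_C_extinct)
```

In this toy tree, losing one species out of three removes one fifth of PD if the species is A and two fifths if it is C, which shows how the order of extinctions and the depths of the nodes drive the loss of PD.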

Publication

2018

Exchangeable coalescents, ultrametric spaces, nested interval-partitions: A unifying approach

Kingman's representation theorem (Kingman 1978) states that any exchangeable partition of $\mathbb{N}$ can be represented as a paintbox based on a random mass-partition. Similarly, any exchangeable composition (i.e. ordered partition of $\mathbb{N}$) can be represented as a paintbox based on an interval-partition (Gnedin 1997). Our first main result is that any exchangeable coalescent process (not necessarily Markovian) can be represented as a paintbox based on a random non-decreasing process valued in interval-partitions, called nested interval-partition, generalizing the notion of comb metric space introduced by Lambert & Uribe Bravo (2017) to represent compact ultrametric spaces. As a special case, we show that any $\Lambda$-coalescent can be obtained from a paintbox based on a unique random nested interval-partition called $\Lambda$-comb, which is Markovian with explicit semi-group. This nested interval-partition directly relates to the flow of bridges of Bertoin & Le Gall (2003). We also display a particularly simple description of the so-called evolving coalescent by a comb-valued Markov process. Next, we prove that any measured ultrametric space $U$, under mild measure-theoretic assumptions on $U$, is the leaf set of a tree composed of a separable subtree called the backbone, on which are grafted additional subtrees, which act as star-trees from the standpoint of sampling. Displaying this so-called weak isometry requires us to extend the Gromov-weak topology, which was initially designed for separable metric spaces, to non-separable ultrametric spaces. It allows us to show that for any such ultrametric space $U$, there is a nested interval-partition which is 1) indistinguishable from $U$ in the Gromov-weak topology; 2) weakly isometric to $U$ if $U$ has complete backbone; 3) isometric to $U$ if $U$ is complete and separable.
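
As an illustration of the classical paintbox construction that this abstract builds on, here is a minimal Python sketch for a fixed (non-random) mass-partition. It does not implement nested interval-partitions or combs. The function name `paintbox_partition` and its arguments are hypothetical: the masses are laid out as consecutive intervals of $[0,1)$, each integer receives an independent uniform variable, integers falling into the same interval share a block, and integers falling into the leftover mass (the dust) form singletons.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def paintbox_partition(masses, n, rng):
    """Sample a partition of {0, ..., n-1} from a paintbox.

    `masses` is a list of non-negative block masses with sum at most 1
    (an assumption of this sketch; it is not checked here).
    """
    bounds = list(accumulate(masses))  # right endpoints of the intervals
    blocks = {}
    for i in range(n):
        u = rng.random()  # uniform on [0, 1)
        k = bisect_right(bounds, u)  # index of the interval containing u
        # If u lies beyond the last interval, it fell into the dust: singleton block.
        key = k if k < len(masses) else ("dust", i)
        blocks.setdefault(key, []).append(i)
    return sorted(blocks.values())


rng = random.Random(0)  # fixed seed, for reproducibility only
print(paintbox_partition([0.5, 0.3], 10, rng))
```

With `masses=[1.0]` every integer lands in the same block, and with an empty list every integer is a singleton, which are the two extreme exchangeable partitions.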

Publication

2016

Testing for Independence between Evolutionary Processes

Evolutionary events co-occurring along phylogenetic trees usually point to complex adaptive phenomena, sometimes implicating epistasis. While a number of methods have been developed to account for co-occurrence of events on the same internal or external branch of an evolutionary tree, there is a need to account for the larger diversity of possible relative positions of events in a tree. Here we propose a method to quantify to what extent two or more evolutionary events are associated on a phylogenetic tree. The method is applicable to any discrete character, like substitutions within a coding sequence or gains/losses of a biological function. Our method uses a general approach to statistically test for significant associations between events along the tree, which encompasses both events inseparable on the same branch, and events genealogically ordered on different branches. It assumes that the phylogeny and the mapping of branches is known without errors. We address this problem from the statistical viewpoint by a linear algebra representation of the localization of the evolutionary events on the tree. We compute the full probability distribution of the number of paired events occurring in the same branch or in different branches of the tree, under a null model of independence where each type of event occurs at a constant rate uniformly in the phylogenetic tree. The strengths and weaknesses of the method are assessed via simulations; we then apply the method to explore the loss of cell motility in intracellular pathogens.
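
The null model can be illustrated with a deliberately simplified Python sketch. It is not the method of the paper, which computes the exact distribution and also handles genealogically ordered events on different branches. Here we assume that the numbers of events of each type are fixed, place them independently with probability proportional to branch length, count only pairs falling on the same branch, and replace the exact computation with Monte Carlo simulation. All function names are hypothetical.

```python
import random


def same_branch_pairs(branch_lengths, n1, n2, rng):
    """Place n1 and n2 events independently, proportionally to branch length,
    and count the (type 1, type 2) pairs that share a branch."""
    branches = list(range(len(branch_lengths)))
    ev1 = rng.choices(branches, weights=branch_lengths, k=n1)
    ev2 = rng.choices(branches, weights=branch_lengths, k=n2)
    return sum(ev1.count(b) * ev2.count(b) for b in branches)


def expected_same_branch_pairs(branch_lengths, n1, n2):
    """Exact expectation under this null: n1 * n2 * sum of squared length fractions."""
    total = sum(branch_lengths)
    return n1 * n2 * sum((length / total) ** 2 for length in branch_lengths)


def p_value(observed, branch_lengths, n1, n2, n_sim=10000, seed=0):
    """Monte Carlo p-value for observing at least `observed` same-branch pairs."""
    rng = random.Random(seed)
    hits = sum(
        same_branch_pairs(branch_lengths, n1, n2, rng) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)  # add-one correction avoids a p-value of 0


lengths = [1.0, 1.0, 1.0, 1.0]  # hypothetical branch lengths
print(expected_same_branch_pairs(lengths, 2, 3))
print(p_value(4, lengths, 2, 3))
```

A small p-value would suggest that the two types of events co-occur on the same branches more often than expected under independence, within the limits of the simplifying assumptions above.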

Upcoming seminars

Resources

Room schedule of the Collège de France.
Collège de France intranet.