SMILE

Stochastic Models for the Inference of Life Evolution


Presentation

SMILE is an interdisciplinary research group that brings together mathematicians, bioinformaticians and biologists.
SMILE is affiliated with the Institut de Biologie de l'ENS, in Paris.
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at Collège de France.
SMILE is supported by Collège de France and CNRS.
See also our homepage at CIRB.

Directions

SMILE is hosted at Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (stations Luxembourg or Saint-Michel on RER B).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the door code). Building B1 faces you as you exit the hall you walk through behind Champollion's statue.

Contact

You can reach us by email at amaury.lambert - at - college-de-france.fr, guillaume.achaz - at - college-de-france.fr, or smile - at - listes.upmc.fr.

Light on

Publication

2022

MOLD, a novel software to compile accurate and reliable DNA diagnoses for taxonomic descriptions

DNA data are increasingly being used for phylogenetic inference, and taxon delimitation and identification, but scarcely for the formal description of taxa, despite their undisputable merits in taxonomy. The uncertainty regarding the robustness of DNA diagnoses, however, remains a major impediment to their use. We have developed a new program, mold, that identifies diagnostic nucleotide combinations (DNCs) in DNA sequence alignments for selected taxa, which can be used to provide formal diagnoses of these taxa. To test the robustness of DNA diagnoses, we carry out iterated haplotype subsampling for selected query species in published DNA data sets of varying complexity. We quantify the reliability of diagnosis by diagnosing each query subsample and then checking if this diagnosis remains valid against the entire data set. We demonstrate that widely used types of diagnostic DNA characters are often absent for a query taxon or are not sufficiently reliable. We thus propose a new type of DNA diagnosis, termed "redundant DNC" (or rDNC), which takes into account unsampled genetic diversity, and constitutes a much more reliable descriptor of a taxon. mold successfully retrieves rDNCs for all but two species in the analysed data sets, even in those comprising hundreds of species. mold shows unparalleled efficiency in large DNA data sets and is the only available software capable of compiling DNA diagnoses that suit predefined criteria of reliability.
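To make the notion of a diagnostic nucleotide combination concrete, here is a minimal brute-force sketch. It is not the algorithm implemented in mold: the function names (find_dncs, is_valid), the max_size parameter and the toy alignment are all hypothetical, and sequences are assumed to be aligned strings of equal length. A combination is reported when every query sequence carries it and no other sequence carries all of it; is_valid mirrors the check described above, where a diagnosis obtained from a subsample is tested against the entire data set.

```python
from itertools import combinations


def find_dncs(query, others, max_size=2):
    """Return minimal site/nucleotide combinations (as {site: base} dicts)
    shared by all query sequences and carried in full by no other sequence.
    Illustrative brute force only; not mold's search strategy."""
    length = len(query[0])
    # Sites at which the query taxon is fixed for a single nucleotide.
    fixed = {
        pos: query[0][pos]
        for pos in range(length)
        if all(seq[pos] == query[0][pos] for seq in query)
    }
    found = []
    for size in range(1, max_size + 1):
        for sites in combinations(sorted(fixed), size):
            # Keep only minimal combinations: skip supersets of earlier hits.
            if any(set(prev) <= set(sites) for prev in found):
                continue
            # Diagnostic if no other sequence matches at every chosen site.
            if not any(all(seq[p] == fixed[p] for p in sites) for seq in others):
                found.append(sites)
    return [{p: fixed[p] for p in sites} for sites in found]


def is_valid(dnc, query, others):
    """Check a combination against a full data set: every query sequence
    must carry it and no other sequence may carry all of it."""
    def carries(seq):
        return all(seq[p] == base for p, base in dnc.items())
    return all(carries(s) for s in query) and not any(carries(s) for s in others)


if __name__ == "__main__":
    query = ["ACGT", "ACGA"]    # toy haplotypes of the query taxon
    others = ["ATGT", "CCGT"]   # toy haplotypes of all other taxa
    for dnc in find_dncs(query[:1], others):   # diagnose a subsample
        print(dnc, is_valid(dnc, query, others))  # test on the full data
```

In this toy alignment no single site separates the query from the other sequences, so only a two-site combination qualifies, which is the situation that motivates combinations rather than single diagnostic characters.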

Publication

2018

Exchangeable coalescents, ultrametric spaces, nested interval-partitions: A unifying approach

Kingman's representation theorem (Kingman 1978) states that any exchangeable partition of $\mathbb{N}$ can be represented as a paintbox based on a random mass-partition. Similarly, any exchangeable composition (i.e. ordered partition of $\mathbb{N}$) can be represented as a paintbox based on an interval-partition (Gnedin 1997). Our first main result is that any exchangeable coalescent process (not necessarily Markovian) can be represented as a paintbox based on a random non-decreasing process valued in interval-partitions, called nested interval-partition, generalizing the notion of comb metric space introduced by Lambert & Uribe Bravo (2017) to represent compact ultrametric spaces. As a special case, we show that any $\Lambda$-coalescent can be obtained from a paintbox based on a unique random nested interval partition called $\Lambda$-comb, which is Markovian with explicit semi-group. This nested interval-partition directly relates to the flow of bridges of Bertoin & Le Gall (2003). We also display a particularly simple description of the so-called evolving coalescent by a comb-valued Markov process. Next, we prove that any measured ultrametric space $U$, under mild measure-theoretic assumptions on $U$, is the leaf set of a tree composed of a separable subtree called the backbone, on which are grafted additional subtrees, which act as star-trees from the standpoint of sampling. Displaying this so-called weak isometry requires us to extend the Gromov-weak topology, that was initially designed for separable metric spaces, to non-separable ultrametric spaces. It allows us to show that for any such ultrametric space $U$, there is a nested interval-partition which is 1) indistinguishable from $U$ in the Gromov-weak topology; 2) weakly isometric to $U$ if $U$ has complete backbone; 3) isometric to $U$ if $U$ is complete and separable.
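For readers unfamiliar with the paintbox construction, the following minimal sketch illustrates the classical case of a fixed mass-partition, restricted to the first n integers. It does not cover the nested interval-partitions of the paper; the function name paintbox_partition and its parameters are hypothetical. Each integer receives an independent uniform draw on [0, 1); two integers share a block when their draws fall in the same interval of the mass-partition, and a draw landing in the leftover "dust" yields a singleton.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def paintbox_partition(masses, n, rng=None):
    """Sample a partition of {0, ..., n-1} from Kingman's paintbox based on
    a fixed mass-partition (non-negative masses with total at most 1)."""
    if any(m < 0 for m in masses) or sum(masses) > 1 + 1e-12:
        raise ValueError("masses must be non-negative and sum to at most 1")
    rng = rng or random.Random()
    # Right endpoints of consecutive intervals [0, s1), [s1, s1 + s2), ...
    bounds = list(accumulate(masses))
    blocks = {}
    for i in range(n):
        u = rng.random()
        k = bisect_right(bounds, u)  # index of the interval containing u
        # Beyond the last endpoint lies the dust: i becomes a singleton.
        key = ("block", k) if k < len(bounds) else ("dust", i)
        blocks.setdefault(key, []).append(i)
    return sorted(blocks.values())


if __name__ == "__main__":
    # Two intervals of mass 0.5 and 0.3, and dust of mass 0.2.
    print(paintbox_partition([0.5, 0.3], 10, random.Random(0)))
```

Because the draws are independent and identically distributed, relabelling the integers does not change the law of the resulting partition, which is the exchangeability that the representation theorem characterizes.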

Upcoming seminars

Resources

Room schedule of the Collège de France.
Intranet of the Collège de France.