SMILE

Stochastic Models for the Inference of Life Evolution


Presentation

SMILE is an interdisciplinary research group bringing together mathematicians, bioinformaticians and biologists.
SMILE is affiliated with the Institut de Biologie de l'ENS in Paris.
SMILE is hosted within the CIRB (Center for Interdisciplinary Research in Biology) at the Collège de France.
SMILE is supported by the Collège de France and the CNRS.
Also visit our homepage at the CIRB.

Directions

SMILE is hosted at the Collège de France in the Latin Quarter of Paris. To reach us, go to 11 place Marcelin Berthelot (RER B stations Luxembourg or Saint-Michel).
Our working spaces are rooms 107, 121 and 122 on the first floor of building B1 (ask us for the code). Building B1 faces you as you exit the through-hall behind Champollion's statue.

Contact

You can reach us by email at amaury.lambert - at - college-de-france.fr, guillaume.achaz - at - college-de-france.fr, or smile - at - listes.upmc.fr.

Spotlight

Publication

2022

MOLD, a novel software to compile accurate and reliable DNA diagnoses for taxonomic descriptions

DNA data are increasingly used for phylogenetic inference and for taxon delimitation and identification, but rarely for the formal description of taxa, despite their indisputable merits in taxonomy. Uncertainty about the robustness of DNA diagnoses, however, remains a major impediment to their use. We have developed a new program, MOLD, that identifies diagnostic nucleotide combinations (DNCs) in DNA sequence alignments for selected taxa; these combinations can be used to provide formal diagnoses of the taxa. To test the robustness of DNA diagnoses, we carry out iterated haplotype subsampling for selected query species in published DNA data sets of varying complexity. We quantify the reliability of a diagnosis by diagnosing each query subsample and then checking whether this diagnosis remains valid against the entire data set. We show that widely used types of diagnostic DNA characters are often absent for a query taxon or are not sufficiently reliable. We therefore propose a new type of DNA diagnosis, termed "redundant DNC" (rDNC), which takes unsampled genetic diversity into account and is a much more reliable descriptor of a taxon. MOLD retrieves rDNCs for all but two species in the analysed data sets, even in those comprising hundreds of species. MOLD shows unparalleled efficiency on large DNA data sets and is the only available software able to compile DNA diagnoses that meet predefined reliability criteria.
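As an illustration only, the sketch below shows one way to search for a DNC in an alignment and to run the subsampling reliability test described above. It is not MOLD's algorithm: it swaps in a simple greedy set-cover heuristic, and the function names, the list-of-strings alignment format and the treatment of ambiguous symbols (anything other than A, C, G, T is taken as compatible with any base) are assumptions. It does not implement the redundant DNCs (rDNCs) proposed in the paper.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: a DNC is a list of (alignment column, nucleotide) pairs.
DNC = List[Tuple[int, str]]
UNAMBIGUOUS = set("ACGT")


def compatible(seq: str, dnc: DNC) -> bool:
    """True if seq could carry the DNC; ambiguous symbols (N, gaps, ...) match any base."""
    return all(seq[i] == b or seq[i] not in UNAMBIGUOUS for i, b in dnc)


def is_diagnostic(dnc: DNC, query: List[str], others: List[str]) -> bool:
    """Every query sequence is compatible with the DNC and no other sequence is."""
    return all(compatible(s, dnc) for s in query) and not any(compatible(s, dnc) for s in others)


def fixed_sites(query: List[str]) -> Dict[int, str]:
    """Columns where all query sequences share one unambiguous base."""
    sites = {}
    for i in range(len(query[0])):
        bases = {s[i] for s in query}
        if len(bases) == 1 and bases <= UNAMBIGUOUS:
            sites[i] = bases.pop()
    return sites


def greedy_dnc(query: List[str], others: List[str]) -> Optional[DNC]:
    """Greedy set cover: add the fixed query site that excludes the most remaining other sequences."""
    sites = fixed_sites(query)
    remaining = set(range(len(others)))
    dnc: DNC = []
    while remaining:
        best, best_cover = None, set()
        for i, b in sites.items():
            # an other sequence is excluded at column i only by a different unambiguous base
            cover = {j for j in remaining if others[j][i] in UNAMBIGUOUS and others[j][i] != b}
            if len(cover) > len(best_cover):
                best, best_cover = i, cover
        if best is None:
            return None  # some other sequence cannot be told apart from the query
        dnc.append((best, sites.pop(best)))
        remaining -= best_cover
    return sorted(dnc)


def reliability(query: List[str], others: List[str], sample_size: int,
                n_iter: int = 100, seed: int = 0) -> float:
    """Fraction of query subsamples whose DNC stays diagnostic against the full data set."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(n_iter):
        subsample = rng.sample(query, sample_size)
        dnc = greedy_dnc(subsample, others)
        if dnc is not None and is_diagnostic(dnc, query, others):
            valid += 1
    return valid / n_iter


if __name__ == "__main__":
    query = ["ACGTAC", "ACGTAC", "ACGAAC"]
    others = ["ACCTAC", "TCGTAC"]
    print(greedy_dnc(query, others))  # [(0, 'A'), (2, 'G')]
```

A greedy cover keeps the combination short but, unlike the rDNCs described above, it does not deliberately add redundant sites to guard against unsampled diversity, which is exactly what the subsampling test is meant to expose.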

Publication

2016

Testing for Independence between Evolutionary Processes

Evolutionary events co-occurring along phylogenetic trees usually point to complex adaptive phenomena, sometimes implicating epistasis. While a number of methods have been developed to account for the co-occurrence of events on the same internal or external branch of an evolutionary tree, there is a need to account for the larger diversity of possible relative positions of events in a tree. Here we propose a method to quantify the extent to which two or more evolutionary events are associated on a phylogenetic tree. The method is applicable to any discrete character, such as substitutions within a coding sequence or gains/losses of a biological function. It uses a general approach to statistically test for significant associations between events along the tree, encompassing both events that are inseparable on the same branch and events that are genealogically ordered on different branches. It assumes that the phylogeny and the mapping of events onto branches are known without error. We address this problem from a statistical viewpoint, using a linear algebra representation of the localization of the evolutionary events on the tree. We compute the full probability distribution of the number of paired events occurring on the same branch or on different branches of the tree, under a null model of independence in which each type of event occurs at a constant rate uniformly in the phylogenetic tree. The strengths and weaknesses of the method are assessed via simulations; we then apply the method to explore the loss of cell motility in intracellular pathogens.
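The null model can be illustrated with a small Monte Carlo sketch. Under a constant-rate process, conditional on the number of events of each type, each event falls on a given branch with probability proportional to that branch's length. The code below samples events this way and compares the observed number of same-branch pairs with the simulated distribution. It swaps in simulation for the exact linear-algebra computation described above, and the dictionary tree encoding (branch -> (parent branch, length)) and all function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical tree encoding: branch id -> (parent branch id, or None below the root; branch length).
Tree = Dict[str, Tuple[Optional[str], float]]


def ancestor_sets(tree: Tree) -> Dict[str, Set[str]]:
    """For each branch, the branches on its path to the root (excluding itself)."""
    anc = {}
    for b in tree:
        up, parent = set(), tree[b][0]
        while parent is not None:
            up.add(parent)
            parent = tree[parent][0]
        anc[b] = up
    return anc


def pair_counts(anc: Dict[str, Set[str]], events_a: List[str],
                events_b: List[str]) -> Tuple[int, int]:
    """(A-B pairs on the same branch, pairs whose A event lies on a branch ancestral to the B event)."""
    same = sum(1 for a in events_a for b in events_b if a == b)
    ordered = sum(1 for a in events_a for b in events_b if a in anc[b])
    return same, ordered


def null_distribution(tree: Tree, n_a: int, n_b: int, n_sim: int = 2000,
                      seed: int = 0) -> List[Tuple[int, int]]:
    """Simulate pair counts when both event types land independently and uniformly along the tree."""
    rng = random.Random(seed)
    anc = ancestor_sets(tree)
    branches = list(tree)
    lengths = [tree[b][1] for b in branches]  # branch drawn with probability proportional to length
    sims = []
    for _ in range(n_sim):
        events_a = rng.choices(branches, weights=lengths, k=n_a)
        events_b = rng.choices(branches, weights=lengths, k=n_b)
        sims.append(pair_counts(anc, events_a, events_b))
    return sims


def p_value_same_branch(tree: Tree, events_a: List[str], events_b: List[str],
                        n_sim: int = 2000, seed: int = 0) -> float:
    """One-sided Monte Carlo p-value for an excess of same-branch (A, B) pairs."""
    observed, _ = pair_counts(ancestor_sets(tree), events_a, events_b)
    sims = null_distribution(tree, len(events_a), len(events_b), n_sim, seed)
    extreme = sum(1 for same, _ in sims if same >= observed)
    return (extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    tree = {"a": (None, 1.0), "b": (None, 1.0), "c": ("a", 0.5), "d": ("a", 0.5)}
    print(pair_counts(ancestor_sets(tree), ["a", "c"], ["c", "d"]))  # (1, 2)
    print(p_value_same_branch(tree, ["c", "c"], ["c"]))
```

The same simulated samples also give a null distribution for the genealogically ordered count (the second element of each tuple), so the test for ancestor-descendant associations follows the same pattern.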

Upcoming seminars

Resources

Collège de France room schedule.
Collège de France intranet.