SMILE

Stochastic Models for the Inference of Life Evolution

Neutrality tests for sequences with missing data

Ferretti, L., Raineri, E., Ramos-Onsins, S.

Genetics

2012

Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
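To make the idea of keeping every base and every individual concrete, here is a minimal Python sketch of a Watterson-type estimator in which each site is normalized by its own number of observed sequences. It is an illustration written for this page, not code from the article: the function names, the set of characters treated as missing (`N`, `-`, `?`), and the choice to skip sites with fewer than two observed bases are all assumptions. It returns a per-site value and does not cover the variance or the neutrality tests (Tajima's D, Fay and Wu's H, HKA).

```python
from typing import Sequence

# Assumption: these characters mark a missing base in the alignment.
MISSING = {"N", "-", "?"}


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i (zero when n < 2)."""
    return sum(1.0 / i for i in range(1, n))


def watterson_with_missing(alignment: Sequence[str]) -> float:
    """Per-site Watterson-type estimate for an alignment with missing data.

    Each site x contributes a_{n_x} to the denominator, where n_x is the
    number of sequences with an observed (non-missing) base at that site.
    Sites with fewer than two observed bases carry no information about
    polymorphism and are skipped (an assumption of this sketch).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    segregating = 0
    denominator = 0.0
    for pos in range(length):
        observed = [seq[pos].upper() for seq in alignment
                    if seq[pos].upper() not in MISSING]
        n_x = len(observed)
        if n_x < 2:
            continue
        denominator += harmonic(n_x)
        if len(set(observed)) > 1:
            segregating += 1

    if denominator == 0.0:
        raise ValueError("no site has at least two observed bases")
    return segregating / denominator


if __name__ == "__main__":
    # Hypothetical toy alignment: the third sequence lacks its first base.
    toy = ["ACGT", "ACGA", "NCGT"]
    print(watterson_with_missing(toy))
```

With complete data every site has the same sample size n, so the denominator reduces to L times a_n and the result is the usual Watterson estimate per site.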

BibTeX

@article{ferretti_neutrality_2012,
Author = {Ferretti, Luca and Raineri, Emanuele and Ramos-Onsins,
Sebastian},
Title = {Neutrality tests for sequences with missing data},
Journal = {Genetics},
Volume = {191},
Number = {4},
Pages = {1397--1401},
abstract = {Missing data are common in DNA sequences obtained
through high-throughput sequencing. Furthermore,
samples of low quality or problems in the experimental
protocol often cause a loss of data even with
traditional sequencing technologies. Here we propose
modified estimators of variability and neutrality tests
that can be naturally applied to sequences with missing
data, without the need to remove bases or individuals
from the analysis. Modified statistics include the
Watterson estimator θW, Tajima's D, Fay and Wu's H,
and HKA. We develop a general framework to take missing
data into account in frequency spectrum-based
neutrality tests and we derive the exact expression for
the variance of these statistics under the neutral
model. The neutrality tests proposed here can also be
used as summary statistics to describe the information
contained in other classes of data like DNA
microarrays.},
doi = {10.1534/genetics.112.139949},
issn = {1943-2631},
language = {eng},
month = aug,
pmcid = {PMC3416018},
pmid = {22661328},
year = 2012
}

Link to the article

Access the article via its DOI.