SMILE

Stochastic Models for the Inference of Life Evolution

Repseek, a tool to retrieve approximate repeats from large DNA sequences

Achaz, G., Boyer, F., Rocha, E. P. C., Viari, A., Coissac, E.

Bioinformatics (Oxford, England)

2007

Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects those approximate repeats. Our method is computationally efficient enough to handle large sequences and is flexible enough to account for influencing factors, such as sequence-composition biases both at the seed detection and alignment levels. AVAILABILITY: http://wwwabi.snv.jussieu.fr/public/RepSeek/

BibTeX

@article{achaz_repseek_2007,
Author = {Achaz, Guillaume and Boyer, Frédéric and Rocha,
Eduardo P. C. and Viari, Alain and Coissac, Eric},
Title = {Repseek, a tool to retrieve approximate repeats from
large {DNA} sequences},
Journal = {Bioinformatics (Oxford, England)},
Volume = {23},
Number = {1},
Pages = {119--121},
Keywords = {Algorithms, Base Sequence, DNA, Information Storage
and Retrieval, Information Systems, Sequence Analysis,
DNA, Software, Tandem Repeat Sequences},
abstract = {Chromosomes or other long DNA sequences contain many
highly similar repeated sub-sequences. While there are
efficient methods for detecting strict repeats or
detecting already characterized repeats, there is no
software available for detecting approximate repeats in
large DNA sequences allowing for weighted substitutions
and indels in a coherent statistical framework. Here,
we present an implementation of a two-steps method
(seed detection followed by their extension) that
detects those approximate repeats. Our method is
computationally efficient enough to handle large
sequences and is flexible enough to account for
influencing factors, such as sequence-composition
biases both at the seed detection and alignment levels.
AVAILABILITY:
http://wwwabi.snv.jussieu.fr/public/RepSeek/},
doi = {10.1093/bioinformatics/btl519},
issn = {1367-4811},
language = {eng},
month = jan,
pmid = {17038345},
year = 2007
}

Link to the article

Access the article via its DOI.