Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments [electronic resource].
- Washington, D.C. : United States. Dept. of Energy, 2006. and Oak Ridge, Tenn. : Distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy.
- Additional Creators:
- United States. Department of Energy, National Institutes of Health (U.S.), and United States. Department of Energy. Office of Scientific and Technical Information
- Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusions: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
- Published through SciTech Connect., 08/14/2006., "lbnl--62553", ": 400412000", BMC Bioinformatics 7 376 FT, Moses, Alan M.; Pollard, Daniel A.; Eisen, Michael B.; Iyer, Venky N., and Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
- Funding Information:
- DE-AC02-05CH11231, NIHR01-HG002779 02, and GHEIS2
catkey: 13801745