Detection of bias in simulated and real data
Patrick Kück, Bernhard Misof, Johann-Wolfgang
Universität Bonn, Germany
We analysed the robustness and efficiency of maximum likelihood (ML) with respect to different subclasses of the typical 4-taxon long-branch case in multiple-taxon topologies. The analyses were performed over a broad range of branch length conditions and model parameters. Although a mixed-distribution model (Gamma+I) fits our data much better than a Gamma distribution or a proportion of invariant sites alone, our results show that for some topologies and branch lengths the reconstruction success of ML remains low for alignments of 100,000 base positions, even when the model assumptions are correct. Thus, using ML does not eliminate the risk of obtaining a wrong topology; this risk depends strongly on the model parameters and on the branch length relations in the true topology to be reconstructed.
To identify taxa that are most likely to be misplaced in trees and that negatively influence the tree-likeness of the data, we developed AliGROOVE, a new tool to visualize the quality of sequence similarity in multiple alignments. AliGROOVE summarises site scores of sequence similarity profiles over the whole alignment length for each pairwise comparison and translates the resulting scoring distances into a similarity matrix. We used simulated data to test whether this approach is sensitive enough to detect single taxa or groups of taxa that are ambiguously aligned. Additionally, we applied AliGROOVE to empirical data. The results of these tests are then related to the observations from our simulated data analyses.
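
The general idea of condensing per-site scores from every pairwise comparison into a similarity matrix can be illustrated with a minimal sketch. This is not AliGROOVE's scoring scheme: the +1/-1 site score, the function names (pairwise_score, similarity_matrix, mean_similarity) and the toy alignment are all assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_score(seq_a, seq_b, gap="-"):
    """Mean per-site score in [-1, 1] for two aligned sequences.

    Assumed toy scoring (not AliGROOVE's): +1 for identical characters,
    -1 for a mismatch or a gap in only one sequence. Sites where both
    sequences have a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # uninformative site for this pair
        total += 1 if a == b else -1
        compared += 1
    return total / compared if compared else 0.0


def similarity_matrix(alignment):
    """Symmetric taxon-by-taxon matrix of pairwise scores (nested dicts)."""
    names = list(alignment)
    matrix = {n: {m: 1.0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        score = pairwise_score(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = score
    return matrix


def mean_similarity(matrix):
    """Average score of each taxon against all other taxa."""
    return {
        name: sum(v for other, v in row.items() if other != name) / (len(row) - 1)
        for name, row in matrix.items()
    }


# Hypothetical toy alignment: taxon D is deliberately divergent.
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTTCGT",
    "D": "TGCA-GTA",
}
matrix = similarity_matrix(toy)
means = mean_similarity(matrix)
most_divergent = min(means, key=means.get)
```

In this sketch, the taxon with the lowest mean similarity to all others is flagged as the candidate most likely to be problematic; a real analysis would rely on the tool's own profile-based scores rather than simple identity.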