Integrative models of genome evolution
Bastien Boussau1,2, Gergely Szollosi1, Laurent Duret1, Manolo Gouy1, Eric Tannier1, Vincent Daubin1
1LBBE, UMR CNRS 5558, Université Lyon 1, Université de Lyon;
2UC Berkeley, United States of America
Species trees are usually built by averaging the phylogenetic signal of at most a few dozen gene families. These gene families are selected because their history appears simple, with no apparent duplications or losses. However, they may have undergone hidden duplication and loss events, in which case their phylogeny may differ from the species phylogeny, and the reconstructed species tree may then differ from the true species tree. In contrast to these common approaches, we propose to model gene family evolution explicitly in the presence of gene duplication and loss, and thus to infer gene family trees and the species tree as separate entities. Importantly, this makes it possible to infer species trees from all gene families in genomes, and to use the phylogenetic information contained in gene duplication and loss events. In this model, each branch of the species tree is associated with its own duplication and loss parameters to accommodate heterogeneity in the processes of genomic evolution. We explain how the likelihood of a species tree and gene family trees can be computed efficiently under such a model, and present its parallel implementation in PHYLDOG, a program able to analyze dozens of species and thousands of gene families simultaneously in a statistical framework. We show that PHYLDOG performs very well on simulated data, and we reveal general trends of genomic evolution by applying it to more than 7000 gene families from 37 whole-genome sequences of mammalian species.
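To make the idea of branch-specific duplication and loss parameters concrete, here is a minimal Python sketch. It is not PHYLDOG's implementation, and the abstract does not specify the exact model. The sketch assumes a textbook linear birth-death process, in which each gene copy duplicates at rate `dup_rate` and is lost at rate `loss_rate`, and computes the probability of observing a given number of copies at the end of one branch that starts with a single copy. The function name, the branch labels and the rate values are hypothetical and chosen only for illustration.

```python
import math


def birth_death_probs(dup_rate, loss_rate, t, max_copies):
    """Return [P(0), P(1), ..., P(max_copies)]: the probability of each gene
    copy number at the end of a branch of length t, starting from one copy,
    under a linear birth-death process (an assumed model, see lead-in)."""
    if dup_rate < 0 or loss_rate < 0 or t < 0:
        raise ValueError("rates and branch length must be non-negative")
    if math.isclose(dup_rate, loss_rate):
        # Critical case: duplication and loss rates are equal.
        alpha = beta = dup_rate * t / (1.0 + dup_rate * t)
    else:
        e = math.exp(-(dup_rate - loss_rate) * t)
        denom = dup_rate - loss_rate * e
        alpha = loss_rate * (1.0 - e) / denom  # probability the lineage is lost
        beta = dup_rate * (1.0 - e) / denom    # geometric ratio for extra copies
    probs = [alpha]
    for n in range(1, max_copies + 1):
        probs.append((1.0 - alpha) * (1.0 - beta) * beta ** (n - 1))
    return probs


# Hypothetical per-branch parameters: each branch carries its own rates,
# mirroring the heterogeneity across branches described above.
branch_params = {
    "branch_A": {"dup_rate": 0.3, "loss_rate": 0.2, "t": 2.0},
    "branch_B": {"dup_rate": 0.05, "loss_rate": 0.4, "t": 1.0},
}

for name, p in branch_params.items():
    probs = birth_death_probs(p["dup_rate"], p["loss_rate"], p["t"], max_copies=3)
    print(name, [round(x, 4) for x in probs])
```

In a full likelihood computation, per-branch quantities of this kind would be combined over the whole species tree and over all gene families. That step is left out here because the abstract does not detail it.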