Genomes contain mosaics of discordant evolutionary histories. Although genome-wide data are routinely used in discordance-aware phylogenomic analyses, modeling and scalability limitations mean that current practice leaves out large portions of each genome. Analyzing whole-genome alignments with existing methods is impractical. Furthermore, the most scalable discordance-aware methods require trees from recombination-free, unlinked loci as input, making them unsuitable for analyzing the entire genome. As more high-quality genomes become available across the tree of life, we urgently need methods that can infer the species tree from multiple genome alignments, using all reliably aligned sites while accounting for discordance across the genome. Here, we introduce CASTER, a site-based method that is statistically consistent under incomplete lineage sorting (ILS) and scalable enough to analyze hundreds of mammalian genomes. We show, both in simulations and in applications to real data, that CASTER is scalable and accurate and that its per-site scores can reveal interesting patterns of evolution across the genome.
CASTER: Direct species tree inference from whole-genome alignments