Papillomaviruses (PVs) are a ubiquitous group of DNA viruses that infect a wide range of vertebrate hosts, including humans. It is well established that infection with some PV types (e.g., HPV16 and HPV18) is strongly associated with the occurrence and progression of cancers in humans, but it remains unclear how such pathogenicity evolved at the genomic level. In this study, we curated and annotated a compendium of 3,562 PV genomes (including 3,329 human PVs and 233 animal PVs) and performed the largest-ever comparative genomic analyses of PVs with regard to their global and taxonomic distribution, host and niche specificity, population and phylogenetic stratification, and gene repertoire and genetic variants, especially those that contribute to PV pathogenicity. We observed substantial variation in PV genome size and GC%, with the latter being positively correlated with that of their hosts, suggesting co-evolution between PVs and their hosts. Time-resolved phylogenetic analysis provided a high-resolution view of the early divergence, niche adaptation, and horizontal host-switch events that collectively shaped PV evolution. We identified key evolutionary events and genomic variants that defined PVs’ early divergence and carcinogenicity, with the high-risk-associated PV variants showing a higher potential for host immune evasion. Based on 90 independently acquired clinical samples from patients with cervical squamous cell carcinoma (CSCC) or oropharyngeal squamous cell carcinoma (OPSCC), we further discovered a set of HPV16 variants that predict patient survival. These insights into the relationship among PV genome evolution, pathogenicity, and clinical prognosis pave the way for better prevention, control, and treatment of PV-related cancers.
The pathogenicity evolution of papillomavirus revealed by a compendium of 3,562 human and animal papillomavirus genomes