Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has escalated into a global pandemic with profound effects on public health and the global economy. Since the first SARS-CoV-2 genome was released, thousands of genetic variants have been identified in patient samples worldwide. We classify SARS-CoV-2 genomes from the early outbreak into two major lineages, L and S, defined by the variants at genomic positions 8782 and 28144. Evolutionary analysis indicates that the S lineage is ancestral and that the L lineage subsequently evolved from it. Among 271 COVID-19 patients from the early outbreak, we observed significant differences in clinical severity between the L and S lineages, suggesting that lineage tracking could inform disease management. We also identified extensive epistasis and advantageous compensatory mutations among closely linked variants in SARS-CoV-2. Notably, the growing number of infections has accelerated the virus's evolution. The S (spike) gene, particularly its S1 region, shows signs of positive selection in both SARS-CoV-2 and other coronaviruses. Whereas the S1 N-terminal domain carries positive selection signals across all coronavirus genera, in SARS-CoV-2 positive selection acts primarily on the S1 C-terminal domain (the receptor-binding domain). Furthermore, SARS-CoV-2 tends to use codons that are less preferred by the human host. This codon deoptimization may attenuate viral protein expression as the virus evolves. Our findings highlight the importance of codon usage in viral evolution and may inform codon optimization for mRNA and DNA vaccine development.
Evolutionary dynamics of SARS-CoV-2