Introduction

Late in December 2019, a number of patients with pneumonia of unknown cause were observed in Wuhan, China. Deep metagenomic sequencing of bronchoalveolar lavage fluid samples of the first patients resulted in the identification of a new SARS-like coronavirus [1, 2]. The novel virus was designated as SARS-CoV-2 by the International Committee on Taxonomy of Viruses, and the disease related to this virus was named coronavirus disease 19 (COVID-19) by WHO [3]. Prior to the discovery of SARS-CoV-2, six human-pathogenic coronaviruses (hCoVs) – NL63, 229E, OC43, HKU1, SARS-CoV, and MERS-CoV – had been identified [4]. While the first four were relatively benign, SARS-CoV (severe acute respiratory syndrome coronavirus) and MERS-CoV (Middle East respiratory syndrome coronavirus) were highly pathogenic [4, 5]. SARS-CoV first emerged in southern China and caused an epidemic that lasted from November of 2002 to July of 2003; over 8,000 infected cases and 774 deaths in more than 25 countries were reported during this period [6,7,8]. MERS-CoV appeared in Saudi Arabia in 2012, and by 2020, approximately 2,500 cases of MERS-CoV infection and over 850 deaths had been confirmed in 27 countries [8, 9]. The new virus, SARS-CoV-2, which contains a linear positive-strand RNA genome of about 30,000 nucleotides, is associated with mild to severe clinical symptoms and sometimes death [1, 10]. The higher person-to-person transmission rate of SARS-CoV-2 as compared with the other hCoVs has led to rapid spread of this virus, and on March 11, 2020, the World Health Organization (WHO) declared this spread to be a global pandemic [11]. Various lines of evidence support a possible contribution of recombination events to tropism and adaptation of coronaviruses to new host species and to the emergence and diversity of human pathogenic coronaviruses, including SARS-CoV-2 [12,13,14,15,16,17,18,19]. Recombination may be a normal feature of CoV replication, and its role in the emergence of new strains may be quite significant [20].

Emergence and expansion of SARS-CoV-2 clades

Very shortly after the sequencing of SARS-CoV-2 genomes isolated from the first infected individuals from China, thousands of whole-genome sequences of virus isolates from across the world were submitted to public databases such as GISAID (the Global Initiative on Sharing All Influenza Data; GISAID https://www.gisaid.org/). Moreover, other bioinformatics platforms including Nextstrain (https://nextstrain.org/ncov/global) were developed that continuously monitor the submitted sequences and enable real-time tracking and visualization of sequence variations. These developments and other analyses have allowed close monitoring of the evolution of the SARS-CoV-2 genome. Analysis of various SARS-CoV-2 genome sequence data sets has led to an estimated evolutionary rate of approximately 1.1 × 10-3 substitutions/site/year [21]. Although rapid, the mutation rate is lower than that of some other RNA viruses, and this may partly be due to presence of a 3'-to-5' exoribonuclease (ExoN)-encoding gene in the SARS-CoV-2 genome whose protein product may correct some of the errors that occur during replication [22]. In a survey of 2,790 complete and high-coverage sequences available on April 2, 2020, in GISAID, 13 distinct haplotype groups (H1-H13) and their associated daughter haplotypes were described. Each haplotype was defined by the co-segregation of two or more Tag nucleotide sequence variations in a significant number of available genome sequences [23]. Some haplotypes, after initial emergence, spread to various regions of the world, while others remained localized or were eliminated. For example, the haplotype group H2, which contained the sequence variations 8782C>T and 28144T>C, first appeared in the Far East but soon spread to other continents. This haplotype group had earlier been recognized as the S lineage by others [24]. On the other hand, a haplotype (H13) with a large deletion (27848_28229del), after its appearance and initial local expansion in a country in the Far East, essentially disappeared with the passage of time [23, 25]. Differences in the history of sequence variations at least partly reflect their contribution to viral fitness. This is well exemplified by the haplotype H1, defined by co-segregation of four variations: 241C>T, 3037C>T, 14408C>T, and 23403A>G. This haplotype is named clade G in GISAID and 20A in Nextstrain. 23403A>G causes substitution of aspartic acid at position 614 of the spike glycoprotein by glycine (p.D614G) [26]. The p.D614G variation had a rare global presence in March 2020 but became a dominant variation that was present in more than 74% of available sequences by June 2020 [26, 27]. Analysis of approximately 50,000 complete and high-coverage sequences collected between August 1 and November 15, 2020, showed that 99.7 percent of sequences had the four defining sequence variations of clade G (https://cov.lanl.gov/content/index) [28]. It has been reported that this variation confers a competitive advantage because of enhancement of viral replication, viral load, and infectivity [26, 27, 29,30,31]. The SARS-CoV-2 spike protein has two functional subunits, S1 and S2, which play crucial roles, respectively, in the attachment of the virus to the host-cell receptor (angiotensin-converting enzyme 2; ACE2) through its receptor-binding domain (RBD) and mediation of fusion of viral and host membranes, leading to entry of the virus into the host cell [32, 33]. At the molecular level, p.D614G induces an open conformation in RBD, which facilitates virus entry [34, 35]. Some subclades, including GR, GH, and GV, have evolved from the G clade. Three adjacent nucleotide variations (28881_28883GGG>AAC in N) define GR (H1a), one (25563G>T in orf3a) defines GH (H1b), and seven (445T>C in nsp1, 6286C>T in nsp3, 21255G>C in 2ʹ‐O‐ribose methyltransferase, 22227C>T in S, 26801C>G in M, 28932C>T in N, and 29645G>T in orf10) define GV (H1r) (https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/) [28]. The three-nucleotide variation in subclade GR causes the amino acid substitutions p.R203K and p.G204R in the nucleocapsid protein, which is encoded by the N gene. The wild-type N protein binds the virus genome and establishes a condensed organization that is crucial for virion assembly and protection of the viral genome against host RNA sensors and, consequently, immune surveillance [36, 37]. Formation of the RNA-protein complex relies on a liquid–liquid phase separation (LLPS) of the N protein [37, 38]. It has been suggested that p.R203K and p.G204R affect this physiochemical process in a manner that causes an increased tendency of the N protein to undergo LLPS and greater inhibition of host antiviral responses [37]. The 25563G>T variation in GH causes p.Gln57His in the protein encoded by orf3a, which is a putative viral ion channel [39]. However, cryo-electron microscopy and bioinformatics studies on the ORF3a structure have predicted that p.Gln57His does not affect the permeability properties of the ion channel [40, 41]. The 22227C>T variation of the GV subclade causes p.A222V in the N-terminal domain (NTD) of the spike protein, but this variation was reported not to inhibit binding of monoclonal antibodies specific for the spike NTD [42]. It is noteworthy that new sub-clades (lineages) are continually being identified (Pango lineages; https://cov-lineages.org/index.html) [43].

The issue of SARS-CoV-2 re-infection

Soon after the beginning of the COVID-19 outbreak, various companies and governments initiated programs aimed at the development of anti-SARS-CoV-2 vaccines. It is believed that vaccination of a substantial proportion of the world population can limit the outbreak by ultimately reducing the number of circulating viruses [44]. In a study in which symptomatic patients infected very early in the pandemic were followed up, it was found that spike RBD- and nucleocapsid (N)-recognizing IgG antibodies that could bind and neutralize viruses were present at high positivity rates and at high titers for more than six months after recovery. The data were suggestive of durable humoral immunity and long-lasting protection [45]. On the other hand, multiple reports of SARS-CoV-2 re-infection in individuals who had only recently recovered from an earlier infection have created concerns for scientists and health officials. These reports raise questions about the duration of immunity and the protection conferred by anti-COVID-19 vaccines/antibodies [46,47,48,49]. It is hoped that repeated booster vaccine shots may induce long-term immunity and increase vaccine effectiveness [50, 51].

Emergence of SARS-CoV-2 variants of concern with large numbers of sequence variations

A SARS-CoV-2 variant of interest (VOI) is a virus strain associated with an increase in public-health-related parameters such as transmissibility, pathogenicity, severity of clinical presentation, therapeutic escape, and antigenicity (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/). A SARS-CoV-2 variant of concern (VOC) is a strain associated with more-drastic changes in these parameters. As of the end of July 2021, four VOIs (eta [η], iota [ι], kappa [κ], and lambda [λ]) and four VOCs (alpha [α], beta [β], gamma [γ], and delta [δ] were recognized by WHO [https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/]. Although various SARS-CoV-2 strain nomenclatures have been used, WHO has proposed that the letters of the Greek alphabet be used as labels for global SARS-CoV-2 variants of interest and variants of concern alongside the scientific nomenclature in communications about the variants to the public (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/). The objective was to introduce easy-to-pronounce and non-stigmatizing labels. As the scientific community is also familiar with the more facile Greek letter nomenclature of the VOCs, these (sometimes in conjunction with the scientific names) will hereafter be used for the strains discussed.

In mid-December 2020, a new lineage of the GR clade with an unusual cluster of 17 additional non-synonymous, nonsense, and deletion mutations was identified in the United Kingdom (UK). It was designated as VOC lineage B.1.1.7 (lineage alpha). Almost half of the new mutations were located in the spike protein, including the mutations Δ69-70, p.N501Y, and p.P681H, which affect key epitopes of the protein. The possible effects of these mutations are discussed below. No lineage that resembled this strain had been reported previously [52,53,54]. Based on data available at GISAID on June 13, 2021, GV was the major clade circulating in the UK in November 2020; this dominance was taken over by the alpha strain by the second half of December 2020, indicative of a rapid and remarkable shift in the virus profile of the UK [28]. The alpha strain was reported in approximately 90 countries/territories by the end of December. An unprecedentedly rapid global spread of this strain was observed by the beginning of March 2021. Soon after identification of the alpha strain, other potentially dangerous lineages with large numbers of mutations were observed in South Africa (B.1.351; beta), Brazil (B.1.1.28.1/P.1; gamma), and India (B.1.617.2; delta) [55,56,57,58]. Based on strain designations by GISAID on June 13, 2021, more than 93% of samples collected in the United Kingdom in April 2021 had alpha strain sequences, whereas among samples collected between May 1 and June 7, 2021, only 38% had the alpha strain sequence and 60% had the delta strain sequence. This suggests a marked rapid displacement of the alpha strain by the delta strain. Rapid expansion of VOCs suggested that they may have properties that confer a fitness advantage. The emergence of VOCs raised questions about their origins, and their rapid expansion raised questions and concerns regarding their infectivity and transmission rate, and more importantly, the effectiveness of the vaccines being developed in targeting the novel strains.

Concerns pertaining to the effectiveness of current vaccines against VOC lineages

Recent empirical and population-based findings suggest that the alpha strain has a significant transmission advantage over earlier strains, causes a higher virus load, and is associated with a higher risk of infection for individuals under 20 years old. A substantial correlation between the strain and disease severity or mortality rate was not observed [59,60,61]. Antibody sensitivity experiments indicate that viruses of the G clade (the earliest strains shown to have the p.D614G variation) and the alpha strain viruses are highly susceptible to neutralization by antibodies present in sera or nasal swabs from individuals who had recovered from infection with viruses circulating before the emergence of these strains, or from individuals who had been vaccinated with existing vaccines [62,63,64]. However, there is substantial evidence that antibodies in the sera of convalescent individuals who had been infected in early 2020 and antibodies induced by some vaccines have decreased neutralization potential against the beta, gamma, and delta VOCs [51, 62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79].

The alpha, beta, and gamma SARS-CoV-2 strains all share the mutation that causes the p.N501Y change in the RBD domain of the spike protein. In fact, position 501 is within the region of the RBD (receptor-binding motif; RBM) that makes contact with the host ACE2 receptor [80]. It has been shown that the p.N501Y change plays a critical role in promoting the increased infectivity and transmissibility of the alpha strain. Increased infectivity may be a consequence of a gain in replication fitness conferred by the amino acid change, and increased transmissibility may be a consequence of considerable strengthening of the spike/cell receptor interaction [81, 82]. The beta and gamma strains also share the p.E484K-causing mutation in the same domain. Additionally, the lysine at position 417 of the earliest sequenced SARS-CoV-2 strains is changed to asparagine and threonine, respectively, in the beta, and gamma strains [55, 58]. It is believed that mutated amino acids at positions 417, 484, and 501 result in a tighter interaction between the spike protein and the ACE2 host cell receptor, probably by induction of conformational changes. The p.E484K mutation may also cause a favorable change in the charge of the flexible loop region of the spike RBD [83]. Reduced neutralization of the beta and gamma strains by antibodies in the sera from convalescent patients may be partly or largely attributable to steric clashes and charge switches at antibody-binding sites caused by the p.E484K mutation in these strains [75, 78, 83,84,85,86,87,88]. The alpha (p.P681H) and delta (p.P681R) strains both contain a variation at position 681 of the spike protein. The mutations that affect p.681 are believed to be associated with enhanced furin cleavage at the suboptimal S1/S2 junction site during virus and host-cell membrane fusion, which may ultimately result in increased transmissibility of the virus [89,90,91]. The delta strain contains a unique p.L452R variation in the RBD domain of the spike protein. Antigenicity experiments have shown that this mutation is linked to a marked resistance to some neutralizing monoclonal antibodies [92]. The Δ69-70 deletion (p.69-70delHV) in the spike protein that is observed only in the alpha lineage is proposed to contribute to increased infectivity, possibly by promotion of spike protein cleavage [93]. The apparent biological and molecular consequences of some of the potentially important sequence variations observed in SARS-CoV-2 VOCs are summarized in Table 1.

Table 1 Putative effects of mutations in the spike-encoding genes of SARS-CoV-2 VOCs alpha, beta, gamma, and delta

Figure 1 shows the defining genome sequence changes of the four VOCs described above and the effects of these changes on the amino acid sequences of the encoded proteins. The distribution of the amino acid changes in the domains of the spike protein is shown in Figure 2. Shared variations between any two or among three or four of the VOCs are also shown in the two figures. These figures are based on analysis of 50 alpha, beta, gamma, and delta SARS-CoV-2 genome sequences from the UK, South Africa, Brazil, and India, respectively, that were designated as the respective strain in GISAID. GISAID, in addition to serving as a database for reported SARS-CoV-2 genome sequences, itself designates the lineage associated with each of the sequences (https://www.gisaid.org/). The sequences chosen for the analysis included the first and most recently (as of June 21, 2021) reported sequence associated with the lineage, and other sequences fairly equally distributed with respect to date of sample collection in the interim (Supplementary Tables S1-S4). Nucleotide sequence variations in the chosen sequences were identified by alignment to the reference genome sequence of the Wuhan‐Hu‐1 isolate (NC_045512.2) as described previously [23]. Haplotype network and phylogenetic analyses of all the sequences were also performed as described previously [23]. Figure 3 shows haplotype networks based on whole-genome sequences of the VOCs, and Figure 4 shows the networks based only on the S gene sequences. Phylogenic trees were also constructed based on whole-genome and S gene sequences (available upon request). As expected, the sequences pertaining to each of the four VOCs cluster together in the networks and in the phylogenetic trees. As compared to the whole-genome sequence network, there is less branching at each of the foci in the S gene network; this is because there is less variation among the 50 S gene sequences of each of the VOCs than among the 50 whole-genome sequences of each of the VOCs. Figures 1 and 2 show the variations in each of the strains that were present in at least 80% of the 50 selected sequences. There were 32 (Supplementary Table S1), 21 (Supplementary Table S2), 35 (Supplementary Table S3), and 20 (Supplementary Table S4) sequence changes among the alpha, beta, gamma, and delta genome sequences, respectively, that were present in at least 40 of the 50 sequences (the 80% threshold). All of the sequence changes described for each VOC were usually, but not always, present together. All of the sequence changes pertaining to the alpha, beta, gamma, and delta VOCs that are described in Figure 1 were present together in 50 (100%), 37 (74%), 34 (68%), and 38 (76%) of the 50 sequences of the respective VOC genomes. These data suggest that the set of variations described for the alpha strain indeed describes a robust haplotype; the haplotypes that are described for each of the remaining three VOCs may be less robust. Four sequence variations that define clade G (major haplotype group H1) were shared among all four VOCs, suggesting that the VOCs are derived from this major clade. Furthermore, the alpha and gamma VOCs contain the subclade GR (H1a sub-haplotype)-defining variations (28881_28883GGG>AAC), and the beta VOC has the subclade GH (H1b sub-haplotype)-defining variation (25563G>T). These observations are consistent with these three VOCs having evolved from GR or GH subclades or having been products of recombination events in which one of the participating genomes was GR or GH. The haplotype network based on whole-genome sequences also suggests a common origin (presumably GR) for the alpha and gamma VOCs (Fig. 3). A mutation (11288-11296deltctggtttt) in nsp6 that causes a three amino-acid-deletion (p.106-108delSGF) in the encoded protein was shared among three VOCs (alpha, beta, and gamma strains). The nsp6 (non-structural protein 6) protein of coronaviruses has been suggested to affect autophagy and possibly thus promote viral infection [96, 97]. Additionally, it has been shown that the SARS-CoV-2 nsp6 affects the host cell immune response by suppression of interferon production [96]. To the best of our knowledge, the effect of the p.106-108delSGF mutation on these nsp6 functions has not been studied.

Fig. 1
figure 1

Defining genome sequence variations of SARS-CoV-2 variants of concern (VOCs). A schematic representation of the SARS-CoV-2 genome is shown at the top of the figure. Nucleotide sequence variations pertaining to the alpha (B.1.1.7), beta (B.1.351), gamma (P.1), and delta (B.1.617.2) lineages (relative to Wuhan-Hu-1 sequence at https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2) and effects on amino acid sequences of the encoded proteins are shown. The defining sequence variations were selected as described in the text. Green, red, and blue dots indicate synonymous, non-synonymous, and nonsense mutations, respectively. Black dashes represent indels. 5' UTR, 5'- untranslated region; nsp1-16, non-structural proteins 1-16; orf1ab, open reading frame 1ab; S, spike; orf3a, open reading frame 3a; E, envelope; M, membrane; orf6, open reading frame 6; orf7a, open reading frame 7a; orf7b, open reading frame 7b; orf8, open reading frame 8; N, nucleocapsid; orf10, open reading frame 10; 3' UTR, 3'- untranslated region. ***, variations present in all four VOCs; **, variations present in three VOCs; *, variations present in two VOCs

Fig. 2
figure 2

Effects of defining genome sequence variations of SARS-CoV-2 variants of concern (VOCs) on the domains of the spike protein. A schematic representation of the domains of the spike protein is shown on top of the figure. Amino acids affected by the defining genome sequence variations of the alpha (B.1.1.7), beta (B.1.351), gamma (P.1), and delta (B.1.617.2) lineages are indicated, and their corresponding genome positions are shown in parentheses. S1, S1 subunit: S2, S2 subunit; S1/S2, S1 and S2 interface – spike cleavage site; SP, signal peptide; NTD, N-terminal domain; RBD, receptor-binding domain; RBM, receptor-binding motif; HR1, heptad repeat 1; HR2, heptad repeat 2; TM, transmembrane domain, CP, cytoplasmic domain. Green, red, and blue dots indicate synonymous, non-synonymous, and nonsense mutations, respectively. Black dashes represent deletions. ***, variations present in all four VOCs; **, variations present in three VOCs; *, variations present in two VOCs

Fig. 3
figure 3

Haplotype network of 200 whole-genome sequences of VOC. The nodes represent the various whole-genome sequences. Because of space limitations, the samples are abbreviated as alpha1, alpha2, etc., and the corresponding GISAID IDs are presented in Supplementary Table S5. The number of crossbeams on the connecting lines between nodes represents the number of sequence differences between the respective sequences.

Fig. 4
figure 4

Haplotype network of 200 S gene sequences of VOCs. The nodes represent the various S gene sequences. Because of space limitations, the samples are abbreviated as alpha1, alpha2, etc., and the corresponding GISAID IDs are presented in Supplementary Table S5. The number of crossbeams on the connecting lines between nodes represents the number of sequence differences between the respective sequences.

With regard to vaccine efficacy, a promising finding was increased effectiveness of second-dose vaccine shots against the alpha and delta strains. This emphasizes the importance of booster vaccinations [51]. Also, it has been reported that a G614 pseudovirus was more susceptible to neutralization by sera from earlier infected convalescent individuals than viruses with the D614-encoding genotype. The molecular explanation for this may be the RBD “up” conformation induced by p.D614G. In any case, the finding that G614 is not an escape variation implies that the global dominance of G614 over D614 is not expected to affect the effectiveness of the SARS-CoV-2 vaccines that have been developed [35].

Potential sources of changes in the SARS-CoV-2 genome

Genome replication infidelity [98], intra-host viral evolution in prolonged infections, which predominantly occurs in immunocompromised individuals [54, 99,100,101,102], host RNA-editing systems [103], and within-host recombination events occurring subsequently to co- or super-infection with different circulating strains [104] are considered the potential sources of SARS-CoV-2 genome sequence variability. Random mutations account for the majority of genomic variations [103, 105]. Although most immunocompromised persons effectively clear SARS-CoV-2 infection, accelerated viral evolution in persistently infected immunocompromised individuals has been reported [54, 103, 106]. In a publication that reported findings on a carefully monitored patient, it was shown that a cluster of mutations accumulated rapidly, sometimes within a span of approximately 50 days [99]. These new mutations disproportionately changed amino acids in the spike protein and in its receptor-binding domain. Definitive empirical evidence for involvement of cellular RNA editing processes in promoting sequence changes in the SARS-CoV-2 genome is lacking. However, the observation of an enrichment of C-to-U transitions among the reported single-nucleotide variations in the genome and the known activities of enzymes of the editing process (including the APOBEC cytidine deaminases and ADAR adenosine deaminases) that promote such changes suggest that RNA editing may contribute to the variability of the SARS-CoV-2 genome [103, 107,108,109].

Presently, recombination may be the best candidate competitor of genome replication errors as the driving force for emergence of new SARS-CoV-2 strains. Recombination occurs with many RNA viruses, and the contribution of recombination to the evolution of betacoronaviruses is well established [4, 19, 110,111,112]. Recognition of recombination between SARS-CoV-2 viruses requires co-infection with viruses with distinct sequences. Co-infection with different SARS-CoV-2 strains has been documented [113,114,115]. With increased frequencies of infections worldwide, at least partly due to the emergence of strains with higher transmissibility, it is expected that the frequency of co-infections will also rise. Since the new strains have distinct combinations of variations, there is also a greater possibility of co-infection with strains with different combinations of sequence variations and subsequent recombination events that would produce novel genome sequences. To the best of our knowledge, the first report of recombination between SARS-CoV-2 strains was published in August of 2020 [104]. In a few subsequent studies, recombination events between SARS-CoV-2 strains were surmised on the basis of defining markers of major clades or sequence variations in locally circulating strains [26, 105, 115,116,117]. It is not unlikely that some recombinant genomes have not been recognized because of technical issues in sequence analysis [105]. In light of the importance of recombination vis a vis the emergence of novel virus strains with a large number of mutations, efforts must be made to resolve these impediments [54, 106]. Although strains with large numbers of variations may have evolved by accumulation of sequential mutations and subsequent expansion due to stochastic events or selection, an alternative scenario is that they are products of one or more recombination events between virus strains that each contained a subset of the variations. It is to be noted that coinfection and recombination are most likely to occur between locally or globally dominant strains [26, 115]. In a recent study in which SARS-CoV-2 genome sequences in various cities of a single country were analyzed using non-conventional protocols, notably high frequencies of co-infection as well as sequences that were probable products of recombination events between locally or globally dominant strains were indeed observed [115]. For example, one putative recombinant genome simultaneously had GR and GH marker sequence variations. Selective advantages may have contributed to the dominant frequency status of some strains, including some VOCs, and recombination between any two of these strains may produce a recombinant strain that has the combined advantages associated with both of the parental strains. This renders the recognition of such possibly evolving strains even more important. The proposed role of recombination in the emergence of new strains is reminiscent of the origin of SARS-CoV-2 itself [15, 16].

Conclusion

Since the first identification of SARS-CoV-2, numerous variations in the genomes of evolved viruses have been identified. It is expected that the majority of evolved variations have neutral effects and that those with highly deleterious consequences for the virus will be rapidly eliminated. Mutations that confer an advantage to the virus, such as p.D614G, tend to be retained and can lead to a notable shift in the spectrum of the local or worldwide virus population in a short time period. The presence of a significant proportion of the retained frequent variations in the spike S1 subunit, and especially in the RBD domain, of the variants of concern and the proposed pathogenic consequences associated with these sequence variations provide strong evidence of the critical importance of S1 in terms of infectivity and transmissibility as well as immune escape. Some reports emphasize the biased concentration of recombination events in the spike coding gene of SARS-CoV-2 [19, 26]. Given the essential role of the spike protein in the infection process, this protein or its main domains have been targeted in developing immunization protocols. In the face of resistance of some variants to antibody neutralization, the robust cross-reactive responses elicited by the South African strain against other virus strains are somewhat reassuring. This observation suggests that vaccine development strategies should take into consideration features of the spike proteins of the VOC strains [118]. The continuous tracking of novel and potentially clinically important sequence variations and recombinant and/or highly mutated strains is of immense importance in light of public health, disease control, and design of new preventive immunization strategies.