Introduction

Coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can induce fever, severe respiratory illness, and various multi-organ disease manifestations [1]. The SARS-CoV-2 virus predominantly attacks lung cells, which can lead to pneumonia and acute respiratory distress syndrome [2].

SARS-CoV-2 viral particles, which range from 60 to 140 nm in size, contain a single positive-stranded ribonucleic acid (RNA) with a length of 26 to 32 kb. Sequencing of the virus revealed six major open-reading frames and several accessory genes encoding spike (S) protein, 3-chymotrypsin-like cysteine protease (also termed main protease), papain-like protease, and RNA-dependent RNA polymerase [3, 4]. Structurally, the SARS-CoV-2 virus contains an S protein on its surface, which, upon binding with host receptors, becomes enzymatically activated via host proteases, leading to viral fusion and endocytosis [5]. Inside the host cell, the viral RNA-dependent RNA polymerase transcribes the viral genome, which is then translated by host ribosomes to biosynthesize viral proteins. New virions are then formed by budding into the lumen of the endoplasmic reticulum-Golgi intermediate compartments (ERGIC), and finally, they are primed for exocytosis [6] (Fig. 1).

Fig. 1

Different processes during SARS-CoV-2 infection. Viral binding: the receptor-binding domain (RBD) of spike (S) protein interacts with host cell surface receptors such as ACE2, GAGs, and other potential receptors; fusion: host proteases such as TMPRSS2, cathepsins, and furin cleave the S1 and S2 subunits, and the S2 subunit mediates the viral fusion; and entry: the virus enters the host cell by endocytosis or membrane fusion. Once inside the host cell, the viral RNA uses the host machinery to translate the viral proteins. Post-translational modifications of the structural proteins are carried out by the hijacked host system, and viral budding occurs at endoplasmic reticulum-Golgi intermediate compartments (ERGIC). Finally, the viral assembly occurs, and the virus is released by exocytosis

Recent structural analysis of the SARS-CoV-2 S protein by cryo-EM shows that it is extensively glycosylated, similar to SARS-CoV-1 S protein [7,8,9]. Moreover, site-specific glycosylation analysis of S protein by our group and other groups through mass spectrometry revealed both N- and O-glycosylation [10,11,12,13,14,15,16]. Based on recent reports, each trimeric spike presents up to 66 N-linked glycosylation sites and several O-linked glycosylation sites [12, 16]. Site-specific glycosylation on virus-derived, wild-type non-stabilized and recombinant stabilized spike glycoproteins was compared in a recent study [16].

Glycans on the viral surface proteins are involved in viral binding to host cells for entry, viral fusion, shielding of specific epitopes, and in the folding, stability, and protection of viral proteins [6, 17,18,19,20]. In general, glycans in the vicinity of receptor-binding regions can negatively impact viral binding. Thus, the comparatively slim glycan shield of coronaviruses, in contrast to other viruses, may be advantageous for more efficient receptor binding [17]. Elevated glycan shield densities and oligomannose abundance were observed in certain types of viruses, such as HIV-1, which can evade the immune system response very effectively [6]. Glycosylation of viral surface proteins can hinder antibodies from binding by shielding the surface antigens with glycan envelopes. Glycans can undergo large internal motions, which makes it difficult to describe them accurately by any single three-dimensional (3D) shape [21, 22]. Recently, molecular dynamics (MD) simulations have been conducted on glycoproteins to accurately predict the 3D shapes and glycan motions, similar to the characterization of oligosaccharide conformations and dynamics by NMR [22].

The structural and functional roles of glycosylation in viral pathogenesis have been described in several studies. In Hendra virus, a bat-borne virus causing a highly fatal infection in horses and humans, the fusion (F) protein contains five N-linked glycosylation sites, and the glycans at N414 are critical for fusion protein folding and transport [19]. A study on the Nipah virus showed that the removal of N-glycans on the fusion protein resulted in a significant increase in viral fusogenicity but also increased sensitivity to antibody neutralization [20]. Glycans decorating the viral envelope proteins account for half of the molecular weight of these carrier glycoproteins [23]. Glycosylation on the viral surface, where host glycans decorate the viral surface proteins, facilitates immune evasion by blocking humoral and cellular innate immune systems [24, 25]. Moreover, since the virus hijacks the host cellular machinery for its replication and protein glycosylation, the viral surface glycans are composed of host glycans that are recognized as self by the immune system and thus suppress the anti-carbohydrate immune response [26]. However, the immune system responds to glycosylated pathogens in several ways, such as by increasing or decreasing the expression of certain endogenous lectins during infections and thereby fighting the pathogens through lectin-mediated defense mechanisms [27]. Differential hemagglutinin N-glycosylation affects T cell activation and cytokine production and can thus challenge the development of vaccines [28]. These types of immune evasion and shielding of the receptor-binding site with glycans are shown by the viral glycoproteins HIV-1 Env, influenza hemagglutinin, and Lassa virus envelope glycoprotein complex (LASV GPC) [29]. Together, these findings demonstrate how glycosylation helps the virus evade host innate and adaptive immune responses.

In general, during viral evolution, protein sequences in viruses undergo mutations (antigenic drift), which can lead to loss of species specificity in the virus [30]. This can also lead to modulation of viral infectivity and surface protein antigenicity [6, 17]. These mutations can alter the glycosylation of the protein by generating new or removing existing glycosylation sites, as reported in the case of influenza viruses [31, 32]. These changes in glycosylation can lead to new virus strains with increased ability to evade the host immune response, and this can attenuate vaccine efficacy [28, 31]. While these observations were made mostly on viruses such as influenza, mutation studies on SARS-CoV-2 S protein show that some glycosylation sites are crucial for viral infectivity [33]. The most common mutation reported on the S protein is the D614G mutation, although a total of 9654 mutations have been detected on 400 distinct sites on S protein [34].

In this review, we discuss the functional and structural roles glycans play in SARS-CoV-2 pathobiology and how these understandings can lead to future therapeutic interventions to tackle COVID-19.

Roles played by the structural proteins (S, M, E, and N proteins) in viral assembly

There are four major structural proteins of SARS-CoV-2: the spike (S), membrane (M), envelope (E), and nucleocapsid (N) proteins. E and M proteins regulate intracellular trafficking of the S protein, as well as its intracellular processing [35]. By inducing the retention of the S protein inside cells, the E and M proteins provide a mechanism that targets S close to the virion assembly site and limits both its processing to a fusion-active conformation and its cell surface expression, thereby preventing syncytia formation. E or M protein co-expression with S protein alters the N-glycans of S, independent of their effect on S retention (Fig. 1). The E protein induces the retention of S protein by slowing down the cell secretory pathway, independently of the retrieval motif harbored by the S protein cytoplasmic tail. All four structural proteins are required for optimal production of SARS-CoV-2 virus-like particles (VLPs). Expression of S alone does not induce secretion of VLPs; however, co-expression of S with any other structural protein (E, M, or N) increased the formation of VLPs. Co-expression of all four structural proteins induces VLP secretion more powerfully than any other combination of the four proteins [35]. Combined studies on different coronavirus strains suggest that this virus family utilizes its M proteins to evade the host innate immune system [36]. The M protein is important for the coronavirus budding process. During virus particle assembly, the M protein interacts with the N protein, E protein, S protein, and itself [37]. The M protein cooperates with the S protein during cell attachment and entry in alpha-coronaviruses [38]. A study comparing SARS-CoV-2 E and M proteins in humans with those found in bats and pangolins shows that the E protein is identical in these species. Structural similarities of human SARS-CoV-2 M and E proteins to their counterparts in bat and pangolin isolates, as well as differences specific to SARS-CoV-2 proteins, could explain the cross-species transmission and properties of the virus [39].

SARS-CoV-2 spike protein glycosylation

The S protein of SARS-CoV-2 is a heavily glycosylated homotrimeric protein with two subunits, S1 and S2, linked through transmembrane protease, serine 2 (TMPRSS2), and furin cleavage sites [7]. Subunit S1 facilitates attachment to host cell receptors through a receptor-binding domain (RBD), and subunit S2 is involved in fusion of viral and human cellular membranes [5, 7]. The glycosylation of the S protein could mediate S protein folding and modulate the conformational dynamics of S protein, priming by host proteases, and immune evasion through glycan shielding [7, 40,41,42]. One of the earlier studies on coronaviruses showed that inhibition of N-glycosylation by treatment with tunicamycin led to spikeless virions because of improper protein folding [7, 42]. Two mutations in the spike of bat coronavirus HKU4 and the consequent introduction of a new N-glycan site mediated entry of Middle East respiratory syndrome (MERS) coronavirus into human cells by allowing it to be primed by human proteases [41]. It was demonstrated that coronaviruses employ conformational masking and glycan shielding of the S trimer to evade immune recognition [7, 43].

Site-specific N-glycosylation of S protein

The S protein of SARS-CoV-2 expressed in human HEK293 cells is extensively glycosylated on 22 N-glycan sites and a number of O-glycosylation sites (Figs. 2 and 3) [10, 11]. Site-specific analyses of N-linked glycosylation on S proteins expressed in the human cell line HEK293 revealed extensive heterogeneity, ranging from high mannose-type glycans to highly processed complex-type glycans with sialylation and fucosylation [10, 11]. Site N234, which is adjacent to the RBD of SARS-CoV-2 S protein, displays high mannose-type glycans [6, 10]. Complex-type N-glycans with bi- and triantennary glycans and high mannose-type glycans were identified on sites N165, N331, and N343 [10, 11]. While sites N331 and N343 in the RBD region showed predominantly high mannose glycans when S protein was expressed as individual subunits S1 and S2, the same sites showed complex-type glycans (with 98% fucosylation) on the trimeric form of S protein expressed in HEK293 cells [6, 10]. A recent quantitative N-glycan analysis of S protein subunit S1 isolated from SARS-CoV-2-infected Calu-3 cells via immunoaffinity purification showed a high prevalence of complex-type N-glycans (79%) and 21% high mannose and/or hybrid structures [16]. The same study compared the glycans on vaccine candidates and recombinant S protein with those on the wild-type virus S protein, with the aim of informing vaccine design strategies and thereby enabling a high-quality immune response through correct immunogen presentation. The authors demonstrated that distinct cellular secretion pathways result in variation in protein glycosylation.

Fig. 2

The glycosylation of SARS-CoV-2 spike (S) protein. A 3D model of SARS-CoV-2 spike protein trimer showing the RBD region and glycosylation sites (only labelled on one monomer). B The site-specific N- and O-glycosylation of S protein [10, 11]

Fig. 3

Distribution of N- (A) and O- (B) glycosylation on specific sites of SARS-CoV-2 spike (S) protein expressed in HEK293 cells [10]

An NMR study on the RBD of SARS-CoV-2 S protein allowed identification of the chemical nature and structural details of the RBD glycans, such as glycosidic linkages and sulfation. The authors observed fucoses and GalNAc at the RBD and several unexpected glycan motifs such as 4-O-sulfated LacDiNAc, α2,6-sialylated LacDiNAc, LewisX (LeX), and fucosylated terminal GalNAcβ1-4GlcNAc (LacdiNAcFuc) along with terminal LacNAc, LacDiNAc, α2,3-linked sialyl (3′SLacNAc), and α2,6-linked sialyl (6′SLacNAc) fragments [44]. A recent report on the characterization of S protein glycosylation by a newly developed liquid chromatography-mass spectrometry methodology showed the presence of LacdiNAc structural motifs on all occupied N-glycopeptides and polyLacNAc structures on six glycopeptides [12].

Roles of S protein glycosylation in viral binding, fusion, entry, and immunogenicity

Specific types of glycans were observed at each site of the S protein, and the type of glycosylation in the RBD region, such as at sites N331 and N343, is critical for viral infectivity [33]. When N-glycan biosynthesis was blocked at the high mannose stage through both genetic manipulation and use of the small-molecule kifunensine, only minor changes were noted in spike-ACE2 binding. However, S protein N-glycosylation is important for viral entry into human cell models, as viruses lacking N-glycans enter the host cell less efficiently [45]. The glycans at sites N165 and N234 have roles in RBD conformational plasticity, as their presence stabilizes the RBD “up” conformation, permitting efficient binding to the human angiotensin-converting enzyme 2 (hACE2) receptor. Deletion of these glycans through N165A and N234A mutations significantly reduced binding of S protein to ACE2 as a result of a conformational shift of the RBD toward the “down” state, hampering accessibility to ACE2 [40]. Coronaviruses have fewer high mannose-type N-glycans on their surface proteins than other viruses such as HIV-1 [6]. The structural mapping of glycans of MERS-CoV S proteins revealed that glycans contribute to the formation of a cluster of high-density oligomannose-type glycans at specific regions of the S protein [9]. N-Glycosylation sites N331 and N343, which are in the RBD region, and N165 and N234, which are adjacent to it, are suggested to be critical for immune recognition [29]. In addition, the N234Q mutant was significantly resistant to neutralizing antibodies, whereas the N165Q mutant became more sensitive to antibody neutralization [46].

O-linked glycosylation on S protein

The presence of O-glycosylation at T323 and the plausible glycosylation at S325 have previously been reported by our group, and this was confirmed by several later studies (Fig. 3) [10, 14, 47]. More recently, O-glycosylation has been detected at residues T678, S686, and T1160 of SARS-CoV-2 S glycoprotein [47]. O-linked glycans such as Tn, core 1, mono- and di-sialyl core 1, and sialylated core are reported on the S protein [10, 14]. The O-glycans located in the hinge region of the RBD (T323 and S325) and those adjacent to the furin cleavage site (S686) have been suggested to play critical roles in viral binding and membrane fusion, respectively [10, 47].

Interaction of SARS-CoV-2 with the host receptors

Recent extensive research on SARS-CoV-2 revealed multiple routes of viral entry and enhanced our understanding of how the host system helps the virus throughout the process of infection (Table 1). Walls et al. demonstrated that ACE2 acts as the functional receptor for SARS-CoV-2 S-mediated entry into cells [7]. They showed that both SARS-CoV-1 and SARS-CoV-2 bind to hACE2 with comparable affinity. Both of these viruses depend on the hACE2 receptor for binding and on the host membrane serine protease TMPRSS2 and cathepsin for S protein cleavage and subsequent activation [5, 29]. Interestingly, the presence of a furin cleavage site at the S1/S2 boundary of SARS-CoV-2, which SARS-CoV-1 lacks, is implicated as a cause of the increased infectivity of SARS-CoV-2. Abrogation of the furin cleavage motif reduced S-mediated SARS-CoV-2 entry into VeroE6 or BHK cells [7].

Table 1 Potential cell surface human receptors or factors for SARS-CoV-2 infection

Glycosylation of SARS-CoV-2 host receptor human angiotensin-converting enzyme 2

hACE2, which is a type I transmembrane protein, comprises an extracellular, a transmembrane, and a cytosolic domain with a total of 805 amino acids [51]. The transmembrane region of hACE2 can be cleaved into a soluble form of hACE2 (sACE2)—lacking the transmembrane and cytosolic domains—that is enzymatically active having a catalytic site and a zinc-binding motif [52]. This secreted hACE2 is involved in the renin-angiotensin system (RAS) [53].

hACE2 acts as a receptor for human coronaviruses SARS-CoV-1 and SARS-CoV-2, as well as human coronavirus NL63/HCoV-NL63 [5, 54, 55] (Figs. 1 and 4). Multiple studies have shown efficient SARS-CoV-2 infection of cell lines and mouse models expressing hACE2 [7, 56].

Fig. 4

3D model of human ACE2 showing the S protein binding region and the distribution of N- and O-glycosylation sites. A and B show different orientations

The N-glycosylation site at N90 of hACE2 plays an important part in the interaction of the coronavirus with the ACE2 receptor (Fig. 4). An in silico study predicted the role of N90 in the viral interaction with ACE2: removal of the N-glycosylation motif by in silico mutation of N90 and T92 was associated with stronger interaction with the SARS-CoV-2 virus, suggesting that glycosylation at N90 may impose steric hindrance on RBD binding [57]. Another study also highlighted that all substitutions of N90 and T92 other than S92 (which retains the N-glycosylation motif) enhance RBD binding, and this enhancement may depend on the type of glycans on ACE2, which changes with the expression cell type [58]. Devaux et al. showed in silico that species with an ACE2 sequence containing K31, Y41, N90, and K353 are likely to be susceptible to SARS-CoV-2 infection [59]. The importance of several hinge regions and N-glycosylations of ACE2, including N90, is suggested based on the crystal structure analysis of ACE2 [59]. This study by Devaux et al. contradicts the other studies on N90-mediated ACE2 binding mentioned above and thus calls for more detailed experimental evidence on the roles of ACE2 glycosylation in viral binding. The glycan at N322 enabled tight interaction of ACE2 with the RBD, an effect opposite to that of N90 [29].
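The observation that S92 retains the N-glycosylation motif while other substitutions at N90 or T92 remove it follows from the canonical N-glycosylation sequon, N-X-S/T with X being any residue except proline. The minimal Python sketch below illustrates this check. The function name `find_sequons` and the seven-residue window are hypothetical: the flanking residues are placeholders chosen for illustration and are not the actual hACE2 sequence.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(seq, offset=1):
    """Return positions of Asn residues that begin an N-X-S/T sequon.

    `offset` is the residue number assigned to the first character of `seq`.
    """
    return [m.start() + offset for m in SEQUON.finditer(seq)]

# Placeholder window (NOT the real hACE2 sequence), numbered from residue 88
# so that the Asn is residue 90 and the Thr is residue 92.
variants = {
    "wild type (N90, T92)": "AANATAA",
    "T92S": "AANASAA",   # Ser keeps the sequon
    "T92A": "AANAAAA",   # sequon lost
    "N90Q": "AAQATAA",   # sequon lost
}

for name, window in variants.items():
    print(name, find_sequons(window, offset=88))
```

In this sketch only the wild-type and T92S windows report a sequon at position 90, which mirrors the statement that S92 is the one substitution that keeps the motif. The check is purely sequence-based and says nothing about whether a site is actually occupied or which glycans it carries.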

Quantitative site-specific N-linked and O-linked glycosylation of hACE2 has recently been reported by our group and others through glycomic and glycoproteomic approaches [60, 61]. Glycosylation at all seven potential N-glycosylation sites on hACE2 and at one O-glycosylation site was described in detail (Figs. 4 and 5). Moreover, evidence for the presence of both core and antennal fucosylation, bisecting GlcNAc, and a prevalence of 2,3-linked sialic acid was demonstrated in our study [60]. However, a recent study found that the effect of hACE2 sialic acids on the viral interaction is smaller than anticipated based on previous crystal structure and molecular modeling studies [45].

Fig. 5

A, B Distribution of N- and O-glycosylation on specific sites of human ACE2 expressed in HEK293 cells [60]

SARS-CoV-2 and glycosaminoglycan interactions

Glycosaminoglycans (GAGs) are linear polysaccharides involved in a variety of biological processes, including wound healing, anticoagulation, cell signaling, and pathogenesis [62,63,64]. GAGs are covalently bound to a core protein, making up proteoglycans (PGs). PGs are found inside cells, on the surface of cells, and in the extracellular matrix [50]. The four main groups of GAGs are heparin/heparan sulfate (Hp/HS), chondroitin sulfate/dermatan sulfate (CS/DS), keratan sulfate (KS), and hyaluronic acid (HA) [65]. GAG chains can be modified with acetylation and sulfation. The uronic acid residues combined with sulfation modifications result in a net negative charge [65]. Hp is an FDA-approved anticoagulant, which exerts its effect by binding to antithrombin III [66, 67]. The WHO recommends the use of Hp in COVID-19 patients to reduce the incidence of venous thromboembolism [50]. GAG-binding proteins contain amino acid sequences known as Cardin-Weintraub motifs, which correspond to ‘XBBXBX’ and ‘XBBBXXBX’, where X is a hydropathic residue and B is a basic residue [50]. The basic residues are responsible for interacting with the sulfate groups present on the GAG chain.
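As a rough illustration of the Cardin-Weintraub patterns described above, the following Python sketch scans a peptide for ‘XBBXBX’ and ‘XBBBXXBX’. It rests on two simplifying assumptions that are not taken from the cited work: B is restricted to Lys or Arg, and X is treated as any non-basic residue. The function names are hypothetical. The test peptides are the three S glycoprotein segments reported by Kim et al. [50].

```python
import re

BASIC = "KR"                      # assumption: His is not counted as basic
NONBASIC = "ACDEFGHILMNPQSTVWY"   # assumption: X = any non-basic residue

def motif_to_regex(motif):
    """Translate a pattern such as 'XBBXBX' into a lookahead regex."""
    classes = {"B": f"[{BASIC}]", "X": f"[{NONBASIC}]"}
    body = "".join(classes[symbol] for symbol in motif)
    return re.compile(f"(?=({body}))")   # lookahead keeps overlapping hits

MOTIFS = {m: motif_to_regex(m) for m in ("XBBXBX", "XBBBXXBX")}

def scan(seq):
    """Return (motif, 1-based start, matched residues) for every hit in seq."""
    hits = []
    for name, regex in MOTIFS.items():
        for match in regex.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits

# Segments of the SARS-CoV-2 S glycoprotein reported in [50].
segments = {"453-459": "YRLFRKS", "681-686": "PRRARS", "810-816": "SKPSKRS"}

for residues, peptide in segments.items():
    print(residues, peptide, scan(peptide))
```

Under these strict assumptions only PRRARS matches a canonical pattern (XBBXBX), while the other two segments do not, which may be why the report distinguishes GAG-binding from GAG-binding-like motifs. A looser definition of X or B would change the result.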

SARS-CoV-1 and numerous other pathogens are known to utilize host cell surface GAGs during host cell entry. Kim et al. reported GAG-binding and GAG-binding-like motifs at sites 1–3 (453–459 (YRLFRKS), 681–686 (PRRARS), and 810–816 (SKPSKRS), respectively) of the SARS-CoV-2 S glycoprotein [50]. Human lung cells predominantly contain HS and CS GAGs, and mast cells are rich in Hp. In a surface plasmon resonance (SPR) direct binding assay, both monomeric and trimeric SARS-CoV-2 S proteins (KD = 40 pM and 73 pM, respectively) bound more tightly to immobilized Hp than SARS-CoV-1 and MERS-CoV S proteins (KD = 500 nM and 1 nM, respectively) [50]. The degree and position of sulfation are critical for binding, with N-, 2-O, and 6-O sulfation required for binding to SARS-CoV-2. Hp, which has 6-O, 2-O, 3-O, and N-sulfation, and trisulfated (TriS) HS, which has 6-O, 2-O, and N-sulfation, both have therapeutic potential as competitive inhibitors against SARS-CoV-2 infection. When the receptor-binding domain is in the open conformation, HS interacts with the GAG-binding motif at the S1/S2 site 2 (681–686 (PRRARS)) and at site 1 (453–459 (YRLFRKS)) [50]. A model displaying how GAGs influence SARS-CoV-2 host cell entry is shown in Fig. 6. A follow-up study investigated whether the high binding affinity of Hp for the S protein of SARS-CoV-2 translates to potent antiviral activity [68]. The in vitro antiviral properties of Hp, HS, and CS were tested, as well as those of fucoidan, a sulfated polysaccharide composed mainly of fucose monomers. When the binding affinity of these polysaccharides to the SARS-CoV-2 S protein was tested, two varieties of fucoidans, as well as trisulfated (TriS) Hp and unfractionated USP-Hp, were able to compete with cell surface Hp for S protein binding. In contrast, other GAGs, such as HS, CS, and KS, showed no competitive binding. In efficacy tests, a fucoidan-like, branched polysaccharide was substantially more potent than remdesivir (EC50 = 83 nM and 770 nM, respectively), which is an approved therapeutic for severe COVID-19 infections. Hp and TriS Hp, which differ in only one sulfation (Hp has 3-O-sulfation whereas TriS Hp does not), had lower activity than the fucoidan-like compounds (2.1 μM and 5 μM, respectively). The higher activity of the fucoidan-like samples could be due to multivalent interactions between these polysaccharides and the virus. These results suggest that certain polysaccharide structures can be used as decoys to prevent SARS-CoV-2 S protein binding to the HS co-receptor in host tissues [68].
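Because the affinities and potencies quoted above span picomolar to micromolar units, a fold-difference calculation makes the comparison easier to read. The short Python sketch below converts each value to molar units and takes ratios. The helper names are hypothetical, both EC50 values are read as nanomolar, and pairing the SARS-CoV-2 monomer with SARS-CoV-1 and the trimer with MERS-CoV is an assumption made only for illustration.

```python
# Conversion factors to molar concentration.
UNIT_TO_MOLAR = {"pM": 1e-12, "nM": 1e-9, "uM": 1e-6}

def to_molar(value, unit):
    """Convert a (value, unit) pair to mol/L."""
    return value * UNIT_TO_MOLAR[unit]

def fold(weaker, stronger):
    """Ratio of two concentrations given as (value, unit) tuples.

    A lower KD or EC50 means tighter binding or higher potency, so the
    result says how many times lower `stronger` is than `weaker`.
    """
    return to_molar(*weaker) / to_molar(*stronger)

# KD values for binding to immobilized Hp, as quoted from [50].
kd_sars1_vs_sars2_monomer = fold((500, "nM"), (40, "pM"))
kd_mers_vs_sars2_trimer = fold((1, "nM"), (73, "pM"))

# EC50 values quoted from [68]: remdesivir vs the fucoidan-like polysaccharide.
ec50_remdesivir_vs_fucoidan = fold((770, "nM"), (83, "nM"))

print(round(kd_sars1_vs_sars2_monomer))        # about 12500-fold
print(round(kd_mers_vs_sars2_trimer, 1))       # about 13.7-fold
print(round(ec50_remdesivir_vs_fucoidan, 1))   # about 9.3-fold
```

These ratios only restate the published numbers; they do not account for differences in assay format or experimental error.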

Fig. 6

Kim et al. proposed model of SARS-CoV-2 host cell entry. A Virion binds to heparan sulfate. B Cell surface protease digests S protein, initiating viral-host cell membrane fusion via conformational change by host cell receptor binding to heparan sulfate and ACE2. C Virion enters host cell and experiences further proteolytic processing. Reprinted with permission from Elsevier [50]

Clausen et al. investigated the interactions of the SARS-CoV-2 S protein with cellular HS and ACE2 through its RBD [69]. The RBD of the SARS-CoV-2 S protein was found to bind to Hp/HS, likely through a docking site composed of positively charged amino acid residues. This docking site is separate from the one involved in ACE2 binding. The SARS-CoV-2 S protein binds cell surface HS cooperatively with ACE2 receptors, and the binding of Hp/HS to S trimers enhanced the binding to ACE2. This suggests that cell surface HS works as a virus collector and mediator of the RBD-ACE2 interaction, resulting in more efficient viral infection. HS structures vary across tissue and cell types, as well as with gender and age, possibly shedding some light on the differing susceptibility to viral infection among patient populations. Removal of cell surface HS with a mixture of heparin lyases I, II, and III (HSase) in multiple cell types before SARS-CoV-2 exposure prevented infection of the cells. However, SARS-CoV-1 infection was not blocked by the removal of cell surface HS, which is consistent with the tighter binding of SARS-CoV-2 S protein to Hp compared with SARS-CoV-1 [69]. A mechanism for SARS-CoV-2 infection utilizing host cell HS is shown in Fig. 7. Further work by Tandon et al. built upon these studies and utilized a lentiviral pseudotyping system to screen potential viral entry inhibitors [70]. When a variety of GAG structures were tested, it was determined that 6-O sulfation is not necessary for inhibition, as 6-O-desulfated Hp/HS showed no change in their ability to inhibit infection. The researchers also found that HS bound tightly to the pseudotyped lentivirus, making it a possible candidate as an adhesion co-receptor. These works outline the possibility of utilizing HS mimetics, degrading lyases, and metabolic inhibitors of HS biosynthesis as therapeutic agents against COVID-19 [69, 70].

Fig. 7

Clausen et al. proposed mechanism of SARS-CoV-2 viral attachment facilitated by host cell heparan sulfate. Reprinted with permission from Elsevier [69]

Aside from Hp/HS, hyaluronic acid (HA) has also been found to influence SARS-CoV-2 infection. Specifically, the upregulation of genes encoding enzymes involved in HA and GAG metabolism in bronchoalveolar cells infected by SARS-CoV-2 suggests that inhibiting the synthesis of these GAGs could contribute toward management of severe COVID-19 cases [71]. CD4+ T lymphocytes, neutrophils, and macrophages were also found infiltrating the lungs of COVID-19 patients. Increased numbers of macrophages have been identified in the lungs of deceased COVID-19 patients and are likely responsible for the inflammatory process [72]. Blood mononuclear cells were also tested and displayed a proliferative state. Control studies also displayed a dramatic reduction of NK and T lymphocytes and an increase in monocytes, which supports previous findings of changes in myeloid, NK, and B cells in COVID-19 patients [73]. These data point to multiple molecular events that are likely involved in SARS-CoV-2 infection and the pulmonary complications known to occur with COVID-19 [71].

Other host receptors for SARS-CoV-2 S protein

Several recent studies have shown that many neutralizing human antibodies that bind to SARS-CoV-2 S do not bind the RBD, which suggests the possibility of other important host receptors and/or co-receptors that bind to different domain(s) of SARS-CoV-2 S protein and promote the entry of virus into host cells [46, 48, 74].

In a recent study, Wang et al. demonstrated that the tyrosine-protein kinase receptor UFO (AXL) specifically interacts with SARS-CoV-2 S protein, and overexpression of AXL promoted viral entry as efficiently as ACE2 overexpression. Significant reduction of pulmonary cell infection by SARS-CoV-2 was observed by downregulating AXL, but not ACE2. Moreover, they showed that soluble human recombinant AXL could block SARS-CoV-2 infection in cells expressing high levels of AXL, whereas soluble ACE2 did not show such an effect [48]. In another study, the roles played by neuropilin-1 (NRP-1) in increasing SARS-CoV-2 infectivity by binding with the furin-cleaved S1 fragment of the S protein were shown, as well as how blocking such interaction with a small-molecule inhibitor or monoclonal antibodies reduces the viral infection in cell culture [49].

It has been reported that the S protein of SARS-CoV-2 potentially binds to several innate immune receptors such as C-type lectin receptors (CLRs) [47]. CLRs bind to specific glycans via a Ca2+-dependent interaction [75]. Several CLRs such as DC-SIGN/CD209, L-SIGN/CD209L/CLEC4M, mannose receptor/MR/MRC1/CD206, MGL/CLEC10A/CD301, and Dectin-2/CLEC6A, which act as a first line of defense against invading pathogens, are highly expressed on cells of the human immune system, including monocytes, dendritic cells, and macrophages [76, 77]. Gringhuis et al. reported that CLRs like DC-SIGN can modulate Toll-like receptor–induced activation and thus direct host immune responses against pathogens in a glycan-specific manner [78]. A recent study showed that multiple CLRs including DC-SIGN, L-SIGN, MR (C-type lectin domains 4–7), and MGL can bind strongly to recombinant full-length S produced in human embryonic kidney HEK293 cells [47].

Effects of glycan termini in viral binding

Viruses often target sialylated glycans and cell adhesion molecules to gain entry into the host cell, and the redundancy in such receptor preference indicates evolutionary conservation in the viral targeting to take advantage of host cellular function [79, 80]. Glycans that are terminated by sialic acids are expressed on cell surfaces and act as ligands for intrinsic or extrinsic sialic acid–specific lectins [79]. Most pathogens express sialic acid–specific lectins on their cell surfaces, which facilitates sialic acid–mediated host cell attachment and immune evasion [81]. Several RNA viruses and DNA viruses gain access to the host cells through sialylated glycans. Interestingly, host cell receptors evolve to combat rapidly emerging pathogens without affecting critical endogenous functions. Viruses express sialidases cleaving the sialic acids that enable virus binding to the cell in the first place, thereby affecting their release from infected host cells. Even though sialic acids may mediate virus binding and infection of cells, they can bind to virions as decoy receptors and thus prevent their access to host epithelial cells [82].

There are several strong indications that sialylated glycans can play important roles in COVID-19 infection [83]. In silico studies have provided evidence that the sialic acid termini on receptors can act as potential entry receptors for SARS-CoV-2 [84]. Glycans on specific sites of the receptor-binding domain are suggested to play crucial roles in viral binding with hACE2 [5, 7, 85] (Fig. 4). A bioreporter-based assay showed that deglycosylation of the RBD resulted in a lack of interaction of the RBD with hACE2 [86]. Several viruses, including coronaviruses, bind more avidly to host glycoproteins such as hACE2 featuring α-2,3-linked sialylated glycans [60, 79, 87, 88]. In a study on avian coronavirus ligand interaction, six out of ten N-glycosylation site mutants lost binding to host chicken trachea tissue, and an ELISA showed specific loss of interaction with the ligand α-2,3-linked sialic acid [89]. Another study demonstrated that human coronaviruses OC43 and HKU1 bind to 9-O-acetylated sialic acid (9-O-Ac-Sia) and identified the specific site through which this binding occurs [90].

Recent developments in the understanding of structural variants of glycans in the human system could guide in deducing the intricate interactions among different endogenous and exogenous sialic acid–specific ligands and host receptors. Moreover, such knowledge can help in understanding how pathogenic viruses fight the host system by modulating immune tolerance.

Post-translational modifications of M, E, and N proteins

M protein glycosylation

Unlike other components of the SARS-CoV-2 virus, the membrane protein (M protein) and envelope protein (E protein) have not been extensively studied and characterized. The M protein is a 222-amino acid glycosylated structural protein containing three N-terminal membrane-spanning domains that are essential for viral particle assembly [36]. Additionally, the M protein is the most abundant envelope protein of SARS-CoV-2 [91]. The M protein of SARS-CoV-2 resembles the M protein of bat and pangolin SARS-CoV-1. In silico analysis showed that the SARS-CoV-2 M protein and the bat and pangolin SARS-CoV-1 M proteins resemble SWEET (sugars will eventually be exported transporter) proteins. SWEETs and semiSWEETs are unique sugar transporters with homologs found in all kingdoms of life [92]. They catalyze the diffusion of sugars driven by their concentration gradients [93]. SWEETs of eukaryotes have 6–7 transmembrane helices, whereas the M protein has only 3. The M protein of SARS-CoV-2 therefore more closely resembles semiSWEETs, which also possess only 3 transmembrane helices. SWEETs and semiSWEETs are predominantly found in organisms requiring a high efflux of sugars [93]. Enveloped viruses typically use a two-step procedure to infect and release genetic material into the cell. First, they bind to surface receptors on the target cell membrane; then, they fuse the viral and cell membranes. It is currently unknown whether or how the M protein becomes incorporated into the host cell membrane; however, if it does, it may function as a sugar transporter [91]. The presence of a sugar transporter may influence sucrose entry into the endosome, lysosome, and/or autophagosome, aiding in virus release into cells. The presence of this semiSWEET-like sugar transporter may be an efficient mechanism to induce rapid viral proliferation and immune evasion [91].
Additionally, in silico analysis was used to predict eight novel N-glycosylation sites on the M protein: N5, N21, N41, N43, N117, N121, N203, and N216. Six of these eight sites are common to both human SARS-CoV-2 and SARS-CoV-1. The main difference worth noting is a mutation in SARS-CoV-2 that adds one amino acid. Therefore, although six sites are shared, their positions differ by one amino acid between the two proteins [94].
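As an illustration of how such an in silico screen can work, the following minimal sketch scans a sequence for the canonical N-glycosylation sequon N-X-S/T (X being any residue except proline). This is an assumption for illustration only: the prediction method used in the cited study [94] is not described here, and the sequences below are short hypothetical toy strings, not the M protein sequence. The second call shows how a single inserted residue shifts the numbering of every downstream site by one, which is the kind of offset noted above between SARS-CoV-2 and SARS-CoV-1.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNSS") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy peptide (NOT a real viral sequence).
toy = "MANGTAANPSKNNSS"
print(find_sequons(toy))  # [3, 12, 13]; N8 is skipped because X is proline

# Inserting one residue near the N-terminus shifts every downstream site by one.
toy_with_insertion = toy[:1] + "G" + toy[1:]
print(find_sequons(toy_with_insertion))  # [4, 13, 14]
```

A sequon match only marks a candidate site; whether the asparagine is actually glycosylated depends on membrane topology and accessibility, so predictions of this kind still require experimental glycosylation mapping.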

E protein glycosylation

The E protein is the smallest of the four major structural proteins and has the lowest copy number of the membrane proteins found in the lipid envelope of mature virus particles. The E protein has a short outer amino-terminal domain, a single helix, and a long inner carboxy-terminal domain [91]. For other coronaviruses, the E protein has been shown to be critical for pathogenesis [95]. A comparison of the amino acid sequences of E proteins across six human coronaviruses is shown in Fig. 8, which displays small changes between the SARS-CoV-2 and SARS-CoV-1 E proteins. The E protein is possibly mono-glycosylated at site N66, which could serve as a C-terminal translocation reporter. Based on the sequence, N48 could also be glycosylated; however, if the hydrophobic region is recognized as transmembrane by the translocon, this site lies so close to the membrane that it is likely not glycosylated. Both sites, N48 and N66, are located C-terminal to the transmembrane segment [95]. A mutant with a highly efficient glycosylation acceptor site at the N-terminus was designed to test N-terminal translocation. When E protein constructs were translated in vitro in the presence of microsomes, the protein was significantly glycosylated when the designed N-terminal glycosylation site was present. However, when this acceptor site was absent, E protein molecules were minimally glycosylated [95].

Fig. 8
figure 8

Alignment of E protein amino acid sequences comparing SARS-CoV-2 to six other human coronaviruses. Gray boxes highlight predicted transmembrane segments. SARS-CoV-2 native predicted glycosylation acceptor sites are shown in bold, with + or – symbols depicting charge. Orange highlighted residues are conserved; yellow highlighted boxes display differences between SARS-CoV-2 and SARS-CoV-1. Reprinted with permission from Royal Society Publishing [95]

Post-translational modification of N protein

Unlike the other structural proteins of SARS-CoV-2, the N protein is located in the nucleocapsid and does not go through the secretory pathway [96]. The N protein facilitates entry into the host cell, binds to the viral RNA genome, and forms the ribonucleoprotein core [97]. The N protein is able to form high-order oligomers in the absence of RNA. N protein is secreted in the presence of S protein, but independently of E and M protein expression. This indicates that the N protein may help virion budding when co-expressed with S protein [35]. The N protein is capable of forming or regulating biomolecular condensates in vivo by interacting with RNA and other key host cell proteins [98]. This activity could then be harnessed to regulate the viral life cycle and the host cell response to viral infection. Cascarina et al. proposed that the N protein could use its ability to form or join biomolecular condensates to dysregulate stress granules, enhance viral replication or translation of viral proteins, and package the viral RNA genome into new virions. Because the N protein plays a vital role in multiple stages of the viral life cycle, targeting host cell kinases or membraneless organelles to modulate N protein regulation could be a viable strategy for combating existing SARS-CoV-2 infections [98]. N proteins of other coronaviruses have similar crystal structures and sequence homology to that of SARS-CoV-2 and are heavily phosphorylated [99]. As the N protein does not go through the secretory pathway, it is not expected to be glycosylated. The N protein of SARS-CoV-2 expressed in HEK293 cells is phosphorylated at S176 and is not glycosylated unless it is forced through the secretory pathway by the addition of a leader sequence during expression. The latter form of N protein is also phosphorylated at a different site (T393) [100]. Intriguingly, an MS-based study of COVID-19 patient samples showed that N protein was detected in patient saliva after deglycosylation [101].
However, another study demonstrated that recombinant N proteins expressed in a mammalian system with leader sequences for protein secretion can yield a glycosylated form of N protein. Such artificially glycosylated N protein needs deglycosylation for ELISA-based detection of anti-N protein antibodies from COVID-19 patients. This indicates that the viral N protein present in COVID-19 patients is unglycosylated, and thus the anti-N protein antibodies recognize only the unglycosylated form of N protein [99, 100].

Conclusions

Since the beginning of the SARS-CoV-2 pandemic, the virus and its host infection mechanism have been the focus of many research articles. The four structural proteins of the SARS-CoV-2 virus, spike (S), membrane (M), envelope (E), and nucleocapsid (N), have been analyzed to determine their functions in SARS-CoV-2 infection and in the severity of the resultant COVID-19 disease. Specifically, the glycans located on these proteins as well as on the cell surface of hosts have been shown to facilitate viral entry. Understanding the glycosylation pattern and the role it plays in viral attachment and infection can lead to therapeutic possibilities.

Recently, several vaccine candidates based on either the RBD or the full-length S protein were approved and are being administered worldwide [29, 102]. The S protein is heavily glycosylated with both N- and O-glycans and directly interacts with hACE2 to facilitate infection. Several therapeutic strategies involving glycans or based on carbohydrate molecules were found to be effective in addressing SARS-CoV-2 infection. Vaccine candidates that elicit effective immune responses through appropriate glycan display, and viral immunogens bearing glycans that are unique to viruses, are recommended for therapeutic purposes [29]. Preventing or reducing host N-glycan biosynthesis, and thereby preventing viral glycosylation, is another approach being explored [103]. Molecules such as chitosan, which can interact with viral surface glycans, particularly those on the S2 subunit of the S protein, were shown to reduce SARS-CoV-2 infection [104]. Several aminoglycoside antibiotics and polysaccharides (acarbose) displayed therapeutic efficacy against SARS-CoV-2 by binding to viruses, by preventing viral protein translation, or by improving host immune defense [29]. hACE2 acts as a receptor for human coronaviruses, and N-glycosylation of hACE2 is imperative for infection. The study of glycosylation on hACE2 and other known receptors of SARS-CoV-2 could help in understanding how these receptors present their glycan epitopes to the viruses. Such knowledge will help in the development of therapeutics that can act as viral decoys during infection. In addition to hACE2, cell surface GAGs have been shown to contribute to SARS-CoV-2 binding to host cells. Specifically, cell surface heparan sulfate (HS) facilitates SARS-CoV-2 attachment to human host cells. Researchers have studied the possibility of using GAGs and GAG-like structures to compete with cell surface HS for binding to the virus.
Treatment with heparin (Hp)/HS as well as with fucoidan-like structures results in a lower percentage of SARS-CoV-2 viral attachment to host cells compared to no treatment [68, 69].

In silico experiments have suggested glycosylation of the E and M proteins, but extensive glycosylation mapping is still needed to confirm these assignments and to help determine their importance. The M protein structure is predicted to resemble a sugar transporter, which may influence sucrose entry into the endosome, lysosome, and/or autophagosome, aiding in virus release into cells. This sugar transporter may be an efficient mechanism to induce rapid viral proliferation and immune evasion [91]. Recent studies showed that the N protein of SARS-CoV-2 is phosphorylated but not glycosylated. The N protein can form or join biomolecular condensates to dysregulate stress granules, enhance viral replication or translation of viral proteins, and package the viral RNA genome into new virions [96, 97, 98]. The ability to modulate N protein regulation could be a viable strategy for treating existing SARS-CoV-2 infections, owing to the vital role the N protein plays in multiple stages of the viral life cycle. Understanding the glycosylation and other post-translational modifications of these proteins and receptors can help determine viable options to prevent SARS-CoV-2 infection and to treat COVID-19.

Herein, we have reviewed current studies on SARS-CoV-2 and the processes by which the viral proteins influence infection. By understanding these processes, therapeutic methods can be developed to help combat SARS-CoV-2 infection and to treat COVID-19.