Introduction

The global COVID-19 pandemic caused by SARS-CoV-2 has spread rapidly since December 2019 [1]. RNA sequencing of SARS-CoV-2 shows that it belongs to the beta-coronavirus genus and is most closely related to SARS-CoV [2]. The structural proteins of mature SARS-CoV-2 include the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. All structural proteins can serve as antigens for vaccine development or targets for anti-viral treatment [3]. Among these proteins, the transmembrane S protein is a homotrimer that protrudes from the virus surface and binds to angiotensin-converting enzyme 2 (ACE2) on host cells to promote fusion of the viral and host cell membranes [4]. Since both the viral spike protein and human ACE2 are glycoproteins, their glycosylation patterns can affect their interaction and, consequently, vaccine design. Given its key role in virus entry and infectivity, the S protein is an ideal target for developing vaccines and medical treatments that block or interfere with the interaction between the virus and the host cell. Therefore, comprehensive profiling of the glycosylation patterns of the S protein is urgently needed to address the challenges posed by the pandemic.

Each spike protein consists of S1 and S2 subunits; the S1 subunit mediates binding of the virus to the ACE2 receptor, while the S2 subunit enables fusion of the virion with the cell membrane and initiates viral entry. Each monomer of the S protein contains 22 predicted N-linked glycan sequons, all of which are occupied by complex, hybrid, or high-mannose glycan structures, as confirmed by both LC-MS/MS and cryo-EM [5,6,7]. Furthermore, three O-glycosylation sites, T323, S325, and T678, have been identified in published data [8, 9]. Glycans on the S protein are important factors that can affect protein folding and stability, and they are potential targets for immunization. Since glycans can shield amino acid residues and other epitopes from recognition by cells and antibodies, glycosylation can enable coronaviruses to evade both the innate and adaptive immune responses [10]. Blocking N- and O-glycan modification of the S protein inhibits SARS-CoV-2 viral entry [11]. Therefore, comprehensive characterization of the glycoforms and glycosylation sites of the S protein will facilitate the study of the SARS-CoV-2 viral entry mechanism and the screening for glycosylation inhibitors that may help in developing treatments for COVID-19 [12, 13].

In this study, to overcome the ion suppression that neutral glycopeptides impose on the detection of sialylated glycopeptides, we performed dual-functional Ti-IMAC enrichment of a recombinant SARS-CoV-2 S protein expressed in a human embryonic kidney (HEK 293) cell line. Ti-IMAC was developed as a technique for enrichment of negatively charged phosphopeptides through electrostatic interaction [14, 15]. We later found that Ti-IMAC material contains numerous hydroxyl, amine, and phosphate groups, which make it highly hydrophilic and enable enrichment of glycopeptides through hydrophilic interactions [16]. Therefore, negatively charged sialylated glycopeptides can be captured by Ti-IMAC through synergistic electrostatic and hydrophilic interactions, while neutral glycopeptides are captured through hydrophilic interaction only. With a two-step elution, this strategy enables the simultaneous enrichment and separation of neutral and sialyl glycopeptides, which improves the glycoform coverage of the S protein compared with the conventional HILIC method.

Materials and methods

Protein source and reagents

The recombinant full-length SARS-CoV-2 spike protein (40589-V08H4, Met1-Pro1213-His-tag; R683A, R685A, F817P, A892P, A899P, A942P, K986P, V987P) expressed in the HEK 293 cell line was obtained from Sino Biological US Inc. (Wayne, PA, USA). The purity of the purchased protein was >90%. Trypsin and chymotrypsin, sequencing grade, were obtained from Promega (Madison, WI, USA). Dithiothreitol (DTT), iodoacetamide (IAA), urea, and TEAB buffer were purchased from Sigma-Aldrich (St. Louis, MO, USA).

Protein digestion

Aliquots of SARS-CoV-2 S protein were dissolved in 8 M urea/50 mM TEAB buffer (pH 8) to a final concentration of 8 mg/mL. The S protein was reduced with 10 mM DTT for 120 min at 37 °C and then alkylated with 30 mM IAA for 30 min in the dark at room temperature. The protein solution was diluted to 1 M urea with 50 mM TEAB, trypsin was added at a protein:trypsin ratio of 50:1, and the sample was incubated in a 37 °C water bath for 8 h. After trypsin digestion, the sample was further digested with chymotrypsin at a protein:chymotrypsin ratio of 50:1 in a 37 °C water bath for 16 h.

Glycopeptide enrichment

Aliquots of 2 μg digested S proteins were dried down in SpeedVac and prepared for either HILIC enrichment or dual-functional Ti-IMAC enrichment:

HILIC enrichment was performed with a homemade HILIC stage-tip packed with a neutral, polar material (PolyHYDROXYETHYL A, 12 μm, 300 Å; PolyLC Inc., Columbia, MD, USA). Briefly, 2 mg of cotton wool was weighed and inserted into a 200-μL pipette tip. The pipette tip was placed on a 2-mL microcentrifuge tube with the help of an adapter unit. HILIC beads were suspended in 1% trifluoroacetic acid (TFA), and the slurry was vortexed for 15 min to activate the beads. The slurry was transferred onto the top of the cotton wool, and the solvent was removed by centrifugation at 500×g for 2 min. A total of 200 μL 1% TFA was added to flush the beads and removed at 500×g for 2 min. The beads were conditioned three times with 200 μL 80% acetonitrile (ACN)/1% TFA. Then, 2 μg of the S protein digest was dissolved in 100 μL 80% ACN/1% TFA and loaded onto the HILIC stage-tip. The stage-tip was centrifuged at 500×g for 2 min, and the flow-through was re-loaded onto the stage-tip twice. Non-glycopeptides were washed away three times with 100 μL 80% ACN/1% TFA, and the glycopeptides were eluted twice with 100 μL 0.1% formic acid (FA). The eluate was immediately dried down in a SpeedVac. Samples were stored at −20 °C and resuspended in 0.1% FA before LC-MS/MS analysis; 20% of each sample was loaded for analysis.

Dual-functional Ti-IMAC enrichment was performed in HILIC mode. A total of 2 μg of the S protein digest was loaded onto 100 μg of Ti(IV)-IMAC material in 100 μL 80% ACN/3% TFA buffer, similar to the conventional HILIC enrichment. Under these conditions, an aqueous layer forms across the abundant hydroxyl groups, amine groups, and phosphate-chelated Ti(IV) ions on the material surface. The beads were washed twice with 100 μL 80% ACN/0.1% TFA buffer. After enrichment, the bound glycopeptides were eluted in two steps: (1) 100 μL 0.1% FA (v/v) twice, (2) 100 μL 6% TFA (v/v) twice. Samples were immediately dried down in a SpeedVac, stored at −20 °C, and resuspended in 0.1% FA. For LC-MS/MS analysis, 20% of each sample was loaded.

LC-MS/MS analysis

Enriched S protein glycopeptides were analyzed on an Ultimate 3000 nanoLC coupled to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA). Resuspended samples were injected onto a homemade column (75 μm i.d. × 15 cm) with an integrated HF-etched emitter tip, packed with 1.7 μm, 130 Å BEH C18 material from a Waters UPLC column (Waters, Milford, MA). Peptides were separated with a gradient that ramped from 97% solvent A (0.1% FA in H2O) and 3% solvent B (0.1% FA in ACN) to 30% solvent B over 80 min, followed by 10-min holds at 75% solvent B, 95% solvent B, and 100% solvent A. The flow rate was 300 nL/min. The MS method used the following parameters: MS scan range (m/z) = 400–2000; resolution = 120,000; AGC target = 1.0e6; maximum injection time = 250 ms; included charge states = 2–6; dynamic exclusion duration = 30 s. MS/MS spectra were acquired in top-20 data-dependent acquisition (DDA) mode, with all precursors fragmented by stepped higher-energy collisional dissociation (stepped HCD): resolution = 60,000; AGC target = 1.0e5; maximum injection time = 150 ms; stepped collision energy = 22%, 30%, 38%.

Data processing

Byonic V3.10 (Protein Metrics, San Carlos, CA) was used to analyze the acquired MS and MS/MS spectra of intact glycopeptides. An advantage of this software is that it allows M6P glycans to be added manually to the glycan database for searching. Raw files were searched against SARS-CoV-2 spike protein sequences downloaded from UniProt with the appropriate mutations added. Data are available via ProteomeXchange with identifier PXD026450. A precursor ion mass tolerance of 10 ppm and a fragment ion mass tolerance of 20 ppm were selected. Lysine (K), arginine (R), tyrosine (Y), tryptophan (W), phenylalanine (F), leucine (L), and methionine (M) were set as cleavage sites, and up to 4 missed cleavages were allowed. Phosphorylation of serine (S), threonine (T), and tyrosine (Y) and oxidation of methionine (M) were set as variable modifications, while carbamidomethylation of cysteine (C) was set as a fixed modification. Common glycopeptide searches used a human N-glycome database containing 182 glycans and a human O-glycome database containing 70 glycans. In addition, M6P glycopeptides were searched by expanding the N-glycome database to include N-linked M6P glycans consisting of HexNAc (2–4)-Hex (3–9)-Phospho (1–2) modification. Peptide identifications were filtered at two-dimensional false discovery rate (2D FDR) <1%, PEP 2D <0.05, |Log Prob| > 1, and Byonic Score > 150. MS/MS spectra of M6P glycopeptides were manually inspected to confirm that the Byonic identifications contained phosphorylated hexose diagnostic ions. All searches allowed a maximum of two common modifications in total.
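The four identification filters listed above can be sketched as a simple predicate. This is a minimal illustration only: the `GlycoPSM` record, its field names, and the three example rows are hypothetical and are not Byonic's export format or data from this study; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class GlycoPSM:
    """Hypothetical record for one glycopeptide identification."""
    peptide: str
    glycan: str
    fdr_2d: float    # 2D FDR expressed as a fraction (0.01 == 1%)
    pep_2d: float    # PEP 2D
    log_prob: float  # log probability; the filter uses its absolute value
    score: float     # search-engine score


def passes_filters(psm, max_fdr=0.01, max_pep=0.05,
                   min_abs_log_prob=1.0, min_score=150.0):
    """Apply the thresholds quoted in the text: 2D FDR < 1%,
    PEP 2D < 0.05, |Log Prob| > 1, and score > 150."""
    return (psm.fdr_2d < max_fdr
            and psm.pep_2d < max_pep
            and abs(psm.log_prob) > min_abs_log_prob
            and psm.score > min_score)


# Made-up rows, for illustration only (not results from the study).
psms = [
    GlycoPSM("GEVFNATR", "HexNAc(2)Hex(5)", 0.002, 0.01, -3.2, 310.0),
    GlycoPSM("FSNVTWF", "HexNAc(4)Hex(5)NeuAc(1)", 0.004, 0.08, -2.1, 220.0),  # fails PEP 2D
    GlycoPSM("HAIHVSGTNGTK", "HexNAc(2)Hex(6)", 0.003, 0.02, -0.6, 180.0),     # fails |Log Prob|
]
kept = [p for p in psms if passes_filters(p)]
```

All four conditions must hold at once, so a high score alone does not rescue an identification with a poor PEP 2D or log probability.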
Glycoforms were categorized according to the following rules: (1) high mannose: glycan composition is “HexNAc(2)Hex(5–9)”; (2) paucimannose: glycan composition contains variations in “HexNAc(1–2)Hex(0–4)”; (3) mannose-6-phosphate glycosylation: glycan composition contains “phospho”; (4) sialylation: glycan composition contains “NeuAc”; (5) fucosylation: glycan composition contains “Fuc” but without “NeuAc”; (6) hybrid and complex: any glycan composition not categorized into the above five types.
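The categorization rules can be expressed as an ordered set of checks. The sketch below is one possible reading, under two assumptions that the text does not state explicitly: the rules are applied in the listed order, and the high-mannose and paucimannose categories require that no residue other than HexNAc and Hex is present. The function names are hypothetical.

```python
import re


def parse_composition(text):
    """Parse a composition string such as 'HexNAc(2)Hex(5)' into
    {'HexNAc': 2, 'Hex': 5}; residues with a count of 0 are dropped."""
    pairs = re.findall(r"([A-Za-z0-9]+)\((\d+)\)", text)
    return {name: int(count) for name, count in pairs if int(count) > 0}


def classify_glycoform(text):
    """Assign one of the six categories, checking the rules in order."""
    comp = parse_composition(text)
    hexnac = comp.get("HexNAc", 0)
    hexose = comp.get("Hex", 0)
    # Assumption: rules (1) and (2) apply only to HexNAc/Hex-only glycans.
    only_hexnac_hex = set(comp) <= {"HexNAc", "Hex"}

    if only_hexnac_hex and hexnac == 2 and 5 <= hexose <= 9:
        return "high mannose"
    if only_hexnac_hex and 1 <= hexnac <= 2 and 0 <= hexose <= 4:
        return "paucimannose"
    if any(name.lower().startswith("phospho") for name in comp):
        return "mannose-6-phosphate"
    if "NeuAc" in comp:
        return "sialylation"
    if "Fuc" in comp:
        return "fucosylation"
    return "hybrid and complex"
```

For example, `classify_glycoform("HexNAc(4)Hex(5)Fuc(1)NeuAc(1)")` falls under sialylation rather than fucosylation, because rule (5) excludes compositions that contain NeuAc.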

Results and discussion

N-Glycopeptide profiling

We performed site-specific glycoform analysis of the full-length SARS-CoV-2 S protein using three strategies. (1) Double enzyme digestion with trypsin and chymotrypsin. Trypsin cleaves carboxy-terminal to lysine and arginine residues, while chymotrypsin preferentially cleaves carboxy-terminal to large hydrophobic amino acids (tyrosine (Y), tryptophan (W), and phenylalanine (F)) and, at slower rates, to leucine (L) and methionine (M), which makes it complementary to trypsin [17]. Combining trypsin and chymotrypsin digestion makes the site-specific mapping of N-linked glycopeptides bearing two sites more accurate [18,19,20]. For example, R.SSVLHSTQDLFLPFFSN61VTWFHAIHVSGTN74GTK.R and R.FPN331ITNLC[+57.02146]PFGEVFN343ATR.F are generated by trypsin-only digestion; each carries two potential N-glycosylation sites, which makes precise glycan localization difficult. After additional chymotrypsin digestion, both peptides are cleaved between the two N-glycosylation sites, producing peptides that bear only one N-glycosylation site each: F.FSN61VTWF.H and F.HAIHVSGTN74GTK.R, and R.FPN331ITNLC[+57.02146]PF.G and F.GEVFN343ATR.F, respectively. (2) Sample preparation with a novel dual-functional Ti-IMAC enrichment strategy. Samples were loaded on dual-functional Ti-IMAC material in HILIC mode, where neutral glycopeptides were captured through hydrophilic interaction and negatively charged sialylated glycopeptides were captured through synergistic hydrophilic and electrostatic interactions. A two-step elution was then performed to release and separate the neutral and negatively charged glycopeptides. An aqueous buffer under mildly acidic conditions mainly elutes neutral glycopeptides, while an aqueous buffer under strongly acidic conditions, which can protonate sialic acid, elutes the sialylated glycopeptides. This approach enables simultaneous enrichment and separation of neutral and sialyl glycopeptides (Fig. 1).
As a result, the ion suppression effects from neutral glycopeptides in the detection of sialylated glycopeptides would be eliminated and the glycoform coverage of the S protein would be improved compared with the conventional HILIC method [16, 21, 22]. (3) Stepped higher-energy collisional dissociation (HCD) for better fragmentation efficiency. We used high-resolution LC-MS/MS at both the MS1 and MS2 levels with stepped HCD to enhance the fragmentation efficiency for both the peptide backbone and its attached glycans, which further improves the confidence in glycopeptide identification [23, 24].
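The effect of the double digestion in strategy (1) can be illustrated with a small in silico digest. This is a simplified sketch, not the search-engine logic: it assumes complete cleavage C-terminal to the listed residues, blocks cleavage before proline (a common rule of thumb that the text does not mention), and uses hypothetical helper names. Observed peptides such as FSN61VTWF retain missed cleavages, so they are longer than the idealized fragments produced here.

```python
import re

TRYPSIN = set("KR")
CHYMOTRYPSIN = set("FYWLM")
# N-glycosylation sequon N-X-S/T with X != P; lookahead allows overlaps.
SEQUON = re.compile(r"(?=N[^P][ST])")


def digest(sequence, residues):
    """Cleave C-terminal to every residue in `residues`,
    except when the next residue is proline (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        next_is_pro = (not is_last) and sequence[i + 1] == "P"
        if is_last or (aa in residues and not next_is_pro):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def count_sequons(peptide):
    """Count N-X-S/T sequons contained entirely within the peptide."""
    return sum(1 for _ in SEQUON.finditer(peptide))


tryptic = "SSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTK"  # carries N61 and N74
double = digest(tryptic, TRYPSIN | CHYMOTRYPSIN)
glyco = [p for p in double if count_sequons(p) > 0]
# glyco == ["SNVTW", "HAIHVSGTNGTK"]: one sequon per peptide
```

The tryptic peptide contains two sequons, whereas after the combined digest no fragment contains more than one, which is what makes unambiguous site assignment possible.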

Fig. 1

Scheme of Ti-IMAC and HILIC dual-mode affinity enrichment approach for the simultaneous enrichment and separation of neutral and sialyl glycopeptides

For comparison, the conventional HILIC method enabled profiling of 18 of the 22 potential N-linked glycosylation sites, while our dual-functional Ti-IMAC approach enabled profiling of 19. The two methods therefore perform similarly in glycosylation site profiling (Table 1). However, because each glycosylation site bears multiple glycans, glycopeptides are far more heterogeneous than peptides carrying other PTMs. As shown in Fig. 2a and b, with the dual-functional Ti-IMAC enrichment approach we identified a total of 624 unique glycopeptides that were assigned to 398 unique glycoforms (a unique glycoform means a glycosite with a unique glycan composition) distributed across 19 glycosylation sites of the S protein, suggesting that each N-glycosylation site on the S protein possesses an average of 21 unique glycoforms. The conventional HILIC method enabled profiling of 307 unique glycopeptides and 247 unique glycoforms, considerably fewer than the numbers obtained with the dual-functional Ti-IMAC enrichment approach (see Supplementary Information (ESM) Table S1).

Table 1 Overview of profiled N- and O-glycosylation sites of the SARS-CoV-2 S protein by two different methods
Fig. 2

Comparison of profiled N-glycopeptides (a) and N-glycoforms (b) by HILIC and dual-functional Ti-IMAC enrichment (TiIMAC FA and TiIMAC TFA represent the weak)

As for the sialylated glycopeptides, only 37 unique sialylated glycopeptides attributed to 32 unique sialylated glycoforms were identified in the weak acidic elution buffer, while 148 unique sialylated glycopeptides assigned to 99 unique sialylated glycoforms were profiled in the strong acidic elution buffer (Fig. 2a, b). These data demonstrate that the sialylated glycopeptides were well separated from the neutral glycopeptides by the two-step elution procedure. We also calculated the overlap between the conventional HILIC method and the dual-functional Ti-IMAC enrichment approach for the profiled N-glycopeptides, sialylated N-glycopeptides, N-glycoforms, and sialylated N-glycoforms. As shown in Fig. 3a, 87.0% of the N-glycopeptides identified by the conventional HILIC method were also found with the dual-functional Ti-IMAC enrichment approach, while 57.2% of the N-glycopeptides identified by the latter approach were unique to it. This increased coverage of S protein glycosylation demonstrates the advantage of the new method. The overlap trends for sialylated N-glycopeptides, N-glycoforms, and sialylated N-glycoforms were similar (Fig. 3b, c, d).
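The reported counts and percentages can be cross-checked with simple arithmetic. The sketch below assumes that the 87.0% figure refers to the 307 HILIC glycopeptides and that the 57.2% figure refers to the 624 Ti-IMAC glycopeptides; the shared count is inferred from those percentages and is not a value reported in the text.

```python
# Totals reported in the text (unique N-glycopeptides and glycoforms).
hilic_glycopeptides = 307
tiimac_glycopeptides = 624
hilic_glycoforms = 247
tiimac_glycoforms = 398
tiimac_sites = 19

# Inferred: glycopeptides found by both methods (87.0% of the HILIC set).
shared = round(hilic_glycopeptides * 0.870)      # 267
tiimac_only = tiimac_glycopeptides - shared      # 357
tiimac_only_pct = 100 * tiimac_only / tiimac_glycopeptides

# Average glycoforms per profiled site and fold change in glycoform coverage.
glycoforms_per_site = tiimac_glycoforms / tiimac_sites
glycoform_fold = tiimac_glycoforms / hilic_glycoforms

print(f"Ti-IMAC-only glycopeptides: {tiimac_only_pct:.1f}%")   # 57.2%
print(f"glycoforms per site: {glycoforms_per_site:.1f}")       # 20.9
print(f"glycoform fold change: {glycoform_fold:.1f}")          # 1.6
```

Under these assumptions the numbers are mutually consistent: about 267 shared glycopeptides give 357 Ti-IMAC-only identifications, which is 57.2% of 624.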

Fig. 3

Overlap of profiled N-glycopeptides and sialylated N-glycopeptides (a, b); and N-glycoforms and sialylated N-glycoforms (c, d) by HILIC and dual-functional Ti-IMAC enrichment

Fig. 4 shows a representative MS2 spectrum of an annotated N-linked glycopeptide generated by our double-enzyme digestion strategy with trypsin and chymotrypsin. Nonetheless, this strategy still has a limitation for SARS-CoV-2 S protein glycosylation profiling. The large tryptic peptide R.AAASVASQSIIAYTMSLGAENSVAYSN709NSIAIPTN717FTISVTTEILPVSMTK.T can be trimmed to a shorter peptide, Y.SN709NSIAIPTN717F.T, by chymotrypsin digestion; however, chymotrypsin cannot cleave between the two glycosylation sites on this short peptide. Therefore, the site-specific localization of glycoforms on these two sites might not be accurate, and the site-specific annotation provided by the software can only serve as a reference. In the future, accurate localization of the glycosylation sites on this peptide may be achieved by two strategies: (1) Add this peptide to the MS2 inclusion list and use electron-transfer/higher-energy collision dissociation (EThcD) to fragment the peptide backbone. Fragments that retain the intact glycan on the glycosylation site will be generated, so that the glycoforms can be accurately annotated [25, 26]. (2) Digest protein samples with a third protease, alpha-lytic protease, which cleaves after threonine (T), serine (S), alanine (A), and valine (V) residues. Because it can cleave between N709 and N717, it may serve as a good supplementary protease to both trypsin and chymotrypsin [6].
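Whether a protease can separate N709 from N717 reduces to asking whether any of its target residues lies between the two sites. The following sketch checks this for the short peptide quoted above; the function name and the 0-based indexing are illustrative choices, and the proline effect on cleavage is ignored here.

```python
CHYMOTRYPSIN = set("FYWLM")
ALPHA_LYTIC = set("TSAV")  # specificity as described in the text


def separating_cleavages(peptide, site_a, site_b, residues):
    """Return 0-based indices i such that cleaving C-terminal to
    peptide[i] places the two glycosites on different fragments."""
    lo, hi = sorted((site_a, site_b))
    return [i for i in range(lo, hi) if peptide[i] in residues]


peptide = "SNNSIAIPTNF"   # S708 ... F718
n709, n717 = 1, 9         # 0-based positions of the two asparagines

chymo_sites = separating_cleavages(peptide, n709, n717, CHYMOTRYPSIN)
alp_sites = separating_cleavages(peptide, n709, n717, ALPHA_LYTIC)
# chymo_sites == []          no F/Y/W/L/M between the two sites
# alp_sites == [3, 5, 8]     S711, A713, and T716
```

Chymotrypsin finds no target residue between the two asparagines, whereas the alpha-lytic protease specificity offers three candidate positions, consistent with its proposed use as a supplementary protease.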

Fig. 4

Overview of profiled N- and O-linked glycosylation sites and the representative MS2 spectrum of an annotated N-linked glycopeptide located at the RBD region. The diagnostic ions of sialic acid (NeuAc) were labeled with a blue arrow

Mannose-6-phosphate glycopeptide profiling

It is worth mentioning that we also identified a mannose-6-phosphate (M6P) glycopeptide on the S protein expressed in HEK293 cells (Fig. 5). The MS/MS spectrum clearly showed a phosphorylated hexose oxonium ion, which can serve as a unique diagnostic ion of M6P glycopeptides. M6P is initially added to high-mannose glycans in the form of GlcNAc-M6P in the cis-Golgi apparatus, and the GlcNAc cap is then removed in the trans-Golgi. M6P is mainly added to lysosomal enzymes and is recognized by the M6P receptor, which transfers the enzymes to lysosomes. Recently, it has been shown that β-coronaviruses can use lysosomes for egress instead of the biosynthetic secretory pathway [27]. Given that SARS-CoV-2 is a β-coronavirus and that we found M6P glycosylation on its S protein, we hypothesize that M6P glycosylation may be involved in the pathway that transfers whole viruses from the trans-Golgi to late endosomes/lysosomes, followed by egress through the lysosomal pathway. Further investigation is needed to evaluate this hypothesis.
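Checking a spectrum for diagnostic oxonium ions can be sketched as a tolerance search over the peak list. The m/z values below are standard monoisotopic values for singly protonated oxonium ions and are not taken from this paper; the 20 ppm window mirrors the fragment tolerance used for the database search, and the peak list is a made-up example rather than measured data.

```python
# Theoretical m/z of singly charged oxonium ions (monoisotopic).
DIAGNOSTIC_IONS = {
    "HexNAc": 204.0867,
    "Hex-phosphate": 243.0264,  # phosphorylated hexose, the M6P marker
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
}


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def find_diagnostic_ions(peaks, tolerance_ppm=20.0):
    """Return {ion name: (m/z, intensity)} for the most intense peak
    within the tolerance window of each diagnostic ion."""
    found = {}
    for name, mz in DIAGNOSTIC_IONS.items():
        matches = [(obs, inten) for obs, inten in peaks
                   if abs(ppm_error(obs, mz)) <= tolerance_ppm]
        if matches:
            found[name] = max(matches, key=lambda peak: peak[1])
    return found


# Hypothetical (m/z, intensity) pairs for illustration only.
peaks = [(204.0869, 1.0e5), (243.0266, 2.0e4), (366.1395, 5.0e4)]
found = find_diagnostic_ions(peaks)
has_m6p_marker = "Hex-phosphate" in found
```

A hit for the phosphorylated hexose ion supports, but does not by itself prove, an M6P assignment, which is why the spectra were also inspected manually.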

Fig. 5

A representative MS2 spectrum of an annotated M6P glycopeptide. The diagnostic ion of M6P is indicated with a blue arrow

O-Glycopeptide profiling

In addition to N-linked glycosylation, we also found one O-linked glycosylation site at T323 with three glycoforms (Table 2). They were identified using the novel dual-functional Ti-IMAC enrichment approach but were not detected using the conventional HILIC method. However, since the delta modification score of the identified O-linked glycopeptides was only 2.8, it is possible that the other site on this O-linked glycopeptide, S325, is also a potential O-linked glycosylation site (Fig. 6). It is worth noting that the identified O-linked glycosylation site is located in the receptor-binding domain (RBD, residues 319–541) together with two N-linked glycosylation sites, N331 and N343 (Fig. 4). Although the role of N- and O-glycosylation in the RBD region has not been thoroughly investigated, these modifications might help stabilize the interaction between the RBD and the viral entry target ACE2 [4]. Another notable finding is that the three O-linked glycoforms were all sialylated, while only some of the N331 and N343 glycoforms were sialylated [8]. As a negatively charged monosaccharide, sialic acid can cap one or more termini of a glycan structure, shielding epitopes or key residues in the RBD region of the S protein and protecting it from immune system attack [2, 28]. Experimental investigations have demonstrated that blocking N- and O-glycan elaboration across the whole S protein inhibits SARS-CoV-2 viral entry [11]. Further experimental studies on the effect of glycosylation in specific regions such as the RBD are needed to confirm whether any of these glycosylation sites are used in SARS-CoV-2 infection [29].

Table 2 Summary of identified O-glycopeptide of the SARS-CoV-2 S protein by dual-functional Ti-IMAC approach
Fig. 6

A representative MS2 spectrum of an annotated O-linked glycopeptide located at the RBD region. The diagnostic ion of sialic acid (NeuAc) is indicated with a blue arrow

Conclusions

In this work, we performed a comprehensive analysis of the N- and O-linked glycosylation of the SARS-CoV-2 S glycoprotein. We demonstrated that dual-functional Ti-IMAC material enables simultaneous enrichment and separation of neutral and sialyl glycopeptides. This feature helped eliminate signal suppression from neutral glycopeptides in the detection of sialylated glycopeptides and increased the overall glycoproteome coverage of the S protein. We profiled 19 of its 22 potential N-glycosylation sites with 398 unique glycoforms using the dual-functional Ti-IMAC approach, a 1.6-fold improvement in glycoform coverage compared with the conventional HILIC method. We also identified an O-linked glycosylation site that was found only with the dual-functional Ti-IMAC approach. In addition, we profiled mannose-6-phosphate (M6P) glycosylation, which further expands current knowledge of the spike protein's glycosylation and enables future investigation into the influence of M6P on viral cell entry. These results will prompt further research to understand the role of N- and O-linked glycosylation of the SARS-CoV-2 S protein in viral infection, especially in the RBD region. Furthermore, the developed N- and O-glycosylation profiling approach will provide useful knowledge for elucidating the pathology of viral infection and offer possible strategies for future therapeutics as well as the development of immunogens in the design of suitable vaccines.