Introduction

The development of effective vaccines, together with universal access for their mass introduction, is urgently needed to control the COVID-19 pandemic [1]. According to the draft landscape published by the World Health Organization (WHO), several vaccine platforms are currently being evaluated, including inactivated and live attenuated virus; non-replicating and replicating viral vectors; DNA, mRNA, and virus-like particle vaccines; and protein subunit vaccines [2]. Some of them have already been approved by the WHO and regulatory authorities and introduced in the clinic with favorable results [3].

SARS-CoV-2 uses the receptor-binding domain (RBD) of the spike (S) protein for entry into the host cells [4, 5]. The RBD has been proposed for the rational development of protective vaccines against SARS-CoV-2 [6, 7] and nowadays, subunit vaccines are well-represented among the candidates investigated in preclinical studies and clinical trials [2]. For a successful introduction of vaccines, the immunogens need to be produced at scale and prices affordable for all, including middle- and low-income countries [1, 8].

Probably this is one of the reasons why RBD of SARS-CoV-2, besides its production in mammalian cells [9], has also been produced in several systems [10,11,12,13,14], including bacteria [15], despite the challenges of expressing a non-globular protein with four disulfide bonds and the requirement of the N-glycosylation for its proper expression and folding [11].

According to the test procedures and acceptance criteria for Biotechnological/Biological products (ICHQ6B guidelines [16]), mass spectrometry (MS) is the analytical tool of choice for the verification of the amino acid sequence, to demonstrate the integrity of the N- and C-terminal ends, and to detect post-translational modifications (PTMs) in natural and recombinant proteins. The PTMs may modify the physico-chemical and immunological properties of the proteins. In particular, a disulfide bond arrangement identical to the one present in the native protein is mandatory for biotherapeutics as well as for vaccine development in cases where the antigen should be well folded to raise conformational and topological neutralizing antibodies [3, 17].

Sample processing prior to MS analysis also plays a determinant role in the quality of the results. Efficient proteolytic digestion and recovery of the proteolytic peptides are mandatory to obtain the highest sequence coverage and to map all PTMs present in the analyzed molecule. In particular, if electrospray ionization mass spectrometry (ESI–MS) is used, a desalting step is needed for the proteolytic peptides to ionize properly. This step, although necessary, often compromises the recovery of highly hydrophilic and highly hydrophobic peptides when micro-columns based on reverse-phase chromatography are used.

Arbeitman et al. [11] analyzed by MALDI-MS the in-solution tryptic digests of two reduced and S-alkylated recombinant RBD of SARS-CoV-2. The tryptic peptides, desalted by C18-ZipTips prior to MALDI-MS analysis, allowed the unambiguous identification of RBD expressed in P. pastoris and HEK-293 T cells, but with a sequence coverage of only 40 and 60%, respectively.

The hydrophilic C-terminal peptide (LPETGHHHHHH) tagged with a repeat of six histidine residues (His6 tag) was only detected for the RBD expressed in P. pastoris, suggesting variable results in the desalting step. Other PTMs such as N- and O-glycosylation were not detected in that study [11], and neither the arrangement of disulfide bonds nor the presence of free cysteine residues was verified. Free cysteine residues, even when present as low-abundance species, may promote disulfide exchange and generate scrambled variants of proteins [18].

In our laboratory, we initially demonstrated that proteins separated by SDS-PAGE can be efficiently in-gel desalted and digested with trypsin in water, in the absence of traditional saline buffers [19]. This procedure avoids a desalting step for the proteolytic peptides and allows their direct analysis by ESI–MS, and the sequence coverage [19] was higher than that achieved by the traditional in-gel digestion protocol.

Recently, the principles of the in-gel buffer-free digestion protocol [19] were extended to in-solution buffer-free digestion (BFD) of other proteins [20]. In-solution BFD protocol improved the sequence coverage of certain regions of proteins represented by short and hydrophilic peptides including some N-glycopeptides, short peptides linked by disulfide bonds, and hydrophilic His6 tag C-terminal peptides [20].

In this work, we adapted the in-solution BFD protocol [20] to the analysis of the products of six SARS-CoV-2 RBD expression constructs from five different expression systems. The implemented in-solution BFD method avoids buffers and desalting is carried out by protein precipitation, allowing very high sequence coverage (≥ 99%) and the detection of PTMs including those located at the N- and the C-terminal end. The in-solution BFD protocol allowed the identification, in a single mass spectrum, of the four native disulfide bonds as well as scrambled disulfide bonds, the presence of free cysteine residues, N- and O-glycosylation, and other PTMs of known and unknown nature linked to an unpaired cysteine residue located at the C-terminal peptide in some of the analyzed RBD molecules. A non-peer-reviewed preprint version of this article was posted in bioRxiv [21].

Materials and methods

Cloning expression and purification of RBD variants

Six RBD recombinant proteins, produced at laboratory scale in a wide range of host cells, were used as model antigens to develop and refine suitable analytical methods for RBD characterization. Table 1 summarizes their sequences. A more detailed description of the procedures for cloning, expression, and purification of these proteins is provided in the Electronic Supplemental File (see Experimental Section in ESM).

Table 1 Sequences of the recombinant receptor-binding domain of SARS-CoV-2 characterized in this work

In-solution buffer-free digestion protocol

Fifty micrograms of the glycoproteins, dissolved in PBS (pH 7.4) containing 0.5 M guanidine hydrochloride, was reacted with 5 mM N-ethylmaleimide (NEM) for 30 min at room temperature (22 °C). Then, 1 μL of PNGase F (New England Biolabs) was added and the deglycosylation reaction was allowed to proceed for 2 h at 37 °C. In the case of N-glycosylated RBD(333–527)-C1, the protein was not deglycosylated but reduced with 10 mM dithiothreitol in 0.2 M Tris–HCl buffer pH 8.0 for 1 h at 37 °C, and then S-alkylated with 25 mM iodoacetamide under exclusion of light for 20 min at 22 °C. All samples were cooled to room temperature, proteins were precipitated with ten volumes of cold acetone (−20 °C) or 80% ethanol (v/v), and the solution was kept at −80 ± 5 °C for 1 h. The sample was centrifuged at 9000 × g for 5 min and the supernatant was discarded. The precipitate was washed by vortexing with 75% cold acetone or ethanol (−20 °C) and centrifuged at 10,000 rpm for 5 min, and the supernatant was discarded. This procedure was repeated twice and the final precipitate was dried in a vacuum centrifuge for 15 min. The precipitate was dissolved in 50 μL of 20% (v/v) acetonitrile in water with 1 min of vortexing and 10 min of sonication in a water bath. One microgram of sequencing-grade trypsin (Promega) dissolved in water was added to the protein solution and the proteolytic digestion proceeded for 16 h at 37 °C in a thermomixer (Thermo Fisher Scientific). The digest was centrifuged at 9000 × g for 1 min, and 4 μL of the resultant mixture of tryptic peptides was mixed with 0.3 μL of 90% formic acid and loaded into a metal-coated borosilicate nanocapillary for MS analysis.

Standard digestion (SD) protocol

Fifty micrograms of the protein dissolved in PBS (pH 7.2) containing 0.5 M guanidine hydrochloride was reacted with 5 mM NEM for 30 min at room temperature (22 °C). One microliter of PNGase F (New England Biolabs) was added and the deglycosylation reaction proceeded for 2 h at 37 °C. The sample was diluted fourfold and the protein was digested in the presence of 0.2 M Tris–HCl buffer pH 8.0 with 1 μg of sequencing-grade trypsin (Promega) previously dissolved in 20 mM acetic acid. Tryptic digestion proceeded for 16 h at 37 °C and was stopped by adding formic acid to a final concentration of 5% (v/v). The resulting peptides were desalted with ZipTip C18 (Millipore, USA), washed with 0.2% (v/v) formic acid solution, and eluted in 4 μL of 60% acetonitrile in water containing 0.2% formic acid (v/v).

Electrospray ionization mass spectrometry analysis

For measuring the molecular masses of the deglycosylated RBDs and the N-glycosylated RBD(333–527)-C1, 7 μg of the total protein was mixed with an equal volume of 6 M guanidine hydrochloride solution and desalted by using ZipTip C18 (Millipore, USA). The proteins were extensively washed with 0.2% (v/v) formic acid solution and finally eluted in 3 μL of 60% acetonitrile in water containing 0.2% formic acid (v/v). The eluate was loaded into the metal-coated nanocapillary for ESI–MS analysis.

The mixture of tryptic peptides contained in 4 μL of the 20% acetonitrile hydrolysis solution was acidified by adding 0.5 μL of formic acid (90% v/v) and directly analyzed in a hybrid orthogonal QTof-2 tandem mass spectrometer (Micromass, Manchester, UK) by spraying the sample into the ion source using 1200 and 35 V for the capillary and the entrance cone, respectively. The ESI–MS spectra were acquired from m/z 200 to 2000 and the multiply charged ions were manually fragmented by collision-induced dissociation using appropriate collision energies (20–50 eV) to obtain sufficient structural information in the MS/MS spectra. Argon was used as the collision gas and the mass spectra were processed by using MassLynx v4.1 (Micromass, UK). The ESI–MS/MS spectra of tryptic peptides with z ≥ 3+ were deconvoluted by MaxEnt 3.0. The ESI–MS spectrum (m/z 400–3000) of the protein deglycosylated with PNGase F was deconvoluted (mass 5000–70,000) by using the MaxEnt 1.0 tool (Micromass, UK). The theoretical m/z values for tryptic peptides as well as for the intact protein were calculated by using the peptide and protein editor available in the MassLynx v4.1 software (Micromass, UK).
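The relation between neutral mass, charge, and m/z that underlies both the theoretical values and any deconvolution can be illustrated with a short sketch. It is not the MaxEnt algorithm used in this work; it only shows the classical estimate of charge from two adjacent peaks of a multiply charged envelope. The function names and the 27,000 Da test mass are hypothetical and chosen for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_from_mass(neutral_mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral mass."""
    return (neutral_mass + z * PROTON) / z


def charge_from_adjacent_peaks(mz_high: float, mz_low: float) -> int:
    """Charge z of the peak at mz_high, assuming the neighbouring
    peak at mz_low belongs to the same species with charge z + 1."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def mass_from_peak(mz: float, z: int) -> float:
    """Neutral mass recovered from one peak of known charge."""
    return z * (mz - PROTON)


# Hypothetical 27,000 Da protein observed at two adjacent charge states.
hypothetical_mass = 27000.0
peak_15 = mz_from_mass(hypothetical_mass, 15)
peak_16 = mz_from_mass(hypothetical_mass, 16)

z = charge_from_adjacent_peaks(peak_15, peak_16)
print(z, round(mass_from_peak(peak_15, z), 2))
```

In practice the whole charge envelope is used rather than a single pair of peaks, which is what dedicated deconvolution software does.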

SDS-PAGE analysis

RBD(319–541)-HEK_A3, (RBD(319–541)-CHO)2, and RBD(331–529)-Ec proteins were separated by SDS-PAGE as described by Laemmli [22], under reducing and non-reducing conditions. Two micrograms of N-glycosylated and deglycosylated proteins were applied in a 12.5%T, 3%C acrylamide-bisacrylamide separating gel at 30 mA/gel until the tracking dye left the gel. Proteins were detected by silver staining [23] or Coomassie Brilliant Blue G-250; gel images were analyzed with a GS-900 calibrated imaging densitometer (Bio-Rad) and processed with Image Lab v6.0 software (Bio-Rad).

NP-HPLC analysis

The N-glycosylation profile was determined by using the procedure described by Guile et al. [24]. Briefly, the N-glycans released by PNGase F treatment were derivatized with 2-aminobenzamide (2AB) by reductive amination. The chromatographic separation was carried out in an HPLC Prominence-Shimadzu (Japan) using a linear gradient from 20 to 53% of 50 mM ammonium formate, pH 4.4 (solution A), and pure acetonitrile (solution B). 2AB N-glycan separation was performed on an Amide-80 column (TSKgel 250 × 46 mm, 5 µm, Tosohaas, Japan) and the derivatized oligosaccharides were detected on-line by fluorescence using excitation and emission wavelengths of 330 nm and 420 nm, respectively. The structural assignment was performed by comparing the experimental GU values with the GlycoStore database (https://glycostore.org/). GU values were calculated from the retention time of each peak using as a reference an HPLC separation run under similar conditions for the 2AB derivatives of a dextran ladder generated by partial acid hydrolysis. Glycan structures were represented according to GlycoStore nomenclature.
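As an illustration of how a retention time is converted into a GU value, the following minimal sketch interpolates linearly between the two dextran ladder peaks that bracket the glycan peak. This is a simplification and an assumption on our part: polynomial fits of the ladder are also commonly used, and the fitting method applied in this work is not specified here. The ladder retention times below are hypothetical.

```python
from bisect import bisect_right


def glucose_units(rt: float, ladder: list[tuple[float, int]]) -> float:
    """Interpolate a GU value for retention time `rt` (min).

    `ladder` holds (retention_time_min, GU) pairs for the 2AB-labelled
    dextran peaks, sorted by increasing retention time.
    """
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the ladder range")
    # Index of the first ladder peak eluting after rt (clamped to the last peak).
    i = min(bisect_right(times, rt), len(ladder) - 1)
    (t0, gu0), (t1, gu1) = ladder[i - 1], ladder[i]
    return gu0 + (gu1 - gu0) * (rt - t0) / (t1 - t0)


# Hypothetical dextran ladder: (retention time in min, GU).
hypothetical_ladder = [(5.0, 1), (10.0, 2), (16.0, 3), (23.0, 4)]
print(glucose_units(13.0, hypothetical_ladder))
```

The resulting GU value is what would then be compared against database entries for structural assignment.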

Results and discussion

Comparison between the standard digestion and the in-solution buffer-free digestion protocols

Both the SD (Fig. 1a) and the in-solution BFD [20] (Fig. 1b) protocols start with the S-alkylation of free cysteine residues by adding an excess of N-ethylmaleimide (NEM) or iodoacetamide (IAA). This step blocks the free thiol groups that can be present either because the RBD contains an odd number of cysteine residues or because, owing to incorrect folding, some cysteines were not quantitatively paired and thus remain partially free. At the same time, adding the alkylating agent at the beginning of the protocol avoids artifacts due to disulfide bond exchange during the subsequent steps [18]. This could be more critical in the conventional protocol, which uses a basic pH during tryptic digestion [25, 26]. The use of a slightly acidic pH for trypsin digestion (pH 5.5–6.0) with BFD minimizes artificial modifications introduced during sample preparation, such as scrambling due to the presence of free Cys in the analyzed protein. The S-alkylating agent introduces an artificial mass tag that facilitates the assignment when any Cys is partially free and differentiates such species from those modified with natural thiol-blocking groups due to alkylating or thiol-containing species present in the culture media [27].

Fig. 1

A comparison between the in-solution standard digestion (a) and buffer-free digestion [20] (b) protocols for the ESI–MS analysis of the tryptic digests. Black rectangles at the left and right sides of the figure indicate the time required for the individual steps in each protocol. Square boxes at the bottom-left and bottom-right in the figure indicate the total time consumed for each protocol. NEM and IAA mean N-ethylmaleimide and iodoacetamide, respectively

As a second step, both protocols comprise the deglycosylation of the recombinant RBDs with PNGase F, which converts the fully glycosylated asparagines (Asn331/Asn343) into aspartic acids. This step also facilitates the detection and sequencing of two peptides (Phe329-Arg346) and (Ile358-Lys378) linked by an intermolecular disulfide bond between Cys336 and Cys361. For the particular cases of RBD(333–527)-C1 and RBD(331–530)-cmyc-Pp (Table 1), the peptide with the disulfide bond Cys336-Cys361 also contains the N-terminal end of the protein. The identification of the disulfide bonds and the N-terminal sequencing of the protein are aspects required by regulatory agencies to develop well-characterized products according to the ICHQ6B guidelines [16].

For the in-solution SD protocol (Fig. 1a), the solution is adjusted to a basic pH and the deglycosylated RBD is digested with trypsin for 16 h to ensure complete digestion. Note that, even after disulfide reduction, this protein has been digested overnight by other authors [11, 28]. Finally, the digestion is quenched by adding formic acid and the resultant tryptic peptides are desalted by using C18-ZipTips and eluted in a solution compatible with ESI–MS analysis.

For the in-solution BFD protocol (Fig. 1b), desalting is achieved at the protein level by conventional precipitation protocols using either cold acetone [29] or ethanol [30]. Here, washing steps are included to minimize inorganic ions that may produce adduct signals in the mass spectra. Protein resuspension in 20% acetonitrile is ensured by vigorous vortexing and sonication in an ultrasonic bath before adding trypsin previously dissolved in water. There is no appreciable difference between the two workflows (Fig. 1a and 1b) with respect to the processing time before MS analysis.

Characterization of RBD(319–541)-HEK_A3 and RBD(319–541)-HEK proteins

RBD(319–541)-HEK_A3 (Table 1) expressed in HEK-293 T mammalian cell line has four disulfide bonds and a free cysteine residue (Cys538) located towards the C-terminal region of the protein. The high reactivity of Cys538 can be used for site-directed chemical conjugation to highly immunogenic carrier proteins such as tetanus toxoid [31].

RBD(319–541)-HEK_A3 was analyzed by SDS-PAGE under non-reducing conditions (Fig. 2a, lane 2), showing an intense and diffuse band at 33.3 kDa corresponding to the monomer with the heterogeneity of N-glycosylation. In addition, a band detected at 59.7 kDa, representing approximately 13%, was assigned to the dimer. After treatment with PNGase F and analysis under non-reducing conditions, these bands migrated at 29.3 and 43.9 kDa (Fig. 2a, lane 3), confirming that RBD(319–541)-HEK_A3 is N-glycosylated. The presence of O-glycosylation was not excluded because PNGase F does not hydrolyze O-glycans covalently linked to serine or threonine. When the same samples were analyzed by SDS-PAGE under reducing conditions, only protein bands corresponding to the glycosylated monomer (Fig. 2a, lane 5) and the deglycosylated monomer (Fig. 2a, lane 6) were detected. No evidence of the dimer was observed, suggesting that dimerization of the molecule was mediated by disulfide bonds and was not due to an aggregation artifact.

Fig. 2

a SDS-PAGE analysis under reducing and non-reducing conditions of N-glycosylated and deglycosylated RBD(319–541)-HEK_A3 and detected with silver staining. Lane 1: Molecular weight markers of low-range from 31 to 97 kDa (Bio-Rad). Lanes 2–3: N-glycosylated and deglycosylated protein in non-reducing conditions detecting the monomer and a low-abundance (13%) dimer species of RBD(319–541)-HEK_A3. Lane 4: Control of PNGase F used in the N-deglycosylation. Lanes 5–6: N-glycosylated and deglycosylated protein under reducing conditions. b ESI–MS analysis of the RBD(319–541)-HEK_A3 deglycosylated with PNGase F. c Resultant ESI–MS spectrum after deconvolution with MaxEnt v 1.0 software. The inset shown in (c) corresponds to the expanded ESI–MS spectrum in the range shown by a broken line rectangle. The masses between parentheses indicate the expected molecular masses of the detected species. A detailed assignment of this ESI–MS spectrum is shown in Table 2. The ESI–MS spectra shown in (d) and (e) correspond to the ESI–MS analysis of the resultant tryptic peptides of RBD(319–541)-HEK_A3 digested with trypsin following the SD and in-solution BFD (with ethanol precipitation) protocols shown in Fig. 1(a) and (b). Asterisks in (d) correspond to background signals, not assigned to tryptic peptides. The inset shown in (e) corresponds to an expanded region where the O-glycosylated N-terminal end peptide (Val320-Arg328 + [HexNAc:Hex:NeuAc2])2+ and two disulfide bonded peptides (assigned as S-S391-5254+) were detected. Monosaccharide symbols follow the SNFG system [60] and the O-glycan structures as previously reported [33]. The upper and lower mass spectra shown in (f), (g), and (h) correspond to expanded regions of the ESI–MS spectra shown in (d) and (e), respectively. A detailed assignment for all tryptic peptides in this figure is summarized in Table 3

To confirm its integrity, the N-deglycosylated protein was analyzed by ESI–MS (Fig. 2b), which showed intense multiply charged ions of the protein. The deconvoluted ESI–MS spectrum (Fig. 2c) shows the most intense signal at a molecular mass of 27,195.08 Da, in agreement with the expected mass (27,195.46 Da) of the N-deglycosylated monomer of RBD(319–541)-HEK_A3, cysteinylated and O-glycosylated with HexNAc:Hex:NeuAc2 (Table 2). Other groups that also expressed RBD molecules with an odd number of cysteine residues in HEK-293 reported cysteinylation [28, 31]. O-glycosylation has been reported for the native RBD of SARS-CoV-2 [32, 33] as well as for several RBD versions expressed in mammalian cells [28, 31].
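The agreement between the experimental and expected masses quoted above can be expressed as an absolute and a relative (ppm) error. The sketch below is only a worked arithmetic example; the function name is ours, and no acceptance threshold is implied.

```python
def mass_error(observed: float, expected: float) -> tuple[float, float]:
    """Return (error in Da, error in ppm) of an observed mass."""
    delta = observed - expected
    return delta, delta / expected * 1e6


# Values quoted in the text for the deconvoluted main species.
delta_da, delta_ppm = mass_error(27195.08, 27195.46)
print(f"{delta_da:+.2f} Da, {delta_ppm:+.1f} ppm")
```

For these two values the difference is 0.38 Da, that is, roughly 14 ppm below the expected mass.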

Table 2 Summary of the ESI–MS analysis for the SD and the in-solution BFD protocols and sequence coverage of RBD proteins characterized in this work

Also, other signals observed in Fig. 2c (see inset) and summarized in Table 2 suggest the presence of other modified species of the N-deglycosylated RBD(319–541)-HEK_A3. Separately, the N-deglycosylated protein was digested in-solution with trypsin by using the SD (Fig. 1a) and BFD (Fig. 1b) protocols and the resultant ESI–MS spectra are shown in Fig. 2d and 2e, respectively. The sequence assignments based on the agreement between the expected and experimental m/z of tryptic peptides are summarized in Table 3. The four disulfide bonds present in the native RBD of S protein of SARS-CoV-2 were identified by both protocols (Fig. 2d and 2e) and confirmed by MS/MS analysis (Fig. S1a–S1d).

Table 3 Summary of the ESI–MS assignment of the tryptic peptides of RBD(319–541)-HEK_A3 expressed in HEK-293 T cells, with 100% sequence coverage by the in-solution buffer-free digestion (BFD) protocol and 82% by the standard digestion (SD) protocol

In the SD protocol, only the N-terminal peptide R319-R328 containing HexNAc:Hex:NeuAc2 was detected (m/zExp 1066.52 and m/zExp 711.36; Fig. 2d, Table 3), presumably linked at either Thr323 or Ser325 according to previous reports [28, 33]. In contrast, with the in-solution BFD protocol, two peptides (R319-R328 and V320-R328) linked to HexNAc; HexNAc:Hex; HexNAc:Hex:NeuAc; HexNAc:Hex:NeuAc2; HexNAc2:Hex2:NeuAc; and HexNAc2:Hex2:NeuAc2 were detected (Fig. 2e, Table 3). Five of the six O-glycan structures were detected exclusively by the in-solution BFD protocol, and all six agree very well with previous reports of O-glycosylation of Thr323/Ser325 in the SARS-CoV-2 spike protein [32, 33]. MS/MS spectra of these O-glycopeptides confirmed this assignment (Fig. S2) by showing intense neutral losses of O-glycans from the precursor ions fragmented by CID, in line with previous reports [34].

Full sequence coverage of RBD(319–541)-HEK_A3 was achieved with the in-solution BFD protocol, whereas the SD protocol gave 82% sequence coverage (Table 2).

Several signals in the low-mass region (m/z 200–700) were detected exclusively when RBD(319–541)-HEK_A3 was analyzed by the BFD protocol, and they were assigned to short and hydrophilic internal peptides (356KR357, 536NK537, 455LFR457, 404GDEVR408, 409QIAPGQTGK417, 418IADYNYK424, 529KSTNLVK535, and 530STNLVK535; Table 3). These peptides represent 18% of the RBD(319–541)-HEK_A3 sequence. Most of them (356KR357, 455LFR457, 409QIAPGQTGK417, 418IADYNYK424, 529KSTNLVK535, and 530STNLVK535) were not detected by Arbeitman et al. [11] when the same RBD protein expressed in P. pastoris and the HEK-293 T cell line was digested with a protocol similar to the in-solution SD protocol and analyzed by MALDI-MS.
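Sequence coverage is counted over unique residues, so overlapping peptides such as 529KSTNLVK535 and 530STNLVK535 must not be counted twice. The following minimal sketch shows one way to do this from residue ranges. The function names and the 529–546 window used in the example are hypothetical and serve only to illustrate the arithmetic; they do not reproduce the coverage values of Table 2.

```python
def covered_residues(ranges: list[tuple[int, int]]) -> set[int]:
    """Set of residue numbers covered by inclusive (start, end) ranges."""
    covered: set[int] = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return covered


def coverage_percent(ranges: list[tuple[int, int]], first: int, last: int) -> float:
    """Percentage of residues first..last covered by the peptide ranges."""
    protein = set(range(first, last + 1))
    return 100.0 * len(covered_residues(ranges) & protein) / len(protein)


# Three of the short peptides named in the text; two of them overlap.
peptides = [(529, 535), (530, 535), (536, 537)]
print(len(covered_residues(peptides)))

# Hypothetical 18-residue window, used only to show the percentage step.
print(coverage_percent(peptides, 529, 546))
```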

The C-terminal peptide with C538 alkylated with NEM (538CVNF541-AAAHHHHHH, m/zExp 548.24, 3+; Fig. 2g and Table 3) was detected by both protocols (Fig. 1a, b), confirming that a fraction of this RBD contains an unpaired free C538 residue. However, the low intensity of the signal assigned to this NEM-alkylated C-terminal peptide (m/zExp 548.27, 3+; Fig. 2g and Table 3) when the BFD protocol was applied suggested that Cys538 could also be modified with other chemical groups.

Cyanylation (m/zExp 541.91, 3+; (C538 + CN)3+; Fig. 2f), cysteinylation (m/zExp 546.25, 3+; (C538 + Cys)3+; Fig. 2g), and glutathionylation (m/zExp 608.26, 3+; (C538 + ECG)3+; Fig. 2h) of the unpaired Cys538 in the C-terminal peptide of RBD(319–541)-HEK_A3 were detected exclusively when using the BFD protocol. The assignment of these modified peptides was confirmed by MS/MS analysis (Fig. 3a–c). Signals detected at m/zExp 565.26, 3+ and m/zExp 551.24, 3+ were also observed only when RBD(319–541)-HEK_A3 was analyzed by BFD (Table 3). MS/MS analyses demonstrated that they corresponded to the same C-terminal peptide with C538 linked to a truncated variant of glutathione ((C538 + CG)3+, +176 Da; Fig. S3) and to homocysteine (Fig. 3d), respectively.
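The expected m/z of the C-terminal peptide carrying different groups on Cys538 can be estimated from monoisotopic residue masses. The sketch below is an illustration rather than the calculation performed in MassLynx. The residue masses are standard monoisotopic values, and the mass increments for NEM alkylation, cysteinylation, and glutathionylation are the usual values for these modifications, taken here as assumptions.

```python
# Monoisotopic residue masses (Da) for the residues in the peptide.
RESIDUE = {
    "C": 103.009185, "V": 99.068414, "N": 114.042927,
    "F": 147.068414, "A": 71.037114, "H": 137.058912,
}
WATER = 18.010565
PROTON = 1.007276

# Assumed monoisotopic mass increments on the Cys thiol (Da).
CYS_MODS = {
    "NEM": 125.047679,                # addition of N-ethylmaleimide
    "cysteinylation": 119.004099,     # disulfide-linked cysteine
    "glutathionylation": 305.068156,  # disulfide-linked glutathione
}


def peptide_mz(sequence: str, z: int, mod_delta: float = 0.0) -> float:
    """m/z of the [M + zH]z+ ion of a peptide with one mass increment."""
    mass = sum(RESIDUE[aa] for aa in sequence) + WATER + mod_delta
    return (mass + z * PROTON) / z


C_TERMINAL = "CVNFAAAHHHHHH"  # 538CVNF541-AAAHHHHHH
for name, delta in CYS_MODS.items():
    print(f"{name}: {peptide_mz(C_TERMINAL, 3, delta):.2f}")
```

Under these assumptions, the calculated triply charged ions fall within a few hundredths of an m/z unit of the experimental values reported for the NEM-alkylated, cysteinylated, and glutathionylated peptides.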

Fig. 3

ESI–MS/MS spectra of C-terminal peptides (538CVNF541-AAAHHHHHH) of RBD(319–541)-HEK_A3 containing C538 modified by a cyanylation, b glutathionylation, c cysteinylation, and d homocysteinylation

Signals detected at m/zExp 517.24, 3 + (Fig. 2f) and m/zExp 541.91, 3 + (Fig. 2g) were assigned as (C538 + 32 Da)3+ and (C538 + 106 Da)3+, corresponding to the C-terminal peptide with C538 linked to modifying groups of unknown chemical nature. These signals were only detected when in-solution BFD was applied to the characterization of RBD(319–541)-HEK_A3. We also found thirteen other different variants of the C-terminal peptide (confirmed by MS/MS; see Fig. S3) that were not assigned to a known chemical structure of Cys538 (see Table 3).

The alkylation with NEM included in our protocols (Fig. 1a, b) transformed the hydrophilic C-terminal peptide (containing the unpaired C538) into a more hydrophobic species and, in consequence, it was detected even with the SD protocol. In contrast, the remaining Cys-capping modifications [27] mentioned above (see Fig. S3; summarized in Table 3) did not increase the hydrophobicity of the C-terminal peptide sufficiently for it to be retained by ZipTip-C18, and they were detected exclusively when in-solution BFD was applied.

In contrast to the hypothesis that proposes oxidoreductase-mediated protein disulfide bonding with free cysteine or glutathione in the lumen of the endoplasmic reticulum [35,36,37] as the source of these modifications, Zhong et al. have demonstrated that these capping modifications are generated outside mammalian cells and are sensitive to the culture medium composition [27].

Cysteinylation at Cys538 has been reported by other authors [28, 31], but to our knowledge, the other Cys-modifying groups (Table 3) have not previously been reported for recombinant RBDs. The species with Cys538 modifications and O-glycoforms detected at protein level (Table 2) were further confirmed at tryptic peptide level by the in-solution BFD (Table 3).

The use of culture media with defined composition and a well-characterized downstream process would avoid unexpected modifications of free cysteine residues [38,39,40], although endogenous cell metabolites may also contribute to increased protein heterogeneity at unpaired Cys.

Although Cys-capping modifications protect the molecule from aggregation and scrambling mediated by inter- and intra-molecular disulfide bonds, respectively, they need to be addressed if the unpaired Cys is ultimately to be used for further modification, for example, in a drug conjugation process [41, 42]. The potential protein heterogeneity also needs to be addressed if the final intention is to use the molecule as a disulfide-linked dimer [35, 36, 43, 44].

A low-intensity signal at m/zExp 607.28, 5+, assigned to (S-S538–538)5+ in Fig. 2h, was detected exclusively when the in-solution BFD protocol was applied. It suggests that a fraction of this molecule (~13% estimated by SDS-PAGE; Fig. 2a) is a dimer mediated by an intermolecular disulfide bond between two Cys538 residues (Fig. 2a, lanes 2 and 3). MS/MS of this signal confirmed this assignment (Fig. 4). This result agrees with the SDS-PAGE of RBD(319–541)-HEK_A3 run under reducing and non-reducing conditions (Fig. 2a).
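The expected m/z of two peptides joined by a disulfide bond follows from the sum of their neutral masses minus the two hydrogen atoms lost on oxidation. The sketch below applies this to two copies of the C-terminal peptide. It is an illustrative calculation with standard monoisotopic masses, and the function names are ours.

```python
# Monoisotopic residue masses (Da) for the residues in the peptide.
RESIDUE = {
    "C": 103.009185, "V": 99.068414, "N": 114.042927,
    "F": 147.068414, "A": 71.037114, "H": 137.058912,
}
WATER = 18.010565
PROTON = 1.007276
HYDROGEN = 1.007825  # hydrogen atom, lost from each thiol on oxidation


def neutral_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified peptide with free thiols."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER


def disulfide_linked_mz(seq_a: str, seq_b: str, z: int) -> float:
    """m/z of two peptides linked by one disulfide bond, charge z."""
    mass = neutral_mass(seq_a) + neutral_mass(seq_b) - 2 * HYDROGEN
    return (mass + z * PROTON) / z


C_TERMINAL = "CVNFAAAHHHHHH"  # 538CVNF541-AAAHHHHHH
print(f"{disulfide_linked_mz(C_TERMINAL, C_TERMINAL, 5):.2f}")
```

The calculated value for the 5+ ion lies within about 0.01 m/z units of the experimental signal assigned to the Cys538-Cys538 homodimer.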

Fig. 4

MS/MS spectrum of two copies of the C-terminal peptide (538CVNF541-AAAHHHHHH) of RBD(319–541)-HEK_A3 linked by an intermolecular disulfide bond between two Cys538. The nomenclature of fragment ions is in agreement with that proposed by Mormann et al. [61]

The presence of two low-abundance scrambling variants (C538-C379, C538-C432) and the homodimer (C538-C538) of this molecule agrees with the presence of a free Cys538 detected in this preparation (Table 3). These two scrambled species were exclusively detected by using the in-solution BFD protocol. Also, a low-abundance population of the protein with free C336, C391, C432, and C538 was detected by both protocols. All the above-mentioned assignments of scrambled and free Cys variants were confirmed by the MS/MS spectra (Figs. S4 and S5). The presence of an unpaired Cys residue may also promote disulfide exchange [18] and in consequence generates low-abundance scrambling variants of the desired molecule.

Our results indicate that reduction and S-alkylation of the cysteines of the RBD protein before MS analysis are not advisable, as important information is lost. The most striking results obtained with the BFD protocol are the detection of the disulfide-containing peptides (including low-abundance scrambled variants) and the finding of several modifications linked to free cysteines, most of which would probably be missed if disulfides were reduced during sample preparation.

The analysis of the gene construct RBD(319–541)-HEK, which expresses in HEK-293 T the same protein without the C-terminal spacer arm of three alanines (Table 1), by the in-solution SD and BFD protocols (Fig. S6, Tables 2 and S1) yielded results similar to those described here for RBD(319–541)-HEK_A3 at the protein and peptide levels (Fig. 2, Tables 2 and 3). Full sequence coverage was achieved in the analysis of RBD(319–541)-HEK with the in-solution BFD protocol, whereas 85% was achieved with the SD protocol (Table 2). The C-terminal peptide containing C538 modified with NEM was detected with both protocols (Fig. S6e, m/zExp = 477.22, 3+). However, the same C-terminal peptide containing the His6 tag and other PTMs, assigned to (C538 + 106 Da)3+ (Fig. S6e, m/zExp = 470.85, 3+), cysteinylation (Fig. S6e, m/zExp = 475.18, 3+), truncated glutathionylation (Fig. S6f, (C538 + CG)3+, m/zExp = 494.18, 3+), and glutathionylation (Fig. S6g, (C538 + ECG)3+, m/zExp = 537.20, 3+), among other PTMs previously described for RBD(319–541)-HEK_A3, was detected only with the BFD protocol (Table S1).

In the characterization of these RBDs, short 2–9 amino acids long tryptic peptides can be detected by our in-solution BFD method; however, they are useful only to verify the sequence of already known proteins. When characterizing unknown protein species, it is preferable to resort to Lys-C and chymotrypsin, which are compatible with in-solution BFD conditions and can provide information on overlapping sequence stretches. Direct proteolysis of the RBD with Glu-C did not yield an efficient digestion with our in-solution BFD protocol except when used in tandem after Lys-C (Table S2). Shorter trypsin digestion times (15 min–4 h) of RBD(319–541)-HEK_A3 did not yield larger peptides containing missed cleavage sites (see Fig. S7), and it provided the same information as overnight digestion, although measurement time had to be increased considerably to obtain ESI–MS spectra with a similar S/N ratio. Increasing the acetonitrile content in the spraying solution up to 50–60% favored the detection in the ESI–MS spectrum of three large and hydrophobic disulfide-bonded peptides containing one to three missed cleavage sites and some of the short 2–9 amino acids long tryptic peptides previously mentioned (see Fig. S8 and Table S3).

Ammonium bicarbonate is probably the most frequently used buffer for trypsin digestion of proteins. The removal of this salt by successive evaporation/dilution steps enables the direct analysis of the sample by ESI–MS without a reverse-phase desalting step and the consequent loss of valuable hydrophilic peptides. However, ammonium bicarbonate digestions do not yield ESI–MS spectra with the high S/N ratio typical of the in-solution BFD protocol. This could hinder the detection of low-abundance peptides carrying PTMs, such as those detected here by applying the in-solution BFD protocol.

While the in-solution BFD protocol (Fig. 1b) was implemented with a considerable amount of recombinant RBD (50 μg, 1.5 nmol), it must be pointed out that ESI–MS analysis requires 1–3 μL out of a 100 μL sample volume. Processing lower amounts of the starting material is also possible if a more efficient protocol for protein precipitation is used (for instance with acetone at room temperature and in the presence of sodium chloride [29, 45]), and in fact, we obtained results similar to those depicted in Fig. S9 from starting amounts of 5 μg (see Experimental section in ESM). Using even lower starting amount of sample is challenging, due to the difficulties in handling small protein pellets and the risk of sample loss during the two subsequent washing steps.
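The amounts quoted above can be related by simple arithmetic. The sketch below assumes a molar mass of about 33.3 kDa, the apparent mass of the glycosylated monomer on SDS-PAGE; this value is our assumption for illustration, since the molar mass used for the 1.5 nmol figure is not stated.

```python
def nmol_from_ug(mass_ug: float, molar_mass_kda: float) -> float:
    """Amount in nmol: 1 ug of a 1 kDa species corresponds to 1 nmol."""
    return mass_ug / molar_mass_kda


def pmol_per_ul(total_nmol: float, volume_ul: float) -> float:
    """Concentration in pmol per microliter of the final digest."""
    return total_nmol * 1000.0 / volume_ul


ASSUMED_MOLAR_MASS_KDA = 33.3  # assumption: apparent SDS-PAGE mass

total = nmol_from_ug(50.0, ASSUMED_MOLAR_MASS_KDA)
per_ul = pmol_per_ul(total, 100.0)
print(f"{total:.2f} nmol in total, {per_ul:.1f} pmol per microliter")
```

With these assumptions, each microliter loaded for ESI–MS analysis carries on the order of 15 pmol of digested protein.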

Characterization of (RBD(319–541)-CHO)2

The RBD dimer (RBD(319–541)-CHO)2, resulting from an intermolecular disulfide bond Cys538-Cys538, was originally obtained as a by-product during the attempt to obtain RBD(319–541)-CHO. The increased immunogenicity of the RBD dimer promoted its use in at least two vaccines currently in clinical trials [7, 46].

When the (RBD(319–541)-CHO)2 protein, untreated (lane 2, Fig. 5a) or treated with PNGase F (lane 3, Fig. 5a), was analyzed by SDS-PAGE under reducing conditions, only a glycosylated or a deglycosylated monomer, respectively, was observed. When the same samples were analyzed under non-reducing conditions, the glycosylated (lane 4, Fig. 5a) and the deglycosylated (lane 5, Fig. 5a) dimers were observed. This result confirmed the covalent nature of the (RBD(319–541)-CHO)2 dimer and its N-glycosylation.

Fig. 5

a SDS-PAGE analysis, under reducing and non-reducing conditions, of N-glycosylated and deglycosylated (RBD(319–541)-CHO)2 detected by silver staining. Lane 1: low-range molecular weight markers from 31 to 97 kDa (Bio-Rad). Lanes 2–3: N-glycosylated and deglycosylated protein under reducing conditions, showing the reduced monomer. Lanes 4–5: N-glycosylated and deglycosylated protein under non-reducing conditions, showing the dimer species [(RBD(319–541)-CHO)2]. b ESI–MS spectrum of the dimeric RBD deglycosylated with PNGase F. c Deconvolution of the ESI–MS spectrum shown in (b) reveals the presence of the three major O-glycoforms of (RBD(319–541)-CHO)2. The expected molecular masses of the different O-glycoforms are shown in parentheses. (RBD)2 is an abbreviation for the (RBD(319–541)-CHO)2 molecule. Monosaccharide symbols follow the SNFG system [60] and the O-glycan structures are as previously reported [33]. The ESI–MS spectra shown in (d) and (e) correspond to (RBD(319–541)-CHO)2 digested with trypsin following the SD and in-solution BFD (precipitated with acetone) protocols, respectively. Asterisks in (d) mark background signals not assigned to tryptic peptides, and (S–S)n+ marks peptides containing a disulfide bond between the indicated cysteines. The insets in (d) and (e) show the expanded regions of the mass spectra (m/z 981.5–995.5) outlined by rectangles with broken lines, which contain the O-glycosylated peptides and two disulfide-bonded peptides (assigned as S-S391-525, 4+ and S-S379-432, 4+). The upper and lower mass spectra shown in (f) and (g) correspond to two expanded regions (m/z 520.4–524.1 and m/z 650.5–655.2) of the ESI–MS spectra shown in (d) and (e), respectively. A detailed assignment for all tryptic peptides in this figure is summarized in Table S4.

The ESI–MS spectrum (Fig. 5b) of the PNGase F deglycosylated dimer after the deconvolution (Fig. 5c) showed three major signals corresponding to the three combinations of two short O-glycan chains linked to the dimer as indicated in Fig. 5c [32]. The assignment of these O-glycoforms is summarized in Table 2.

The N-deglycosylated protein was digested with trypsin by using the in-solution SD and BFD protocols, and the resultant ESI–MS spectra are shown in Fig. 5d and e, respectively. Full-sequence coverage was achieved with the in-solution BFD protocol, whereas only 80.6% of the sequence was verified with the SD protocol (Table 2 and Table S4).

The four disulfide bonds present in the native RBD of SARS-CoV-2 were detected by applying both protocols (Fig. 5d and e). O-glycosylated N-terminal peptides (R319-R328 and V320-R328) with O-glycosylation sites located at the Thr323/Ser325 residues [32] were detected with appreciable intensities (m/zExp 711.34, 3+; 842.90, 2+; and 988.44, 2+ in Fig. 5d and e). The mass shift caused by these O-glycans in the N-deglycosylated protein (Fig. 5c) agreed with the one observed at the peptide level (Fig. 5d and e). Additionally, two low-intensity signals at m/zExp 616.33, 2+ and 697.36, 2+, assigned to peptide V320-R328 linked to HexNAc and HexNAc:Hex, were detected only by the in-solution BFD protocol (Table S4).

The most striking differences between the two ESI–MS spectra (Fig. 5d and e) were observed in the low-mass region, where short and hydrophilic peptides [L455-R457 (m/zExp 218.14, 2+), G404-R408 (m/zExp 288.14, 2+), K356-R357 (m/zExp 303.21, 1+), and S530-K535 (m/zExp 331.19, 2+)] were detected only by applying the in-solution BFD protocol.

The ESI–MS signals that confirm the dimeric nature of (RBD(319–541)-CHO)2 correspond to the peptide [C538-H547]-S–S-[C538-H547], in which the Cys538 residues of two chains are linked by an intermolecular disulfide bond (m/zExp 522.02, 5+ in Fig. 5f and m/zExp 652.29, 4+ in Fig. 5g). These signals, which also enabled the verification of the C-terminal end of this molecule, were detected exclusively by applying the in-solution BFD protocol. The presence of two His6 tags (twelve histidine residues in total) in the structure of [C538-H547]-S–S-[C538-H547] probably makes its retention by the C18-ZipTip difficult during the desalting step. The verification of the C-terminal end of proteins is a very important aspect included in the ICHQ6B guidelines [16].
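A quick consistency check on the two charge states quoted for the disulfide-linked C-terminal peptide: if both signals arise from the same species, each m/z value should convert to the same neutral mass. The sketch below assumes purely protonated [M + zH]z+ ions; the function name `neutral_mass` is illustrative and not part of the original work.

```python
PROTON = 1.007276  # mass of a proton in Da

def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]z+ ion observed at the given m/z."""
    return (mz - PROTON) * charge

# m/z values and charge states quoted in the text for
# [C538-H547]-S-S-[C538-H547]
mass_from_5plus = neutral_mass(522.02, 5)
mass_from_4plus = neutral_mass(652.29, 4)
difference = abs(mass_from_5plus - mass_from_4plus)

print(f"5+ signal -> {mass_from_5plus:.2f} Da")
print(f"4+ signal -> {mass_from_4plus:.2f} Da")
print(f"difference: {difference:.2f} Da")
```

Both charge states point to a neutral mass of about 2605 Da and agree to within 0.1 Da, as expected for two charge states of a single peptide.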

Characterization of RBD(331–529)-Ec

An incorrectly folded RBD is not useful for a vaccine against SARS-CoV-2 because a three-dimensional structure identical to that of the native protein is required to generate neutralizing antibodies recognizing conformational epitopes [17]. For this reason, the detection of non-native disulfide bonds, if present, is of great importance [16].

SDS-PAGE analysis under reducing conditions (Fig. 6a, lane 2) of RBD(331–529)-Ec shows a band that migrates with an estimated molecular mass of 27.3 kDa. The good agreement between the expected (25,117.14 Da) and the experimental (25,117.44 Da) molecular masses for the reduced and S-alkylated protein determined by ESI–MS analysis confirmed this result (Fig. 6b and Table 2). However, when RBD(331–529)-Ec was analyzed by SDS-PAGE under non-reducing conditions (Fig. 6a, lane 3), aggregates with molecular masses higher than expected were observed. Probably, these aggregates are formed by multiple and random intermolecular disulfide bonds.

Fig. 6

a SDS-PAGE analysis of the recombinant RBD(331–529)-Ec analyzed under reducing (Lane 2) and non-reducing (Lane 3) conditions and detected with Coomassie staining. Lane 1 corresponds to the molecular weight markers of low-range from 14 to 97 kDa (Bio-Rad). b Deconvoluted ESI–MS spectrum of the reduced and S-carbamidomethylated protein. The expected molecular mass is indicated in parentheses. c ESI–MS analysis of the recombinant protein expressed in E. coli and digested with trypsin by using in-solution BFD protocol. Signals assigned as (S–S)n+ correspond to the peptides containing disulfide bonds between the cysteines that are described. The signals labeled with (Nt-His6)n+ correspond to the N-terminal peptide containing a His6 tag in its amino acid sequence. A detailed assignment for all tryptic peptides in this figure is summarized in Table S5

ESI–MS analysis of RBD(331–529)-Ec digested with trypsin by using the in-solution BFD protocol showed several multiply charged ion signals assigned to Cys-containing peptides corresponding to the four native disulfide bonds (signals written in red and assigned as S–S#-#n+; Fig. 6c). Other signals, written in black and assigned as S–S#-#n+ (Fig. 6c), showed good agreement between the expected and experimental molecular masses and were assigned to tryptic peptides containing scrambled disulfide bonds in RBD(331–529)-Ec (Table S5). The MS/MS spectra that confirmed these assignments are shown in Fig. S10.

The results shown here demonstrate that the in-solution BFD protocol [20], in combination with ESI–MS analysis of RBD, enabled the detection in a single mass spectrum of the four native disulfide bonds, the scrambled variants, and free cysteine residues that might be responsible for promoting disulfide exchange and protein aggregation [18]. A sequence coverage of 99% was achieved for RBD(331–529)-Ec when the in-solution BFD protocol was used.

Characterization of RBD(333–527)-C1

Thermothelomyces heterothallica was engineered to develop an industrialized protein production host expression system with high yields (> 10 g/L) and a very significant reduction of the protease load thus minimizing unwanted degradation during fermentation [13]. Unlike other proteins characterized in this work, RBD(333–527)-C1 has only one N-glycosylation site located at Asn343.

The NP-HPLC profile shows the structural assignment, based on the GU indexes, of the individual N-glycans released with PNGase F and labeled with 2AB (Fig. 7a). The deconvoluted ESI–MS spectrum of the intact RBD(333–527)-C1 confirmed the presence of several non-fucosylated glycoforms, with M4, M5A1, and M4A1 being the predominant ones (Fig. 7b). The experimental and expected molecular masses agreed very well (Fig. 7b, Table 2).

Fig. 7

a NP-HPLC profile (upper chromatogram) of the 2AB-N-glycans released by PNGase F treatment of the recombinant RBD(333–527)-C1 and the corresponding dextran ladder (lower chromatogram) used to calculate the GU indexes for all 2AB-N-glycans and to perform the structural assignment. The asterisks correspond to non-assigned glycoforms. The numbers above the peaks in the dextran ladder indicate the corresponding glucose units. The nomenclature used in the structural assignment of the 2AB-N-glycans follows the SNFG system [60]. The deconvoluted ESI–MS spectrum shown in (b) corresponds to the intact protein with the potential N-glycosylation site at Asn343 occupied by several glycoforms. A 10× magnification is shown in the low molecular mass region of (b). The ESI–MS spectrum shown in (c) corresponds to RBD(333–527)-C1 treated with PNGase F and digested following the in-solution BFD protocol shown in Fig. 1b. The ESI–MS spectrum shown in (d) corresponds to the reduced and S-alkylated glycosylated RBD(333–527)-C1. Signals assigned as (C# + cam)n+ correspond to tryptic peptides containing carbamidomethyl cysteine residues at position #. The inset in (d) shows an expanded region (m/z 1237–1662) with several signals assigned to the N-terminal glycopeptides (T333-R346) carrying different N-glycans linked to the glycosylated Asn343. The signal assigned as (C480/488 + cam)3+ corresponds to the peptide D467-R509 containing Cys480 and Cys488 S-alkylated with iodoacetamide. A detailed assignment for all tryptic peptides in this figure is summarized in Table S6.

The ESI–MS spectrum of this protein (Fig. S11a and S11b) after treatment with PNGase F showed an intense signal with a mass of 22,590.33 Da (Table 2). This result agrees very well with the expected value (22,590.26 Da) for the N-deglycosylated RBD(333–527)-C1 monomer with four disulfide bonds.

The N-deglycosylated protein, digested with trypsin by the in-solution BFD protocol (Fig. 7c) and analyzed by ESI–MS, gave full-sequence coverage (Table 2) and allowed the identification of the four native disulfide bonds (S-S379-432, S-S336-361, S-S480-488, and S-S391-525; Table S6). Very low-abundance signals (Fig. 7c) were detected at m/zExp 847.87, 2+ and 1167.52, 2+ and assigned to the peptides T333-R346 and L425-K444 containing Cys336 and Cys432 modified with NEM (Fig. S12a and S12b). This indicates that a minor fraction of RBD(333–527)-C1 contains Cys336 and Cys432 with free thiols in the original molecule. In addition, the same Cys336 and Cys432 were also detected in three low-intensity signals at m/zExp 889.72, 4+; m/zExp 944.43, 4+; and m/zExp 1131.26, 4+ (see Table S6) that were assigned to (T333-R346)-S–S-(L387-R403), (T333-R346)-S–S-(L425-K444), and (I358-K378)-S–S-(L425-K444) linked by the scrambled disulfide bonds Cys336-Cys391, Cys336-Cys432, and Cys361-Cys432, respectively (Fig. S12c–S12e). Scrambled Cys361-Cys525 was also detected, and the MS/MS spectrum supporting this assignment was identical to that shown in Fig. S10h. The presence of free cysteine in the molecule is probably responsible for the generation of these two low-abundance scrambling variants according to the proposed mechanisms [18].

The size heterogeneity of N-glycans linked to Asn343 in RBD(333–527)-C1 was not revealed by ESI–MS analysis of the tryptic digestion (Fig. 7c) due to the removal of N-glycans by a PNGase F treatment. A variant of the in-solution BFD protocol without the PNGase F treatment did not provide this information because the N-terminal peptide (T333-R346) of the RBD(333–527)-C1 containing the glycosylated Asn343 is linked to the peptide (I358-K378) by a disulfide bond (Cys336-Cys361).

The microheterogeneity of N-glycosylation probably gives rise to low-abundance N-glycopeptides whose ionization, given also their high molecular masses (over 4 kDa), is suppressed by the presence of shorter tryptic peptides in the sample. The combination of all these aspects made the ESI–MS analysis of these N-glycopeptides difficult.

However, when the N-glycosylated RBD(333–527)-C1 was reduced, S-alkylated with iodoacetamide, and digested using the in-solution BFD protocol, all cysteine-containing peptides were detected (Fig. 7d, Fig. S13a–d, Table S6), including the N-terminal peptide T333-R346 containing Cys336 and several glycoforms, as shown in the inset of Fig. 7d. MS/MS spectra supporting these assignments are shown in Fig. S13e–f.

Characterization of RBD(331–530)-Cmyc-Pp

The RBD of SARS-CoV-2 was also expressed in P. pastoris with a His6 tag and the Cmyc tag fused at the C-terminal end (RBD(331–530)-Cmyc-Pp; see Table 1) to be used for analytical purposes. The ESI–MS spectrum of RBD(331–530)-Cmyc-Pp deglycosylated with PNGase F (Fig. 8a) yielded, after deconvolution (Fig. 8b), an intense signal with a molecular mass of 25,835.29 Da, which is 400.88 Da higher than expected (25,434.41 Da; Table 2).

Fig. 8

a ESI–MS analysis of the deglycosylated RBD(331–530)-Cmyc-Pp expressed in P. pastoris. b Deconvoluted ESI–MS spectrum. The expected mass of the N-deglycosylated protein is shown in parentheses. c ESI–MS analysis of the in-solution BFD trypsin digestion of the N-deglycosylated RBD(331–530)-Cmyc-Pp. The inset shows the isotopic ion distribution of a 4+ ion corresponding to peptides [Leu387-Arg403]-S–S-[Val510-Lys528] linked by a disulfide bond between C391 and C525. A summary of the above results is shown in Tables 2 and 3, and the detailed assignment for all signals in (c) is shown in Table S7. d ESI–MS/MS spectrum of peptides [EAEAEFS-Asn331-Arg346]-S–S-[Ile358-Lys378] linked by a disulfide bond between C336 and C361. This species contains an extension of seven amino acids (EAEAEFS-) added to the expected N-terminal end [Asn331-Arg346] due to incomplete processing of the propeptide (alpha mating factor) during protein expression. Asn331 and Asn343 are transformed into Asp residues by the action of PNGase F. The nomenclature for the fragment ions observed in the MS/MS spectrum follows that proposed by Mormann et al. [61].

The N-deglycosylated protein was digested with trypsin by the in-solution BFD protocol (Fig. 1b) and the resultant ESI–MS spectrum (Fig. 8c) showed an unexpected signal of appreciable intensity at m/zExp 1219.32, 4+. The MS/MS spectrum of this signal (Fig. 8d) confirmed that two peptides [EAEAEFS-(D331-R346)-S–S-(I358-K378)] were linked by a disulfide bond between Cys336 and Cys361. One of these peptides [EAEAEFS-(D331-R346)] contains an incompletely processed fragment of the alpha mating factor signal peptide (EAEA-) [47] linked to the expected N-terminal end EFS-(D331-R346) of the mature RBD(331–530)-Cmyc-Pp. The expected molecular mass of the residues (EAEA-) linked to the N-terminal end (400.39 Da) agrees with the difference observed between the experimental and calculated molecular masses of the N-deglycosylated protein (400.88 Da; Fig. 8b).
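The mass argument for the EAEA- extension can be reproduced with a few lines of arithmetic. The following sketch uses standard average residue masses for Glu and Ala (values taken from general reference tables, not from the paper) and the protein masses quoted in the text; the function name `extension_mass` is illustrative.

```python
# Average residue masses in Da (amino acid minus water); standard
# reference values, assumed here rather than taken from the paper.
AVERAGE_RESIDUE_MASS = {"E": 129.1155, "A": 71.0788}

def extension_mass(sequence):
    """Mass added to a protein by extra residues at one terminus."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)

expected_shift = extension_mass("EAEA")

# Deconvoluted and expected masses quoted in the text (Fig. 8b, Table 2)
observed_shift = 25835.29 - 25434.41

print(f"EAEA- extension: {expected_shift:.2f} Da")
print(f"observed excess: {observed_shift:.2f} Da")
print(f"discrepancy:     {abs(observed_shift - expected_shift):.2f} Da")
```

The residual of about 0.5 Da between the two values is small relative to a protein of roughly 25.8 kDa, which supports the assignment made from the MS/MS data.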

Table S7 shows a summary for the assignment of all signals observed in the ESI–MS spectrum of Fig. 8c. In-solution BFD protocol in combination with ESI–MS analysis achieved a sequence coverage of 99% (Table 2).

The α-mating factor prepro peptide secretion signal is the most commonly used signal sequence for recombinant proteins expressed in P. pastoris [48]. Processing of the alpha mating factor should occur in three steps; in particular, the last step involves the Ste13 protein, which cleaves the Glu-Ala repeats in the Golgi [49]. All the purified protein was detected exclusively with EAEA- linked to the N-terminal end. The high expression level of this protein (40 g/L) probably impaired the complete processing of the propeptide. The characterization of the N-terminal end is also one of the aspects requested by the ICHQ6B guidelines [16].

Artificial modifications introduced during sample processing by the in-solution BFD protocol

In the characterization of all RBDs by using in-solution BFD protocol, we initially used acetone for protein precipitation (Fig. 1b). We noticed in the ESI–MS spectra an unexpected doubly-charged signal at m/zExp 629.81 (Fig. 9a) having a variable intensity. This signal was not detected when RBDs were processed by using in-solution SD protocol (Fig. 2d, 5d, and S6c) and when the protein precipitation step (Fig. 1b) was carried out with cold ethanol (Fig. 9b) instead of acetone (Fig. 9a).

Fig. 9

The ESI–MS spectra shown in (a) and (b) correspond to expanded regions of the spectra of the tryptic peptides derived from RBD(319–541)-HEK_A3 digested by the in-solution BFD protocol after precipitation with acetone and ethanol, respectively. The signals assigned in (b) as (C538 + ECG)3+ and (C538 + 374 Da)3+ correspond to the C-terminal peptide 538CVNF541-AAAHHHHHH with C538 modified with glutathione and with a modification of unknown chemical nature that increased its molecular mass by 374 Da, respectively. The MS/MS spectra shown in (c) and (d) correspond to the internal non-modified Val445-Arg454 peptide (m/zExp 609.80, 2+) and the same peptide with a modification that increased its molecular mass by 40.02 Da (m/zExp 629.81, 2+), respectively. This chemical modification, introduced in the precipitation step with acetone, is located alternatively at the N-terminal end (V + 40) or at the glycine in the second position (G + 40). The MS/MS spectrum shown in (e) corresponds to the cysteinylated C-terminal peptide (538CVNF541-AAAHHHHHH) with C538 linked by a disulfide bond (-SS-) to a Cys residue (C–OH) modified at its N-terminal end with an N-ethylmaleimide group (NEM-) introduced during the sample processing. The peptide and C–OH have been assigned as P1 and P2, respectively. The nomenclature of the fragment ions follows that proposed by Mormann et al. [61].

Comparison between the MS/MS spectra of the unmodified peptide (445VGGNYNYLYR454, m/zExp 609.80, 2+; Fig. 9c) and of the signal detected at m/zExp 629.81, 2+ (Fig. 9d) revealed that the latter corresponds to the same internal peptide (Val445-Arg454) modified by the addition of 40 Da alternatively at Gly446 (445V[G+40]GNYNYLYR454) and at the N-terminal end (445[V+40]GGNYNYLYR454).
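The relationship between the two doubly charged signals can be checked numerically. The sketch below computes the theoretical m/z of VGGNYNYLYR from standard monoisotopic residue masses (reference values, assumed here) and converts the observed m/z difference into a mass shift; `peptide_mz` is an illustrative helper, not a function from any particular library.

```python
# Monoisotopic residue masses (Da) for the residues in VGGNYNYLYR;
# standard reference values, assumed for this sketch.
RESIDUE = {"V": 99.06841, "G": 57.02146, "N": 114.04293,
           "Y": 163.06333, "L": 113.08406, "R": 156.10111}
WATER = 18.01056
PROTON = 1.007276

def peptide_mz(sequence, charge, delta=0.0):
    """m/z of a protonated peptide, optionally carrying a mass shift."""
    neutral = sum(RESIDUE[aa] for aa in sequence) + WATER + delta
    return (neutral + charge * PROTON) / charge

unmodified = peptide_mz("VGGNYNYLYR", 2)

# A difference in m/z at charge 2+ corresponds to twice that mass
shift = (629.81 - 609.80) * 2
modified = peptide_mz("VGGNYNYLYR", 2, delta=shift)

print(f"unmodified 2+: {unmodified:.2f}")
print(f"mass shift:    {shift:.2f} Da")
print(f"modified 2+:   {modified:.2f}")
```

The calculation reproduces the reported m/z of the unmodified peptide and shows that the 20.01 unit spacing at 2+ corresponds to the 40.02 Da increment given in the caption of Fig. 9.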

Although no structure for this modification has yet been proposed in the literature, a previous work indicated that it is specific to peptides having Gly at position n + 2 that were derived from tryptic digests of proteins previously precipitated with acetone [50]. All RBDs characterized here have only one internal tryptic peptide, 445VG*GNYNYLYR454, with this characteristic.

The acetone traces that remain adhered to the pellet during trypsin digestion at 37 °C for 16 h are responsible for this modification [50]. The intensity of this modified peptide can be reduced considerably if a 15 min vacuum drying step is inserted in the protocol after acetone precipitation of the protein. However, care should be taken because extensive drying makes dissolving the protein pellet in water/acetonitrile more difficult.

In-solution BFD of proteins precipitated with ethanol or acetone yields very similar results, and the two solvents can be used interchangeably. However, during the analysis of RBD(319–541)-HEK_A3 after acetone precipitation, the isotopic ion distributions of the modified 445Val-Arg454 + 40 Da peptide (m/zExp 629.81, 2+; Fig. 9a) and of the C-terminal peptide (538CVNF541-AAAHHHHHH) carrying a +374 Da modification at Cys538 (Fig. 9b) partially overlapped, which impaired the detection of the latter. This modification at Cys538 was only detected when the protein RBD(319–541)-HEK_A3 was precipitated with ethanol and analyzed by in-solution BFD (Fig. 9b).

Another artifact arising from sample processing was the partial addition of NEM to the N-terminal end of the RBD proteins, despite the fact that maleimide has a 1000-fold selectivity for thiols over amine groups at neutral pH [51].

The addition of NEM was verified by ESI–MS analyses of the RBD deglycosylated with PNGase F (Table 3) and confirmed by ESI–MS/MS analysis of the N-terminal tryptic O-glycopeptides (Fig. S14). Despite the abundant fragmentation of glycans in the MS/MS spectra of Fig. S14, three bn ions (b1, b3, and b4) containing the N-terminal end of the peptide R319-R328 were detected with their masses increased by 125 Da due to the addition of NEM.

In addition, the cysteinylated RBD(319–541)-HEK_A3 also partially added two molecules of NEM, one at the N-terminal end of Arg319 (Fig. S14a–S14b) and a second one at the N-terminal end of the Cys linked to Cys538 (Fig. 9e). The ESI–MS/MS spectrum of the cysteinylated C-terminal peptide (538CVNF541-AAAHHHHHH, m/zExp 587.92, 3+) of RBD(319–541)-HEK_A3 (Fig. 9e) confirms this finding. This result is in agreement with a publication that reports the alkylation (+125 Da) at the N-terminal end of proteins treated with NEM [52].

Using the in-solution BFD protocol (Fig. 1b), the remaining internal tryptic peptides were not modified with NEM at their N-terminal ends because this S-alkylating reagent was eliminated during sample precipitation and the subsequent washing steps before proceeding to the proteolytic digestion. However, three low-intensity signals in the ESI–MS analysis of the tryptic digestion, corresponding to peptides 356KR357 (m/zExp 428.26, 1+), 458KSNLKPFER466 (m/zExp 622.36, 2+), and 529KSTNLVK535 (m/zExp 457.78, 2+) with the epsilon amino group of Lys356, Lys458, and Lys529 modified with NEM (+125 Da), were detected using the in-solution BFD protocol and confirmed by MS/MS (Fig. S15).
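As an illustration of how the +125 Da NEM adducts translate into the reported m/z values, the following sketch computes theoretical m/z values for the three Lys-modified peptides. It assumes standard monoisotopic residue masses and a monoisotopic NEM increment of 125.04768 Da (C6H7NO2); these reference values and the helper `nem_peptide_mz` are assumptions of the sketch and are not taken from the paper.

```python
# Monoisotopic residue masses (Da); standard reference values.
RESIDUE = {"K": 128.094963, "R": 156.101111, "S": 87.032028,
           "T": 101.047679, "N": 114.042927, "L": 113.084064,
           "V": 99.068414, "P": 97.052764, "F": 147.068414,
           "E": 129.042593}
WATER = 18.010565
PROTON = 1.007276
NEM = 125.047679  # C6H7NO2 added by N-ethylmaleimide

def nem_peptide_mz(sequence, charge, n_nem=1):
    """m/z of a protonated peptide carrying n_nem NEM groups."""
    neutral = sum(RESIDUE[aa] for aa in sequence) + WATER + n_nem * NEM
    return (neutral + charge * PROTON) / charge

# (sequence, charge, m/z reported in the text)
reported = [("KR", 1, 428.26),
            ("KSNLKPFER", 2, 622.36),
            ("KSTNLVK", 2, 457.78)]

calculated = {seq: nem_peptide_mz(seq, z) for seq, z, _ in reported}
for seq, z, mz_exp in reported:
    print(f"{seq:10s} {z}+  calc {calculated[seq]:.2f}  reported {mz_exp:.2f}")

# The unmodified KR peptide, for comparison with the 303.21 (1+) signal
kr_unmodified = nem_peptide_mz("KR", 1, n_nem=0)
```

Under these assumptions, the calculated values fall within a few hundredths of an m/z unit of the reported signals, consistent with a single NEM group per peptide.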

In contrast, when the RBD is digested in solution by using the SD protocol and NEM is present, even at a very low concentration (≤ 5 mM), throughout sample processing, it is added to the N-terminal end of most of the internal tryptic peptides (data not shown).

NEM is added in excess at a concentration of 5 mM and it remains present during the N-deglycosylation step (2 h at 37 °C) at a pH slightly above neutral (7.2–7.4). It seems that these conditions favor this side reaction at the N-terminal end of the deglycosylated RBDs as well as at the cysteine linked by a disulfide bond to Cys538. To a lesser extent, a few epsilon amino groups of Lys residues were partially modified. Therefore, the partial addition of NEM at the N-terminal end of the protein is a side reaction to be considered when in-solution BFD is used.

We also observed hydrolysis of the thiosuccinimide ring after derivatization of free Cys residues by NEM, especially when digesting the resulting RBD preparation according to the SD protocol at basic pH [53] (Fig. 1a).

Side reactions associated with the addition of NEM [52] will be present in both protocols. The use of other alkylating agents (e.g., iodoacetamide, iodoacetic acid, 4-vinylpyridine, and acrylamide) to block free cysteine residues at the initial steps of the protocol was not evaluated here, but they could also be useful. Potential side reactions related to the presence of Cys-blocking groups should definitely be explored in depth to develop a well-characterized protocol [52,53,54,55,56].

Conclusions

In-solution BFD enabled full-sequence coverage in a single ESI–MS spectrum for most of the recombinant RBD sequences characterized in this work and outperformed the in-solution SD protocol in this respect. The in-solution BFD protocol in combination with ESI–MS analysis has been demonstrated to be sensitive for the detection of PTMs present in the recombinant RBDs produced in different expression systems. Most of these PTMs were only detected when in-solution BFD was applied. The identification of the highly hydrophilic C-terminal peptides of these RBD proteins containing a His6 tag (or twelve histidine residues in the case of the disulfide-linked dimer peptide), an important aspect requested in the ICHQ6B guidelines, was always possible by applying in-solution BFD, whereas with the SD sample processing the identification was achieved only in a few cases. The results shown here support that the in-solution BFD protocol in combination with ESI–MS analysis can be implemented successfully for the characterization of RBDs used as active pharmaceutical ingredients of SARS-CoV-2 subunit-based vaccines [31, 57], including those derived from mutated variants of the virus [4, 58, 59].