Introduction

Coronaviruses (CoVs) named for the crown-like spikes on their surface are grouped into four main types, known as α, β, γ and δ1. In recent decades, new human β-CoVs have evolved from animal-infecting types, leading to intercontinental pneumonic outbreaks. Severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) are two highlight examples, which engendered epidemics in numerous countries worldwide in 2002 and 2012, respectively. SARS-CoV-2 was a return of coronaviruses in December 2019, and is the etiologic agent for the ongoing coronavirus disease 2019 (COVID-19). The viral spill-over originated from a Wuhan seafood market, the place where horseshoe bat SARS-like coronavirus was believed to transmit to humans. By 21st September 2020, over 30.6 million cases and 950,000 deaths have been reported to World Health Organization2.

Identifying an appropriate preventive, i.e. a COVID-19 vaccine, is critical for apt response to the SARS-CoV-2 mass contagion. There are diverse candidate types for immunogen development, including but not limited to live-attenuated or inactivated whole virus, DNA vaccines, vectored vaccines, subunit vaccines, and self-assembling virus-like particle vaccines, each with particular advantages and disadvantages. Vaccines based on chemically inactivated CoV virions have led to production of neutralizing antibodies with different levels of protection, but they raise potential side effects and biosafety concerns3. DNA vaccines expressing full-length spike protein or its fragments, as well as DNA priming coupled with protein boosting, have also been effective against MERS-CoV infection4,5. By intracellular expression of immunogenic antigens, vectored vaccines allow activation of cellular immune response in addition to humoral immunity; they have a proven safety record, however these vectors are limited to presenting one or a reduced number of CoV antigens to the immune system6. Subunit vaccines based on recombinant spike protein have been shown to be very well suited for creating immunogenicity against SARS and MERS7,8. While DNA, vectored and subunit candidates presenting viral epitopes can elicit a focused antibody response, their subviral components may not portray the full antigenic complexity of the virus, resulting in limited protective efficacy or immunopathology due to unbalanced immune responses6. One promising option to confront this issue is the subunit vaccines harbouring numerous and diverse antigenic elements, allowing to produce an inclusive spectrum of native viral antigens.

The process of vaccine development takes at least ten years from bench research to approved vaccine use9. Identification of apt elements for developing vaccine immunogens can be promoted using chemistry and topology. As a current trend, this is becoming more of a rational design exercise, which may considerably reduce both the time and costs required for vaccine development and production10. In this study, it was hypothesized that advanced computational screening could identify suitable peptides for the construction of a subunit vaccine. We used the full set of the putative structural proteins from SARS-CoV-2 as the substance for extracting antigenic elements creating both B-cell and T-cell immunity. By integrating these peptides together, we developed a multi-epitope-based vaccine polypeptide with appropriate binding to Toll-like receptor 3 (TLR-3) to elicit an effective immune response, and with least possible toxicity and hypersensitivity. We also considered incorporating an adjuvant for augmenting the host antigen-specific immune response. As this method has been validated experimentally with designs against other pathogenic species11, we suggest that the proposed structural formulation would be potential to generate an effective immune response against the novel virus. The wet lab researchers are expected to validate our design, hoping to reach protection for the healthy community against the COVID-19 pandemic.

Results

The basic steps of the procedure for designing the multi-epitope vaccine are shown in Fig. 1.

Figure 1
figure 1

Systemic flowchart of the multi-epitope subunit vaccine building against COVID-19.

Structural T-cell and B-cell epitopes of SARS-CoV-2

Final selected cytotoxic T lymphocyte (CTL) antigenic epitopes harboured in the structural proteins of SARS-CoV-2 are listed in Table 1, indicating the epitopes for spike glycoprotein (S), envelope protein (E), membrane protein (M) and nucleocapsid phosphoprotein (N). The full sets of predicted epitopes and their properties can be found in Supplementary Tables S1S4 online, for S, E, M and N proteins, respectively. The IEDB MHC-II prediction tool was applied to predict helper T lymphocyte (HTL) (15-mer) epitopes and their MHC-II binding, results of which are shown in Table 2, and with more detail in Supplementary Table S5 online. Both Tables 1 and 2 also indicate the HLA restriction of the single predicted epitopes. The peptide DVSLVKPSFYVYSRVK contained in the E protein was identified as the only linear B lymphocyte (LBL) epitope, when applying the score value > 0.75 as the criterion for selection (More detail in Supplementary Table S6 online). Using appropriate server tools, antigenic, non-toxic and non-allergenic T-cell or B-cell epitopes that were able to induce interferon-gamma (IFN-γ), interleukin-4 (IL-4) and interleukin-10 (IL-10) cytokines were selected for the design of a multi-epitope vaccine.

Table 1 The final set of cytotoxic T lymphocyte (CTL) epitopes selected for multi-epitope vaccine construction.
Table 2 The final set of helper T lymphocyte (HTL) epitopes selected for multi-epitope vaccine construction.

Multi-epitope vaccine polypeptide construction

The total of 34 CTL, and 12 HTL epitopic peptides were fused to each other by KK, and GPGPG linkers, respectively, followed by adjoining a single LBL epitope using the KK linker, to create the multi-epitope peptide-based vaccine construct12,13,14. Furthermore, to boost the immunogenicity of the multi-epitope vaccine, β-defensin (GIINTLQKYYCRVRGGRCAVLSCLPKEEQIGKCSTRGRKCCRRKK) was added as an adjuvant to the amino terminus of the polypeptide using an EAAAK linker to the first CTL epitope. The primary structure of the multi-epitope subunit vaccine construct included the total of 694 amino acids (Fig. 2 and Supplementary Fig. S1 online).

Figure 2
figure 2

Schematic profile of the multi-epitope subunit vaccine construct of length 694 residues. An adjuvant was added at the N-terminal tail of the vaccine using EAAAK linker, followed by 34 CTL, 12 HTL and 1 LBL epitopes fused by KK and GPGPG linkers.

Immunogenicity, allergenicity and physiochemical properties of the vaccine candidate

Assessment of immunogenic, allergenic and solubility showed the multi-epitope vaccine candidate was of a suitable design (Table 3). The calculated molecular weight of the final vaccine polypeptide (78.35 kDa) indicated its good antigenic nature13,15, and the pI value of 10.38 showed the basic nature of the final vaccine candidate. The instability index of the final polypeptide (36.73) indicated its high stability. The Grand average of hydropathicity value was − 0.261, showing that the final structure is slightly hydrophilic in nature, which can lead to better connection with other proteins. Moreover, the aliphatic index of the peptide was equal to 81.44, which implied the final peptide had high thermo-stability (Table 3). Estimated half-life was 30 h in mammalian reticulocytes in vitro, and > 20 h in yeast, and > 10 h in E. coli in vivo.

Table 3 Antigenic, allergenic and physiochemical assessments of the primary sequence of final vaccine protein.

3D structure of the vaccine polypeptide

The sequence of the multi-epitope vaccine polypeptide was submitted to appropriate tools, and a refined model of its 3D structure was obtained (Supplementary Fig. S1 online). The output from ProSA-web validation tools showed a Z-score of -4.33, indicating the good quality of the final vaccine polypeptide structure (Supplementary Fig. S1 online). In addition, RAMPAGE analysis indicated that 92.3% amino acids of the final structure were in the favoured area, 6.1% were in the allowed area, and 1.6% were in the disallowed area of Ramachandran plot, reflecting a high structural quality of the constructed vaccine model (Supplementary Fig. S1 online).

Conformational and linear B-cell epitopes in the vaccine polypeptide structure

The total of 7 conformational epitopes with scores of 0.702 to 0.882, and 9 linear epitopes with scores of 0.7 to 0.878 were selected as the final B-cell epitopes (Fig. 3, Supplementary Table S7 online). PI value (the score given by ElliPro) of 0.882 shows that 88.2% residues are locating in the predicted ellipsoid area of the epitope and this epitope features the highest solvent accessibility.

Figure 3
figure 3

B lymphocyte epitopes present in the designed multi-epitope vaccine. (a) Conformational B cell epitopes shown by spheres, (b) Linear B cell epitopes shown by spheres.

Binding of the vaccine structure to TLR-3

The best docked complex of the immune receptor TLR-3 with the vaccine model was identified from among various server outputs by comparing the binding free energy. Analysis of the residues contributing at the protein–protein interface showed that the C-terminal domain of the polypeptide is involved in the interaction, where residues from vaccine polypeptide form polar or non-polar contacts with three domains on the TLR-3 structure (Supplementary Fig. S2 online). The docked complex was further applied for running molecular dynamic (MD) simulation investigations.

MD relaxation and analysis of the receptor-vaccine complex

The MD simulations of the docked multi-epitope-based subunit vaccine with TLR-3 as the receptor were carried out to achieve information about the conformational changes of TLR-3-vaccine polypeptide complex. Such studies were essential for several vital facets: (1) to comprehend whether the designed vaccine was stable at the bound pocket; (2) to verify that the induced conformational mobility of both TLR-3 receptor and the multi-epitope vaccine structure did not exert undesired impact on the conformation of the docked proteins; and (3) to corroborate that the epitopes of the multi-epitope vaccine were potential for efficiently being recognized by the human immune system, causing strong immune response.

Three statistical factors were assessed based on 24,000 ps of simulation trajectory (Fig. 4). The root mean squared deviation (RMSD) values of TLR-3 and multi-epitope vaccine in the complex reflected the high conformational stability of the docked molecules. An average RMSD of 0.29 nm with maximum of 0.44 nm realized at 14,000 ps was noted for the TLR-3 molecule (Fig. 4A). The RMSD value for the multi-epitope vaccine model (Fig. 4B) showed that it mostly remained stable during simulation time, with a plateau at about 10,500 ps. The observed trends can be attributed to the moving multi-epitope vaccine at TLR-3 binding pocket in an effort to obtain a suitable and stable docked conformation.

Figure 4
figure 4

Illustration of the molecular dynamic equilibration for simulation outputs. (a) Root mean squared deviations (RMSDs) of Cα for Toll-like receptor-3 (TLR-3), and (b) for the multi-epitope vaccine polypeptide; (c) Root mean squared fluctuations (RMSFs) of Cα atoms for TLR-3, and (d) for the multi-epitope vaccine polypeptide; (e) Radius of gyration for TLR-3, and (f) for the multi-epitope vaccine polypeptide.

Region-wise structural fluctuations of the TLR-3-vaccine polypeptide complex were studied by calculating the root mean squared fluctuation (RMSF) parameter (Fig. 4C,D). The mean RMSF calculated for TLR-3 and the multi-epitope vaccine were 0.2 nm and 0.83 nm, respectively, which were overall in favour of protein residues’ local stability. Along the vaccine sequence, residues located at the loop regions (such as I137, T138, W204, N376 and R654) had high fluctuation (Fig. 4D). The flexibility of the loop regions is essential for proper holding of the vaccine at the binding pocket.

The compactness of the complex structure was estimated by calculating the radius of gyration (Rg) of TLR-3 and multi-epitope peptide molecules. The related graphs (Fig. 4E,F) showed that during the simulation time, TLR-3 and multi-epitope vaccine molecules had mean Rg values of about 3.56 nm and 4.87 nm, respectively. Rg fluctuations is an indication of movements and conformational changes in flexible regions of the multi-epitope vaccine peptide in the TLR-3 binding pocket. This dynamics seems essential to suitably identify the vaccine and incorporating it in the binding pocket.

Free energy of the binding between vaccine polypeptide and TLR-3

To figure out the strength of the contact between multi-epitope vaccine and TLR-3 structures, the binding free energy between the two molecules was calculated using MMPBSA approach. According to Table 4, the nonpolar element (− 136.92 kcal/mol) was an important energy term in the binding free energy of the complex. Our findings clarified that the favourable electrostatic energy (Eele = − 241.5 kcal/mol) was covered up by the huge polar energy component (ΔGGB = 240.3 kcal/mol) in the binding process of the multi-epitope vaccine polypeptide. Therefore, the nonpolar energy was known as the main driving force in the vaccine binding to TLR-3, and this hydrophobic contribution leads to a thermodynamically favourable interaction (∆Gbinding = − 138.11 kcal/mol). To further clarify the binding mode, the calculated energy was broken down into residue-residue pairs through the binding free energy decomposition analysis. According to the data, there were several residues of vaccine polypeptide with less than − 1.5 kcal/mol free energy of contribution in the binding mechanism. The binding pose of vaccine with the key residues are illustrated in Fig. S3.

Table 4 Binding free energy calculation for the multi-epitope vaccine candidate-TLR-3 complex.

Discussion

Classic live or attenuated vaccines have a long history of success, however their deployment is associated with issues of biosafety such as autoimmune or strong allergic responses, plus difficulties with their synthesis and manufacture. These drawbacks might be addressed by developing fully synthetic peptide-based vaccines16. Research on COVID-19 prevention through this approach was begun immediately after release of relevant basic data such as its full genomic sequence. Specific epitope regions in SARS-CoV-2 with high homology to SARS-CoV, the best-characterized coronavirus in terms of epitope responses, were identified17,18,19. Antigenic properties of spike glycoprotein were more focused by theory and experimental researchers20,21,22. Applying this basic knowledge, a number of peptide-based designs have been presented to date, which encompass those integrating epitopes from a single or few viral protein(s)23,24,25,26,27 to those considering the whole proteome of the virus28. The present research contributes to current understanding in this field by offering an alternative approach for developing a potential COVID-19 vaccine, where the entire set of structural proteins were searched for most efficacious epitopic segments.

One rationale for the choice of the whole set of structural proteins of SARS-CoV-2 for epitope identification in this study, was the ample evidence on immunity-related advantages of each of the S, E, M and N proteins of SARS-CoV or SARS-CoV-217,29,30,31,32. In addition, the choice of structural proteins as the source of epitopes implies a highly-conserved epitope set, as structural proteins are subject to less sequence or structural evolution18. We found that our proposed epitopes demonstrate overlap with similar studies which have extracted the epitopes from individual structural proteins from different sequence database entries; the similarity of findings and the conservation analyses by those research groups may also indicate the high epitope conservancy among geographical strains26,27. Aside from these advantages, our selection of structural epitopes aimed to cope with potential vaccine side effects, particularly the antibody-dependent enhancement of infectivity (ADEI). ADEI is a great challenge of subunit vaccines, referring to the reduced specificity in response elicitation because of the numerous conformations adopted by a peptide vaccine10. Strategies to mitigate this concern include: 1- immunofocusing by considering several most antigenic epitopes33; 2- inclusion of structural proteins: structurally flexible coronaviral epitopes may be of limited value for in vivo immunotargeting and require to be replaced by conserved epitopes with low structural plasticity6. Structural proteins are characterized by highly conserved sequences, thus the linear epitopes remain highly stable in these proteins and the conformational epitopes preserve their structural patterns during various steps of the virus cycle. Our choice to include the full set of SARS-CoV-2 structural proteins has thus combined the advantages of augmented immunogenicity and geographical conservancy with precautions of both strategies proposed for reducing ADEI.

Integrating multiple immunodominant sites of viral pathogens in the vaccine structure helps augment the antigenic effect. Reports revealed that multiple antigenic peptides induce stronger B and T cells immune responses than un-conjugated peptide epitopes34,35,36. Thus, consecutive sequences of HTL and CTL epitopic peptides were fused to each other using accepted linkers. GPGPG has been introduced as a spacer between adjacent epitopes, because of several positive contributions to the immune-modulatory function of polypeptide vaccines, including reduction of junctional epitopes, increased proteasome processing, and augmented immunogenicity37. The KK linker is the target sequence of cathepsin B, a lyzosomal protease of high importance for antigen processing. Connecting two peptide antigens together using the lysine linker helps avoid induction of antibodies to the amino acid sequence that is generated by joining of two peptides and most antibodies would be reactive to each peptide38.

Epitope-based peptide vaccines induce relatively weak immune response, when used alone39. The immunoreactivity could be improved by appending proper adjuvants. The vaccine construct in this study was prepared by fusing the epitopic peptides to β-defensin as an immunogenic adjuvant. β-Defensin has previously been reported as a potent adjuvant when conjugated with MERS-CoV antigens. Vaccines containing defensins as adjuvants have been shown, both in vivo and in vitro, to activate the primary innate antiviral immune response and mediate other immunomodulatory activities against a number of viruses, including coronaviruses40,41. Use of appropriate adjuvants has also been shown to help induce durable IFN-γ responses6.

In this study, the binding of the built vaccine model with the immune receptor TLR-3 was assessed by performing molecular docking, MD simulation and free energy calculations. Triggering TLR-3 may help induce the TLR signalling networks which activate the immune pathways evolved specifically against viral pathogens. In addition, this TLR choice was a consensus from vaccine design research works on viral/retroviral pathogens, specifically coronavirus species12,14,25,26,42,43. While we note that other Toll-like immune receptors such as TLR-2, -4, -5, -7 and -8 may be stimulated by the COVID-19 virus, they were not considered here as their exact roles are not established, with some possibly functioning even to the advantage of the coronavirus44,45.

Conclusion

In the ongoing urgent situation brought about by SARS-CoV-2, it is hard to fast counteract the circulating disease through preventative or therapeutic measures. The multi-epitope-based subunit vaccine design obtained through an immunoinformatic pipeline in this study could be promising, as it incorporates a priori bioinformatics predictions and up-to-date immunological knowledge. Despite the current highly active research community as well as the efforts of the industry section in the road to find an optimal immunogenic formulation, the extreme diversity in available design choices, with no a priori image of their effectiveness, appears as a fundamental obstacle in this route. To overcome this issue, a rapid mass design and screening strategy incorporating the same immunoinformatic approach is recommended to be developed by the research community, with the hope to find one formulation with the highest immunogenicity and biosafety.

Methodology

SARS-CoV-2 structural protein sequences

The amino acid sequences of SARS-CoV-2 structural proteins, including the spike glycoprotein (S), envelope protein (E), membrane protein (M), and nucleocapsid phosphoprotein (N), were retrieved using the NCBI reference genome of the virus (accession number NC_045512.2).

Identifying cytotoxic T lymphocyte (CTL) epitopes

Predicting peptides that are capable of inducing CTL responses is a crucial step in the design of epitope-based vaccine. The MHC-I Binding tool of Immune Epitope Database and Analysis Resource (IEDB; http://tools.iedb.org/mhci) was used to predict the CD8+ T-cell epitopes borne in S, E, M and N proteins12,46,47. In this step, the ANN 4.0 method was set as the prediction method. The human was selected as the source species. The maximum IC50 value was set to 50 nM, and percentile rank < 1 was considered as threshold since lower score indicates high affinity48,49.

Identifying helper T lymphocyte (HTL) epitopes

IEDB (http://www.iedb.org) was used to predict MHC-II binding of 15-mer epitopes for viral structural proteins against human HLAs such as HLA-DRB1*15:01, HLA-DRB4*01:01, HLA-DRB3*01:01, HLA-DRB5*01:01, HLA-DRB1*03:01, HLA-DRB3*02:02 and HLA-DRB1*07:01 (Supplementary Table S8 online), using NN-align 2.3 method13,14,50,51. The maximum IC50 value of 50 nM, 500 nM and 5000 nM indicate high, intermediate and low affinity of epitopes, respectively52. The 15-mer epitopes with IC50 values < 50 nM were considered to be included in the vaccine polypeptide52,53,54.

B-cell epitopes prediction

The ABCpreds server was used to predict 16-mer linear B-lymphocyte (LBL) epitopes, using a threshold of 0.5155. In addition, the ElliPro tool of IEDB was utilized to predict conformational and linear B-cell epitopes of the vaccine polypeptide.

Assessment of identified epitopes for antigenicity, allergenicity, and toxicity

The antigenic potential of each of the T and B cells epitopes was predicted by VaxiJen v2.0 applying a threshold of 0.456. The predicted T and B cells epitopes were then further evaluated in terms of toxicity and allergenicity, with ToxinPred and AllergenFP v1.0 servers, respectively57,58. Then, the ability of each of the HTL and B cell epitopes (CD4+) to induce the secretion of cytokines, such as interferon-gamma (IFN-γ), interleukin-4 (IL-4) and interleukin-10 (IL-10), to overcome the inflammatory response and prevent tissue damage was predicted by IFNepitope, IL4pred and IL10pred server tools, respectively59,60,61. The IL4pred and IL10pred operations were carried out based on SVM method and hybrid method, respectively, with other parameters kept as default12,13.

Designing the multi-epitope vaccine polypeptide construct

To develop a multi-epitope vaccine construct, we selected those predicted epitopes which demonstrated high antigenic potential, were not identified as allergic or toxic, and had good solubility when highly expressed. To construct the vaccine polypeptide, the selected CTL, HTL and LBL epitopes were fused together using appropriate linkers62,63. In addition, a 45-amino acid sequence prepared from β-defensin-2 protein was added to the N terminus of the vaccine sequence using the EAAAK linker, to increase the immunogenic capacity of the multi-epitope vaccine12,52,63.

Immunogenic, allergenic and physiochemical evaluation of vaccine construct

The antigenicity of the multi-epitope vaccine polypeptide was predicted utilizing the VaxiJen v2.0 tool, with the threshold value of 0.456. The allergenicity of the vaccine was analysed using AllerTOP v.2.0 and AllergenFP v.1.0 servers57,64. The ProtParam server was employed to evaluate the physical chemistry properties of the vaccine construct, such as amino acid composition, molecular weight, theoretical isoelectric point (pI), grand average of hydropathicity (GRAVY), aliphatic and instability index, and in vitro and in vivo half-life65.

Vaccine polypeptide structure modelling, refinement and validation

The SOPMA server was applied for analysing the secondary structural properties of the multi-epitope vaccine polypeptide66. The GalaxyWEB server was employed for modelling and refinement of the 3D structure model67. The server relaxes the model structure using repacking and molecular dynamics (MD) simulation. Next, RAMPAGE server and ProSA-web tools were used to validate the refined 3D model68,69. All of these tools give us the overall quality of the 3D structure of the peptide vaccine.

Molecular docking of the vaccine polypeptide to TLR-3

The binding of antigenic agents with the target immune cell protein is crucial for the creation of a suitable immune system response. For analysing the binding pattern of the multi-epitope vaccine polypeptide with TLR-3 (PDB ID: 2A0Z)70, molecular docking analysis was performed by Hdock, Zdock, Cluspro and Hawkdock71,72,73,74. Among the molecular species docked by each of these applications, the best outputs were extracted, followed by the four docked molecules uploaded into the Hawkdock program, and the free energy of binding of each complex calculated. Based on this screening, it was determined that the model selected from the Cluspro output had the best binding free energy. Therefore, this molecular species was chosen as the primary structure for initiating the MD simulation.

MD simulation of the vaccine-receptor complex

To determine the structural stability and to study the molecular details of the interactions between TLR-3 and the multi-epitope vaccine polypeptide in the docked conformation, MD simulation was performed. Briefly, the system including vaccine polypeptide-TLR-3 was simulated by the Gromacs-2020 package applying OPLS-AA force field75. The complex system was solvated using TIP3P water model. Then, the genion module was utilized to neutralize the whole system. Next, the conjugate gradient algorithm was applied to minimize the energy of the system. In an NVT ensemble, the temperature of the system gradually increased from 0 to 310 K during 400 ps. Subsequently, in an NPT ensemble, 500 ps simulation was carried out at the pressure of 1 atm and the temperature of 310 K. Production simulation for 24,000 ps was then implemented. The particle-mesh Ewald (PME) and the LINCS algorithms were applied to assess all electrostatic connections and to restrain all bond lengths in the protein, respectively. Moreover, periodic boundary condition was utilized during the simulation. The final coordinates obtained for the complex system were analysed with classic MD analyses, plus the MMPBSA method for calculating the free energy of intermolecular interactions76. The results were visualized by Pymol (Schrodinger L.L.C.).