Keywords
2019-nCoV, Wuhan coronavirus, SARS, MERS
This article is included in the Emerging Diseases and Outbreaks gateway.
This article is included in the Coronavirus collection.
The manuscript has been slightly modified in accordance with the suggestions of the second Reviewer. The changes include brief comments on the predictions in the article that concern the origin of SARS-CoV-2 and its natural reservoirs.
Fears are mounting worldwide over the cross-border spread of the new coronavirus (denoted as SARS-CoV-2), which originated in Wuhan, the largest city in central China, and has since spread to many countries around the world. The newly emerging pathogen belongs to the same virus family as the deadly severe acute respiratory syndrome and Middle East respiratory syndrome coronaviruses (SARS-CoV and MERS-CoV, respectively). The World Health Organization (WHO) has recently published surveillance recommendations for a possible “large epidemic or even pandemic” of the novel coronavirus (a pandemic was declared on March 11th, 2020), and it has issued guidelines for hospitals across the world. However, many questions about SARS-CoV-2 remain unanswered: (i) what is the origin and/or natural reservoir of the virus? (ii) is it easily transmitted from human to human? and (iii) what are the potential diagnostic, therapeutic and vaccine targets? Currently, only the nucleotide sequences of eight human SARS-CoV-2 isolates are available, without any additional information about the biological properties of the virus beyond the morphological confirmation of the virion by electron microscopy. This is unlikely to be enough information to answer the important questions listed above.
The informational spectrum method (ISM), a virtual spectroscopy method for the analysis of proteins, is based on the fundamental electronic properties of amino acids and requires only the nucleotide sequence to investigate proteins1. For this reason, ISM was previously used for the analysis of novel viruses for which little or no information was available2–5. Here, SARS-CoV-2 was analyzed with ISM to identify its possible origin and natural host, as well as putative therapeutic and vaccine targets.
The S1 surface protein sequences from the first eight human SARS-CoV-2 isolates, deposited in the publicly available GISAID database (accessed on January 19, 2020), were analyzed by ISM. The studied sequences were BetaCoV/Wuhan/IVDC-HB-04/2020, BetaCoV/Wuhan/IVDC-HB-01/2019, BetaCoV/Wuhan/IVDC-HB-05/2019, BetaCoV/Wuhan/IPBCAMS-WH-01/2019, BetaCoV/Wuhan/WIV04/2019, BetaCoV/Wuhan-Hu-1/2019, BetaCoV/Nonthaburi/61/2020, and BetaCoV/Nonthaburi/74/2020.
In the phylogenetic analysis, amino acid sequences of other coronaviruses were also included: (i) S1 proteins from the following viruses: AVP78042, AVPvp78031, AY304486, AY559093, JX163927, YN2018B, KY417146, already used by other authors to study the phylogenetic relationship between SARS-CoV-2 and the nearest bat and SARS-like CoVs (GISAID database); and (ii) S1 proteins from the first three isolated human MERS-CoV: AGG22542, AFS88936, AFY13307, deposited in the GISAID database.
A detailed description of the sequence analysis based on ISM has been published elsewhere2. According to this approach, sequences (protein or DNA) are transformed into signals by assigning a numerical value to each element (amino acid or nucleotide). These values correspond to the electron-ion interaction potential6, which determines the electronic properties of amino acids/nucleotides that are essential for their intermolecular interactions. The signal obtained is then decomposed into periodic functions by Fourier transformation. The result is a series of frequencies and their amplitudes. The obtained frequencies correspond to the distribution of structural motifs (primary structure) with defined physico-chemical characteristics responsible for the biological function of the putative protein corresponding to the analyzed sequence. When comparing proteins that share the same biological or biochemical function, the technique allows detection of code/frequency pairs that are specific for their common biological properties. The method is insensitive to the location of the motifs and, therefore, does not require previous alignment of the sequences. In addition, this is the only method that allows immediate functional analysis.
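The transformation described above can be illustrated with a minimal sketch. It is not the authors' software: the function names are hypothetical, the EIIP values are those commonly tabulated in the ISM literature (they should be checked against reference 6), and a plain discrete Fourier transform is assumed, with the energy taken as the squared magnitude of each coefficient.

```python
import cmath

# Electron-ion interaction potential (EIIP) per amino acid, in Rydberg.
# Assumption: values as commonly tabulated in the ISM literature.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def to_signal(sequence):
    """Map each residue to its EIIP value (KeyError on unknown letters)."""
    return [EIIP[residue] for residue in sequence.upper()]


def informational_spectrum(sequence):
    """Return (frequency, energy) pairs for k = 1 .. N//2, so 0 < f <= 0.5."""
    signal = to_signal(sequence)
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2 + 1):
        # Direct O(N^2) DFT; the constant (k = 0) term is skipped.
        coeff = sum(
            value * cmath.exp(-2j * cmath.pi * k * m / n)
            for m, value in enumerate(signal)
        )
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def dominant_peak(sequence):
    """Return the (frequency, energy) pair with the highest energy."""
    return max(informational_spectrum(sequence), key=lambda pair: pair[1])
```

For a toy sequence with a strict two-residue repeat, such as `"DL" * 10`, `dominant_peak` reports the frequency 0.5, which is the periodicity one would expect. Real proteins give far less regular spectra.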
The phylogenetic tree of S1 proteins from coronaviruses was generated with the ISM-based phylogenetic algorithm ISTREE, described in detail elsewhere7. In the presented analysis, we calculated the distance matrix using the amplitude at the frequency F(0.257) as the distance measure between sequences.
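A distance matrix of this kind might be sketched as follows. This is not ISTREE, whose details are given in reference 7. The sketch assumes that the distance between two sequences is the absolute difference of their spectral amplitudes at the chosen frequency, and that the amplitude is evaluated directly at that frequency rather than at the nearest discrete bin. The names `amplitude_at` and `distance_matrix` are hypothetical, and the EIIP table is the commonly tabulated one.

```python
import cmath

# EIIP values (Rydberg); assumption: as commonly tabulated in the ISM literature.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_at(sequence, frequency):
    """Magnitude of the Fourier sum of the EIIP signal at one frequency."""
    coeff = sum(
        EIIP[residue] * cmath.exp(-2j * cmath.pi * frequency * m)
        for m, residue in enumerate(sequence.upper())
    )
    return abs(coeff)


def distance_matrix(sequences, frequency=0.257):
    """Pairwise |A_i - A_j| for a dict mapping names to sequences."""
    names = list(sequences)
    amplitudes = [amplitude_at(sequences[name], frequency) for name in names]
    matrix = [[abs(a - b) for b in amplitudes] for a in amplitudes]
    return names, matrix
```

The resulting matrix is symmetric with a zero diagonal, so it can be passed to any standard distance-based tree-building procedure.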
In order to compare the informational similarity between SARS-CoV-2, SARS-CoV, MERS-CoV and Bat SARS-like CoV, the cross-spectra (CS) of S1 proteins from these viruses were calculated. Figure 1a shows the CS of SARS-CoV-2, SARS-CoV and MERS-CoV. These CS contain only one dominant peak, corresponding to the frequency F(0.257). Figure 1b displays the CS of S1 proteins from SARS-CoV-2 and Bat SARS-like CoV. Amplitudes in these latter CS are significantly lower than in the CS presented in Figure 1a. These results show that (i) S1 proteins from SARS-CoV-2, SARS-CoV, MERS-CoV and Bat SARS-like CoV encode common information, which is represented by the frequency F(0.257), and (ii) S1 proteins from SARS-CoV-2 are remarkably more informationally similar to S1 from SARS-CoV and MERS-CoV than to S1 from Bat SARS-like CoV. This suggests that the biological properties of SARS-CoV-2 are apparently more similar to those of SARS-CoV and MERS-CoV than to those of Bat SARS-like CoV.
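The text does not define the cross-spectrum explicitly. A minimal sketch, assuming the definition commonly used in the ISM literature (the pointwise product of the amplitude spectra of the compared sequences, after zero-padding them to a common length), is given below. The function names are hypothetical and the EIIP table is the commonly tabulated one.

```python
import cmath

# EIIP values (Rydberg); assumption: as commonly tabulated in the ISM literature.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_spectrum(signal):
    """DFT magnitudes of a numeric signal for k = 1 .. N//2."""
    n = len(signal)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * m / n)
                for m, v in enumerate(signal)))
        for k in range(1, n // 2 + 1)
    ]


def cross_spectrum(sequences):
    """Return (frequency, product of amplitudes) pairs for a list of sequences."""
    signals = [[EIIP[r] for r in seq.upper()] for seq in sequences]
    n = max(len(sig) for sig in signals)
    # Zero-pad so that every spectrum is sampled at the same frequencies.
    padded = [sig + [0.0] * (n - len(sig)) for sig in signals]
    spectra = [amplitude_spectrum(sig) for sig in padded]
    result = []
    for k in range(1, n // 2 + 1):
        product = 1.0
        for spectrum in spectra:
            product *= spectrum[k - 1]
        result.append((k / n, product))
    return result
```

Under this definition a peak survives in the cross-spectrum only if it is present in every input spectrum, which is why a single dominant peak is read as shared information.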
To confirm this conclusion, the ISM-based phylogenetic tree for S1 proteins was calculated (Figure 2). In this calculation, the amplitude at the frequency F(0.257) was used as the distance measure. As observed in Figure 2, all analyzed SARS-CoV-2 S1 amino acid sequences are grouped with SARS-CoV and MERS-CoV and separated from Bat SARS-like CoV. This indicates that SARS-CoV-2 is more phylogenetically similar to SARS-CoV and MERS-CoV than to Bat SARS-like CoV. This result differs from those obtained with homology-based phylogenetic analysis, which showed that SARS-CoV-2 is closely related to Bat SARS-like CoV (https://platform.gisaid.org/epi3/frontend#lightbox1296857287).
It has been previously shown that the dominant frequency in the informational spectrum of viral envelope proteins corresponds to the interaction between the virus and its receptor2,3,7,8. The ISM analysis showed that the frequency component F(0.257) is present in the CS of S1 SARS-CoV and its receptor angiotensin-converting enzyme 2 (ACE2)9, but not in the CS of S1 MERS-CoV and its main receptor dipeptidyl peptidase 4 (DPP4)10. Of note, both receptors, ACE2 and DPP4, are expressed in airway epithelia. The presence of F(0.257) in the informational spectrum of MERS-CoV (Figure 1) also suggests a possible interaction between this virus and ACE2. The dominant peak at the frequency F(0.257) in the CS of S1 from SARS-CoV and MERS-CoV and ACE2 supports this possibility (Figure 3), although this has not been formally proved for MERS-CoV11.
As shown in Figure 1a, the frequency F(0.257) is also present in the informational spectrum of SARS-CoV-2, suggesting that ACE2 might be the receptor for this novel coronavirus too. This prediction was subsequently confirmed by functional studies in vitro12. Calculation of the CS for the S1 protein from SARS-CoV-2 and all ACE2 sequences available in the UniProt database revealed that the highest amplitudes at the frequency F(0.257) correspond to ACE2 from civet and chicken. This result indicates that these species can be included as potential candidates for the natural reservoir of SARS-CoV-2. However, it is possible that SARS-CoV-2 viruses use very different receptors in the natural host(s), and not only ACE2 as is putatively the case in humans. An experimental study performed on chickens, however, indicated a lack of susceptibility of this species to the novel virus13; civets have not so far been tested, but the same study confirmed the susceptibility of domestic cats to SARS-CoV-2.
Finally, the S1 amino acid sequence from SARS-CoV-2 was scanned for the domain that makes the highest contribution to the information represented by the frequency F(0.257) (Figure 4a). This analysis revealed that domain 266–330 (numbering refers to the mature protein) is essential for the interaction of SARS-CoV-2 with ACE2. Of note is the striking homology between these domains of S1 proteins from SARS-CoV-2 and SARS-CoV, but not from MERS-CoV, for which ACE2 is not the main receptor (Figure 4b).
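The scanning procedure is not specified in detail here. One plausible reading, sketched below as an assumption, is a fixed-length sliding window in which each segment is scored by its spectral amplitude at the frequency of interest. The window length, the function name `scan_windows` and the scoring rule are hypothetical, and the EIIP table is the commonly tabulated one.

```python
import cmath

# EIIP values (Rydberg); assumption: as commonly tabulated in the ISM literature.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def scan_windows(sequence, frequency, window):
    """Return (start, end, amplitude) of the best-scoring window.

    Positions are 1-based and inclusive. Returns None if the sequence
    is shorter than the window.
    """
    signal = [EIIP[r] for r in sequence.upper()]
    best = None
    for start in range(len(signal) - window + 1):
        segment = signal[start:start + window]
        amplitude = abs(sum(
            v * cmath.exp(-2j * cmath.pi * frequency * m)
            for m, v in enumerate(segment)
        ))
        if best is None or amplitude > best[2]:
            best = (start + 1, start + window, amplitude)
    return best
```

A call such as `scan_windows(s1_sequence, 0.257, 65)` would report the 65-residue segment with the largest amplitude at F(0.257); the window length of 65 is chosen here only because it matches the size of a 266–330 domain.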
Further, S1 spike proteins from SARS-CoV (Table 1) and SARS-CoV-2 (Table 2) were compared. The CS of S1 proteins from SARS-CoV (Figure 5a) and SARS-CoV-2 (Figure 5b) were assessed. Principal information encoded in S1 proteins from SARS-CoV and SARS-CoV-2 is represented with two different frequencies F(0.222) and F(0.478), respectively. This result indicates some potential difference(s) in the virus-host interaction of these two viruses although they apparently use the same receptor ACE2.
To identify the host proteins involved in the attachment and/or internalization of SARS-CoV-2, the UniProt database (https://www.uniprot.org) was screened by ISM for human proteins with the dominant peak at the frequency F(0.478). The human proteins that have a dominant peak in their IS at the frequency F(0.478) are listed in Table 3. According to the IS criterion, these proteins are potential candidate interactors with the SARS-CoV-2 S1 protein. Further, literature data mining was performed to identify which of the proteins in Table 3 might be involved in the processes of infection with human coronaviruses. This analysis revealed that the actin protein plays an important role in the early entry events during human coronavirus infections14. Actin proteins were therefore selected as the best candidate interactors for SARS-CoV-2 among the host proteins that are characterized by the frequency F(0.478). Figure 5c shows that the CS of actins from different mammalian species (Table 4) contains the dominant peak at F(0.478), suggesting that actins probably encode conserved information important for their biological function.
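A screen of this kind could be sketched as a filter over a collection of sequences that keeps those whose dominant spectral peak falls near the target frequency. The tolerance parameter, the function names and the in-memory dictionary standing in for a database download are assumptions made for illustration. The EIIP table is the commonly tabulated one, and the direct transform used here is too slow for a full proteome.

```python
import cmath

# EIIP values (Rydberg); assumption: as commonly tabulated in the ISM literature.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def dominant_frequency(sequence):
    """Frequency (0 < f <= 0.5) of the largest DFT magnitude of the EIIP signal."""
    signal = [EIIP[r] for r in sequence.upper()]
    n = len(signal)
    best_k, best_amp = 1, -1.0
    for k in range(1, n // 2 + 1):
        amp = abs(sum(v * cmath.exp(-2j * cmath.pi * k * m / n)
                      for m, v in enumerate(signal)))
        if amp > best_amp:
            best_k, best_amp = k, amp
    return best_k / n


def screen(proteins, target, tolerance):
    """Names of proteins whose dominant peak lies within tolerance of target."""
    return [
        name for name, seq in proteins.items()
        if abs(dominant_frequency(seq) - target) <= tolerance
    ]
```

With real data, `screen(human_proteins, 0.478, tolerance)` would return the candidate list; the tolerance has to allow for the frequency resolution 1/N of each sequence.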
Data mining of the PubMed database (www.ncbi.nlm.nih.gov/pubmed/) also showed that the actin protein plays an important role in rapid cell-to-cell spread of the virus and dissemination of infection15. Additionally, actin filament reorganization is a key step in lung inflammation induced by systemic inflammatory responses caused by infectious agents16. These findings indicate that the interaction between actin proteins and S1 could be involved in the infection and pathogenesis of SARS-CoV-2. Consequently, interfering with this interaction might represent a valid hypothesis for the development of promising prevention and therapeutic strategies.
Interestingly, further data mining revealed that ibuprofen (an FDA-approved drug with an excellent safety record) attenuates interleukin-1β-induced inflammation as well as actin reorganization17. Actin was also found to be the primary component through which ibuprofen binds to tissue in different organs18. This suggests that ibuprofen might affect SARS-CoV-2-induced disease through indirect interaction with actin proteins. Previously, ibuprofen was predicted to be a candidate entry inhibitor for Ebola virus using the same in silico approach19, and this prediction was later confirmed experimentally20,21. These results prompt experimental testing of the effects of ibuprofen on SARS-CoV-2 infection under in vitro and in vivo conditions.
In silico methods are considered very important tools for generating first hypotheses and identifying first drug candidates against newly discovered agents, as in the case of SARS-CoV-2, especially in the short term. ISM, a technology based on electronic biology, allowed identification, within weeks of the initial outbreak, of the potential importance of human actin proteins for viral infection/dissemination, as well as of one FDA-approved drug that may have an indirect antiviral activity. However, additional experiments are required to confirm our initial findings.
In conclusion, the results of the presented in silico analysis suggest the following: (i) the newly emerging SARS-CoV-2 is highly related to SARS-CoV and, to a lesser degree, MERS-CoV, and ACE2 is its likely receptor; (ii) civets and poultry are potential candidates for the natural reservoir of SARS-CoV-2; (iii) human actin proteins possibly participate in the attachment/internalisation of SARS-CoV-2; (iv) drugs that interact with actin proteins (e.g. ibuprofen) should be investigated as possible therapeutics for the treatment of SARS-CoV-2 infection; and (v) domain 266–330 of the S1 protein from SARS-CoV-2 represents a promising therapeutic and/or vaccine target. Further research on these issues is needed, including the development of reverse genetics and animal models to study the biology of SARS-CoV-2. Owing to the rapid evolution of scientific knowledge on SARS-CoV-2, the first prediction has already been confirmed, while the chicken as a potential intermediate host has not been supported. Importantly, the link between ibuprofen/actin interactions and viral entry remains an exciting path for future therapeutic investigations.
Sequence data of the viruses were obtained from the GISAID EpiFlu™ Database. To access the database, each individual user should complete the “Registration Form For Individual Users”, which is available alongside detailed instructions. After submitting the registration form, the user will receive a password. There are no other restrictions on access to GISAID. Conditions of access to, and use of, the GISAID EpiFlu™ Database and Data are defined by the Terms of Use.
Version history
Version 4 (update): 06 Jan 21
Version 3 (revision): 27 Apr 20
Version 2 (update): 31 Jan 20
Version 1: 27 Jan 20