Keywords
SARS-CoV-2, Sus scrofa, Mus musculus, phylogenetic analysis, homologous recombination analysis
This article is included in the Emerging Diseases and Outbreaks gateway.
This article is included in the Coronavirus collection.
A novel coronavirus, SARS-CoV-2, was recently reported in the city of Wuhan, Hubei province, China, causing severe respiratory disease and an epidemic throughout China. The number of confirmed cases did not reach its inflection point until February 2020, although the first patient had been hospitalized on 12 December 2019¹. On 29 February, 573 new confirmed cases of novel coronavirus infection were reported in mainland China, bringing the total to 78,630. There were 35 new deaths reported that day, bringing cumulative deaths to 2761. Meanwhile, outside mainland China, more than 5000 cases had been confirmed in Asia (e.g. Japan, Singapore, Thailand and South Korea), Europe (e.g. Germany and France) and the Americas. There is a consensus that SARS-CoV-2 originated from bats². However, intermediate hosts are thought to have mediated human infection, allowing the virus to adapt gradually to the human transcription and translation machinery.
The exact parent of SARS-CoV-2 remains uncertain. In fact, both wild animals and reared livestock warrant suspicion. In the past ten years, more than 300 strains of porcine coronavirus have been reported in China (data collected from NCBI Virus). Frequent contact between humans and swine could increase the risk of cross-species transmission or viral recombination. To identify the intermediate host, relative synonymous codon usage (RSCU) analysis was applied to evaluate the range of species that could act as reservoirs. Phylogenetic and homologous recombination analyses were also used to characterize the coronaviruses carried by the most likely host.
More than 300 genome sequences were obtained from GenBank and analyzed (a list of downloaded sequences is provided in Extended data A)³. Porcine and murine coronaviruses from China were filtered and downloaded from the NCBI Virus database (Table S1, Extended data B)⁴. Here, mitochondrial genes represent the whole genome of each potential host (Table S2, Extended data B)⁴. ClustalX 1.83 was used to align the sequences. The number of coronaviruses hosted by Sus scrofa in different regions of China was counted (Extended data A)⁵. Specifically, results from the NCBI Virus database were filtered using the terms "coronaviridae", "China" and "sus scrofa" for 1 January 2010 to 2 March 2020, yielding more than 300 nucleotide records. The geographic region of each record was then confirmed from its GenBank accession, and the number of porcine coronaviruses in each region was presented as a heatmap.
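Once each record has been assigned a province, the heatmap input is simply a per-province tally. A minimal sketch, assuming the search results have already been reduced to (accession, province) pairs; the accessions below are hypothetical placeholders rather than the study's data, and counting each accession only once is an assumption about how repeated search hits should be handled.

```python
from collections import Counter

def count_by_province(records):
    """Count strains per province from (accession, province) pairs.

    Assumption: a duplicated accession is counted once, so a strain that
    appears twice in the search output does not inflate its province total.
    """
    seen = set()
    counts = Counter()
    for accession, province in records:
        if accession in seen:
            continue  # skip repeated hits for the same GenBank record
        seen.add(accession)
        counts[province] += 1
    return counts

# Hypothetical placeholder records, not real GenBank accessions.
records = [
    ("ACC0001", "Guangdong"),
    ("ACC0002", "Guangdong"),
    ("ACC0003", "Hubei"),
    ("ACC0003", "Hubei"),  # duplicate hit, counted once
    ("ACC0004", "Jiangsu"),
]

# Provinces ordered from most to fewest strains, as a heatmap legend would be.
for province, n in count_by_province(records).most_common():
    print(province, n)
```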
MEGA X (v1.0.3) was used to construct the phylogenetic trees using the Neighbor-Joining method⁶ (the input GenBank accessions are shown in Table S1, Extended data B)⁴. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches⁷. The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer it. All genome sequences were aligned before phylogenetic analysis.
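For readers unfamiliar with the method, the neighbor-joining step can be sketched in plain Python. This is the textbook Saitou-Nei algorithm applied to a precomputed distance matrix, not MEGA X's implementation; the internal node labels (`node1`, `node2`, ...) are hypothetical, and the five-taxon matrix is a standard worked example rather than data from this study. The bootstrap test described above would rerun the distance and joining steps on 1000 alignments resampled column by column with replacement and report how often each clade recurs; that loop is omitted here.

```python
def neighbor_joining(names, matrix):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    names: taxon labels; matrix: list of rows with matrix[i][j] = d(i, j).
    Returns (parent, child, branch_length) edges of an unrooted tree.
    Internal nodes get hypothetical labels "node1", "node2", ...
    """
    if len(names) < 2:
        raise ValueError("need at least two taxa")
    # Nested dict so that merged nodes can be added and removed by name.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums over the taxa still in play.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        counter += 1
        u = f"node{counter}"
        edges.append((u, a, la))
        edges.append((u, b, lb))
        # Distance from the new node to every remaining taxon.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                duk = 0.5 * (d[a][k] + d[b][k] - d[a][b])
                d[u][k] = duk
                d[k][u] = duk
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Join the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges

# Textbook five-taxon example (additive distances), not study data.
example_names = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for parent, child, length in neighbor_joining(example_names, example_matrix):
    print(parent, child, length)
```

Because the example distances are additive, the recovered branch lengths reproduce every input distance; with real, non-additive distances neighbor joining can return small negative branch lengths, which this sketch leaves as computed.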
To identify the relative synonymous codon usage (RSCU) bias of SARS-CoV-2 and its potential hosts, the coding sequences of the candidate species were analyzed with CodonW 1.4.2 (Table S4, Extended data C)⁵. Complete coding sequences of the genomes, downloaded from NCBI, were used to identify the host among different viruses⁸ (the input GenBank accessions are shown in Table S1, Extended data B)⁴. RSCU values were calculated with CodonW 1.4.2, and heat map clustering of RSCU was performed with MeV 4.9.0. Homology was assessed from the pairwise distances between SARS-CoV-2 and the mitochondrial genes of the potential host animals; these pairwise distances were computed with MEGA X using the bootstrap method (1000 replicates) to evaluate potential hosts⁹.
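RSCU divides the observed count of each codon by the mean count of all synonymous codons for the same amino acid, RSCU_ij = X_ij / ((1/n_i) Σ_k X_ik), so a value of 1 means no preference. CodonW computes this internally; the sketch below is an independent illustration, not CodonW's code. It assumes the standard genetic code (NCBI translation table 1). Mitochondrial genes follow the vertebrate mitochondrial code (NCBI table 2), in which, for example, TGA encodes Trp and AGA/AGG are stop codons, so the table would need to be swapped for those sequences. Returning 0.0 for amino acids absent from the sequence is also an assumption of this sketch.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families, keyed by amino acid ("*" = stop).
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    FAMILIES.setdefault(_aa, []).append(_codon)

def rscu(seq):
    """Relative synonymous codon usage of an in-frame coding sequence.

    RSCU = observed count / mean count of the codons encoding the same
    amino acid. Values > 1 mark preferred codons, < 1 avoided ones.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(
        seq[i:i + 3]
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 3] in CODON_TABLE  # skip codons containing N etc.
    )
    result = {}
    for aa, codons in FAMILIES.items():
        mean = sum(counts[c] for c in codons) / len(codons)
        for c in codons:
            # Assumption: amino acids absent from the sequence get 0.0.
            result[c] = counts[c] / mean if mean else 0.0
    return result

print(round(rscu("ATGGCTGCTGCCTAA")["GCT"], 3))  # 2.667
```

In the example, Ala is encoded by GCT twice and GCC once out of its four codons, so the family mean is 0.75 and GCT scores 2/0.75.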
The genome sequences of bat coronavirus RaTG13 (MN996532.1), murine hepatitis virus JHM (AC_000192.1), human coronavirus HKU1 (NC_006577.2), PEDV H11-SD2017 (MH708243.1) and PEDV YN15 (KT021228.1) were obtained from the NCBI Virus database. Potential recombination events in SARS-CoV-2 were first screened with the Recombination Detection Program v4 (RDP4), and SimPlot (version 3.5.1) was then used to verify the possible breakpoints. The BootScan method was included in the RDP4 analysis. Potential recombination events were characterized with similarity plots, with possible recombination regions indicated.
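A similarity plot of the kind produced by SimPlot reports, for each window along an alignment, how similar each reference is to the query. The sketch below is a simplified stand-in, not SimPlot's code: it computes uncorrected percent identity (1 minus the p-distance) with no evolutionary-distance model, ignores columns where either sequence has a gap, and uses window and step sizes of 200 and 20 nt that are illustrative assumptions rather than the settings used in this study.

```python
def window_similarity(query, reference, window=200, step=20):
    """Percent identity of an aligned reference to the query per window.

    Both sequences must come from the same alignment (equal length, "-"
    for gaps). Assumption: gapped columns are skipped. Returns a list of
    (window_midpoint, identity) pairs, identity in [0, 1].
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        # Keep only columns where both sequences have a base.
        pairs = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        if not pairs:
            continue  # window is all gaps in one sequence
        same = sum(a == b for a, b in pairs)
        points.append((start + window // 2, same / len(pairs)))
    return points

# Toy alignment: the reference matches the first half of the query only.
print(window_similarity("AAAAAAAAAA", "AAAAACCCCC", window=5, step=5))
```

Running one such profile per reference (for example RaTG13, HKU1, JHM and the two PEDV strains) and plotting them together gives a picture of the same kind as Figure 5A.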
The geographical heat map of PEDV distribution indicates the risk of porcine epidemic diarrhea outbreaks in different regions¹⁰ (Figure 1). In the past 10 years, more than 300 strains of PEDV have been reported in China¹¹. Guangdong province ranked first with 81 strains, followed by Jiangsu with 27 strains, and Hubei province ranked third with 21 strains. A larger number of PEDV strains implies greater infection exposure. Therefore, we believe that the outbreak of SARS-CoV-2 is associated with porcine coronavirus to some extent.
Porcine epidemic diarrhea virus (PEDV) strains with precisely known provincial locations were collected from the NCBI Virus database for 2010–2020 (Table S3). The depth of color represents the number of PEDV-related strains in each region. Note: Chinese map has been reproduced with permission from www.d-maps.com.
RSCU has been widely used to analyze the association between viruses and their potential hosts⁸. The RSCU heat map, clustered by Euclidean distance, indicated that both Sus scrofa and Mus musculus have synonymous codon usage bias similar to that of SARS-CoV-2 (Figure 2A). The Euclidean distance between Sus scrofa and SARS-CoV-2 is the smallest, hinting that SARS-CoV-2 could use porcine translation machinery more effectively than that of other animals and suggesting that epidemic SARS-CoV-2 might originate from swine. The pairwise distance between the complete SARS-CoV-2 genome and the mitochondrial genome of each potential host also supports this judgment: both swine and mice showed shorter pairwise distances to SARS-CoV-2 than snakes, marmots, mink, bats and humans did (Figure 2B). The pairwise distance between SARS-CoV-2 and Sus scrofa is 14.89, and between SARS-CoV-2 and Mus musculus it is 14.94. This result indicates that SARS-CoV-2 may originate from both swine and mice.
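The host ranking above rests on Euclidean distances between RSCU profiles, with the smallest distance read as the most similar codon usage. A minimal sketch of that comparison, using hypothetical host names and made-up RSCU values; it only computes and sorts the distances and does not reproduce MeV's hierarchical clustering.

```python
import math

def rank_by_euclidean(query, profiles):
    """Rank candidate hosts by Euclidean distance between RSCU profiles.

    query: dict codon -> RSCU for the virus.
    profiles: dict host name -> dict codon -> RSCU.
    Only codons present in the query are compared; a codon missing from a
    host profile is treated as 0.0 (an assumption of this sketch).
    Returns (distance, host) pairs, closest first.
    """
    codons = sorted(query)
    q = [query[c] for c in codons]
    scored = []
    for name, prof in profiles.items():
        v = [prof.get(c, 0.0) for c in codons]
        scored.append((math.dist(q, v), name))
    return sorted(scored)

# Hypothetical, made-up RSCU values for a few Lys/Glu codons.
virus = {"AAA": 1.2, "AAG": 0.8, "GAA": 1.5, "GAG": 0.5}
hosts = {
    "host_A": {"AAA": 1.1, "AAG": 0.9, "GAA": 1.4, "GAG": 0.6},
    "host_B": {"AAA": 0.6, "AAG": 1.4, "GAA": 0.9, "GAG": 1.1},
}
for distance, host in rank_by_euclidean(virus, hosts):
    print(host, round(distance, 3))
```

A real comparison would use all 59 to 64 codons (depending on whether Met, Trp and stop codons are excluded) rather than the four shown.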
(A) Heat map of relative synonymous codon usage (RSCU) derived from the complete genome of SARS-CoV-2 and the mitochondrial genomes of diverse animals (Sus scrofa, Mus musculus, Naja atra, Mustela putorius, Marmota flaviventris, Rhinolophus sinicus, Homo sapiens). (B) Homology analysis between SARS-CoV-2 and different animal species (Table S4); pairwise distance, computed in MEGA X, was used to evaluate homology relative to SARS-CoV-2.
The 21 PEDV strains reported in Hubei from 2010 to 2020 were filtered from the NCBI Virus database (the GenBank accessions are shown in Figure 3). In the phylogenetic analysis, coronaviruses (HKU15, PHEV, NL63, H11-SD2017, 229E, HKU2, TGEV, etc.) derived from various species (bats, swine, humans) formed the sister lineage to SARS-CoV-2 with 99% bootstrap support (Figure 3). RSCU analysis revealed that these related coronaviruses have synonymous codon usage bias similar to that of SARS-CoV-2 (Figure 4). The other PEDV strains obtained from Hubei also showed a close phylogenetic relationship with SARS-CoV-2. Overall, the close phylogenetic relationship with coronaviruses from Sus scrofa provides evidence for a bat-swine axis as one of the origins of SARS-CoV-2.
The neighbor-joining tree (1,000 bootstrap replicates; p-distance) represents the evolutionary history of the taxa analyzed. The 21 PEDV strains derived from Hubei (brown box) are highlighted in a phylogeny-based geographical dissection. Information about the 21 PEDV strains is shown in Table S1, Extended data B⁴.
Heat map of RSCU derived from the complete genomes of SARS-CoV-2 and other polyphyletic coronaviruses. Euclidean distance was used to cluster the related coronaviruses.
Meanwhile, the RSCU results also showed that SARS-CoV-2 and the related coronaviruses tend to have similar synonymous codon usage bias (Figure 4 and Table S5, Extended data C)⁵. Therefore, research into the origin of SARS-CoV-2 could focus further on coronaviruses isolated from swine and mice. In particular, the relationship with PEDV H11-SD2017 and PEDV YN15 needs further study.
To our knowledge, this is the first study to report that porcine and murine coronaviruses may have taken part in the recombination that produced SARS-CoV-2. RDP4 estimated the possible recombination regions of SARS-CoV-2 (Table 1, Figure 5A). Furthermore, SimPlot analysis of sequence similarity confirmed homologous recombination between SARS-CoV-2 and coronaviruses from potential hosts. The potential recombination breakpoints (16205–16358 nt), shown as red dashed lines (Figure 5B), indicate recombination between PEDV YN15 and murine hepatitis virus JHM when SARS-CoV-2 was used as the query. In addition, the region between 20923 and 21181 nt indicated a recombination event between PEDV H11-SD2017 and human coronavirus HKU1 when SARS-CoV-2 was used as the query (Figure 5C).
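In a similarity plot, a candidate breakpoint shows up where a different reference becomes the closest match to the query. A hypothetical helper that lists such switches from precomputed per-window similarity profiles is sketched below. It only flags positions for visual inspection; it does not reproduce RDP4's BootScan or other statistical tests, nor the P values in Table 1, and the reference names and values in the example are made up.

```python
def best_match_switches(profiles):
    """List windows where the reference most similar to the query changes.

    profiles: dict reference name -> list of (position, identity) pairs,
    all computed on the same windows (assumed, not checked beyond length).
    Returns (position, previous_best, new_best) tuples. Ties go to the
    reference listed first.
    """
    names = list(profiles)
    positions = [p for p, _ in profiles[names[0]]]
    if any(len(profiles[n]) != len(positions) for n in names):
        raise ValueError("all profiles must cover the same windows")
    switches = []
    previous = None
    for idx, pos in enumerate(positions):
        best = max(names, key=lambda n: profiles[n][idx][1])
        if previous is not None and best != previous:
            switches.append((pos, previous, best))
        previous = best
    return switches

# Made-up profiles: ref_1 is closest at first, ref_2 takes over at 140.
example = {
    "ref_1": [(100, 0.90), (120, 0.90), (140, 0.60)],
    "ref_2": [(100, 0.70), (120, 0.80), (140, 0.95)],
}
print(best_match_switches(example))
```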
Recombinant sequence | Major / minor parent sequences | P value | Recombination region (nt) |
---|---|---|---|
MN908947.3 | KT021228.1 / AC_000192.1 | 4.764×10⁻³ | 16205–16358 |
MN908947.3 | MH708243.1 / NC_006577.2 | 2.579×10⁻² | 20923–21181 |
MN908947.3: Wuhan_seafood_market_pneumonia_virus_isolate_Wuhan-Hu-1_complete_genome; KT021228.1: Porcine_epidemic_diarrhea_virus_strain_YN15_complete_genome; AC_000192.1: Murine_hepatitis_virus_strain_JHM_complete_genome; NC_006577.2: Human_coronavirus_HKU1_complete_genome; MH708243.1: Porcine_epidemic_diarrhea_virus_strain_H11-SD2017_complete_genome.
Similarity to each reference sequence is indicated by the colors shown in the legend box at the top. (A) Similarity plot of SARS-CoV-2 (query) against bat coronavirus RaTG13, human coronavirus HKU1, murine hepatitis virus JHM, PEDV YN15 and PEDV H11-SD2017. (B) Enlarged view of the homologous recombination region (16205–16358 nt) involving PEDV YN15 and murine hepatitis virus strain JHM. (C) Enlarged view of the homologous recombination region (20923–21181 nt) involving PEDV H11-SD2017 and human coronavirus HKU1.
In China, the cumulative number of patients diagnosed with SARS-CoV-2 infection is thought to exceed 100,000, with more than 2000 deaths. It is the most severe public health emergency since the SARS outbreak 17 years ago¹². Local residents were gripped by anxiety and confusion during the outbreak, although the Chinese government has taken substantial action to prevent and control the spread of the virus. However, the many conflicting messages could distract medical workers and researchers. Current research on the origin of SARS-CoV-2 mostly focuses on wildlife, since the first cases were thought to be closely associated with wild animals in the Wuhan seafood market.
The origin of SARS-CoV-2 has caused great public concern. Natural variation is the dominant view held by most scholars, who believe that bats and other wild animals provide reservoirs for the virus. However, coronaviruses from bats are unlikely to infect humans directly; one or two intermediate hosts may facilitate homologous recombination, enabling the virus to adapt gradually to the human genetic code and then to survive and replicate successfully.
From our perspective, natural variation is the more reasonable explanation for the origin of SARS-CoV-2. However, the wild animals in the Wuhan seafood market should not bear all the blame, since there is evidence that some early patients had no contact with the market¹². Some researchers¹³ have also pointed out that the Wuhan seafood market is not the only possible area of origin, suggesting that a creature other than the animals sold at the market may act as the intermediate host. In our study, the captive animals Sus scrofa (swine) and Mus musculus (mice) were suspected to be critical hosts of SARS-CoV-2.
Our findings support the theory of natural variation, which holds that people become infected by eating or coming into contact with intermediate hosts. SARS-CoV-2 was found to be 96% identical at the whole-genome level to a bat coronavirus². Other reports suggest that snakes, mink and pangolins could be potential hosts of SARS-CoV-2¹⁴.
Either wild animals or reared livestock could serve as hosts. It has been reported that the Wuhan seafood market may not be the only source of the novel virus now spreading globally, because the earliest patient became ill on 1 December 2019 and had no epidemiological link to the seafood market or to later cases. Official details of the first 41 hospitalized patients showed that 13 of them had no link to the market at all¹². One possible hypothesis is that cross-species transmission occurred elsewhere before the outbreak at the Huanan Seafood Market in Wuhan.
A previous study of fatal swine acute diarrhea syndrome (SADS) revealed that a SADS-related coronavirus was responsible for a large-scale outbreak of fatal disease in pigs in China¹⁰. Here we found that porcine epidemic diarrhea virus (PEDV) broke out periodically in China from 2010 to 2020¹¹ (Figure 1, Extended data A)³. Among these strains, H11-SD2017 showed a close affiliation with SARS-CoV-2 in both relative synonymous codon usage (RSCU) and phylogenetic analyses. Swine-to-human cross-species transmission may explain why many patients with coronavirus disease 2019 (COVID-19) suffer not only from severe respiratory disease but also from diarrhea¹².
In the past 10 years, PEDV has spread to most provinces with a swine industry in China. Hubei was the first region to be infected, with PEDV strain CH/HBQX/10 in 2010; since then, more and more provinces have reported the presence of PEDV¹¹. The spread of PEDV has gone beyond its initial geographical limits. More than 300 PEDV strains clustered into the pandemic group, suggesting that substantial natural variation took place as PEDV spread through different regions. According to statistics for China from 2010 to 2020, 21 PEDV strains were found in Hubei, the third highest number in China (Figure 1). Further analysis of these 21 PEDV strains and of coronaviruses from various species indicates that swine and mice could be additional hosts of SARS-CoV-2 (Figure 2A, Figure 3 and Figure 4).
RDP4 and SimPlot analyses helped us better understand the homologous recombination of SARS-CoV-2 (Table 1, Figure 5). They indicated that not only porcine coronaviruses but also murine coronaviruses were involved in recombination events. Therefore, we speculate that SARS-CoV-2 may have originated first in bats and then undergone a series of recombination events, with swine and mice playing a critical role in mediating cross-species transmission.
Pairwise distance analysis also indicated that Mus musculus could be a possible host of SARS-CoV-2 (Figure 2B). Previously, the Chinese Center for Disease Control and Prevention reported that 33 samples were positive for novel coronavirus nucleic acid. The positive samples were distributed among 22 stalls and a garbage truck in the Wuhan Huanan seafood market, so the outbreak is strongly suspected to be related to the wildlife trade. But how could the movement of wild animals around the market lead to cross-infection of disparate species in different regions? One plausible scenario is that mice were infected with SARS-CoV-2 first and then transmitted the infection to other wild species in the market. Thus, the role of mice in the market deserves more attention. To verify the exact intermediate host, mice and swine living around Wuhan should be collected and tested according to Koch's postulates.
Finally, reared livestock deserve more attention in addition to wild vertebrates. To our knowledge, this is the first study to indicate that swine and mice are probable reservoirs for SARS-CoV-2. Furthermore, mice around the seafood market may also have been involved, to some extent, in cross-species transmission of the virus. Because all of these results are based on bioinformatic analysis, the host should be further verified through virus isolation and other experiments.
Figshare: Identification Sus scrofa and Mus musculus as potential parasitifers of SARS-CoV-2 via phylogenetic and homologous recombination analysis. https://doi.org/10.6084/m9.figshare.11925588³.
This project contains Extended data A: PEDV distribution in China from 2010–2020. (The 329 PEDV genomes assessed in this study.)
Figshare: Identification Sus scrofa and Mus musculus as potential parasitifers of SARS-CoV-2 via phylogenetic and homologous recombination analysis. https://doi.org/10.6084/m9.figshare.11925693⁴.
This project contains Extended data B: The GenBank accession numbers used for analysis. (Contains accession numbers, strain name and species used for phylogenetic analysis.)
Figshare: Identification Sus scrofa and Mus musculus as potential parasitifers of SARS-CoV-2 via phylogenetic and homologous recombination analysis. https://doi.org/10.6084/m9.figshare.11925654⁵.
This project contains Extended data C: RSCU analysis of diverse genomes. (Contains RSCU analysis of codons derived from SARS-CoV-2.)
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
We are grateful to Prof. Yongzhen Zhang's research team for sharing the complete SARS-CoV-2 genome (GenBank accession MN908947), and to other researchers for the related genome GenBank accessions.
Versions: Version 2 (revision), 22 Apr 20; Version 1, 16 Mar 20.
A separate comment is that the authors' use of "natural variation" as the proposed mechanism for transmission is not a very precise description of what they mean, which is that the virus "naturally" transmits among people and animals frequently encountered in everyday life and does not require unusual encounters with "wild" animals in meat markets. There must be a better term for this mechanism. Perhaps "routine contact"? Epidemiology must have a term for this, but that is not my area of expertise.