Keywords
Coronavirus, SARS-CoV-2, Covid-19, Morocco, Molecular Epidemiology, Genomics Surveillance, Phylogenetics, Variant Network Analysis
This article is included in the Emerging Diseases and Outbreaks gateway.
This article is included in the Coronavirus collection.
In December 2019, several cases of a new respiratory illness were reported in the city of Wuhan, Hubei province, China1. The disease, named coronavirus disease 2019 (COVID-19), was later confirmed to be caused by a novel coronavirus that was subsequently named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)2. On the 11th of March 2020, the ongoing SARS-CoV-2 outbreak was declared a pandemic by the World Health Organization (WHO)3. As of June 16th 2020, there had been 8,228,025 confirmed cases and 444,442 deaths around the world (Johns Hopkins Center (JHC))4. As of the same date, Morocco had reported 8,931 confirmed cases and 212 deaths associated with COVID-19 (JHC update of June 16th 2020). To gain further understanding of the molecular epidemiology of the outbreak in Morocco, we conducted a phylogenetic analysis and a Variant Network Analysis of the full-genome sequences of 21 SARS-CoV-2 strains: 19 were isolated from COVID-19 patients in Morocco, one was isolated from a patient in Melilla, and one was isolated from a Moroccan patient in Cadiz, Andalusia.
The Morocco-related sequences used in this study were deposited in the GISAID database by the Laboratory of the Royal Gendarmerie of Morocco (LRAM) (one sequence), the Pasteur Institute of Morocco (IPM) (17 sequences), the Anoual Laboratory in Casablanca (one sequence), and the SeqCovid Spanish Project (two sequences). The sequences were retrieved from the GISAID database and analyzed in comparison with 500 selected genomes collected between the 15th of February 2020 (2 weeks before the first detected case in Morocco) and the 30th of March 2020 (2 weeks after the international borders of Morocco were closed; lockdown). Only high-quality sequences were included in this study. All sequences and metadata are provided as Extended data5 and can be found at the GitHub repository https://github.com/covidmor/covid19_morocco.
The Moroccan sequences were mapped to the Wuhan reference genome (NC_045512) using Bowtie 2 (version 2.3.5.1)6. Variants were called using mpileup 1.7 and bcftools (version 1.9)7. The variants were annotated using the China National Center for Bioinformation Annotator. Variant network analysis was performed using Gephi 0.9.2.
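To make the variant notation used later in the text concrete (for example 241C>T), here is a minimal sketch of how a substitution label can be derived by comparing a sample with a reference once the two are aligned. It is not the Bowtie/mpileup/bcftools pipeline described above; the function name and the toy sequences are hypothetical and serve only to illustrate the labelling convention.

```python
def call_substitutions(reference: str, sample: str) -> list[str]:
    """Return substitutions as '<1-based reference position><ref>><alt>' labels.

    Both strings must come from the same pairwise alignment (equal length,
    '-' for gaps). Gaps and ambiguous bases such as 'N' are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    ref_pos = 0  # coordinate on the ungapped reference
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            ref_pos += 1
        if ref_base not in "ACGT" or alt_base not in "ACGT":
            continue  # not a plain base-to-base comparison
        if ref_base != alt_base:
            calls.append(f"{ref_pos}{ref_base}>{alt_base}")
    return calls


# Toy, made-up sequences (not SARS-CoV-2 data).
toy_reference = "ACGTACGTAC"
toy_sample = "ACGTATGTNC"
print(call_substitutions(toy_reference, toy_sample))
```

A real variant caller additionally weighs read depth and base quality, which this sketch ignores.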
A total of 500 sequences from different countries were retrieved from the GISAID database (www.gisaid.org) and aligned using the Muscle 3.8.1551 multiple sequence alignment tool8; see Extended data, metadata5 for details on each. A maximum likelihood tree was inferred using RAxML (version 8.2.12)9 with 100 bootstrap replicates, using the Wuhan reference genome (NC_045512) as the outgroup. The phylogenetic tree was visualized using FigTree (version 1.4.4)10.
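The bootstrap mentioned above rests on a simple idea: alignment columns are resampled with replacement, and a tree is re-estimated from each pseudo-alignment. The sketch below illustrates only the resampling step on a toy alignment; it is not RAxML's implementation, and the function name, the seed and the sequences are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    chosen = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[i] for i in chosen) for name, seq in alignment.items()}


# Toy, made-up alignment (not study data).
toy_alignment = {"ref": "ACGTAC", "s1": "ACGTAT", "s2": "ATGTAT"}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(100)]
print(len(replicates), "replicates of length", len(replicates[0]["ref"]))
```

In a full analysis, the fraction of replicate trees that recover a given clade becomes that clade's bootstrap support.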
To place the complete SARS-CoV-2 genomes isolated from 19 Moroccan patients, a patient from Melilla and a Moroccan patient from Cadiz (Andalusia, Spain) in the context of the global pandemic, we aligned them with the dataset of 500 complete SARS-CoV-2 genomes from different countries. The estimated maximum likelihood phylogeny is shown in the Extended data (Figure 1)5. For a clearer presentation of the phylogeny, we created a circular tree from a subset of 120 sequences from our dataset, collected between the 18th of February 2020 (15 days before the first Moroccan case) and the 15th of March 2020 (the date on which Morocco closed its international borders) (Figure 1b).
The extended phylogenetic tree is subdivided into seven clades that correspond to the seven main SARS-CoV-2 strain types: GR (red), GH (green), G (yellow), S (pink), V (magenta), L (light gray) and O (others; dark gray). The four strain types S, G, GR, and GH are scattered all over the world (Extended data, Figure 2)5. According to GISAID (update of the 12th of June 2020), they first appeared in February 2020; the G strain type gave rise to the GH and GR types, while the S strain was mainly found in Spain11.
Zooming into the tree (Extended data, Figure 1)5, Figure 1a shows that the Moroccan isolates group into four independent clusters. In the GR clade, the sequences with GISAID IDs EPI_ISL_459973, EPI_ISL_459966 and EPI_ISL_459965 (the first infected patient in Morocco) cluster together with sequences from Italy (GISAID ID: EPI_ISL_417922), Thailand (GISAID ID: EPI_ISL_430837), Mexico (GISAID ID: EPI_ISL_452139), and Iceland (GISAID ID: EPI_ISL_417829) (Figure 1a: A1–3). Interestingly, the hosts of these four sequences were all travelling in Italy during the same period (Extended data, Metadata)5. This suggests that the first Moroccan patient (GISAID: EPI_ISL_459965), who returned to Morocco after recent travel to Bergamo, Lombardy (the most affected region in Italy), and the three other individuals from Mexico, Iceland and Thailand (GISAID: EPI_ISL_452139, EPI_ISL_417829 and EPI_ISL_430837, respectively) were probably infected by the same person in Italy (GISAID: EPI_ISL_417922). From this cluster, we infer, based on the sample collection dates (Extended data, Metadata)5, that the first Moroccan patient (GISAID: EPI_ISL_459965) probably infected the two other Moroccan patients whose viral sequences are found in the same cluster (GISAID: EPI_ISL_459973 and EPI_ISL_459966). The Moroccan sequences (GISAID IDs: EPI_ISL_458150, EPI_ISL_459980 and EPI_ISL_459984) that cluster together in the GR clade are close to sequences from Iceland (GISAID IDs: EPI_ISL_417861, EPI_ISL_417863 and EPI_ISL_417852) whose carriers had a travel history to Italy. Furthermore, all three of these Icelandic sequences are related to one Italian sequence (GISAID ID: EPI_ISL_460081). For their part, the Moroccan sequences EPI_ISL_459983, EPI_ISL_459977 and EPI_ISL_459978 cluster with another sequence from Italy (GISAID ID: EPI_ISL_452186). Notably, many sequences from the GR clade were isolated from patients with a known travel history to Italy (Figure 1a: A1–3). We therefore infer that the viral sequences in this clade, in which nine Moroccan viral sequences clustered, were of Italian origin.
The four Moroccan sequences that cluster together within the G type strain (GISAID IDs: EPI_ISL_459968, EPI_ISL_459974, EPI_ISL_459972 and EPI_ISL_459981) were close to French sequences (GISAID IDs: EPI_ISL_417333, EPI_ISL_443270 and EPI_ISL_416752). These four Moroccan sequences were collected on the 17th of March, the 20th of March, the 20th of March and the 19th of April 2020, respectively. Interestingly, the third COVID-19 case in Morocco was a French male tourist in his 30s who arrived in Morocco on the 7th of March 2020, as reported by the Moroccan health authorities. He was therefore most likely the source of the Moroccan viral strains that belong to the G type. The case reported on the 19th of April 2020 (sequence with the GISAID ID: EPI_ISL_459981) seems to be the result of local community transmission.
The Moroccan strain isolated from Ouarzazate (South of Morocco) (GISAID ID: EPI_ISL_451400) is close to the Portuguese sequences (GISAID IDs: EPI_ISL_413648 and EPI_ISL_453947) in the G type, suggesting a third introduction path to Morocco from Portugal. For its part, the Moroccan sequence (GISAID ID: EPI_ISL_459967), which also belongs to the G type, is close to the Spanish sequences (GISAID IDs: EPI_ISL_455349 and EPI_ISL_455332), making Spain another point of introduction to Morocco. Furthermore, the Moroccan sequence (GISAID ID: EPI_ISL_459975) is closely related to the Italian G strain (GISAID: EPI_ISL_417921), which represents another Italian introduction to Morocco. It is also similar to a sequence from Iceland (GISAID ID: EPI_ISL_417730) whose host had a travel history to Italy (Figure 1a: E) and could have been infected either in Italy or by another asymptomatic carrier coming from there.
Three other viral sequences isolated from Moroccan patients (GISAID IDs: EPI_ISL_459976, EPI_ISL_459982 and EPI_ISL_459979) appear in the clade of the GH strain types where they cluster with French sequences (GISAID IDs: EPI_ISL_416748 and EPI_ISL_416501) (Figure 1a: B) suggesting a further introduction to Morocco, once more from France.
The viral sequence isolated from a patient in Melilla (GISAID ID: EPI_ISL_455344) clusters within the G type with Spanish sequences (GISAID IDs: EPI_ISL_419235, EPI_ISL_450337 and EPI_ISL_419236). The sequence isolated from a Moroccan patient in Cadiz, Andalusia (GISAID ID: EPI_ISL_452463) (Figure 1a: G), which belongs to the S type, also clusters with Spanish sequences, although it is quite distant from the other Spanish-related sequences. This supports the apparent neutrality of the host’s genetics with respect to transmission and potential selection on the type of strain.
Finally, similarity analysis showed that the Moroccan sequence (GISAID ID: EPI_ISL_467299), collected on May 21st, 2020 and sequenced by the LRAM Laboratory, is close to the first identified French cluster, suggesting that the virus strain circulating in Morocco did not undergo any major mutations and, once again, supporting local community-based transmission of the virus. This sequence was not included in the overall analysis, as it was deposited while this manuscript was being finalized.
Thus, the phylogenetic analysis shows that the virus outbreak in Morocco was likely the result of multiple introductions. We can highlight four independent introductions to Morocco from Italy, France, Spain, and Portugal. This finding has obvious implications for the epidemiological tracing of the initial introduction of the pathogen that caused the current outbreak in Morocco. Generally, the viral sequences isolated from Moroccan patients showed close relationships primarily to European strains. Geographic proximity and tourist and migratory connections therefore appear to play a key role in the spread of the virus. We caution that further analyses are needed to evaluate the statistical robustness of the inferences suggested herein.
To further understand the evolution of the SARS-CoV-2 virus within the Moroccan population and trace the infection pathways, we performed a Variant Network Analysis using Gephi. We used the complete sequences of the 21 SARS-CoV-2 genomes: 19 from Morocco and two from Melilla and Andalusia (Figure 2). The variant network in Figure 2 shows three main clusters, one isolated sequence from Portugal and another one from Cadiz sharing no variants with the other sequences. The first cluster comprises the following viral strains, all closely related to each other: the strain diagnosed in Casablanca (west Morocco) (GISAID ID: EPI_ISL_458150), IPM_1 (GISAID ID: EPI_ISL_459965), IPM_2 (GISAID ID: EPI_ISL_459966), IPM_13 (GISAID ID: EPI_ISL_459980), IPM_15 (GISAID ID: EPI_ISL_459978), IPM_7 (GISAID ID: EPI_ISL_459973), IPM_16 (GISAID ID: EPI_ISL_459977), IPM_10 (GISAID ID: EPI_ISL_459984) and IPM_17 (GISAID ID: EPI_ISL_459983). This is in agreement with the phylogenetic tree, where all these sequences belong to the GR clade and appear to have originated from Italy (Figure 1 and Extended data, Figure 1)5. The second cluster (blue), made up of the sequences labelled IPM_5 (GISAID ID: EPI_ISL_459974), IPM_6 (GISAID ID: EPI_ISL_459968), IPM_8 (GISAID ID: EPI_ISL_459972) and IPM_14 (GISAID ID: EPI_ISL_459981), belongs to the G clade and has a French sequence (GISAID ID: EPI_ISL_418222) as a root in the phylogenetic tree. The sequences IPM_4 (GISAID ID: EPI_ISL_459976), IPM_11 (GISAID ID: EPI_ISL_459982) and IPM_12 (GISAID ID: EPI_ISL_459979) belong to the GH clade and seem to have originated from another French sequence (GISAID: EPI_ISL_418219). Interestingly, this sequence belongs to the G clade and is close to a strain that was isolated from an Icelandic patient who was most probably infected during his stay in Italy (GISAID ID: EPI_ISL_417730).
Another separate Moroccan sequence, labelled IPM_3 (GISAID ID: EPI_ISL_459967), is closer to Spanish sequences (GISAID IDs: EPI_ISL_455349 and EPI_ISL_455332). The sequence of the viral strain from Ouarzazate (GISAID ID: EPI_ISL_451400) is closely related to a Portuguese sequence (GISAID ID: EPI_ISL_413648), just as shown in the phylogenetic tree. Yet, all these sequences belong to the G clade, which has an Italian sequence as a root. Hence, the Variant Network Analysis is in accordance with the phylogenetic tree and further supports our deduction of multiple SARS-CoV-2 introductions to Morocco from at least four European countries. Thus, while the virus varies geographically in a way that lets us identify geographically separated strains, population interconnection via travel and migratory flows seems more important than geographic proximity for the worldwide spread of the virus.
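The data structure behind such a variant network can be sketched in a few lines: samples are nodes, two samples are linked when they share at least one variant, and groups of linked samples form clusters. The sketch below is only an illustration under these assumptions; Gephi itself is an interactive tool, and the exact edge definition used for Figure 2 is not stated in the text. All sample names and variant labels are hypothetical.

```python
from itertools import combinations


def shared_variant_edges(variants_by_sample: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Map each pair of samples sharing variants to the number of shared variants."""
    edges = {}
    for first, second in combinations(sorted(variants_by_sample), 2):
        shared = variants_by_sample[first] & variants_by_sample[second]
        if shared:
            edges[(first, second)] = len(shared)
    return edges


def connected_clusters(samples, edges) -> list[set[str]]:
    """Group samples into connected components of the shared-variant graph."""
    neighbours = {sample: set() for sample in samples}
    for first, second in edges:
        neighbours[first].add(second)
        neighbours[second].add(first)
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:  # depth-first walk over linked samples
            node = stack.pop()
            if node not in cluster:
                cluster.add(node)
                stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Hypothetical samples and variants, for illustration only.
toy_variants = {
    "s1": {"10A>G", "20C>T"},
    "s2": {"10A>G"},
    "s3": {"30G>A"},
    "s4": {"30G>A", "40T>C"},
    "s5": {"50C>A"},
}
toy_edges = shared_variant_edges(toy_variants)
print(toy_edges)
print(connected_clusters(toy_variants, toy_edges))
```

In this toy input, `s5` shares no variant with any other sample and therefore ends up as an isolated node, analogous to the isolated sequences described for Figure 2.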
To further test the hypothesis of multiple introductions to Morocco from the aforementioned four European countries, we performed a Variant Network Analysis using all the Moroccan sequences as one group and compared them with randomly selected sequences from different countries (Italy, France, Spain and Portugal, in addition to Austria, Australia, and Brazil). The results show that the sequences detected during the early stages of the epidemic in Morocco share variants with Italian (seven variants), French (four), Portuguese (three), and Spanish (three) strains. They also share one variant with strains from Austria, one with Brazil and another one with Australia (Figure 3). Interestingly, one variant located in the Spike gene (23403A>G), which induces the amino acid change D614G, was shared between all countries; it began spreading in Europe in early February and, when introduced to new regions, it rapidly became the dominant form. This mutation is suspected to increase the transmissibility of the virus12.
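A group-level comparison of this kind reduces to set operations: pool the variants of the focal group, intersect them with the pooled variants of each country, and intersect across all countries to find variants common to every group. The sketch below shows one plausible reading of that procedure, which the text does not spell out; the function names, country labels and variant labels are hypothetical.

```python
def pool_variants(variant_sets) -> set[str]:
    """Union of the variant sets of all sequences in a group."""
    pooled = set()
    for variants in variant_sets:
        pooled |= variants
    return pooled


def shared_variant_counts(focal: set[str], by_country: dict[str, list[set[str]]]) -> dict[str, int]:
    """Number of focal-group variants also seen in each country's sequences."""
    return {country: len(focal & pool_variants(sets)) for country, sets in by_country.items()}


def shared_by_all(focal: set[str], by_country: dict[str, list[set[str]]]) -> set[str]:
    """Variants present in the focal group and in every country."""
    common = set(focal)
    for sets in by_country.values():
        common &= pool_variants(sets)
    return common


# Hypothetical variant calls, for illustration only.
focal_group = pool_variants([{"10A>G", "20C>T"}, {"10A>G", "30G>A"}])
other_countries = {
    "country_X": [{"10A>G", "20C>T"}, {"99T>C"}],
    "country_Y": [{"10A>G"}],
}
print(shared_variant_counts(focal_group, other_countries))
print(shared_by_all(focal_group, other_countries))
```

Because the compared sequences were randomly selected, counts of this kind depend on the sample drawn and are best read as indicative rather than exhaustive.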
The genome-wide SNPs and the corresponding amino-acid positions and variations of the virus proteins are described in Table 1. Our results showed that all 20 sequences, including the sequence from the Melilla patient, shared four mutations, and that 13 novel mutations were present exclusively in the 19 sequences isolated in Morocco (Table 2). The four common mutations are the same recurrent mutations described in several reports13. The mutation in the leader sequence (241C>T) is one of the most common mutations in the SARS-CoV-2 genome; it affects an important genomic site for discontinuous sub-genomic replication. This mutation in the 5’UTR genomic region co-evolved with two other mutations, 14408C>T and 23403A>G. These mutations affect critical RNA replication proteins (241C>T, 14408C>T) and the ACE2 receptor-binding protein S (23403A>G). We noticed that these four mutations are prevalent in the virus isolates from Europe, where the infections seem to be more severe. In fact, Yin et al.14 suggested that these mutations could increase the transmissibility of the virus. The sharing of these mutations between European and Moroccan isolates further indicates that the strains circulating in Morocco came mainly from Europe. Moreover, at least one variant of the Moroccan sequences of SARS-CoV-2 is shared with one of the other countries (Table 3). The mutations found in the sequences identified in Morocco, and the resulting amino acid changes, should be further investigated in order to understand whether they affect the transmission and clinical characteristics of the SARS-CoV-2 virus circulating in Morocco.
Mutation | Position | Genomic region | Type of mutation |
---|---|---|---|
C>T | 241 | 5’UTR | upstream_gene_variant |
C>T | 3037 | ORF1ab | Synonymous_variant |
C>T | 14408 | ORF1ab | Missense_variant |
A>G | 23403 | Gene S | Missense_variant |
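
The link between the genomic position 23403 in the table above and the amino acid change D614G can be checked with simple arithmetic. The sketch below assumes that the S coding sequence starts at position 21563 of the reference genome and that the reference codon at residue 614 is GAT; both values are taken from the public reference annotation rather than from this article, and the helper names are hypothetical.

```python
S_GENE_START = 21563  # assumption: 1-based start of the S coding sequence in NC_045512


def codon_coordinates(genome_pos: int, cds_start: int) -> tuple[int, int]:
    """Return (1-based codon number, 1-based position within the codon)."""
    offset = genome_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


def substitute_base(codon: str, position_in_codon: int, new_base: str) -> str:
    """Replace one base of a codon (position is 1-based)."""
    return codon[: position_in_codon - 1] + new_base + codon[position_in_codon:]


# Only the two codons needed for this example.
AMINO_ACIDS = {"GAT": "D", "GGT": "G"}

codon_number, position_in_codon = codon_coordinates(23403, S_GENE_START)
ref_codon = "GAT"  # assumption: reference codon at S residue 614
alt_codon = substitute_base(ref_codon, position_in_codon, "G")
label = f"{AMINO_ACIDS[ref_codon]}{codon_number}{AMINO_ACIDS[alt_codon]}"
print(codon_number, position_in_codon, ref_codon, alt_codon, label)
```

The same arithmetic does not apply directly to 14408C>T, because ORF1ab is translated through a ribosomal frameshift and its codon numbering must account for that shift.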
Phylogenetic and variant network analyses of SARS-CoV-2 sequences from early patients with COVID-19 in Morocco showed multiple spatiotemporal introductions, primarily from Italy (ten), France (seven), Spain (one) and Portugal (one). The results also provide evidence for early community-based transmission. A variant calling analysis allowed us to catalogue new mutations in SARS-CoV-2 isolates from Morocco. Interestingly, the recurrent missense variant A>G at position 23,403 in the spike gene, known to be associated with virus severity, was identified in all Moroccan isolates. However, the lack of demographic and clinical information for most of the Moroccan sequences deposited in the database prevented us from inferring a potential link between the mutations and the clinical effects of the strains. In addition, the small number of Moroccan genomes available at the time of this analysis limits robust statistical inference. Still, the results of the present work offer interesting insights into how the virus entered Morocco and preliminary lessons for understanding the dynamics of the initial introductions and local transmission of SARS-CoV-2 in Morocco. This being a global pandemic, the conclusions of this study, taken together with those of recent similar studies such as those done in Egypt15, South Africa16 and Brazil11, might be applicable to other developing countries. Both the results and the genomic surveillance methodology that we developed would be helpful in the event of a resurgence or the emergence of a new pandemic-causing pathogen. Integrative analysis of SARS-CoV-2 should improve our understanding of the virus dynamics and its interactions with hosts and environments. The method could be further extended to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens17.
Sequences analyzed in this study were downloaded from the GISAID database. The identity of each sequence is given in the Extended data, Metadata file5.
Zenodo: Genomic evidence of multiple SARS-CoV-2 introductions into Morocco5. http://doi.org/10.5281/zenodo.3909463.
This project contains the following extended data:
Extended data Figure 1 (PDF). (Full size image of the phylogenetic tree seen in Figure 1.)
Extended data Figure 2 (PNG). (Full genome tree derived from all outbreak sequences.)
Extended data Labels (PDF). (Labels of the Moroccan sequences)
Extended data Metadata (PDF). (Metadata for the sequences assessed in this study.)
All supplementary material and figures can be found at GitHub: https://github.com/covidmor/covid19_morocco.
Extended data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phylogenetics, population genetics, molecular evolution, bioinformatics, infectious disease
Version history: Version 1, 06 Jul 20; Version 2 (revision), 24 Aug 20.