
Phylogenetic analysis of 17271 Indian SARS-CoV-2 genomes to identify temporal and spatial hotspot mutations

  • Nimisha Ghosh,

    Contributed equally to this work with: Nimisha Ghosh, Suman Nandi, Indrajit Saha

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Software, Validation, Writing – original draft

    Affiliations Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland, Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India

  • Suman Nandi,

    Contributed equally to this work with: Nimisha Ghosh, Suman Nandi, Indrajit Saha

    Roles Conceptualization, Formal analysis, Software, Validation, Visualization, Writing – review & editing

    Affiliation Department of Computer Science and Engineering, National Institute of Technical Teachers’ Training and Research, Kolkata, West Bengal, India

  • Indrajit Saha

    Contributed equally to this work with: Nimisha Ghosh, Suman Nandi, Indrajit Saha

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing

    indrajit@nitttrkol.ac.in

    Affiliation Department of Computer Science and Engineering, National Institute of Technical Teachers’ Training and Research, Kolkata, West Bengal, India

Abstract

The second wave of SARS-CoV-2 has hit India hard and, although the vaccination drive has started, a moderate number of COVID-19 patients is still present in the country, which motivates the analysis of the evolving virus strains. In this regard, multiple sequence alignment of 17271 Indian SARS-CoV-2 sequences is performed using MAFFT, followed by their phylogenetic analysis using Nextstrain. Subsequently, mutation points as SNPs are identified by Nextstrain. Thereafter, temporal and spatial analyses are carried out on the aligned sequences to identify the top 10 hotspot mutations in the coding regions based on entropy. Finally, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on biological function is evaluated from the sequences using PolyPhen-2, while I-Mutant 2.0 evaluates their structural stability. For both temporal and spatial analysis, there are 21 non-synonymous hotspot mutations which are unstable and damaging.

Introduction

It is now close to two years since the emergence of SARS-CoV-2, the virus behind the deadly COVID-19 disease, and the scientific community is still struggling to put an end to this pandemic. Though India was able to contain the spread in the first wave, the second wave put the entire system in turmoil. In September 2021, around 30,000 cases were being registered daily (https://www.covid19india.org/), while in May of the same year this figure surpassed 300,000. Scientists and researchers have attributed this surge to the evolution of this contagious virus, which has resulted in the Delta (B.1.617.2) variant. Though the vaccination drive in India is in full swing, doubts regarding the efficacy of the vaccines against such mutations cannot be dismissed. Apart from Delta, the other variants of concern declared by the WHO are Alpha (B.1.1.7) [1], Beta (B.1.351) [2] and Gamma (P.1) [3]. All these variants, especially Delta, resulted in new spells of lockdown in the country. Thus, to understand its frequent mutations, a study of the evolution of the SARS-CoV-2 virus is essential [4, 5].

To understand these evolutionary mutations, Tang et al. [6] analysed 103 SARS-CoV-2 sequences, which revealed two major lineages, L and S. These lineages are defined by two tightly linked SNPs at positions 28144 (ORF8: C251T, S84L) and 8782 (orf1ab: T8517C, synonymous) and might influence virus pathogenesis. Raghav et al. [7] used ARTIC primer-based amplicon sequencing to profile 225 Indian SARS-CoV-2 sequences. Their analysis showed that, apart from local transmission, Europe and Southeast Asia are the two major routes for introduction of the disease into India. Their study also revealed that D614G in the Spike protein is a very common mutation that increases virus shedding and infectivity. Wang et al. [8] proposed an h-index mutation ratio criterion to evaluate the non-conserved and conserved proteins with the help of over 15K sequences. As a result, Nucleocapsid, Spike and Papain-like protease were found to be highly non-conserved, while Envelope, main protease and Endoribonuclease were considered to be conserved. They further identified mutations on 40% of the nucleotides in the Nucleocapsid gene, which undermines the ongoing development of COVID-19 diagnostics and treatments that target this gene. A similar analysis conducted by Yuan et al. [9] with 11183 sequences revealed 119 high-frequency substitutions as SNPs around the globe. Among the nucleotide changes in SNPs, C to T is the major one, which indicates adaptation and evolution of the virus in the human host and can pose new challenges. They also found Nucleocapsid to have the highest frequency of mutational changes. Both the works by Wang et al. [8] and Yuan et al. [9] therefore refute the claim by Ascoli [10] that Nucleocapsid can be a possible diagnostic target. Thus, it is important to understand the evolution of SARS-CoV-2 over time. Cheng et al. [11] identified five major mutation points, namely C28144T, C14408T, A23403G, T8782C and C3037T, in almost all strains for the month of April 2020. Their functional analysis shows that these mutations lead to a decrease in protein stability and eventually a reduction in the virulence of SARS-CoV-2, while the A23403G mutation increases the Spike-ACE2 interaction, leading to an increase in infectivity. Phylogenetic analysis by Maitra et al. [12] shows that mutations such as C14408T in RdRp and A23403G in Spike largely characterise the A2a clade in 9 Indian sequences. Moreover, a triplet-based mutation, 28881–3 GGG/AAC in the Nucleocapsid gene, which might affect the binding of miRNAs to the original sequences, has also been reported in their work. Guruprasad [13] analysed 10333 spike protein sequences, of which 8155 contained one or more mutations, giving a total of 9654 mutations at 400 distinct mutation sites. According to this analysis, the top 10 mutation sites by total number of occurrences are D614 (7859), L5 (109), L54 (105), P1263 (61), P681 (51), S477 (47), T859 (30), S221 (28), V483 (28) and A845 (24). Other important works [14–17] have also revealed different mutations after analysis of several SARS-CoV-2 sequences. Looking at the varied mutations reported by all the aforementioned works, it can be concluded that the evolutionary study of SARS-CoV-2 genomes is very relevant in the current pandemic scenario of the ongoing waves in India.

Motivated by the aforementioned studies, in this work we have performed multiple sequence alignment (MSA) of 17271 Indian SARS-CoV-2 genomes using multiple alignment using fast Fourier transform (MAFFT) [18], followed by their phylogenetic analysis using Nextstrain [19], to eventually identify hotspot mutations both month-wise (temporal) and state-wise (spatial). Thereafter, temporal and spatial analyses are carried out on the aligned sequences to identify the top 10 hotspot mutations in the coding regions based on entropy, resulting in 190 and 250 hotspot mutations respectively. Finally, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on biological function is evaluated from the sequences using PolyPhen-2, while I-Mutant 2.0 evaluates their structural stability. The hotspot mutations which are unstable and damaging and common to both categories are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. Furthermore, as recognised by virologists, E484K in Spike, which is identified in the temporal analysis, is yet another major mutation responsible for improving the ability of the virus to escape the host’s immune system [20].

Material and methods

In this section, the collection of the dataset of 17271 Indian SARS-CoV-2 genomes is discussed along with the proposed pipeline.

Data acquisition

To perform the multiple sequence alignment and phylogenetic analysis, 17271 Indian SARS-CoV-2 genomes are collected from the Global Initiative on Sharing All Influenza Data (GISAID, https://www.gisaid.org/) and the reference genome (NC_045512.2, https://www.ncbi.nlm.nih.gov/nuccore/1798174254) is collected from the National Center for Biotechnology Information (NCBI). The SARS-CoV-2 sequences are mostly distributed from January 2020 to September 2021 across the states of India. Moreover, for mapping the protein sequences and the subsequent amino acid changes, protein structure files in PDB format are collected from Zhang Lab (https://zhanglab.ccmb.med.umich.edu/COVID-19/). These PDB files are then used to model and identify the structural changes in the proteins. All these analyses are performed on the High Performance Computing facility of NITTTR, Kolkata, while MATLAB R2019b is used for checking the amino acid changes.
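As an illustration of how such a collection can be organised for month-wise and state-wise analysis, the following minimal Python sketch groups sequence identifiers by collection month and by state. The record layout (identifier, ISO date, state) and the function name `group_records` are assumptions made for this example; they are not the GISAID metadata format, and the records shown are placeholders rather than real entries.

```python
from collections import defaultdict

def group_records(records):
    """Group sequence IDs by collection month and by state.

    `records` is an iterable of (seq_id, collection_date, state) tuples, with
    collection_date given as an ISO string 'YYYY-MM-DD' (assumed layout).
    """
    by_month = defaultdict(list)
    by_state = defaultdict(list)
    for seq_id, collection_date, state in records:
        by_month[collection_date[:7]].append(seq_id)  # key is 'YYYY-MM'
        by_state[state].append(seq_id)
    return dict(by_month), dict(by_state)

# Placeholder records for illustration only.
records = [
    ("seq1", "2020-04-12", "Maharashtra"),
    ("seq2", "2020-04-30", "West Bengal"),
    ("seq3", "2021-05-02", "Maharashtra"),
]
by_month, by_state = group_records(records)
print(by_month)
print(by_state)
```

Each group can then be aligned and analysed separately, and sparse months can be merged by mapping several month keys to one label before grouping.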

Pipeline of the work

The pipeline of the work is provided in Fig 1. Initially, multiple sequence alignment (MSA) of 17271 Indian SARS-CoV-2 genomes is performed using MAFFT, followed by their phylogenetic analysis using Nextstrain, thereby leading to the identification of mutation points as SNPs. MAFFT is chosen as the MSA tool because its use of the fast Fourier transform gives it an advantage over other alignment techniques. Nextstrain, in turn, is used to visualise the evolution and geographic distribution of the SARS-CoV-2 genomes, with the required metadata created in our High Performance Computing environment.
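To make the notion of a mutation point concrete, the sketch below lists the substitutions of one aligned sequence relative to the aligned reference, using the usual reference-base, position, alternate-base notation (for example C14408T). This is only an illustration in plain Python and not the Nextstrain implementation; the function name `find_snps` and the toy sequences are assumptions for this example.

```python
def find_snps(reference, sample):
    """Return substitutions of `sample` relative to `reference`.

    Both inputs are rows of the same alignment, so they have equal length.
    Positions are 1-based and counted on the reference, which means alignment
    columns where the reference has a gap are skipped. Gaps and ambiguous
    bases in the sample are ignored in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base == "-":
            continue  # insertion relative to the reference
        ref_pos += 1
        if ref_base in "ACGT" and alt_base in "ACGT" and ref_base != alt_base:
            snps.append(f"{ref_base}{ref_pos}{alt_base}")
    return snps

# Toy alignment rows, not real genome data.
print(find_snps("ACG-TACGT", "ATGATAC-A"))  # ['C2T', 'T8A']
```

Counting positions on the reference rather than on the alignment keeps the labels comparable across runs in which the alignment contains different insertions.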

Once the alignment and the phylogenetic analyses are completed and the mutation points as SNPs are identified, temporal (month-wise) and spatial (state-wise) analyses are performed on the aligned sequences to identify the top 10 hotspot mutations both month-wise and state-wise. Furthermore, amino acid changes in the SARS-CoV-2 proteins are identified using the codon table. The top 10 hotspot mutations are identified for each month and each state based on their entropy values for the coding regions, computed according to Eq (1), which is based on the frequency of each residue α occurring at position β, with 5 representing the four possible nucleotide residues plus the gap. Subsequently, the amino acid changes for the temporal and spatial non-synonymous hotspot mutations are visualised graphically. Finally, the amino acid changes of the non-synonymous hotspot mutations are used to evaluate their functional characteristics, and they are visualised in the respective protein structures as well.
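An entropy-based ranking of this kind can be sketched as follows. The sketch assumes the standard Shannon entropy with natural logarithm over the five symbols A, C, G, T and gap; this form is an assumption for illustration and may differ from the exact expression in Eq (1). It also assumes that alignment columns coincide with reference coordinates. The names `column_entropy` and `top_hotspots` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT-"  # four nucleotides plus gap

def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column."""
    symbols = [s for s in column.upper() if s in ALPHABET]
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    return sum(-(n / total) * math.log(n / total) for n in counts.values())

def top_hotspots(aligned, coding_positions, k=10):
    """Rank 1-based positions by entropy, restricted to coding positions."""
    scored = []
    for pos in coding_positions:
        column = "".join(seq[pos - 1] for seq in aligned)
        scored.append((column_entropy(column), pos))
    # Highest entropy first; ties broken by position for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pos, round(entropy, 5)) for entropy, pos in scored[:k]]

# Toy alignment of four sequences, not real data.
aligned = ["ACGTA", "ACGTA", "ATGTA", "ATGCA"]
print(top_hotspots(aligned, coding_positions=range(1, 6), k=2))
# [(2, 0.69315), (4, 0.56234)]
```

With five symbols the entropy of a column is bounded by ln 5, so values can be compared across months or states that contain different numbers of sequences.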

Results

All the experiments in this work are carried out according to Fig 1. In this regard, MSA of 17271 Indian SARS-CoV-2 genomes is initially carried out using MAFFT. Thereafter, their phylogenetic analysis using Nextstrain reveals 5 virus clades, viz. 19A, 19B, 20A, 20B and 20C, and also the corresponding mutation points as SNPs. Subsequently, temporal (month-wise) and spatial (state-wise) analyses are performed on the aligned sequences to identify the top 10 hotspot mutations in each category, resulting in 190 and 250 mutation points respectively. The phylogenetic trees in radial and rectangular views considering temporal analysis are shown in Fig 2(a) and 2(b), while Fig 2(c) and 2(d) show the views considering spatial analysis. The normal and zoomed views of the clade-wise geographical distribution of the sequences are shown in Fig 2(e) and 2(f) respectively. Since feature selection is a non-trivial task in unsupervised learning, the entropy of the aligned sequences is taken as the selected feature in this work. For example, temporal analysis of January-March 2020 with 191 sequences shows that G11083T in NSP6 has the highest entropy value of 0.82391, while for spatial analysis of Maharashtra with 3674 sequences, the highest entropy value of 1.02173 is borne by G28881A and G28881T in Nucleocapsid. Such results are reported in Tables 1 and 2 for the top 10 hotspot mutations for temporal and spatial analysis along with the associated details, while S1 and S2 Tables in S1 File list all temporal and spatial hotspot mutations. Table 2 reports the spatial analysis for the states of India. The entropy values corresponding to the nucleotide changes are shown in Fig 2(g), while the temporal and spatial changes in entropy are reported in S3 and S4 Tables in S1 File respectively. The evolution of the virus genome in terms of entropy for both temporal and spatial analysis is another crucial result reported in this work. For example, from a temporal perspective, E484Q/K, a widely circulating variant in India, has evolved over time but is now on the wane, while the spatial analysis shows that E484Q is one of the most prevalent variants in West Bengal. These trends are visualised in Figs 3 and 4 respectively. It is to be noted that, owing to the small number of sequences available, the temporal data of January to March 2020 have been merged for the analysis. Please also note that the non-coding regions of SARS-CoV-2 do not produce any protein to bind with human proteins, and are therefore not considered for hotspot mutations. Moreover, non-coding regions of the aligned sequences exhibit high entropy values that can be misleading when selecting hotspot mutations, so only coding regions are considered for their identification.
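The step from a nucleotide change to an amino acid change can be illustrated with the standard codon table. The sketch below maps a genome position to a residue number within a gene and classifies a single-base substitution as synonymous or non-synonymous. The Nucleocapsid gene start (28274) and the reference codon AGG at residue 203 are assumed from the NC_045512.2 annotation and should be verified against the reference file; the function names are hypothetical, and this is not the MATLAB code used in this work.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with bases in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_index(genome_pos, gene_start):
    """Map a 1-based genome position to (residue number, offset in codon)."""
    rel = genome_pos - gene_start
    return rel // 3 + 1, rel % 3

def amino_acid_change(ref_codon, offset, alt_base, residue_number):
    """Classify a single-base substitution inside one reference codon.

    `offset` is the 0-based position of the changed base within the codon.
    Returns (label, is_non_synonymous).
    """
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"{ref_aa}{residue_number}{alt_aa}", ref_aa != alt_aa

# Assumed values for illustration: Nucleocapsid starts at 28274 and
# residue 203 is encoded by AGG in the reference genome.
residue, offset = codon_index(28881, gene_start=28274)
print(residue, offset)                                # 203 1
print(amino_acid_change("AGG", offset, "A", residue))  # ('R203K', True)
print(amino_acid_change("AGG", offset, "T", residue))  # ('R203M', True)
```

Under these assumptions, G28881A and G28881T fall on the same codon and yield R203K and R203M, which is why one nucleotide position can contribute more than one amino acid change.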

Fig 2. Phylogenetic analysis of 17271 Indian SARS-CoV-2 genomes.

(a) and (b) show the phylogenetic tree in radial and rectangular views for temporal analysis, (c) and (d) show the phylogenetic tree in radial and rectangular views for spatial analysis, (e) and (f) show the geographical distribution in normal and zoomed views and (g) shows the value of entropy for the change in nucleotide.

https://doi.org/10.1371/journal.pone.0265579.g002

Fig 3. Month wise (temporal) entropy of Indian SARS-CoV-2 genomes to show the changes in non-synonymous hotspot mutations.

https://doi.org/10.1371/journal.pone.0265579.g003

Fig 4. State wise (spatial) entropy of Indian SARS-CoV-2 genomes to show the changes in non-synonymous hotspot mutations.

https://doi.org/10.1371/journal.pone.0265579.g004

Table 1. List of top 10 hotspot mutations based on temporal analysis.

https://doi.org/10.1371/journal.pone.0265579.t001

Table 2. List of top 10 hotspot mutations based on spatial analysis.

https://doi.org/10.1371/journal.pone.0265579.t002

Once the top 10 temporal and spatial hotspot mutations are identified, 62 and 65 unique hotspot mutations are obtained from the 190 and 250 mutation points respectively. For temporal analysis, the 62 unique mutations include 50 non-synonymous deletions and substitutions, corresponding to 8 and 48 amino acid changes respectively, while for spatial analysis 57 non-synonymous deletions and substitutions are identified from the 65 unique mutations, corresponding to 16 and 47 amino acid changes respectively. These non-synonymous mutations along with their amino acid changes in protein are visualised in Fig 5. Fig 6(a) depicts the common and unique nucleotide changes for all hotspot mutations for temporal and spatial analysis in the form of a Venn diagram, Fig 6(b) shows the common and unique nucleotide changes for non-synonymous hotspot mutations, and Fig 6(c) shows the common and unique amino acid changes in protein. Fig 6(a) shows that 18 and 21 hotspot mutations are unique to temporal and spatial analysis respectively, while 44 mutations are common to both. Fig 6(b) depicts 12 and 19 unique non-synonymous hotspot mutations, while 38 changes are common to both. Finally, Fig 6(c) shows that there are 14 and 21 unique amino acid changes for temporal and spatial analysis, with 42 changes common to both. All the amino acid changes in the protein for the non-synonymous hotspot mutations for temporal analysis are highlighted in Fig 7, while those for the spatial analysis are shown in Fig 8. Please note that though 48 and 47 substitutions corresponding to temporal and spatial analysis are reported in Figs 5 and 6, only 47 and 46 such changes are highlighted in Figs 7 and 8 respectively. This is because the structure of ORF7b is not found in the literature, and thus the corresponding hotspot mutation cannot be highlighted in the structure of ORF7b in either case.
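The Venn counts follow from simple set operations on the two lists of unique hotspot mutations. The sketch below uses placeholder labels, sized to match the counts reported for Fig 6(a), to show that 18 temporal-only, 44 common and 21 spatial-only mutations are consistent with 62 and 65 unique mutations. The labels and the function name `venn_counts` are hypothetical and stand in for the actual mutation names.

```python
def venn_counts(temporal, spatial):
    """Sizes of the three regions of a two-set Venn diagram."""
    temporal, spatial = set(temporal), set(spatial)
    return {
        "temporal_only": len(temporal - spatial),
        "common": len(temporal & spatial),
        "spatial_only": len(spatial - temporal),
    }

# Placeholder labels, not the real mutation names.
common = {f"common_{i}" for i in range(44)}
temporal = common | {f"t_{i}" for i in range(18)}
spatial = common | {f"s_{i}" for i in range(21)}

print(len(temporal), len(spatial))  # 62 65
print(venn_counts(temporal, spatial))
# {'temporal_only': 18, 'common': 44, 'spatial_only': 21}
```

The same check applies to the non-synonymous mutations (12 + 38 = 50 and 19 + 38 = 57) and to the amino acid changes (14 + 42 = 56 and 21 + 42 = 63).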

Fig 5. Illustration of amino acid changes in SARS-CoV-2 proteins for the temporal and spatial non-synonymous hotspot mutations.

https://doi.org/10.1371/journal.pone.0265579.g005

Fig 6. Venn diagrams of Indian SARS-CoV-2 genomes to represent common (a) Nucleotide (b) Non-synonymous mutations and (c) Amino acid changes for the hotspot mutations.

https://doi.org/10.1371/journal.pone.0265579.g006

Fig 7. Highlighted amino acid changes in the protein structures for the non-synonymous hotspot mutations based on temporal analysis for (a) NSP2 (b) NSP3 (c) NSP4 (d) NSP6 (e) RdRp (f) Exon (g) Spike (h) ORF3a (i) Membrane (j) ORF8 (k) Nucleocapsid.

https://doi.org/10.1371/journal.pone.0265579.g007

Fig 8. Highlighted amino acid changes in the protein structures for the non-synonymous hotspot mutations based on spatial analysis for (a) NSP2 (b) NSP3 (c) NSP4 (d) NSP6 (e) RdRp (f) Helicase (g) Spike (h) ORF3a (i) Membrane (j) ORF7a (k) ORF8 (l) Nucleocapsid.

https://doi.org/10.1371/journal.pone.0265579.g008

Discussion

India has gone through the second wave of the SARS-CoV-2 pandemic and, according to experts, a third wave is inevitable as the virus is evolving and new strains are being identified. Thus, the study of the evolving virus strains is crucial in the current pandemic scenario. In this regard, we have performed temporal and spatial analysis of 17271 SARS-CoV-2 sequences, which has resulted in the identification of hotspot mutation points as SNPs in each category.

Changes in protein translations which can lead to functional instability in proteins are often attributed to structural alterations in amino acid residues. In this regard, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on biological function is evaluated from the sequences using PolyPhen-2 (Polymorphism Phenotyping) [21], while I-Mutant 2.0 [22] evaluates their structural stability. Such results for temporal and spatial analysis are reported in Tables 3 and 4 respectively. PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) bases its prediction on the sequence, structural and phylogenetic information of a SNP, while I-Mutant 2.0 (https://folding.biofold.org/i-mutant/i-mutant2.0.html) uses a support vector machine (SVM) for the automatic prediction of protein stability changes upon single point mutations. Thus, PolyPhen-2 is used to find the damaging non-synonymous hotspot mutations, while protein stabilities are determined by I-Mutant 2.0. The score generated by PolyPhen-2 lies between 0 and 1; the closer the score is to 1, the more confidently the mutation can be considered damaging. Considering the predictions of PolyPhen-2, it can be seen from Table 3 that, out of the 56 unique amino acid changes, 27 are damaging for temporal analysis, while Table 4 shows that, out of the 63 unique amino acid changes, 24 are damaging for spatial analysis. It is important to note that, for a protein, damaging mostly implies instability. This notion is generally applied to human proteins: if a human protein is damaged because of mutations, then human protein-protein interactions may occur with high or low binding affinity. Similar consequences may arise for a virus: if a virus protein is damaged because of mutations, it may likewise interact with human proteins with high or low binding affinity. As a result, the virus may acquire characteristics such as increased transmissibility and the ability to escape antibodies [23, 24].

Table 3. Characteristics of non-synonymous hotspot mutations for temporal analysis.

https://doi.org/10.1371/journal.pone.0265579.t003

Table 4. Characteristics of non-synonymous hotspot mutations for spatial analysis.

https://doi.org/10.1371/journal.pone.0265579.t004

Stability is yet another parameter which is crucial to judge the functional and structural activity of a protein. Protein stability dictates the conformational structure of the protein, thereby determining its function. Any change in protein stability may cause misfolding, degradation or aberrant aggregation of proteins. In I-Mutant 2.0, the changes in protein stability are predicted using free energy change values (DDG). A zero or negative value of DDG indicates that the stability of a protein is decreasing. The results from I-Mutant 2.0 indicate that, of the 27 and 24 unique deleterious or damaging changes for temporal and spatial analysis, 21 changes in each case decrease the stability of the protein structures. The mutations common to both categories are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. It is to be noted that, apart from these mutations, other important mutations recognised by virologists in the multiple variants of concern such as Alpha, Beta and Delta are L452R, E484K, D614G, P681H and P681R in Spike.
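The combination of the two predictions can be sketched as a simple filter: keep a mutation when its damaging score is high and its DDG value is zero or negative. In the sketch below, the cutoff of 0.5 on the damaging score is a hypothetical parameter chosen only for illustration and is not the classification rule of PolyPhen-2; the records are placeholders and not values from Tables 3 and 4.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    mutation: str          # amino acid change label
    damaging_score: float  # between 0 and 1, closer to 1 is more damaging
    ddg: float             # predicted free energy change value (DDG)

def damaging_and_destabilising(predictions, score_cutoff=0.5):
    """Return labels that are both damaging and stability-decreasing.

    `score_cutoff` is an assumed threshold for this sketch. A zero or
    negative DDG is treated as decreasing stability, as stated in the text.
    """
    selected = []
    for p in predictions:
        is_damaging = p.damaging_score >= score_cutoff
        is_destabilising = p.ddg <= 0
        if is_damaging and is_destabilising:
            selected.append(p.mutation)
    return selected

# Placeholder predictions for illustration only.
predictions = [
    Prediction("mut_A", 0.98, -1.20),
    Prediction("mut_B", 0.10, -0.40),
    Prediction("mut_C", 0.91, 0.35),
    Prediction("mut_D", 0.75, 0.0),
]
print(damaging_and_destabilising(predictions))  # ['mut_A', 'mut_D']
```

Applying the filter separately to the temporal and spatial lists and intersecting the results gives the mutations common to both categories.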

Furthermore, the entropy changes of the hotspot mutations for the variants Alpha, Beta and Delta are shown in Fig 9(a)–9(c) respectively. For example, the hotspot mutation E484K in the Alpha variant in Fig 9(a), which was dominant in the months of February-April 2021, has declined over the following months. D614G, which is a common hotspot mutation in all the variants, has also declined over time. Moreover, L452R and P681R, which are part of the Delta variant, are also two of the hotspot mutations identified by the analysis. It is to be noted that the Delta variant was responsible for the catastrophic second wave in India. Fig 10(a) and 10(b) show the confirmed and deceased cases in India till 31st October 2021. For example, the western part of India has a very high number of confirmed and deceased cases, which can be attributed to the Delta variant. As shown in Table 2, Maharashtra, which lies in the western part of India, has both of the aforementioned mutations identified as hotspots. All case figures are taken from https://www.covid19india.org/.

Fig 9. Month wise evolution of (a) Alpha (B.1.1.7) (b) Beta (B.1.351) and (c) Delta (B.1.617.2) variants for non-synonymous hotspot mutations.

https://doi.org/10.1371/journal.pone.0265579.g009

Fig 10. Illustration of (a) Confirmed and (b) Deceased cases of India to show the effects of SARS-CoV-2 in the different regions of the country.

https://doi.org/10.1371/journal.pone.0265579.g010

Conclusion

As the second wave of the COVID-19 pandemic hit India very hard, understanding the evolution of the SARS-CoV-2 virus is crucial. In this regard, temporal (month-wise) and spatial (state-wise) analyses are carried out for 17271 aligned Indian sequences to identify the top 10 hotspot mutation points in the coding regions based on entropy for each month as well as for each state. Additionally, to judge the functional characteristics of all the non-synonymous hotspot mutations, the effect of their amino acid changes on biological function is evaluated from the sequences using PolyPhen-2, while I-Mutant 2.0 evaluates their structural stability. As a result, for both temporal and spatial analysis, the common damaging and unstable mutations are T77A and V149A in NSP6, T95I and E484Q in Spike, Q57H and T223I in ORF3a, I82S and I82T in Membrane, D119V and F120L in ORF8, and R203K, R203M and G215C in Nucleocapsid. The effects of these hotspot mutations on human hosts can be investigated further with the help of virologists. The authors are working in this direction as well.

Supporting information

S1 File. This file contains 4 supplementary tables named as S1-S4.

https://doi.org/10.1371/journal.pone.0265579.s001

(PDF)

Acknowledgments

We thank all those who have contributed sequences to the GISAID database.

References

  1. Tang J, Toovey O, Harvey K, et al. Introduction of the South African SARS-CoV-2 variant 501Y.V2 into the UK. Journal of Infection. 2021. pmid:33472093
  2. Makoni M. South Africa responds to new SARS-CoV-2 variant. The Lancet. 2021;397:267. pmid:33485437
  3. Alam I, Radovanovic A, Incitti R, et al. CovMT: an interactive SARS-CoV-2 mutation tracker, with a focus on critical variants. The Lancet Infectious Diseases. 2021;21:602. pmid:33571446
  4. Fountain-Jones NM, Appaw RC, Carver S, et al. Emerging phylogenetic structure of the SARS-CoV-2 pandemic. Virus Evolution. 2020;6(2). pmid:33335743
  5. Tan Y, Schneider T, Shukla PK, et al. Unification and extensive diversification of M/Orf3-related ion channel proteins in coronaviruses and other nidoviruses. Virus Evolution. 2021;7(1). pmid:33692906
  6. Tang X, Wu C, Li X, et al. On the origin and continuing evolution of SARS-CoV-2. National Science Review. 2020. pmid:34676127
  7. Raghav S, Ghosh A, Turuk J, et al. Analysis of Indian SARS-CoV-2 Genomes Reveals Prevalence of D614G Mutation in Spike Protein Predicting an Increase in Interaction With TMPRSS2 and Virus Infectivity. Frontiers in Microbiology. 2020;11:2847. pmid:33329480
  8. Wang R, Hozumi Y, Yin C, et al. Decoding SARS-CoV-2 transmission, evolution and ramification on COVID-19 diagnosis, vaccine, and medicine. Journal of Chemical Information and Modeling. 2020;XXXX.
  9. Yuan F, Wang L, Fang Y, et al. Global SNP analysis of 11,183 SARS-CoV-2 strains reveals high genetic diversity. Transboundary and Emerging Diseases. 2020. pmid:33207070
  10. Ascoli C. Could mutations of SARS-CoV-2 suppress diagnostic detection? Nature Biotechnology. 2021;39:1–2. pmid:33603204
  11. Cheng L, Han X, Zhu Z, et al. Functional alterations caused by mutations reflect evolutionary trends of SARS-CoV-2. Briefings in Bioinformatics. 2021; p. 1–9. pmid:33580783
  12. Maitra A, Sarkar M, Raheja H, et al. Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility. Journal of Biosciences. 2020;45. pmid:32515358
  13. Guruprasad L. Human SARS CoV-2 spike protein mutations. Proteins: Structure, Function, and Bioinformatics. 2021;89(5):569–576. pmid:33423311
  14. Weber S, Ramirez C, Doerfler W. Signal hotspot mutations in SARS-CoV-2 genomes evolve as the virus spreads and actively replicates in different parts of the world. Virus Research. 2020;289:198170. pmid:32979477
  15. Nagy A, Pongor S, Győrffy B. Different mutations in SARS-CoV-2 associate with severe and mild outcome. International Journal of Antimicrobial Agents. 2020;57:106272. pmid:33347989
  16. Saha I, Ghosh N, Pradhan A, et al. Whole genome analysis of more than 10000 SARS-CoV-2 virus unveils global genetic diversity and target region of NSP6. Briefings in Bioinformatics. 2021;22(2):1106–1121. pmid:33725111
  17. Sarkar R, Mitra S, Chandra P, et al. Comprehensive analysis of genomic diversity of SARS-CoV-2 in different geographic regions of India: an endeavour to classify Indian SARS-CoV-2 strains on the basis of co-existing mutations. Archives of Virology. 2021;166(3):801–812. pmid:33464421
  18. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform. Nucleic Acids Research. 2002;30(14):3059–3066. pmid:12136088
  19. Hadfield J, Megill C, Bell S, et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics (Oxford, England). 2018;34. pmid:29790939
  20. Jangra S, Ye C, Rathnasinghe R, et al. SARS-CoV-2 spike E484K mutation reduces antibody neutralisation. The Lancet Microbe. 2021;2(7):E283–E284. pmid:33846703
  21. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nature Methods. 2010;7(4):248–249. pmid:20354512
  22. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Research. 2005;33:306–310.
  23. Alenquer M, Ferreira F, Lousa D, et al. Signatures in SARS-CoV-2 spike protein conferring escape to neutralizing antibodies. PLOS Pathogens. 2021;17(8):e1009772. pmid:34352039
  24. Harvey WT, Carabelli AM, Jackson B, et al. SARS-CoV-2 variants, spike mutations and immune escape. Nature Reviews Microbiology. 2021;19:409–424. pmid:34075212