Abstract
Since the onset of the pandemic, many SARS-CoV-2 variants have emerged, exhibiting substantial evolution in the virus’ spike protein1, the main target of neutralizing antibodies2. A plausible hypothesis proposes that the virus evolves to evade antibody-mediated neutralization (vaccine- or infection-induced) to maximize its ability to infect an immunologically experienced population1,3. Because viral infection induces neutralizing antibodies, viral evolution may thus navigate on a dynamic immune landscape that is shaped by local infection history. Here we developed a comprehensive mechanistic model, incorporating deep mutational scanning data4,5, antibody pharmacokinetics and regional genomic surveillance data, to predict the variant-specific relative number of susceptible individuals over time. We show that this quantity precisely matched historical variant dynamics, predicted future variant dynamics and explained global differences in variant dynamics. Our work strongly suggests that the ongoing pandemic continues to shape variant-specific population immunity, which determines a variant’s ability to transmit, thus defining variant fitness. The model can be applied to any region by utilizing local genomic surveillance data, allows contextualizing risk assessment of variants and provides information for vaccine design.
Similar content being viewed by others
Main
SARS-CoV-2 is still spreading at high rates6, with thousands of patients in intensive care units and reported fatalities monthly7, notwithstanding long-term effects8. Although approximately 775 million cases have been reported so far9, true numbers are magnitudes larger, and it is reasonable to assume that almost the entire world population has been exposed to viral antigen so far10.
Since the onset of the pandemic, SARS-CoV-2 has evolved substantially in the spike protein1, which harbours all epitope sites for neutralizing antibodies elicited by vaccines or infection2. For example, the Omicron lineages BA.1 and BA.2 contained multiple alterations in spike epitopes, allowing them to infect vaccinated and convalescent individuals11,12,13,14, who represented most of the population at the time. Although inter-country differences in viral evolution existed early in the pandemic15, the geographical variation of emerging Omicron sublineages reflects an increasing complexity of the global immunological landscape, in which the course of infection waves with new (sub)variants in a particular region could be substantially influenced by the infection history of that region (that is, which variants dominated the preceding waves and at what time). This circumstance poses challenges in optimizing the design of adapted mRNA vaccines to protect vulnerable groups or heavily exposed individuals. Despite the appreciation of the problem16 and the abundance of rich data sources for SARS-CoV-2, there has been limited progress in integrating the available data to inform variant risk assessment and vaccine design.
In this work, we reason that SARS-CoV-2 evolution is driven by infection history and the ability of elicited humoral immunity to cross-neutralize emerging variants. To this end, we conjecture that the variant-specific relative number of susceptible individuals predicts the relative fitness (and evolution) of SARS-CoV-2 in a region of interest, over time. To calculate this quantity, we integrated available data as summarized in Fig. 1: deep mutational scanning (DMS) data5,17 identified epitopes of neutralizing (spike-targeting) antibodies, which, when combined with lineage-specific spike alteration profiles, allowed us to compute cross-neutralization between any pair of variants. By integrating antibody pharmacokinetics, we could then compute how much and how long a recovery from a recent infection with a variant will protect against another variant. Last, we reconstructed the infection timeline in the region of interest and combined this timeline with historical SARS-CoV-2 variant dynamics to estimate the expected number of susceptible individuals for each variant. If SARS-CoV-2 evolution was decisively driven by population immunity, the variant-specific number of susceptible individuals should be directly proportional to the variant-specific transmission rate, and hence our model would allow us to estimate the competitive growth (dis-) advantage (that is, the relative fitness) of any given variant.
DMS data are used to define fold resistance to cross-neutralization (FR; that is, factor of decreased antibody potency), conferred by mutational differences between lineages at amino acid positions relevant to antibodies of an epitope class (top right). This information is combined with the mutational profiles of the distinct lineages (top panel) to compute ‘fold resistance maps’ to cross-neutralization between any pair of lineages (x, y) for each epitope class (left, second panel from top). Asterisks (for example, XBB.1*) indicate spike pseudo groups; that is, the group of lineages with identical mutations in the spike protein (for example, identical to XBB.1, in the case of XBB.1*). Antibody pharmacokinetics (right, second panel from the top) after antigen exposure are then combined with the fold resistance maps to compute the temporal profiles of virus cross-neutralization between any pair (x, y) of immunity-inducing variant x and another variant y. Finally, these variant-resolved immune waning dynamics are combined with the timeline of infection (lower right panels) with distinct variants in any region of interest, to compute the relative number of susceptible individuals γy(t) (lower left) for each variant over time. The relative number of susceptible individuals γy(t) indicates the competitive fitness advantage of each variant: if γy(t) > 0, variant y has the potential to spread, and if γy(t) < 0, the variant will decrease in frequency and eventually vanish.
After introducing the main methodological concepts, we will first demonstrate our analytical approach using data from the German COVID-19 epidemic, followed by an application to international datasets. Finally, we will highlight regional differences in the immune landscape and their impact on variant dynamics.
Utilization of DMS data
We utilized DMS data to compute cross-neutralization profiles for each possible pair of ‘antibody-inducing’ variant x and cross-neutralized variant y (upper panels in Fig. 1). In the original DMS data, ‘escape fractions’ were assigned to all sites in the receptor-binding domain (RBD) region for a large panel of 836 antibodies4,5,17, which were aggregated into 10 epitope classes18 as highlighted in Fig. 2a,b (details on antibodies and classes in Supplementary Tables 1 and 2). We processed the original DMS data to derive ‘fold resistances’ to cross-neutralization (Extended Data Fig. 1a–c) between antibody-inducing and antigen-presenting lineages (Fig. 2a and Methods). Intuitively, fold resistance values denote how much more antibody is needed to neutralize mutant virus y to the same extent as antibody-inducing lineage x. A comparison between fold resistances obtained through our method from DMS data with fold resistances obtained through neutralization assays yielded overall good agreement for monoclonal sera (Extended Data Fig. 1d), as well as polyclonal sera (Extended Data Fig. 1e–g). Next, we computed a fold resistance for each epitope class and for each pair of antibody-inducing lineage x and cross-neutralized lineage y on the basis of their genetic profiles and the DMS data. Fold resistances for three representative epitope classes and a set of relevant Omicron lineages are shown in Fig. 2c.
a, Fold resistance induced by mutational differences against antibodies of distinct epitope classes at indicated sites. b, Relative potency IC50(DMS) of antibodies of specific epitope classes in the DMS data. The solid horizontal lines show the respective average IC50(DMS). The dotted horizontal line marks the average across all epitope classes. c, Fold resistance to neutralization against immunity-inducing variants (on y axis) for antibodies of epitope classes A, B and C. Asterisks indicate spike pseudo groups; that is, lineages with identical mutations in the spike protein. d, Predicted neutralization probability of the Delta variant (blue range, left panel) and the Omicron variant (BA.1) (orange range, right panel) after exposure to the Wuhan-Hu-1 antigen PNeut(t, Wuhan-Hu-1, Delta). Ranges depict minimum–maximum ranges resulting from pharmacokinetic variability. The corresponding clinical vaccine efficacy is also indicated (blue and orange markers, central estimate; vertical line, 95% confidence interval; horizontal line, range of time post vaccination as stated in the original study; see Supplementary Table 5).
Protection against viral infection
Although cross-neutralization potency can qualitatively describe the antigenic overlap between variants, the ability to neutralize the virus (and prevent infection) depends on the neutralizing antibody concentrations at the time of viral re-exposure19.
Neutralizing antibody levels rise within 1 to 2 weeks after initial antigen exposure and slowly decay afterwards. We parameterized a simple pharmacokinetic model to capture the antibody concentration–time profile after initial antigen exposure (Methods and Extended Data Fig. 1h). Then, on the basis of DMS data, we computed a relative weighting of antibodies belonging to different epitope classes, as shown in Fig. 2b. This data-derived weighting may reflect the accessibility of the distinct epitopes, the strength of binding or the capability to neutralize the virus when an antibody is bound. Our analysis showed that antibodies of epitope classes A, B, D2 and F3 more potently neutralize the virus, whereas antibodies of classes E3 and F1 are less potent.
On the basis of the previously computed cross-neutralization potency, antibody pharmacokinetics and relative antibody potencies, we subsequently estimated the probability of neutralizing the Delta variant (genetic profile in Supplementary Table 3) after exposure to the Wuhan-Hu-1-variant antigen (Fig. 2d; data sources in Supplementary Table 4). Assuming that neutralization probability approximates infection risk reduction, we calibrated the model by estimating its only free parameter (the average normalized half-maximal inhibitory antibody concentration \(\widehat{\,{{\rm{IC}}}_{{50}_{(x)}}}\)). After this step, no further model calibration was performed.
Notably, utilized clinical data vary considerably in the level of reported risk reduction owing to statistical limitations (few observed ‘infection events’), as well as heterogeneity in the study groups, which is a well-known phenomenon for prevention trials20. Regarding immune waning dynamics after exposure to the Wuhan-Hu-1-variant antigen, we observe that Delta can be almost completely neutralized shortly after antigen exposure. However, neutralization probabilities decrease to 50%, depending on individual antibody pharmacokinetics, within 100 to 250 days after exposure to the Wuhan-Hu-1-variant antigen (Fig. 2d).
We then applied the calibrated model to predict Omicron (BA.1) neutralization after exposure to the Wuhan-Hu-1-variant antigen. Our predictions align well with booster-vaccine efficacy data (summarized in Supplementary Table 5): our model predicts that Omicron BA.1 (genetic profile in Supplementary Table 3) is initially neutralized with 45–85% probability approximately 2 weeks after exposure to the Wuhan-Hu-1-variant antigen, with neutralizing immunity decaying rapidly and reaching about 10% infection risk reduction between 80 and 350 days post antigen exposure. Note that these predictions are also broadly in agreement with in vitro neutralization data21.
Immune and variant dynamics in Germany
Next, we utilized the developed model to estimate the immune landscape in Germany and assess whether it predicts variant dynamics. To apply the model to Germany, we integrated data from the national virus genomics surveillance, selecting randomly collected virus genomes between 1 July 2021 and 1 July 2024 (≈607,000 sequences; Supplementary Table 6) to predict immune dynamics between 1 March 2022 and 1 July 2024. The time from 1 July 2021 to 1 March 2022 served as a ‘burn-in’ phase to converge to the initial immunological landscape. The viral genomes from the German dataset belonged to 1,718 Pangolin lineages and harboured 124 and 219 alterations in the RBD and amino-terminal domain (NTD), respectively. Of these initial 1,718 lineages, 756 had unique spike profiles (‘spike pseudo-groups’). A total of 227 spike pseudo-groups occurred at a frequency greater than 1% on at least 1 calendar day (representing 1,098 pangolin lineages; Extended Data Fig. 2).
We reconstructed the timeline of infection, and subsequently estimated the variant-specific infection timeline on the basis of lineage frequencies. The variant-specific timeline of infection was then incorporated into the DMS-derived cross-neutralization and immune waning model (previous paragraphs) to reconstruct the immunological landscape in Germany (Fig. 1 (lower panels) and Methods).
To overcome increasingly unreliable case reporting data, we reconstructed SARS-CoV-2 incidence trajectories from time-stamped viral genomes using the recently developed computational tool GInPipe22 in combination with available wastewater viral load data. In principle, any unbiased incidence estimate could be used. Here we used GInPipe for the time horizon 1 July 2021 to 31 March 2023, for which substantial amounts of viral genomic sequencing data are available (≈10,000 sequences per week). We then use wastewater viral load data for the remainder of the period, during which sequencing efforts dropped massively (<100 sequences per week from April 2023; Extended Data Fig. 3). We used the temporal overlap between wastewater virus load data and GInPipe incidence reconstructions (1 June 2022 to 31 March 2023) to estimate incidences from wastewater virus loads and to cross-validate incidence predictions (Extended Data Fig. 3). As an additional validation of the incidence reconstruction method, we show in Extended Data Fig. 4 that official case numbers substantially under-report the actual infection trajectory from the COVID-19 Infection Survey of the Office for National Statistics6 in the UK and show that GInPipe estimates are coherent with Office for National Statistics data and overcome under-reporting biases.
Next we reconstructed the timeline of all lineage frequencies, as illustrated for the most abundant groups in Fig. 3a. If SARS-CoV-2 evolution is driven by population immunity, the relative fitness of a variant will be determined by the expected number of individuals who are susceptible to the variant, relative to the expected number of susceptible individuals across competing (that is, simultaneously circulating) variants. In other words, on the basis of immunity, the relative fitness will be determined by whether the variant can infect more (or fewer) individuals than the current viral population. To test this hypothesis, we computed the expected number of susceptible individuals for each variant by integrating infection history, cross-neutralization and immune waning (Fig. 1 and Methods). We calculated this immunity-driven relative fitness from our model and compared the predictions with historical changes in variant frequencies (Fig. 3a). We found a strong match between our predictions and the real-world data: our model predicts the inflection point for BA.2 in April–May 2022, for the BA.4 + BA.5 pseudo-group between July and October 2022, and for the variants BF.7, BQ.1.1, XBB.1.5, XBB.1.9 and EG.5 in December 2022, January, mid-April, summer and autumn 2023, respectively (compare areas (predictions) versus lines (data) in the insets of Fig. 3a). For the more recent variants JN.1 and KP.1, KP.2 and KP.3, the model correctly predicts the inflection points during spring and summer 2024, respectively. Moreover, the data-derived change in variant frequency and our model-predicted immunity-driven relative fitness correspond in magnitude (see Methods for a theoretical justification). Notably, availability of sequencing data in Germany changed markedly in April 2023 (vertical dashed line in Fig. 3a,b and highlighted in Extended Data Fig. 3), which strongly affected confidence intervals regarding lineage frequency.
a, Historical variant dynamics in Germany during 1 March 2022 to 1 July 2024 of spike pseudo-groups BA.2,.X BA.5.X (+ BA.4.X + BE.1.1), BF.7.X, BQ.1.1.X, XBB.1.5.X (+ EG.1.X), XBB.1.9.X, EG.5.X, JN.1.X (+ BA.2.86.X) and KP.X, where ‘.X’ denotes inclusion of the respective sublineages. Confidence intervals were computed using Wilson’s method. Insets show model-predicted relative fitness γy(t) (representing the relative number of susceptible individuals; coloured areas), with superimposed changes in frequency (πy(t − 1)/πy(t) − 1) seen in the data (solid lines). Note that the sequencing intensity changed markedly in April 2023 (vertical dashed line) as highlighted in Extended Data Fig. 3. b, Predicted variant dynamics. Left: model-predicted relative growth advantage γy of emerging spike pseudo-groups XBB.1.5* (+ EG.1*), XBB.1.9.X (+ EG.1.3), XBB.1.16.X, EG.5.1* and BA.2.86 (colours in key) using data until 15 April 2023 (asterisks (for example, EG.1*) indicate spike pseudo groups; that is, the group of lineages with identical mutations in the spike protein). Note that BA.2.86 was first detected on 24 July 2023 in Denmark and represents the dominating lineage (together with daughter lineage JN.1 globally, as of January 2024). Thick lines show average values, and light lines show minimum–maximum ranges resulting from pharmacokinetic variability. Right: data-derived lineage frequencies in the time after the prediction horizon (15 April 2023 to 27 July 2023). Confidence intervals (95%) were calculated using Wilson’s method with sample size n = 2,919, 500, 165 and 164 for April, May, June and July, respectively.
We next evaluated whether our model could forecast variant dynamics by using a dataset that ends on 16 April 2023. On the basis of this dataset, we evaluated the immunity-mediated relative growth advantage of the circulating variants XBB.1.5 (+ EG.1), XBB.1.9 (+ EG.1.3), XBB.1.16, emerging variant EG.5.1 and variant BA.2.86 (‘pirola’), which was detected for the first time on 24 July 2023 (3 months after the dataset ended). We visualized these predictions together with the actual variant dynamics observed after 16 April 2023 in Germany in Fig. 3b. Our predictions indicate that XBB.1.5 and XBB.1.9 had no growth disadvantage by mid-April 2023, XBB.1.16 had a slight growth advantage and EG.5.1 had a substantial growth advantage. These forecasts precisely match the actual variant growth dynamics in Germany during April–July 2023: XBB.1.5 and XBB.1.9 had reached their inflection points and declined subsequently from 35% and 20%, respectively, to 5%, XBB.1.16 slightly increased from 2% to 10%, and EG.5.1 substantially increased from <1% to >30%. The not-yet-emerged variant BA.2.86 had a model-predicted growth advantage that is slightly lower than EG.5.1, but with an increasing trend in contrast to all other lineages. Notably, BA.2.86 and its daughter lineage JN.1 dominated in Germany and globally by January 2024.
Global immune and variant dynamics
We next evaluated our model to predict the adaptive immune landscape for 11 additional countries (Australia, Brazil, Canada, Denmark, France, Japan, Mexico, South Africa, Sweden, the UK and the USA). Essentially, we changed only the input of our model to the respective local infection history. In terms of data, this required only local genomic surveillance data, which we retrieved from GISAID (the Global Initiative on Sharing All Influenza Data; summary of data in Supplementary Table 6). We then used GInPipe22 to reconstruct the incidence timeline as shown in Extended Data Fig. 5, highlighting that case reporting changed over time for all countries and almost ceased at the beginning of 2023. The genomic surveillance data were then used to reconstruct lineage frequencies. From these two components (number of infections and lineage frequencies), we computed the time-dependent regional immune landscapes, which determine relative fitness of each lineage of interest. In Fig. 4, we depict lineage frequencies in the 11 countries together with model-predicted relative fitness γy for the 20 most common lineages during October 2022–October 2023. By utilizing the individual infection history in each country, the model predicts well whether a lineage is going to rise (relative fitness γy > 0) or decline (γy < 0) with an average accuracy of 0.92 (Extended Data Fig. 6a). Besides the three major lineages (BA.5, BQ.1 and XBB.1.5) that spread to some extent in all countries, our model was also able to predict the dynamics of some country-specific lineages, such as BR.2.1, XBC.1.3 and XBC.1.6 in Australia, FE.1 and GK.1 in Brazil, BN.1.X in Denmark, and HK.3 in Australia and Japan at the end of the investigated time horizon. Moreover, differences in the magnitude and duration of lineage-specific waves in the distinct countries became apparent, and the diversity of lineages increased towards the end of the investigated time horizon.
a–k, The relative abundance of major lineages when they surpass 3% relative abundance (solid lines, top) along with their model-calculated relative fitness γy(t) (bottom lines represent mean estimates, shaded areas represent minimum–maximum intervals resulting from pharmacokinetic variability) for Australia (a), Brazil (b), Canada (c), Denmark (d), France (e), Japan (f), Mexico (g), South Africa (h), Sweden (i), the UK (j) and the USA (k). Note that in Brazil, the lineages BQ.1.X include BE.9, and lineages XBB.1.5.X include XBB.1.18.1.
Infection history dictates lineage success
Next we wanted to investigate why particular lineages were successful in only some, but not all countries. For example, BA.2.12.1 dominated in the USA, with variant proportions >50% between the beginning of May 2022 and mid-June 2022, but did not reach similar dominance elsewhere. Notably, BA.2.12.1 is a daughter lineage of BA.2 that acquired the alteration 452Q. To test our hypothesis that infection history and variant-specific immunity may have determined the success of BA.2.12.1 in the USA, but not elsewhere, we used our model to calculate its immunity-driven relative fitness (Fig. 5a). We observed that although BA.2.12.1 emerged in the USA as soon as March 2022, by the time it reached Germany and Japan (May 2022), its relative fitness was already declining and reached γy < 0 soon after, which predicts that the variant would be less fit than the average viral population, and fail to invade Germany and Japan. In other words, BA.2.12.1 could not spread in Germany and Japan, because it entered the country ‘too late’ and the preceding BA.2 wave (March to June or July 2022) had created substantial cross-neutralizing immunity.
a, Dynamics of BA.2.12.1 in the USA (green dashed line) versus Germany (black dashed line) and Japan (red dashed line) along with the model-predicted relative number of BA.2.12.1 susceptible individuals (representing relative fitness γy) in the USA, Germany and Japan (green, grey and red areas, respectively). b, Dynamics of XBB.1.16 in Japan and Sweden (red and magenta dashed lines, respectively) along with the model-predicted relative fitness (red and magenta areas, respectively). c, Dynamics of BF.7 in Germany, Denmark and USA (dashed black, yellow and green lines, respectively) along with the model-predicted relative fitness (grey, yellow and green areas, respectively). d, Dynamics of endemic variant BR.2.1 in Australia (orange dashed line) versus the UK (cyan dashed line) along with the corresponding model-predicted relative fitness (orange and cyan areas, respectively). e, Model-predicted relative fitness of BA.2.86 and JN.1 in the USA, UK and Denmark (green, cyan and yellow areas, respectively) up to mid-November 2023 and variant dynamics in the following 2 months (green, cyan and yellow, respectively; error bars represent mean frequency and 95% confidence interval computed using Wilson’s method). f, Dynamics of convergently evolved lineages FE.1 in Brazil (green dashed line) versus EG.5.1 in France (purple dashed line) and predicted relative fitness (shaded areas). g, Dynamics of convergently evolved lineages GK.1 (Brazil, purple dashed line) versus HK.3 (Japan, red dashed line) along with the model-predicted relative fitness (shaded areas).
We looked at XBB.1.16, an XBB.1.5 descendent with the 478K alteration, in Japan versus Sweden, as another example. Whereas the lineage quickly spread in Japan towards frequencies of 25% within 2 months, it rose only modestly to ≤15% in Sweden. Our model precisely predicted these dynamics, indicating that XBB.1.16 had a substantial immunological advantage in Japan that dissipated by the end of April 2023. By contrast, the lineage had only a modest model-predicted fitness advantage in Sweden, which vanished around the same time (Fig. 5b). An explanation of this model prediction (and observed variant dynamics) lies in the length of the preceding XBB.1.5 wave between the two countries: in Sweden, the XBB.1.5 (+EG.1) wave was substantial, creating cross-immunity towards XBB.1.16. By contrast, the corresponding infection wave in Japan contained a mix of lineages (compare also Fig. 4f,i and Extended Data Fig. 5), which left an immunological niche to be filled by XBB.1.16 in Japan.
A notable phenomenon can be observed with strain BF.7, which spread substantially in Germany and Denmark, but not in the USA (Fig. 5c). BF.7 is a descendent of BA.5 that acquired an alteration at 346T (an alteration that occurred in several independent lineages by convergent evolution23,24). On the first appearance of BF.7, its predicted relative fitness is low in the USA, and in fact, BF.7 occurs less frequently in the USA than in Germany or Denmark. At the same time, BQ.1 is circulating in the USA, but not in Germany or Denmark. BQ.1 seems to better fill the existing immunological niche in the USA. Although BQ.1 never succeeds in these two European countries, BQ.1.1, which denotes a BQ.1 daughter lineage that acquired 346T, occurs instead, putting an end to the further spread of BF.7 (compare insets in Fig. 3a).
In Australia, lineage BR.2.1 replaced BQ.1.X from November 2022 as predicted by our model (Figs. 4a and 5d). Both lineages evolved alterations at the same positions within the RBD (339 and 486), but the respective alterations are distinct (339H and 486I in BR.2.1, and 339D and 486V in BQ.1.1). Moreover, whereas BR2.1 acquired 446S, BQ1.1 acquired an alteration at 444T. However, the main difference is that BR.2.1 emerged from BA.2.75 with five alterations in the antigenic super-site of its NTD that are not present in BA.5 descendent strains, such as BQ.1.X. Considering that a massive wave of BA.5 preceded the emergence of BR.2.1, it is not unexpected to find (and predict) that BR.2.1 would initially have a larger immunological niche than BQ.1.X, as predicted for Australia. This raises the question of whether BR.2.1 would have spread elsewhere. Our model predicted that it could have (for example, in the UK). However, only sporadic BR.2.1 imports occurred, with all infection chains ending, before the variant could overtake (Fig. 5d).
Next, we wanted to assess whether our model would predict the dynamics of the emerging lineage BA.2.86 (and JN.1) at the end of 2023. This lineage lacks many alterations that were previously acquired in convergently evolved lineages (for example, 346T, 368I, 445P, 455F, 456L and 490S), but instead harbours a set of unique alterations in the NTD and RBD. When taking the end of our investigated timeline into account (mid-November 2023), our model predicts that BA.2.86 (+JN.1) would increase in frequency in all investigated countries (illustrated in Fig. 5e for a selection of countries). As predicted by our model, BA.2.86 and JN.1 were steeply on the rise from mid-November 2023 until January 2024. Our model even predicted the magnitude of increase, for which the steepest increase is observed in the USA, followed by the UK and Denmark.
Last, we looked at convergent evolution in different regions of the globe. For example, FE.1 and EG.5.1 dominated in Brazil versus all other investigated countries. They share an almost identical RBD and NTD alteration profile, but emerged from XBB.1.18.1 versus XBB.1.9, respectively. Our model estimates nearly identical relative fitness for these two lineages in the individual countries. FE.1 rose to dominance in Brazil in May 2023 and started to be replaced by July 2023, as predicted by our model. FE.1 was never really exported in considerable quantities to Europe, the USA or South Africa, leaving the immunological niche to be filled by EG.5.1, which appeared from August onwards and rose to dominance. However, EG.5.1 could never successfully invade Brazil, as FE.1 had already created cross-immunity. Likewise, GK.3 (an XBB.1.5 descendent) rose in Brazil after the FE.1 wave. As predicted by our model, HK.3 (an EG.5 descendent) with an identical NTD and RBD profile compared to GK.3 rose in Japan and Australia after the respective EG.5.1 wave.
These numerous applications of our model highlight that we can identify whether an emerging variant may encounter an immunological niche in a particular country that favours its growth. Thus, the model may be suitable to serve as a variant alert system that can be applied to the regional infection history.
Discussion
Exposure to SARS-CoV-2 triggers adaptive immune responses, which, in simplified terms, comprise a cellular immune response (CD4+ and CD8+ T cell responses) and a humoral immune response (B cell-associated antibody production)25,26. Although T cell immunity may contribute to an overall decline in SARS-CoV-2 transmissibility and severity27,28, it may not differentiate between variants, as variants share T cell epitopes29, and T cells are often cross-reactive30. Among the produced antibodies, only those targeting epitopes in the RBD and NTD of the spike protein, the main site of viral evolution, are considered virus neutralizing31. Herein, we developed a mechanistic model, combining regional infection history and variant cross-neutralization based on RBD and NTD alteration profiles, to predict an adaptive immune landscape. This landscape describes the relative number of susceptible individuals for any SARS-CoV-2 variant, determines its relative fitness and dictates variant dynamics (Figs. 3–5).
In our study, we utilized DMS data to compute cross-neutralization between any pair of variants, including those that have not yet appeared (Figs. 3b and 5e). Combining these data with incidence reconstruction from virus sequencing data22, or a combination of unbiased incidence measures (such as viral load measurements in wastewater32; Extended Data Figs. 3 and 4), we could overcome limitations of earlier works3. Overall, the developed method can accurately model regional immune landscapes that were shaped by the infection history of all (≫100) locally encountered variants, explaining country-specific differences in evolutionary dynamics (Fig. 5) and including countries with regionally endemic variants (for example, BR.2.1 in Australia, and FE.1 and GK.1 in Brazil; Fig. 5d,f,g).
Limitations
Although the developed model has minimal data requirements, genomic surveillance data need to be available to contextualize the risk of occurrence and spread of variants to a regional scale. On the basis of a sensitivity analysis for the German dataset (Extended Data Fig. 8), we recommend at least 100 randomly sampled sequences per week.
Our model relied on a number of assumptions and simplifications. We did not include seasonality, because it is irrelevant for determining the relative variant-specific fitness (it cancels out). Although there is still no scientific consensus regarding the seasonality of SARS-CoV-2 incidence33, a recent study of influenza and endemic coronaviruses suggested that the availability of susceptible individuals may be a dominant driver for infection incidence34.
We did not consider the timeline of vaccination, as we found no impact on relative fitness estimates γy(t) (Extended Data Fig. 6b) for the considered time horizon. This observation can be explained by the limited uptake of BA.4 + BA.5-adapted booster vaccines (for example, ≈5.6 million doses in comparison to ≈23.4 million BA.4 and BA.5 infections in Germany). For the ≈60 million Wuhan-Hu-1-based third-dose boosters in Germany, we observed that they had a scaling effect on γy(t) that vanished by 2023 (Extended Data Fig. 7). Wuhan-Hu-1-based booster vaccinations removed susceptible individuals similarly across all Omicron variants, inducing stronger competition (higher amplitude of γy(t)), without affecting the order of γy(t) estimates, or inflection point predictions. As vaccine uptake further declined from 2023 and infection dynamics remained at a high level (Extended Data Fig. 5), our observation strongly suggested that it is not necessary to consider vaccinations to estimate relative fitness from 2023 onwards (Figs. 4 and 5). However, it will probably be crucial to consider vaccination uptake when evaluating case severity and hospitalization risks, which is beyond the scope of the developed model; the model does not consider cellular immune responses, which are an important determinant of clinical severity.
Notably, as the scope of our model is the prediction of the (humoral) immune pressure at a population level, we consider the average immune dynamics, which may not predict individual protection profiles that emerged from a complex sequence of antigen exposures35. Likewise, our model is not suitable for estimating an absolute protective antibody titre against SARS-CoV-2 infection as it operates with unitless antibody concentrations.
In predicting variant phenotypes, we assumed that antibody-evasive mutational effects are mostly additive36, which allowed us to utilize DMS data to quantify the effect of antibody-evasive alterations in complex evolutionary backgrounds (compare Extended Data Fig. 1d,e–g). This assumption is coherent with cases in which compensatory alterations enable the emergence of antibody-evasive alterations and in which there is no direct epistatic interaction between antibody-evasive alterations37. However, machine-learning-based approaches could in the future be utilized38 instead of DMS, to capture more complex epistatic interactions. Likewise, protein language models39 may be used to forecast the acquisition of additional alterations in emerging variants and, when combined with our work, enable us to extend its prediction horizon (compare Fig. 3b).
Conclusion
Overall, our work provides proof that the ongoing evolution of SARS-CoV-2 is driven by variant-specific population immunity. Moreover, the developed model constitutes a means to calculate the dynamic immune landscape that resulted from recent infection history and determines variant fitness.
In the future, our approach could serve as a basis to identify epitopes most likely to be under recent selective pressure and, therefore, provide cues for designing vaccine candidates that maximize neutralization breadth against emerging variants in a forthcoming season. For this, previously developed approaches in the context of influenza H3N2 vaccine design40,41 could be complemented using the SARS-CoV-2 fitness models developed herein. Furthermore, our conceptual ideas may be transferred to model the evolution of other respiratory viruses that are subject to molecular surveillance.
Methods
DMS data
We used all DMS data5,17 available on 13 February 2023 to assess the phenotype (escape from neutralizing antibodies) of each SARS-CoV-2 variant. DMS measures, in a yeast-display assay, how much a mutated site s in the RBD affects the binding of antibodies elicited by a variant that is not mutated at site s. We utilized data from 836 antibodies that were classified into 12 distinct epitope classes17,18 (see below) and aggregated all values on site level by their mean, yielding ‘escape fractions’ to antibody a for each mutated site efs,a (values were bounded at 0.99). Escape fractions denote a proxy for quantifying the probability that an antibody does not bind an RBD that contains an alteration at site s. Thus, the numerical value depends on the antibody potency, and hence we aimed to convert these values to ‘fold resistances’ (fold change in antibody potency; see also Extended Data Fig. 1a–c). Assuming that mutational effects of site s on the binding affinity are independent, the binding probability of an antibody elicited by a variant x to a variant y can be expressed as
in which efs,a is the normalized escape fraction of mutated site s with respect to antibody a, and \(\varOmega (x,y)\) denotes the set of RBD sites that distinguish variants x and y (ref. 17). Using classical pharmacodynamic approaches, we then model the binding probability as
in which FRx,y(a) denotes the fold resistance of variant y to an antibody a elicited by variant x. The parameter \({{\rm{IC}}}_{50({\rm{DMS}})}(a)\) corresponds to the half-maximal inhibitory concentration (that is, ‘potency’) of the antibody against the antibody-eliciting variant, which was extracted from the DMS dataset. Notably, ca = 400 μg ml−1 was the antibody concentration at which the DMS experiment was conducted4,18. Combining equations (1) and (2) yields:
As already evident from Extended Data Fig. 1a–c, DMS estimates of efs,a, as well as the corresponding FRx,y(a), can become unreliable depending on antibody concentrations and antibody potency \({{\rm{IC}}}_{50(x)}(a)\), falsely predicting hyper-susceptibility (FRx,y(a) < 1). We enforce that FRx,y(a) ≥1 to avoid such artefacts.
Epitope classes
On the basis of the similarity of antibody profiles in DMS data, antibodies were previously classified into 12 epitope classes (A, B, C, D1, D2, E1, E2.1, E2.2, E3, F1, F2, F3)18 (Supplementary Tables 1 and 2). As we encountered more than 30% missing values for epitope classes E2.1 and E2.2, we merged them with E1 into a new class (E12), as they bind to similar regions, including amino acid site R346 (ref. 42). Finally, we retrieved a matrix of 10 epitope classes Α = (A, B, C, D1, D2, E12, E3, F1, F2, F3) for 179 RBD sites. This classification indicates that antibodies belonging to the same class bind to overlapping epitopes, and there is little overlap between epitope classes (Fig. 2a). Consequently, we assumed that antibodies within the same epitope class would be similarly affected by RBD alterations, whereas phenotypic changes between epitope classes may vary considerably.
We then quantified the fold resistance associated with each epitope class as the average fold resistance of all antibodies belonging to the class
for all epitope classes \({\vartheta }\) in Α (Fig. 2a).
As a proof of concept, we compared DMS-derived FRx,y(\({\vartheta }\)) using the calculations above with fold resistance values obtained from virus neutralization assays (reported in ref. 42) for antibodies targeting all epitope classes Α defined above. As can be seen in Extended Data Fig. 1d, we observed a strong and significant positive correlation between the DMS-derived FRx,y(\({\vartheta }\)) and those obtained by neutralization assays and reasonable agreement for polyclonal sera (Extended Data Fig. 1e–g).
As the DMS experiments were generated only for RBD-targeting antibodies, no escape data were available to quantify the fold resistance of NTD-targeting antibodies. To overcome this limitation, we included an additional class of NTD-targeting antibodies targeting three antigenic super-sites43: spike amino acid positions 14–20, 140–158 and 245–264. Consequently, we assigned alterations in the antigenic super-sites fold resistance values of 10, which is in range with corresponding ELISA experiments43. However, the model can be updated if comprehensive DMS data for the NTD domain become available44.
Assuming independence between mutational effects, the total fold resistance of a variant y to binding of an NTD-targeting antibody elicited by a variant x was computed as:
in which |Ω(x, y)| denotes the number of mutational differences between variants x and y in the antigenic super-site of the NTD.
Variant cross-neutralization probability
We assumed that a virus is neutralized if at least one antibody is bound to its surface (either at the RBD or NTD of the spike protein). Here, we collectively consider all antibodies from the same epitope class as they compete for the same binding site. By assuming binding independence between epitope classes, the neutralization probability can be computed as:
with \({b}_{{\vartheta }}(t,x,y)\) denoting the probability that an antibody of epitope class \({\vartheta }\) in Α ∪ (NTD) binds to the virus with
in which \({c}_{{\vartheta }}(t)\) is the antibody’s concentration in an individual at time t, \({{\rm{IC}}}_{50(x)}({\vartheta })\) is the half-maximal inhibitory antibody concentration against the variant that elicited the antibody. FRx,y(\({\vartheta }\)) is the fold resistance of variant y to binding of antibodies of epitope class \({\vartheta }\), elicited by variant x.
Antibody potency
Next, we quantified \({{\rm{IC}}}_{50(x)}({\vartheta })\) for each epitope class. As the DMS data were derived from yeast-display RBD mutant libraries, absolute antibody potencies may not directly translate to a clinical setting. However, the ranking of antibody potencies may be preserved. Consequently, we estimated the relative potency D(\({\vartheta }\)) from the DMS data:
in which \({\vartheta }\in {\rm{A}}\), and \(\widehat{\,{{\rm{IC}}}_{{50}_{({\rm{DMS}})}}}({\vartheta })\) denotes the average potency of all antibodies belonging to epitope class \({\vartheta }\). Epitope-class-specific clinical antibody potency \({{\rm{IC}}}_{50(x)}({\vartheta })\) was then inferred using the following relation:
in which \(\widehat{\,{{\rm{IC}}}_{{50}_{(x)}}}\) denotes the \({{\rm{IC}}}_{50(x)}\) averaged over all epitope classes. NTD-targeting antibodies were not included in the DMS dataset, and hence we set
\(\widehat{\,{{\rm{IC}}}_{{50}_{(x)}}}\) was the only free parameter in the model, which we estimated by fitting our model to (Wuhan-Hu-1-strain) vaccine efficacy (VE) data against the Delta lineage (B.1.617.2) present between 4 July 2021 and 31 December 2021 (Fig. 2d; genomic profile in Supplementary Table 3).
We considered interindividual differences in antibody pharmacokinetics (see below), implemented as combinations of the parameters tmax (time of maximal antibody concentration) and thalf (antibody half-life). For parameter estimation, we first estimated optimal drug potencies \(\widehat{\,{{\rm{IC}}}_{{50}_{(x)}}}\) (tmax, thalf) for each tmax, thalf combination in a 5 × 15 grid (ranges below) and then averaged over these 75 estimates. Parameter estimation was performed using scipy.optimize.root, applying the Levenberg–Marquardt method to solve the ordinary least-square problem.
in which VE(t, Wuhan-Hu-1, Delta) denotes the vaccine efficacy against the Delta strain t days after antigen exposure with the Wuhan-Hu-1 strain. Here, we assumed that VE = infection risk reduction ≈ PNeut.
As a proof of concept, we then tested our predictions with Wuhan-Hu-1-strain VE data against Omicron infection (Fig. 2d; genomic profile in Supplementary Table 3). Utilized VE data include those from all studies in which Wuhan-Hu-1-strain vaccines were tested and which were computed on the basis of hazard ratios or rate of confirmed infection (Supplementary Tables 4 and 5).
Antibody pharmacokinetics
To determine the duration of sterilizing immunity against any variant y, we accounted for antibody pharmacokinetics (PK) after antigen exposure to variant x (by means of infection or vaccination). Pharmacokinetics were considered using a classical, descriptive linear model with an analytical solution
in which t denotes the time after antigen exposure and c(t) denotes the normalized (fraction of maximum) concentrations of the antibody. The parameters ke and ka (elimination and ‘absorption’ parameters in classical PK models) were related to known quantities through established PK relations; that is,
In our simulations, we considered identical PK for antibodies of the different epitope classes. Utilized parameters (tmax, thalf) were extracted from the literature: the time of maximal antibody concentrations (tmax) varied between 14 and 28 days after antigen exposure26,45,46, and the half-life (thalf) ranged between 25 and 69 days19,47,48,49,50,51,52,53. For simulation, we used different combinations of (tmax, thalf) in a 5 × 15 grid within a range of tmax within 14 to 28 days and thalf within 25 to 69 days and plotted the range of predictions (minimum, maximum).
SARS-CoV-2 genomic data
We collected SARS-CoV-2 genomic data from Germany published by the Robert Koch Institute, including only genomic data from the ‘random sampling’ strategy, which denotes most of the sequence data. For other countries (Australia, Brazil, Canada, Denmark, France, Japan, Mexico, South Africa, Sweden, the UK and the USA), we downloaded data from GISAID (https://gisaid.org); however, here we could not be sure that the data are representative (randomly sampled), as anyone can upload SARS-CoV-2 data to GISAID. A dataset summary is given in Supplementary Table 6.
Variant proportions and spike pseudo-groups
If pangolin lineage information was absent in the data, lineage information was assigned using established methods54,55. Alteration profiles for all sequences were extracted using covSonar. For each lineage, we collected all ‘characteristic’ spike amino acid changes in the RBD (amino acid position 331–541) and the NTD antigenic super-sites regions (amino acid positions 14–20, 141–158 and 245–264) for subsequent analyses. In our work, ‘characteristic’ implied that an amino acid change was present in at least 75% of all sequences from that lineage. The ‘antigenic profile’ for each lineage was then determined on the basis of the set of unique alterations within the NTD and RBD regions. Differences between lineages were defined as the set difference between alteration profiles. Clustering lineages with identical ‘antigenic profiles’ yielded spike pseudo-groups with distinct genomic profiles in the RBD and NTD region of the spike protein. On the basis of the genomic profiles and their clustering into spike pseudo-groups, we computed pseudo-group frequencies πx(t) for all x ∈ \({\mathcal{X}}\) in the entire observation horizon. The frequencies were computed such that there were at least 100 sequences per time step, and daily lineage frequencies were computed by using linear interpolation between time steps. Furthermore, we filtered out spike pseudo-groups that never reached levels of >1% frequency to reduce noise from sequencing errors. Data files with lineage and pseudo-group frequencies and alteration profiles for all countries are available via GitHub at https://github.com/KleistLab/VASIL (a summary is given in Supplementary Table 6).
Genome-based reconstruction of infection timeline
Unfortunately, the infection timeline cannot be reconstructed from reported cases, as test coverage and reporting of SARS-CoV-2 became increasingly unreliable. To overcome this data limitation, we recently developed the genome-based incidence estimator, GInPipe22. This computational pipeline reconstructs the infection timeline solely from time-stamped viral sequences. Owing to the short infectious phase of SARS-CoV-2 (ref. 56), a limited amount of intra-patient evolution is typically observed before an infection is passed on. This implies the existence of an ‘evolutionary signal’ ϕ that relates haplotype diversity and the number of alterations present in a temporal collection of viral sequences to the number of infections (see ref. 22 for details). We confirmed previously that this evolutionary signal is proportional to the actual number of infected individuals I(t) ≈ cϕ(t) at time t.
Although GInPipe works quite robustly when sequencing efforts change over time, extreme changes may cause biases22. For the USA and the UK, we observed major drops in the number of available sequences (≈10-fold after 2022), and henceforth downsampled the number of viral sequences to 6,000 and 2,500 per day, respectively, which corresponds to the maximum number of sequences after the drop in sequencing efforts (Extended Data Fig. 5).
In the pipeline, sequences are pooled according to their collection date, such that ‘bins’ b comprise either the same number of sequences nb or span the same number of days ∆db. We chose time spans ∆db = 7, 10 and 14 days and bin sizes of nb = 2% and 5% of all sequences, as well as the average weekly number of sequences for each particular country. Incidence correlates ϕb were filtered out if the time span of a bin was smaller than 3 days, or if a bin comprised fewer than 50 sequences. We allowed a bin to span at most 21 days (data-rich setting) or as many days as necessary to comprise the minimal number of sequences (data-poor settings). Bin-wise ϕb estimates were smoothed using kernel smoothing with a bandwidth of 14. GInPipe-estimated incidence correlates are depicted in Extended Data Fig. 5 for all investigated countries.
To confirm the validity of GInPipe-estimated incidences, we compared our predictions with other unbiased incidence measures for Germany (citizen science and wastewater data; Extended Data Fig. 3) and with a UK dataset from the representative COVID-19 Infection Survey of the Office for National Statistics (ONS)6, using the percentage of infected individuals in the UK from the ONS study as the ground truth (March 2021–March 2023; Extended Data Fig. 4).
The percentage of PCR-positive individuals from GInPipe at the time point of interest was then computed using a rolling sum over 10 days56, and normalized with the ratio of the population size and the sum of infected people in the respective time horizon according to the ONS. For the reported cases, the rolling sum over 10 days was normalized by the population size of the UK.
Expected sterilizing immunity against variant y
The estimation of cross-neutralization probability PNeut enabled us to estimate the expected number of individuals being immune against infection with a variant y for a given time point t by taking the infection history before time t into account. The expected number of individuals immune to infection with strain y is given by
in which \({\mathcal{X}}\) denotes the set of all variants present within the time horizon of interest, PNeut(t − s, x, y) denotes the probability that an infection with strain x, that occurred t − s days ago cross-neutralizes a variant y. In the equation above, πx(s) denotes the proportion of variant x at day s < t, derived from the molecular surveillance, and I(t) denotes the number of infected individuals at some previous time point s. The expected number of susceptible individuals to a variant y at time t can then be calculated as \({\mathbb{E}}[{{\rm{S}}}_{y}(t)]={\rm{Pop}}-{\mathbb{E}}[{{\rm{Immune}}}_{y}(t)]\), in which Pop denotes the total population size. The variable I(s) is typically not available, but can be replaced by incidence correlates ϕ(s) = I(s)/c (see below).
Variant dynamics
To estimate whether an emerging variant may successfully out-compete existing variants, we estimated the relative growth advantage of a variant γy(t):
in which the denominator denotes the average growth rate across all variants existing at time t and for which αx > 0 denotes a variant’s intrinsic (antibody-independent) relative transmission fitness, which we assumed to be nearly identical for all circulating variants αx ≈ α, implying that variant dynamics are dominated by infection history and immune dynamics. We ignored low-abundance variants with πx(t) < 1% and renormalized accordingly. We computed the frequency-weighted average γz(t), whenever different variants were combined during analyses (as indicated in the respective graphics).
Replacing infection numbers with incidence correlates
As infection numbers I(t) are typically unreliable, our method can utilize any incidence correlate, such as GInPipe’s ϕ(t), wastewater virus load trajectories or reliable estimates from large citizen science projects. With regard to GInPipe’s output, we showed in Extended Data Figs. 3 and 4 and in ref. 22 that I(t) ≈ cϕ(t), which allows us to use ϕ(s) instead of the number of infected individuals I(s) when computing \({\mathbb{E}}[{{\rm{Immune}}}_{x}(t)]\), as performed in Figs. 3–5. To compute \({\mathbb{E}}[{{\rm{S}}}_{x}(t)]\), we can write \({\rm{Pop}}=k\cdot \int I(s){\rm{d}}s\approx k\cdot c\cdot \int \phi (s){\rm{d}}s\), for which k > 0 is a scaling factor (for example, k−1 = 2 implies that every individual got infected twice, on average, over the time horizon of interest). When reconstructing actual incidences for Germany and the UK, we found that k ≈ 1 for the length of the time horizon (Extended Data Fig. 9). k is a modelling choice and any mis-estimation of k affects only the scaling of γy(t) (Extended Data Fig. 10). Qualitative results (inflection points γy = 0; variant ‘growth’ γy > 0; variant ‘decline’ γy < 0) remain unaffected. The interpretation of this ‘scaling’ is straightforward: if one overestimates the number of individuals that became infected, one under-predicts the number of susceptible individuals and thus infers stronger competition among variants, which is expressed in overestimation of the amplitude of γy(t) by some constant ξ > 1.
Relation of γ y to variant dynamics
In a discrete-time quasi-species model57, the theoretical variant frequencies \({\boldsymbol{p}}\in {[\mathrm{0,}1]}^{|{\mathcal{X}}|}\) are given by
for which \({\boldsymbol{Q}}\in {{\mathbb{R}}}^{|{\mathcal{X}}|\times |{\mathcal{X}}|}\) denotes a transition matrix between different variants, which we set to the identity matrix Q = Id, ignoring any mutational transitions from one variant to another. Fitness values of any variant y, relative to the population average \(\bar{f}(t)=\,{\sum }_{x}{p}_{x}(t)\cdot {f}_{x}(t)\) are contained in matrix \({\boldsymbol{F}}(t)={\rm{diag}}([\ldots ,\,{f}_{y}(t)/\bar{f}(t),\,\ldots ])\). Consequently, for py(t) > 0, we get
if fitness is determined by population immunity.
Modelling booster vaccinations in Germany
As the vaccination timeline contains the number of individuals that were vaccinated, we first reconstructed actual infection numbers for Germany using GInPipe22: we inferred the time-dependent case ascertainment probabilities Prep(t) ≤ 1 (that is, the probability of an infection being reported):
for which Irep(t) ≤ I(t) denotes the daily reported infections (weekly cases/7). We smoothed reported cases Irep(t) with a bandwidth of 14.
Last, we normalized the case ascertainment probabilities \({\widetilde{P}}_{{\rm{rep}}}(t)\) at maximum to be able to estimate the minimum number of infections Imin(t)
The minimum number of infections is then calculated as
The impact of infections on the immune landscape was then modelled as outlined above by setting I(t) ≈ Imin(t). To model the impact of booster vaccinations in Germany (Extended Data Fig. 7), we added exposure to either the Wuhan-Hu-1-variant antigen or the BA.4 + BA.5-variant antigen corresponding to the timeline of vaccination (https://impfdashboard.de/en/) to the infection timeline and computed the relative fitness γy(t) on this basis. Note that this reconstruction of actual infection numbers can become unstable, as it is normalized to an extreme point (the maximum) and therefore warrants post-analysis inspection. Moreover, it requires the use of case reporting data, which largely ceased by the end of 2023.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The DMS data used in this study can be accessed via GitHub at https://github.com/jbloomlab/SARS2_RBD_Ab_escape_maps/blob/main/processed_data/escape_data.csv and a processed version is available via GitHub at https://github.com/KleistLab/VASIL/blob/main/ByCountry/Australia/results/epitope_data/dms_per_ab_per_site.csv. For the evaluation of the German SARS-CoV-2 outbreak, we used genomic data provided via the German Sequence Data Hub (DESH) to the Robert Koch Institute, available via GitHub at https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland, and via Zenodo at https://zenodo.org/records/13987397 (ref. 58; details for the subset in Supplementary Table 6). Wastewater surveillance data for Germany are provided by the Robert Koch Institute via GitHub at https://github.com/robert-koch-institut/Abwassersurveillance_AMELAG and can be accessed via Zenodo at https://zenodo.org/records/12704658 (ref. 59). Reported case numbers for Germany were taken from GitHub at https://github.com/robert-koch-institut/COVID-19_7-Tage-Inzidenz_in_Deutschland. The evaluation of other countries in this study was based on genomic data associated with 5,617,986 SARS-CoV-2 sequences available on GISAID (https://gisaid.org/) and accessible at https://doi.org/10.55876/gis8.241022rp (Supplementary Note 1 and Supplementary Table 6). Source data are provided with this paper.
Code availability
Codes were written in Python 3.11.3 and R version 4.2.3 (15 March 2023) and are available via GitHub at https://github.com/KleistLab/VASIL and via Zenodo at https://doi.org/10.5281/zenodo.8349295 (ref. 60). Simulations were performed on the high-performance computing cluster at ZEDAT, Freie Universität Berlin61. The pipeline for the genome-based incidence estimation (GInPipe) is available via GitHub at https://github.com/KleistLab/GInPipe. Alteration profiles for all sequences were extracted using covSonar, version 1.1.11, which is available via GitHub at https://github.com/rki-mf1/covsonar.
References
Markov, P. V. et al. The evolution of SARS-CoV-2. Nat. Rev. Microbiol. 21, 361–379 (2023).
Chen, Y. et al. Broadly neutralizing antibodies to SARS-CoV-2 and other human coronaviruses. Nat. Rev. Immunol. 23, 189–199 (2023).
Meijers, M., Ruchnewitz, D., Eberhardt, J., Luksza, M. & Lassig, M. Population immunity predicts evolutionary trajectories of SARS-CoV-2. Cell 186, 5151–5164 (2023).
Greaney, A. J. et al. Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition. Cell Host Microbe 29, 44–57 (2021).
Cao, Y. et al. Imprinted SARS-CoV-2 humoral immunity induces convergent Omicron RBD evolution. Nature 614, 521–529 (2023).
Office for National Statistics. Coronavirus (COVID-19) Infection Survey, UK: 24 March 2023. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/coronaviruscovid19infectionsurveypilot/24march2023#:~:text=In%20England%2C%20the%20estimated%20number,around%201%20in%2040%20people (2023).
World Health Organization. Weekly epidemiological update on COVID-19 - 1 June 2023. https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---1-june-2023 (2023).
Davis, H. E., McCorkell, L., Vogel, J. M. & Topol, E. J. Long COVID: major findings, mechanisms and recommendations. Nat. Rev. Microbiol. 21, 133–146 (2023).
World Health Organization. Coronavirus disease (COVID-19) pandemic. https://www.who.int/europe/emergencies/situations/covid-19 (2023).
Jones, J. M. et al. Estimates of SARS-CoV-2 seroprevalence and incidence of primary SARS-CoV-2 infections among blood donors, by COVID-19 vaccination status - United States, April 2021-September 2022. Morb. Mortal. Wkly Rep. 72, 601–605 (2023).
Dejnirattisai, W. et al. SARS-CoV-2 Omicron-B.1.1.529 leads to widespread escape from neutralizing antibody responses. Cell 185, 467–484 (2022).
McCallum, M. et al. Structural basis of SARS-CoV-2 Omicron immune evasion and receptor engagement. Science 375, 864–868 (2022).
Nutalai, R. et al. Potent cross-reactive antibodies following Omicron breakthrough in vaccinees. Cell 185, 2116–2131 (2022).
Lyngse, F. P. et al. Household transmission of SARS-CoV-2 Omicron variant of concern subvariants BA.1 and BA.2 in Denmark. Nat. Commun. 13, 5760 (2022).
Hale, T. et al. A global panel database of pandemic policies (Oxford COVID-19 Government Response Tracker). Nat. Hum. Behav. 5, 529–538 (2021).
Worp, N. et al. Towards the development of a SARS-CoV-2 variant risk assessment tool: expert consultation on the assessment of scientific evidence on emerging variants. Lancet Microbe 4, e830–e836 (2023).
Greaney, A. J., Starr, T. N. & Bloom, J. D. An antibody-escape estimator for mutations to the SARS-CoV-2 receptor-binding domain. Virus Evol. 8, veac021 (2022).
Cao, Y. et al. Omicron escapes the majority of existing SARS-CoV-2 neutralizing antibodies. Nature 602, 657–663 (2022).
Khoury, D. S. et al. Neutralizing antibody levels are highly predictive of immune protection from symptomatic SARS-CoV-2 infection. Nat. Med. 27, 1205–1211 (2021).
Kryscio, R. J., Mendiondo, M. S., Schmitt, F. A. & Markesbery, W. R. Designing a large prevention trial: statistical issues. Stat. Med. 23, 285–296 (2004).
Gruell, H. et al. mRNA booster immunization elicits potent neutralizing serum activity against the SARS-CoV-2 Omicron variant. Nat. Med. 28, 477–480 (2022).
Smith, M. R. et al. Rapid incidence estimation from SARS-CoV-2 genomes reveals decreased case detection in Europe during summer 2020. Nat. Commun. 12, 6009 (2021).
Focosi, D., Quiroga, R., McConnell, S., Johnson, M. C. & Casadevall, A. Convergent evolution in SARS-CoV-2 spike creates a variant soup from which new COVID-19 waves emerge. Int. J. Mol. Sci. 24, 2264 (2023).
Ito, J. et al. Convergent evolution of SARS-CoV-2 Omicron subvariants leading to the emergence of BQ.1.1 variant. Nat. Commun. 14, 2671 (2023).
Sette, A. & Crotty, S. Adaptive immunity to SARS-CoV-2 and COVID-19. Cell 184, 861–880 (2021).
Liao, B. et al. Detection of anti-SARS-CoV-2-S2 IgG is more sensitive than anti-RBD IgG in identifying asymptomatic COVID-19 patients. Front. Immunol. 12, 724763 (2021).
Rydyznski Moderbacher, C. et al. Antigen-specific adaptive immunity to SARS-CoV-2 in acute COVID-19 and associations with age and disease severity. Cell 183, 996–1012 (2020).
Tan, A. T. et al. Early induction of functional SARS-CoV-2-specific T cells associates with rapid viral clearance and mild disease in COVID-19 patients. Cell Rep. 34, 108728 (2021).
Choi, S. J. et al. T cell epitopes in SARS-CoV-2 proteins are substantially conserved in the Omicron variant. Cell. Mol. Immunol. 19, 447–448 (2022).
Moss, P. The T cell immune response against SARS-CoV-2. Nat. Immunol. 23, 186–193 (2022).
Poland, G. A., Ovsyannikova, I. G. & Kennedy, R. B. SARS-CoV-2 immunity: review and applications to phase 3 vaccine candidates. Lancet 396, 1595–1606 (2020).
Torabi, F. et al. Wastewater-based surveillance models for COVID-19: a focused review on spatio-temporal models. Heliyon 9, e21734 (2023).
McClymont, H. & Hu, W. Weather variability and COVID-19 transmission: a review of recent research. Int. J. Environ. Res. Public Health 18, 396 (2021).
Baker, R. E., Yang, W., Vecchi, G. A., Metcalf, C. J. E. & Grenfell, B. T. Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic. Science 369, 315–319 (2020).
Yang, L. et al. Antigen presentation dynamics shape the antibody response to variants like SARS-CoV-2 Omicron after multiple vaccinations with the original strain. Cell Rep. 42, 112256 (2023).
Rochman, N. D. et al. Epistasis at the SARS-CoV-2 receptor-binding domain interface and the propitiously boring implications for vaccine escape. mBio 13, e0013522 (2022).
Moulana, A. et al. Compensatory epistasis maintains ACE2 affinity in SARS-CoV-2 Omicron BA.1. Nat. Commun. 13, 7011 (2022).
Thadani, N. N. et al. Learning from prepandemic data to forecast viral escape. Nature 622, 818–825 (2023).
Ito, J. et al. A protein language model for exploring viral fitness landscapes. Preprint at bioRxiv https://doi.org/10.1101/2024.03.15.584819 (2024).
Luksza, M. & Lassig, M. A predictive fitness model for influenza. Nature 507, 57–61 (2014).
Huddleston, J. et al. Integrating genotypes and phenotypes improves long-term forecasts of seasonal influenza A/H3N2 evolution. Elife 9, e60067 (2020).
Cao, Y. et al. BA.2.12.1, BA.4 and BA.5 escape antibodies elicited by Omicron infection. Nature 608, 593–602 (2022).
McCallum, M. et al. N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. Cell 184, 2332–2347 (2021).
Dadonaite, B. et al. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell 186, 1263–1278 (2023).
Lumley, S. F. et al. The duration, dynamics, and determinants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibody responses in individual healthcare workers. Clin. Infect. Dis. 73, e699–e709 (2021).
Arkhipova-Jenkins, I. et al. Antibody response after SARS-CoV-2 infection and implications for immunity: a rapid living review. Ann. Intern. Med. 174, 811–821 (2021).
Bayart, J. L. et al. Waning of IgG, total and neutralizing antibodies 6 months post-vaccination with BNT162b2 in healthcare workers. Vaccines 9, 1092 (2021).
Barnes, T. W. et al. Determination of neutralising anti-SARS-CoV-2 antibody half-life in COVID-19 convalescent donors. Clin. Immunol. 232, 108871 (2021).
Xue, J. H. et al. Anti-receptor-binding domain immunoglobulin G antibody as a predictor of seropositivity for anti-SARS-CoV-2 neutralizing antibody. Arch. Pathol. Lab. Med. 146, 814–821 (2022).
Widge, A. T. et al. Durability of responses after SARS-CoV-2 mRNA-1273 vaccination. N. Engl. J. Med. 384, 80–82 (2021).
Dan, J. M. et al. Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection. Science 371, eabf4063 (2021).
Wheatley, A. K. et al. Evolution of immune responses to SARS-CoV-2 in mild-moderate COVID-19. Nat. Commun. 12, 1162 (2021).
Van Elslande, J., Gruwier, L., Godderis, L. & Vermeersch, P. Estimated half-life of SARS-CoV-2 anti-spike antibodies more than double the half-life of anti-nucleocapsid antibodies in healthcare workers. Clin. Infect. Dis. 73, 2366–2368 (2021).
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
O’Toole, A. et al. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol. 7, veab064 (2021).
van der Toorn, W. et al. An intra-host SARS-CoV-2 dynamics model to assess testing and quarantine strategies for incoming travelers, contact management, and de-isolation. Patterns 2, 100262 (2021).
Bratus, A. S., Novozhilov, A. S. & Semenov, Y. S. in Advanced Mathematical Methods in Biosciences and Applications (eds Berezovskaya, F. & Toni, B.) 27–51 (Springer, 2019).
Robert Koch Institute. Random samples of SARS-CoV-2 sequences for Germany. Zenodo https://zenodo.org/records/13987397 (2024).
Robert Koch Institut, Fachgebiet 32. Abwassersurveillance AMELAG. Zenodo https://zenodo.org/records/12704658 (2024).
Raharinirina, N. A. et al. SARS-CoV-2 evolution on a dynamic immune landscape. Zenodo https://doi.org/10.5281/zenodo.8349295 (2024).
Bennett, L., Melchers, B. & Proppe, B. Curta: a general-purpose high-performance computer at ZEDAT, Freie Universität Berlin. FU Refubium http://dx.doi.org/10.17169/refubium-26754 (2020).
Yisimayi, A. et al. Repeated Omicron exposures override ancestral SARS-CoV-2 immune imprinting. Nature 625, 148–156 (2024).
Wang, P. et al. Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7. Nature 593, 130–135 (2021).
Uriu, K. et al. Neutralization of the SARS-CoV-2 Mu variant by convalescent and vaccine serum. N. Engl. J. Med. 385, 2397–2399 (2021).
Planas, D. et al. Reduced sensitivity of SARS-CoV-2 variant Delta to antibody neutralization. Nature 596, 276–280 (2021).
Mlcochova, P. et al. SARS-CoV-2 B.1.617.2 Delta variant replication and immune evasion. Nature 599, 114–119 (2021).
Liu, Y. et al. BNT162b2-elicited neutralization against new SARS-CoV-2 spike variants. N. Engl. J. Med. 385, 472–474 (2021).
Garcia-Beltran, W. F. et al. Multiple SARS-CoV-2 variants escape neutralization by vaccine-induced humoral immunity. Cell 184, 2372–2383 (2021).
Bates, T. A. et al. Neutralization of SARS-CoV-2 variants by convalescent and BNT162b2 vaccinated serum. Nat. Commun. 12, 5135 (2021).
Rossler, A., Riepler, L., Bante, D., von Laer, D. & Kimpel, J. SARS-CoV-2 Omicron variant neutralization in serum from vaccinated and convalescent persons. N. Engl. J. Med. 386, 698–700 (2022).
Planas, D. et al. Considerable escape of SARS-CoV-2 Omicron to antibody neutralization. Nature 602, 671–675 (2022).
Liu, C. et al. Reduced neutralization of SARS-CoV-2 B.1.617 by vaccine and convalescent serum. Cell 184, 4220–4236 (2021).
Cameroni, E. et al. Broadly neutralizing antibodies overcome SARS-CoV-2 Omicron antigenic shift. Nature 602, 664–670 (2022).
Loenenbach, A. et al. Participatory, virologic, and wastewater surveillance data to assess underestimation of COVID-19 incidence, Germany, 2020-2024. Emerg. Infect. Dis. 30, 1939–1943 (2024).
Oh, D. Y. et al. Advancing precision vaccinology by molecular and genomic surveillance of severe acute respiratory syndrome coronavirus 2 in Germany, 2021. Clin. Infect. Dis. 75, S110–S120 (2022).
Backer, J. A. et al. Shorter serial intervals in SARS-CoV-2 cases with Omicron BA.1 variant compared with Delta variant, the Netherlands, 13 to 26 December 2021. Euro Surveill. 27, 2200042 (2022).
Galmiche, S. et al. SARS-CoV-2 incubation period across variants of concern, individual factors, and circumstances of infection in France: a case series analysis from the ComCor study. Lancet Microbe 4, e409–e417 (2023).
Ito, K., Piantham, C. & Nishiura, H. Relative instantaneous reproduction number of Omicron SARS-CoV-2 variant with respect to the Delta variant in Denmark. J. Med. Virol. 94, 2265–2268 (2022).
Khandia, R. et al. Emergence of SARS-CoV-2 Omicron (B.1.1.529) variant, salient features, high global health concerns and strategies to counter it amid ongoing COVID-19 pandemic. Environ. Res. 209, 112816 (2022).
Our World in Data. Coronavirus (COVID-19) Cases. https://covid.ourworldindata.org/data/owid-covid-data.csv (2024).
Acknowledgements
We thank the German Electronic Sequence Data Hub (DESH) and all of its data contributors (that is, the authors from the originating laboratories responsible for obtaining the specimens and the submitting laboratories in which genetic sequence data were generated and shared); the Sequencing Core Facility of the Genome Competence Center (MF1) at Robert Koch Institute for sequencing services; all laboratories contributing to the German SARS-CoV-2 surveillance; and all international laboratories for contributing SARS-CoV-2 sequences to the GISAID EpiCoV database. We acknowledge high-performance computing services provided through ZEDAT at the FU Berlin and at the Robert Koch Institute. D.B. and M.v.K. acknowledge funding from the German Ministry for Science and Education (BMBF), grant number 01KI2016. M.R.S. was supported by funds from the Ministry for Economic Affairs and Climate action (BMWK), Daten- und KI gestütztes Frühwarnsystem zur Stabilisierung der deutschen Wirtschaft, with grant number 01MK21009H, and N.A.R., N.G., C.S. and M.v.K. acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – The Berlin Mathematics Research Center MATH+ (EXC-2046/1, project ID: 390685689). S.F. and T.W acknowledge funding through the European Center for Disease Prevention and Control, grant number 2021/008 ECD.122222, and T.W. acknowledges funding through grant D82015 by the German Ministry of Health (BMG). The funders had no role in designing the research or the decision to publish.
Funding
Open access funding provided by Robert Koch-Institut.
Author information
Authors and Affiliations
Contributions
Conceptualization: D.-Y.O. and M.v.K.; methodology: N.A.R., N.G., D.B., M.R.S., C.M., C.S. and M.v.K.; investigation: N.A.R., N.G., D.B., M.R.S. and S.P.; writing original draft: N.A.R., N.G., D.B., M.R.S., M.B., M.H., S.P. and M.v.K.; funding acquisition: S.F., T.W. and M.v.K.; supervision: R.D., T.W., M.H. and M.v.K.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Utilization of DMS data and comparison of DMS-derived fold resistances with titer changes from mono- and polyclonal sera.
a. Theoretical binding curve of an antibody a to a wild-type epitope of SARS-CoV-2 (black line) and corresponding binding curve to an epitope that is mutated at site s (red line). The black dot and red circle mark the concentrations (x-axis) where the binding is half maximal (y-axis) for the wild type IC50DMS(a) and mutant virus IC50DMS(a,s) respectively. The ‘fold resistance’ (red-black dashed line) denotes the shift in the IC50, such that IC50DMS(a,s) = FR(a, s) · IC50DMS(a). The red square marks the DMS-measured unbound fraction (escape fraction ef(s, a)) and the upward-pointing grey arrow the concentration at which the DMS experiment was conducted. Both, IC50DMS(a), ef(s, a) were measured in the original DMS experiment at an antibody concentration of 400 μg/mL. However, the same DMS experiment, performed with a more- (panel b.) or less (panel c.) potent antibody would yield a smaller, respectively bigger escape fraction, while the phenotypic effect FR(a, s) of the mutation s is quantitatively identical. To avoid these artifacts, we calculated FR(a, s) from the original DMS data. d. Comparison between DMS-derived fold resistances FRx,y and fold resistances derived from neutralization assays (mono-clonal). Distinct markers show the epitope classes and distinct colors the Spike-pseudo-groups. e-g. Comparison between DMS-derived changes in neutralization and neutralization titer changes (polyclonal sera). Geometric mean change in neutralization titers for Alpha (e.), Delta (f.) and BA.2 (g.) after exposure to the Wuhan antigen are highlighted by dots (± standard deviation), whereas DMS-based model predictions are shown as a blue vertical line and were computed as outlined in17,62. Raw data on neutralization titer changes21,63,64,65,66,67,68,69,70,71,72,73. h. Normalized antibody pharmacokinetics after antigen exposure. Parameters and equations given in the Methods section.
Extended Data Fig. 2 Spike-pseudo-group dynamics in Germany over the investigated time horizon.
For readability we only plot pseudo-groups (groups of lineages with identical mutation profiles in the spike protein) that appear at a frequency of > 1%.
Extended Data Fig. 3 Integration of wastewater virus loads with GInPipe’s incidence estimates and validation for Germany.
The upper panel shows the number of viral sequences per day in Germany (black bars = raw data, black line = 7-days rolling average). The middle panel depicts the estimation of the GInPipe-based minimal number of SARS-CoV-2 infected individuals per day in Germany (red area; description in the Methods) and viral load measurements from wastewater, linearly scaled to the GInPipe-based incidence (blue area). The daily reported cases are given by the black and grey dashed line. The lower panel shows the estimated under-reporting factor (ratio of incidence estimate vs. reported cases) on a log10 scale. The red line shows under-reporting estimates for GInPipe for the time with sufficient sequencing data and continuing in blue with the scaled wastewater incidence estimates. The light blue and orange points, together with the respective smoothed lines, show the under-reporting factor estimated with GrippeWeb (citizen-science) data using the virologic positivity rate approach (GW VPR) and the self-reported testing approach (GW SR)74. The vertical red line indicates the chosen cut date at 01. April 23, when the available sequence data declined drastically from (> 10,000 to ≤ 100 sequences/week).
Extended Data Fig. 4 Validation of genome-based incidence estimates, using data from the UK.
a. Average percentage of the population testing positive from the COVID-19 Infection Survey of the Office for National Statistics (ONS; ground truth data). Coloured lines depict the percentage for each UK constituent country. The black dashed line shows the average PCR-positivity (over all constituent countries) for the UK. b. Estimation of the SARS-CoV-2 positive population in the UK using ONS data (black line; ground truth), GInPipe’s estimate (red) scaled to the population (see Methods) and reported cases (grey), scaled to the population. A right-sided rolling sum of 10 days is applied to the reporting data and GInPipe’s estimates to approximate PCR-positivity. c. Under-reporting factor calculated as the ratio of the ONS data and the SARS-CoV-2 positive population fraction using reporting data (grey), and GInPipe’s incidence estimate (red). In line with previous work75, we performed separate GInPipe-incidence reconstructions for pre-omicron vs. omicron sequences (because of different evolutionary signals76,77,78,79, which are utilized in GInPipe22). The pre-omicron vs. omicron estimates were then scaled and added together to compute total incidence.
Extended Data Fig. 5 Incidence reconstruction for all considered countries.
For each country, the upper panel shows the output of GInPipe (i.e., incidence correlates ϕ). The dots show the point estimates for each temporal sequence bin, while the red line depicts the smoothed incidence correlate ϕ (bandwith = 14). The black-grey line shows the reported (smoothed) SARS-CoV-2 cases for the respective country, acquired from OWID80. The lower panel shows the number of sequences per day in black bars, together with the cumulative number of sequences as a grey line.
Extended Data Fig. 6 Accuracy of predictions per country and lineage (related to Fig. 4) and impact of BA.4/5 boosters.
a. Each dot represents the accuracy of a specific lineage and country combination. Accuracy is determined by partitioning the frequency curve πy into days of rising (1) and falling (−1) trends, then comparing these with corresponding predictions γy: If the full envelope is positive, the prediction is rising (1) if the full envelope is negative, the prediction is falling (−1): Days with negligible frequency changes or undecided predictions (envelopes with both positive and negative values) are excluded from the analysis. b. Predicted impact of bivalent (BA4/5) booster-vaccinations on the relative number of susceptibles (relative variant fitness) γy in Germany. Top left: Timeline of bivalent booster vaccine uptake in Germany (https://impfdashboard.de/) and average number of susceptibles (averaged with regards to the circulating ‘Spike pseudo-groups’ at the time). Remaining panels: Predicted relative fitness γy for major circulating variants in Germany March 2022–April 2023, without considering vaccinations (green area) and when considering both the timeline of infection and vaccinations respectively (red outlines).
Extended Data Fig. 7 Predicted impact of ≥3rd booster-vaccinations on the relative number of susceptibles (relative variant fitness) γy in Germany.
Top left: Timeline of booster vaccine uptake in Germany (https://impfdashboard.de/). Top right: Average number of susceptibles (averaged with regards to the circulating ‘Spike pseudo-groups’ at the time). Predicted relative fitness γy for major circulating variants in Germany March 2022–April 2023, without considering vaccinations (green area) and when considering both the timeline of infection and vaccinations respectively (red outlines).
Extended Data Fig. 8 Sensitivity of the developed method to diminishing data input.
We down-sampled the sequence data to 1% for Germany and superimposed model predictions for the relative variant fitness γy using the full dataset (green areas) vs. 1% of the data (orange areas).
Extended Data Fig. 9 Estimation of the number of infected individuals in Germany (left, red) and the UK (right, blue).
The upper panels show the estimated incidence for Germany with GInPipe and for UK using the ONS survey data over a time horizon of 13.5 months (roughly a year). The grey parts of the areas depict the officially reported cases, respectively. In the lower panels, the cumulative sum of the number of infected individuals over time are compared to the population size of the respective country, denoted by the black dashed line. The scaling factor k is given by the ratio of the population size and the sum of infected individuals within the time horizon.
Extended Data Fig. 10 Utilization of incidence correlates.
The figure shows how utilization of incidence correlates may affect relative fitness estimates γy (magenta dots), depending on the choice of k (k−1 = estimate of the expected frequency of infection in the population over the time horizon of interest). Essentially, if k is correctly predicted (vertical dashed line), γy is also correctly estimated with incidence correlates ϕ(t) (in comparison to computing γy with actual infection numbers I(t); solid black horizontal line). If k is over- or under-estimated γy gets scaled (with a factor ξ). If k is underestimated, the region’s population gets under-estimated, which results in under-estimating the number of susceptibles (over-prediction of competition between variants. Note, that the qualitative results are correct in any case (inflection points γy = 0; variant ‘growth’ γy > 0 vs. variant ‘decline’ γy < 0).
Supplementary information
Supplementary Information
Supplementary Tables 1–6, containing the definition of epitope classes and their assigned antibodies (Supplementary Tables 1 and 2), spike alteration profiles for vaccine efficacy simulations (Supplementary Table 3), information regarding clinical vaccine efficacies (Supplementary Tables 4 and 5) and a summary for international viral genomics datasets (Supplementary Table 6), and Supplementary Note 1 with GISAID acknowledgements.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Raharinirina, N.A., Gubela, N., Börnigen, D. et al. SARS-CoV-2 evolution on a dynamic immune landscape. Nature 639, 196–204 (2025). https://doi.org/10.1038/s41586-024-08477-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41586-024-08477-8
This article is cited by
-
Demultiplexing and barcode-specific adaptive sampling for nanopore direct RNA sequencing
Nature Communications (2025)