Main

The trajectory of the SARS-CoV-2 pandemic in SSA is uncertain. To date, reported case counts and mortality in SSA have lagged behind other geographical regions. All SSA countries, with the exception of South Africa and Ethiopia, reported fewer than 100,000 total cases and fewer than 1,800 deaths as of December 2020 (Supplementary Table 1)—totals far lower than observed in Asia, Europe and the Americas (Africa Centres for Disease Control and Prevention (Africa CDC) COVID-19 daily updates https://africacdc.org/covid-19/, Johns Hopkins Coronavirus Resource Center https://coronavirus.jhu.edu/data/mortality). However, variation in reporting between countries and some seroprevalence surveys that suggested high rates of local infection5,6,7 make it unclear if the relatively few reported cases and deaths to date indicate a generally reduced epidemic potential in SSA8.

Comparisons across SSA populations based on reported infection rates are obscured by heterogeneity in surveillance capacity (for example, variation in testing rates among countries) and correlation between surveillance and infection reporting9 (Extended Data Fig. 1). Combining reported death counts with assumptions about the probability of mortality given infection2 yields generally low estimates of the percentage of the population expected to have been infected (that is, <10%) but this varies more than tenfold between SSA countries and, critically, is sensitive to assumptions about the death reporting rate (Fig. 1a) and infection fatality ratio (IFR; Fig. 1b). Serology provides an alternative and more direct measure of the percentage infected. Initial serological studies of blood banks in Kenya (5–10%)5, health care workers in urban Malawi (9–16%)7 or from Niger State in Nigeria (20–30%)6 indicate that infection rates could be higher in some settings, but only the latter was designed as a representative sample and serology-based estimates are sparse in SSA.

Fig. 1: Variation in the cumulative percentage of the population infected in SSA countries as expected from reported mortality totals.

a,b, The expected percentage of a country’s population infected given the number of reported deaths to date, country-specific age structure and a range of death reporting completeness scenarios (a), or a range of IFR scenarios (b). The global IFR age curves were fitted to existing age-stratified IFR estimates (Methods and Supplementary Table 4) and shifted toward younger or older ages by the specified number of years to simulate higher or lower IFRs, respectively (b). Conservatively, we assumed no variation in infection rates by age. (Infections skewed toward older age groups would result in a higher average IFR and thus a lower expected percentage of the population infected for a given number of deaths.) Reported case and death counts are current as of December 2020 (sourced from the Africa CDC; Supplementary Table 1). Data from Eritrea and the Seychelles are not shown as they have zero reported deaths as of December 2020. Comparisons to serological surveys (unfilled triangles) available from blood banks in Kenya5, health care workers in urban Malawi7 and a subnational cluster-stratified random sample from Niger State in Nigeria6 are shown.
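The back-calculation behind Fig. 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the age bands and the IFR values are hypothetical, and it assumes, as stated in the caption, that infection rates do not vary by age.

```python
def expected_percent_infected(reported_deaths, population_by_age, ifr_by_age,
                              death_reporting_rate=1.0):
    """Back-calculate the percentage of a population infected from reported deaths.

    Assumes infection rates are uniform across age groups, so the average IFR
    is the population-weighted mean of the age-specific IFRs.
    """
    if not 0.0 < death_reporting_rate <= 1.0:
        raise ValueError("death_reporting_rate must be in (0, 1]")
    if len(population_by_age) != len(ifr_by_age):
        raise ValueError("population_by_age and ifr_by_age must have equal length")

    total_population = sum(population_by_age)
    # Population-weighted average IFR (proportion of infections that are fatal).
    average_ifr = sum(p * f for p, f in zip(population_by_age, ifr_by_age)) / total_population
    # Scale reported deaths up by the assumed completeness of death reporting.
    estimated_deaths = reported_deaths / death_reporting_rate
    estimated_infections = estimated_deaths / average_ifr
    return 100.0 * estimated_infections / total_population


# Hypothetical country with three age bands (values are illustrative only).
population = [600_000, 300_000, 100_000]
ifr = [0.0001, 0.002, 0.02]
full_reporting = expected_percent_infected(133, population, ifr)
half_reporting = expected_percent_infected(133, population, ifr, death_reporting_rate=0.5)
```

Halving the assumed death reporting rate doubles the expected percentage infected, which is why the estimates in Fig. 1a are so sensitive to reporting completeness.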

Given limitations in inference from direct measures of infection and death rates, experience from locations where the pandemic has progressed more rapidly provides a valuable basis of knowledge to assess the relative risk of populations in SSA and identify those at the greatest risk. For example, individuals in lower socioeconomic settings have been disproportionately affected in high latitude countries10,11, indicating poverty as a determinant of risk of increased severity of disease. Widespread disruptions to routine health services have been reported12,13,14 and are likely to contribute to the burden of the pandemic in SSA15. The role of other factors, from demography2,3,4 to health system context16 and intervention timing17,18, is also increasingly well-characterized. A summary of the main findings and limitations of the study is shown in Table 1.

Table 1 Policy summary

Characterizing and anticipating the trajectory of ongoing outbreaks in SSA requires considering variability in known drivers and how they might interact to increase or decrease risk across populations in SSA and relative to non-SSA settings (Fig. 2). For example, while most countries in SSA have a relatively young population age structure, suggesting a decreased burden (since SARS-CoV-2 morbidity and mortality increase with age2,3,4), prevalent infectious and noncommunicable comorbidities could counterbalance this apparent demographic advantage16,19,20,21. Similarly, SSA countries have health systems that vary greatly in their infrastructure, and dense, resource-limited urban populations could have fewer options for social distancing22. Yet, decentralized, community-based health systems that benefit from past experience with epidemic response (for example, to Ebola23,24) can be mobilized. Climate is frequently invoked as a potential mitigating factor for warmer and wetter settings1, including SSA, but climate varies greatly between population centers in SSA, and large susceptible populations could counteract any climate forcing during initial phases of the epidemic25. Connectivity, at international and subnational scales, also varies greatly26,27 and the time interval between viral introductions and the onset of interventions, such as lockdowns, will modulate the trajectory9. Finally, burdens of malnutrition, infectious diseases and many other underlying health conditions are higher in SSA than in other regions (Supplementary Table 2) and their interactions with SARS-CoV-2 are, as yet, poorly understood; conversely, cross-protection against either SARS-CoV-2 infection or disease as a result of previous infection by widespread circulating coronaviruses is a possibility.

Fig. 2: Hypothesized modulators of relative SARS-CoV-2 epidemic risk in SSA.

The factors (A–F) hypothesized to increase (red) or decrease (blue) mortality burden or epidemic pace within SSA, relative to global averages, are grouped into six categories or dimensions of risk. In this framework, epidemic pace is determined by person-to-person transmissibility, which can be defined as the time-varying effective reproductive number, Rt, and introduction and geographical spread of the virus via human mobility. SARS-CoV-2 mortality (determined by the IFR) is modulated by demography, comorbidities (for example, noncommunicable diseases) and access to care. Overall burden is a function of direct burden and indirect effects due to, for example, socioeconomic disruptions and disruptions in health services, such as vaccination and infectious disease control. Supplementary Table 2 contains the details and references used as a basis to draw the hypothesized modulating pathways.

The highly variable social and health contexts of countries in SSA will drive location-specific variation in the magnitude of the burden, the time course of the outbreak and options for mitigation. In this study, we synthesized the range of factors hypothesized to modulate the potential outcomes of SARS-CoV-2 outbreaks in SSA settings by leveraging existing data sources and integrating new SARS-CoV-2-relevant mobility and climate transmission models. Data on direct measures and indirect indicators of risk factors were sourced from publicly available databases including from the World Health Organization (WHO), World Bank, United Nations Population Division, Demographic and Health Surveys (DHS-USAID), Global Burden of Diseases and WorldPop, and newly generated datasets (Extended Data Fig. 2 and Supplementary Table 3). We organized our assessment around two aspects that will shape national outcomes and response priorities in the event of widespread outbreaks: (1) the burden, or expected severity of the outcome of an infection, which emerges from age, comorbidities and health systems functioning; and (2) the rate of spread within a geographical area or pace of the pandemic.

We grouped factors that might drive the relative rates of these two features (mortality burden and pace of the outbreak) along six dimensions of risk: (1) demographic and socioeconomic parameters related to transmission and burden; (2) comorbidities relevant to burden; (3) climatic variables that may impact the magnitude and seasonality of transmission; (4) prevention measures deployed to reduce transmission; (5) accessibility and coverage of existing healthcare systems to reduce burden; and (6) patterns of human mobility relevant to transmission (Supplementary Table 2).

National scale variability in SSA among these dimensions of risk often exceeds ranges observed across the globe (Fig. 3a–d and Extended Data Fig. 3). For example, estimates of access to basic handwashing (that is, clean water and soap28) among urban households in Mali, Madagascar, Tanzania and Namibia (62–70%) exceed the global average (58%) but are <10% for Liberia, Lesotho, Democratic Republic of the Congo and Republic of Guinea-Bissau (Fig. 3d). Conversely, the range in the number of physicians is low in SSA, with all countries other than Mauritius below the global average (168.78 per 100,000 population) (Fig. 3a). Yet, estimates are still heterogeneous within SSA, with, for example, Gabon estimated to have more than 4 times the physicians of neighboring Cameroon (36.11 and 8.98 per 100,000 population, respectively). This disparity is likely to interact with social contact rates among the aged in determining exposure and clinical outcomes (for variation in household size, see Fig. 3e,f). Relative ranking across variables is also uneven among countries (Extended Data Fig. 4) with the result that this diversity cannot be easily reduced (for example, the first two principal components explain only 32.6 and 13.1% of the total variance as shown in Extended Data Fig. 5), indicating that approaches reliant on a small subset of variables will fail to capture the observed variation among SSA countries.

Fig. 3: Variation among SSA countries in select determinants of SARS-CoV-2 risk.

a–d, Right, SSA countries were ranked from least to greatest for each indicator; the bar color shows the population age structure (percentage of the population above the age of 50). The solid horizontal lines show the global mean value and the dotted lines show the mean among SSA countries. Left, The boxplots show the median, the inner bounds correspond to the interquartile range (IQR, 25th to 75th percentiles) and the outer bounds correspond to the 1.5 × IQR, grouped by WHO-defined geographic regions. SSA, Sub-Saharan Africa; AMR, Americas Region; EMR, Eastern Mediterranean Region; EUR, Europe Region; SEA, Southeast Asia Region; WPR, Western Pacific Region (n = 206, 172, 106 and 92 countries with available data for a–d, respectively). e,f, Bivariate comparisons of the variables shown in a,b and c,d, respectively. The dot size shows the mean household size for households with individuals aged over 50, the dashed lines show the median value among SSA countries and the quadrants with the greatest risk are outlined in red (for example, fewer physicians and greater age-standardized chronic obstructive pulmonary disease mortality). See Supplementary Table 3 and Extended Data Fig. 3 for a full description and link to visualization of all variables.

To first evaluate variation in the burden emerging from the severity of infection outcome, we considered how demography, comorbidity and access to care might modulate the age profile of SARS-CoV-2 morbidity and mortality2,3,4. Subnational variation in the distribution of high-risk age groups indicates considerable variability, with higher burden expected in urban settings in SSA (Fig. 4a), where density and thus transmission are likely higher29.

Fig. 4: Variation in expected burden for SARS-CoV-2 outbreaks in SSA.

a, Expected mortality in a scenario where cumulative infection reaches 20% across age groups and the IFR curve is fitted to existing age-stratified IFR estimates (Methods and Supplementary Table 4). b, National-level variation in comorbidity and access to care variables, for example, diabetes prevalence among adults and the number of hospital beds per 100,000 population for SSA countries. c, Range in mortality per 100,000 population expected in scenarios where the cumulative infection rate is 20% and IFR per age is the baseline (black) or shifted ±2, 5 or 10 years (gray). Inset, The IFR by age curves for each scenario are shown. d,e, Selected national-level indicators; estimates of reduced access to care (for example, fewer hospitals) (d) or increased comorbidity burden (for example, higher prevalence of raised blood pressure) (e) shown with darker red for higher-risk quartiles (see Extended Data Fig. 4 for all indicators). Countries missing data for an indicator are shown in gray. For comparison between countries, estimates are age-standardized where applicable (Supplementary Table 3). High-resolution maps for each variable and scenario are available at the SSA-SARS-CoV-2-tool for estimating the burden of SARS-CoV-2 in SSA (https://labmetcalf.shinyapps.io/covid19-burden-africa/).

Comorbidities and access to clinical care also vary across SSA (for example, for diabetes prevalence and hospital bed capacity, see Fig. 4b). In comparison to settings where previous SARS-CoV-2 IFR estimates have been reported, mortality due to noncommunicable diseases in SSA increases more rapidly with age (Extended Data Fig. 6) suggesting risk for an elevated IFR in some SSA settings. Conversely, an analysis of the reported age-specific death data available from Kenya and South Africa suggested low IFRs in comparison to non-SSA countries30. Comparison of empirical age profiles of mortality more broadly across SSA is currently limited by the small number of total deaths reported to date for many countries (for example, 33 of 48 SSA countries have reported fewer than 200 total deaths as of December 2020) and incomplete associated age data. Consequently, we used the global IFR by age estimates and explored the potential effect on mortality of deviations from the expected baseline IFR in diverse SSA settings.

Small shifts (for example, of 2–10 years of age) in the IFR profile resulted in large effects on expected mortality for a given level of infection. For example, Chad, Burkina Faso and the Central African Republic, while among the youngest SSA countries, have a high prevalence of diabetes and low density of hospital beds. Given the age structure of these countries, a slight shift in the IFR by age profile toward higher mortality in middle-aged groups (for example, ages 50–60 years) would result in mortality increasing to a rate that would exceed a majority of the other, relatively older SSA countries at the unshifted baseline (Fig. 4c and Methods). Generally, minor shifts in the IFR lead to differences larger than the magnitude of the difference expected from differing age structures for countries in SSA.
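The sensitivity of expected mortality to shifts in the IFR profile can be illustrated with a short sketch. The log-linear IFR curve and its two parameters below are hypothetical stand-ins chosen for illustration; they are not the curve fitted in this study (Methods and Supplementary Table 4). A positive shift moves the curve toward younger ages, so that a given age experiences the IFR of someone that many years older.

```python
import math

# Hypothetical log-linear IFR curve: log(IFR) = intercept + slope * age.
# These values are illustrative assumptions, not the fitted estimates.
LOG_IFR_INTERCEPT = -10.0
LOG_IFR_SLOPE = 0.1


def ifr_at_age(age, shift_years=0.0):
    """IFR at a given age; shift_years > 0 shifts the curve toward younger ages."""
    return min(1.0, math.exp(LOG_IFR_INTERCEPT + LOG_IFR_SLOPE * (age + shift_years)))


def deaths_per_100k(age_shares, attack_rate=0.20, shift_years=0.0):
    """Expected deaths per 100,000 population for a uniform cumulative infection rate.

    age_shares maps an age-band midpoint to that band's share of the population.
    """
    total_share = sum(age_shares.values())
    mean_ifr = sum(share * ifr_at_age(age, shift_years)
                   for age, share in age_shares.items()) / total_share
    return 100_000 * attack_rate * mean_ifr


# Hypothetical young population (shares by age-band midpoint).
young_population = {10: 0.50, 30: 0.30, 55: 0.15, 75: 0.05}
baseline = deaths_per_100k(young_population)
shifted_5 = deaths_per_100k(young_population, shift_years=5)
```

Under this assumed curve, a 5-year shift multiplies expected mortality by exp(0.5), roughly 1.65, regardless of age structure, which gives a sense of how a shift of only a few years can outweigh demographic differences between countries.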

Although there is greater access to care in older populations by some metrics (Fig. 3a; correlation between age and the number of physicians per capita, r = 0.896, P < 0.001), access to clinical care is highly variable overall (Fig. 4d) and maps poorly to indicators of comorbidity (Fig. 4e). Empirical data are urgently needed to assess the extent to which the IFR-age-comorbidity associations observed elsewhere are applicable to SSA settings with reduced access to advanced care. Yet both surveillance and mortality registration31 are frequently under-resourced in SSA, complicating both evaluating and anticipating the burden of the pandemic and underscoring the urgency of strengthening existing systems24.

The frequency of viral introduction to each country, likely governed by international air travel in SSA32, determines both the timing of the first infections and the number of initial infection clusters that seed subsequent outbreaks. The relative importation risk among SSA cities and countries was assessed by compiling data from 108,894 flights arriving at 113 international airports in SSA from January to April 2020 (Fig. 5a), stratified by the SARS-CoV-2 status at the departure location on the day of travel (Fig. 5b). A small subset of SSA countries received a disproportionately large percentage (for example, South Africa, Ethiopia, Kenya and Nigeria together contributed 47.9%) of the total travel from countries with confirmed SARS-CoV-2 infections, which likely contributed to variation in the pace of the pandemic across settings and is consistent with those 4 countries together contributing 74.3% of all reported cases in SSA as of 20 December 2020 (refs. 32,33).

Fig. 5: Variation in connectivity and climate in SSA and expected effects on SARS-CoV-2.

a, International travelers to SSA from January to April 2020, as inferred from the number of passenger seats on arriving aircraft. b, For the 4 countries with the most arrivals, the proportion of arrivals by month coming from countries with 0, 1–100, 101–1,000 and >1,000 reported SARS-CoV-2 infections at the time of travel (see Supplementary Table 5 for all others) is shown. c, Connectivity within SSA countries as inferred from average population-weighted mean travel time to the nearest urban area with a population >50,000. d, Mean travel time at the national level and variation in the fraction of the population expected to be infected (I/N) in the first year from stochastic simulations (Methods). e, Climate variation across SSA as shown by seasonal range in specific humidity, q (g kg−1) (max average q − min average q). f, The effect of local seasonality and control efforts (R0 decreases by 0%, that is, unmitigated, 10 or 20%) on the timing of epidemic peaks (max I/N) in SSA cities (with three exemplar cities highlighted in pink; Methods).

Once local chains of infection are established, the rate of spread within countries will be shaped by efforts to reduce spread, such as handwashing and other non-pharmaceutical interventions (Fig. 3d), population contact patterns including mobility and urban crowding29 (Fig. 3c) and potentially the effect of climatic variation1. Where countries fall across this spectrum of pace will shape interactions with lockdowns and determine the length and severity of disruptions to routine health system functioning.

Subnational connectivity varies greatly across SSA, both between subregions of a country and between cities and their rural periphery (for example, as indicated by travel time to the nearest city with a population over 50,000; Fig. 5c). As expected, in stochastic simulations using estimates of viral transmission parameters and mobility (Extended Data Fig. 7), a smaller cumulative proportion of the population is infected at a given time in countries with larger populations in less connected subregions (Fig. 5d and Extended Data Fig. 8); including non-pharmaceutical interventions reduces this proportion still further. At the national level, susceptibility declines more slowly and more unevenly in such settings (for example, Ethiopia, South Sudan, Tanzania) due to a lower probability of introductions and reintroductions of the virus locally, an effect amplified by lockdowns (Extended Data Fig. 9). It is unclear whether the more prolonged, asynchronous epidemics expected in these countries or the overlapping, concurrent epidemics expected in countries with higher connectivity (for example, Malawi, Kenya, Burundi) will be a greater stress to health systems. Outbreak control efforts are likely to be further complicated during prolonged epidemics if they intersect with seasonal events such as temporal patterns in human mobility29 or other infections (for example, malaria).

Despite extreme climatic variation among cities in SSA (Fig. 5e), large epidemic peaks are expected in all cities (Fig. 5f), even from our models incorporating interventions and transmission rates that decline in response to warmer, more humid local climates (climate-dependent variation in transmission rate for coronaviruses inferred from endemic circulation in the United States, but robust to parameter value choice; Methods). After accounting for differences in the date of introductions, simulated climate forcing generates a maximum of only 6–7 weeks’ variation in the time to epidemic peaks, with peaks generally expected earlier in more southerly, colder, drier cities (for example, Windhoek and Maseru) and later in more humid, coastal cities (for example, Bissau, Lomé and Lagos). Reductions in transmission due to control efforts, as expected, prolong the time to epidemic peak (Fig. 5f and Extended Data Fig. 10). Apart from these slight shifts in timing, the large proportion of the population that is susceptible overwhelms the effects of climate25; earlier suggestions that Africa’s generally more tropical environment alone may provide a protective effect1 are not supported by the evidence.

Our synthesis emphasizes striking country-to-country variation in the drivers of the pandemic in SSA (Fig. 3), indicating that continued variation in the burden (Fig. 4) and pace (Fig. 5) is to be expected even across low-income settings. Since small perturbations in the age profile of mortality could drastically change the national-level burden in SSA (Fig. 4), building expectations for the risk for each country requires monitoring for deviations in the pattern of morbidity and mortality over age. Transparent and timely communication of these context-specific risk patterns could aid community engagement in efforts to reduce transmission, help motivate population behavioral changes and guide existing networks of community case management.

Because the largest impacts of SARS-CoV-2 outbreaks may be through indirect effects on routine health provisioning, understanding how existing programs may be disrupted differently by acute versus longer outbreaks is crucial to planning resource allocation. For example, population immunity will decline proportionally with the length of disruptions to routine vaccination programs34, resulting in more severe consequences in areas with prolonged epidemic time courses.

Others have suggested that this crisis presents an opportunity to unify and mobilize across existing health programs (for example, for human immunodeficiency virus (HIV), tuberculosis, malaria and noncommunicable diseases)24. Although this might be a powerful strategy in the context of acute, temporally confined crises, long-term distraction and diversion of resources35 could be harmful in settings with extended, asynchronous epidemics. A higher risk of infection among healthcare workers during epidemics36,37 could also amplify this risk.

As evidenced by failures in locations where the epidemic progressed rapidly (for example, United States), effective governance and management before reaching large case counts will likely yield the largest rewards. Generalizing across SSA is difficult because the time course and estimates of the effect of intervention policies have varied greatly (Extended Data Fig. 9); however, Mauritius and Rwanda, for example, have reported extremely low incidence thanks in part to a well-managed early response.

The burden and time course of SARS-CoV-2 are expected to be highly variable across SSA. Simulations show that variation in international and subnational connectivity is expected to be an important determinant of pace, but variability in reporting regimes makes it difficult to compare observations to date with expectations (Extended Data Fig. 7). As the outbreak continues to unfold, critically evaluating this mapping (Extended Data Fig. 8) can focus surveillance efforts on areas expected to have prolonged epidemic trajectories and high mortality burdens. The emergence and rapid spread in southern Africa of lineage B.1.351, with multiple spike protein mutations including the N501Y mutation associated with increased transmission rate in the UK lineage B.1.1.7, indicates the importance of genomic surveillance of transmission foci in SSA38. Additional immunological surveys and country-specific analyses of the age profile of mortality are urgently needed in SSA and will likely be a powerful lens for understanding the current landscape of population risk39. When considering hopeful futures with the distribution of SARS-CoV-2 vaccines, it is imperative that vaccine distribution be equitable and in proportion with need. Understanding the factors that drive both spatial variation in vulnerable populations and temporal variation in pandemic progression could help approach these goals in SSA.

Methods

Reported SARS-CoV-2 case counts, mortality and testing in SSA as of December 2020

Variables and data sources for reporting data

The numbers of reported cases, deaths and tests for the 48 SSA countries studied (Supplementary Table 1) were sourced from the Africa CDC dashboard on 20 December 2020 (and previously on 23 September and 30 June 2020). The Africa CDC obtains data from the official Africa CDC Regional Collaborating Centre and member state reports. Differences in the timing of reporting by member states result in some variation in the recency of data within the centralized Africa CDC repository, but data should broadly reflect the relative scale of testing and reporting efforts across countries. For Mauritius (https://covid19.mu/) and Rwanda (https://covid19.who.int/region/afro/country/rw), reporting to the Africa CDC was confirmed by comparison to country-specific dashboards.

The countries or member states within SSA in this study follow the United Nations and Africa CDC-listed regions of Southern, Western, Central and Eastern Africa (excluding Sudan). From the Northern Africa region, Mauritania is included in SSA.

For comparison to non-SSA countries, the number of reported cases in other geographical regions was obtained from the Johns Hopkins University Coronavirus Resource Center on 23 September 2020 (https://coronavirus.jhu.edu/map.html).

Case fatality ratios (CFRs) were calculated by dividing the number of reported deaths by the number of reported cases and expressed as a percentage. Positivity was calculated by dividing the number of reported cases by the number of reported tests. Testing and case rates were calculated per 100,000 population using population size estimates for 2020 from the United Nations Population Division (https://population.un.org/wpp/Download/Standard/Population/). Since reported confirmed cases are likely to be an underestimate of the true number of infections, CFRs may be a poor proxy for the IFR, defined as the proportion of infections that result in mortality4.
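The reporting metrics defined above reduce to a few ratios. The sketch below is illustrative only; the function name, dictionary keys and input values are hypothetical and do not correspond to any country in Supplementary Table 1.

```python
def reporting_metrics(cases, deaths, tests, population):
    """Compute CFR (%), test positivity and per-100,000 rates from reported totals.

    Returns None for a ratio whose denominator is zero (for example, no reported tests).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return {
        # Case fatality ratio: reported deaths / reported cases, as a percentage.
        "cfr_percent": 100.0 * deaths / cases if cases else None,
        # Positivity: reported cases / reported tests, as a proportion.
        "positivity": cases / tests if tests else None,
        # Rates per 100,000 population.
        "tests_per_100k": 100_000 * tests / population,
        "cases_per_100k": 100_000 * cases / population,
    }


# Hypothetical reported totals for a country of 5 million people.
metrics = reporting_metrics(cases=2_000, deaths=50, tests=40_000, population=5_000_000)
```

Because the CFR denominator is reported cases rather than true infections, low testing rates inflate the CFR relative to the IFR.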

Variation in testing and mortality rates

Testing rates among SSA countries varied by multiple orders of magnitude as of 30 June and remain highly variable as of 23 September and 20 December 2020. The number of tests completed per 100,000 population ranged from 19.84 in Burundi to 13,508.13 in Mauritius in June 2020; from 65.98 in the Democratic Republic of the Congo to 18,321.83 in Mauritius in September 2020; and from 100.9 in the Democratic Republic of the Congo to 23,695.0 in Mauritius in December 2020 (Extended Data Fig. 1a). Tanzania (6.50 tests per 100,000 population) has not reported new tests, cases or deaths to the Africa CDC since April 2020. The number of reported infections (that is, positive tests) was strongly correlated with the number of tests completed in June 2020 (Pearson’s correlation coefficient, r = 0.9667, P < 0.001), September 2020 (r = 0.9689, P < 0.001) and December 2020 (r = 0.9750, P < 0.001) (Extended Data Fig. 1b). As of June 2020, no deaths due to SARS-CoV-2 had been reported to the Africa CDC for five SSA countries (Eritrea, Lesotho, Namibia, Seychelles, Uganda). As of December 2020, two of those countries (Eritrea and Seychelles) had still reported no deaths due to SARS-CoV-2 to the Africa CDC. Among countries with at least 1 reported death, the CFR varied from 0.22% in Rwanda to 8.54% in Chad in June 2020; from 0.21% in Burundi to 6.96% in Chad in September 2020; and from 0.26% in Burundi to 5.40% in Chad in December 2020 (Extended Data Fig. 1c). Limitations in the ascertainment of infection rates and the rarity of reported deaths (for example, the median number of reported deaths per SSA country was 25.5 as of June 2020, 71.0 as of September 2020 and 101.0 as of December 2020) indicate that the data are insufficient to determine country-specific IFRs and IFR by age profiles for most countries. As a result, global IFR by age estimates were used for the subsequent analyses in this study.
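For readers who wish to reproduce the correlation between tests and reported infections from Supplementary Table 1, Pearson's r can be computed as sketched below. This is a generic textbook implementation; whether the counts were transformed before the correlation was calculated is not stated here, so the sketch assumes raw values, and the input numbers are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tests completed and positive tests for four countries.
tests_completed = [1_000, 5_000, 20_000, 80_000]
positive_tests = [40, 260, 900, 4_100]
r = pearson_r(tests_completed, positive_tests)
```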

Synthesizing factors that increase or decrease SARS-CoV-2 epidemic risk in SSA

Variable selection and data sources for variables associated with an increased probability of severe clinical outcomes for an infection

To characterize epidemic risk, defined as potential SARS-CoV-2 related morbidity and mortality, we first synthesized factors hypothesized to influence risk in SSA settings (Supplementary Table 2). Early during the pandemic, evidence suggested that age was an important risk factor for morbidity and mortality associated with SARS-CoV-2 infection40, a pattern subsequently confirmed across settings2,11,41. Associations between SARS-CoV-2 mortality and comorbidities including hypertension, diabetes and cardiovascular disease emerged early40 and have been observed across settings, with further growing evidence for associations with obesity11,42, severe asthma11 and the respiratory effects of pollution43. Specific to Africa, vulnerability scores based on these hypothesized associations or combinations of risk factors have been developed (for example, refs. 44,45).

Many possible sources of bias complicate interpretation of these associations46; while they provide a useful baseline, inference is also likely to change as the pandemic advances. To reflect this, our analysis combined a number of high-level variables likely to broadly encompass these putative risk factors (for example, noncommunicable disease-related mortality and healthy life expectancy) with more specific measures encompassed in evidence to date (for example, prevalence of diabetes, obesity and respiratory illness, such as chronic obstructive pulmonary disease). We also included measures relating to infectious diseases, undernourishment and anemia given their interaction and effects in determining health status in these settings47. Although interactions with such infectious diseases have been suggested, evidence is limited to date, except for HIV, where effects have been suggested to be minor48. We also note that the key concern raised around such infections to date is associated with disruption to routine screening (for example, for malaria49), treatment50 or prevention programs51.

Data on the identified indicators were sourced in May 2020 from the WHO Global Health Observatory database (https://www.who.int/data/gho), World Bank (https://data.worldbank.org/) and other sources detailed in Supplementary Table 3. National-level demographic data (population size and age structure) was sourced from the United Nations World Population Prospects and data on subnational variation in demography was sourced from WorldPop27. Household size data was defined by the mean number of individuals in a household with at least 1 person aged >50 years, taken from the most recently available Demographic and Health Surveys data (https://dhsprogram.com). All country-level data for all indicators can be found online at the SSA-SARS-CoV-2 tool (https://labmetcalf.shinyapps.io/covid19-burden-africa/).

Comparisons of national-level estimates sourced from the WHO and other sources are affected by variation within countries and variation in the uncertainty around estimates from different geographical areas. To assess potential differences in data quality between geographical areas, we compared the year of the most recent data for the variables (Extended Data Fig. 2). The mean (range varied from 2014.624 to 2014.928 by region) and median year (2016 for all regions) of the most recent data varied little between regions. To account for the uncertainty associated with the estimates available for a single variable, we also included multiple variables per category (for example, demographic and socioeconomic factors, comorbidities, access to care) to avoid reliance on a single metric. This allowed exploring variation between countries across a broad suite of variables likely to be indicative of the different dimensions of risk.

Although including multiple variables that were likely to be correlated (see the principal component analysis (PCA) methods below for further discussion) would bias inference of cumulative risk in a statistical framework, we did not attempt to quantitatively combine risk across variables for a country, nor project risk based on the variables included in this study. Rather, we characterized the magnitude of variation among countries for these variables (see Fig. 3 for a subset of the variables and Fig. 4b for the bivariate risk maps following Chin et al.52) and then explored the range of outcomes that would be expected under scenarios where the IFR increases with age at different rates (Fig. 4).

Variable selection and data sources for variables modulating the rate of viral spread

In addition to characterizing variation among factors likely to modulate burden, we also synthesized data sources relevant to the rate of viral spread, or pace, for the SARS-CoV-2 pandemic in SSA. Factors hypothesized to modulate viral transmission and geographical spread include climatic factors (for example, specific humidity), access to prevention measures (for example, handwashing) and human mobility (for example, international and domestic travel). Supplementary Table 2 outlines the dimensions of risk selected and references the previous studies relevant to the selection of these factors.

Climate data were sourced from the global, gridded ERA5 dataset53 where model data were combined with global observation data (see Methods for climate-driven modeling of SARS-CoV-2 section for further details).

International flight data were obtained from a custom report from OAG Aviation Worldwide (UK) and included the departure location, arrival airport, date of travel and number of passenger seats for flights arriving to 113 international airports in SSA (see International air travel to SSA section).

As an estimate of connectivity within subregions of countries, the population-weighted mean travel time to the nearest city with a population greater than 50,000 was determined; details are provided in the section on Subnational connectivity among countries in SSA. To obtain a set of measures that broadly represent connectivity within different countries in the region, friction surfaces from Weiss et al.26 were used to obtain estimates of the connectivity between different administrative level 2 units within each country. Details of this, alongside the metapopulation model framework used to simulate viral spread with variation in connectivity are found in the Subnational connectivity section.

Figure 3 shows variation among SSA countries for four of the variables and Extended Data Fig. 3 links to visualizations of variation for all variables. Figure 4 shows variation for a subset of the comorbidity and access to care indicators as a heatmap and Extended Data Fig. 4 shows variation for all the variables (also available at https://labmetcalf.shinyapps.io/covid19-burden-africa/).

PCA of variables considered

Selection of data and variables

The 29 national-level variables from Supplementary Table 3 were selected for the PCA. We conducted further PCA on the subset of 8 indicators related to access to healthcare (category E) and the 14 national indicator variables related to comorbidities (category B).

We excluded disaggregated subnational spatial variation data (variables A2, C1, E2 and category F), disaggregated or redundant variables derived from variables already included (variables A4 and D2) and disaggregated age-specific disease data from the Institute for Health Metrics and Evaluation (IHME) global burden of disease study (variables B2, B4 and B13) from the PCA. COVID-19 tests per 100,000 population (variable D4; Supplementary Table 1), per capita gross domestic product (GDP) (variable A8) and the Gini index of wealth inequality (variable A9) were used to visualize patterns among SSA countries.

In some cases, data were missing for a country for an indicator; in these cases, missing data were replaced with a zero value. This is a conservative approach since zero values (that is, outside the range of typical values seen in the data) inflate the total variance in the dataset and thus, if anything, deflate the percentage of the variance explained by the PCA. Therefore, this approach avoids mistakenly attributing predictive value to principal components due to incomplete data. See Supplementary Table 3 for data sources for each variable.

PCA

The PCA was conducted on each of the three subsets described above using the scikit-learn library54. To avoid biasing the PCA due to large differences in magnitude and scale, each feature was centered around the mean and scaled to unit variance before the analysis. Briefly, PCA applies a linear transformation to a set of n features to output a set of n orthogonal principal components that are uncorrelated and each explain a percentage of the total variance in the dataset55. A link to the code for this analysis is available at https://labmetcalf.shinyapps.io/covid19-burden-africa/.
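The preprocessing described above (zero-filling missing values, then centering and scaling each feature) can be sketched as follows. This is a minimal standard-library illustration, not the authors' scikit-learn code: the function name `standardize`, the input layout and the example values are hypothetical, and dividing by n when computing the variance is an assumption chosen to match the convention of scikit-learn's `StandardScaler`.

```python
import math


def standardize(columns):
    """Zero-fill missing values, then center each feature and scale to unit variance.

    `columns` maps a feature name to a list of per-country values, with None
    marking a missing entry (hypothetical input layout).
    """
    scaled = {}
    for name, values in columns.items():
        # Missing entries are replaced with zero, as described in the text.
        filled = [0.0 if v is None else float(v) for v in values]
        mean = sum(filled) / len(filled)
        # Population variance (divide by n); an assumed convention.
        var = sum((v - mean) ** 2 for v in filled) / len(filled)
        sd = math.sqrt(var) or 1.0  # constant features are left at zero
        scaled[name] = [(v - mean) / sd for v in filled]
    return scaled


# Made-up values for three countries; not data from the study.
example = {
    "hospital_beds": [10.0, None, 50.0],
    "gdp_per_capita": [500.0, 1500.0, 4000.0],
}
z = standardize(example)
```

After this step every feature has mean zero and unit variance, so no single indicator dominates the principal components simply because of its units.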

The principal components were then analyzed for the percentage of variance explained and compared to: (1) the number of COVID-19 tests per 100,000 population as of the end of June 2020 (Supplementary Table 1); (2) the per capita GDP; and (3) the Gini index of wealth inequality. For the Gini index, estimates from 2008 to 2018 were available for 45 of the 48 countries (no Gini index data were available for Eritrea, Equatorial Guinea and Somalia).

The first 2 principal components from the analysis of 29 variables explain 32.6 and 13.1% of the total variance, respectively, in the dataset. Countries with higher numbers of completed SARS-CoV-2 tests reported tended to associate with an increase in principal component 1 (r = 0.67, P = 1.1 × 10^−7; Extended Data Fig. 5a). Similarly, countries with a high GDP seemed to associate with an increase in principal component 1 (r = 0.80, P = 6.02 × 10^−12; Extended Data Fig. 5b). In contrast, countries with greater wealth inequality (as measured by the Gini index) were associated with a decrease in principal component 2 (r = −0.42, P = 0.0042; Extended Data Fig. 5c). Despite these correlations, a relatively low percentage of variance was explained by each principal component: for the 29 variables, 13 of the 29 principal components were required to explain 90% of the variance (Extended Data Fig. 5d). When only the access to care subset of variables is considered, the first 2 principal components explain 50.7 and 19.1% of the variance, respectively, and 5 of 8 principal components are required to explain 90% of the variance. When only the comorbidities subset is considered, the first 2 principal components explain 27.9 and 17.8% of the variance, respectively, and 9 of 14 principal components are required to explain 90% of the variance (Extended Data Fig. 5d).
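The "components required to explain 90% of the variance" summary is a cumulative sum over the explained-variance ratios. The sketch below shows that calculation under stated assumptions: the function name is hypothetical, and only the first two ratios in the example echo the 50.7 and 19.1% reported for the access to care subset; the remaining six values are invented for illustration.

```python
def components_for_threshold(explained_ratios, threshold=0.90):
    """Return how many leading components are needed to reach `threshold`.

    `explained_ratios` is the fraction of total variance explained by each
    principal component, ordered from largest to smallest.
    """
    total = 0.0
    for k, ratio in enumerate(explained_ratios, start=1):
        total += ratio
        if total >= threshold - 1e-12:  # small tolerance for float round-off
            return k
    raise ValueError("ratios never reach the requested threshold")


# First two values echo the reported 50.7% and 19.1%; the rest are invented.
ratios = [0.507, 0.191, 0.10, 0.07, 0.05, 0.04, 0.03, 0.012]
n_needed = components_for_threshold(ratios)
```

The more components this count requires relative to the number of variables, the less the dataset lends itself to dimensionality reduction.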

These data suggest that intercountry variation in this dataset is not easily explained by a small number of variables. Moreover, although correlations exist between principal components and high-level explanatory variables (testing capacity, wealth), their magnitude is modest. These results highlight that dimensionality reduction is unlikely to be an effective analysis strategy for the variables considered in this study. Despite this overall finding, the PCA on the access to care subset of variables highlights that the variance in these variables is more easily explained by a small number of principal components and hence may be more amenable to dimensionality reduction. This finding is unsurprising since, for example, the number of hospital beds per 100,000 population is likely to be directly related to the number of hospitals per 100,000 population (r = 0.60, P = 5.7 × 10^−6 for SSA). In contrast, for comorbidities, the relationship between different variables is less clear. Given the low percentages of variation captured by each principal component, and the high variability between different types of variables, these results motivate a holistic approach to using these data for assessing relative SARS-CoV-2 risk across SSA.

Evaluating the burden emerging from the severity of infection outcome

Data sourcing: empirical estimates of IFR

Estimates of the IFR that account for asymptomatic cases, underreporting and delays in reporting are few; however, it is evident that the IFR increases substantially with age56. We used age-stratified estimates of IFR from three studies (two published2,4 and one preprint3) that accounted for these factors in their estimation (Supplementary Table 4).

To apply these estimates to other age-stratified data with different bin ranges and generate continuous predictions of IFR with age, we fitted the relationship between the midpoint of the age bracket and the IFR estimate with a generalized additive model using the mgcv package v1.8-33 (ref. 57) in R v.4.0.2 (ref. 58). We used a beta distribution as the response family for the IFR estimates (data distributed on [0, 1]). For the upper age bracket (80+ years), we took the upper range to be 100 years and the midpoint to be 90.

We assumed a given level of cumulative infection (20% in each age class, that is, a constant rate of infection among age classes) and then applied IFRs by age to the population structure of each country to generate estimates of burden. Age structure estimates were taken from the United Nations World Population Prospects (Supplementary Table 3) country-level estimates of population in 1-year age groups (0–100 years of age).
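The burden calculation described above amounts to summing, over 1-year age groups, the population multiplied by the assumed cumulative infection (20%) and the age-specific IFR. A minimal sketch follows; the IFR curve `ifr_by_age` is a hypothetical log-linear stand-in for the fitted generalized additive model, and both age structures are invented.

```python
import math


def ifr_by_age(age):
    """Hypothetical increasing IFR curve; NOT the fitted GAM from the study."""
    return min(1.0, 1e-5 * math.exp(0.1 * age))


def expected_deaths(pop_by_age, ifr, attack_rate=0.20):
    """Sum over age groups of population x cumulative infection x IFR(age)."""
    return sum(n * attack_rate * ifr(age) for age, n in pop_by_age.items())


def deaths_per_100k(pop_by_age, ifr, attack_rate=0.20):
    """Expected deaths scaled to a population of 100,000."""
    total = sum(pop_by_age.values())
    return 1e5 * expected_deaths(pop_by_age, ifr, attack_rate) / total


# Two invented age structures (ages 0-100) with the same total population.
young = {age: 2000 - 19 * age for age in range(101)}  # skewed towards youth
older = {age: 1050 for age in range(101)}             # flat age structure

burden_young = deaths_per_100k(young, ifr_by_age)
burden_older = deaths_per_100k(older, ifr_by_age)
```

With the same infection rate in every age class, the younger age structure yields the lower expected burden, which is the baseline expectation that the comorbidity analysis then examines.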

Comorbidities over age from the IHME

Applying these IFR estimates to the demographic structure of SSA countries provides a baseline expectation for mortality but depends on the assumption that mortality patterns in SSA are similar to those from where the IFR estimates were sourced (France, China and Italy). Comorbidities have been shown to be an important determinant of the severity of infection outcomes (that is, IFR). To assess the relative risk of comorbidities across age in SSA, estimates of comorbidity severity by age (in terms of annual deaths attributable) were obtained from the IHME Global Burden of Disease (GBD) study in 2017 (ref. 59). Data were accessed through the GBD results tool for cardiovascular disease, chronic respiratory disease (not including asthma) and diabetes, reflecting three categories of comorbidity with demonstrated associations with risk (Supplementary Table 2). We assumed that higher mortality rates due to these noncommunicable diseases, especially among younger age groups, are indicative of increased severity and lesser access to sufficient care for these diseases, suggesting an elevated risk for their interaction with SARS-CoV-2 as comorbidities. While there are uncertainties in these data, they provide the best estimates of age-specific risks and have been used previously to estimate populations at risk20.

The comorbidity by age curves for SSA countries were compared to those for the three countries from which SARS-CoV-2 IFR by age estimates were sourced. Attributable mortality due to all three noncommunicable disease categories was higher at age 50 in all 48 SSA countries when compared to estimates from France and Italy and for 42 of 48 SSA countries when compared to China (Extended Data Fig. 6).

Given the potential for populations in SSA to experience a differing burden of SARS-CoV-2 due to their increased severity of comorbidities in younger age groups, we explored the effects of shifting IFRs estimated by the generalized additive model of IFR estimates from France, Italy and China younger by 2, 5 and 10 years (Fig. 3).
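Shifting an IFR curve "younger" by k years means that a person of age a is assigned the IFR originally estimated for age a + k. The sketch below illustrates this under two assumptions that are not taken from the source: the base curve is a hypothetical log-linear function rather than the fitted model, and ages beyond 100 are capped at 100.

```python
import math


def base_ifr(age):
    """Hypothetical increasing IFR curve used only for illustration."""
    return min(1.0, 1e-5 * math.exp(0.1 * age))


def shift_younger(ifr, years, max_age=100):
    """Return a curve where age a receives the IFR estimated for age a + years.

    Capping at `max_age` is an assumption; the source does not state how the
    oldest ages were handled.
    """
    def shifted(age):
        return ifr(min(age + years, max_age))
    return shifted


shifted_2 = shift_younger(base_ifr, 2)
shifted_5 = shift_younger(base_ifr, 5)
shifted_10 = shift_younger(base_ifr, 10)
```

Because the IFR rises steeply with age, even a shift of a few years raises the IFR applied to every age group below the cap.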

International air travel to SSA

The number of passenger seats on flights arriving to international airports were grouped by country and month for January to April 2020 (Supplementary Table 5), the months when the introduction of SARS-CoV-2 to SSA countries was likely to have first occurred. The first confirmed case reported from an SSA country, according to the Johns Hopkins Coronavirus Resource Center, was in Nigeria on 28 February 2020. By 31 March 2020, 43 of 48 SSA countries had reported SARS-CoV-2 infections and international travel was largely restricted by April. Lesotho was the last SSA country to report a confirmed SARS-CoV-2 infection (on 13 May 2020); however, given the difficulties in surveillance, the first reported detections were likely delayed relative to the first importations of the virus. The probability of importation of the virus is defined by the number of travelers from each source location, each date and the probability that a traveler from that source location on that date was infectious. Due to limitations in surveillance, especially early in the SARS-CoV-2 pandemic, empirical data on infection rates among travelers were largely lacking.
To account for differences in the status of the SARS-CoV-2 pandemic across source locations and thus differences in the importation risk for travelers from those locations, we coarsely stratified travelers arriving each day into 4 categories based on the status of their source countries: (1) travelers from countries with zero reported cases (that is, although undetected transmission was possibly occurring, SARS-CoV-2 had not yet been confirmed in the source country by that date); (2) those traveling from countries with more than 1 reported case (that is, SARS-CoV-2 had been confirmed to be present in that source country by that date); (3) those traveling from countries with more than 100 reported cases (indicating community transmission was likely beginning); and (4) those traveling from countries with more than 1,000 reported cases (indicating widespread transmission).
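The stratification above can be sketched as a lookup on the reported case count of the source country on the day of travel. Everything in this example is illustrative: the function names, the tuple layout of `arrivals` and the seat and case numbers are hypothetical. Two assumptions are made where the text leaves room for interpretation: each arrival is assigned only to the highest threshold its source country exceeds, and a single confirmed case is enough for category 2.

```python
from collections import defaultdict


def source_category(reported_cases):
    """Map a source country's reported case count to categories 1-4."""
    if reported_cases > 1000:
        return 4  # widespread transmission
    if reported_cases > 100:
        return 3  # community transmission likely beginning
    if reported_cases >= 1:
        return 2  # presence confirmed (assumed to include exactly 1 case)
    return 1      # no reported cases yet


def seats_by_category(arrivals, cases):
    """Total passenger seats per category.

    `arrivals` holds (source_country, date, seats) tuples and `cases` maps
    (source_country, date) to the cumulative reported case count.
    """
    totals = defaultdict(int)
    for source, date, seats in arrivals:
        totals[source_category(cases.get((source, date), 0))] += seats
    return dict(totals)


# Invented arrivals and case counts; not data from the study.
arrivals = [
    ("A", "2020-02-01", 300),
    ("B", "2020-02-01", 150),
    ("C", "2020-02-01", 200),
    ("A", "2020-03-01", 300),
]
cases = {
    ("B", "2020-02-01"): 12,
    ("C", "2020-02-01"): 11000,
    ("A", "2020-03-01"): 250,
}
totals = seats_by_category(arrivals, cases)
```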

Reported case counts at source locations for travelers were determined as follows. No cases were reported outside China until 13 January 2020 (the date of the first reported case in Thailand). From 13 January to 21 January, cases were then reported in Japan, South Korea, Taiwan, Hong Kong and the United States (https://covid19.who.int/). Subsequently, counts per country were tabulated daily by the Johns Hopkins Coronavirus Resource Center60 beginning on 22 January (https://coronavirus.jhu.edu/map.html); we used these data from 22 January onwards and the WHO reports before 22 January.

The number of travelers within each category arriving per month is shown in Supplementary Table 5. This approach makes the conservative assumption that the probability a traveler is infected reflects the general countrywide infection rate of the source country at the time of travel (that is, travelers are not more likely to be exposed than non-travelers in that source location) and does not account for complex travel itineraries (that is, a traveler from a high-risk source location transiting through a low-risk source location would be grouped with other travelers from the low-risk source location). Consequently, the risk for viral importation is likely systematically underestimated. However, since the relative risk for viral importation will still scale with the number of travelers, comparisons among SSA countries can be informative (for example, SSA countries with more travelers from countries with confirmed SARS-CoV-2 transmission are at higher risk for viral importation).

Subnational connectivity among countries in SSA

Indicators of subnational connectivity

To allow comparison of the relative connectivity across countries, we used the friction surface estimates provided by Weiss et al.26 as a relative measure of the rate of human movement between subregions of a country. For connectivity within subregions of a country (for example, transport from a city to the rural periphery), we used as an indicator the population-weighted mean travel time to the nearest urban center (that is, population density >1,500 per square kilometer or a density of built-up areas >50% coincident with population >50,000) within administrative level 2 units61. For some countries, estimates at administrative level 2 were unavailable (Comoros, Cape Verde, Lesotho, Mauritius, Mayotte and Seychelles); estimates at administrative level 1 were used for these cases (these were all island nations, with the exception of Lesotho).
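A population-weighted mean travel time weights each location's travel time by the number of people living there, so sparsely populated remote areas contribute less than dense ones. The sketch below shows only this arithmetic; the function name and the numbers are hypothetical, and the extraction of travel times from the gridded surfaces is not reproduced.

```python
def population_weighted_mean(travel_times, populations):
    """Mean travel time with each location weighted by its population."""
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    weighted = sum(t * p for t, p in zip(travel_times, populations))
    return weighted / total


# Invented example: 900 people 10 minutes away, 100 people 100 minutes away.
mean_minutes = population_weighted_mean([10.0, 100.0], [900, 100])
```

In this invented example the unweighted mean would be 55 minutes, whereas the population-weighted mean is 19 minutes because most people live close to the urban center.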

Metapopulation model methods

Once SARS-CoV-2 has been introduced into a country, the degree of spread of the infection within the country is governed by subnational mobility: the pathogen is more likely to be introduced into a location where individuals arrive more frequently than one where incoming travelers are less frequent. Large-scale consistent measures of mobility are rare. However, recently, estimates of accessibility have been produced at a global scale26. Although this is unlikely to perfectly reflect mobility within countries, especially since interventions and travel restrictions are put in place, it provides a starting point for evaluating the role of human mobility in shaping the outbreak pace across SSA. We used the inverse of a measure of the cost of travel between the centroids of administrative level 2 spatial units to describe mobility between locations (estimated by applying the costDistance function in the gdistance package v1.3-6 in R to the friction surfaces supplied in Weiss et al.26). With this, we built a metapopulation model for each country to provide an overview of the possible range of trajectories of unchecked spread of SARS-CoV-2.

We assumed that the pathogen first arrives in each country in the administrative 2 level unit with the largest population (for example, the largest city) and the population in each administrative 2 level (of size Nj) is entirely susceptible at the time of arrival. We then tracked the spread within and between each of the administrative 2 level units of each country. Within each administrative 2 level unit, dynamics are governed by a discrete time susceptible (S), infected (I) and recovered (R) model with a time step of approximately one week, which is broadly consistent with the serial interval of SARS-CoV-2. Within the spatial unit indexed j, with total size Nj, the number of infected individuals in the next time step is defined by:

$$I_{j,t + 1} = \beta I_{j,t}^\alpha S_{j,t}/N_j + \iota _{j,t}$$

where β captures the magnitude of transmission over the course of one discrete time step. Since the time step is set to approximate the serial interval of the virus, β reflects the R0 of SARS-CoV-2 and is thus set to 2.5. The exponent α = 0.97 is used to capture the effects of discretization62 and ιj,t captures the introduction of new infections into site j at time t. Susceptible and recovered individuals are updated according to:

$$\begin{array}{l}S_{j,t + 1} = S_{j,t} + wR_{j,t} - I_{j,t + 1} + b\\ R_{j,t + 1} = (1 - w)R_{j,t} + I_{j,t}\end{array}$$

where b reflects the introduction of new susceptible individuals resulting from the birth rate, set to reflect the most recent estimates for that country from the World Bank Data (https://data.worldbank.org/indicator/SP.DYN.CBRT.IN), and w reflects the rate of waning of immunity. The population is initiated with Sj,1 = Nj, Rj,1 = 0 and Ij,1 = 0, except for the spatial unit with the largest population size Nj in each country, since this is assumed to be the location of introduction; for this spatial unit, we set Ij,1 = 1.

We made the simplifying assumption that mobility linking locations i and j, denoted as ci,j, scales with the inverse of the cost of travel between sites i and j evaluated according to the friction surface provided in Weiss et al.26. The introduction of an infected individual into location j is then defined by a draw from a Bernoulli distribution following:

$$\iota _{j,t} \sim {\mathrm{Bernoulli}}\left( {1 - {\mathrm{exp}}\left( { - \mathop {\sum }\limits_{i = 1}^L {c_{i,j}}{I_{i,t}}/{N_i}} \right)} \right)$$

where L is the total number of administrative 2 units in that country and the rate of introduction is the product of connectivity between the focal location and each other location multiplied by the proportion of population in each other location that is infected.
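The update equations and the Bernoulli introduction term can be combined into a short simulation. The following is a minimal sketch rather than the authors' implementation: the function names, the connectivity matrix and the population sizes are hypothetical, the sum over source locations excludes the focal location itself (following the phrase "each other location"), and a guard that caps new infections at the available susceptibles is an added assumption that keeps S non-negative.

```python
import math
import random


def step(S, I, R, N, c, rng, beta=2.5, alpha=0.97, w=0.0, b=0.0):
    """Advance every spatial unit by one time step (roughly one week)."""
    n_units = len(N)
    S_next, I_next, R_next = [], [], []
    for j in range(n_units):
        # Introduction rate: connectivity times infected proportion elsewhere.
        rate = sum(c[i][j] * I[i] / N[i] for i in range(n_units) if i != j)
        iota = 1.0 if rng.random() < 1.0 - math.exp(-rate) else 0.0
        new_I = beta * (I[j] ** alpha) * S[j] / N[j] + iota
        available = S[j] + w * R[j] + b
        new_I = min(new_I, available)  # assumed guard: S cannot go negative
        I_next.append(new_I)
        S_next.append(available - new_I)
        R_next.append((1.0 - w) * R[j] + I[j])
    return S_next, I_next, R_next


def simulate(N, c, n_steps, seed=0, **params):
    """Seed one infection in the most populous unit and iterate `step`."""
    rng = random.Random(seed)
    seed_unit = max(range(len(N)), key=N.__getitem__)
    S = [float(n) for n in N]
    I = [0.0] * len(N)
    R = [0.0] * len(N)
    I[seed_unit] = 1.0
    history = [(S, I, R)]
    for _ in range(n_steps):
        S, I, R = step(S, I, R, N, c, rng, **params)
        history.append((S, I, R))
    return history


# Invented populations and connectivity for three spatial units.
N = [1_000_000, 200_000, 50_000]
c = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
history = simulate(N, c, n_steps=60)
```

Lowering the entries of `c` delays or prevents introductions into the smaller units, which is the mechanism behind the differences in spread among countries described below.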

Some countries show rapid spread between administrative units within the country (for example, a country with parameters that broadly reflect those available for Malawi; Extended Data Fig. 7), while in others (for example, reflecting Madagascar), connectivity may be so low that the outbreak may be over in the administrative unit of the largest size (where it was introduced) before introductions successfully reach other poorly connected administrative units. Where duration of immunity is sufficiently long, the result may be a hump-shaped relationship between the proportion of the population that is infected after five years and the time to the first local extinction of the pathogen (Extended Data Fig. 7, top right). In countries with lower connectivity (for example, resembling Madagascar), local outbreaks can go extinct rapidly before traveling very far; in other countries (for example, resembling Gabon), the pathogen goes extinct rapidly because it spreads quickly and depletes susceptible individuals everywhere. This hump-shaped pattern diminishes as the rate of waning of immunity increases and is replaced by a monotonic negative relationship. With sufficiently rapid waning of immunity, local extinction ceases to occur in the absence of control efforts.

The impact of the pattern of travel between centroids is echoed by the pattern of travel within administrative districts: countries where the pathogen does not reach a large fraction of the administrative 2 units within the country in five years are also those where within-administrative-unit travel is low (Extended Data Fig. 7, right).

These simulations provide a window into qualitative patterns expected for subnational spread of the pandemic virus but there is no clear way of calibrating the absolute rate of travel between regions of relevance for SARS-CoV-2; this is further complicated by the remaining uncertainties around rates of waning of immunity. Thus, the time scales of these simulations should be considered in relative, rather than absolute terms. Variation in lockdown effectiveness, or other changes in mobility for a given country, may also compromise relative comparisons as might large volumes of land border crossings in some settings, which we have not accounted for in this study. Variability in testing and case reporting complicates clarifying this (Extended Data Fig. 7, bottom left and bottom right, respectively) but we have highlighted countries with less connectivity (that is, less synchronous outbreaks expected) relative to the median among SSA countries and with older populations (that is, a greater proportion in higher-risk age groups) (Extended Data Fig. 8).

The University of Oxford’s Blavatnik School of Government generated composite scores of government response, interventions for containment and economic support provided, with each scored from 0 to 100 (Coronavirus Government Response Tracker; https://www.bsg.ox.ac.uk/research/research-projects/coronavirus-government-response-tracker). These data were compared with the day on which ten cases were exceeded in a country according to the Johns Hopkins dashboard data (Johns Hopkins Coronavirus Resource Center; https://coronavirus.jhu.edu/map.html).

While faster waning of immunity will act to increase the rate of spread of the infection, resulting in a higher proportion infected after one year, control efforts will generally act to slow the rate of spread of the infection (Extended Data Fig. 9). Since control efforts are likely to differ in effectiveness between countries (Extended Data Fig. 9), this precludes making country-specific predictions as to the relative impact of control efforts on delay.

Modeling epidemic trajectories in scenarios where transmission rate depends on climate

Climate data sourcing: variation in humidity in SSA

Specific humidity data for selected urban centers come from the ERA5 dataset using an average climatology (1981–2017)53; we did not consider year-to-year climate variations. Selected cities (n = 56) were chosen to represent the major urban areas in SSA. The largest city in each SSA country was included as well as any additional cities that were among the 25 largest cities or busiest airports in SSA.

Methods for climate-driven modeling of SARS-CoV-2

We used a climate-driven susceptible-infected-recovered-susceptible model to estimate epidemic trajectories (that is, the time of peak incidence) in different cities in 2020, assuming no control measures were in place or a 10 or 20% reduction in R0 beginning 2 weeks after the total reported cases for a country exceeded 10 cases25,63. The model is given by:

$$\frac{{\mathrm{d}}S}{{\mathrm{d}}t} = \frac{{N - S - I}}{L} - \frac{{\beta (t)IS}}{N}$$
$$\frac{{\mathrm{d}}I}{{\mathrm{d}}t} = \frac{{\beta (t)IS}}{N} - \frac{I}{D}$$

where S is the susceptible population, I is the infected population and N is the total population. D is the mean infectious period, set at 5 d following ref. 25.

To investigate how a climate dependency of SARS-CoV-2 would affect epidemic trajectories in cities with the climate patterns of the selected cities in SSA, we used parameters from the most climate-dependent scenario in ref. 25, based on the endemic betacoronavirus HKU1 in the United States. In this scenario, L, the duration of immunity, was 66.25 weeks (that is, >1 year and such that waning immunity did not affect the timing of the epidemic peak). We initially selected a range where R0 declined from R0max = 2.5 to R0min = 1.5 (that is, transmission declined 40% at high humidity) since this exceeds the range observed for influenza and other coronaviruses for which data are available (from the United States). R0max = 2.5 was chosen because 2.5 is often cited as the approximate R0 for SARS-CoV-2. Thus, we initially assumed that the climate dependence of SARS-CoV-2 in SSA would not greatly exceed that of other known coronaviruses from the US context. Then, we explored the effects of different degrees of climate dependency (that is, ranges wider than R0max = 2.5 to R0min = 1.5 and scenarios where R0min approached 1) (Extended Data Fig. 10).

Transmission is governed by β(t), which is related to the basic reproduction number R0 by R0(t) = β(t)D. The basic reproduction number varies based on climate and is related to specific humidity according to the equation:

$$R_0 = {\mathrm{exp}}{[a \times q(t) + {\mathrm{log}}(R_{0{\mathrm{max}}} - R_{0{\mathrm{min}}})]} + R_{0{\mathrm{min}}}$$

where q(t) is specific humidity53 and a is set at −227.5 based on estimated HKU1 parameters25. We assumed the time of introduction for cities to be the date at which the total reported cases for a country exceeded 10 cases.
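The humidity-dependent R0 and the resulting epidemic trajectory can be sketched with a simple forward-Euler integration. This is an illustrative sketch under stated assumptions, not the model code used in the study: the function names, the population size, the integration scheme and the two constant humidity series are hypothetical; time is measured in days, so D = 5 and the duration of immunity is 66.25 × 7 days; and the recovered compartment is taken as N − S − I.

```python
import math


def r0_from_humidity(q, r0_max=2.5, r0_min=1.5, a=-227.5):
    """R0 as a function of specific humidity q (kg/kg)."""
    return math.exp(a * q + math.log(r0_max - r0_min)) + r0_min


def simulate_sirs(q_daily, N=1_000_000.0, I0=1.0, D=5.0, L=66.25 * 7,
                  substeps=4):
    """Forward-Euler SIRS integration; returns infected at the end of each day."""
    S, I = N - I0, I0
    dt = 1.0 / substeps
    infected = []
    for q in q_daily:
        beta = r0_from_humidity(q) / D  # from R0(t) = beta(t) * D
        for _ in range(substeps):
            new_inf = beta * I * S / N
            dS = (N - S - I) / L - new_inf  # waning immunity returns N - S - I
            dI = new_inf - I / D
            S += dt * dS
            I += dt * dI
        infected.append(I)
    return infected


def peak_day(infected):
    """Index of the day with the largest infected population."""
    return max(range(len(infected)), key=infected.__getitem__)


# Invented constant humidity series for a dry and a humid city.
dry = simulate_sirs([0.004] * 365)
humid = simulate_sirs([0.018] * 365)
```

At zero humidity the function returns R0max, and as humidity rises R0 approaches R0min, so in this sketch the drier city reaches its epidemic peak earlier than the more humid one.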

Sensitivity analysis

Selecting an R0min value of 1, such that epidemic growth stops at high humidities, is likely implausible since simulations indicated no outbreaks would occur in cities such as Antananarivo (countered by the observation that SARS-CoV-2 outbreaks did in fact occur) (Extended Data Fig. 10b; see Supplementary Table 1 for the reported case counts at the country level). Expanding the range between R0min and R0max by increasing R0max resulted in epidemic peaks being reached earlier after outbreak onset but did not increase the difference in timing between cities with different climates (Extended Data Fig. 10c; for example, the difference in timing between peaks in Windhoek and Lomé is similar in 10a and 10c). Finally, we explored scenarios where the R0min was between 1.0 and 1.5. When R0min > 1.1, epidemic peaks were seen in each SSA city with the difference in timing of the peak growing larger when smaller values of R0min were selected (Extended Data Fig. 10d). However, the difference in timing, even when small values of R0min were selected, was a maximum of 25 weeks and rapidly reduced to only a few weeks when R0min approached 1.5.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.