Introduction

During the first two years of the coronavirus disease 2019 (COVID-19) pandemic, numerous studies highlighted disparities in infection rates and fatalities. Individuals living in poverty exhibited higher mortality rates than their more affluent counterparts. In the United States, Hispanic, Black, and indigenous populations were disproportionately affected by COVID-19 infections compared to White individuals (Stafford et al. 2020; Abedi et al. 2020). A similar trend was observed in the United Kingdom, where Black, Asian, and Minority Ethnic (BAME) groups experienced a higher incidence of infections (Windsor-Shellard and Kaur 2020). Based on 81 Upper Tier Local Authority (UTLA) districts of the UK, Basu et al. (2021) documented that individuals with lower economic status had difficulty adhering to social distancing rules, which led to higher susceptibility to COVID-19 fatalities compared to wealthier individuals.

Moreover, the patterns of infections and fatalities due to COVID-19 have varied immensely across countries. Surprisingly, richer countries with better healthcare systems have been more affected in terms of fatalities than poorer nations. The reported death toll remains confined to just a few high-income regions. In fact, lower-middle income and low-income countries (India and those in South Asia) account for just 3% of the global death toll although they carry roughly half of the world’s population (Schellekens and Sourrouille 2020). This rich-poor divide in COVID-19 deaths across the globe is well documented.

Understanding the relationship between COVID-19 mortality and poverty is complex for three reasons. First, we need to take into account the geographical distribution of the population. If richer people live in urban areas with high population density while the poor live predominantly in less densely populated rural areas, the poor may be less vulnerable to infections. Second, studies suggest that contact with natural environments enriches the human microbiome, promotes immune balance and protects from allergy and inflammatory disorders. For example, rural areas have domesticated animals, and early-life zoonotic exposure to them reduces the risk of childhood asthma and other inflammatory disorders. These exposures are non-existent in urban slum areas. This concept aligns with the “Missing Microbes” and “Old Friends” hypotheses, which suggest that people are not encountering enough microbial stimuli early in life to develop a stronger immune system (Shimojo and Izuhara 2017; Rook 2023). Third, if affluent individuals are predominantly older, they may possess additional co-morbidity factors that increase their vulnerability to infections.

There are serious immunological reasons behind the hypothesis that the urban population across the globe could be more susceptible to COVID-19. A lack of rural upbringing, which includes long-term and early-life exposure to stables and farm milk, and an elevated acute stress response (induced by adrenaline and cortisol surges) in an urban lifestyle together provide a possible mechanism underlying the higher prevalence of chronic inflammatory disorders in urban areas. These ailments include asthma and allergies. Prospective human and mechanistic animal studies reinforce the idea that an exaggerated immune reactivity could play a role in a hyper-reactive immune system and chronic low-grade inflammation (Steinheuser et al 2014; Akdis 2021; Celebi Sozener et al. 2022). With this backdrop, we study the incidence of case fatalities in India during the second wave of COVID-19.

A striking feature of COVID-19 infections in India is their regional disparity. Basu and Mazumder (2021) document that confirmed cases were more concentrated in prosperous and urbanized regions with high population density. On the other hand, poor and less developed regions in India suffered fewer infections. In this paper, we carry out a more comprehensive analysis using district-level data from the second wave of COVID-19 to understand the deep-rooted factors behind this regional variation in fatalities.

Two principal findings emerge from our study. First, we find a clear rural–urban divide in COVID-19 fatalities in India. Rural districts, classified by the degree of urbanization, are predominantly poor as per the official poverty indicators and infant mortality rates. These poor rural districts experienced fewer COVID-19 case fatalities. Fatalities are more clustered in prosperous, urbanized and denser areas with lower poverty. This stands in sharp contrast with the experience of the US, where poorer sections, viz. Hispanic, Black and indigenous populations, were more exposed to COVID-19 infections than Whites (Stafford et al. 2020; Abedi et al. 2020). A similar pattern was observed in the UK, where Black, Asian and Minority Ethnic (BAME) groups suffered more infections (Windsor-Shellard and Kaur 2020). Second, we probe into the reasons for the urban dominance of cases and fatalities. Our study suggests that the urban population, compared to the rural, suffers more from life-style disorders such as obesity, diabetes and hypertension. In addition, the richer and relatively more urban districts have more aged people. Furthermore, the urban population in India is likely to suffer more from Chronic Obstructive Pulmonary Disease (COPD) and associated respiratory diseases on account of poor air quality caused by industrial and vehicular emissions (Ghosh and Mukherji 2014; Maji et al 2018). The incidence of respiratory diseases could be an additional factor behind COVID-19 deaths, as COVID-19 infections trigger a severe respiratory condition known as Acute Respiratory Distress Syndrome or ARDS (Tzotzos et al 2020). These factors, together with high population density, contributed significantly to COVID-19 cases and fatalities in urban regions of India.

In India, the poor may be immune to various kinds of infections due to exposure to unhygienic living conditions from very early childhood, while in the US, the basic health infrastructure permits low-income people to access a clean and germ-free environment from childhood. We use district level data for an all-India analysis of 19 mainland states and 4 union territories, leaving out the north-eastern states. Our source of data is the real time database available at www.COVID19india.org at the district level, which is by far the most comprehensive dataset for COVID-19 infections and fatalities across India. Our principal variables of interest are confirmed cases per million district population, deaths per million, and several regional macroeconomic and development indicators such as per capita net district domestic product, the degree of urbanization, population density, district-level head count poverty, the percentage of aged population (60 years and above), and the share of district GDP from agriculture and allied activities.

Furthermore, three life-style disorder indicators are chosen that capture the typical middle and upper-middle class disease patterns in urban regions of India. These are hypertension, diabetes and obesity. Most of these state and district level socio-economic and demographic features are drawn from government sources such as the Census of India- 2011, the Niti Ayog and the National Family Health Survey (NFHS)-5, amongst several others, as detailed in the Appendix.

The paper is organized in the following sections: We review the related literature in “Literature” section. “Data” section discusses data, measurement and econometric issues. “Empirical Analysis” section reports the results of district level panel data analysis followed by an analysis of the impact of life-style diseases and ageing on COVID-19 fatalities in “Why did Urbanisation and Affluence Lead to Higher Fatalities: Exploring the Role of Lifestyle Diseases and Population Ageing at the District Level” section. “Summary and Conclusions” section concludes.

Literature

Several recent studies have reported the trends in COVID-19 infections in India and their regional variations. The regional disparities in COVID-19 infections in India have been reported by Mandi et al. (2020), who construct a multi-dimensional index of vulnerability for districts. Further, Ray and Subramanian (2020) also note regional variations of cases, although their paper aims to provide a critical appraisal of the COVID-19 lockdown in India. Using district level data, Jalan and Sen (2020a, b, c) also point out that not all regions of India have been impacted uniformly by COVID-19.

A growing body of literature focuses on the socio-economic and socio-demographic causes of COVID-19 deaths. For US data, Hawkins et al (2020) observed that socio-economic factors play a crucial role in COVID-19 prevalence and mortality. They found that lower education level had the highest association with cases as well as fatalities. Cases and fatalities were proportionally higher among Black residents. COVID-19 fatalities were also correlated with median income and shifts in jobs. In a cross-country study, Sannigrahi et al (2020) examined the local and global spatial associations between key social and demographic factors and COVID-19 deaths and cases in the European region using spatial regression models. They documented disparate COVID-19 experiences of different countries, where the most affected countries are Italy, Germany, Austria, Slovenia and Switzerland. Yang et al (2021) examined the influences of climate, socioeconomic determinants, and spatial distance from Wuhan on the confirmed cases and deaths in the peak phase of COVID-19 in China.

Along similar lines, Amaratunga et al (2021) investigated the possible effect of several localised socio-economic factors on the case count and time course of confirmed COVID-19 cases and fatalities across 21 counties in New Jersey. Their findings suggest that counties with denser populations, proxied by the number of restaurants, have more COVID-19 cases. For 401 counties in Germany, using a multivariate spatial model, Ehlert (2021) finds that cases and deaths have a significant positive association with mean age, population density and the share of people employed in elderly care during the first wave of infections in 2020.

A handful of epidemiological studies are reported in the literature, outlining the potential role of life-style disorders, especially obesity and diabetes in influencing COVID-19 deaths. From a clinical perspective, Albashir (2020) reports that obese patients with high body mass index are at a greater risk of complications from viral lung infections and more vulnerable to COVID-19 than non-obese patients. This is because co-morbidities associated with obesity are correlated with higher deaths. Wang et al. (2021) investigate the global association between lifestyle disorder factors and COVID-19 deaths by means of cross-country regression analysis. Several lifestyle-related indicators, such as obesity and diabetes, are recognised as risk factors behind COVID-19 deaths, which together with ageing, are associated with increased COVID-19 deaths across countries. Gardiner et al. (2021) also provide evidence that a large proportion of the cross-country variation in COVID-19 death rates can be attributed to differences in proportion of obese populations, population health, population density, demographic features, per capita GDP among others. For the UK, Tan et al. (2020) find increasing evidence lending to the hypothesis that obesity is an independent life-style disorder behind severe infection and even death from COVID-19. Ioannidis et al. (2020), Sasson (2021), Cortis (2020), Yanez et al. (2020), and Haklai et al. (2021), have empirically verified the incidence of higher COVID-19 deaths among the aged. Basu and Sen (2020) also provide cross country evidence of significant association between ageing and COVID-19 during the onset of the pandemic. Menon (2021) finds that BMI predicts quite significantly the COVID-19 hotspots after controlling for several factors. Dang and Gupta (2021) also find evidence of over-nutrition and resulting obesity as a determinant of cases and fatalities. Our study complements their finding.

A growing body of physiological and epidemiological literature reports the association between respiratory conditions and COVID-19. The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) affects the respiratory tract and usually leads to pneumonia in most patients resulting in acute respiratory distress syndrome (ARDS). ARDS, one of the leading causes of lung damage and death in patients with COVID-19, is mainly triggered by elevated levels of pro-inflammatory cytokines, referred to as cytokine storm, that in their ardour to battle a pathogen, impair the respiratory epithelium (Montazersaheb et al 2022). Increased inflammation in urban environments could be due to impaired immune regulation, which is thought to be dependent on reduced exposure, especially during early life, to microorganisms with which mammals co-evolved (evolutionary symbionts), as proposed by the “biodiversity” hypothesis (Hanski et al 2012), “missing-microbes” hypothesis (Blaser 2017), or “old-friends” hypothesis (Rook et al. 2013). In accordance with the “old friends hypothesis”, early exposure to both pets and farm animals is able to reduce the risk of childhood asthma and other inflammatory disorders (Böbel et al. 2018). In the light of these studies, we investigate here the role of various socio-economic factors which include development, health, and structural indicators in determining the COVID-19 death differentials across districts of 23 mainland Indian states and union territories (listed in Appendix 2B).

The interaction between socioeconomic and health indicators in determining regional disparity in COVID-19 fatalities has remained largely unexplored for India. Although Basu and Mazumder (2021) investigated the role of socioeconomic determinants in explaining the regional disparity in cases, their work was based on state level data from the first wave of infections and did not include fatalities. In this paper, we look at fatalities during the second wave of COVID-19 infections in India with a focus on more disaggregated district level data. Furthermore, we examine the role of life-style diseases in determining regional variations in case fatalities in India, which is largely unexplored in the COVID-19 literature. The uniqueness of our study is that we explore the role of interactions between ageing and diseases associated with life-style disorders in determining the rural–urban divide of COVID-19 case fatalities. Our study is novel because, to the best of our knowledge, there is no analysis of the determinants of regional variations in COVID-19 fatalities with district level data. Throughout the text, the terms deaths and fatalities are used interchangeably.

Data

Our key data source for COVID-19 related statistics is the national COVID-19 portal for India, https://www.COVID19india.org/, which has been regularly updated across states and districts of India since the onset of the pandemic in 2020. We take weekly cumulative COVID-19 figures for both confirmed cases per million (CASES) and fatalities per million (DEATHS) across 557 districts covering 19 mainland states and 4 union territories of India. We leave out all north-eastern states and the union territories of Ladakh, Andaman and Nicobar Islands, Lakshadweep islands, and Daman and Diu, for which district level figures were unavailable on a weekly basis for our study period. The start date for our study is 23-02-2021 and the end date is 27-09-2021, thus covering the second wave of COVID-19 infections in its entirety at the district level. The selection of the region and the period of study is justified by the fact that COVID-19 deaths were mostly concentrated in the mainland states of India, specifically during the second wave of infections (February–September, 2021).

Apart from tracking COVID-19 infections and deaths, we have compiled district-level development and socio-economic indicators primarily sourced from the Census of India- 2011, Niti Ayog, NFHS-5, and various other references such as state-level statistical abstracts (for district-level information) gathered from the respective state government portals (see Appendix 2 for precise definitions and sources).

Our district-level macroeconomic and development indicators are considered as time-invariant fixed factors. While these factors may vary across districts, they are unlikely to change every week. The district-level indicators include: (i) PCDDP (Per Capita District Domestic Product at constant prices), (ii) URBAN (Percentage of urban population at the district level), (iii) DENSITY (District-level population density, derived from Census 2011), (iv) AGRI (Percentage of district domestic product from agriculture and allied activities), (v) BPL (Percentage of the district population below the poverty line, essentially the headcount ratio), (vi) IMR (Infant Mortality Rate at the district level, obtained from Niti Ayog), (vii) ELECT (Percentage of district-level population residing in households with electricity connections, extracted from NFHS-5 district-level data), (viii) ROADS (Sum total of the length of state and national highways at the district level expressed as km/100 square km of district area, compiled from state-level statistical abstracts).

Additionally, the variable AGED represents the percentage of the population aged 60 years and above at the district level, serving as a measure of the old-age population. Hypertension (HYPER), diabetes (DIAB), and obesity (OBES), presented as percentages of the district-level adult populations, are extracted from the recently released NFHS-5 statistics (National Family Health Survey, 5th round, 2019–20), providing district-level cross-sectional data. Also, the district level population per lakh (1 lakh = 1,00,000 or 0.1 million) affected by respiratory diseases (RESP), including Chronic Obstructive Pulmonary Disease (COPD), for 2019–20 (drawn from the Ministry of Health and Family Welfare) is taken to represent overall district level respiratory health.

In addition to monitoring COVID-19 cases and deaths, all developmental and socio-economic indicators, including the three lifestyle disorder variables, encapsulate district-level fixed effects. Within the framework of this study, every non-COVID variable remains time-invariant throughout the entire second wave of COVID-19 infections in India. Consequently, we do not use a fixed effects specification, such as in the Least Squares Dummy Variable (LSDV) model, because the non-COVID or district-level variables inherently account for fixed (district) effects. It is crucial to bear this in mind while interpreting the pooled estimators.
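The reason a fixed effects specification cannot be combined with time-invariant regressors can be illustrated with a minimal sketch on synthetic data. The panel dimensions and variable names below are illustrative assumptions, not the study's data. The within (demeaning) transformation that underlies a fixed effects estimator removes all variation from any time-invariant regressor, so its coefficient cannot be estimated alongside district dummies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_districts, n_weeks = 5, 8  # hypothetical panel dimensions

# A time-invariant district characteristic (a stand-in for e.g. URBAN),
# repeated for every week of the panel.
urban = rng.uniform(10, 90, size=n_districts)
urban_panel = np.repeat(urban[:, None], n_weeks, axis=1)

# A time-varying outcome for comparison.
deaths_panel = rng.normal(size=(n_districts, n_weeks))


def within_transform(panel):
    """Subtract each district's own time average (rows are districts)."""
    return panel - panel.mean(axis=1, keepdims=True)


urban_demeaned = within_transform(urban_panel)
deaths_demeaned = within_transform(deaths_panel)

# The demeaned time-invariant regressor has no variation left,
# while the time-varying outcome still does.
print(np.allclose(urban_demeaned, 0.0))
print(np.allclose(deaths_demeaned, 0.0))
```

In this sketch the first check holds and the second does not, which is the sense in which the district-level variables already absorb the role that district dummies would play.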

Underreporting of Fatalities

Research on COVID-19 in India encounters a formidable problem of underreporting of cases, and particularly of fatalities. This may quite legitimately raise doubts about the reliability of our regression results. Given that our research focuses on determinants of fatalities, underreporting gives rise to measurement error in the dependent variable. To see this clearly, define \({\widetilde{y}}_{it}\) as the reported fatalities at date t in the ith district, \({y}_{it}\) as the actual fatalities, and \({v}_{it}\) as a positive measurement error representing the underreporting. In other words, \({\widetilde{y}}_{it}={y}_{it}-{v}_{it}\). Let \({x}_{it}\) be the vector of explanatory variables. Our true regression equation is \({y}_{it}=\alpha +\beta {x}_{it}+{u}_{it}\), where \({u}_{it}\) is the underlying error term which captures all omitted variables. The actual regression with observed fatalities as the dependent variable is \({\widetilde{y}}_{it}= \alpha +\beta {x}_{it}+{e}_{it}\), where the composite error term is \({e}_{it} ={u}_{it}-{v}_{it}\). If \({u}_{it}\) has a zero conditional mean, then \(\text{E}({e}_{it}|{x}_{it})=-\text{E}({v}_{it}|{x}_{it})\). The bias then depends on the properties of the measurement error. If the error does not depend on the independent variables, so that \(\text{E}({v}_{it}|{x}_{it})=\mu\), then the estimator of \(\alpha\) is biased because it estimates \(\alpha -\mu\), but the estimator of \(\beta\) is unbiased and consistent.

However, if the measurement error depends on the independent variables, in the sense that \(\text{E}({v}_{it}|{x}_{it})\) changes with \({x}_{it}\) then we have the usual omitted variable bias problem which makes our estimator of \(\beta\) inconsistent. We need to find a suitable set of instrumental variables (IV) to rectify this bias. In this paper we report both OLS and IV.
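The two cases can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the estimation used in the paper: the parameter values, the single regressor, and the linear form of the dependence (governed by a hypothetical coefficient `gamma`) are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
alpha, beta, mu = 1.0, 2.0, 0.5  # hypothetical true parameters
gamma = 0.3                      # hypothetical dependence of underreporting on x

x = rng.uniform(0.0, 10.0, size=n)
u = rng.normal(size=n)
y = alpha + beta * x + u         # actual fatalities (unobserved in practice)


def ols(x, y):
    """Return (intercept, slope) from a simple least-squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


# Case 1: positive underreporting unrelated to x, with E(v | x) = mu.
v_indep = rng.exponential(scale=mu, size=n)
a1, b1 = ols(x, y - v_indep)

# Case 2: underreporting rises with x, so E(v | x) = mu + gamma * x.
v_dep = rng.exponential(scale=mu, size=n) + gamma * x
a2, b2 = ols(x, y - v_dep)

print("case 1 intercept, slope:", round(a1, 2), round(b1, 2))
print("case 2 intercept, slope:", round(a2, 2), round(b2, 2))
```

In the first case the fitted slope stays close to `beta` while the intercept settles near `alpha - mu`. In the second case the slope is pulled towards `beta - gamma`, which is the inconsistency that motivates an instrumental variable approach.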

Empirical Analysis

In Fig. 1 we plot the per capita NSDP, confirmed cases per million state population and COVID-19 deaths per million across states after expressing each variable on a 0–1 scale for the sake of comparability. Cases and deaths are concentrated in the relatively richer states of India. Motivated by this plot, we compute the ordinary correlations between the variables of interest at the district level. Results are presented in Table 1. Not surprisingly, CASES and DEATHS are significantly correlated. Deaths associate positively and significantly with PCDDP, URBAN and DENSITY, implying that COVID-19 deaths in India during the second wave were concentrated in the richer, more urbanised and more densely populated districts. Cases are weakly correlated with PCDDP but significantly and positively correlated with URBAN. The correlation pattern for deaths is almost the same.
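The caption of Fig. 1 describes the rescaling as an HDI-type attainment index. A minimal sketch of such a rescaling follows, assuming the common min-max form (x − min)/(max − min). The function name and the input figures are hypothetical and are not drawn from the paper's data.

```python
def attainment_index(values):
    """Rescale a list of numbers to the 0-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the index is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical per capita income figures for four states (illustrative only).
income = [50.0, 100.0, 150.0, 250.0]
scaled = attainment_index(income)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

Because each series is mapped to the same unit interval, income, cases and deaths can be drawn on a common axis even though their original units differ.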

Fig. 1

Plotting per capita NSDP, confirmed cases and fatalities across Indian states (cases and fatalities per million as on 27-09-2021). Source: Plotted by the authors on the basis of secondary data. COVID statistics are drawn from COVID19india.org. All variables are expressed on a 0 to 1 scale following the HDI-type attainment index formula.

Table 1 District level correlation among variables of interest in the cross-section

A few observations about the correlations reported in Table 1 are in order. Poor districts classified by BPL are predominantly rural, as suggested by the significant negative correlation between BPL and URBAN and the positive correlation between BPL and AGRI. Poor districts also have high IMR due to poor health infrastructure and lower population density due to their agricultural base. Cases and fatalities are lower in poor districts, as indicated by the significant negative correlations with BPL, AGRI and IMR.

All these correlations confirm the finding that poorer districts experienced fewer COVID-19 cases and fatalities. Moreover, deaths are weakly associated with PCDDP but significantly and positively associated with URBAN. Finally, deaths and cases are strongly and positively associated with ELECT, which is expected since urbanised and richer regions of India have better access to household electricity. Arguably, ELECT proxies both PCDDP and URBAN in this paper.
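The text does not state how the significance of each correlation was assessed. One standard approach, sketched here as an assumption, is the t statistic t = r·sqrt(n − 2)/sqrt(1 − r²), with a normal approximation to the p-value for large samples. The correlation value used below is hypothetical and is not taken from Table 1; only the district count comes from the Data section.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: zero correlation, given sample correlation r and n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)


def two_sided_p_normal(t):
    """Two-sided p-value using the normal approximation (reasonable for large n)."""
    return math.erfc(abs(t) / math.sqrt(2))


n_districts = 557  # number of districts reported in the Data section
r_example = 0.10   # hypothetical correlation, not a value from Table 1
t = correlation_t_stat(r_example, n_districts)
p = two_sided_p_normal(t)
print(round(t, 2), round(p, 3))
```

With several hundred districts, even a correlation as small as the hypothetical 0.10 clears the conventional 5% threshold, so statistical significance in a table of this kind should be read together with the magnitude of each coefficient.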

Motivated by these correlations, we next run a log-linear cross-district regression to focus on various developmental determinants of COVID-19 deaths across 23 states and union territories of India covering 557 districts taking weekly new death count as the dependent variable. The results are in Table 2. Broadly, in line with the existing literature, we choose PCDDP, URBAN, DENSITY, BPL, AGRI and ELECT as explanatory developmental variables [see for instance, Ehlert (2021); and Canatay et al (2021)]. Time and time-squared terms are introduced for capturing the non-linear nature of cumulative deaths.

Table 2 District level dynamic panel regression of weekly COVID-19 deaths during second wave [Dependent variable: Dlog(deaths)]

At first glance, non-industrial states have had fewer fatalities as seen by the significantly negative AGRI coefficient in models 2 and 3. Finally URBAN, PCDDP and DENSITY have significant and positive coefficients. The richer and more densely populated states have a higher chance of COVID-19 related fatalities. These results are broadly consistent with Basu and Mazumder (2021).

Our rationale for running these various specifications is to ascertain whether fatalities are consistently lower in less prosperous, poor districts, which is the key hypothesis of this investigation. Model 1 shows that both DENSITY and URBAN have a positive and statistically significant influence on deaths per million, while BPL has a negative influence. Next, model 2 suppresses URBAN but introduces PCDDP. AGRI is also included while BPL is retained. Income (i.e., PCDDP) explains deaths, while the partial influences of both BPL and AGRI turn out to be negative and significant. In model 3 the URBAN-BPL interaction term has a negative and significant coefficient, although URBAN by itself has a significantly positive influence on deaths, other things unchanged. The negative sign of the interaction term in model 3 suggests that the positive effect of urbanisation on fatalities is partly muted by poverty, which accords well with the key result that the poor are less vulnerable to COVID-19 fatalities. Among similarly urbanised districts, COVID-19 deaths are expected to go down as we move to poorer regions. AGRI has a death suppressing influence even in model 3. Finally, in model 4, when BPL is dropped, PCDDP and URBAN have positive and significant coefficients, strengthening our fundamental hypothesis. Since BPL has a strong negative correlation with these variables, dropping BPL possibly makes the other developmental variables more significant in determining COVID-19 cases and fatalities. On the whole, our district level results in Table 2 are consistent with our correlations in Table 1.
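The reading of the interaction term in model 3 can be made explicit with a short derivation. The coefficient symbols \(\beta_{U}\) and \(\beta_{UB}\) are notation introduced here for illustration, and the remaining regressors of model 3 are collected in the ellipses.

```latex
\[
\Delta\log(\text{deaths}_{it}) = \dots + \beta_{U}\,\text{URBAN}_{i}
  + \beta_{UB}\,\bigl(\text{URBAN}_{i}\times\text{BPL}_{i}\bigr) + \dots
\]
\[
\frac{\partial\,\Delta\log(\text{deaths}_{it})}{\partial\,\text{URBAN}_{i}}
  = \beta_{U} + \beta_{UB}\,\text{BPL}_{i}
\]
```

With \(\beta_{U} > 0\) and \(\beta_{UB} < 0\), the marginal effect of urbanisation on fatalities shrinks as the poverty headcount rises, which is the sense in which poverty partly mutes the urban effect.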

Time and time-squared are used as regressors to capture the non-linearity in the cumulative growth pattern. White’s period standard errors have been used throughout for the sake of robustness although consistency in the signs of the coefficients is the focus rather than statistical precision of the estimates. To adjust for serial correlation in errors we introduce a lagged dependent variable as a regressor throughout.
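The structure of such a specification can be sketched on synthetic data. This is a hedged illustration and not the authors' estimation: it fits pooled least squares with a lagged dependent variable, time and time-squared, and a single time-invariant regressor standing in for the district indicators. It does not compute White period standard errors, and all dimensions and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_districts, n_weeks = 50, 30  # hypothetical panel dimensions

# Synthetic cumulative deaths per million that grow every week.
weekly_new = rng.uniform(0.5, 5.0, size=(n_districts, n_weeks))
cumulative = 10.0 + np.cumsum(weekly_new, axis=1)

# Dependent variable: first difference of log cumulative deaths, Dlog(deaths).
dlog = np.diff(np.log(cumulative), axis=1)  # shape (districts, weeks - 1)

# Time-invariant district regressor (a stand-in for e.g. URBAN).
urban = rng.uniform(10, 90, size=n_districts)

y = dlog[:, 1:].ravel()       # current value
y_lag = dlog[:, :-1].ravel()  # lagged dependent variable
t = np.tile(np.arange(2, n_weeks), n_districts).astype(float)
urban_rep = np.repeat(urban, n_weeks - 2)

# Regressors: constant, lagged Dlog(deaths), time, time squared, URBAN stand-in.
X = np.column_stack([np.ones_like(y), y_lag, t, t**2, urban_rep])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["const", "lag", "time", "time_sq", "urban"]
print(dict(zip(names, coef.round(4))))
```

Each district contributes one row per week after the first two, since one week is lost to differencing and one to the lag.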

Why did Urbanisation and Affluence Lead to Higher Fatalities: Exploring the Role of Lifestyle Diseases and Population Ageing at the District Level

We now turn to the question of why COVID-19 deaths are more concentrated in the richer and urbanized regions of India. Does the population in the richer and urbanized regions suffer from specific health disorders which are rare in the poorer and rural areas? Studies have demonstrated that hospitalized patients younger than 50 with morbid obesity (BMI ≥ 40 kg/m²) are more likely to die from COVID-19. Obesity is associated with impaired immune response, endothelial dysfunction, decreased functional residual capacity in the lungs, hypoxemia, and expression of ACE2 in adipose tissue, which has high affinity to the SARS-CoV-2 virus (Klang et al. 2020; Tibiriçá and Lorenzo 2020). Moreover, there is sufficient evidence by now that co-morbidity factors like diabetes, hypertension and obesity might be jointly responsible for higher COVID-19 deaths (Escobedo-de la Peña et al. 2021; Mahamat-Saleh 2021).

The NFHS-5 report in India provides an opportunity to delve deeper into health-related insights. While state-level estimates of population proportions with co-morbidities (defined as the presence of at least two different diseases or medical conditions simultaneously in the same person) are not directly available, we do possess district-level data for three crucial lifestyle diseases or disorders from the National Family Health Survey 2019–20 (NFHS-5).

These are: (i) Diabetes (DIAB), measured by the percentage of the population above 15 years with blood sugar levels exceeding 140 mg/dl; (ii) Obesity (OBES), captured by the percentage of the population aged 15–49 years who are obese (BMI > 25 kg/m²), where it is worth noting that at the district level the NFHS-5 fact sheet provides data specifically for female obesity; and (iii) Hypertension (HYPER), indicating the percentage of the population aged 15 years and above suffering from elevated blood pressure (Systolic ≥ 140 mm of Hg and/or Diastolic ≥ 90 mm of Hg) or taking medication to control blood pressure.
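The three threshold definitions above can be written as simple classification rules. The sketch below is illustrative: the function names and the sample respondents are hypothetical, and it is not a description of how the NFHS-5 estimates themselves are compiled.

```python
def is_diabetic(blood_sugar_mg_dl):
    """Blood sugar level exceeding 140 mg/dl."""
    return blood_sugar_mg_dl > 140


def is_obese(bmi):
    """BMI above 25 kg/m^2, the cut-off quoted in the text."""
    return bmi > 25


def is_hypertensive(systolic, diastolic, on_medication=False):
    """Systolic >= 140 or diastolic >= 90 mm of Hg, or on blood pressure medication."""
    return systolic >= 140 or diastolic >= 90 or on_medication


def share_percent(flags):
    """Percentage of respondents for whom the flag is true."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


# Hypothetical respondents: (systolic, diastolic, on medication).
bp = [(120, 80, False), (145, 85, False), (130, 92, False), (118, 76, True)]
hyper = share_percent(is_hypertensive(s, d, m) for s, d, m in bp)
print(hyper)  # 75.0
```

Three of the four hypothetical respondents meet at least one of the hypertension criteria, which gives the 75 percent share.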

Furthermore, it is well-recognized that the elderly population constitutes a significant proportion of COVID-19 mortalities globally (Wang et al. 2021; Yanez et al. 2020). Urban India having a relatively higher proportion of aged individuals could potentially explain the higher urban deaths due to COVID-19, especially when considering the influence of co-morbidity factors. In addition COPD and associated respiratory conditions are likely to be associated with higher COVID-19 deaths because a significant proportion of COVID-19 patients have been observed to develop ARDS (Hsu et al. 2021).

Table 3 presents the ordinary correlation coefficients between development variables and the three selected health indicators, providing insights into the interplay between development factors, lifestyle disorders as well as respiratory diseases.

Table 3 District level cross-section based ordinary correlations between Socio-economic and life-style disorder variables

First, looking at the DEATHS column we find that both obesity (OBES) and hypertension (HYPER) are significantly and positively associated with DEATHS implying that COVID-19 deaths in India have been more concentrated in districts that suffer more from obesity and hypertension. This is consistent with cross-country observations separately by Wang et al. (2021) and Gardiner et al. (2021). The ordinary correlation coefficient of DEATHS with each of the life-style diseases is highly significant at the district level cross-section. Moreover, fatalities are significantly and positively associated with ageing and the incidence of respiratory diseases (RESP).

Second, a glance at the PCDDP and URBAN columns reveals that the three lifestyle disorder diseases, along with the respiratory disease variable (RESP), are all significantly and positively associated with PCDDP and the degree of urbanization. Two of the lifestyle disorder diseases, namely hypertension and diabetes, are significantly and positively correlated with respiratory diseases across Indian districts, implying the coexistence of lifestyle diseases and respiratory diseases in India. In addition, all three diseases are significantly and negatively associated with BPL and AGRI, thereby indicating further that these life-style diseases are more concentrated in the urban and richer regions of India. However, a noteworthy observation in Table 3 is the highly positive association between AGED and URBAN (correlation being 0.67). It suggests that there are more aged people in urban areas. The statistically significant and positive ordinary correlation coefficient between AGED and RESP suggests that in India, the aged suffer more from respiratory diseases (Divo et al. 2014).

Although these correlations do not necessarily imply any causality among the variables, they certainly motivate further empirical analysis. We run a family of district level panel regressions, choosing the life-style diseases as district level fixed factors along with our usual structural socioeconomic variables. The results are in Table 4. We run three district-specific regressions where weekly new DEATHS are explained by the percentage of the population aged 60 years and above (AGED) and the three life-style diseases, including pairwise interactions. We introduce AGED as a control factor throughout the three models. AGED is statistically significant and positive across models, implying that, everything else equal, the higher the percentage of old-age population at the district level, the higher the COVID-19 deaths per million.

Table 4 District level dynamic panel regression of weekly COVID-19 fatalities on ageing, lifestyle diseases and their interactions [Dependent variable: Dlog(deaths)]

It is noteworthy that the coefficients of the age, obesity, hypertension, diabetes and respiratory disease variables are all significant and positive (Model 1), implying a greater risk of COVID-19 deaths on account of these factors. In the interactive models (Models 2 and 3), the AGED–HYPER and AGED–DIAB interactions have positive and highly significant coefficients. In addition, the AGED–RESP interaction is also statistically significant, indicating that a larger share of aged people combined with a larger proportion of the population suffering from chronic respiratory diseases is likely to raise COVID-19 deaths at the district level. On the whole, ageing in conjunction with hypertension and diabetes seems to be a significant co-morbidity factor behind the second wave of COVID-19 deaths in India. Since all three chosen life-style disorder diseases are primarily urban in nature (in the Indian context), this provides an explanation for the high incidence of COVID-19 deaths specifically in urbanized districts. In Table 4 the AGED–OBES interaction term is deliberately omitted because OBES covers the 15–49 age group only, so that AGED and OBES refer to two mutually exclusive groups of individuals.
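
To make the role of the interaction terms concrete, the following is a stylized sketch of an interactive specification of the kind described above. The exact dynamic specification behind Table 4 is not reproduced in this section, so the lagged dependent variable, the coefficient symbols ($\alpha$, $\rho$, $\beta_1$, $\beta_2$, $\gamma$) and the error term $\varepsilon_{it}$ are assumptions made purely for illustration. HYPER stands in for any of the disease variables interacted with AGED.

```latex
\Delta \log(\text{DEATHS}_{it}) = \alpha + \rho\,\Delta \log(\text{DEATHS}_{i,t-1})
    + \beta_{1}\,\text{AGED}_{i} + \beta_{2}\,\text{HYPER}_{i}
    + \gamma\,(\text{AGED}_{i} \times \text{HYPER}_{i}) + \varepsilon_{it}

\frac{\partial\,\Delta \log(\text{DEATHS}_{it})}{\partial\,\text{AGED}_{i}}
    = \beta_{1} + \gamma\,\text{HYPER}_{i}
```

Under this sketch, a positive $\gamma$ means that the association between ageing and the growth in deaths is stronger in districts where hypertension is more prevalent, which is the sense in which the interaction terms are read as co-morbidity effects.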

The key implication of this exercise is that if the population is comparatively more aged and suffers more from life-style disorder related diseases (compared with the population in poor, agricultural and rural districts), then the fatality risks due to COVID-19 are significantly higher. Respiratory disease, both in isolation and through its interaction with ageing, explains COVID-19 fatality. In other words, co-morbidities resulting from a combination of lifestyle diseases, ageing, and respiratory illnesses are potentially more life-threatening across India with respect to COVID-19 deaths. These findings are fairly consistent with Sasson (2021), Ho et al. (2020), and Yanez et al. (2020), who also report higher COVID-19 mortality rates among the elderly and also higher mortality among the elderly with co-morbidities.

The Interactions Between Urbanization and Urban Life-Style Diseases and Consequent Impact on COVID-19 Fatalities

Urban life-style diseases in India are concentrated more among the affluent urban population. The percentage of district level urban population (the variable URBAN in this paper) and the percentage of population suffering from life-style disorder diseases can have interactive effects and these interactions can potentially influence the COVID-19 fatality rates.

Table 5 presents different model specifications in which the principal focus is on how the interactions between urbanisation (URBAN) and each of the life-style disorder indicators influence COVID-19 deaths. Aged populations in India are relatively more concentrated in the urbanised districts according to the correlation reported in Table 3. Since these disorders are primarily urban life-style disorders, we add an interaction with URBAN, keeping in mind that a likely explanation for higher COVID-19 deaths in urban areas could be rooted in the interactions between URBAN and the life-style disorders. We begin with a non-interactive model (Model 1), which contains only the pure effects of ageing, urbanization, each of the three life-style diseases and the incidence of respiratory diseases. Apart from DIAB, all explanatory variables, including respiratory diseases, turn out to be statistically significant in explaining weekly COVID-19 deaths. That is, with the exception of diabetes, the partial impacts of ageing, urbanization, respiratory disease and the key urban life-style diseases for India (obesity and hypertension) on COVID-19 deaths are positive and significant. The AGED–URBAN interaction coefficient is positive and significant in the second model. Moreover, across models the coefficients of the interactions between URBAN and each of the three chosen life-style disorder indicators are positive and statistically significant. The central message of Table 5 is that, given the levels of incidence of these life-style diseases, as we move to more aged as well as more urbanized populations, COVID-19 related deaths are expected to rise significantly.

Table 5 Urbanisation and lifestyle disease interactions and consequent Impacts upon COVID-19 deaths—a district level dynamic panel regression

In sum, urbanization in India is likely to have aggravated the COVID-19 death toll through the life-style disease channel. Our results corroborate the findings of González-Val and Sanz-Gracia (2022), Naudé and Nagler (2022) and Chang et al. (2022), who provide similar cross-country evidence in which urbanization is identified as a crucial predictor of COVID-19 mortality.

Presence of Co-morbidities and COVID-19 Fatalities in India

A rudimentary approach to modelling co-morbidity (the presence of at least two diseases or medical conditions in a person; here, lifestyle and respiratory diseases) in a regression framework is to introduce pair-wise disease interaction terms built from our chosen lifestyle disease incidences, namely (i) OBES–DIAB, (ii) HYPER–OBES, (iii) DIAB–HYPER, and similar interactive variables involving RESP as an additional co-morbidity factor. In Table 6 we present the dynamic panel data models with the weekly death count as the dependent variable. Our key objective here is to explore the impact of life-style disease interactions, including those involving the incidence of chronic respiratory illness, on COVID-19 deaths in India.
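
As a minimal sketch of how such pair-wise interaction regressors could be constructed, the snippet below multiplies each pair of disease variables for a single district. The function name `add_pairwise_interactions`, the `A_x_B` column naming and the prevalence figures are hypothetical and are not taken from the paper's data or code.

```python
from itertools import combinations

# Disease variables named in the text; the numbers used below are hypothetical.
DISEASES = ["OBES", "DIAB", "HYPER", "RESP"]


def add_pairwise_interactions(district, variables=DISEASES):
    """Return a copy of `district` with one product term per pair of variables."""
    row = dict(district)
    for a, b in combinations(variables, 2):
        # Interaction regressor: product of the two district-level prevalences.
        row[f"{a}_x_{b}"] = district[a] * district[b]
    return row


# Hypothetical prevalence figures (percent of population) for one district.
example = {"OBES": 20.0, "DIAB": 10.0, "HYPER": 25.0, "RESP": 4.0}
augmented = add_pairwise_interactions(example)

for name, value in augmented.items():
    print(name, value)
```

With four disease variables this yields six product terms, each of which can enter a regression alongside the corresponding pure effects.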

Table 6 Lifestyle disease interactions and consequent Impacts upon COVID-19 deaths—a district level dynamic panel regression [Dependent Variable: Dlog(deaths)]

The first model shows the pure and partial effects of each of the life-style diseases on deaths along with a single interactive term, HYPER–RESP. Individually, all three life-style diseases and the respiratory disease variable explain COVID-19 deaths positively and significantly. More importantly, the disease interaction terms, which are of special interest, are significant across the six models, implying that co-morbidity related factors significantly aggravated COVID-19 deaths in India during the second wave. This is fairly consistent with the findings of Rana et al. (2023) for the National Capital Region (NCR) of India (Delhi), where higher mortality risk and severity due to COVID-19 were observed in patients with co-morbidities such as diabetes, hypertension and chronic kidney diseases. Throughout the analysis, models with the OBES–RESP interactive term are not reported, as this interaction is found to be insignificant.

Finally, we repeat a broadly similar set of models with the Fatality-Case Ratio (FCR) as the dependent variable. FCR is the cumulative total of weekly deaths divided by the cumulative total of weekly confirmed cases, multiplied by 1000; that is, the number of COVID-19 deaths per 1000 confirmed cases. The estimated pooled regression models are in Table 7. Except for DIAB, the chosen variables are statistically significant in Model 1. The remaining four estimated models are largely similar to those presented in Table 6. AGED and URBAN influence the FCR positively and significantly, and this finding is consistent across the models in Table 7. Most importantly, the life-style disease interactions are all highly significant in explaining FCR even when respiratory disease is included as an additional co-morbidity factor.
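
The FCR definition above can be illustrated with a short sketch. The function name, the handling of weeks with no confirmed cases and the weekly counts are assumptions made for illustration only; they are not the paper's data.

```python
def fatality_case_ratio(cumulative_deaths, cumulative_cases):
    """Deaths per 1000 confirmed cases; None when no cases are recorded."""
    if cumulative_cases <= 0:
        # Assumed convention: the ratio is undefined without confirmed cases.
        return None
    return 1000.0 * cumulative_deaths / cumulative_cases


# Hypothetical (cumulative deaths, cumulative cases) pairs for one district,
# one pair per week.
weekly_counts = [(0, 0), (12, 1500), (30, 4000), (45, 9000)]
fcr_series = [fatality_case_ratio(d, c) for d, c in weekly_counts]
print(fcr_series)
```

Because both numerator and denominator are cumulative, the ratio for a given week reflects the whole history up to that week rather than that week's new deaths alone.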

Table 7 Lifestyle disease interactions and consequent Impacts upon COVID-19 Fatality-Case Ratio—a district level panel regression

Endogeneity

An important econometric question is whether our key structural-development variables, such as PCDDP and URBAN amongst others, are exogenous and pass the standard orthogonality tests. Endogeneity is unlikely to arise from reverse causality because it is implausible that COVID-19 fatalities would influence our structural variables within such a short time. However, endogeneity could arise from either omitted variables or measurement errors in fatalities that are correlated with the structural variables. In the presence of such endogeneity, ordinary least squares yields biased and inconsistent estimates of the parameters of the regression model. We adopt a standard Two Stage Least Squares (2SLS) approach to test for orthogonality for each of our structural variables. The results, based on district-level data, are reported in Table 8 in the Appendix (A1). In all regressions, the p-value associated with the J-statistic indicates that the instrumental variables (IVs) are adequate and that all our structural (explanatory) variables pass the orthogonality tests.
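
This section does not list the instruments or the exact form of the test, so the following is only a minimal sketch of the general 2SLS procedure on synthetic data. It fits one endogenous regressor with two instruments and computes a Sargan-type overidentification statistic, used here as an assumed stand-in for the J-statistic mentioned above. All variable names, the simulated data and the true slope of 2.0 are hypothetical.

```python
import math

import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(y, exog, endog, instruments):
    """2SLS with one endogenous regressor, plus a Sargan-type J statistic."""
    n = len(y)
    W = np.column_stack([exog, instruments])           # full instrument set
    endog_hat = W @ ols(W, endog)                       # first-stage fitted values
    beta = ols(np.column_stack([exog, endog_hat]), y)   # second stage
    # Residuals are computed with the observed regressor, not the fitted one.
    resid = y - np.column_stack([exog, endog]) @ beta
    # Sargan statistic: n * R^2 from regressing the residuals on the instruments.
    aux_resid = resid - W @ ols(W, resid)
    r2 = 1.0 - (aux_resid @ aux_resid) / (resid @ resid)
    return beta, max(n * r2, 0.0)


rng = np.random.default_rng(0)
n = 5000
z1, z2 = rng.normal(size=n), rng.normal(size=n)    # hypothetical instruments
u = rng.normal(size=n)                              # omitted variable
x = 0.8 * z1 + 0.5 * z2 + u + rng.normal(size=n)    # regressor correlated with u
y = 1.0 + 2.0 * x + 2.0 * u + rng.normal(size=n)    # true slope on x is 2.0

const = np.ones((n, 1))
beta_ols = ols(np.column_stack([const, x]), y)
beta_2sls, j_stat = two_stage_least_squares(
    y, const, x, np.column_stack([z1, z2])
)
# Two instruments and one endogenous regressor leave one overidentifying
# restriction, so J is compared with a chi-square(1) distribution, whose
# upper tail probability equals erfc(sqrt(J / 2)).
p_value = math.erfc(math.sqrt(j_stat / 2.0))
print(beta_ols[1], beta_2sls[1], j_stat, p_value)
```

In this simulated setting the OLS slope is pushed away from 2.0 by the omitted variable, while the 2SLS slope should lie close to it. A large p-value on the J statistic means the overidentifying restrictions are not rejected, which is the usual reading of instruments being adequate.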

Summary and Conclusions

In this paper we explain the inter-district variations in COVID-19 fatalities in India, focusing primarily on data on infections and fatalities from the second wave of COVID-19. A striking finding is that fatalities are concentrated in urban and prosperous regions of India that have a predominantly aged population and a high prevalence of life-style disorder related diseases such as obesity, hypertension and diabetes. Urbanised and industrially dominant districts in India have poorer air quality, which raises the incidence of chronic respiratory diseases and thereby adds to the list of co-morbidity factors among the urban population. Our findings suggest that chronic respiratory illnesses in conjunction with life-style disorder diseases have aggravated COVID-19 fatalities in India. In addition, high population density in these urban industrial districts also contributed to case fatalities, while the low-income citizens in sparsely populated agricultural regions experienced fewer COVID-19 fatalities.

Over the course of human history, a close beneficial relationship has developed between our immune system and the microbes that live within us. Exposure to environmental and commensal microbes played a beneficial role in setting up pathways that regulate our immune system. However, dietary changes, overuse of antibiotics, reduction of breastfeeding, improved sanitation, treated drinking water, Caesarean births, and spending more time indoors have reduced our exposure to these microbes, thus compromising our immune system. This may explain why we see a clear rural–urban divide in COVID-19 fatalities. The stark rural–urban and rich–poor divide in COVID-19 case fatalities stands in sharp contrast with the COVID-19 experiences of the advanced economies, and it lends support to the “missing microbes” and “old friends” hypotheses, under which the poor in rural India may be relatively more immune to various infections.

What does this mean for our health and policies? Since wealthy and densely populated areas were hit harder by COVID-19, in the event of another outbreak it might make sense to prioritize these areas for special attention and possible lockdowns. Since age is a major factor in COVID-19 deaths, it might be a good idea to keep younger and older people in segregated bubbles during an outbreak. As the elderly and people with existing health issues are more at risk, it might be helpful for the health department to focus on testing and vaccinating these vulnerable sections of the population.