Introduction

The complexity of managing patients with the coronavirus disease 2019 (COVID-19), a global pandemic [1] originated in China [2], has led to the widespread implementation of preventative measures such as social distancing and mask use [3] in many countries including the United States (US). As of April 14, 2020, there were over 1.9 million confirmed cases around the world with 601,000 cases and 24,129 deaths in the US alone [4]. It has been reported that age 65 and older, body mass index ≥ 40, diabetes [5] immunosuppression, smoking, hypertension, and cardiovascular diseases are underlying conditions that increase the risk of death from COVID-19 [6, 7].

The most recent conundrum of this disease is ascribed to the preliminary evidence of racial disparities in the population infected and dying from COVID-19 [3]. In a recent study, the Centers for Disease Control and Prevention (CDC) reported data from fourteen states and suggested that the US Black population may be disproportionally affected by COVID-19 [3]. This observation is consistent with the influenza A (H1N1) pandemic where other studies showed evidence of racial and ethnic disparities in the population affected both in exposure, severity, and mortality of the disease [8, 9]. As states release the racial and ethnic demographic data of COVID-19 cases, in addition to the increased spread of this disease to the central states, it is imperative that we understand the patterns of infection and death to reduce the risks, especially for high-risk population, and resolve issues that impede the provision of optimal care.

In this study, we conducted an ecological-based analysis to explore racial and economic inequality associated with the infection rate and risk of mortality due to COVID-19 in the US. The goal of the study was to provide evidence on the association of COVID-19 with respect to race, income level, poverty, education, and the impact of preventative measures such as social distancing. We trust that the decision making by the states’ officials will be driven by data and based on their unique needs and population characteristics to help in combating this disease.

Methodology

The study was conducted at two levels: (1) analysis of population characteristics (44 variables) for 369 counties in seven states which had the highest rate of COVID-19 infection as of April 9, 2020, along with COVID-19 infection and mortality rates. The included states were California, Michigan, New York, New Jersey, Louisiana, Pennsylvania, and Massachusetts; (2) analysis of COVID-19-related infection and death rate across all the states in the US with race/ethnicity information on the affected subject when available.

Data Source, Outcomes, and Independent Variables:

Data sources in this study include (1) publicly available data from USAfacts and the US Census Bureau for COVID-19 cases and county-level demographic data [10, 11], (2) COVID-19 data reported by each state on their department of health websites [10], (3) State Population by race/ethnicity data [12, 13], and (4) mobility data extracted from Google [14].

The variables used in this study include county-level information on total population, mobility, race, poverty level, median income, education, disability, and rate of the insured population. Mobility data were extracted from Google as reported on April 05, 2020. The state-level data were extracted on April 16, 2020. The outcome variables include the rate of COVID-19 infection and all COVID-19-related death as provided by each state’s department of health as of April 09, 2020. The infection rate is based on the reported results from all the laboratories testing samples in each county/state. The mortality data are reported by hospitals, nursing homes, and other health facilities. Table 1 summarizes the data elements used in this study. Only data provided by the states on their official websites were included in this study. Additionally, to compare the rate of COVID-19 cases and death, the population data for each ethnic/racial group affected were extracted from health department websites.

Table 1 Data elements, descriptions, reference to sources of data, and base statistics used in this study

Statistical Analysis

We summarized all continuous variables as mean ± standard deviation or median with inter-quartile range [IQR] and categorical variables as percentages. Data from different sources were extracted and analyzed for outliers. Values not within three inter-quartiles were removed as part of the data pre-processing. Each continuous variable was centralized and z score transformed. Thus, the transformed variables passed the normality test and the correlation matrix was created. Bivariate, partial correlation, and regression were used to test hypotheses of association. The correlation coefficients between “death rate” and “infection rate” with independent variables were calculated by Pearson’s correlation (R corr package). Partial correlation was further evaluated by Pearson’s correlation (R ppcor package) to determine if the existing correlation was still valid after controlling the second independent variable. The Bonferroni correction for multiple testing of controlling variables was considered to adjust the p value of the correlation. Bivariate linear regression adjusted for “State” variables was utilized to test the association between “death rate” or “infection rate” with independent variables. The raw p value was present in the forest plot. False discovery rate (FDR) corrections for multiple testing were calculated using the Benjamin and Hochberg procedure. Statistical analyses were performed using R version 3.6.2 [16]. and IBM SPSS Statistics 26 [17].

Results

Population, Mobility, and Socioeconomic Determinants

We extracted data from four different sources on 369 counties from seven states, including five states from the East Coast (Michigan, New York, New Jersey, Pennsylvania, and Massachusetts), one state from the West Coast (California), and one state from the South (Louisiana) with the total population of 102,178,117 (median, 73,447; IQR, 30,761-256,098). The information on race, income, education level, insurance, poverty, and disability including the description of abbreviated variables is summarized in Table 1, (Supplemental Table S1 includes additional summary statistics of the dataset).

Our data show a significant association among different socioeconomic determinants, such as poverty level, education, and income (see Table S2). In particular, counties with a higher percentage of people below the poverty level had a significantly lower percentage of the population with higher education (Pearson correlation, − 0.52, p < 0.005 for Bachelor’s degree; Pearson correlation, − 0.61, p < 0.005 for high school), as well as a lower percentage of people insured, but a higher percentage of people on Medicaid (Pearson correlation, 0.77, p < 0.005) or on disability (Pearson correlation, 0.41, p < 0.005; see Table S2 for more details). Counties with a higher percentage of residents below the poverty level had a higher percentage of Blacks (Pearson correlation, 0.52, p < 0.005 for men; Pearson correlation, 0.50, p < 0.005 for women) and a lower percentage of non-Hispanic Whites (Pearson correlation, − 0.30, p < 0.005 for men; Pearson correlation, − 0.33, p < 0.005 for women).

Counties with a Higher Total Population, More Diverse Demographics, Higher Education, and Income Level Are at a Higher Risk of COVID-19 Infection

The COVID-19 infection rate per one million (mean, 912.20 ± 1034.26) ranged from 15.36 to 5093.99 in different counties (Table 1 and S1). Figure 1 shows the map of Pennsylvania with total population for each county, rate of infection and death due to COVID-19 infection, as well as, median income in the counties and percentage of the population who are identified as non-Hispanic Whites. The map of the other six states is provided as Supplemental Fig. S1-S6 for reference. The outliers were not removed in these figures.

Fig. 1
figure 1

Population count, non-Hispanic White, and Median Income and the rate of COVID-19 and related death in the counties of the state of Pennsylvania, as of April 9, 2020

The results of the bivariate linear regression (Fig. 2, Table S3) estimate effect sizes (regression coefficients) of a number of variables contributed to COVID-19 infection when controlled for states in the model. Counties with a higher population (est. 0.34, 95% CI 0.24, 0.44, q < 1.1E-08), a higher median income (est. 0.36, 95% CI 0.25, 0.48, q < 2.3E-08), and a more diverse population (higher percentage of Hispanics, Asians, and Blacks) have a higher rate of infection. More specifically, a higher percentage of Asians (est. 0.32, 95% CI 0.20, 0.44 q < 6.3E-07, women; est. 0.32, 95% CI 0.20, 0.43, q < 4.5E-07, men), Blacks (est. 0.47, 95% CI 0.32, 0.62, q < 2.3E-08, women; est. 0.35, 95% CI 0.20, 0.51, q < 1.7E-05, men), and Hispanics (est. 0.49, 95% CI 0.34, 0.64, q < 1.2E-08, women; est. 0.46, CI 0.31, 0.62, q < 8.1E-08, men) are associated with a higher rate of infection while a higher percentage of non-Hispanic Whites (est. − 0.41, 95% CI − 0.55, − 0.26, q < 2.9E-07, women; est. − 0.44, 95% CI − 0.58, − 0.30, q < 4.2E-08, men) is associated with a lower rate of COVID-19. Change in grocery mobility (est. − 0.24, 95% CI − 0.36, − 0.13, q < 5.5E-05), retail mobility (est. − 0.26, 95% CI − 0.38, − 0.14, q < 4.5E-05), and work mobility (est. − 0.31, 95% CI − 0.43, − 0.20, q < 9.0E-07) were associated with a lower rate of infection. Another protective factor in terms of rate of infection for the counties analyzed was a higher percentage of disability (est. − 0.159, 95% CI − 0.265, − 0.053, q < 0.006). We also analyzed the rate of infection for counties with respect to their percentage of uninsured and found no significant association other than among men (est. 0.181, 95% CI 0.05, 0.313, q < 1.2E-02) and non-Hispanic Whites (est. 0.251, 95% CI 0.123, 0.380, q < 3.1E-04). Furthermore, in our stepwise regression model, minorities specifically Black and Hispanic women, poverty, and level of education among non-Hispanic Whites, disability, the total county population, and level of mobility are predictors of the rate of COVID-19 infection (Table S4).

Fig. 2
figure 2

a Bivariate analysis of factors for mortality due to COVID-19. b Bivariate analysis of factors for infection by COVID-19

Counties with a Smaller Population, Higher Poverty Levels, and Higher Disability Have a Higher Rate of Mortality

The COVID-19-related death (mean, 4.13% ± 2.70%; median, 3.40; IQR, 2.22–5.61) varied among different counties (Table 1 and S1). Figure 2 and Table S5 show the results of the bivariate regression analysis estimating the odds of mortality due to COVID-19 infection (model corrected for states). Protective factors for the counties are a higher percentage of Asians (est. − 0.27, 95% CI − 0.41, − 0.12, q < 0.003, women; est. − 0.23, 95% CI − 0.37, − 0.09, q < 0.009, men) and education level with a bachelor’s degree or higher with an odds ratio ranging from − 0.41 to − 0.03 across the various ethnicities (see Fig. 2). Other protective factors for counties include having a higher percentage of people insured (strongest indicator being for non-Hispanic White people with an estimate of − 0.48, 95% CI − 0.68, − 0.28, q < 6.0E-05) and median income (est. − 0.27, 95% CI − 0.41, − 0.12, q < 0.003). The total population in the counties is also a major indicator (est. − 0.33, 95% CI − 0.43, − 0.20, q < 6.0E-05) of lower COVID-related death. We have also explored the association between total population and various confounding factors, such as mobility data when analyzing the death rate and found that the total population is still an important protective factor (see Table S6, Fig. S7 and S8). Factors significantly associated with higher mortality in the counties analyzed include a higher percentage of people under the poverty level (for all the races analyzed in this study), a higher percentage of people on Medicaid (est. 0.17, 95% CI 0.03, 0.30, q < 0.04), and a higher rate of people with disability in the county (est. 0.27, 95% CI 0.09, 0.45, q < 0.02). Grocery mobility was also highly associated with mortality (est. 0.21, 95% CI 0.02, 0.39, q < 0.06).

To better understand the characteristics of counties with higher or lower death rates, we performed a comparative analysis using ANOVA and found that, similar to the above, counties with more population diversity, higher income and education, a lower rate of disability, and a higher rate of the insured population have a significantly lower than the median death rate. Table 2 and Table S7 summarizes the population characteristics when counties are compared with death rate lower and higher than the median (median death rate is 3.4% across the counties in the seven states, Table S1). Park and retail mobility changes are significantly different between the two groups. The average number of Asians (both man and woman), as well as Hispanics (both man and woman), is significantly higher (p < 0.05) in group 1 (death rate ≤ 3.4). The counties with higher death rates have lower median income and higher poverty levels across all the races. The group with a lower death rate has also a higher rate of the insured population and a lower rate of disability. The percentage of Medicaid is significantly higher in the group with a higher death rate.

Table 2 ANOVA comparing counties with higher and lower than the median death rate

COVID-19 Infection and Mortality Are Higher Among African Americans

We have also extracted data on all the states with respect to race distribution (see Table S8). As of April 16, 2020, we have observed that African Americans, as defined in the reports, have a higher rate of COVID-19 infection and a higher death rate. The number of African Americans infected by COVID-19 is 64,605 (1981 cases per million) with the number of deaths reaching 6181 (211 deaths per million), while the number of Whites, as defined in the reports, is 104,914 (658 cases per million) infected and 9806 (76 deaths per million) dead leading to a disproportional percentage of African Americans infected (p < 0.0001) by COVID-19 and dead (p < 0.0001) as the result. The number of infected Latinos and Asians per million, as defined in the reports, is 947 and 390, while the rate of mortality (per million) is 82 and 52 respectively.

Discussion

Our analysis highlights that counties with a higher total population, more diverse demographics, higher education, and income level are at a higher risk of COVID-19 infection; however, counties with a smaller population, higher disability rates, and higher poverty levels have a higher rate of mortality. The conflicting results for counties’ population could be related to the population density, easier access to the high quality of healthcare, and more experience managing the COVID-19 infection due to the higher number of patients. One can argue that counties with fewer residents have a higher rural population. Studies have shown that there are significant differences in the overall healthcare assessment of rural populations as compared with urban populations [18]. Our observation is also aligned with a recent analysis of health differences in 3053 US counties, showing that rural areas are more likely to have poorer health outcomes [19]. The association of poverty and disability makes the conclusion of this study more complex and beyond the analysis of social determinants. By adding the interaction terms in the linear regression model (death rate~poverty + disability + poverty:disability) of death rate, we do not observe a significant interaction (p = 0.469), suggesting these two variables could be independent in their contribution to the risk of mortality. Populations with a higher disability [20] and lower median income [21] might be less mobile, have more comorbidities [22, 23], and also less likely benefit from timely high-quality care [24] and high-quality nutrition; all of these factors could be equally important to combating this pandemic [25, 26]. Furthermore, our analysis of the preliminary data on mobility, given the recent social distancing guidelines, corroborate the impact of this intervention on lowering the infection rate and death.

Our findings highlight that race (especially Black) is a risk factor for the infection. To further our understanding of the impact of race, we performed an additional comparative analysis using ANOVA and found that counties with fewer than median non-Hispanic Whites (group 1: percentage of non-Hispanic Whites ≤ 39.7%) had a significantly lower total population (p < 5.2E-15) than counties with more than median non-Hispanic Whites (group 2: percentage of non-Hispanic Whites > 39.7%); however, the rate of mortality is significantly higher (p < 0.003) while the rate of infection is significantly lower(p < 4.4E-13) in this group; Table S9 includes additional details. Finally, access to insurance was a protective factor in terms of mortality from COVID-19, but access to insurance did not significantly associate with the rate of infection. Comparison of counties with higher or lower than median death rates provided further evidence of the association of lower median income and higher poverty levels across all the races with mortality. The counties with a higher death rate also had a higher percentage of people on Medicaid. The latter is expected since Medicaid is significantly associated with the rate of poverty (Pearson correlation, 0.769, p < 0.005) as well as the rate of disability (Pearson correlation, 0.428, p < 0.005). Descriptive statistics of data from all the states also corroborates that African Americans might be disproportionally affected by this pandemic as of April 16, 2020. This observation is consistent with the H1N1 pandemic, where studies have shown evidence of racial and ethnic disparities in the population affected in terms of exposure, severity, and mortality of the disease [8, 9]. Finally, historical data have taught us that minorities and people of color tend to be more affected by different diseases [27,27,28,30].

This is the first systematic study on the racial, health, and economic disparity, as well as education, mobility, and COVID-19 infection in the US with the available data from the most severely impacted states. Our study had several limitations; the data was not granular, and we had missingness, especially for smaller and less populated counties. Access to the infected patient information and mortality data was not possible, and only aggregated data were used. Furthermore, many states claimed difficulties in reporting racial/ethnic demographic data due to patients opting out of providing their racial identification. The lack of clarity resulted in partially reported data for the death and case rate per million reported in this article, due to some states reporting on racial data for one, two, or all the racial variables specified in this study. The infection rate estimate may be underrepresented, as some individuals may have mild symptoms but lacked clinical validation of the infection. Finally, our in-depth analysis was based on only seven states, leading to conclusions that may not be generalizable to other regions.

Conclusion

Implications of the results from this study highlight the value of the targeted interventions, as different counties, even within the same state, may have different characteristics and different needs. Furthermore, as the association between COVID-19-related fatality and infection is different among different race and health status, it is important to further study the impact of the immune system and immune-boosting strategies in the at-risk population (such as people with certain disabilities or those residing in elderly community centers), as preventive measure along with other measures based on social distancing guidelines and the ability to work from home.