Causal graph analysis of COVID-19 observational data in German districts reveals effects of determining factors on reported case numbers

  • Edgar Steiger,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    esteiger@zi.de

    Affiliation Central Research Institute of Ambulatory Health Care in Germany (Zi), Berlin, Germany

  • Tobias Mussgnug,

    Roles Investigation, Writing – original draft, Writing – review & editing

    Affiliation Central Research Institute of Ambulatory Health Care in Germany (Zi), Berlin, Germany

  • Lars Eric Kroll

    Roles Conceptualization, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Central Research Institute of Ambulatory Health Care in Germany (Zi), Berlin, Germany

Abstract

Several determinants are suspected to be causal drivers for new cases of COVID-19 infection. Correcting for possible confounders, we estimated the effects of the most prominent determining factors on reported case numbers. To this end, we used a directed acyclic graph (DAG) as a graphical representation of the hypothesized causal effects of the determinants on new reported cases of COVID-19. Based on this, we computed valid adjustment sets of the possible confounding factors. We collected data for Germany from publicly available sources (e.g. Robert Koch Institute, Germany’s National Meteorological Service, Google) for 401 German districts over the period of 15 February to 8 July 2020, and estimated total causal effects based on our DAG analysis by negative binomial regression. Our analysis revealed favorable effects of increasing temperature, increased public mobility for essential shopping (grocery and pharmacy) or within residential areas, and awareness measured by COVID-19 burden, all of them reducing the outcome of newly reported COVID-19 cases. Conversely, we saw adverse effects leading to an increase in new COVID-19 cases for public mobility in retail and recreational areas or workplaces, awareness measured by searches for “corona” in Google, higher rainfall, and some socio-demographic factors. Non-pharmaceutical interventions were found to be effective in reducing case numbers. This comprehensive causal graph analysis of a variety of determinants affecting COVID-19 progression gives strong evidence for the driving forces of mobility, public awareness, and temperature, whose implications need to be taken into account for future decisions regarding pandemic management.

Introduction

As the COVID-19 pandemic progresses, research on mechanisms behind the transmission of SARS-CoV-2 shows conflicting evidence [1–3]. While effects of mobility have been extensively discussed, less is known about other factors such as changing awareness in the population [4–6] or the effects of temperature [7–9]. A limiting factor in many studies is the lack of a causal approach to assess the contributions of various factors [10]. This can lead to distorted effect estimates from observational data [10–12].

With COVID-19, we find ourselves in a situation in which information on the causal contribution of various influencing factors in the population is urgently needed to inform politicians and health authorities. At the same time, trials cannot be carried out for obvious ethical and legal reasons. Therefore, when assessing the effects of determinants of SARS-CoV-2 spread, special attention must be paid to strategies for the selection of confounding factors.

Another problem with assessing the effects of various determinants of SARS-CoV-2 spread is the heterogeneity of the countries and regions examined, for example, in the Johns Hopkins University (JHU) COVID-19 database [13]. Comparisons of case number time series from different countries and observational periods can be strongly distorted by factors such as testing capacities and regional variations.

Our objective is to provide valid estimates of the effects of the main drivers of the pandemic with a causal graph approach. We conducted a scoping review of the available studies regarding signaling pathways and determinants of the spread of SARS-CoV-2 infections and the reported new COVID-19 cases. Then we integrated the current findings into a directed acyclic graph for the progress of the pandemic at the regional level. Using the resulting model and the do-calculus we found identifiable effects without blocked causal paths whose effects can be analyzed with observational data [14]. We used regional time series data of all German districts (401) from various publicly available sources to analyze these questions on a regional level. Germany is a good choice in this regard, because it has ample data on contributing factors on the regional level and has had high testing and treatment capacities from early on in the pandemic.

Causal model

We used a directed acyclic graph (DAG) [11, 12] as a tool to analyze the causal relationships between several exposures and SARS-CoV-2 spread. To get an overview of published associations, a scoping review was conducted from 20 to 22 May 2020 in PubMed and Google Scholar. The search was restricted to publications in English or German from the preceding year. The following search terms were applied to abstracts and titles in PubMed: (“COVID-19” OR “COVID19” OR “Corona” OR “Coronavirus” OR “SARS-CoV-2”), each combined separately with the exposure variables (“mobility”, “public awareness”, “awareness”, “google trends”, “ambient temperature”, “temperature”). For “mobility”, we analyzed n = 8 studies out of N = 103 scanned in PubMed, together with the first ten pages (100 results) in Google Scholar (“awareness”/“public awareness”/“google trends”: n = 9, N = 215; “temperature”/“ambient temperature”: n = 16, N = 235). We integrated these findings where possible into the construction of our DAG, which can be seen in Fig 1.

Fig 1. DAG of determinants of reported COVID-19 cases on the district level.

Unobserved variables are light gray, variables marked with an asterisk (*) are confounded by weekday/holiday.

https://doi.org/10.1371/journal.pone.0237277.g001

A number of studies report a strong association between mobility restrictions and the number of new COVID-19 cases: Restrictive measures (e.g. “stay-at-home” orders, travel bans, or school closures) are shown to possibly reduce the COVID-19 incidence [2, 15–21]. However, some studies point out that the combination of various non-pharmaceutical interventions (NPIs) is decisive to prevent new infections [22, 23].

Google Trends [24] data can be used as a tool to get insights into public interest (awareness) in the coronavirus disease. Several recent studies imply a connection between relative search volume (RSV) indices and reported new COVID-19 cases [4–6, 25–30]. Searches for some terms, e.g. “COVID-19” or “coronavirus”, preceded newly infected cases/total number of cases by roughly 7 to 14 days in different countries [4–6, 26]. Additionally, we acknowledged that individual risk-aware behavior might be a reaction to the current COVID-19 burden (measured as reported cases at the day of exposure).

Mixed evidence is available regarding the effect of temperature: On the one hand, several papers report an association between increase in temperature and decrease in newly infected COVID-19 cases [7–9, 31–36]. On the other hand, the opposite has also been found [37, 38]. Some studies found no association at all [22, 39–42]. It should be noted that few studies considered confounding variables other than meteorological ones (especially age and population density, among others [22, 36, 39]). In addition, the transferability of results between different climate zones is questionable. To avoid possible bias caused by weather variables other than temperature, we included rain, wind, and humidity in our model.

When investigating causal determinants of SARS-CoV-2 infections, a number of confounders have to be considered. Well-known risk factors for SARS-CoV-2 as well as for other infections are demographic factors such as age, gender, socio-economic status (SES), population density, and foreign citizenship/ethnicity [13, 43, 44]. In Germany, as in other countries (e.g. Brazil, USA, or the UK), populist parties or politicians and their electorate tend to be more sceptical about effects of containment measures than the rest of the electorate [45, 46]. Therefore we considered both “right-wing populist party votes” and “voter turnout” as possible confounders. Public health interventions were also taken into account (contact restrictions, school closures etc.), as their implementation showed strong correlations with controlling the spread of SARS-CoV-2 [22, 23, 47]. To avoid bias due to reporting delay of case numbers, we had to include weekday and German holidays. We also included some unobserved variables in our DAG (e.g. “Herd immunity”). Please note that “Exposure to SARS-CoV-2” is itself an unobserved variable: German case numbers are reported with delay after date of exposure and symptom onset. Exposure to the virus should not be confused with the formal exposure variables of the DAG.

Materials and methods

Data

We collected and aggregated data on reported COVID-19 cases, regional socio-demographic factors, weather, and general mobility on district and state level in Germany for the period of 15 February 2020 to 8 July 2020. Our observation period for the outcome consisted of all dates from 20 February 2020 to 8 July 2020 (T = 140), since we used a lag of 5 days for all confounders. We did not exclude any states or districts (K = 401). We analyzed the daily reported number of new cases as outcome (KT = 56 140 observations). The set of possible predictors was derived from our causal DAG (see Table 1 and Fig 1). Due to modelling and data limitations, some of the predictors were unobserved or were modelled as a construct consisting of several variables. For our causal graph analysis, we computed adjustment sets separately for all observed exposures within the DAG (if the respective exposure was identifiable within the DAG causal analysis framework).
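The bookkeeping behind T = 140 and KT = 56 140 can be checked with a few lines of date arithmetic. This is only a sketch in Python (the authors worked in R); the constant names are ours and purely illustrative.

```python
from datetime import date, timedelta

# Figures taken from the text above; the names are illustrative only.
FIRST_COVARIATE_DAY = date(2020, 2, 15)
LAST_DAY = date(2020, 7, 8)
LAG_DAYS = 5
N_DISTRICTS = 401

# The first outcome day is the first day for which lagged covariates exist.
first_outcome_day = FIRST_COVARIATE_DAY + timedelta(days=LAG_DAYS)

# Both end points are included in the observation period.
n_days = (LAST_DAY - first_outcome_day).days + 1
n_observations = N_DISTRICTS * n_days

print(first_outcome_day, n_days, n_observations)
```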

Variables.

We downloaded German daily case numbers on district level reported by Robert Koch Institute (RKI, [48]) and aggregated them by date. The number of daily active cases for day d was derived by subtracting the cumulative number of reported cases on day d − 14 from the cumulative number on day d (14 days as a conservative estimate for the infectious period, which corresponds here to the required quarantine time in Germany).
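As a minimal sketch of this derivation (in Python rather than the R used for the analysis), active cases can be computed from a district's daily new cases as a difference of cumulative counts. The function name and the treatment of the first 14 days (cumulative count before the series starts taken as 0) are our assumptions.

```python
from itertools import accumulate

def active_cases(daily_new, window=14):
    """Active cases on day d = cumulative cases on day d minus cumulative cases on day d - window.

    Assumption: before the start of the series the cumulative count is 0.
    """
    cumulative = list(accumulate(daily_new))
    active = []
    for d, total in enumerate(cumulative):
        earlier = cumulative[d - window] if d >= window else 0
        active.append(total - earlier)
    return active

# Toy series with a short window so the result is easy to follow by hand.
print(active_cases([1, 2, 3, 4], window=2))
```

With the default window of 14, a district reporting one case every day settles at 14 active cases.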

To assess the mobility of the German population, we used data publicly available on German state level from Google [49]. Measurements are daily relative changes of mobility in percent compared to the period of 3 January 2020 to 6 February 2020. Missing values (25 out of 13 488) were imputed with value 0 and the state level measurements were passed onto districts within the corresponding state. Google mobility data was available for six different sectors of daily life (“retail and recreation”, “grocery and pharmacy”, “parks”, “transit stations”, “workplaces”, “residential”) which means that “mobility” is a construct consisting of several variables. All variables but “residential” mobility are relative changes of daily visitor numbers to the corresponding sectors compared to the reference period. “Residential” mobility is the relative change of daily time spent at residential areas. The six mobility variables showed high correlations among each other and with other variables. To reduce multicollinearity, we transformed them by principal component analysis (PCA) into six uncorrelated principal components which were used in place of the original variables.

The notion of awareness in the population of COVID-19 describes the general state of alertness about the new infectious disease. As such, it was hard to measure directly. As a proxy, we used the relative interest in the topic term “corona” as indicated by Google searches. The daily data was available on state level [24] and passed onto district level. As a second proxy for awareness, we used the daily reported number of COVID-19 cases on the day of the exposure: Since media reported case numbers prominently, we assumed that this could reflect individual awareness, too.

We constructed daily weather from four variables (“temperature”, “rainfall”, “humidity”, “wind”). Weather data was downloaded from Deutscher Wetterdienst (DWD, [50]) for all weather stations in Germany below 1000 meters altitude with daily records for our observation period. District level daily weather data was aggregated per district by averaging the data from the three nearest weather stations (which includes weather stations inside the district). Missing values were imputed with mean values (n = 59 for wind).
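The station averaging can be illustrated as follows. This is a sketch under assumptions that the text does not state: distance is measured from a single reference point per district (for example a centroid) using great-circle distance, and the station records are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def district_weather(reference_point, stations, k=3):
    """Average one day's measurement over the k stations nearest to the reference point."""
    lat, lon = reference_point
    nearest = sorted(
        stations,
        key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]),
    )[:k]
    return sum(s["value"] for s in nearest) / len(nearest)

# Hypothetical stations: three close to the reference point, one far away.
stations = [
    {"lat": 52.5, "lon": 13.4, "value": 10.0},
    {"lat": 52.6, "lon": 13.5, "value": 12.0},
    {"lat": 52.4, "lon": 13.3, "value": 14.0},
    {"lat": 48.1, "lon": 11.6, "value": 30.0},
]
print(district_weather((52.52, 13.40), stations))
```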

The reported number of COVID-19 cases varied strongly by day of the week. Thus, we included “weekday” as a categorical variable. Similarly, the reported cases and the exposure to the virus were affected by official holidays. Within the observation period, this included among others Good Friday, Easter Monday, and Labor Day. To correct for effects of these days, we included two variables in the model, “Holiday (report)” (indicates if the day of the report was a holiday, because governmental health departments were less likely to be on full duty) and “Holiday (exposure)” (indicates if the day of exposure to the virus was a holiday, because the population behaves differently on holidays).
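The three calendar variables could be derived per report date roughly as below. This is an illustrative sketch only: the holiday set contains just the three holidays named above (the text says "among others"), and we assume the day of exposure lies 5 days before the day of the report, in line with the lag used for the other variables.

```python
from datetime import date, timedelta

# Only the holidays named in the text; the full list used in the analysis is longer.
HOLIDAYS_2020 = {
    date(2020, 4, 10),  # Good Friday
    date(2020, 4, 13),  # Easter Monday
    date(2020, 5, 1),   # Labor Day
}
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
LAG_DAYS = 5  # assumed distance between exposure and report

def calendar_features(report_day):
    """Weekday and the two holiday indicators for one report date."""
    exposure_day = report_day - timedelta(days=LAG_DAYS)
    return {
        "weekday": WEEKDAY_NAMES[report_day.weekday()],
        "holiday_report": report_day in HOLIDAYS_2020,
        "holiday_exposure": exposure_day in HOLIDAYS_2020,
    }

print(calendar_features(date(2020, 4, 15)))
```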

We encoded the different official and political interventions as one-hot daily variables on the district level: ban of mass gatherings, school and kindergarten closures and their gradual reopening, contact restrictions, and mandatory face masks for shopping and public transport.

We included several social, economic, and demographic factors on the district level with direct or indirect influence on the risk of exposure to SARS-CoV-2 in our analysis. All are readily available from INKAR database [51]. We used the share of population that is 65 years or older and the share of population that is younger than 18 years (Age), the share of females in population (Gender), the population density, the share of foreign citizenships and the share of the population seeking refuge (Foreign citizenship), the share of low-income households (Socio-economic status), voter turnout, share of right-wing populist party votes, and the number of nursing (retirement) homes.

All continuous variables but the outcome “Reported new cases of COVID-19” and the offset “Active cases” were centered and scaled by one standard deviation for numerical stability, while we left binary variables as-is. After estimating the effects of variables, we re-scaled continuous variables’ effects to their original scale. Additionally for mobility variables, we re-transformed the effects of the principal components to the original mobility variables. Furthermore, we lagged the effect of all variables (but outcome, offset, and the non-dynamic socio-demographic variables) by 5 days (optimal lag found by cross-validation) which means that we assumed that their effects on the outcome will be visible after 5 days.
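The two back-transformations described here are linear and can be sketched in a few lines. The helper names and numbers below are hypothetical; we assume principal component j is defined as z_j = Σ_i w_ij x_i on the standardized variables, so that a coefficient vector γ on the components corresponds to β = Wγ on the standardized variables, and dividing by the standard deviation gives the effect per original unit.

```python
def pcs_to_standardized(loadings, pc_coefs):
    """Map coefficients on principal components back to the standardized variables.

    loadings[i][j] is the weight of variable i in component j.
    """
    return [sum(w * g for w, g in zip(row, pc_coefs)) for row in loadings]

def per_original_unit(std_coefs, sds):
    """A coefficient per standard deviation becomes a coefficient per original unit."""
    return [b / s for b, s in zip(std_coefs, sds)]

# Hypothetical two-variable example with orthonormal loadings.
loadings = [[0.6, 0.8],
            [0.8, -0.6]]
pc_coefs = [0.5, -0.1]
std_coefs = pcs_to_standardized(loadings, pc_coefs)
print(std_coefs, per_original_unit(std_coefs, sds=[10.0, 20.0]))
```

Centering shifts only the intercept, so it needs no correction when slopes are reported.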

Methods

Causal analysis with DAG and adjustment sets.

We used a directed acyclic graph as a graphical representation of the hypothesized causal reasoning that leads to exposure to the SARS-CoV-2 virus, onset of COVID-19, and finally reports of COVID-19 cases. We use the terms “causal effect” or “causal relationship” for effect estimates that are based on this causal graph framework. Every node v_i in the graph is the graphical representation of an observed or unobserved variable x_i; a directed edge e_ij is an arrow from node v_i to v_j that implies a direct causal relationship from variable x_i onto variable x_j. The set of all nodes is denoted by V and the set of all edges by E, so the complete DAG is the tuple G = (V, E). The seminal works of Spirtes and Pearl [53, 54] introduce the theory of causal analysis, do-calculus, and how to analyze a DAG to estimate the total or direct causal effect from a variable x_i onto a variable x_j. The direct effect is the effect associated with the edge e_ij only (if it exists), while the total effect takes indirect effects via other paths from v_i to v_j into account, too. Here we estimated total effects only, since most of our variables were not hypothesized to have a direct effect on the reported number of new COVID-19 cases. In contrast to prediction tasks, where one would include all variables available, it is ill-advised to use all available variables to estimate causal effects, because adjusting for unnecessary variables within the causal DAG can introduce bias. This is why we need to identify a valid set of necessary variables (an adjustment set) to estimate the proper causal effect [54]. The “minimal adjustment set” [55] is a valid adjustment set of variables that does not contain another valid adjustment set as a subset. However, identifying a minimal adjustment set might not be enough to reliably estimate the causal effect. Thus, we identified the “optimal adjustment set” [56] as the set of variables which is a valid adjustment set while having the lowest asymptotic variance.

We analyzed the DAG from Fig 1 with the R Software [57] and the R packages dagitty (formal representation of the graph and minimal adjustment sets [12]) and pcalg (for finding an optimal adjustment set [58]). For the defined exposures and the outcome “Reported new cases of COVID-19”, we computed the minimal and optimal adjustment sets. Since it was possible that these sets contained unobserved variables that needed to be left out of the regression model, we chose the valid set with the lowest AIC (see next section) to estimate the final total causal effect from exposure to outcome.
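To make the notion of a valid adjustment set concrete, the sketch below checks Pearl's back-door criterion on a small toy graph. It is not the DAG from Fig 1, and it is not the procedure implemented in dagitty or pcalg, which use more general adjustment criteria; the back-door criterion is a sufficient condition only. All names and edges are hypothetical. d-separation is tested with the standard moralized ancestral graph construction.

```python
def _parents(edges):
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    return parents

def _reachable(start, neighbours):
    """All nodes that can be reached from `start` by following `neighbours`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def d_separated(edges, x, y, z):
    """True if z d-separates x and y (moralized ancestral graph method)."""
    parents = _parents(edges)
    keep = {x, y} | set(z)
    for node in list(keep):
        keep |= _reachable(node, parents)  # add all ancestors
    moral = {node: set() for node in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for i, p in enumerate(ps):
            moral[p].add(child)
            moral[child].add(p)
            for q in ps[i + 1:]:  # "marry" parents of a common child
                moral[p].add(q)
                moral[q].add(p)
    blocked = set(z)
    open_graph = {n: nbrs - blocked for n, nbrs in moral.items() if n not in blocked}
    return y not in _reachable(x, open_graph)

def satisfies_backdoor(edges, exposure, outcome, adjustment):
    """Back-door criterion: no descendants of the exposure in the set,
    and the set blocks every path into the exposure."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
    if _reachable(exposure, children) & set(adjustment):
        return False
    without_outgoing = [(c, e) for c, e in edges if c != exposure]
    return d_separated(without_outgoing, exposure, outcome, adjustment)

# Hypothetical toy graph, much smaller than the DAG in Fig 1.
TOY_EDGES = [
    ("temperature", "mobility"), ("temperature", "cases"),
    ("weekday", "mobility"), ("weekday", "cases"),
    ("mobility", "cases"),
]
for candidate in [set(), {"temperature"}, {"temperature", "weekday"}]:
    print(sorted(candidate), satisfies_backdoor(TOY_EDGES, "mobility", "cases", candidate))
```

In this toy graph, only the set containing both common causes of mobility and cases passes the check.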

Regression with negative binomial model.

We can estimate the causal effect from exposure to outcome by regression [54]. Since the outcome “Reported new cases of COVID-19” is a count variable, a linear regression model with Gaussian errors is not appropriate. Instead, we assumed a log-linear relationship between the expected value of the outcome Y (new cases) and the regressors x, as well as a Poisson or negative binomial distribution for Y:

$$\log \mathrm{E}(Y \mid x) = \alpha + \sum_{i \in S} \beta_i x_i \tag{1}$$

where α is the regression intercept, S is the set of adjustment variables for the exposure i* including the exposure variable itself, and β_i are the regression coefficients corresponding to the variables x_i. As such, β_{i*} is the total causal effect of the exposure variable x_{i*} on the outcome Y.

The Poisson regression assumes equality of mean and variance. If this is not the case, one observes so-called overdispersion (the variance is higher than the mean). This indicates that one should use regression with a negative binomial distribution instead, which estimates the variance parameter separately from the mean.
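A quick, informal way to see overdispersion is to compare the sample mean and variance of the counts. The sketch below does this for made-up counts and adds a method-of-moments value for θ, assuming the parameterization Var(Y) = μ + μ²/θ used by glm.nb in MASS. It only illustrates the idea and is no substitute for the maximum likelihood estimate or for a model-based dispersion check.

```python
from statistics import mean, variance

def dispersion_summary(counts):
    """Compare sample mean and variance; theta is a rough moment estimate, not the MLE."""
    m = mean(counts)
    v = variance(counts)  # sample variance
    theta = m * m / (v - m) if v > m else float("inf")
    return {"mean": m, "variance": v, "overdispersed": v > m, "theta_moments": theta}

# Made-up daily counts with occasional outbreaks.
counts = [0, 1, 0, 3, 12, 2, 0, 7, 1, 25]
print(dispersion_summary(counts))
```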

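We needed to account for the fact that our outcome is not counted per time unit (one day) only, but depends on the number of active COVID-19 cases: Holding all other variables fixed, the number of new cases Y is a constant proportion of the number of active cases A. This was modeled by including an offset log(A + 1) in the regression model Eq (1):

$$\log \mathrm{E}(Y \mid x) = \alpha + \sum_{i \in S} \beta_i x_i + \log(A + 1) \tag{2}$$

$$\Leftrightarrow \quad \log \frac{\mathrm{E}(Y \mid x)}{A + 1} = \alpha + \sum_{i \in S} \beta_i x_i \tag{3}$$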

Here we added a pseudocount “+1” to ensure a finite logarithm and avoid division by 0.

One can interpret the model as approximating the log-ratio of new cases and active cases by a linear combination of the regressor variables in Eq (2). If all variables x_i are centered in Eq (3), we have for the baseline (x_i = 0 for all i)

$$\frac{\mathrm{E}(Y \mid x = 0)}{A + 1} = \exp(\alpha).$$

In other words, the exponentiated intercept is the baseline daily infection rate (how many people does one infected individual infect in one day). If we hold all variables x_i fixed (e.g. at baseline 0) in Eq (3) but now increase the exposure variable x_{i*} = 0 by one unit to x_{i*} + 1 = 0 + 1, we have

$$\frac{\mathrm{E}(Y \mid x_{i^*} = 1)}{A + 1} = \exp(\alpha + \beta_{i^*}) = \exp(\alpha)\,\exp(\beta_{i^*}),$$

which means the exponentiated coefficient β_{i*} describes the rate change of the outcome by one unit increase of the exposure.

In practice, given observations of Y and x, we estimate the regression coefficients α and β_i by maximum likelihood [59]. Our observational measurements are y_kt and x_ikt, where k indicates the corresponding district and t the date of measurement.
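The role of the offset can be illustrated numerically. The function below evaluates the model's mean for given coefficients. It is a sketch with hypothetical parameter values, not fitted values from this study.

```python
import math

def expected_new_cases(active_cases, intercept, coefs, x):
    """Mean of the log-linear model with offset log(A + 1):
    E(Y | x) = (A + 1) * exp(intercept + sum_i coefs[i] * x[i])."""
    linear_predictor = intercept + sum(b * v for b, v in zip(coefs, x))
    return (active_cases + 1) * math.exp(linear_predictor)

# Hypothetical values: baseline rate exp(intercept) = 0.05, one centered exposure.
intercept = math.log(0.05)
coefs = [0.01]

baseline = expected_new_cases(99, intercept, coefs, [0.0])
one_unit_more = expected_new_cases(99, intercept, coefs, [1.0])
print(baseline, one_unit_more / baseline)
```

At baseline the expected count is the baseline rate times A + 1, and raising the exposure by one unit multiplies it by exp(β), independently of A.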

We conducted a log-linear regression (function glm with family = poisson() for Poisson regression, and glm.nb from the MASS package for the negative binomial regression [60]) for the full data set to assess general model adequacy and to estimate the θ parameter of the negative binomial. The proper lag between exposures and outcome was found by 10-fold cross-validation on different lags between 1 and 20 days. Model diagnostics on the final full model did not show severe problems with model assumptions (linearity, distribution of residuals, independence of observations). Analysis of variance inflation factors revealed some problems with multicollinearity. To reduce the effects of multicollinearity, we first transformed the highly correlated mobility variables by PCA as described above. Second, we used a ridge regression approach [61], which is a regularization method that shrinks regression coefficients and alleviates the effect of correlation between variables on their respective regression coefficients. Furthermore, regularized regression allows for better fits on unseen data and thus also helps to prevent overfitting. The hyper-parameter λ of the ridge regression was chosen by 10-fold cross-validation, where the folds were constructed from random subsets of the 401 districts. For this, we used the cv.glmnet function from the R package glmnet [62] with family = negative.binomial(theta) and chose the λ value within one standard deviation from the minimal λ as regularization hyper-parameter. Afterwards, we calculated the effects of separate exposures on the outcome. For every exposure, we analyzed the different valid adjustment sets given by analysis of the causal DAG (i.e. the minimal and optimal adjustment sets). We first checked whether the respective set included unobserved variables. If this was the case for the optimal adjustment set, we discarded the unobserved variables from the set and checked if it was still a valid adjustment set (function gac in package pcalg [63]). If a minimal adjustment set contained unobserved variables, we discarded the whole set. If no valid adjustment set for a given exposure was available, we concluded that the effect of this exposure was unidentifiable within our causal graph. We used the function glmnet with the parameters θ and λ as above on every remaining valid adjustment set as regressors (that is, we applied ridge regression) and calculated the Akaike information criterion (AIC) for this model/set of regressors. Finally, for every exposure, we selected the model/adjustment set (if available) with the lowest AIC. We report the exponentiated estimated coefficients for the separate exposures on their original scale.
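The selection logic for one exposure can be summarized as a small routine. This is a schematic sketch only: `is_valid` stands in for a validity check such as gac, `fit_aic` for fitting the ridge model and computing its AIC, and the variable names and AIC values in the example are invented.

```python
def select_adjustment_set(candidates, unobserved, is_valid, fit_aic):
    """Return (adjustment_set, aic) with the lowest AIC, or None if nothing is usable.

    candidates: list of (kind, variables) with kind "minimal" or "optimal".
    is_valid, fit_aic: caller-supplied callables (hypothetical stand-ins here).
    """
    best = None
    for kind, variables in candidates:
        variables = frozenset(variables)
        if variables & unobserved:
            if kind == "minimal":
                continue  # minimal sets with unobserved variables are discarded entirely
            variables = variables - unobserved  # reduce the optimal set ...
            if not is_valid(variables):         # ... and re-check its validity
                continue
        aic = fit_aic(variables)
        if best is None or aic < best[1]:
            best = (variables, aic)
    return best  # None means: effect not identifiable with observed variables

# Invented example with stub callables.
candidates = [
    ("minimal", {"temperature", "herd_immunity"}),
    ("minimal", {"temperature", "weekday"}),
    ("optimal", {"temperature", "weekday", "rainfall", "herd_immunity"}),
]
unobserved = frozenset({"herd_immunity"})
is_valid = lambda s: {"temperature", "weekday"} <= s
fit_aic = lambda s: {2: 120.0, 3: 110.0}[len(s)]

print(select_adjustment_set(candidates, unobserved, is_valid, fit_aic))
```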

Results

Descriptive statistics for the included variables are presented in Table 2.

In the observational period, the number of daily reported COVID-19 cases increased until the end of March/beginning of April and continually decreased afterwards until the beginning of June 2020, with a slight increase and decrease afterwards (Fig 2A). On the other hand, the (log-)ratio of reported cases over active cases decreased steeply until mid-April and increased steadily afterwards, with a slight decrease close to the end of the observation period (Fig 2B). Both figures exemplify a considerable variation among the districts (light blue points are individual districts’ data).

Fig 2. Temporal and district level variation of outcome (log-scale).

https://doi.org/10.1371/journal.pone.0237277.g002

In Germany, we observed a rebound in mobility after the initial political measures; reductions in incident cases were associated with a diminishing public interest in COVID-19; and temperatures were overall increasing (cf. Fig 3), with correlations between temporal progression and mobility in retail and recreation rA,B = 0.02, awareness (“Searches corona”) rA,C = -0.3, and temperature rA,D = 0.8.

Fig 3. Temporal variation of outcome and main determinants.

https://doi.org/10.1371/journal.pone.0237277.g003

Main results

We list the results of our causal analysis for the effects of different exposure variables in Table 3. The estimates are multiplicative rates of increase/decrease for a one unit increase of the respective variable: Values above 1 lead to an increase, below 1 to a decrease of the infection rate. To put these estimates into perspective, Fig 4 shows the relative causal effect of the different exposure variables on the number of reported COVID-19 cases on a range of sensible values of the exposure variables (95 percent quantiles of data points).

Within our framework, we saw very different effects for individual mobility variables. For mobility in retail/recreation, an increase of 1 percentage point in mobility compared to the reference period (3 January to 6 February 2020) leads to an increase of the daily reported case number by about 0.11 percent. Similarly, mobility at workplaces showed an effect of a 0.33 percent increase in case numbers for every 1 percentage point increase in mobility, while mobility at transit stations showed an effect of a 0.26 percent increase in case numbers for every 1 percentage point increase. In contrast, the remaining three mobility variables showed negative effects on the number of reported COVID-19 cases. An increase of 1 percentage point in mobility for the areas of grocery/pharmacy leads to a decrease in the reported case number by approximately 0.23 percent, while an increase of 1 percentage point in mobility within parks leads to a decrease in the reported case number by approximately 0.03 percent, and finally an increase of 1 percentage point in residential mobility leads to a decrease by approximately 0.97 percent. Fig 4 shows the effects of mobility on a range of possible values. Thus, we expect an increase of daily cases by approximately 7.8 percent if mobility in workplaces reaches baseline levels of 0 percent difference to the reference period. On the other hand, an increase of mobility for residential areas by 10 percentage points compared to the reference period leads to a reduction of the infection rate by approximately 1.8 percent.

“Awareness” had two opposite effects on the outcome in our DAG. Awareness measured by Google searches for corona had a positive effect on the number of reported cases. A one percentage point increase of the state’s Google searches (relative to other states and the observation period) leads to an increase of approximately 0.89 percent. For example, if a district shows 10 percentage points more relative searches for corona than another one, we expect approximately 9.3 percent more infections for this district after 5 days. COVID-19 burden (reported number of cases on day of exposure) affected the outcome negatively, where every additional daily case in the district leads to a 0.2 percent decrease in newly reported case numbers. The corresponding plot in Fig 4 visualizes this relationship: For a local outbreak with 20 daily cases as COVID-19 burden, we estimate as total causal effect a subsequent reduction of infection rate by 3.9 percent.

Within our model, we observed effects of temperature and all other weather variables. Every increase of 1 degree Celsius in temperature leads to a reduction of the daily reported case numbers by approximately 0.95 percent. On the other hand, we found an increasing effect of rainfall: One millimeter (=1 liter per square meter) more rainfall leads to an increase of reported case numbers by approximately 1.21 percent. We observe effects for humidity and wind as well (higher humidity and stronger wind leading to more cases). In perspective (Fig 4), with temperature we expect an increase by approximately 21 percent at a daily average temperature of 0°C compared to a day with 20°C. For rainfall, we expect on a rainy day with 10 mm rainfall a corresponding increase of the infection rate by approximately 12.8 percent compared to a day with no precipitation.
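Because the model is multiplicative, per-unit effects compound rather than add up over several units. The short sketch below (function name ours) applies this to the rounded per-unit percentages quoted above and approximately reproduces the 9.3, 3.9, 12.8 and 21 percent figures. Small deviations are to be expected, since the inputs are themselves rounded.

```python
def percent_change(per_unit_percent, units):
    """Compound a per-unit percentage effect over a change of `units` units."""
    factor = 1 + per_unit_percent / 100
    return (factor ** units - 1) * 100

searches = percent_change(0.89, 10)       # 10 percentage points more searches
burden = percent_change(-0.2, 20)         # 20 additional daily cases
rainfall = percent_change(1.21, 10)       # 10 mm more rainfall
temperature = percent_change(-0.95, -20)  # 0 degrees Celsius instead of 20

print(round(searches, 1), round(burden, 1), round(rainfall, 1), round(temperature, 1))
```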

The different intervention variables showed the strongest effects in our analysis, see Table 3. While the first intervention (ban of mass gatherings) reduced subsequent daily case numbers by 2.7 percent, the closure of schools/kindergartens reduced infections by an additional 7.2 percent and mandatory face masks reduced them by another 9.4 percent. The effect of contact restrictions was the strongest in our observation period, with a reduction of the case rates by 16.9 percent.

The effects of the different socio-demographic factors are quite small in comparison to the effects described above. We see an increasing effect on case numbers by additional nursing homes between districts. Districts with a younger population, more foreign citizens, higher population density and a lower average socio-economic status showed higher case numbers, too.

For all exposures, our analysis pipeline opted to use the (reduced) optimal adjustment set over the minimal adjustment sets because of lower AICs, except for exposure variable “nursing homes”, for which the minimal adjustment set had the lowest AIC. For an overview of all final adjustments sets, see Table 4. We found that there were no valid adjustment sets for the non-identifiable variables turnout and right-wing populist party votes.

We decided on a lag of 5 days based on cross-validation. Similarly, negative binomial regression was chosen over Poisson regression, because the latter showed overdispersion and a higher AIC value.

Discussion

Main findings

Our objective was to identify effects of determining factors for COVID-19 cases within a causal framework. We found that weather affects the reported number of infections, especially temperature (which has a reducing effect on case numbers) and rainfall (which increases case numbers). We saw that reports of high case numbers in districts led to a reduction in new infection numbers, which indicates risk-averse awareness in the population and/or effective public health measures to suppress a local outbreak. Mobility showed distinct effects: Increasing activity in retail and recreational areas, as well as transit stations and workplaces increased reported case numbers, while increased movement for essential shopping (grocery and pharmacy) and in parks or residential areas led to reduced case numbers. All interventions considered (ban of mass gatherings, school/kindergarten closures, contact restrictions, mandatory face masks) reduced case numbers considerably. Socio-demographic variables had small effects individually, but in conjunction they explained larger case numbers in (urban) areas with younger population, lower socio-economic status, and higher population density.

Furthermore, we made a strong case for the use of causal DAGs in epidemiology and in a pandemic like COVID-19: DAGs allow confounders to be chosen for the analysis in a principled and statistically correct way while reducing possible causes for bias. Also, the DAG formalization allows for discussion about the underlying causal assumptions.

Comparison with previous research

Most research on determinants affecting case numbers of COVID-19 is restricted to single aspects [5, 16, 32, 35]. To reliably identify (causal) drivers, one must adjust for confounders. To this end, we used an integrated model with variables from different aspects like mobility, awareness, weather, or socio-demographics and identified confounders by causal analysis with a directed acyclic graph. A causal approach is used in another current COVID-19 analysis [64]. There, however, they identify the causal relationships (reconstruct a DAG), while we estimated effects for a given hypothesized causal DAG.

Several studies assessing the impact of public health measures on mobility have each observed a downward trend accompanied by a decrease in the number of newly reported cases [15–17, 19, 23, 47].

Our findings regarding awareness/Google Trends analysis are in good agreement with the correlations found by others [4, 6, 26], who conclude that alertness to COVID-19 rises several days before the highest number of cases is reported. It should be noted that awareness is substantially influenced by public media coverage, which should be considered, if possible, in future studies [4]. As such, awareness is difficult to measure, and here the number of Google searches for “corona” could only be a proxy for this concept.

In addition, in line with other recently published studies, our results confirm evidence of a negative effect of temperature on new COVID-19 cases [7–9, 31–36]. This, however, contrasts with other scientific literature describing no effects [22, 39–42] or even the opposite correlation [37, 38]. The conflicting results might be explained by different climates and characteristics of the populations under study. While we are confident that our strict causal analysis resulted in effect estimates as undistorted as possible, there might be unconsidered bias in those other studies. Further research needs to be done to elucidate the biological characteristics of the novel virus SARS-CoV-2 regarding its ambient temperature survival and transmission. Finally, we found that increased precipitation led to a rise in COVID-19 cases, which supports previous observations [33].

A recent review on COVID-19 based on evidence from the US and UK concludes that low socio-economic status groups are being hit harder by the pandemic [65]. Although specific pathways remain unclear, many studies found associations with poverty or its correlates such as poor and potentially overcrowded housing conditions. For Germany, a higher case fatality of COVID-19 cases in districts with higher socio-economic deprivation has also been reported recently, which was especially pronounced in the second wave of the pandemic [66]. Similarly, our analysis identified a decreasing effect on COVID-19 case numbers within districts with a higher socio-economic status during the first wave.

Limitations and strengths

While use of a causal DAG is itself a strong tool to identify causal effects (and not just statistical associations), it introduces two limitations: causal assumptions within the graph (depicted by edges) need to be well justified, and the statistical regression model that calculates total causal effects needs to be appropriate for the task at hand. We endorse our graph as a basis for discussion on residual confounding. We did not try to construct the DAG from the available data (cf. [64]). As such, our proposed DAG is not entirely consistent with the data, and there are conditional dependencies between variables that cannot be resolved by adding edges to the DAG (e.g. between interventions like contact restrictions and mandatory face masks). Another way to identify potential problems in the proposed DAG is to perform a sensitivity analysis of its structure by inspecting its maximal ancestral graph (MAG) or its Markov equivalence class, represented by a completed partially directed acyclic graph (CPDAG), and the existence of valid adjustment sets for these generalized graphs [67]. For the MAG derived from our DAG, only the effects for the exposures mobility and searches for “corona” can be estimated with valid adjustment sets, while for the Markov equivalence class all exposures but COVID-19 burden lead to valid adjustment sets. A further analysis of these implications is beyond the scope of this paper.
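
To make the notion of a valid adjustment set more concrete, the following minimal sketch enumerates adjustment sets on a small toy graph. It is an illustration under stated assumptions, not the analysis of this paper: the graph, the node names and the helper functions are hypothetical, and the sketch uses Pearl's back-door criterion, which is sufficient but less general than the adjustment criterion implemented in dagitty [12, 63].

```python
from itertools import combinations

# Hypothetical toy DAG (NOT the DAG of the paper); edges point cause -> effect.
EDGES = {
    ("temperature", "mobility"),
    ("temperature", "cases"),
    ("interventions", "mobility"),
    ("interventions", "cases"),
    ("mobility", "cases"),
    ("mobility", "searches"),
    ("cases", "searches"),
}
NODES = {node for edge in EDGES for node in edge}


def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in EDGES:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(start, goal, path=None):
    """Yield every simple path from start to goal, ignoring edge direction."""
    path = path or [start]
    if path[-1] == goal:
        yield path
        return
    for a, b in EDGES:
        for here, there in ((a, b), (b, a)):
            if here == path[-1] and there not in path:
                yield from simple_paths(start, goal, path + [there])


def is_blocked(path, z):
    """Return True if the path is blocked (d-separated) given the set z."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in EDGES and (right, mid) in EDGES
        if collider:
            # A collider blocks unless it or one of its descendants is in z.
            if mid not in z and not (descendants(mid) & z):
                return True
        elif mid in z:
            # A chain or fork node blocks when it is conditioned on.
            return True
    return False


def satisfies_backdoor(exposure, outcome, z):
    """Back-door criterion: no descendants of the exposure in z, and z
    blocks every path that starts with an edge pointing into the exposure."""
    z = set(z)
    if z & descendants(exposure):
        return False
    for path in simple_paths(exposure, outcome):
        if (path[1], exposure) in EDGES and not is_blocked(path, z):
            return False
    return True


def backdoor_sets(exposure, outcome):
    """All subsets of the remaining nodes that satisfy the back-door criterion."""
    candidates = sorted(NODES - {exposure, outcome})
    return [
        set(z)
        for size in range(len(candidates) + 1)
        for z in combinations(candidates, size)
        if satisfies_backdoor(exposure, outcome, z)
    ]
```

In this toy graph, the effect of mobility on cases requires adjusting for both temperature and interventions, whereas the total effect of temperature needs no adjustment and must not be adjusted for mobility, which lies on the causal path.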

We observed overdispersion and a substantial increase in model performance with a negative binomial regression compared to Poisson regression, which is in line with the results on COVID-19 daily case counts of [17] and others [7, 9, 68]. We did not model case counts with a differential equation model like the classic SIR-model [69] and its successors, since these are more suited to prediction e.g. [70], while our choice of a negative binomial regression framework allowed us to estimate the effects of confounders more reliably. There are more advanced statistical methods for count data, e.g. zero-inflated models and mixed models. We tested both approaches as extensions to the negative binomial regression and experienced numerical problems and increased computing time, along with an insubstantial increase in model performance. Furthermore, our model assumed that all variables have effects proportional to the size of their measurements. It is possible that some variables show saturation effects or opposite effects for low, medium, or high values. This could be modeled with polynomial or other transformations of the variables, which we did not employ due to limited temporal and spatial data availability. Interaction effects of variables and confounding effects or mediating variables are explicitly taken care of by deriving the valid adjustment sets for a given exposure based on the causal DAG. Use of a fixed DAG with effect estimation via regression assumes that data was generated by the same underlying process for the observation period. By inclusion of the successive mitigation interventions as binary variables we were able to explain some of the variance caused by the changing dynamics of case numbers (similar to [68]). 
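
The overdispersion argument at the start of this paragraph can be illustrated with a short sketch. It is only an illustration under assumptions: the counts are made-up values for a single hypothetical district, there are no covariates, and the dispersion parameter `theta` is obtained by a simple moment estimator rather than by the maximum likelihood fit that a regression package would perform.

```python
import math
from statistics import mean, variance

# Hypothetical daily case counts for one district (illustrative values only).
counts = [0, 1, 0, 3, 12, 2, 0, 7, 25, 4, 1, 0, 9, 16, 2]


def poisson_loglik(data, mu):
    """Log-likelihood of an intercept-only Poisson model with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in data)


def negbin_loglik(data, mu, theta):
    """Log-likelihood of a negative binomial (NB2) model with mean mu
    and variance mu + mu**2 / theta."""
    return sum(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
        for y in data
    )


mu = mean(counts)
var = variance(counts)            # sample variance
dispersion_ratio = var / mu       # about 1 under a Poisson model
theta = mu ** 2 / (var - mu)      # moment estimator, requires var > mu

print(f"mean={mu:.2f} variance={var:.2f} ratio={dispersion_ratio:.2f}")
print(f"Poisson log-likelihood: {poisson_loglik(counts, mu):.2f}")
print(f"NB2 log-likelihood:     {negbin_loglik(counts, mu, theta):.2f}")
```

A variance far above the mean, together with a clearly higher log-likelihood for the negative binomial model, is the pattern that motivates preferring it over Poisson regression for such counts.
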
While multicollinearity of variables poses less of a problem for a proper causal graph analysis [71], we addressed the problem of multicollinearity in our predictors by two approaches: principal component analysis for the highly collinear mobility variables as well as a regularized regression approach (ridge regression). The latter (in conjunction with cross-validation) also reduced the problem of overfitting.
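
How a ridge penalty stabilizes estimates for collinear predictors can be sketched in a few lines. The example below is a simplified stand-in and not the model of this paper: it uses a linear model with two centered predictors and made-up values, whereas our analysis applied regularization to a count regression. The function `ridge_2d` and all data are hypothetical.

```python
def ridge_2d(x1, x2, y, lam):
    """Closed-form ridge estimate for two centered predictors (no intercept):
    solves (X'X + lam * I) beta = X'y for the 2x2 case."""
    a = sum(u * u for u in x1) + lam
    d = sum(v * v for v in x2) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


def norm(beta):
    """Euclidean length of a coefficient vector."""
    return sum(c * c for c in beta) ** 0.5


# Two almost identical (highly collinear) predictors; illustrative values.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.3, -1.6, 0.2, 2.1, 3.6]

ols = ridge_2d(x1, x2, y, lam=0.0)    # unpenalized least squares
ridge = ridge_2d(x1, x2, y, lam=1.0)  # penalized estimate

print("least squares:", ols)
print("ridge:        ", ridge)
```

In this example, least squares assigns almost the whole effect to one of the two predictors, while the ridge estimate is shorter and splits the effect more evenly between them. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.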

We stress that our effects were deduced on an aggregate (district) level in the absence of available data on an individual level. As such, conclusions about effects cannot be transferred to individuals without risking an ecological fallacy. Furthermore, as we were using administrative data for our analysis, the results are susceptible to the Modifiable Areal Unit Problem (MAUP) [72]. The MAUP postulates that different regional aggregations of the units of observation may lead to different results and conclusions. Due to limited available data for the different variables, there is currently no way to overcome these problems, which are inherent to all analyses at an aggregated data level.
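
Both problems can be demonstrated with a small constructed example. The sketch below uses made-up values for nine hypothetical individuals and is not derived from our data. Within every district the exposure and the outcome are negatively related, yet the district means are perfectly positively correlated (ecological fallacy), and regrouping the same individuals into different units reverses the sign of the aggregate correlation (MAUP).

```python
from statistics import mean


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / (sx * sy) ** 0.5


def unit_means(units):
    """Aggregate each unit to its mean exposure and mean outcome."""
    return [(mean(x for x, _ in u), mean(y for _, y in u)) for u in units]


# Hypothetical individuals as (exposure, outcome) pairs, grouped by district.
districts = [
    [(1, 3), (2, 2), (3, 1)],
    [(4, 6), (5, 5), (6, 4)],
    [(7, 9), (8, 8), (9, 7)],
]

# The same individuals regrouped into alternative areal units ("zones"):
# zone i collects the i-th individual of every district.
zones = [list(group) for group in zip(*districts)]

within = [pearson(d) for d in districts]
between_districts = pearson(unit_means(districts))
between_zones = pearson(unit_means(zones))

print("within districts:     ", within)
print("across district means:", between_districts)
print("across zone means:    ", between_zones)
```

The aggregate correlation therefore depends on how units are drawn and need not reflect the relationship at the individual level.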

Our observation period was restricted to the transition from late winter through spring to summer (February to July). Nevertheless, this transition with increasing temperatures served as a natural experiment that provided clues on weather effects.

We could not include data on health care utilization during the pandemic in our models due to the lack of available resources. This is planned for a later follow-up to this paper, since we rank health care utilization and mobility within health care facilities among the strong factors for COVID-19 progression: personnel in hospitals and private practices are particularly exposed to infection, while the lack of adequate care for other diseases has severe effects on the general health of the population. At the same time, health care facilities are key for testing and surveillance of COVID-19 patients.

Social determinants of health are important factors to consider in an epidemiological framework of a pandemic disease like COVID-19. To account for this problem, we included several socio-economic confounders that were available on a district level in Germany. While our analysis is not an exhaustive analysis of the effects of social determinants on COVID-19 infections, we emphasize the necessity of their inclusion and our results add to the growing body of evidence that these factors interact with each other and cluster especially among people or within areas of underprivileged conditions, with detrimental effects on population health [73].

While our analysis focused on Germany and its districts, we assume that results may be transferred to other countries by adjusting for their respective weather conditions, mobility habits, socio-demographic characteristics, and other determining factors.

The code and resources for our analysis are available on GitHub. We invite other researchers to replicate our analysis with different assumptions using the files provided in the repository of the article (https://github.com/zidatalab/causalcovid19).

Discussion of causal effects

In our analysis, the adverse effects of mobility in retail/recreation and workplaces and the favorable effects of mobility in grocery/pharmacy and residential areas indicate that interventions like contact restrictions, which limit the number of individual interactions, can lead to reduced infection numbers. This is because retail/recreational and workplace areas mostly encompass places of (social) gatherings, whereas people who do more of their essential shopping at supermarkets and stay at home have less contact with other people and are less likely to come into contact with infected individuals.

The effects of awareness measured via searches for “corona” and the COVID-19 burden are harder to interpret. We assume that within our model, the searches for “corona” are an insufficient proxy for awareness, while the decreasing effect for future case numbers of high daily COVID-19 burden indicates it affects individual risk-behavior and entails effective non-pharmaceutical interventions.

Similarly, the effects of temperature and rainfall can be interpreted as causal effects on indoor and outdoor activities: higher temperatures and low rainfall indicate more people spending time outdoors, while lower temperatures and high rainfall result in indoor activities, which lead to more infections. Current research suggests this to be due to the prevalent transmission of the SARS-CoV-2 virus via respiratory droplets and aerosols [74]. In this light, we advocate for precautionary measures like increased hygiene, face masks, and air ventilation for unavoidable indoor activities.

Furthermore, our analyses strongly support the effectiveness of non-pharmaceutical interventions. To a lesser extent, the adverse effects of some socio-demographic factors might help to identify areas that are at higher risk of local COVID-19 outbreaks and more severe outcomes of infection cases.

Conclusion

To the best of our knowledge, this is the most comprehensive analysis of causes for COVID-19 infections which integrates different data sources (all publicly available). Causal reasoning with a DAG allows us to estimate the possible causal effects more reliably.

Our findings suggest that the infection-driving effects of mobility, awareness, and weather (and to some extent socio-demographic factors) need to be taken into account when deciding on mitigation and suppression interventions, depending on the recent and future development of the COVID-19 pandemic.

Acknowledgments

We are thankful for feedback from Thomas Czihal, Johannes Textor, Ralph Brinks, and an anonymous reviewer who gave helpful suggestions on earlier versions of the manuscript.

References

  1. WHO Team. Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19); 2020, accessed 2020-06-25. Available from: https://www.who.int/publications-detail/report-of-the-who-china-joint-mission-on-coronavirus-disease-2019-(covid-19).
  2. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. pmid:32144116
  3. Guan W, Ni Z, Hu Y, Liang W, Ou C, He J, et al. Clinical Characteristics of Coronavirus Disease 2019 in China. New England Journal of Medicine. 2020;
  4. Higgins TS, Wu AW, Sharma D, Illing EA, Rubel K, Ting JY. Correlations of Online Search Engine Trends With Coronavirus Disease (COVID-19) Incidence: Infodemiology Study. JMIR Public Health and Surveillance. 2020;6(2):e19702. pmid:32401211
  5. Li C, Chen LJ, Chen X, Zhang M, Pang CP, Chen H. Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data, China, 2020. Euro Surveillance. 2020;25(10).
  6. Yuan X, Xu J, Hussain S, Wang H, Gao N, Zhang L. Trends and Prediction in Daily New Cases and Deaths of COVID-19 in the United States: An Internet Search-Interest Based Model. Exploratory research and hypothesis in medicine. 2020;5(2):1–6. pmid:32348380
  7. Bannister-Tyrrell M, Meyer A, Faverjon C, Cameron A. Preliminary evidence that higher temperatures are associated with lower incidence of COVID-19, for cases reported globally up to 29th February 2020. medRxiv. 2020; https://doi.org/10.1101/2020.03.18.20036731
  8. Demongeot J, Flet-Berliac Y, Seligmann H. Temperature Decreases Spread Parameters of the New Covid-19 Case Dynamics. Biology. 2020;9(5). pmid:32375234
  9. Liu J, Zhou J, Yao J, Zhang X, Li L, Xu X, et al. Impact of meteorological factors on the COVID-19 transmission: A multi-city study in China. Science of the Total Environment. 2020;726:138513. pmid:32304942
  10. Greenland S, Robins JM, Pearl J. Confounding and Collapsibility in Causal Inference. Statistical Science. 1999;14(1):29–46.
  11. Schipf S, Knüppel S, Hardt J, Stang A. Directed Acyclic Graphs (DAGs)—Die Anwendung kausaler Graphen in der Epidemiologie. Gesundheitswesen. 2011;73(12):888–892. pmid:22193898
  12. Textor J, van der Zander B, Gilthorpe MS, Liśkiewicz M, Ellison GT. Robust causal inference using directed acyclic graphs: the R package ‘dagitty’. International Journal of Epidemiology. 2017;45(6):1887–1894.
  13. Center for Systems Science and Engineering (CSSE). COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University; 2020. Available from: https://github.com/CSSEGISandData/COVID-19.
  14. Pearl J, Bareinboim E. External Validity: From Do-Calculus to Transportability Across Populations. Statistical Science. 2014;29(4):579–595.
  15. Chang MC, Kahn R, Li YA, Lee CS, Buckee CO, Chang HH. Variation in human mobility and its impact on the risk of future COVID-19 outbreaks in Taiwan. BMC Public Health. 2021;21(1):226. pmid:33504339
  16. Fowler JH, Hill SJ, Obradovich N, Levin R. The Effect of Stay-at-Home Orders on COVID-19 Cases and Fatalities in the United States. medRxiv. 2020; https://doi.org/10.1101/2020.04.13.20063628
  17. Kraemer MUG, Yang CH, Gutierrez B, Wu CH, Klein B, Pigott DM, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science (New York, NY). 2020;368(6490):493–497.
  18. Lasry A, Kidder D, Hast M, Poovey J, Sunshine G, Winglee K, et al. Timing of Community Mitigation and Changes in Reported COVID-19 and Community Mobility—Four U.S. Metropolitan Areas, February 26-April 1, 2020. MMWR Morbidity and mortality weekly report. 2020;69(15):451–457. pmid:32298245
  19. Linka K, Peirlinck M, Sahli Costabal F, Kuhl E. Outbreak dynamics of COVID-19 in Europe and the effect of travel restrictions. Computer Methods in Biomechanics and Biomedical Engineering. 2020; p. 1–8. pmid:32367739
  20. Mazzoli M, Mateo D, Hernando A, Meloni S, Ramasco JJ. Effects of mobility and multi-seeding on the propagation of the COVID-19 in Spain. medRxiv. 2020; https://doi.org/10.1101/2020.05.09.20096339
  21. Xiong C, Hu S, Yang M, Younes HN, Luo W, Ghader S, et al. Data-Driven Modeling Reveals the Impact of Stay-at-Home Orders on Human Mobility during the COVID-19 Pandemic in the U.S. arXiv e-prints. 2020; p. arXiv:2005.00667.
  22. Jüni P, Rothenbühler M, Bobos P, Thorpe KE, da Costa BR, Fisman DN, et al. Impact of climate and public health interventions on the COVID-19 pandemic: A prospective cohort study. Canadian Medical Association Journal. 2020; pmid:32385067
  23. Lai S, Ruktanonchai NW, Zhou L, Prosper O, Luo W, Floyd JR, et al. Effect of non-pharmaceutical interventions to contain COVID-19 in China. Nature. 2020; pmid:32365354
  24. Google LLC. Google Trends, search term “corona”; 2020, accessed 2020-06-25. Available from: https://www.google.com/trends.
  25. Ayyoubzadeh SM, Ayyoubzadeh SM, Zahedi H, Ahmadi M, R Niakan Kalhori S. Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study. JMIR Public Health and Surveillance. 2020;6(2):e18828. pmid:32234709
  26. Effenberger M, Kronbichler A, Shin JI, Mayer G, Tilg H, Perco P. Association of the COVID-19 pandemic with Internet Search Volumes: A Google Trends(TM) Analysis. International Journal of Infectious Diseases. 2020;95:192–197. pmid:32305520
  27. Lin YH, Liu CH, Chiu YC. Google searches for the keywords of “wash hands” predict the speed of national spread of COVID-19 outbreak among 21 countries. Brain, Behavior, and Immunity. 2020; pmid:32283286
  28. Mavragani A. Tracking COVID-19 in Europe: Infodemiology Approach. JMIR Public Health and Surveillance. 2020;6(2):e18941. pmid:32250957
  29. Walker A, Hopkins C, Surda P. Use of Google Trends to investigate loss-of-smell-related searches during the COVID-19 outbreak. International Forum of Allergy & Rhinology. 2020;10(7):839–847. pmid:32279437
  30. Zhou WK, Wang AL, Xia F, Xiao YN, Tang SY. Effects of media reporting on mitigating spread of COVID-19 in the early phase of the outbreak. Mathematical Biosciences and Engineering. 2020;17(3):2693–2707. pmid:32233561
  31. Qi H, Xiao S, Shi R, Ward MP, Chen Y, Tu W, et al. COVID-19 transmission in Mainland China is associated with temperature and humidity: A time-series analysis. Science of the Total Environment. 2020;728:138778. pmid:32335405
  32. Shi P, Dong Y, Yan H, Zhao C, Li X, Liu W, et al. Impact of temperature on the dynamics of the COVID-19 outbreak in China. Science of the Total Environment. 2020;728:138890. pmid:32339844
  33. Sobral MFF, Duarte GB, da Penha Sobral AIG, Marinho MLM, de Souza Melo A. Association between climate variables and global transmission of SARS-CoV-2. Science of the Total Environment. 2020;729:138997. pmid:32353724
  34. Tosepu R, Gunawan J, Effendy DS, Ahmad LOAI, Lestari H, Bahar H, et al. Correlation between weather and Covid-19 pandemic in Jakarta, Indonesia. Science of the Total Environment. 2020;725:138436. pmid:32298883
  35. Wang M, Jiang A, Gong L, Luo L, Guo W, Li C, et al. Temperature significant change COVID-19 Transmission in 429 cities. medRxiv. 2020; https://doi.org/10.1101/2020.02.22.20025791
  36. Wu Y, Jing W, Liu J, Ma Q, Yuan J, Wang Y, et al. Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries. Science of the Total Environment. 2020;729:139051. pmid:32361460
  37. Auler AC, Cássaro FAM, da Silva VO, Pires LF. Evidence that high temperatures and intermediate relative humidity might favor the spread of COVID-19 in tropical climate: A case study for the most affected Brazilian cities. Science of the Total Environment. 2020;729:139090. pmid:32388137
  38. Xie J, Zhu Y. Association between ambient temperature and COVID-19 infection in 122 cities from China. Science of the Total Environment. 2020;724:138201. pmid:32408450
  39. Briz-Redón Á, Serrano-Aroca Á. A spatio-temporal analysis for exploring the effect of temperature on COVID-19 early evolution in Spain. Science of the Total Environment. 2020;728:138811. pmid:32361118
  40. Iqbal N, Fareed Z, Shahzad F, He X, Shahzad U, Lina M. The nexus between COVID-19, temperature and exchange rate in Wuhan city: New findings from partial and multiple wavelet coherence. Science of the Total Environment. 2020;729:138916. pmid:32388129
  41. Jahangiri M, Jahangiri M, Najafgholipour M. The sensitivity and specificity analyses of ambient temperature and population size on the transmission rate of the novel coronavirus (COVID-19) in different provinces of Iran. Science of the Total Environment. 2020;728:138872. pmid:32335407
  42. Yao Y, Pan J, Liu Z, Meng X, Wang W, Kan H, et al. No association of COVID-19 transmission with temperature or UV radiation in Chinese cities. The European Respiratory Journal. 2020;55(5).
  43. de Lusignan S, Dorward J, Correa A, Jones N, Akinyemi O, Amirthalingam G, et al. Risk factors for SARS-CoV-2 among patients in the Oxford Royal College of General Practitioners Research and Surveillance Centre primary care network: a cross-sectional study. The Lancet Infectious Diseases; https://doi.org/10.1016/S1473-3099(20)30371-6
  44. Wahrendorf M, Rupprecht CJ, Dortmann O, Scheider M, Dragano N. Erhöhtes Risiko eines COVID-19-bedingten Krankenhausaufenthaltes für Arbeitslose: Eine Analyse von Krankenkassendaten von 1,28 Mio. Versicherten in Deutschland. Bundesgesundheitsblatt—Gesundheitsforschung—Gesundheitsschutz. 2021;64(3):314–321.
  45. Dohle S, Wingen T, Schreiber M. Acceptance and Adoption of Protective Measures During the COVID-19 Pandemic: The Role of Trust in Politics and Trust in Science. Social Psychological Bulletin. 2020;15(4):1–23.
  46. Engle S, Stromme J, Zhou A. Staying at home: mobility effects of COVID-19. Available at SSRN. 2020; https://doi.org/10.2139/ssrn.3565703
  47. Cowling BJ, Ali ST, Ng TWY, Tsang TK, Li JCM, Fong MW, et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. The Lancet Public Health. 2020;5(5):e279–e288. pmid:32311320
  48. Robert Koch-Institut (RKI). Fallzahlen in Deutschland (COVID-19); 2020, accessed 2020-07-12. Available from: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html.
  49. Google LLC. Google COVID-19 Community Mobility Reports; 2020, accessed 2020-06-25. Available from: https://www.google.com/covid19/mobility/.
  50. Deutscher Wetterdienst (DWD) Climate Data Center (CDC). Recent daily station observations (temperature, pressure, precipitation, sunshine duration, etc.) for Germany, quality control not completed yet, version recent; 2020, accessed 2020-07-12. Available from: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl/recent/.
  51. Bundesinstitut für Bau-, Stadt- und Raumforschung (BBSR). INKAR—Indikatoren und Karten zur Raum- und Stadtentwicklung; 2020, accessed 2020-06-25. Available from: https://www.inkar.de/.
  52. Mitze T, Kosfeld R, Rode J, Wälde K. Face Masks Considerably Reduce COVID-19 Cases in Germany. Proceedings of the National Academy of Sciences. 2020;117(51):32293–32301. pmid:33273115
  53. Spirtes P, Glymour CN, Scheines R, Heckerman D. Causation, prediction, and search. MIT Press; 2000.
  54. Pearl J. Causality. Cambridge: Cambridge University Press; 2009. Available from: https://www.cambridge.org/core/books/causality/B0046844FAE10CBF274D4ACBDAEB5F5B.
  55. Greenland S, Pearl J, Robins JM. Causal Diagrams for Epidemiologic Research. Epidemiology. 1999;10(1):37–48. pmid:9888278
  56. Henckel L, Perković E, Maathuis MH. Graphical Criteria for Efficient Total Effect Estimation via Adjustment in Causal Linear Models. arXiv e-prints. 2020; p. arXiv:1907.02435.
  57. R Core Team. R: A Language and Environment for Statistical Computing; 2019. Available from: https://www.R-project.org/.
  58. Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P. Causal Inference Using Graphical Models with the R Package pcalg. Journal of Statistical Software. 2012;47(11):1–26.
  59. Hilbe JM, Greene WH. 4—Count Response Regression Models. In: Rao CR, Miller JP, Rao DC, editors. Essential Statistical Methods for Medical Statistics. Boston: North-Holland; 2011. p. 104–145. Available from: http://www.sciencedirect.com/science/article/pii/B9780444537379500074.
  60. Venables WN, Ripley BD. Modern Applied Statistics with S. 4th ed. New York: Springer; 2002. Available from: http://www.stats.ox.ac.uk/pub/MASS4.
  61. Hoerl AE, Kennard RW. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 1970;12(1):55–67.
  62. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software. 2010;33(1):1–22. pmid:20808728
  63. Perković E, Textor J, Kalisch M, Maathuis MH. A Complete Generalized Adjustment Criterion. arXiv e-prints. 2015; p. arXiv:1507.01524.
  64. Gencoglu O, Gruber M. Causal Modeling of Twitter Activity During COVID-19. Computation. 2020;8(4).
  65. Wachtler B, Michalski N, Nowossadeck E, Diercke M, Wahrendorf M, Santos-Hövener C, et al. Socioeconomic inequalities and COVID-19—A review of the current international literature. Journal of Health Monitoring. 2020;(S7):3–17.
  66. Hoebel J, Michalski N, Wachtler B, Diercke M, Neuhauser H, Wieler LH, et al. Sozioökonomische Unterschiede im Infektionsrisiko während der zweiten SARS-CoV-2-Welle in Deutschland. Dtsch Arztebl International. 2021;118(15):269–270.
  67. Perković E, Textor J, Kalisch M, Maathuis MH. Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. The Journal of Machine Learning Research. 2017;18(1):8132–8193.
  68. Islam N, Sharp SJ, Chowell G, Shabnam S, Kawachi I, Lacey B, et al. Physical distancing interventions and incidence of coronavirus disease 2019: natural experiment in 149 countries. BMJ. 2020;370. pmid:32669358
  69. Kermack WO, McKendrick AG. Contributions to the mathematical theory of epidemics–I. 1927. Bulletin of mathematical biology. 1991;53(1-2):33–55.
  70. an der Heiden M, Buchholz U. Modellierung von Beispielszenarien der SARS-CoV-2-Epidemie 2020 in Deutschland. 2020; https://doi.org/10.25646/6571.2
  71. Schisterman EF, Perkins NJ, Mumford SL, Ahrens KA, Mitchell EM. Collinearity and Causal Diagrams: A Lesson on the Importance of Model Specification. Epidemiology. 2017;28(1). pmid:27676260
  72. Openshaw S. Ecological Fallacies and the Analysis of Areal Census Data. Environment and Planning A: Economy and Space. 1984;16(1):17–31. pmid:12265900
  73. Solar O, Irwin A. A conceptual framework for action on the social determinants of health. WHO Document Production Services; 2010. Available from: https://drum.lib.umd.edu/handle/1903/23135.
  74. World Health Organization, et al. Transmission of SARS-CoV-2: implications for infection prevention precautions: Scientific Brief, 09 July 2020. World Health Organization; 2020.