Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Explainable death toll motion modeling: COVID-19 data-driven narratives

  • Adriano Veloso ,

    Contributed equally to this work with: Adriano Veloso, Nivio Ziviani

    Roles Conceptualization, Data curation, Methodology, Project administration, Resources, Software, Supervision, Writing – original draft

    adrianov@dcc.ufmg.br

    Affiliation Computer Science Dept, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

  • Nivio Ziviani

    Contributed equally to this work with: Adriano Veloso, Nivio Ziviani

    Roles Investigation, Methodology, Writing – review & editing

    Affiliations Computer Science Dept, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, Kunumi, Belo Horizonte, Brazil

Abstract

Models have gained the spotlight in many discussions surrounding COVID-19. The urgency for timely decisions resulted in a multitude of models as informed policy actions must be made even when so many uncertainties about the pandemic still remain. In this paper, we use machine learning algorithms to build intuitive country-level COVID-19 motion models described by death toll velocity and acceleration. Model explainability techniques provide insightful data-driven narratives about COVID-19 death toll motion models—while velocity is explained by factors that are increasing/reducing death toll pace now, acceleration anticipates the effects of public health measures on slowing the death toll pace. This allows policymakers and epidemiologists to understand factors driving the outbreak and to evaluate the impacts of different public health measures.

Introduction

What are the key factors accelerating the number of deaths due to COVID-19? What is the impact of public health countermeasures such as travel bans and movement restriction? Does mass social isolation effectively help decreasing the death toll rate? What would be the impact on death toll if public health countermeasures are relaxed? Answers to these questions may lead to timely and informed policy decisions, but most of them are still open to debate.

COVID-19 seems to be harder to understand than previously appreciated, and a general law or the complete causal chain behind death toll evolution seem to be unfeasible to obtain in the short term. By contrast, there are multiple competing explanations being reported that attribute different causes to death toll evolution in different countries. While these partial explanations are often guided by our standard understanding of the spread of infectious diseases, the COVID-19 death toll increases and decreases in myriad different ways depending on characteristics and particularities of each country, and there is not a sole cause or even one major explanatory factor.

Here we propose to model the COVID-19 death toll evolution by using the concept of motion in which velocity and acceleration are defined as follows:

  • Velocity is equal to the slope of the death toll as a function of time. Thus, velocity is the first order derivative of the number of deaths ().
  • Acceleration is equal to the slope of the velocity as a function of time. Thus, acceleration is the second order derivative of the number of deaths ().

Figs 13 show death toll, velocity and acceleration for eight countries: Belgium, Brazil, France, Germany, Italy, Sweden, UK and USA, as of June 5th 2020. While our analysis encompasses a much larger set of countries, we show only eight curves in order to avoid clutter. For each country, our analysis starts on the day of the first death, which is hereafter referred to as the first outbreak day. Commonly, velocity increases at an alarming exponential rate, but then it follows an approximate bell shaped curve [1]. The steepness of the curve is determined by how quickly the death toll increases. The peak of velocity coincides with the moment in which acceleration is exactly zero.

thumbnail
Fig 2. COVID-19 velocity (first-order derivative), as of June 5th 2020.

https://doi.org/10.1371/journal.pone.0264893.g002

thumbnail
Fig 3. COVID-19 acceleration (second-order derivative), as of June 5th 2020.

https://doi.org/10.1371/journal.pone.0264893.g003

In the figure, curves are aligned by the number of days after the first death reported in the corresponding country. Clearly, COVID-19 death toll evolves differently in each country, with some showing flattened curves but others not. The factors driving the pandemic evolution in each country may vary greatly depending on their specificities and on the countermeasures adopted at each one. By modeling velocity and acceleration across different economy, demography, social, and climate contexts we may unveil non-obvious patterns driving the COVID-19 death toll evolution. Countermeasures (or the lack of them) are often among the factors impacting acceleration the most. Velocity, on the other hand, is mostly impacted by socioeconomic and demographic factors.

In summary, the main contributions of this paper are:

  • We model the COVID-19 death toll evolution using the concepts of velocity and acceleration. We build models that predict velocity () and acceleration (Δ velocity) to be observed at an arbitrary day of the outbreak. Model predictions come with their explanations, thus providing insightful narratives on why the models behave the way they do. This is an enabling tool for policymakers to understand high-level reasoning behind death toll evolution, so that our models are specially valuable as a planning tool for government officials who need to know about the factors driving the likely trajectory of COVID-19 deaths in their countries. Explanations for acceleration may point to course correction measures that need immediate attention in order to avoid disastrous situations, while explanations for velocity usually point to more intricate factors such as urbanization problems, typical comorbidities, and social inequality issues.
  • Our models combine over 200 different factors in order to provide powerful predictions and insightful explanations for death toll evolution. The multitude of factors reflects the interdisciplinary aspects of the pandemic evolution. Clearly, COVID-19 death toll evolution is a complex phenomenon in which factors related to diverse disciplines interact in non-obvious ways, finally resulting in the observed death toll.

Materials and methods

In this section we present the data we use in this study and then we discuss how our models were built.

Data

We employed different sources of data to build our models: (i) the daily COVID-19 death toll curve in each country, (ii) country’s countermeasures in response to COVID-19 pandemics, (iii) community mobility reports, (iv) estimations of critical care beds available for and needed by COVID-19 patients in each country, and (v) country’s development indicators. Next, we present details about the data.

COVID-19 death toll curve.

Data concerning the evolution on the number of deaths comes from case reports provided by the European Center for Disease Prevention and Control (ECDC). The data was frozen on June 5th 2020, comprising 8,926,399 cases and 468,257 deaths reported worldwide. Death reports from 211 countries are presented daily. However, these reports have some issues, including discrepancies in reporting practices across countries. Another issue is the delayed communication of the death, which may result in abrupt variations in consecutive days of the death toll curve of a country. That is, days with under-reported number of deaths are followed by days with over-reported number of deaths. To avoid overfitting the original death toll curve with unnecessarily complex models, we smoothed it using the 7-day rolling average, and from the resulting curve we finally get daily values of velocity () and acceleration (). We define velocity and acceleration on Eqs 1 and 2, respectively. (1) (2) where t is the number of days after the first death.

Country’s countermeasures in response to COVID-19 pandemics.

Measures are implemented to slow the spread of the virus by enforcing physical distance between people. The dates in which these countermeasures were taken in response to COVID-19 pandemics come from the Oxford COVID-19 Government Response Tracker. Data is collected from public sources by a team of Oxford University students and staff from every part of the world [2]. Table 1 shows data about the different measures adopted in response to the COVID-19 pandemics. The value that a countermeasure receives on a specific day is given as the number of days during the outbreak for which it has being taken (or not taken).

thumbnail
Table 1. Countermeasures in response to the COVID-19 pandemics.

Values correspond to the number of days that the corresponding measure has being taken. For each countermeasure it is shown, at the peak of velocity on the country, the median followed by first and third quartiles (calculated over the 211 countries considered in the study).

https://doi.org/10.1371/journal.pone.0264893.t001

To compare the set of countermeasures adopted by each country, we apply the t-Distributed Stochastic Neighbor Embedding (t-SNE) method [3], which embeds the 47 measures present in Table 1 into a 2D space. t-SNE works by minimizing the divergence between two distributions: one that measures pairwise similarities of the input objects and another that measures pairwise similarities of the corresponding low-dimensional points in the embedding. As a result, both X and Y axes are non-linear combinations of the original 47 dimensions. There is one vector for each outbreak day for each country, starting at the first outbreak day of the corresponding country and ending at June 5th 2020. After removing duplicate vectors, the method processed all 7,422 vectors together, resulting in a 2D trajectory for each country, as shown in Fig 4. Countries may have a different number of vectors, as the first outbreak day varies depending on the country. The t-SNE embeddings are computed at once using vectors from all 211 countries on all days.

thumbnail
Fig 4. (Color online) Countermeasures adopted by each country forms a trajectory.

The heatmap is built by taking into account all the 211 considered countries.

https://doi.org/10.1371/journal.pone.0264893.g004

In the figure, trajectories are shown over a heatmap composed of clear and dark regions. When the trajectory is passing over a dark region, the corresponding measures are less stringent. By contrast, a trajectory encompasses stringent measures when it passes over a lighter region. Further, the thinner part of a trajectory indicates an abrupt variation on the direction due to changes on the adopted measures. Thicker parts, on the other hand, indicate consecutive days adopting the same set of measures. Finally, countries placed next to each other are currently adopting a similar set of measures. The figure shows a large variation in government responses, with some countries deciding to rise the stringency of response, while others keeping less stringent measures. We follow the definition of stringency presented in [2], that is, the overall stringency level of a country on a given day is calculated by taking the average of the stringency levels of the following nine measures: school closing, workplace closing, public events cancelling, gathering size restrictions, public transportation closing, stay at home requirements, restrictions on internal movement, restrictions on international travel, public information campaign. The stringency of each individual measure on a day takes a value between 0 and 100. For instance, in Table 1, measures of different types are sorted according to their stringency level (i.e., “No measures” has stringency 0, while the last measure has stringency level of 100). The index does not provide information on how well policies are enforced, nor does it capture demographic or cultural characteristics that may affect the spread of COVID-19. Finally, the overall stringency is given as: (3) where is the stringency associated with measure i on a given day. Notice that the stringency of a measure is discretized, for instance, in case of a four-level measure, its stringency level may assume only the values 0, , , and 100. Finally, as the nine measures that compose the stringency computation are within the 47 dimensions previously discussed in Table 1, the stringency level is easily transferred as colors in Fig 4. While we only show trajectories for few countries, it is worth mentioning that the colormap was built using the entire set of trajectories for all 211 countries considered in our analysis. Therefore, points are spread all over the space, and these points were interpolated in order to produce a smoother colormap.

Fig 4 shows trajectories for several countries. The start point of a trajectory corresponds to the first outbreak day of the country, and thus it varies for each country. The end point of a trajectory is June 5th 2020, thus being the same for all countries. Fig 5 shows plots for specific countries, namely: Canada, Ecuador, Sweden and Switzerland, which have the same heatmap as background. We can see that Canada has a trajectory passing over a dark region and a long thicker part, meaning less stringent measures and consecutive days adopting the same set of measures, such as recommend closing schools, universities, workplaces, cancelling events, limit private gatherings, closing public transport, require not leaving home and not to travel. On the other hand, Ecuador and Switzerland not only have an abrupt change of direction due to changes on the adopted measures but also their trajectories passes over a lighter region and thus encompass stringent measures. Sweden presents a more straight line meaning consecutive days adopting the same set of measures, such as lack of countermeasures which led to to be an outlier with cases and deaths increasing more rapidly than in its Nordic neighbours, explained by their COVID-19 strategy and the governance of the health system that has enabled the strategy to continue without major course corrections.

thumbnail
Fig 5. (Color online) heatmaps showing the trajectories for Ecuador, Switzerland, Sweeden, and Canada.

https://doi.org/10.1371/journal.pone.0264893.g005

COVID-19 community mobility reports.

Human mobility is associated with growth and decline of new cases of COVID-19 [4]. Fig 6 shows how mobility changes as a function of the level of stringency being adopted. As expected, human mobility reduces as stringency increases. The data used to draw the curves in Fig 6 comprises the number of visitors to specific categories of location (e.g., grocery stores, parks, subway and train stations) every day and compares this change relative to a baseline day before the pandemic outbreak. Google Maps [5] provides anonymized data showing how peoples’ movements have changed throughout the pandemic. Baseline days represent a normal value for that day of the week, given as median value over the five‑week period from January 3rd to February 6th 2020, extracted from Google Maps data. The data also comprises the change in average duration spent in residences during the outbreak. We use an offset of three weeks in order to better correlate with the number of deaths by taking into account an approximation of incubation and treatment period lengths.

thumbnail
Fig 6. (Color online) community mobility change.

Arrows correspond to the temporal dimension, that is, an arrow links two consecutive days of the outbreak.

https://doi.org/10.1371/journal.pone.0264893.g006

Critical care beds available and needed by COVID-19 patients.

We use estimations from the Institute for Health Metrics and Evaluation (IHME, University of Washington). These estimations are for COVID-19 patients exclusively, and thus non-COVID patient needs were excluded. Also, the estimation of critical care beds available for COVID-19 patients is obtained by excluding the typical percentage of beds occupied by other patients and emergencies. With these estimations we calculate the ratio between the total number of critical care beds needed and the total number of critical care beds available (in the country).

Country’s development data.

We use data from the World Bank, which provides a comprehensive dataset on the development of countries around the globe. Table 2 shows a partial list of indicators about health system, climate, socio-economics and demographics for all countries involved in this study. Both temperature and relative humidity are given as the average of the respective values measured in the ten most affected cities in each country, and their values are taken three weeks prior to the current date in order to better correlate with the number of deaths by taking into account an approximation of the incubation and treatment periods.

thumbnail
Table 2. Data on health system, climate, socioeconomics and demographics.

For each item it is shown, at the peak of velocity on the country, the median followed by first and third quartiles (calculated over the 211 countries considered in the study).

https://doi.org/10.1371/journal.pone.0264893.t002

Table 3 shows data about age structure, sex-ratio and share of deaths by age. It is worth mentioning that men aged 70 or more correspond to the sub-population at highest risk.

thumbnail
Table 3. Data on age structure, sex-ratio by age, and share of deaths by age.

For each item it is shown the median followed by first and third quartiles (calculated over the 211 countries considered in the study).

https://doi.org/10.1371/journal.pone.0264893.t003

Finally, Table 4 shows a partial list of the risk factors and comorbidities. Our intuition is to correlate COVID-19 death toll motion with an approximation of the size of at-risk group within each country. Prevalence of smoking is based on the population aged 15 years and older who smoke any form of tobacco, including cigarettes, cigars, pipes or any other smoked tobacco products. Obesity is defined as having a body-mass index (BMI) equal to or greater than 30. Diabetes prevalence refers to the percentage of people aged between 20 and 79 years who have type 1 or type 2 diabetes. AIDS prevalence is based on the population aged between 15 and 49 years infected with HIV. Cancer prevalence is based on the population with any form of cancer. Bacilli Calmette-Guérin (BCG) vaccination indicates the countries currently having universal BCG vaccination program. The table also shows death rates from stroke, diabetes, cancer and AIDS, for men and women. Values are standardized so that a value of 1.00 corresponds to the average over all countries we have considered in this study. Centering the average value in 1.00 will facilitate the interpretation of results, as values within a particular country can be easily shown as being above or below the average.

thumbnail
Table 4. Data on risk factors and comorbidities.

For each item it is shown the median followed by first and third quartiles (calculated over the 211 countries considered in the study).

https://doi.org/10.1371/journal.pone.0264893.t004

Models

Our models are particularly useful in situations where many intuitive factors interact in non-obvious ways, which seems to be the case with COVID-19. Specifically, our models combine a multitude of factors in order to reflect the interdisciplinary aspects of the pandemic evolution. Model explainability [6] enables inspecting the high-level concepts and reasoning used by our models so that we may formulate possible narratives about the COVID-19 death toll motion defined in terms of velocity and acceleration.

Model construction and evaluation.

Our models are built using an implementation of the LightGBM algorithm [7]. The LightGBM algorithm produces a complex model composed of hundreds of simple decision trees [8] that are combined into a single model by a process known as boosting [9]. This added complexity allows our models to reach low error levels that are not likely to be obtained with simpler models. Another reason for choosing LightGBM is that it naturally deals with missing values, as some factors are not available for some countries. Basically, LightGBM ignores missing values during a split, and then allocate them to whichever side reduces the loss the most. Models are optimized either to predict the death toll velocity or the death toll acceleration to be observed in the next day. A model for each country at each outbreak day (we consider the first outbreak day as the day of the first death) is built and evaluated using discovery and validation datasets, as follows:

  • The discovery dataset for country c at day x includes data from all countries. More specifically, data is composed of all outbreak days until day x in all countries, except for country c for which data goes until day x−1 of the outbreak. Thus, the discovery datasets are almost the same for all countries, varying only the xth outbreak day which is dropped or included depending on the country (i.e., it is dropped for country c, and included for the other countries). Also, the discovery datasets keeping increasing with the outbreak. For the sake of reproducibility: our models were obtained by combining up to 100 simple decision trees with maximum tree-depth of 10, and the learning rate was set to 0.10.
  • The validation dataset for country c at day x includes only the xth day of outbreak in country c. Thus, models will perform predictions for one-day ahead. The prediciton error associated with country c with respect to the validation datasets is assessed in terms of the standard error measures [10].

In summary, the model for country c at day x is obtained using data up to outbreak day x−1 from country c in addition to data from all other countries up to day x. The error figures to be reported correspond to the average of the validation loss, and no additional hold-out set was used. Each instance in both datasets corresponds to a specific day of the outbreak in a specific country, and it is represented by a superset of the factors we present in Tables 14. Some factors vary daily (i.e., number of critical care beds needed and available), while others remain constant during the entire outbreak period (i.e., population density). A complete list of the input factors is shown in S1 Appendix, and it includes additional factors not mentioned in Tables 14. These additional factors reflect the values of the previous day, so that derivative information is provided for factors that vary daily.

Algorithm1 describes the main steps of model construction for a specific country c and a specific outbreak day x. The model is built using the discovery datasets. A discovery dataset contains all factors for all countries for multiple outbreak days up to outbreak day x. Each discovery dataset has a paired validation dataset, which contains all factors of country c on a specific outbreak day x. All instances in the discovery and validation datasets also contains the velocity/acceleration information for the corresponding day. Discovery and validation datasets are used iterativelly in order to perform an exploratory search for the optimum subset of factors to compose the model for outbreak day x and country x. The exact search approach would require the exhaustive enumeration of all possible combinations of factors, so that one model is obtained for each combination in the power set. Obviously, inspecting all possible subsets of factors is computationally prohibitive. Instead, the algorithm samples the model space by randomly selecting the factors to compose a model. At the end, the best performing models are finally selected using the validation datasets and used to estimate velocity or acceleration one-day-ahead. Specifically, the selected model for outbreak day x and country c will be the one with the lowest cumulative error considering all validation datasets up to day x. Prediction error is given both in terms of the standard Mean Absolute Error and Mean Absolute Percentage Error (MAPE) measure.

Algorithm 1 samples a large number of candidate models (n = 1, 000, 000), and thus it is unlikely that (detrimental) correlated factors are included in the selected (best) model. Since correlated factors may make the estimates worse, it is expected that the best model does not include correlated factors. An important consideration about Algorithm 1 is that we assume that p is always the same across the different countries.

Algorithm 1 Model construction for country c at day x

Require: The discovery datasets for country c, }, such that y < x

Require: The validation datasets for country c, }, such that y < x

Require: The pool of all available factors, P

Require: The number of candidate models to sample, n

Ensure: The most performant models, for velocity and for acceleration at day x

p ← ∅

i ← 1

while i < n do

  

  

  k← a random integer between 1 and |P|

  pp ∪ {a subset of k distinct factors randomly selected from P}

  ii+ 1

end while

for each pip

  for each outbreak day d, until y do

    model estimating velocity using factors in pi and observations in

    error of in

    model estimating acceleration using factors in pi and observations in

    error of in

  end for

end for

the model in with the lowest

the model in with the lowest

Model explainability.

We assume that a low-error model may be seem as a possible explanation for COVID-19 death toll velocity and acceleration in a country. However, our models are complex as they are obtained from the interplay of a large number of factors. While this provides a more realistic and unified view of COVID-19 death toll motion, understanding how each factor contributes to velocity and acceleration is thus a major challenge. We follow a prominent method that is based on Shapley values from cooperative game theory [11] in order to attribute importance to factors. Specifically, Shapley values can be used to find a fair division scheme that defines how the total importance should be distributed among the factors composing the model [6]. The method is based on the idea that:

  1. (i). a specific outbreak day x is represented by a set of factors that interact according to a model, thus resulting in the prediction f(x), and
  2. (ii). the magnitude of the difference between f(x) and the average prediction E[f(x)] (i.e., the average velocity, or the average acceleration in the discovery dataset) is divided among the factors in a unique way that is “fair” given their individual contributions to change the prediction from E[f(x)] to f(x). Thus, the average contribution each factor has on changing the model prediction is considered to be its importance score.

More formally, the method creates an explanation function g(f, x) which takes as input a model f and the values assumed by the different factors within an outbreak day x, and returns importance scores of these factors on day x. The importance score associated with a factor can be directly interpreted as the sensitivity that indicates how much the model’s response will vary as the factor has its value increased or decreased. Importance scores will be visualized using two different plots:

  1. (i). Waterfall Plot − Shows importance scores for a specific country on a specific day of the outbreak. Thus, waterfall plots give a more specific, per-country explanation. Waterfall plots are designed to visually display how the importance of each factor moves the model output from a prior expectation under the discovery dataset distribution to the final model prediction given the evidence of all the factors within the model. Each waterfall plot shows the importance of a set of factors in a given day, so that:
    • Factors are sorted from top to bottom by the magnitude of their importance values, with factors of smallest magnitude grouped at the bottom to avoid clutter.
    • The importance for each factor is shown as positive (in red) indicating that the factor contributes to increase velocity or acceleration, and negative (in blue) indicating that the factor contributes to decrease velocity or acceleration.
    Thus, in summary, the x−axis shows the estimated value (which can be either velocity or acceleration), while the y−axis shows the factors contributing mostly for the final estimation. The number appearing on the left of a factor indicates its corresponding value on a particular day of the outbreak (i.e., the, a quantity which can be measured or the number of days a countermeasure is being adopted). A positive value, which is shown in red, indicates that the corresponding factor contributes to increase velocity or acceleration. Negative values, which are shown in red, indicate that the corresponding factor contributes to decrease velocity or acceleration.
  2. (ii). Summary Plot − Shows importance scores considering all countries on a specific day of the outbreak. Thus, summary plots give a more general explanation. Summary plots have five characteristics:
    • Each row corresponds to a factor and has as many dots as countries.
    • A dot represents the value of the corresponding factor for a country. Red dots indicate that the factor assumes a high value for the corresponding country. Likewise, blue dots indicate that the factor assumes a low value.
    • The x−axis position of a dot indicates the impact the corresponding factor has on velocity or acceleration. The impact of a factor is defined as how much the factor is decreasing or increasing velocity or acceleration.
    • The vertical line shows whether the impact of a factor increased the prediction (i.e., the dot is on the right size) or decreased it (i.e., the dot is on the left side).
    • Factors impacting most the model prediction appear on the top of the plot.

Results

In this section, we present results for some countries (for a live instance of our analysis tool with a larger set of countries, please visit https://covid-19.kunumi.com). It is worth mentioning that we did not use autoregressive time-series prediction models (e.g., ARIMA [12]) in our experiments. While this algorithm is considered one of the most used prediction models for time series, its input prevents us to collect actionable explanations as the estimations provided depend mostly on time-series and lag information, and not on actionable factors.

Table 5 shows the Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) for velocity and acceleration models of some countries considering the number of days after the first death. In the table, countries are shown in ascending order of error for velocity. In general, the errors of our models (one-day ahead prediction) are very low. It is hard to grasp a clear trend on how the number of factors within each model, k, impacts the error. The best models (i.e., the ones obtained with the execution of Algorithm 1) have different values of k. The value of k seems to depend mainly on the country, but also on the outbreak day. For instance, models estimating acceleration in Brazil have values of k varying from 28 to 34 factors, depending on the outbreak day. Models estimating acceleration in Germany have values of k varying from 26 to 31 factors. For Italy, k varies from 19 to 23 factors. An interesting pattern is that models for estimating acceleration are usually composed of fewer factors than models for estimating velocity. Also, models for a country are stable in the sense that the factors within them do not change drastically on consecutive outbreak days. On the contrary, by inspecting how the factors vary on a daily-basis for a same country, we concluded that they change smoothly for consecutive days. Table 6 shows the average Jaccard Index (which is a statistic used for gauging the similarity and diversity of two sample sets) of factors within models of two consecutive days. The table shows that models are extremely similar on consecutive outbreak days, but stability decreases on the specific days just before and after the peak of velocity and acceleration. Models that estimate velocity seem to be more stable that models estimating velocity, and this may be because models estimating acceleration employ less factors that models estimating velocity.

thumbnail
Table 5. Model effectiveness at June 5th 2020, for some countries.

The prediction error is given both in terms of the mean absolute error (MAE) and mean absolute percent error (MAPE). Error is calculated on a daily-basis and then it is averaged. Standard deviation is also shown.

https://doi.org/10.1371/journal.pone.0264893.t005

thumbnail
Table 6. Average Jaccard Index of factors comprising the models of two consecutive outbreak days.

https://doi.org/10.1371/journal.pone.0264893.t006

For each country we employ model explainability so that we can discuss the main factors driving the predictions for some of these countries. The days between the peak of acceleration and the peak of velocity are of particular importance for evaluating the impact of countermeasures being adopted by public health officials. The days during this period reveal two important aspects: acceleration decreases while velocity keeps increasing, which makes the impact of countermeasures on slowing death toll pace hard to evaluate. However, we are able to present the main factors driving the decrease of acceleration during this period, thus revealing which countermeasures are being effective and also the ones that need further attention.

In order to keep waterfall plots with a manageable size we restricted the number of factors that are explicitly shown to only the most important ones. The remaining factors are collapsed into a group of other explanatory factors, and thus they cannot be directly interpreted. For instance, the factor “Critical care beds needed/available” usually shows an important aspect regarding the health system situation in the current outbreak day. It is important to mention that a similar factor, namely “Critical care beds needed/available on previous day” may also compose the model, and in this case it helps by giving a trend on whether the health system situation is worsening or improving. Thus, please notice that a higher value of “Critical care beds needed/available” can contribute to decrease acceleration if the health system situation is getting better. Finally, it is also worth mentioning that explanations change smoothly on a daily-basis. However, since we are focusing our analysis on the peaks of acceleration and velocity, the corresponding plots are those obtained when acceleration changes the most.

Italy, Netherlands and UK—Evaluating countermeasures (and the lack of them)

Figs 7 and 8 show observed and predicted curves and projections for velocity and acceleration. The peak of acceleration in Italy occurred on the 22nd day after the first death, while the peak of velocity occurred on the 35th day. Figs 911 show the main factors explaining acceleration on some days within this period. Initially, most of the factors are still contributing to increase acceleration, as expected. Closing schools and universities was the most effective measure at this point. Cancelling public events and the recommendation to close public transportation are still ineffective as they are not yet contributing to decrease acceleration. Five days later, other measures started contributing to decrease acceleration. Finally, at the peak of velocity, most measures are contributing to decrease acceleration, the most effective seem to be self-isolation, closing public transportation, and the cancellation of public events. By contrast, the long time without taking any measure regarding closing schools/universities is now a factor contributing to increase acceleration. Further, some measures may take longer to become effective (i.e., limitation of private gatherings).

thumbnail
Fig 8. (Color online) projections for acceleration in Italy.

https://doi.org/10.1371/journal.pone.0264893.g008

thumbnail
Fig 9. (Color online) waterfall plots for Italy.

Factors driving acceleration on the 22nd day.

https://doi.org/10.1371/journal.pone.0264893.g009

thumbnail
Fig 10. (Color online) waterfall plots for Italy.

Factors driving acceleration on the 27th day.

https://doi.org/10.1371/journal.pone.0264893.g010

thumbnail
Fig 11. (Color online) waterfall plots for Italy.

Factors driving acceleration on the 35th day.

https://doi.org/10.1371/journal.pone.0264893.g011

Figs 12 and 13 show observed and predicted curves for Netherlands. The peak of acceleration occurred on the 17th day after the first death, while the peak of velocity occurred on the 28th day. Figs 14 and 15 show the main factors explaining acceleration on the 20th and 28th days. Few days after the peak of acceleration (i.e., 20th day), measures on restricting internal movement started to decrease acceleration. On the peak of velocity, several factors are contributing to decrease acceleration, but the main ones include cancelling public events and banning arrivals from affected regions. It is worth mentioning that restricting internal movement is still effective on decreasing acceleration, but other measures started to contribute more strongly. Since more important factors appear first on the plot, restricting internal movement was collapsed within 14 other explanatory factors.

thumbnail
Fig 12. (Color online) projections for velocity in Netherlands.

https://doi.org/10.1371/journal.pone.0264893.g012

thumbnail
Fig 13. (Color online) projections for acceleration in Netherlands.

https://doi.org/10.1371/journal.pone.0264893.g013

thumbnail
Fig 14. (Color online) factors driving acceleration in Netherlands on the 20th day.

https://doi.org/10.1371/journal.pone.0264893.g014

thumbnail
Fig 15. (Color online) factors driving acceleration in Netherlandson the 28th day.

https://doi.org/10.1371/journal.pone.0264893.g015

Figs 16 and 17 show observed and predicted curves for the UK. The peak of acceleration occurred on the 23rd day after the first death, while the peak of velocity occurred on the 38th day. Figs 18 and 19 show the main factors explaining acceleration on the 30th and 38th days. Few days after the peak of acceleration, measures such as closing schools/universities and self-isolation, started to decrease acceleration. On the peak of velocity, other measures started to become effective. Cancelling public events seems to be an important measure, which still needs additional days to finally become effective. Finally, on the 30th day, the recommendation of closing public transportation was adopted for 17 days, and it is still contributing for increasing acceleration. However, on the 38th day, the measure was adopted for 24 days, and started to contribute to decrease acceleration.

thumbnail
Fig 16. (Color online) projections for velocity in the UK.

https://doi.org/10.1371/journal.pone.0264893.g016

thumbnail
Fig 17. (Color online) projections for acceleration in the UK.

https://doi.org/10.1371/journal.pone.0264893.g017

thumbnail
Fig 18. (Color online) factors driving acceleration in the UK on the 30th day.

https://doi.org/10.1371/journal.pone.0264893.g018

thumbnail
Fig 19. (Color online) factors driving acceleration in the UK on the 38th day.

https://doi.org/10.1371/journal.pone.0264893.g019

It is worth mentioning that Italy, Netherlands and UK have adopted countermeasures at different stages of the outbreak, and as a result the corresponding benefits took different times to appear. As expected, countermeasures adopted at earlier stages of the outbreak showed benefits earlier, as they started contributing to decrease acceleration earlier. For instance, both Italy and Netherlands required cancelling public events earlier than UK. Specifically, Italy adopted this countermeasure 5 days after the first death, and Netherlands adopted this countermeasure 4 days after the first death. UK took 8 days after the first death to start requiring the cancellation of public events. Another example concerns recommending closing public transport. Italy adopted this countermeasure 12 days before the first death, and UK adopted the same countermeasure 13 days before the first death. On the contrary, Netherlands adopted this measure 17 after the first death. For Italy and UK benefits appeared on the peak of velocity, but for Netherlands this factor is still contributing to increase acceleration at the peak of velocity.

Germany—The importance of a robust health system

Figs 20 and 21 show observed and predicted curves for Germany. The curves point to two days of particular interest: the 20th day after the first death (which marks the peak of acceleration), and the 39th day (which marks the peak of velocity). Figs 22 and 23 show the factors driving velocity and acceleration on the 20th day. Acceleration is mainly explained by the lack and the recency of countermeasures. For instance, no public transportation measures were taken for 67 days, and this is the main factor increasing acceleration at this stage of the outbreak. A related factor is the small decrease of visitors in transit stations (-10%), which also increases acceleration.

thumbnail
Fig 20. (Color online) projections for velocity in Germany.

https://doi.org/10.1371/journal.pone.0264893.g020

thumbnail
Fig 21. (Color online) projections for acceleration in Germany.

https://doi.org/10.1371/journal.pone.0264893.g021

thumbnail
Fig 22. (Color online) factors driving velocity in Germany on the 20th day.

https://doi.org/10.1371/journal.pone.0264893.g022

thumbnail
Fig 23. (Color online) factors driving acceleration in Germany on the 20th day.

https://doi.org/10.1371/journal.pone.0264893.g023

Since the 20th day marks the peak of acceleration, it is not expected to find many factors contributing to decrease acceleration. In fact, there are only three factors worth mentioning: the small proportion of urban population living in slums, the restriction on private gatherings, and most importantly, the high availability of critical care beds, indicating a robust system even during the peak of acceleration. As expected, many factors are contributing to increase velocity on the peak of acceleration. These factors are shown in Fig 22, including the distribution of urban population, male/female proportion on different ages, and risk factors such as obesity and smoking. On the other side, the high availability of critical care beds is the main factor contributing to decrease velocity. This points, again, to the importance of a prepared and robust health system in order to reduce death toll.

The 39th day marks the peak of velocity, and factors contributing to increase velocity include typical comorbidities (i.e., stroke) and risk factors such as obesity, as shown in Fig 24. The only relevant factor contributing to decrease velocity is the high availability of critical care beds. Ensuring sufficient physical infrastructure and resources, even on the peak of velocity, is crucial for dealing with the outbreak. Fig 25 shows that on the 39th day some factors are contributing to decrease acceleration. Stringent measures (such as the cancellation of public events) were adopted for a time which is now sufficient to show benefits. Further, the high availability of critical care beds keeps contributing to decrease acceleration.

thumbnail
Fig 24. (Color online) factors driving velocity in Germany on the 39th day.

https://doi.org/10.1371/journal.pone.0264893.g024

thumbnail
Fig 25. (Color online) factors driving acceleration in Germany on the 39th day.

https://doi.org/10.1371/journal.pone.0264893.g025

Brazil—Urban slums are potential incubators of COVID-19

Figs 26 and 27 show observed and predicted curves for Brazil. A particular period of the outbreak, starting on the 31st day and continuing until the 45th day, was marked by a consistent increase in acceleration. The drastic increase in acceleration caused the health system to collapse, as critical care beds needed became greater than beds available. Next we discuss the main factors driving acceleration during this period.

thumbnail
Fig 26. (Color online) projections for velocity in Brazil.

https://doi.org/10.1371/journal.pone.0264893.g026

thumbnail
Fig 27. (Color online) projections for acceleration in Brazil.

https://doi.org/10.1371/journal.pone.0264893.g027

Figs 2830 show the main factors driving acceleration in Brazil from the 31st to the 45th day after the first death. Clearly, the high proportion of urban population living in slums was a major factor increasing acceleration on this period. Urban slums are characterized by low-quality housing, limited basic services and poor sanitation. This makes simple lifesaving practices like hand-washing a challenge. Further, households in urban slums are typically + 10 times denser than neighboring areas of the same city. The largest urban slum in Brazil, for instance, has a density of 39,000 inhabitants per km2, while the country’s population density is only 25 inhabitants per km2. Finally, many people in slums need to work normally during the outbreak, as they often have no savings. They use public transportation over long distances, and their service occupations usually require close contact with others, making them vectors for the virus and accelerating the spread of infection.

thumbnail
Fig 28. (Color online) factors driving acceleration in Brazil on the 31st day.

https://doi.org/10.1371/journal.pone.0264893.g028

thumbnail
Fig 29. (Color online) factors driving acceleration in Brazil on the 40th day.

https://doi.org/10.1371/journal.pone.0264893.g029

thumbnail
Fig 30. (Color online) factors driving acceleration in Brazil on the 45th day.

https://doi.org/10.1371/journal.pone.0264893.g030

Belgium—The risk with long-term care and nursing institutions

A high proportion of long-term care facilities have reported COVID-19 outbreaks, with high rates of morbidity and case fatality in residents. Deaths in Belgium’s long-term care institutions account for one third of the COVID-19 death toll in the country, while in Switzerland, residents of long-term care institutions account for more than half of deaths due to COVID-19 [13]. Despite measures promoting social distancing, long-term care residents need care from staff who inevitably might transmit the virus. Infected asymptomatic staff members ravaged the long-term care institutions whose residents are older adults often with underlying chronic medical conditions [14].

Figs 31 and 32 show observed and predicted curves for Belgium. The curves point to two days of particular interest: the 21st day after the first death, which marks the peak of acceleration, and the 30th day, which marks the peak of velocity. The 30th day is further inspected in Fig 33, which shows the main factors driving velocity on the peak of velocity. Indeed, the high number of long-term care beds in Belgium is among the factors contributing to increase velocity on both peaks.

thumbnail
Fig 31. (Color online) projections for velocity in Belgium.

https://doi.org/10.1371/journal.pone.0264893.g031

thumbnail
Fig 32. (Color online) projections for acceleration in Belgium.

https://doi.org/10.1371/journal.pone.0264893.g032

thumbnail
Fig 33. (Color online) factors driving velocity in Belgium on the peak of velocity.

https://doi.org/10.1371/journal.pone.0264893.g033

Brazil and USA—The persistent impact of income inequality

While some factors gain importance only during specific periods of the outbreak, income inequality shows high impact on velocity during the entire outbreak. Income inequality is assessed with the GINI Index [15], a standard measure which varies from 0 (total equality) to 100 (total inequality). Inequality has many undesirable effects including significant negative impacts on public health [16], and our models reveal that countries with greater income inequality are more likely to face an increased COVID-19 death toll pace. This is the case of the two most affected countries, Brazil and the USA, two countries that suffer greatly from high income inequality.

Figs 3436 show the main factors driving velocity in Brazil. Similarly, Figs 3740 show the main factors driving velocity in the USA. In both countries, inequality is among the major factors contributing to increase velocity during the entire outbreak.

thumbnail
Fig 34. (Color online) factors driving velocity in Brazil on the 10th day.

https://doi.org/10.1371/journal.pone.0264893.g034

thumbnail
Fig 35. (Color online) factors driving velocity in Brazil on the 30th day.

https://doi.org/10.1371/journal.pone.0264893.g035

thumbnail
Fig 36. (Color online) factors driving velocity in Brazil on the 50th day.

https://doi.org/10.1371/journal.pone.0264893.g036

thumbnail
Fig 37. (Color online) factors driving velocity in the USA on the 20th day.

https://doi.org/10.1371/journal.pone.0264893.g037

thumbnail
Fig 38. (Color online) factors driving velocity in the USA on the 70th day.

https://doi.org/10.1371/journal.pone.0264893.g038

thumbnail
Fig 39. (Color online) factors driving velocity in the USA on the 80th day.

https://doi.org/10.1371/journal.pone.0264893.g039

thumbnail
Fig 40. (Color online) factors driving velocity in the USA on the 90th day.

https://doi.org/10.1371/journal.pone.0264893.g040

Global overview on the importance of factors and measures

So far we have discussed how different factors impact velocity and acceleration on specific country-level models. To get a global overview on the impact that different factors have on velocity, we plotted the contribution of every factor for every country. We selected three particular days to analyze: the peak of acceleration, the peak of velocity, and June 5th 2020 as it is the last day in our data. Fig 41 shows how the different factors are explaining velocity at the peak of acceleration on different countries. Next we discuss the role of some of these factors.

thumbnail
Fig 41. (Color online) summary plot showing how each factor contributes to increase/decrease velocity at the peak of acceleration.

https://doi.org/10.1371/journal.pone.0264893.g041

As expected, the most important factor is the demand for critical care beds. Clearly, countries showing high demands at the peak of acceleration (i.e., red dots) have experienced a large increase in velocity due to this factor. Little demands, on the other hand, usually contribute to decrease velocity. Other generally important factors include income distribution, number of beds in long-term care institutions, smoking prevalence among females, share of population infected with HIV, and relative number of cancer-related deaths among the elderly population. High values of these factors contribute to increase velocity.

It is worth mentioning that while we are evaluating each factor in isolation, their contributions to velocity are calculated by taking into account the complex interplay among all them. Take for instance the role of relative humidity − low relative humidity contributes to decrease velocity, while high relative humidity usually contributes to increase velocity. There are some exceptions, however, where high relative humidity contributes to decrease velocity. In order to better understand these exceptions, we extracted a heatmap from our working data showing the relationship between temperature and relative humidity, and check how their interplay ultimately impacts velocity.

Fig 42 presents a heatmap showing the relationship between temperature and relative humidity. Clearly, temperatures higher than ≈25°C are strongly associated with low velocities, despite the high values of relative humidity. Interestingly, authors in [17] have reported that the relationship between the annual average of temperature compensation and COVID-19 confirmed cases was approximately linear in the range of less than 25.8°C, which became flat above 25.8°C.

thumbnail
Fig 42. (Color online) the relationship between temperature, relative humidity and velocity.

Lighter regions are associated with higher velocities.

https://doi.org/10.1371/journal.pone.0264893.g042

The lack of countermeasures to slow the spread of the virus is reflected in several factors occupying the middle of the heatmap. As expected, the more days without a measure the higher is the impact on increasing velocity. A particular important measure at this stage of the outbreak seems to be the adoption of coordinated public information campaign due to the need for essential information and advice right on early stages of the outbreak. Clearly, the more days with coordinated public campaigns the more velocity decreases. However, at this stage of the outbreak, countermeasures are among the factors impacting the least on velocity.

Other factors that are worth mentioning include the number of years without BCG vaccination and share of one-person households in the country. Both factors show marginal impact on velocity at the peak of acceleration and are among the ones showing small impact on velocity, but they gain importance on latter stages of the outbreak, as shown in the following.

Fig 43 shows how the different factors impact velocity at the peak of velocity on different countries. Top factors remain the same as the ones at the peak of acceleration, namely: demand for critical care beds, number of long-term care beds in institutions, and income distribution. Factors such as number of years without BCG vaccination gained substantial importance. Specifically, some countries seem to benefit from an active national vaccination program. As recently reported in [18], the BCG vaccine has beneficial non-specific effects on the immune system that protect against a wide range of infections, and randomized controlled trials have provided evidence that the BCG vaccine’s immunomodulatory properties can protect against respiratory infections.

thumbnail
Fig 43. (Color online) summary plot showing how each factor contributes to increase/decrease velocity at the peak of velocity.

https://doi.org/10.1371/journal.pone.0264893.g043

Two factors that also gained substantial importance are the availability of nurses and the percentage of GDP expended on the health system. Specifically, velocity decreases as more nurses are available. Similarly, velocity decreases as more is spent on the health system.

Finally, the more days adopting less stringent measures (i.e., closing workplaces only for some categories, narrow debt relief and banning arrivals only from some regions), the more velocity increases. The opposite trend is observed with more stringent measures (i.e., broad contract relief), that is, the more days adopting these measures, the more velocity decreases.

Fig 44 shows how the different factors have impacted velocity on June 5th 2020, the last day in our working data. Interestingly, income distribution became the top factor impacting velocity. The number of nurses available, the age working dependency ratio and the share of elderly people also gained importance, and are now among the factors impacting velocity most.

thumbnail
Fig 44. (Color online) summary plot showing how each factor contributes to increase/decrease velocity on June 5th 2020.

https://doi.org/10.1371/journal.pone.0264893.g044

Two factors are involved in an interesting contrast: the share of one-person households and the population living in slums. Both factors have similar importance at this stage of the outbreak, but they show opposite trends. One person households contribute to decrease velocity, while population living in slums contribute to increase velocity. People living together are clearly more exposed to getting infected because they share space. Larger households, which are a characteristic of urban slums, pose greater risk of infection during the outbreak because they can contribute to the rapid spread of the virus.

Finally, at this stage of the outbreak, it becomes clearer which countermeasures were effective. We consider as effective a measure which contributes to decrease velocity as it is adopted for more days. Thus, effective measures include the recommendation of not leaving house, the adoption of a broad debt relief, closing schools, limiting private gatherings to less than 10 people, and cancelling public events.

Conclusion

This study shows how concepts in model explainability can be used to facilitate reasoning about different hypotheses regarding the ongoing COVID-19 pandemic. For this, we built intuitive country-level COVID-19 motion models described in terms of death toll velocity and acceleration, which are defined as the first and second order derivatives, respectively, of the COVID-19 death toll in respect to time. The study considered a multitude of factors, and we show that both velocity and acceleration can be approximated with proper combinations of these factors. Model explanation techniques reveal the main factors driving velocity and acceleration. In general, more intricate factors such as urbanization problems, typical comorbidities, and social inequality issues are among the ones better explaining velocity. Acceleration, on the other hand, is explained by the effects of public health measures being adopted, or the lack of them. These explanations enable data-driven narratives that may help epidemiologists and policymakers to understand factors driving the outbreak, as well as to monitor the impact of public health measures on the spread of the disease. As future work we intend to implement counterfactual predictions in order to estimate what is likely to happen as a result of a countermeasure. These predictions would allow the assessment of the impact of countermeasures by means of “what-if” simulation scenarios, possibly benefiting decision making.

Supporting information

S1 Appendix. Tables 14 present the main factors found by Algorithm 1.

However, the complete list of considered factors is much larger. Tables 7–12 show the factors not discussed previously. The names we gave to these factors are self-explanatory.

https://doi.org/10.1371/journal.pone.0264893.s001

(PDF)

References

  1. 1. Hall B, Acar H, Nandipati A, Barlow M. Growth rates made easy. Mol Biol Evol. 2014;31:232–238. pmid:24170494
  2. 2. Hale T, Webster S, Petherick A, Phillips T, Kira B. Oxford COVID-19 government response tracker; 2020. Blavatnik School of Government.
  3. 3. Maaten Lvd, Hinton G. Visualizing data using t-SNE. Journal of machine learning research. 2008;9:2579–2605.
  4. 4. Kraemer M, Yang C, Gutierrez B, Wu C, Klein B, D Pigott LdP Open COVID-19 Data Working Group, et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science. 2020;368(6490):493–497. pmid:32213647
  5. 5. Google Maps; 2020. Available at https://cloud.google.com/maps-platform, Accessed 5-June-2020.
  6. 6. Lundberg S, Lee S. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30; 2017. p. 4765–4774.
  7. 7. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems; 2017. p. 3146–3154.
  8. 8. Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. Wadsworth; 1984.
  9. 9. Freund Y, Schapire R. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci. 1997;55(1):119–139.
  10. 10. de Myttenaere A, Golden B, Le Grand B, Rossi F. Mean absolute percentage error for regression models. Neurocomputing. 2016;38:48–192.
  11. 11. Shapley L. A value for n-person games. Contributions to the Theory of Games. 1953;28(2):307–317.
  12. 12. Benvenuto D, Giovanetti M, Vassallo L, Angeletti S, Picozzi S. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief. 2020;29:105340. pmid:32181302
  13. 13. Lee G. Coronavirus: Why so many people are dying in Belgium; 2020. Available at https://www.bbc.com/news/world-europe-52491210.
  14. 14. Comas-Herrera A, Zalakain J, Litwin C, Hsu A, Lane N, Fernandez J. Mortality associated with COVID-19 outbreaks in care homes: early international evidence; 2020. Available at https://ltccovid.org/wp-content/uploads/2020/05/Mortality-associated-with-COVID-3-May-final-5.pdf.
  15. 15. Gini C. Concentration and dependency ratios” (in Italian). English translation in Rivista di Politica Economica. 1909;87:769–789.
  16. 16. Clarke H, Whiteley P. Economic inequality can help predict Covid-19 deaths in the US; 2020. USApp—American Politics and Policy Blog. Available at http://eprints.lse.ac.uk/104943/1/Clarke_economic_inequality_can_help_predict_covid_19_published.pdf.
  17. 17. Prata D, Rodrigues W, Bermejo P. Temperature significantly changes COVID-19 transmission in (sub)tropical cities of Brazil. Science of the Total Environment. 2020;729:138862. pmid:32361443
  18. 18. Curtis N, Sparrow A, Ghebreyesus T, Netea M. Considering BCG vaccination to reduce the impact of COVID-19. The Lancet. 2020;395(10236):1545–1546. pmid:32359402