Introduction

With more than 57 million cases worldwide and over one and 1.36 million deaths as of November 20, 2020, the outbreak of coronavirus disease (COVID-19)1, is undeniably one of the worst global crises since World War II. In March 2020, the exponential increase of individuals needing hospitalization in intensive care units, combined with the lack of effective cures and vaccines, pushed many governments to take extraordinary measures aimed at “flattening the curve” of infections2,3,4. The adopted measures included the limitation of mobility and social activities, closure of schools, universities, shops, factories, and so forth, up to the extreme act of national lockdowns. Evidence that such measures achieved a reduction of the rate of new infections are gradually appearing in the scientific literature5,6,7.

These measures contributed to keep the spread of COVID-19 under control for some time, but we are now, in November 2020, experiencing the onset of a new exponential growth of confirmed cases, the “second wave,” with severe risks for personal health and healthcare systems under severe stress. Governments and authorities are facing again the difficult task of deciding if and when new containment measures may be needed. In absence of limitations to mobility and social activities, the pandemic spreads exponentially in time, so that any delay in applying restrictions may lead to severe consequences. On the other hand, accounting for the social and economic impact of the possible countermeasures, already observed in the first half of 20208,9,10,11, restrictions should be taken only if and when it is strictly necessary. Managing the trade-off between these contrasting requirements is extremely challenging.

To address this challenge, we leverage sequential detection theory12,13,14, and specifically quickest detection schemes15,16,17 to propose a rigorous methodology aimed at identifying as quickly as possible the onset of an exponential growth of the pandemic evolution. The proposed procedure—referred to as MAST (mean-agnostic sequential test)—is designed to minimize the average time to detect a change in regime15,16,17: from a situation in which the pandemic is under control (\(\mathcal{H}_0\) regime) to the onset of an exponential growth (\(\mathcal{H}_1\) regime).

Quickest detection theory has a long history18, and has been successfully applied in several fields, including quality control, climate modeling, remote sensing, financial analysis, image analysis, security, signal and speech processing, and biomedical applications16,17. In the context of public health surveillance, where the timely detection of various types of adverse health events is crucial, quickest detection techniques have found several applications19, e.g., to reveal the onset and the peak of the epidemic period20 or that the peak is over21.

The approach pursued in this article is substantially different from most of the epidemiological models, based for instance on stochastic evolution of epidemic compartments5,7,22,23,24 and metapopulation networks25,26, where the goal is to predict the mid/long-term behavior of the outbreak. For instance, in stochastic compartmental models, given an initial condition, the epidemic can have two outcomes: the number of infected individuals can increase, in which case we have a major outbreak, or decrease. The probability of a major outbreak can be computed27, but it is of limited use in taking timely on-line decisions.

The importance of taking a decision as quickly as possible in an epidemic scenario can be understood by looking at the curve of daily cases of COVID-19 infection, reported in Fig. 1 for several nations28.  In the same figure we also report the deterministic curves of daily cases, with an exponential growth described by the following equation

$$\begin{aligned} p_{n+1} = p_n(1+\alpha ) = p_1 (1+\alpha )^{n}, \quad n=1,2,\dots , \end{aligned}$$
(1)

where \(n\) is the time index (day), \(1+\alpha\) is the growth rate, and \(\alpha >0\), which corresponds to the \(\mathcal{H}_1\) regime of a major outbreak. This exponential behavior is equivalent to a recently-proposed disease-transmission model29, and the reproduction number defined therein is equivalent to the growth rate. All the curves in Fig. 1 are normalized to the initial value \(p_1\) and shifted to the same initial time. Figure 1 shows that first wave was noticeably more aggressive than the second one in terms of growth rate; specifically, in the first wave \(\alpha\) was varied by country but ranged between 0.06 and 0.40, while in the second wave it is between 0.01 and 0.06.

Figure 1
figure 1

Number of COVID-19 daily new positives in the first and the second wave of the pandemic for several nations. Each curve is normalized by the initial value and shifted to the same initial time. The initial point of the first wave for a specific nation is when the first positives are reported, while the initial point of the second wave is when the daily positives start to increase again—growth rate larger than one. The first wave, upper region (shaded in red), was more aggressive than the second wave, mostly confined to the lower region (shaded in blue). These two regions are delimited by different values of \(\alpha\) in Eq. (1). Curves of each nation are dashed if related to the first wave otherwise solid if related to the second wave.

In the latter scenario, any delay in revealing the onset of an exponential growth phase and the consequent implementation of containment measures produces a costly exponential increase of the number of new cases. In other words, it is essential to reduce as much as possible the decision delay to level off the curve of infection as early as possible. At the same time, it is important to enforce restrictions only if essential, in order to avoid unnecessary social unrest and economic cost. And, inevitably, any detection procedure under the controlled regime \(\mathcal{H}_0\) can produce false alarms, i.e., it can wrongly declare the upcoming onset of an exponential growth phase. The risk of taking a wrong decision can be quantified by the inverse of the mean time between false alarms. A false alarm could lead to the unnecessary adoption of restrictive measures. Since these have social and economic ramifications, adoption of this risk proxy quantifies how many times such an event occurs on average. Balancing between detection delay and risk represents a fundamental system trade-off. In this paper, we consider a relatively simple, but effective, mathematical model of the pandemic and develop a decision tool to quickly detect the passage from a controlled regime to a critical one. Its effectiveness in terms of delay/risk trade-off is demonstrated on publicly available COVID-19 data from several countries. It is our hope that the proposed MAST procedure can be useful in making timely and rational decisions to control the pandemic evolution.

Results

The main result of this research is the design of the MAST quickest-detection procedure, which is specifically tailored to epidemic scenarios, and its application on COVID-19 data. In this context, we show that the mean time \(\Delta\) required to reveal the onset of an exponential phase is in the order of a few days (or few weeks), with a risk R that scales exponentially with the delay:

$$\begin{aligned} R \sim \exp (-\omega \Delta ), \end{aligned}$$
(2)

where the symbol “\(\sim\)” refers to asymptotic behaviour (small \(R\), large \(\Delta\)), and \(\omega\) is a parameter whose value varies from country to country. This exponential relationship holds for most optimal quickest detection procedures under specific mathematical conditions and optimality criteria15. In this work, we show empirically that this optimality condition is verified by the MAST procedure when applied to the COVID-19 data, despite the fact that MAST is designed to work in the presence of uncertainty and non-stationarity of the data statistics.

To illustrate these concepts, let us refer to Fig. 2, where data from Italy are considered. Panel (a) shows the curve of daily positives from Johns Hopkins University COVID-19 data28, along with the smoothed version thereof, shown in green. Using the latter, we obtain the sequence of growth rates \(x_1, x_2, \dots\), shown in panel (b) (green curve) as the ratio of two consecutive values of daily positives, with its smoothed version (in magenta); see the data model in Eq. (3).

Figure 2
figure 2

(a) Daily new positive individuals in Italy since February 21, 2020, and its moving average obtained with a window of 21 days (green line). (b) Growth rate of the epidemic computed from the averaged daily new positive cases (green line), and its time-varying mean obtained through a moving average that uses a window of 21 days (magenta line). (c) MAST performance, in terms of risk (left axis) versus threshold and mean delay (right axis) versus threshold, obtained from the Italian data for the COVID-19 pandemic.

In Fig. 2a, we note that the peak of the first wave is smaller than the second one, which is likely due to the smaller number of swabs made during the first wave. In other words, in the two waves we are observing different fractions of the total infected populations. This difference does not affect our analysis because the sequence \(x_1, x_2, \dots\), of growth rates are ratios of successive daily positives.

As discussed in “Methods” section and illustrated in the notional scheme of Fig. 5, the proposed MAST procedure takes as input the sequence \(x_1, x_2, \dots\) and recursively computes the sequence of decision statistics \(T_1, T_2, \dots\), see Eq. (7), where \(T_n\) only depends on the observed growth rates \(x_1,\dots ,x_n\), up to day \(n\). It is worth stressing that the growth rate process \(x_1,\dots ,x_n\), is assumed to be Gaussian distributed with unknown and time-varying mean value sequence. The MAST decision statistic is “mean-agnostic,” because it does not require knowledge of the time evolution of such a mean sequence, and it is therefore robust to its deterministic fluctuations.

Figure 3
figure 3

Application of the MAST procedure on the COVID-19 pandemic data from Italy. On the left-side vertical axis, we select a decision threshold to correspond to a desired risk, e.g., \(R=10^{-4}\). Then, the blue curve indicates the stopping day (about July 18, in the example) corresponding to the selected value of risk. Finally, the red curve referred to the right-side vertical axis shows the mean delay \(\Delta\) corresponding to the selected risk \(R\) (about 3 days). For clarity, note that the right-side scale for the delay is split into two linear ranges, for a better rendering of the small-\(\Delta\) range.

The onset of the critical regime is declared by the MAST at the smallest index n such that \(T_n > \chi\), where \(\chi\) is a threshold level. Panel (c) of Fig. 2 shows the corresponding decision performance. The left axis reports the risk R as a function of the threshold value \(\chi\), and the right axis reports the mean delay of decision \(\Delta\), again as a function of \(\chi\). We see that the function \(R(\chi )\) is exponentially decreasing, while \(\Delta (\chi )\) is linearly increasing, consistent with Eq. (2) and with known relationships for Page’s test and other quickest-detection procedures15,16.

Combining the evidence shown in Fig. 2, we obtain Fig. 3 that can be interpreted as follows. The blue curve refers to the left-side vertical axis, while the red curve refers to the right-side vertical axis. Starting from a given level of risk selected on the left-side vertical axis, we find the stopping day of MAST, which is the day on which the onset of the critical regime is declared (first threshold crossing). Then, referring to the red curve, one obtains the corresponding delay \(\Delta\) on the right-side vertical axis. This is the mean delay incurred by the MAST procedure in declaring the onset of the critical regime. The interpretation of \(\Delta\) is that the passage to the exponential growth of the pandemic takes place, on the average, \(\Delta\) days before the alert produced by MAST. In Fig. 3, values of the risk smaller than \(10^{-9}\) are collapsed to \(R=10^{-9}\), because for \(R\le 10^{-9}\) we can safely assume that risk is essentially negligible. It is worth noting that, even at negligible risk level, the stopping day is approximately July 27, and the corresponding mean delay is less than 8 days.

Figure 4
figure 4

Operational curve—risk versus mean delay for decision—for 14 countries. For large \(\Delta\), the operational curve is described by Eq. (2), namely \(R \sim \exp (-\omega \Delta )\). The outermost black curves correspond to the ideal performance of the Page’s test, assuming known and constant growth rate, with \(\alpha =0.01\) and 0.06, respectively (extreme values of \(\alpha\) for the second wave, see Fig. 1). Each nation is characterized by a specific value of \(\omega\) (reported between brackets on the legend), and all values of \(\omega\) lie in the range between \(\omega =0.32\) and \(\omega =11.52\).

The analysis in Figs. 2, 3 can be repeated for other regions, yielding similar insights. We report the cases of: United States of America in Fig. 7; United Kingdom in Fig. 8; France in Fig. 9; and Germany in Fig. 10. Note that in the United States, there exists also a third wave of the pandemic, which we analyze separately in Fig. 7d, by restarting the test after declaring the onset of the second wave. (Actually, such a third wave is likely to be a delayed second wave in different geographic regions of the U.S., as a state-by-state analysis seems to imply). Additional analysis on data from more geographical regions is available online30 and updated regularly.

Figure 5
figure 5

Flowchart of the proposed MAST procedure. Input data are the daily infected. These noisy data are filtered to mitigate imperfections in data collection, randomness, and delays. Filtered daily positive \(p_n\) are used to compute the growth rate \(x_n\), which is used to compute the MAST statistic \(T_n\). The statistic is then compared with the threshold \(\chi\); if it is larger than the threshold, the outbreak is declared. Otherwise, the procedure continues to collect and process the data.

Figure 6
figure 6

(a) Growth rate of the hospitalized individuals (green solid line)—and its time-varying mean obtained through a moving average that uses a window of 21 days (magenta solid line)—compared to the growth rate of the daily new positives individuals (green area)—and its time-varying mean (magenta dotted line)—in Italy since February 21, 2020. (b) Application of the MAST procedure on the growth rate sequence of the daily new positive individuals (solid lines, already shown in Fig. 4) and on the growth rate sequence of the hospitalized individuals (dotted lines). On the left-side vertical axis we select a desired risk, e.g., \(R = 10^{-4}\). Then, the blue curves indicate the stopping day (about July 18 if the growth rate sequence of the daily new positive individuals is used, and August 10 if the growth rate sequence of the hospitalized individuals is used) corresponding to the selected value of risk. Finally, the red curves referred to the right-side vertical axis show the mean delay \(\Delta\) corresponding to the selected risk R (about 3 days if the growth rate sequence of the daily new positive individuals is used, and below 5 days if the growth rate sequence of the hospitalized individuals is used). For clarity, note that the right-side scale for the delay is split into two linear ranges, for a better rendering of the small-\(\Delta\) range.

Figure 7
figure 7

United States of America. (a) Daily new positive cases in the US since February 29, 2020, and its moving average obtained with a window of 21 days (green line). (b) Growth rate of the epidemic computed from the averaged daily new positive cases (green line), and its time-varying mean obtained through a moving average that uses a window of 21 days (magenta line). (c) MAST performance, in terms of risk (left axis) versus threshold and mean delay (right axis) versus threshold, obtained from the US data of the COVID-19 pandemic. (d) Application of the MAST procedure on the COVID-19 pandemic data from US. Both the second wave (left side) and the third wave (right side) are analysed. For each wave, we select a desired risk on the left-side vertical axis, e.g., \(R=10^{-4}\). Then, the blue curve indicates the stopping day (about June 6 for the first wave and September 10 for the third wave) corresponding to the selected value of risk. Finally, the red curve referred to the right-side vertical axis shows the mean delay \(\Delta\) corresponding to the selected risk \(R\) (approximately 4 days for both the second and the third waves). For clarity, note that the right-side scale for the delay is split into two linear ranges, for a better rendering of the small-\(\Delta\) range.

Figure 8
figure 8

United Kingdom. (a) Daily new positive individuals in the UK since February 23, 2020, and its moving average obtained with a window of 21 days (green line). (b) Growth rate of the epidemic computed from the averaged daily new positive cases (green line), and its time-varying mean obtained through a moving average that uses a window of 21 days (magenta line). (c) MAST performance, in terms of risk (left axis) versus threshold and mean delay (right axis) versus threshold, obtained from the UK data of the COVID-19 pandemic. (d) Application of the MAST procedure on the COVID-19 pandemic data from UK. On the left-side vertical axis, we select a desired risk, e.g., \(R=10^{-4}\). Then, the blue curve indicates the stopping day (about July 11, in the example) corresponding to the selected value of risk. Finally, the red curve referred to the right-side vertical axis shows the mean delay \(\Delta\) corresponding to the selected risk R (below 6 days). For clarity, note that the right-side scale for the delay is split into two linear ranges, for a better rendering of the small-\(\Delta\) range.

A comparison of the MAST performance for 14 different nations is addressed in Fig. 4. Different decision performances reflect different values of the parameter \(\omega\) governing the relationship between risk and delay, shown in Eq. (2). We see that, accepting a risk of \(R = 10^{-4}\), the mean detection delay \(\Delta\) is about 3 days for Netherlands and below 20 days for Spain. Thus, the MAST procedure detects an outbreak very expeditiously at very low risk level.

Figure 9
figure 9

France. (a) Daily new positive individuals in France since February 25, 2020, and its moving average obtained with a window of 21 days (green line). (b) Growth rate of the epidemic computed from the averaged daily new positive cases (green line), and its time-varying mean obtained through a moving average that uses a window of 21 days (magenta line). (c) MAST performance, in terms of risk (left axis) versus threshold and mean delay (right axis) versus threshold, obtained from the French data of the COVID-19 pandemic. (d) Application of the MAST procedure on the COVID-19 pandemic data from France. On the left-side vertical axis, we select a desired risk, e.g., \(R=10^{-4}\). Then, the blue curve indicates the stopping day (about July 7, in the example) corresponding to the selected value of risk. Finally, the red curve referred to the right-side vertical axis shows the mean delay \(\Delta\) corresponding to the selected risk \(R\) (below 20 days). For clarity, note that the right-side scale for the delay is split into two linear ranges, for a better rendering of the small-\(\Delta\) range.

The outer black curves labeled by \(\omega =0.32\) and \(\omega =11.52\) represent the ideal performance of the Page’s test (see Eq. (6)), in which the mean values of the growth rates \(x_1, x_2,\dots\), are assumed constant, known in advance, and equal to \((1 + \alpha )\) under \(\mathcal{H}_1\) and \((1 - \alpha )\) under \(\mathcal{H}_0\). For such a scenario, the parameter \(\omega\) appearing in Eq. (2) is the Kullback-Leibler distance \(2\left( \alpha /\sigma \right) ^2\) between the distributions under the two regimes15,31. In signal processing language, this quantity is often referred to as signal-to-noise ratio32. To provide performance envelopes, we use the values \(\alpha = 0.01\) and \(\alpha = 0.06\) corresponding to the extreme growth rates of the second wave reported in Fig. 1, and set \(\sigma =0.025\), which is the arithmetic mean of the estimated standard deviations of the 14 countries. For the two values of \(\alpha\), this yields \(2\left( \alpha /\sigma \right) ^2=0.32\) and 11.52, respectively, which are the values reported in Fig. 4. By setting \(\omega =2\left( \alpha /\sigma \right) ^2\) in Eq. (2) we observe that the higher is the growth rate, the better is the detection capability for a given risk. In other words, consistent with intuition, the more aggressive is the outbreak, the more quickly it can be detected. On the other hand, the larger is \(\sigma\), the higher is the delay for a given risk. Also this effect is intuitive, because the standard deviation \(\sigma\) measures the entity of random fluctuations in the data \(x_1,x_2,\dots\), and reliable decisions require more time because the data is uncertain. These trade-offs are also observed for the MAST test run over COVID-19 data for different nations, and are captured by the parameter \(\omega\) in Eq. (2).

Discussion

Let us focus again on Italy, and set \(R=10^{-4}\). Accepting a risk in the order of \(R=10^{-4}\) means that there is one chance in ten thousand that the countermeasures are taken too early or, in other words, unjustified actions against the pandemic are adopted every 27 years, on the average. By taking drastic countermeasures on July 18—the stopping day prescribed by our MAST procedure for \(R=10^{-4}\)—we would have left only \(\Delta \approx 3\) days of uncontrolled exponential growth of the pandemic, before addressing it. In this respect, it should be noted that the adoption of severe countermeasures in Italy has been decided only at the beginning of November 2020. The delayed decision situation is analogous for other nations.

It is clear that managing an unprecedented pandemic event is a huge and a multifaceted problem, which can only be addressed by taking into account many different perspectives. It is also clear that the decision when to take pandemic countermeasures depends on a large number of societal factors. The contribution of this article is limited to the analysis of the pandemic strictly from a quickest-detection viewpoint and, from this perspective, we obtain useful insights and quantitative analyses. One evidence, as just pointed out, is that critical regimes of many nations began dramatically earlier than when countermeasures were taken. In this sense, we believe that the proposed decision support tool would be a key component of a command and control (C2) system that anticipates, and possibly reacts as soon as possible, to threats. Such a C2 system would be useful for both national health security and armed forces in the context of Chemical, Biological, Radiological and Nuclear (CBRN) defense33,34.

Aside from the above retrospective analysis, a major contribution of the MAST quickest detection tool developed in this paper consists of providing proactive decision support for detecting future waves of the COVID-19 outbreak, and the onset of future pandemics. In these cases, precise nation-dependent performance predictions, such as those given in Fig. 4, cannot be available in advance because the curves have been obtained by exploiting estimates of the mean value of the growth rate (see e.g., the curve in magenta in Fig 2), derived by forensic inspection of the data. However, good approximate performance bounds can be obtained by assuming constant growth rates \((1\pm \alpha )\) in the critical and controlled regime, respectively, for “reasonable” values of \(\alpha\), for instance, the faster and the slower rates reported in Fig. 2 and used to compute the outermost black curves in Fig. 4. The corresponding value of \(\omega\) would be equal to the Kullback-Leibler information measure, which governs the performance of a clairvoyant Page’s test that knows exactly the constant growth rate.

Figure 5 shows the flowchart of the MAST procedure. The filtering operation in Fig. 5 is important to mitigate gross errors and lack or delayed reporting of the input data (for instance, thousands of positive individuals from previous days are all reported on the current day, number of recovered individuals unavailable in the US data for a long time, etc.) To compute \(p_n\), the filtering operation used in this paper requires one to observe data samples beyond the current day \(n\), which causes delays for on-line implementations. In these cases, alternative causal filtering strategies35, such as the Savitzky-Golay filter36, would be more appropriate. Another possibility to handle outliers could be the application of Huber’s robust statistic37.  We leave such possible enhancements to future work.

The quickest-detection tool developed can also be applied to different time-series, other than the sequence of growth rate of daily new positives \(\{x_n\}\), for instance to the hospitalized individuals addressed, for the Italian case, in Fig. 6. The sequence of hospitalized individuals does not require the smoothing operation shown in the flowchart of Fig. 5, because the collected data are less affected by gross errors. These data are also normally distributed with good accuracy, as confirmed by goodness-of-fit analysis (not reported). In Fig. 6 we see that the decision taken by using the hospitalized sequence is delayed as compared to that obtained by the sequence of daily new positives. This behaviour is intuitive because the hospitalized individuals are a subset of the positive ones, and a possible hospitalization follows the onset of clinical symptoms.

Figure 10
figure 10

Germany. (a) Daily new positive individuals in Germany since February 25, 2020, and its moving average obtained with a window of 21 days (green line). (b) Growth rate of the epidemic computed from the averaged daily new positive cases (green line), and its time-varying mean obtained through a moving average that uses a window of 21 days (magenta line). (c) MAST performance, in terms of risk (left axis) versus threshold and mean delay (right axis) versus threshold, obtained from the German data of the COVID-19 pandemic. (d) Application of the MAST procedure on the COVID-19 pandemic data from Germany. On the left-side vertical axis, we select a desired risk, e.g., \(R=10^{-4}\). Then, the blue curve indicates the stopping day (about July 19, in the example) corresponding to the selected value of risk. Finally, the red curve referred to the right-side vertical axis shows the mean delay \(\Delta\) corresponding to the selected risk \(R\) (below 13 days). For clarity, note that the right-side scale for the delay is split into two linear ranges, for a better rendering of the small-\(\Delta\) range.

Public opinion soundings suggest increased concern about new SARS-CoV-2 variants38, especially related to their higher fatality rate39, spreading velocity40,41, and the possibility that approved vaccines might be less effective against them42,43,44. The proposed quickest detection tool is useful also in the presence of variants of the coronavirus, which would affect the growth rate; indeed, variants were already present in the United Kingdom during the second wave, reported here in Fig. 8. It is worth noting here that the MAST statistic does not require knowledge of the time-varying mean value sequence of the growth rate.

The possibility of processing different sequences of data opens the way to the design of more sophisticated decision rules based on joint processing of multiple time-series. In addition, the MAST procedure can easily accommodate different definitions of critical regime. For instance, if a pandemic growth at rate \((1+\alpha ^*)\) is considered acceptable, for some \(0<\alpha ^*\ll 1\), the critical regime can be characterized by rates exceeding \((1+\alpha ^*)\), rather than 1. The necessary modifications to the MAST statistic are straightforward. We also envision that considering both the passage from a controlled to a critical regime and vice versa can be addressed by minor modifications to the MAST procedure45,46.

Special attention is given in the literature to the evaluation of underreporting and undertesting of COVID-19 cases25,47.  The mortality rate is used as the main indicator to evaluate the extent of underreporting and underdetection of COVID-19 cases47.  However, in our context, the evaluation of underreporting cases would not be beneficial in terms of quickest detection, as it is provided by an estimation procedure that uses the same data processed by the quickest detection. As already mentioned, all the available data (hospitalized individuals, daily deaths, daily number of tests etc.) could be used jointly and thus improve the detection capability and reliability of the approach.

An extended analysis of COVID-19 infection data from more countries than those covered in this paper is available on the web30.  We hope that in the near future the publicly available data can be: (i) more reliable so as to mitigate bias effects due to, e.g., false positives, contrasting multiple test outcomes for the same individual, markedly different contagion incidence in close geographical areas, etc.; and (ii) released with finer granularity so as to allow for analyses stratified by population age, comorbidity, etc. These aspects are also relevant for effective vaccination policies.

Methods

The observation model used in this paper can be formally obtained by replacing the constant growth rate \((1+\alpha )\) appearing in Eq. (1) by the sequence of random variables \(x_1,x_2,\dots\), yielding

$$\begin{aligned} p_{n+1}=p_1 \prod _{k=1}^n x_k, \quad n=1,2,\dots \end{aligned}$$
(3)

where \(p_n\) is the number of new cases on day n, while \(x_n=p_{n+1}/p_{n}\), a time-series that makes explicit the growth rate that we seek to coopt. The model in Eq. (3) is validated empirically. In particular, in the accompanying Supplementary Information, we elaborate on the classic susceptible-infected-recovered (SIR) compartmental epidemic model to motivate the usage of the sequence \(\{x_n\}\) as observable process for quickest detection. Using the model in Eq. (3), we assume to have available the sequence of daily new positives, for a certain region of interest; the analysis presented here relies on the data provided by the Johns Hopkins University28.  Referring for instance to Fig. 2, such a sequence is shown in gray in the left panel. To address gross errors, missing values and delays in reporting the data, the sequence is smoothed by a moving average filter with uniform weights (the moving average filter of length \(L\) is always assumed to have equal 1/L weights). The smoothed sequence \(\{p_n\}\) so obtained is shown in green in Fig. 2a. The growth rate process used as observable is then computed as \(\{x_n=p_{n+1}/p_{n}\}\), and is represented by the green curve in Fig. 2b. The length of the filter is selected to \(L=21\) as to obtain a convenient mitigation of the effects of gross errors in all analyzed countries. After such a pre-processing, we implement Kolmogorov-Smirnov tests to check data Gaussianity, see details in the Supplementary Information.

The time-varying statistical mean \(\{\mu _n\}\) of the sequence can be estimated by low-pass filtering of the sequence \(\{x_n\}\), and for this we use again a moving average filter of length \(L=21\) days. By subtracting from each \(x_n\) the estimated mean value \(\widehat{\mu }_n\), that is, the curve in magenta in Fig. 2b, one obtains the sequence \({x_n-\widehat{\mu }_n}\). Statistical analysis conducted by Kolmogorov-Smirnov goodness-of-fit test48 reveals that \({x_n-\widehat{\mu }_n}\), for each \(n=1,2,\dots\), can be modeled by a zero-mean Gaussian random variable with (country-dependent) standard deviation \(\sigma \ll 1\). Since \(\mu _n\) is close to unity, we see that \(x_k < 0\) with negligible probability, hence the Gaussian approximation should not be problematic.

By observing the sequence \(\{x_n\}\) as time n elapses, we want to detect the passage from the controlled regime \(\mathcal{H}_0\), to the critical regime \(\mathcal{H}_1\), when there is the exponential growth. In the controlled regime, the mean value of the random variable \(x_n\) is below one, i.e., \(\mu _n=\mu _{0,n} \le 1\), meaning that the number of new positives remains approximately stable or decreases. Conversely, in the critical regime, the mean value is greater than unity, \(\mu _n =\mu _{1,n} > 1\), which implies an explosion of daily new positives in the long run. Formally:

$$\begin{aligned} \text {controlled regime } \mathcal{H}_0\Rightarrow & {} \; \; x_n \sim \mathcal{N}(\mu _{0,n}, \sigma ), \qquad \mu _{0,n} \le 1, \end{aligned}$$
(4)
$$\begin{aligned} \text {critical regime } \mathcal{H}_1\Rightarrow & {} \; \; x_n \sim \mathcal{N}(\mu _{1,n}, \sigma ), \qquad \mu _{1,n} > 1 . \end{aligned}$$
(5)

We assume that the \(x_k\)’s are mutually independent under either regime. As we show in the accompanying Supplementary Information, slight deviations from the condition of perfect independence do not significantly affect the results of the proposed MAST. Different nations are characterized by different values of \(\sigma\), hence for each country \(\sigma\) is assumed known, because in practice it can be estimated on-line from the data. Conversely, the quantities \(\{\mu _{0,n}\}\) and \(\{\mu _{1,n}\}\) are modeled as deterministic but unknown sequences.

To detect the change of regime, we rely on the Generalized Likelihood Ratio Test (GLRT) approach12,13,14, a milestone of decision theory in scenarios where the statistical distributions of the data contain unknown parameters—in our case, the sequences of mean values \(\{\mu _{0,n}\}\) and \(\{\mu _{1,n}\}\), and the time at which the passage of regime occurs. If we assume the sequences of mean values in the two regimes to be constant and known, say \(\mu _{0,n}=1-\alpha\) and \(\mu _{1,n}=1+\alpha\) (cf. Eq. (1)), the GLRT solution to the quickest-detection problem would be the celebrated Page’s test: compare to an appropriate threshold the CUSUM statistic15,16,18

$$\begin{aligned} {\left\{ \begin{array}{ll} Q_0=0, \\ Q_{n}=\max \bigg \{ 0, Q_{n-1} + \frac{2 \alpha \, (x_n-1) }{\sigma ^2} \bigg \}, \quad n \ge 1. \end{array}\right. } \end{aligned}$$
(6)

In the presence of unknown and time-varying sequences complying with the constraints on \(\mu _{0,n}\) and \(\mu _{1,n}\), shown in (4)–(5), the GLRT solution to the quickest-detection problem amounts to compare the MAST statistic \(T_n\) to a threshold \(\chi\)49:

$$\begin{aligned} {\left\{ \begin{array}{ll} T_0=0, \\ T_{n}=\max \bigg \{ 0, T_{n-1} + \frac{(x_n-1)^2 {{\,\mathrm{sign}\,}}(x_n-1)}{2 \sigma ^2} \bigg \}, \quad n \ge 1, \end{array}\right. } \end{aligned}$$
(7)

and declaring the change of regime at the first occurrence of the threshold crossing. It is worth noting that the MAST statistic is formally obtained by replacing the unknown value of \(\alpha\) appearing in the CUSUM statistic, with the estimate \(\widehat{\alpha }_n=|x_n-1|\) (constant factors can be incorporated in the threshold). The reader educated in detection theory will recognize the analogy with the energy detector arising in testing the presence of an unknown time-varying deterministic signal buried in Gaussian noise32.

The threshold \(\chi\) employed in the MAST procedure is selected to trade-off decision delay \(\Delta\) and risk \(R\), two quantities that are defined as follows. The mean delay \(\Delta\) is the mean value of the difference between the time at which the MAST statistic crosses the threshold \(\chi\) and the time of passage from the controlled to the critical regime. The risk R is defined as reciprocal of the mean time between successive false alarms (in fact, “false alarm probability” is perhaps a more familiar jargon to readers with background in detection theory), where the false alarm is defined as a threshold crossing during the controlled regime (i.e., the regime in which there is no need for intervention). In this paper, the functions \(R(\chi )\) and \(\Delta (\chi )\) are obtained by standard Monte Carlo simulations 13,27,32 for relatively small values of \(\chi\), see e.g., Fig. 2c. We found that the first function is essentially exponential, and the second is essentially linear, consistent with known expressions for the Page’s test15,16. This allows us to extrapolate the behaviour of the two functions for values of \(\chi\) that would be problematic to obtain from real data or by computer experiments, such as those used in Fig. 3 and similar figures for other nations.