Article

Pandemic Growth and Benfordness: Empirical Evidence from 176 Countries Worldwide

1
School of Business and Management, IU International University, 10247 Berlin, Germany
2
Sydney Medical School, Nepean Clinical School, University of Sydney, Sydney, NSW 2747, Australia
*
Author to whom correspondence should be addressed.
COVID 2021, 1(1), 366-383; https://doi.org/10.3390/covid1010031
Submission received: 14 August 2021 / Revised: 8 September 2021 / Accepted: 9 September 2021 / Published: 13 September 2021
(This article belongs to the Topic Burden of COVID-19 in Different Countries)

Abstract

In the battle against the Coronavirus, over 190 territories and countries independently work on one end goal: to stop the pandemic growth. In this context, a tidal wave of data has emerged since the beginning of the COVID-19 crisis. Extant research shows that the pandemic data are only partially reliable. Only a small group of nations publishes reliable records on COVID-19 incidents. We collected global data from 176 countries and explored the causal relationship between average growth ratios and progress in the reliability of pandemic data. Furthermore, we replicated and operationalized the results of prior studies regarding the conformity of COVID-19 data to Benford’s law (BL). Our outcomes confirm that the average growth rates of new cases in the first nine months of the Coronavirus pandemic explain improvement or deterioration in Benfordness and thus reliability of COVID-19 data. We found significant evidence for the notion that nonconformity to BL rises with the growth of new cases in the initial phases of outbreaks.

1. Introduction

In January 2020, the World Health Organization (WHO) confirmed the first cases of Coronavirus, also known as COVID-19 or SARS-CoV-2, in Wuhan City, China [1]. With millions of incidents and deaths to date, a tidal wave of data on COVID-19 has emerged. Since the outbreak of the virus, countries have unanimously reported two metrics, “new cases” (individuals testing positive for the virus) and “new deaths” (the daily number of deaths) [2].
Having access to reliable data is vital. Policymakers use statistics to make life-saving decisions on restricting interventions colloquially known as lockdowns, travel bans, and social distancing. Similarly, scientists use pandemic data to detect the characteristics of the germ and respond accordingly.
There have been irregularities in Coronavirus data. Several forensic studies have emerged, inter alia, Koch and Okamura [2], Idrovo and Manrique-Hernández [3], Wei and Vellwock [4], Lee et al. [5], and Isea [6]. Table 1 summarizes the prior research into the COVID-19 data.
Sambridge and Jackson [7] evaluated Coronavirus data from 51 countries from 16 January 2020 to 9 April 2020. In one of the most comprehensive studies, Farhadi examined over 100,000 integers from 154 countries [8] with the primary outcome that approximately 28% of countries published reliable data on Coronavirus spread. In contrast, six countries disclosed entirely inconsistent records in the first nine months of the global pandemic. In a further step, Farhadi and Lahooti [9] investigated the “progress of Benfordness” across 182 countries from 21 January 2020 to 6 June 2021. They used prior results to explain observed improvements in COVID-19 reliability. The dataset, as well as the goodness of fit tests, was further extended to inspect the reliability of over 200,000 integers. Evidence was found that approximately 32% of nations worldwide accomplished measurable progress in Benfordness, while 68.2% showed no explicit improvement. The same results underpinned a moderate correlation between the goodness of fit tests for Benfordness and the Johns Hopkins Global Health Risk Index of 2019, suggesting a plausible relationship between national healthcare policies and the reliability of pandemic data.
All the studies cited above applied “the law of the first digits,” also known as Benford’s law (BL), a widely used forensic technique for detecting fraudulent data in academia and the business world. The core idea relates to the frequency of leading digits in naturally generated datasets, given by Equation (1) [10]:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d \in \{1, 2, 3, \ldots, 9\} \tag{1}$$
According to BL, the leading digits one to nine follow a particular logarithmic distribution: 30.1% for one, 17.6% for two, 12.5% for three, 9.7% for four, 7.9% for five, 6.7% for six, 5.8% for seven, 5.1% for eight, and 4.6% for nine [10,11]; see Table 2.
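The percentages above follow directly from Equation (1). The short sketch below recomputes them and compares them with the leading digits of a test sequence. The helper names and the choice of powers of two as sample data are illustrative assumptions, not part of the original study.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """Return the first digit of a non-negative integer."""
    return int(str(abs(int(n)))[0])


def observed_frequencies(values):
    """Relative frequency of each leading digit among the positive values."""
    digits = [leading_digit(v) for v in values if int(v) > 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


expected = benford_expected()

# Illustrative data only: powers of two, a sequence with a geometric tendency.
sample = [2 ** k for k in range(1, 101)]
observed = observed_frequencies(sample)

for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

The first printed column reproduces the expected shares (0.301 for digit one down to 0.046 for digit nine), while the second column shows how closely a geometric sequence tracks them.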
In an artificially generated data set, the observed frequencies are very likely to deviate from the BL distribution. BL applies to data with a geometric tendency that are not constrained by minima and maxima. It has been widely applied in several disciplines such as finance and accounting [12,13], politics [14], and epidemiology [2,3,4,5,6,7,8,9].
Previous research on COVID-19 has three major shortcomings. First, it operationalized inconsistent data sets with varying and frequently small sample sizes. Second, it utilized a variety of statistical techniques to assess Benfordness. Finally, due to their narrow scope, which was typically limited to forensic analysis of COVID-19 data, earlier studies did not elucidate the determinants of BL non-compliance.
A better understanding of changes in Benfordness is crucial. If the pandemic data are only partly trustworthy, as found in previous studies, then restrictive measures such as social distancing, lockdowns, or travel bans risk being ineffective interventions. For this reason, it is imperative to gain insights into why some countries are improving compliance with BL while others are only partially complying or, in the worst case, not complying at all. This paper is therefore concerned with identifying the key drivers of irregularities and improvements in BL compliance.

Pandemic Growth

The new Coronavirus follows the simplest model with lumped parameters known as the autonomous logistic sigmoid function [15,16]; see Equation (2):
$$\frac{dN}{dt} = r \times N \times \left(1 - \frac{N}{N_{\infty}}\right) \tag{2}$$
where $N$ is the total number of people affected by the epidemic, $N_{\infty}$ represents the entire population (or the maximum number of incidents), and $r$ is the growth rate of the epidemic [16]. The full spread of an infectious disease is thus bounded by the entire population, $N_{\infty}$. For example, the severe acute respiratory syndrome (SARS) outbreak of 2002 [17,18] was contained at about 8096 cases in 29 territories, whereas the Spanish flu of 1918 killed an estimated 50–100 million people [19,20]. Accordingly, the number of daily incidents is proportional to the number of existing cases, as underlined in Equation (3):
$$N_d = (1 + E \times p)^{d} \times N_0 \tag{3}$$
where $N_0$ is the number of new cases on the first day of the pandemic; $N_d$ is the number of new cases on a given day; $E$ is the average number of people exposed to new infection on a given day; $p$ is the probability of each exposure becoming an infection; and $d$ is the number of days since the beginning of the pandemic [20].
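To make Equations (2) and (3) concrete, the following sketch evaluates the exponential form day by day and steps the logistic equation forward with a simple one-day Euler update. All parameter values (exposure, probability, growth rate, population cap) are hypothetical and chosen only for illustration, and the Euler discretization is an assumption of this sketch rather than a method stated in the text.

```python
def exponential_cases(n0, exposure, p, days):
    """Equation (3): N_d = (1 + E * p) ** d * N_0 for d = 0..days."""
    return [n0 * (1 + exposure * p) ** d for d in range(days + 1)]


def logistic_path(n0, r, n_inf, days):
    """One-day Euler steps of Equation (2): dN/dt = r * N * (1 - N / N_inf)."""
    n = float(n0)
    path = [n]
    for _ in range(days):
        n += r * n * (1 - n / n_inf)
        path.append(n)
    return path


# Hypothetical parameters: 10 exposures per day, 2% infection probability.
early = exponential_cases(n0=1, exposure=10, p=0.02, days=3)

# Hypothetical logistic run: growth rate 0.2 per day, cap of 10,000 cases.
curve = logistic_path(n0=1, r=0.2, n_inf=10_000, days=100)

print([round(x, 3) for x in early])
print(round(curve[50]), round(curve[-1]))
```

With these values the exponential series grows by a factor of 1.2 per day, while the logistic path starts at nearly the same pace and then levels off as it approaches the cap.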
Logistic growth of the new Coronavirus can only be stopped when either $E$ or $p$ declines, an ineluctable fact since viruses are subject to growth limitations at some point, for instance after herd immunity has occurred or when vaccination programs have been initiated. In consequence, even in the worst-case scenario in which COVID-19 spreads widely, a group of people may no longer be exposed to the virus; in other words, they will be immunized [20]. This group is expected to increase to $N_{\infty}$ over time. The number of new daily cases grows progressively on the sigmoid curve before hitting the inflection point, following steadily rising slopes. In contrast, after passing the inflection point, the number of incidents grows degressively, following steadily decreasing slopes. One of the critical statistics to monitor pandemics is thus the growth ratio ($g_d$); see Equation (4):
$$g_d = \frac{N_d}{N_{d-1}} \tag{4}$$
where $N_d$ is the number of new cases on a particular day and $N_{d-1}$ is the same number on the previous day. The “growth ratio” (sometimes referred to as the “growth factor”) stays consistently above one ($g_d > 1$) on the exponential part of the sigmoid curve, before the curve’s inflection point is reached. At the inflection point, the ratio equals one ($g_d = 1$). It then falls below one ($g_d < 1$) after moving away from the inflection point. This is why $g_d$ can show both progressive and degressive growth patterns.
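The growth ratio of Equation (4) can be computed from any series of daily new cases. The sketch below uses a small made-up series and labels each day according to the three regimes described above. Skipping days whose previous-day count is zero is an assumption of this sketch, since the text does not say how such days were handled.

```python
def growth_ratios(new_cases):
    """Equation (4): g_d = N_d / N_{d-1}, skipping days where N_{d-1} is zero."""
    return [today / prev
            for prev, today in zip(new_cases, new_cases[1:])
            if prev > 0]


def growth_regime(g):
    """Label a growth ratio as described in the text."""
    if g > 1:
        return "progressive"
    if g < 1:
        return "degressive"
    return "inflection"


# Made-up daily new cases that rise, plateau, and fall.
daily_new_cases = [10, 20, 40, 40, 20, 10]

ratios = growth_ratios(daily_new_cases)
labels = [growth_regime(g) for g in ratios]

for g, label in zip(ratios, labels):
    print(g, label)
```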
Logic suggests that interventions and restricting policies, such as lockdowns or social distancing, turn progressive growth into degressive growth. If countries publish average growth ratios below or equal to one, it can be concluded that the infectious disease is spreading at a slower pace. Ma postulated that “the initial exponential growth rate of an epidemic is an important measure of the severeness of the epidemic and is also closely related to the basic reproduction number” [18].
Another important concept in our view is the volatility of the pandemic’s progressive or degressive growth. We denote the volatility as the fluctuation in the daily growth of new cases as measured by the standard deviation of the growth ratios around their average; see Equation (5):
$$\delta = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(g_{d_i} - \bar{g}_{d}\right)^{2}} \tag{5}$$
where $\delta$ stands for the standard deviation (or volatility), $N$ is the number of observations, $g_{d_i}$ represents the growth ratio on a particular day, and $\bar{g}_{d}$ is the average growth ratio. As expected, fluctuation in growth ratios occurs when daily incidents show sudden upward and downward movements. This may stem from delayed reporting, e.g., when COVID-19 reports are filed with a time lag. Larger volatility may signal inconsistent testing and reporting capabilities.
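As a worked example of Equation (5), the sketch below computes the average growth ratio and its volatility for a short, made-up list of daily growth ratios. It relies on `statistics.stdev`, which uses the same N − 1 denominator as the sample standard deviation; the input values are illustrative assumptions.

```python
import statistics


def average_growth_ratio(ratios):
    """Arithmetic mean of the daily growth ratios."""
    return statistics.fmean(ratios)


def volatility(ratios):
    """Equation (5): sample standard deviation (N - 1 in the denominator)."""
    return statistics.stdev(ratios)


# Made-up daily growth ratios for one country.
ratios = [2.0, 2.0, 1.0, 0.5, 0.5]

mean_ratio = average_growth_ratio(ratios)
vol = volatility(ratios)

print(round(mean_ratio, 3))   # average growth ratio
print(round(vol, 3))          # volatility as a ratio
print(f"{vol:.0%}")           # volatility expressed as a percentage
```

For this series the mean is 1.2 and the squared deviations sum to 2.30, so the volatility is the square root of 2.30 / 4, roughly 0.758 or 76%.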
To illustrate the impact of growth ratios and volatility, we have compiled Table 3 and Figure 1, including the results of the most notable cases in our study. Table 4 summarizes the correlations between all variables. In previous studies, Tajikistan, Belarus, Bangladesh, Iran, and Turkey were the major BL non-compliant countries [8,9]. These countries have striking statistical characteristics: our evaluation showed nonconformity to BL, low average growth rates, and low turnover in new cases for these countries. Unexpectedly, Tajikistan proclaimed itself to be COVID-19-free in early 2021 [21]. Belarus faced ongoing political unrest and mass protests, which increased the risk of infections [22]. Bangladesh acknowledged an instant decline in Coronavirus cases with no reasonable explanations [23]. The Government of Turkey pushed for a revision of the epidemic guidelines to discourage reporting new SARS-CoV-2 cases [24].
In contrast, the conforming cases, i.e., countries that conform to BL, have substantially greater growth ratios and volatility. Australia, with one of the world’s most transparent and well-developed public health systems, had a more realistic growth ratio of 1.26 and a volatility of 151%. Australia consistently adhered to Benfordness throughout the period and even showed progress in disseminating reliable data. We grouped these diverse jurisdictions with analogous characteristics; see Figure 1. The prominent BL non-compliant countries disclosed daily growth rates that averaged close to 1.0. Compliant countries such as Israel, Australia, and Germany showed much higher average growth ratios and volatilities. We believe that this growth could drive change in Benfordness.
We, therefore, hypothesize that epidemic growth ratios increase the distance from the expected Benford frequencies:
Hypothesis (H1).
Pandemic growth increases the distance to Benford’s law.
Hypothesis (H2).
Average growth ratio in the early stage determines the future pandemic growth.

2. Materials and Methods

2.1. Hypothesis Testing

The structural equation modeling (SEM) technique was utilized to examine the hypotheses in our research. We used the partial least squares (PLS) method to support the explanatory research [25] and to analyze both structural and measurement models. We chose SmartPLS software to further explore the predictive power of the theoretical framework [26]. Our objective has been to uncover the growing complexity of Coronavirus growth and its causal interrelationship with the epidemic data reliability. Explanatory research is beneficial when theory has not yet been established [26,27,28]. By employing PLS-SEM, we benefited from the high statistical power of the method [29,30]. As an alternative to covariance-based methods, PLS-SEM facilitates variance-based structural equation modeling. PLS is especially suitable for early phases of research when the phenomenon is new and there are no theories already in place. The PLS approach is also especially appropriate for predictive studies [31].
To assess a change in Benfordness, we incorporated the results of earlier research and subsequently analyzed two phases of the Coronavirus pandemic data. The first and second phases covered 31 December 2019 to 24 September 2020 and 25 September 2020 to 6 June 2021, respectively. The simultaneous system is composed of two endogenous constructs and one exogenous construct, as shown in Figure 2, including the latent variables “Benfordness Change,” “Growth Phase One,” and “Growth Phase Two.”
Each of the two growth constructs, growth phase one and growth phase two, comprises one indicator, i.e., “average growth rate phase one” (or AGRP1) for the period 31 December 2019 to 24 September 2020, and “average growth rate phase two” (or AGRP2) for the period 25 September 2020 to 6 June 2021. The “Change in Benfordness” involves three reflective indicators, BL changes captured by Chi-square (CHI-Delta), K-S statistics (KS-Delta), and d* or d-factor (d-Delta). These statistics were used in prior studies [2,3,4,5,6,7,8,9]. We regard these items as reflective indicators of the endogenous construct “Change in Benfordness” as they conceptually measure the same phenomenon. According to Jarvis, MacKenzie, and Podsakoff [25], the causality direction for reflective constructs runs from the construct to the item; a change in the indicator values will not change the construct.

2.2. COVID-19 Data Sampling

We collected data from the COVID-19 database of the Center for Systems Science and Engineering at Johns Hopkins University. Our sample consisted of 87,011 integers on daily new cases and 77,236 on daily new deaths reported worldwide between 21 January 2020 and 6 June 2021. We purposefully excluded other variables, such as new deaths, new tests, and new vaccinations, since these items can be moderated and influenced by domestic public health systems and policies in different regions or countries. We focused on new cases only to capture and study the pandemic’s logistic growth curve between 21 January 2020 and 24 September 2020 (phase one, 248 days) and between 25 September 2020 and 6 June 2021 (phase two, 255 days).
On average, each country provided 344.69 observations in the first phase of our study. Logic suggests that countries with a smaller population or more limited health care capabilities may have yielded narrower data sets on the COVID-19 spread. However, both the average growth ratio of the logistic curve and the statistical tests for BL conformity depend on the sample size. A small sample size adversely affects the statistical testing and measurement of pandemic growth.
Thus, we additionally focused on those states with comparable and significant sample sizes above the average of 344.69 observations in the first phase. This led to an additional sample of 102 states that clearly met the specifications for acceptable sample size in PLS-SEM [6]. A frequently used method for estimating the minimum sample size in PLS-SEM is the ten-times rule of thumb [28], stating that the minimum sample size should be more than ten times the maximum number of inner or outer model links pointing at any latent variable. All 102 countries supplied over 570 observations in the second phase.
Prior research already measured and provided countries’ distance to BL by applying multiple statistical tests. The variables commonly used in previous studies were the Kolmogorov–Smirnov statistic, the Chi-square ($\chi^2$) goodness-of-fit test, and the Euclidean distance [2,3,4,5,6,7,8,9,10]. To be consistent with prior research, we operationalized the same variables to assess the distance to Benford’s frequencies. For this reason, we replicated the results of previous studies [9] and calculated the changes in COVID-19 goodness of fit tests, i.e., $\Delta = \tau_B / \tau_A$, where $\tau_A$ and $\tau_B$ are countries’ goodness of fit tests for the first and second phases. A $\Delta < 1$ signifies measurable progress in Benfordness, while a $\Delta > 1$ suggests the opposite, a worsening development. To exclude the undesired effect of serial correlation, we conducted the Durbin–Watson test, resulting in an acceptable value of d = 1.975 for the variables AGRP1 and AGRP2. The rule of thumb suggests that if $1.5 < d < 2.5$, autocorrelation is not a cause for concern.
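The three distance measures and the change statistic can be sketched as follows. This is a minimal illustration, not the procedure of the cited studies: the two input series are synthetic, the Euclidean distance is left unnormalized (the d* variant used in the literature applies a scaling that is not reproduced here), and Δ is taken as the ratio of the phase-two to the phase-one statistic, an assumption that matches the thresholds Δ < 1 and Δ > 1 stated above.

```python
import math

# Expected Benford probabilities for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def digit_counts(values):
    """Count leading digits 1..9 among the positive integer values."""
    counts = [0] * 9
    for v in values:
        if int(v) > 0:
            counts[int(str(int(v))[0]) - 1] += 1
    return counts


def chi_square(counts):
    """Pearson chi-square statistic against the Benford expectation."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, BENFORD))


def ks_statistic(counts):
    """Largest gap between observed and expected cumulative digit shares."""
    n = sum(counts)
    cum_obs = cum_exp = worst = 0.0
    for c, p in zip(counts, BENFORD):
        cum_obs += c / n
        cum_exp += p
        worst = max(worst, abs(cum_obs - cum_exp))
    return worst


def euclidean_distance(counts):
    """Unnormalized Euclidean distance between observed and expected shares."""
    n = sum(counts)
    return math.sqrt(sum((c / n - p) ** 2 for c, p in zip(counts, BENFORD)))


def delta(tau_a, tau_b):
    """Change statistic, assumed here to be the ratio tau_B / tau_A."""
    return tau_b / tau_a


# Synthetic series: phase A has uniform leading digits, phase B is Benford-like.
phase_a = digit_counts(range(1, 1000))
phase_b = digit_counts(2 ** k for k in range(1, 200))

delta_chi = delta(chi_square(phase_a), chi_square(phase_b))
delta_ks = delta(ks_statistic(phase_a), ks_statistic(phase_b))
delta_d = delta(euclidean_distance(phase_a), euclidean_distance(phase_b))

print(delta_chi < 1, delta_ks < 1, delta_d < 1)
```

Because the synthetic phase-B series sits much closer to the Benford frequencies than phase A, all three ratios fall below one, which the text reads as progress in Benfordness.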
Therefore, we computed the day-to-day growth ratios for all the countries covered in earlier reports [8,9]. We then calculated the average growth rates on a country-by-country basis and computed the periodic average growth rates as follows (see Equation (6)):
$$\bar{g}_{dN_i} = \frac{1}{n} \sum_{i=1}^{n} \frac{dN_i}{dN_{i-1}} \tag{6}$$
where $n$ is the number of daily growth ratios per country. We applied a Monte Carlo simulation of the pandemic logistic growth and conformity to BL based on the Chi-square goodness of fit test. Our simulation included 1000 iterations based on the average daily new cases of randomly selected countries. We observed that 49% of randomly created cases had a growth ratio larger than one; on average, 52% of simulated cases violated the threshold for BL conformity. See Table 5 for the variables and Table 6 for the aggregate results of changes in Benfordness.
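The text does not spell out the simulation design, so the following is only a rough sketch of how a Monte Carlo experiment of this kind could be set up. Every modeling choice here is an assumption: the discrete logistic curve, the lognormal reporting noise, the parameter ranges, and the use of the 5% critical value of the Chi-square distribution with 8 degrees of freedom (15.507) as the conformity threshold. The resulting shares are not expected to match the percentages reported above.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CHI2_CRITICAL = 15.507  # 5% critical value, chi-square with 8 degrees of freedom


def simulate_new_cases(days, r, n_inf, noise, rng):
    """Daily new cases from a discrete logistic curve with reporting noise."""
    total, series = 1.0, []
    for _ in range(days):
        increment = r * total * (1 - total / n_inf)
        total += increment
        reported = int(increment * rng.lognormvariate(0, noise))
        series.append(max(reported, 0))
    return series


def chi_square(series):
    """Chi-square statistic of the leading digits against Benford's law."""
    counts = [0] * 9
    for v in series:
        if v > 0:
            counts[int(str(v)[0]) - 1] += 1
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, BENFORD))


def average_growth_ratio(series):
    """Mean of day-to-day ratios, skipping days with a zero denominator."""
    ratios = [b / a for a, b in zip(series, series[1:]) if a > 0]
    return sum(ratios) / len(ratios)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
results = []
for _ in range(1000):
    series = simulate_new_cases(days=250,
                                r=rng.uniform(0.05, 0.3),   # hypothetical range
                                n_inf=rng.uniform(1e4, 1e7),  # hypothetical range
                                noise=0.3, rng=rng)
    results.append((average_growth_ratio(series), chi_square(series)))

share_growing = sum(g > 1 for g, _ in results) / len(results)
share_violating = sum(c > CHI2_CRITICAL for _, c in results) / len(results)
print(share_growing, share_violating)
```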

3. Results

We tested Hypotheses H1 and H2 using the samples of 176 and 102 countries and confirmed the explanatory power of the model in the sample. By simplifying the reflective indicators and the endogenous construct change in Benfordness, we found a significant improvement in the out-of-sample measures and the predictive power of the PLS-SEM. Cases with missing values were excluded using listwise deletion, i.e., each row containing a missing value was deleted and only the remainder of the sample was used.

3.1. Measurement Model

First, we examined the robustness of all constructs. All item loadings exceeded the threshold of 0.708, meaning that the constructs explained over 50% of the variance of their indicators, as suggested in the literature [28,29,30,31], with two exceptions in the smaller sample. For the sample of 102 countries, PLS returned loadings of 0.699 for KS-Delta and 0.683 for d-Delta, both marginally below the threshold, indicating that these two items are candidates for removal in the context of the smaller sample. Single-item constructs attained values of 1.000 for the items AGRP1 and AGRP2. The use of single-item constructs is an acceptable practice in PLS-SEM. The factor loading for each item on its respective construct was highly significant (p < 0.0001), as evidenced by the T-statistics, which reaffirmed convergent validity. See Table 7: outer loadings.
Second, we checked internal consistency reliability, mainly by using Jöreskog’s composite reliability [32]. Values between 0.70 and 0.90 are generally considered satisfactory to good. For constructs with more than one item, a value greater than 0.95 is problematic because it indicates that the items are redundant, which impairs construct validity. Our single-item constructs scored 1.000, which reflects that each of these constructs depends entirely on one item.
Third, convergent validity is concerned with the extent to which the construct explains the variance in its reflective items. As one of the building blocks of PLS model evaluation, and consistent with the guidelines of Fornell and Larcker [12], we applied a threshold of 0.50 or higher for the average variance extracted (AVE), implying that the construct explains at least 50 percent of the variance in its reflective items [26,33]. See Table 8: Measurement Model Evaluation, Changes in Benfordness.
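The reliability and validity thresholds above can be checked by hand from standardized outer loadings. The sketch below uses the standard formulas for composite reliability and AVE; the three loadings are hypothetical values chosen for illustration and are not taken from Table 7.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability from standardized loadings.

    (sum of loadings)^2 divided by itself plus the summed error variances,
    where each error variance is 1 - loading^2.
    """
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical loadings for a three-item reflective construct.
loadings = [0.85, 0.80, 0.75]

ave = average_variance_extracted(loadings)
rho_c = composite_reliability(loadings)

print(round(ave, 3), round(rho_c, 3))

# A loading of 0.708 corresponds to roughly 50% explained indicator variance.
print(round(0.708 ** 2, 3))

# A single-item construct with loading 1.0 scores 1.0 on both measures.
print(average_variance_extracted([1.0]), composite_reliability([1.0]))
```

With these hypothetical loadings the AVE is about 0.64 and the composite reliability about 0.84, so the construct would pass both the 0.50 and the 0.70 to 0.90 guidelines.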
Fourth, we also tested discriminant validity by applying the Heterotrait–Monotrait ratio (HTMT). The HTMT is defined as the mean of item correlations across constructs relative to the (geometric) mean of the average correlations for items measuring the same construct. Problems with discriminant validity are present when HTMT values are high [26], with a threshold of 0.90 for structural models. According to Rönkkö and Cho, the HTMT is indeed a new application of the parallel reliability coefficient [26]. The HTMT criterion outperforms classical approaches to assess discriminant validity, such as the Fornell–Larcker criterion and (partial) cross-loadings, which are largely unable to detect a lack of discriminant validity [34]. Our analysis confirms that the HTMT coefficients of all constructs did not exceed the recommended threshold of 0.90. See Table 9: discriminant validity by Heterotrait–Monotrait ratio (HTMT).

3.2. Structural Model

Before evaluating the structural model, we screened for multicollinearity using the variance inflation factors (VIF), which quantify the degree to which the variance of an estimated coefficient is inflated by correlation among the predictors [26]. VIF values should be close to or below 3. Multicollinearity is induced by a high correlation between the independent variables; it affects the statistical power of the coefficients and weakens the reliability of the estimated p values. Our model is not affected by highly correlated predictor variables in either sample. See Table 10: collinearity statistics VIF.
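For the simplest case of two predictors, the VIF reduces to 1 / (1 − r²), where r is their Pearson correlation. The sketch below illustrates this special case with made-up data; models with more predictors require regressing each predictor on all the others, which is not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x, y)
    return 1 / (1 - r ** 2)


# Made-up predictor values with a correlation of 0.8.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

print(round(pearson(x, y), 3))
print(round(vif_two_predictors(x, y), 3))
```

A correlation of 0.8 yields a VIF of about 2.78, just under the guideline of 3; uncorrelated predictors give a VIF of 1.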
Following the satisfactory evaluation of the measurement model, we evaluated the structural model against standard criteria, including (a) the coefficient of determination $R^2$, (b) the path coefficients, and (c) the out-of-sample predictive power [35].
To assess the explanatory power of the PLS path model, we computed the coefficient of determination $R^2$. As a general guideline, Hair et al. recommended the following thresholds: substantial (equal to or above 0.75), moderate (close to 0.50), and weak (less than 0.25) results [28]. In our explanatory research, the endogenous constructs “change in Benfordness” and “growth rate phase two” achieved $R^2$ values of 0.370 and 0.388 for the sample of 176 countries and 0.590 and 0.437 for the sample of 102 countries [26]. The $R^2$ indicates only the explanatory power of the model within the sample. The higher values for the smaller sample can be attributed to its statistical power, as it includes only those countries with a sufficient number of COVID-19 records. See Table 11: coefficient of determination $R^2$.
All path coefficients are positive (i.e., in the expected direction) and statistically significant (at p < 0.05). To model the interaction effects, we followed Chin et al. [33]. The interaction terms were expressed by multiplying the corresponding indicators of the predictor and moderator constructs. We also adhered to their recommended hierarchical process to construct and compare the models with and without the respective interacting constructs. Table 12 shows the results of the structural model with interaction effects for both samples with 176 and 102 countries.
Further assessment of the PLS model for predictability, as suggested by Shmueli et al. [35] and Hair et al. [26], affirmed moderate out-of-sample predictive power for our results. The current SmartPLS software algorithm allowed us to retrieve k-fold cross-validated prediction errors and a prediction error summary statistic, the root mean square error (RMSE), to evaluate the predictive performance of the PLS path model. Independently of the PLS model, a linear regression model (LM) offers benchmark prediction errors. In the LM approach, each endogenous indicator variable is regressed on all exogenous indicator variables to generate predictions. Thus, a side-by-side comparison of the PLS-SEM and LM outcomes indicates whether the use of a theoretically grounded path model improves (or at least does not worsen) the predictive performance of the indicators at hand. In this study, the PLS-SEM RMSE showed a lower prediction error than the LM benchmark for the reflective indicators Chi-square Delta and K-S Delta, as recommended by the literature. The changes in Benfordness captured in this study thus had out-of-sample predictive power. Table 13 includes all results of the out-of-sample predictive power assessment.
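The LM benchmark side of this comparison can be illustrated with a k-fold cross-validated RMSE for a one-predictor linear regression. This sketch does not reproduce the SmartPLS procedure or the PLS-SEM predictions; the data are randomly generated, and the variable names, fold assignment, and parameter values are assumptions made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def kfold_rmse(xs, ys, k=10):
    """Out-of-sample RMSE: each fold is predicted by a model fit on the rest."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors += [(ys[i] - (intercept + slope * xs[i])) ** 2
                           for i in test_idx]
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical data: 102 cases, a predictor and a noisy linear response.
rng = random.Random(0)
predictor = [rng.uniform(0.9, 1.5) for _ in range(102)]
response = [0.5 * x + rng.gauss(0, 0.1) for x in predictor]

lm_rmse = kfold_rmse(predictor, response, k=10)
print(round(lm_rmse, 3))
```

In a comparison of the kind described above, this benchmark RMSE would be set against the RMSE of the path model for the same indicator, and the path model is preferred when its error is lower.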
Based on the evaluation of the structural and measurement models, we can confirm the two hypotheses: H1, pandemic growth reduces Benfordness, and H2, pandemic growth in the initial phase determines future growth.

4. Conclusions

4.1. Findings

Initial exponential growth in the first nine months of the global pandemic explains the overall progress in line with BL and the future development of the pandemic.
We face an emerging question: why can initial growth explain changes in Benfordness? Logic suggests that BL non-compliance worsens when local authorities are confronted with uncontrolled epidemic growth in the early stages of the pandemic. These decision-makers may have understated the number of COVID-19 incidents, which would ultimately result in a reverse development of Benfordness.
Our results confirm that the leading violators of BL showed similar behaviors in previous studies. Notably, Belarus and Iran, the top BL violators, demonstrated the widest distance from BL frequencies in previous studies, based on at least one of the statistical tests. COVID-19 has exacerbated existing and, in some cases, deep-rooted political, economic, social, and security problems in those countries. Many of the challenges have troubled social cohesion in these countries, such as in Iraq as reported by the United Nations (https://reliefweb.int/report/iraq/impact-covid-19-social-cohesion-iraq; last accessed on 11 September 2021). On 16 March 2020, Alexander Lukashenko, the president of Belarus, denied the threat of Coronavirus. He called for people to work in the fields and drive tractors to overcome the pandemic: “You just have to work, especially now, in a village. There the tractor will heal everyone. The fields will cure everyone.” [36]. Ali Khamenei, the Supreme Leader of the Islamic Republic of Iran, downplayed the threat of the Coronavirus, banned vaccines from the United States and the United Kingdom, and expelled Médecins Sans Frontières, who provided pro bono health services to Iranians. The BBC reported in early August 2021 that the numbers of deaths and new cases in Iran were nearly triple and double the official figures, respectively [37]. The pandemic emerged at a time when public faith in the clerical regime was at a low ebb. Iran was in the midst of economic turmoil, and regular protests were erupting throughout the country.
The vast majority of the countries showing significant BL improvements, such as Israel, the United Kingdom, or Australia, had an average growth ratio greater than or equal to 1.00 during the first nine months of the pandemic. According to the Johns Hopkins University Global Health Risk Index (GHRI) [38], these countries have robust public health capabilities to adequately react to epidemic outbreaks. The correlation analysis between the GHRI scores and the change in Benfordness showed no statistically significant relationship in this context. We conclude that higher GHRI scores do not explain changes in BL conformity. Not unexpectedly, the observed cases of BL compliance, such as the United Kingdom, Israel, Australia, and Germany, adopted advanced vaccination programs and even performed a seminal role in cutting-edge research into COVID-19 vaccine development. Complementing previous research, our findings addressed the critical determinants of data reliability [2,3,4,5,6,7,8,9]. Figure 3 shows all countries based on their changes in Benfordness as measured by the metrics stated. Russia, Switzerland, and Iraq have demonstrated notable reverse development in BL conformity based on $\Delta_{KS}$, $\Delta_{d}$, and $\Delta_{Chi}$.
Overall, the statistical tests conducted in this study support both proposed hypotheses. The moderate predictive power of the out-of-sample changes in Benfordness strongly suggested the potential applicability of the propounded theory in any future cases of the pandemic disease data. Given the evidence provided by our research, policymakers should give due consideration to the pandemic growth trajectory in countries affected by infectious diseases. To implement effective public policies to decelerate the outbreak, policymakers and scientists need to scrutinize epidemic data regarding anomalies in the logistic growth rates. Inconsistent growth ratios from outlier territories might signal poor conformity of pandemic data to BL.

4.2. Limitation

Notable cases in our study indicate that improvement in Benfordness depends on logistic growth in the initial phases of the pandemic. However, we did not identify any qualitative factors that led to a larger distance from the BL-expected frequencies of the leading digits. In addition, we did not explore the different variants of COVID-19, i.e., alpha, beta, gamma, and delta, with particular attention to their transmissibility, which possibly contributed to the progression of the Coronavirus outbreak.

Author Contributions

Conceptualization, N.F.; methodology, N.F.; software, N.F.; validation, H.L.; formal analysis, N.F.; investigation, H.L.; resources, H.L.; data curation, N.F.; writing—original draft preparation, N.F.; writing—review and editing, N.F., H.L.; visualization, N.F.; supervision, H.L.; project administration, N.F., H.L.; funding acquisition, not applicable. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Our data on confirmed cases come from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (see https://systems.jhu.edu, last accessed on 21 August 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Coronavirus Disease (COVID-19) Outbreak. Geneva: WHO. 2020. Available online: https://www.who.int/emergencies/diseases/novel-coronavirus-2019 (accessed on 13 June 2021).
  2. Koch, C.; Okamura, K. Benford’s Law and COVID-19 Reporting. 2020. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3586413 (accessed on 20 August 2021).
  3. Idrovo, A.J.; Manrique-Hernandez, E.F. Data Quality of Chinese Surveillance of 270 COVID-19: Objective Analysis Based on WHO’s Situation Reports. Asia. Pac. J. Public Health 2020, 32, 165–167. [Google Scholar] [CrossRef]
  4. Wie, A.; Vellwock, A.E. Is COVID-19 Data Reliable? A Statistical Analysis with Benford’s Law. 2020. Available online: https://www.researchgate.net/publication/344164702_Is_COVID-19_data_reliable_A_statistical_analysis_with_Benford%27s_Law (accessed on 13 June 2021).
  5. Lee, K.-B.; Han, S.; Jeong, Y. COVID-19 Flattening the Curve, and Benford’s Law. Phys. A Stat. Mech. Its Appl. 2020, 559, 125090. [Google Scholar] [CrossRef]
  6. Isea, R. How Valid are the Reported Cases of People Infected with Covid-19 in the World? Int. J. Coronaviruses 2020, 1, 53–56. [Google Scholar] [CrossRef]
  7. Sambridge, M.; Jackson, A. National COVID numbers—Benford’s law looks for errors. Nature 2020, 581, 384–385. [Google Scholar] [CrossRef]
  8. Farhadi, N. Can we rely on Covid-19 data? An assessment of data from over 200 countries. Sci. Prog. 2021, 104, 1–19.
  9. Farhadi, N.; Lahooti, H. Are COVID-19 Data Reliable? A Quantitative Analysis of Pandemic Data from 182 Countries. COVID 2021, 1, 137–152.
  10. Newcomb, S. Note on the Frequency of Use of the Different Digits in Natural Numbers. Am. J. Math. 1881, 4, 39–40.
  11. Benford, F. The Law of Anomalous Numbers. Proc. Am. Philos. Soc. 1938, 78, 551–572.
  12. Durtschi, C.; Hillison, W.; Pacini, C. The Effective Use of Benford’s law to Assist in Detecting Fraud in Accounting Data. J. Forensic Account. 2004, 5, 17–34.
  13. Grammatikos, T.; Papanikolaou, N.I. Applying Benford’s law to Detect Accounting Data Manipulation in the Banking Industry. J. Financ. Serv. Res. 2020, 59, 115–142.
  14. Roukema, B.F. A first-digit anomaly in the 2009 Iranian presidential election. J. Appl. Stat. 2014, 41, 164–199.
  15. Castorina, P.; Iorio, A.; Lanteri, D. Data analysis on Coronavirus spreading by macroscopic growth laws. Int. J. Mod. Phys. C 2020, 31, 2050103.
  16. Pelinovsky, E.; Kurkin, A.; Kurkina, O.; Kokoulina, M.; Epifanova, A. Logistic equation and COVID-19. Chaos Solitons Fractals 2020, 140, 110241.
  17. World Health Organization. Severe Acute Respiratory Syndrome (SARS). Available online: https://www.who.int/health-topics/severe-acute-respiratory-syndrome#tab=tab_1 (accessed on 4 August 2021).
  18. Ma, J. Estimating epidemic exponential growth rate and basic reproduction number. Infect. Dis. Model. 2020, 5, 129–141.
  19. US Department of Health & Human Services. 1918 Pandemic (H1N1 Virus). Available online: https://www.cdc.gov/flu/pandemic-resources/1918-pandemic-h1n1.html (accessed on 4 August 2021).
  20. Sanderson, G. Exponential Growth and Epidemics. 2020. Available online: https://www.youtube.com/watch?v=Kas0tIxDvrg (accessed on 3 September 2021).
  21. Putz, C. If Only It Were That Easy: Tajikistan Declares Itself COVID-19 Free. The Diplomat. 2021. Available online: https://thediplomat.com/2021/01/if-only-it-were-that-easy-tajikistan-declares-itself-covid-19-free (accessed on 13 June 2021).
  22. Vector, D. What’s Happening in Belarus? Here Are the Basics. New York Times. 2021. Available online: https://www.nytimes.com/2021/05/26/world/europe/whats-happening-in-belarus.html (accessed on 13 June 2021).
  23. Deutsche Welle. Why Bangladeshis No Longer Fear the Coronavirus. 2021. Available online: https://www.dw.com/en/bangladesh-coronavirus-no-fear/a-55091050 (accessed on 13 June 2021).
  24. Yackley, A.J. Dollar Blow for Turkey as Tourism Season Runs into the Sand. Financial Times. 2021. Available online: https://www.ft.com/content/f7f4f65f-400d-437d-9ffa-e50fec485942 (accessed on 13 June 2021).
  25. Jarvis, S.B.; MacKenzie, S.B.; Podsakoff, P.M. A critical review of construct indicators and measurement model misspecification in marketing and consumer research. J. Consum. Res. 2003, 30, 199–218.
  26. Hair, J.F.; Risher, J.J.; Sarstedt, M.; Ringle, C.M. When to use and how to report the results of PLS-SEM. Eur. Bus. Rev. 2019, 31, 2–24.
  27. Hult, J.F.; Ringle, C.M.; Sarstedt, M. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM); Sage Publications: Thousand Oaks, CA, USA, 2017.
  28. Rönkkö, M.; Cho, E. An Updated Guideline for Assessing Discriminant Validity. Organ. Res. Methods 2020, 1094428120968614.
  29. Brown, W. Some experimental results in the correlation of mental abilities. Br. J. Psychol. 1910, 3, 296–322.
  30. Reinartz, W.J.; Haenlein, M.; Henseler, J. An empirical comparison of the efficacy of covariance-based and variance-based SEM. Int. J. Res. Mark. 2009, 26, 332–344.
  31. Hair, J.F.; Hult, G.T.M.; Ringle, C.M.; Sarstedt, M.; Thiele, K.O. Mirror, Mirror on the wall: A comparative evaluation of composite-based structural equation modeling methods. J. Acad. Mark. Sci. 2017, 45, 616–632.
  32. Jöreskog, K.G. Simultaneous factor analysis in several populations. Psychometrika 1971, 36, 409–426.
  33. Chin, W. Issues and Opinion on Structural Equation Modeling. Manag. Inf. Syst. Q. 1988, 22, 1–11.
  34. Voorhees, C.M.; Brady, M.K.; Calantone, R.; Ramirez, E. Discriminant validity testing in marketing: An analysis, causes for concern, and proposed remedies. J. Acad. Mark. Sci. 2016, 44, 119–134.
  35. Shmueli, G.; Ray, S.; Velasquez Estrada, J.M.; Shatla, S.B. The elephant in the room: Evaluating the predictive performance of PLS models. J. Bus. Res. 2016, 69, 4552–4564.
  36. Chornokondratenko, M. Treated Like a ‘Toy’: Another Belarusian Athlete on Life under Lukashenko. 2021. Available online: https://www.reuters.com/world/europe/treated-like-toy-another-belarusian-athlete-life-under-lukashenko-2021-08-09 (accessed on 3 September 2021).
  37. BBC. Coronavirus: Iran Cover-Up of Deaths Revealed by Data Leak. Available online: https://www.bbc.com/news/world-middle-east-53598965 (accessed on 15 August 2021).
  38. Johns Hopkins University. Global Health Security Index. 2019. Available online: https://www.ghsindex.org/ (accessed on 13 June 2021).
Figure 1. Clustering most notable cases.
Figure 2. Simultaneous system and hypotheses.
Figure 3. Change in Benfordness.
Table 1. Prior research into COVID-19 data reliability measured by Benford’s law.

| Researcher | Variables | Period | Number of Countries |
|---|---|---|---|
| Idrovo and Manrique-Hernández | Confirmed cases, suspected cases, and deaths, cumulated confirmed cases, and cumulated confirmed deaths | 21 January 2020–15 March 2020 | 1 |
| Koch and Okamura | Daily cases, deaths | 20 January 2020–10 April 2020 | 3 |
| Lee, Han, and Jeong | Daily cases, deaths | 22 January 2020–6 April 2020 | 10 |
| Wei and Vellwock | Daily cases, deaths | Not stated–1 September 2020 | 20 |
| Isea | Daily cases, deaths | 29 December 2019–30 April 2020 | 23 |
| Jackson and Sambridge | Cumulated confirmed cases and deaths | 16 January 2020–9 April 2020 | 51 |
| Farhadi | Daily cases, deaths, tests | 31 December 2019–24 September 2020 | 182 |
| Farhadi and Lahooti | Daily cases, deaths, tests, vaccination | 21 January 2020–6 June 2021 | 154 |
Table 2. Benford’s law distribution of the first digit.

| First Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Benford’s frequency | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
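
The frequencies in Table 2 follow from the first-digit form of Benford’s law, P(d) = log10(1 + 1/d). The Python sketch below is illustrative only and is not the authors' code. It regenerates the table and computes a Pearson chi-square statistic for a list of reported counts. The function names and the rule of dropping zero or negative counts are assumptions made for this example.

```python
import math


def benford_first_digit_probs():
    """Expected first-digit frequencies under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """Leading digit of a positive integer count."""
    return int(str(int(value))[0])


def chi_square_vs_benford(values):
    """Pearson chi-square of observed first digits against Benford's law.

    Assumption for this sketch: zero or negative counts have no leading
    digit and are dropped before counting.
    """
    usable = [v for v in values if int(v) > 0]
    if not usable:
        raise ValueError("need at least one positive count")
    counts = {d: 0 for d in range(1, 10)}
    for v in usable:
        counts[first_digit(v)] += 1
    n = len(usable)
    probs = benford_first_digit_probs()
    return sum(
        (counts[d] - n * probs[d]) ** 2 / (n * probs[d]) for d in range(1, 10)
    )


if __name__ == "__main__":
    for digit, p in benford_first_digit_probs().items():
        print(digit, round(p, 3))
```

A larger statistic indicates a larger departure of the observed first digits from the Benford frequencies.
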
Table 3. Growth and volatility of top BL compliant and incompliant nations.

| Location | Total Growth | Total STDEV | Phase1 Growth | STDEV Phase1 | Phase2 Growth | STDEV Phase2 |
|---|---|---|---|---|---|---|
| Afghanistan | 1.38 | 2.14 | 1.48 | 2.85 | 1.29 | 1.29 |
| Germany | 1.37 | 1.59 | 1.34 | 1.63 | 1.40 | 1.55 |
| Australia | 1.26 | 1.51 | 1.30 | 2.02 | 1.22 | 0.90 |
| Israel | 1.25 | 2.16 | 1.23 | 1.43 | 1.26 | 2.62 |
| Belgium | 1.10 | 0.58 | 1.11 | 0.67 | 1.09 | 0.50 |
| Pakistan | 1.09 | 1.01 | 1.21 | 1.51 | 1.00 | 0.21 |
| Kuwait | 1.06 | 0.54 | 1.11 | 0.78 | 1.02 | 0.18 |
| Turkey | 1.05 | 0.59 | 1.09 | 0.82 | 1.02 | 0.29 |
| Netherlands | 1.04 | 0.31 | 1.07 | 0.42 | 1.01 | 0.18 |
| Bangladesh | 1.04 | 0.27 | 1.07 | 0.35 | 1.02 | 0.19 |
| Iraq | 1.04 | 0.36 | 1.07 | 0.51 | 1.01 | 0.16 |
| Russia | 1.04 | 0.49 | 1.08 | 0.74 | 1.00 | 0.05 |
| Indonesia | 1.04 | 0.35 | 1.07 | 0.49 | 1.01 | 0.16 |
| Iran | 1.03 | 0.21 | 1.06 | 0.30 | 1.01 | 0.09 |
| Belarus | 1.03 | 0.67 | 1.04 | 0.98 | 1.02 | 0.27 |
| Tajikistan | 0.96 | 0.28 | 0.96 | 0.28 | 0.95 | 0.26 |
| Sweden | 0.95 | 1.19 | 1.21 | 1.43 | 0.57 | 0.48 |
Table 4. Correlation of Benfordness, periodic growth ratios, and volatility.
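
| | d* | CHI | K-S | Growth | Phase1 | Phase2 | δ All | δ I | δ II |
|---|---|---|---|---|---|---|---|---|---|
| d-Factor | 100% | | | | | | | | |
| CHI | 49% | 100% | | | | | | | |
| K-S | 73% | 45% | 100% | | | | | | |
| Growth | 12% | 58% | 15% | 100% | | | | | |
| Phase1 | 26% | 75% | 28% | 78% | 100% | | | | |
| Phase2 | 7% | 46% | 10% | 98% | 64% | 100% | | | |
| δ All | 9% | 53% | 9% | 98% | 70% | 99% | 100% | | |
| δ I | 21% | 76% | 19% | 78% | 96% | 65% | 73% | 100% | |
| δ II | 8% | 50% | 8% | 98% | 67% | 99% | 100% | 69% | 100% |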
Table 5. Descriptive statistics of variables and indicators.

| Variable | Definition | N | Mean | Std. Deviation | Minimum | Maximum |
|---|---|---|---|---|---|---|
| d-Delta | d* improvement between Phase One and Phase Two: $\Delta d = \frac{\tau_B}{\tau_A}$, where $\tau_B$ is the d-factor for Period B (21 January 2020 to 6 June 2021) and $\tau_A$ is the d-factor for Period A (31 December 2019 to 24 September 2020) | 174 | 0.890 | 0.442 | 0.260 | 2.537 |
| | | 102 | 0.972 | 0.491047 | 0.293 | 2.538 |
| KS-Delta | K-S change between Phase One and Phase Two: $\Delta KS = \frac{\tau_B}{\tau_A}$, where $\tau_B$ is the K-S statistic for Period B (21 January 2020 to 6 June 2021) and $\tau_A$ is the K-S statistic for Period A (31 December 2019 to 24 September 2020) | 174 | 4.377 | 6.5230 | 0.248 | 36.530 |
| | | 102 | 24.990 | 14.625 | 6.410 | 71.117 |
| CHI-Delta | Chi-square change between Phase One and Phase Two: $\Delta CHI = \frac{\tau_B}{\tau_A}$, where $\tau_B$ is the chi-square for Period B (21 January 2020 to 6 June 2021) and $\tau_A$ is the chi-square for Period A (31 December 2019 to 24 September 2020) | 174 | 4.782 | 13.063 | 0.006 | 137.382 |
| | | 104 | 6.256 | 16.730 | 0.006 | 137.382 |
| AGRP1 | $\bar{g}_{dN_i} = \frac{1}{n} \sum_{i=1}^{n} \frac{dN_i}{dN_{i-1}}$, the average growth ratio for the period from 21 January 2020 to 24 September 2020 | 176 | 1.202 | 0.798 | 0.056 | 10.546 |
| | | 102 | 1.311 | 0.952 | 0.188 | 10.546 |
| AGRP2 | $\bar{g}_{dN_i} = \frac{1}{n} \sum_{i=1}^{n} \frac{dN_i}{dN_{i-1}}$, the average growth ratio for the period from 25 September 2020 to 6 June 2021 | 176 | 1.242 | 1.668 | 0.000 | 17.901 |
| | | 102 | 1.382 | 2.154 | 0.000 | 17.901 |
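
The average growth ratios AGRP1 and AGRP2 in Table 5 are read here as the mean of day-over-day ratios of newly reported cases, dN_i / dN_(i-1). The sketch below shows that calculation under this reading. The function name `average_growth_ratio` and the handling of days whose preceding count is zero (they are skipped because the ratio is undefined) are assumptions for illustration, not the authors' stated procedure.

```python
def average_growth_ratio(daily_new_cases):
    """Mean of day-over-day ratios dN_i / dN_(i-1) for a series of new cases.

    Assumption for this sketch: a ratio is skipped when the previous day's
    count is zero, since the ratio is undefined there.
    """
    ratios = [
        current / previous
        for previous, current in zip(daily_new_cases, daily_new_cases[1:])
        if previous > 0
    ]
    if not ratios:
        raise ValueError("no defined day-over-day ratios in the series")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical series: 10 -> 20 -> 30 gives ratios 2.0 and 1.5.
    print(average_growth_ratio([10, 20, 30]))
```

A value above 1 means that, on average, each day reported more new cases than the day before; a value below 1 means the daily counts were shrinking on average.
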
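Table 6. COVID-19 data assessment.

| Location | Sample Size | CHI-Delta | KS-Delta | d-Delta | AGRP1 | AGRP2 |
|---|---|---|---|---|---|---|
| Afghanistan | 814 | 1.01 | 2.51 | 0.40 | 1.48 | 1.29 |
| Albania | 1281 | 0.03 | 3.95 | 0.01 | 1.20 | 1.04 |
| Algeria | 897 | 0.41 | 2.28 | 0.18 | 1.11 | 1.01 |
| Andorra | 394 | 0.62 | 2.81 | 0.22 | 1.83 | 0.95 |
| Angola | 677 | 2.67 | 3.39 | 0.79 | 1.26 | 1.18 |
| Antigua and Barbuda | 167 | 5.60 | 5.76 | 0.97 | 0.12 | 0.59 |
| Argentina | 1369 | 0.67 | 2.36 | 0.28 | 1.09 | 1.04 |
| Armenia | 1063 | 11.01 | 2.93 | 3.76 | 1.21 | 1.13 |
| Australia | 878 | 0.05 | 1.77 | 0.03 | 1.30 | 1.22 |
| Austria | 1265 | 1.65 | 2.37 | 0.70 | 1.28 | 1.02 |
| Azerbaijan | 888 | 11.01 | 2.52 | 4.37 | 1.03 | 1.10 |
| Bahamas | 368 | 1.14 | 2.36 | 0.49 | 1.14 | 0.79 |
| Bahrain | 1115 | 1.00 | 2.49 | 0.40 | 1.44 | 1.02 |
| Bangladesh | 1332 | 4.11 | 2.37 | 1.73 | 1.07 | 1.02 |
| Barbados | 318 | 0.97 | 3.83 | 0.25 | 0.74 | 1.14 |
| Belarus | 916 | 7.03 | 2.22 | 3.17 | 1.04 | 1.02 |
| Belgium | 1341 | 3.57 | 2.21 | 1.62 | 1.11 | 1.09 |
| Belize | 395 | 0.56 | 4.11 | 0.14 | 0.79 | 1.00 |
| Benin | 168 | 2.16 | 1.66 | 1.30 | 0.37 | 0.06 |
| Bhutan | 636 | 1.21 | 8.05 | 0.15 | 0.70 | 1.25 |
| Bolivia | 1267 | 6.21 | 2.30 | 2.70 | 1.17 | 1.06 |
| Bosnia and Herzegovina | 1030 | 1.13 | 2.97 | 0.38 | 1.20 | 0.83 |
| Botswana | 180 | 0.60 | 3.46 | 0.17 | 0.08 | 0.05 |
| Brazil | 888 | 4.31 | 2.25 | 1.92 | 1.16 | 1.09 |
| Bulgaria | 1194 | 5.12 | 2.58 | 1.98 | 1.32 | 1.46 |
| Burkina Faso | 464 | 3.25 | 2.38 | 1.37 | 1.33 | 1.06 |
| Burundi | 255 | 3.72 | 4.40 | 0.85 | 0.48 | 1.45 |
| Cambodia | 300 | 1.15 | 5.36 | 0.21 | 0.92 | 0.96 |
| Cameroon | 247 | 0.08 | 1.43 | 0.06 | 1.06 | 0.20 |
| Canada | 1373 | 5.84 | 2.37 | 2.47 | 1.27 | 1.17 |
| Cape Verde | 883 | 0.20 | 4.31 | 0.05 | 1.67 | 1.23 |
| The Central African Republic | 211 | 1.63 | 1.56 | 1.04 | 1.07 | 0.45 |
| Chad | 468 | 3.70 | 2.94 | 1.26 | 1.19 | 1.35 |
| Chile | 1310 | 0.54 | 2.29 | 0.24 | 1.07 | 1.03 |
| China | 572 | 10.19 | 1.46 | 6.98 | 1.53 | 1.15 |
| Colombia | 1247 | 0.70 | 2.15 | 0.32 | 1.11 | 1.02 |
| Comoros | 243 | 0.13 | 4.76 | 0.03 | 0.49 | 2.51 |
| Congo | 1145 | 1.76 | 2.17 | 0.81 | 0.19 | 0.00 |
| Costa Rica | 1081 | 1.51 | 2.25 | 0.67 | 1.20 | 0.70 |
| Cote d’Ivoire | 985 | 9.45 | 2.40 | 3.94 | 1.20 | 1.78 |
| Croatia | 1216 | 1.54 | 2.44 | 0.63 | 1.13 | 1.31 |
| Cuba | 1117 | 0.59 | 2.54 | 0.23 | 1.23 | 1.07 |
| Cyprus | 1029 | 16.56 | 5.44 | 3.04 | 1.22 | 1.06 |
| Dem. Rep. of Congo | 992 | 4.15 | 2.40 | 1.73 | 1.34 | 1.26 |
| Denmark | 1256 | 1.11 | 2.42 | 0.46 | 1.76 | 17.90 |
| Djibouti | 432 | 3.32 | 2.53 | 1.32 | 1.33 | 1.66 |
| Dominican Republic | 1205 | 11.41 | 2.21 | 5.17 | 1.10 | 1.14 |
| Ecuador | 1284 | 1.08 | 2.31 | 0.47 | 1.52 | 1.70 |
| Egypt | 901 | 0.66 | 2.35 | 0.28 | 1.28 | 1.01 |
| El Salvador | 1155 | 4.22 | 2.54 | 1.66 | 1.06 | 0.88 |
| Equatorial Guinea | 155 | 0.72 | 2.77 | 0.26 | 0.20 | 0.02 |
| Eritrea | 180 | 0.28 | 4.00 | 0.07 | 0.15 | 0.81 |
| Estonia | 1145 | 0.91 | 2.68 | 0.34 | 1.39 | 1.11 |
| Ethiopia | 1222 | 2.54 | 2.63 | 0.97 | 1.20 | 1.05 |
| Finland | 1121 | 5.57 | 2.32 | 2.40 | 1.33 | 1.05 |
| France | 1281 | 137.38 | 2.47 | 55.66 | 10.55 | 14.79 |
| Gabon | 293 | 1.56 | 2.06 | 0.76 | 0.59 | 0.04 |
| The Gambia | 614 | 2.09 | 5.20 | 0.40 | 0.68 | 0.94 |
| Georgia | 740 | 3.40 | 3.56 | 0.96 | 1.32 | 1.22 |
| Germany | 911 | 7.44 | 2.23 | 3.33 | 1.34 | 1.40 |
| Ghana | 793 | 8.42 | 2.29 | 3.67 | 0.91 | 0.73 |
| Greece | 1233 | 2.02 | 2.54 | 0.79 | 1.51 | 1.07 |
| Guatemala | 1295 | 11.21 | 3.82 | 2.93 | 1.33 | 1.49 |
| Guinea | 1044 | 3.85 | 2.53 | 1.52 | 0.97 | 1.10 |
| Guinea-Bissau | 209 | 2.89 | 2.40 | 1.20 | 0.75 | 0.93 |
| Guyana | 577 | 1.16 | 3.14 | 0.37 | 1.20 | 1.49 |
| Haiti | 440 | 1.38 | 1.81 | 0.76 | 1.40 | 0.84 |
| Honduras | 803 | 10.53 | 2.27 | 4.63 | 1.33 | 1.12 |
| Hong Kong | 574 | 2.51 | 2.23 | 1.12 | 1.19 | 1.24 |
| Hungary | 1299 | 7.19 | 2.47 | 2.91 | 1.21 | 1.05 |
| Iceland | 797 | 5.85 | 2.18 | 2.69 | 1.11 | 0.91 |
| India | 1320 | 1.20 | 2.30 | 0.52 | 1.21 | 1.00 |
| Indonesia | 1242 | 1.05 | 2.30 | 0.46 | 1.07 | 1.01 |
| Iran | 1127 | 0.12 | 2.06 | 0.06 | 1.05 | 1.01 |
| Iraq | 1231 | 16.20 | 2.40 | 6.75 | 1.07 | 1.01 |
| Ireland | 1208 | 5.31 | 2.35 | 2.26 | 1.35 | 1.04 |
| Israel | 1335 | 0.21 | 2.24 | 0.10 | 1.23 | 1.26 |
| Italy | 1401 | 1.13 | 2.39 | 0.47 | 1.06 | 1.02 |
| Jamaica | 911 | 9.20 | 4.38 | 2.10 | 1.14 | 1.15 |
| Japan | 1403 | 3.99 | 3.32 | 1.20 | 1.09 | 1.05 |
| Jordan | 904 | 1.35 | 4.28 | 0.32 | 1.23 | 1.01 |
| Kazakhstan | 1160 | 0.51 | 2.53 | 0.20 | 1.54 | 2.65 |
| Kenya | 999 | 0.50 | 2.13 | 0.23 | 1.13 | 1.21 |
| Kosovo | 613 | 0.71 | 2.05 | 0.35 | 1.53 | 0.63 |
| Kuwait | 1236 | 1.49 | 2.47 | 0.60 | 1.11 | 1.02 |
| Kyrgyzstan | 677 | 15.50 | 2.57 | 6.02 | 1.45 | 0.99 |
| Latvia | 1154 | 0.66 | 2.77 | 0.24 | 1.66 | 1.19 |
| Lebanon | 807 | 6.62 | 2.78 | 2.38 | 1.36 | 1.04 |
| Lesotho | 241 | 1.98 | 3.95 | 0.50 | 0.29 | 1.82 |
| Liberia | 281 | 1.12 | 1.46 | 0.77 | 1.29 | 0.53 |
| Libya | 1032 | 1.71 | 2.61 | 0.66 | 1.15 | 0.88 |
| Liechtenstein | 686 | 1.50 | 13.72 | 0.11 | 0.47 | 1.62 |
| Lithuania | 1190 | 0.38 | 3.25 | 0.12 | 1.30 | 1.08 |
| Luxembourg | 1027 | 6.30 | 2.31 | 2.72 | 1.39 | 0.84 |
| Macedonia | 1273 | - | - | - | 1.39 | 0.84 |
| Madagascar | 625 | 2.13 | 2.47 | 0.86 | 1.32 | 0.93 |
| Malawi | 739 | 10.13 | 2.14 | 4.73 | 1.70 | 1.28 |
| Malaysia | 1195 | 0.32 | 2.61 | 0.12 | 1.40 | 1.05 |
| Maldives | 936 | 0.92 | 2.57 | 0.36 | 1.26 | 1.12 |
| Mali | 641 | 0.98 | 2.68 | 0.37 | 1.60 | 1.54 |
| Malta | 1081 | 1.86 | 2.77 | 0.67 | 1.35 | 1.11 |
| Mauritania | 621 | 3.14 | 4.78 | 0.66 | 1.33 | 1.21 |
| Mauritius | 188 | 0.92 | 3.62 | 0.25 | 1.08 | 0.98 |
| Mexico | 1412 | 0.37 | 2.18 | 0.17 | 1.08 | 1.16 |
| Moldova | 884 | 14.04 | 2.37 | 5.92 | 1.12 | 1.10 |
| Monaco | 335 | 1.38 | 4.04 | 0.34 | 0.85 | 1.31 |
| Mongolia | 528 | 1.56 | 8.38 | 0.19 | 1.05 | 1.16 |
| Montenegro | 707 | 4.33 | 3.29 | 1.32 | 1.23 | 1.03 |
| Morocco | 1308 | 0.20 | 2.40 | 0.08 | 1.21 | 1.20 |
| Mozambique | 1046 | 5.67 | 2.83 | 2.01 | 1.46 | 1.11 |
| Myanmar | 897 | 7.56 | 5.86 | 1.29 | 1.38 | 1.10 |
| Namibia | 951 | 1.91 | 3.37 | 0.56 | 1.43 | 1.21 |
| Nepal | 1170 | 1.40 | 2.61 | 0.54 | 1.03 | 1.04 |
| Netherlands | 889 | 6.36 | 2.38 | 2.67 | 1.07 | 1.01 |
| New Zealand | 711 | 5.32 | 2.01 | 2.65 | 0.95 | 1.38 |
| Nicaragua | 132 | 1.43 | 2.13 | 0.67 | 0.06 | 0.00 |
| Niger | 423 | 1.63 | 2.60 | 0.63 | 0.94 | 1.26 |
| Nigeria | 922 | 0.89 | 2.02 | 0.44 | 1.16 | 1.17 |
| Norway | 1066 | 0.65 | 2.37 | 0.27 | 1.27 | 1.07 |
| Oman | 695 | 3.12 | 1.93 | 1.61 | 1.01 | 0.63 |
| Pakistan | 1286 | 10.48 | 2.29 | 4.58 | 1.21 | 1.00 |
| Palestine | 1025 | 1.77 | 3.96 | 0.45 | 1.25 | 0.99 |
| Panama | 1327 | 6.91 | 2.30 | 3.00 | 1.83 | 1.04 |
| Papua New Guinea | 212 | 0.55 | 3.59 | 0.15 | 1.20 | 0.61 |
| Paraguay | 1228 | 0.19 | 2.61 | 0.07 | 1.49 | 1.04 |
| Peru | 1216 | 53.13 | 2.16 | 24.56 | 1.15 | 0.71 |
| Philippines | 1329 | 3.80 | 2.38 | 1.60 | 1.28 | 1.04 |
| Poland | 1296 | 1.42 | 2.41 | 0.59 | 1.08 | 1.05 |
| Portugal | 1343 | 0.53 | 2.26 | 0.23 | 1.07 | 1.07 |
| Qatar | 1114 | 1.51 | 2.26 | 0.67 | 1.23 | 1.30 |
| Romania | 1251 | 0.21 | 2.20 | 0.10 | 1.11 | 1.04 |
| Russia | 1297 | 30.11 | 2.28 | 13.23 | 1.08 | 1.00 |
| Rwanda | 917 | 1.95 | 2.45 | 0.80 | 1.43 | 1.25 |
| Saint Lucia | 236 | 1.08 | 14.75 | 0.07 | 0.27 | 0.91 |
| St Vincent and Grenadines | 162 | 1.04 | 5.59 | 0.19 | 0.27 | 0.91 |
| San Marino | 290 | 1.10 | 2.59 | 0.43 | 1.53 | 0.93 |
| Sao Tome and Principe | 279 | 1.72 | 2.76 | 0.62 | 3.06 | 1.59 |
| Saudi Arabia | 1345 | 1.63 | 2.31 | 0.71 | 1.13 | 1.01 |
| Senegal | 1219 | 2.05 | 2.34 | 0.88 | 1.17 | 1.15 |
| Serbia | 1318 | 0.46 | 3.60 | 0.13 | 1.08 | 1.01 |
| Seychelles | 173 | 2.44 | 7.52 | 0.32 | 0.41 | 1.25 |
| Sierra Leone | 399 | 0.75 | 1.95 | 0.39 | 1.48 | 1.15 |
| Singapore | 516 | 1.94 | 1.99 | 0.97 | 1.06 | 1.22 |
| Slovakia | 1150 | 0.25 | 2.99 | 0.08 | 1.91 | 1.31 |
| Slovenia | 1231 | 0.22 | 2.78 | 0.08 | 1.22 | 1.26 |
| Somalia | 331 | 1.74 | 2.14 | 0.82 | 0.85 | 1.12 |
| South Africa | 1307 | 2.41 | 2.32 | 1.04 | 1.00 | 1.03 |
| South Korea | 1299 | 0.10 | 2.25 | 0.04 | 1.00 | 1.03 |
| South Sudan | 384 | 3.06 | 3.00 | 1.02 | 1.00 | 1.03 |
| Spain | 690 | 0.07 | 1.95 | 0.04 | 1.02 | 0.64 |
| Sri Lanka | 1105 | 0.57 | 2.70 | 0.21 | 1.68 | 1.17 |
| Sudan | 847 | 2.00 | 2.49 | 0.80 | 0.85 | 0.99 |
| Suriname | 521 | - | - | - | 1.22 | 1.43 |
| Sweden | 919 | 2.08 | 1.94 | 1.07 | 1.21 | 0.57 |
| Switzerland | 1132 | 27.69 | 1.99 | 13.94 | 1.14 | 0.70 |
| Syria | 704 | 10.73 | 3.76 | 2.85 | 1.10 | 1.01 |
| Taiwan | 853 | 0.33 | 2.34 | 0.14 | 0.75 | 1.37 |
| Tajikistan | 301 | 3.01 | 1.66 | 1.81 | 0.96 | 0.95 |
| Thailand | 1028 | 0.74 | 2.68 | 0.28 | 1.34 | 1.29 |
| Timor | 139 | 2.11 | 11.58 | 0.18 | 0.79 | 1.00 |
| Togo | 951 | 81.66 | 2.37 | 34.43 | 1.52 | 1.18 |
| Trinidad and Tobago | 853 | 3.82 | 6.37 | 0.60 | 1.33 | 1.56 |
| Tunisia | 741 | 0.47 | 1.79 | 0.27 | 1.22 | 0.98 |
| Turkey | 1325 | 2.71 | 2.45 | 1.11 | 1.09 | 1.02 |
| Uganda | 819 | 4.60 | 3.16 | 1.45 | 1.08 | 1.12 |
| Ukraine | 1178 | 0.68 | 2.34 | 0.29 | 1.12 | 1.03 |
| UAE | 1313 | 0.98 | 2.34 | 0.42 | 1.04 | 1.03 |
| UK | 1358 | 0.01 | 2.29 | 0.00 | 1.04 | 1.03 |
| USA | 1389 | 6.56 | 1.87 | 3.51 | 1.04 | 1.03 |
| Uruguay | 1013 | 0.37 | 2.80 | 0.13 | 1.43 | 1.12 |
| Uzbekistan | 655 | 1.23 | 2.22 | 0.56 | 1.20 | 1.06 |
| Venezuela | 793 | 1.39 | 2.88 | 0.48 | 1.18 | 1.00 |
| Vietnam | 452 | 1.07 | 2.69 | 0.40 | 1.18 | 1.89 |
| Yemen | 548 | 0.70 | 2.08 | 0.34 | 1.40 | 0.93 |
| Zambia | 1010 | 0.01 | 2.90 | 0.00 | 1.76 | 1.22 |
| Zimbabwe | 1004 | 3.71 | 2.99 | 1.24 | 1.44 | 1.39 |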
Table 7. Outer loadings.

| Sample | Chi-Square | K-S | d-Factor |
|---|---|---|---|
| 176 countries, full sample | 0.920 (p-value: 0.000) | 0.760 (p-value: 0.000) | 0.767 (p-value: 0.000) |
| 102 countries with significant data | 1.000 (p-value: 0.000) | 1.000 (p-value: 0.000) | 1.000 (p-value: 0.000) |
Table 8. Assessment of the measurement model, Change in Benfordness.

| Sample | Cronbach’s Alpha | Composite Reliability | AVE |
|---|---|---|---|
| 176 countries, total sample | 0.808 | 1.213 | 0.858 |
| 102 countries with a large sample size | 1.000 | 1.000 | 1.000 |
Table 9. Discriminant validity by way of the Heterotrait–Monotrait Ratio (HTMT).

| Construct | Size | Change in Benfordness | Growth Ratio Phase One | Growth Ratio Phase Two |
|---|---|---|---|---|
| Change in Benfordness | 176 | 0 | 0 | 0 |
| | 102 | 0 | 0 | 0 |
| Growth Ratio Phase One | 176 | 0.710 | 0 | 0 |
| | 102 | 0.768 | 0 | 0 |
| Growth Ratio Phase Two | 176 | 0.461 | 0.623 | 0 |
| | 102 | 0.464 | 0.661 | 0 |
Table 10. Collinearity statistics VIF.

| Items | 176 Countries (Full Sample) | 102 Countries with a Large Sample Size |
|---|---|---|
| CHIDelta | 1.359 | 1.000 |
| KSDelta | 2.653 | - |
| Phase1 | 1.000 | 1.000 |
| Phase2 | 1.000 | 1.000 |
| dDelta | 2.724 | - |
Table 11. Coefficient of determination, R².

| Sample | Construct | R² |
|---|---|---|
| 176 countries (total sample) | Change in Benfordness | 0.370 |
| | Growth Ratio Phase Two | 0.388 |
| 102 countries with a large sample size | Change in Benfordness | 0.590 |
| | Growth Ratio Phase Two | 0.437 |
Table 12. Path coefficient analysis.

| Path | N | Original Sample (O) | Sample Mean (M) | Standard Deviation (STDEV) | T Statistics (absolute value of O/STDEV) | p Values |
|---|---|---|---|---|---|---|
| Growth Ratio Phase One -> Change in Benfordness | 176 | 0.609 | 0.503 | 0.236 | 2.582 | 0.01 |
| | 102 | 0.623 | 0.57 | 0.266 | 2.344 | 0.019 |
| Growth Ratio Phase One -> Growth Ratio Phase Two | 176 | 0.768 | 0.55 | 0.387 | 1.986 | 0.048 |
| | 102 | 0.661 | 0.643 | 0.246 | 2.687 | 0.007 |
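
The T statistics in Table 12 are defined as |O/STDEV|, so they can be checked directly from the reported coefficients. The sketch below recomputes them and derives an approximate two-sided p-value. Using the standard normal distribution as the reference is an assumption made here for illustration, because this section does not state which distribution the authors' software applied. Small differences from the table are expected, since the inputs are rounded to three decimals.

```python
import math


def t_statistic(original_sample, stdev):
    """T statistic as defined in the table header: |O / STDEV|."""
    return abs(original_sample / stdev)


def two_sided_p_normal(t_value):
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return math.erfc(abs(t_value) / math.sqrt(2))


if __name__ == "__main__":
    # (O, STDEV) pairs copied from Table 12, in row order.
    rows = [(0.609, 0.236), (0.623, 0.266), (0.768, 0.387), (0.661, 0.246)]
    for original, stdev in rows:
        t = t_statistic(original, stdev)
        print(round(t, 3), round(two_sided_p_normal(t), 3))
```
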
Table 13. Out-of-sample prediction power assessment.

| Items | RMSE (LM) | RMSE (PLS) |
|---|---|---|
| KSDelta | 13.662 | 13.343 |
| dDelta | 0.437 | 0.443 |
| CHIDelta | 12.458 | 11.681 |
| Phase2 | 1.479 | 1.479 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Farhadi, N.; Lahooti, H. Pandemic Growth and Benfordness: Empirical Evidence from 176 Countries Worldwide. COVID 2021, 1, 366-383. https://doi.org/10.3390/covid1010031
