Introduction

During December 2019, several cases of pneumonia of unknown aetiology were reported in Wuhan, a city in the Hubei province of China (Gao et al. 2020). Within a week, investigators found that the initial cases were all associated with a seafood market where live poultry and wild animals were being sold (Whitworth 2020). Since then, the disease has been registered, and become known, as coronavirus disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In the early stages of infection, symptoms of severe acute respiratory infection can occur. These may include a cough, fever, and shortness of breath (Munthali and Xuelian 2020). Some patients may then develop acute respiratory distress syndrome (ARDS) and other serious complications, which may lead to multiple organ failure (Gao et al. 2020). Since mid-December 2019, COVID-19 has spread to all seven continents and was declared a pandemic by the World Health Organization (WHO) on the 11th of March 2020 (Ghebreyesus 2020). This rapid spread has been fuelled by the fact that the majority of infected people do not experience severe symptoms, which makes them more likely to remain mobile and hence infect others (Qiu 2020). Transmission occurs primarily through person-to-person contact, coughing or sneezing, and touching contaminated surfaces (Guarner 2020).

On the 14th of February 2020, the first case of COVID-19 in Africa was reported in the city of Cairo by the Egyptian Ministry of Health and Population. The first South African case was confirmed by the National Institute for Communicable Diseases (NICD) on the morning of the 5th of March 2020. Since then, the number of infections and deaths has risen drastically. President Cyril Ramaphosa was praised by the director-general of the WHO, Dr Tedros Adhanom Ghebreyesus, for his leadership and approach to protecting South Africans during these trying times. The British Broadcasting Corporation (BBC) also commended President Cyril Ramaphosa for his leadership and for South Africa’s “ruthlessly efficient” response to the coronavirus (Pitjeng 2020). On the 15th of March, President Cyril Ramaphosa declared a national state of disaster in terms of the Disaster Management Act, which enables the focus to be put on preventing and reducing the risk of the virus spreading (SANews 2020). Only a few days later, on the 23rd of March, the President declared a national lockdown commencing at midnight on the 26th of March.

In South Africa, these extreme measures are absolutely necessary, as the country contains a high-risk population combined with low-income country characteristics (Schlüter et al. 2021). The main concerns, which are thought to escalate the spread of the coronavirus, are the large and densely populated areas and townships, together with a high level of poverty and movement within these areas. Combined with existing epidemics such as the human immunodeficiency virus (HIV), tuberculosis (TB), and malaria, this might lead to an increase in morbidity and mortality (Chen et al. 2020; Mousquer et al. 2020). Non-communicable diseases such as chronic obstructive pulmonary disease (COPD), heart disease, hypertension, and diabetes are widespread in Africa and are known risk factors for severe cases of COVID-19, so they may also increase the death rate in these lower-income countries (Thienemann et al. 2020). As winter is approaching, overcrowded houses and the large immunocompromised population will contribute to the increase in the number of COVID-19 cases (World Health Organization 2020).

Olivier et al. (2020) showed what the optimal lockdown level in South Africa should be given the policy that is in place, as determined by a closed-loop feedback controller. In this study, we make use of the global Moran index to spatially identify the spread of COVID-19 infections in South Africa with respect to the provinces. Moreover, the index reveals whether there is a significant spatial association between the provinces or whether the COVID-19 cases are scattered uniformly across South Africa. With this information, South Africa will be better able to predict local outbreaks and develop public health policies to better manage and update the medical procedures currently in place. Furthermore, a generalized logistic model of the growth trend will be employed to show the difference between the hotspot areas and the areas outside them. With the continuing growth and development of COVID-19 in South Africa, this analysis might help political leaders and health authorities to manage the allocation of resources and prepare for future virus control.

The effect of COVID-19 is still in its early stages in South Africa, but different tendencies have already been observed when compared to the USA and European countries (World Health Organization 2020). Understanding these tendencies will be very important in guiding the fight against COVID-19 in South Africa as well as in the rest of Africa. This is an initial study from which many other interesting studies will follow, and it will be very important to continue with analyses as more cases are reported and more data become available. South Africa, with the most confirmed COVID-19 cases, will need to be the leader in guiding the fight against COVID-19 in Africa.

Study area and materials

South Africa, formally known as the Republic of South Africa, is situated at the southernmost tip of Africa and covers a surface area of 1,219,602 km². Its coastline stretches more than 3000 km, from the desert border with Namibia on the Atlantic Ocean, around the tip of Africa, to the northern border with Mozambique on the Indian Ocean side. South Africa shares common boundaries with Namibia, Botswana, Zimbabwe, and Swaziland, with the Mountain Kingdom of Lesotho landlocked by South Africa. The Prince Edward and Marion islands lie some 1,920 km south-east of Cape Town (Tibane 2019).

With a population of more than 59 million, South Africa is the world’s 25th most populated nation and consists of nine provinces. South Africa has three designated cities with enhanced risk of COVID-19, which are also its three capitals: executive Pretoria, judicial Bloemfontein, and legislative Cape Town. The largest city and main economic hub is Johannesburg, which is also the main entry point for visitors from other countries via OR Tambo International Airport (Tibane 2019).

The timeline of the major interventions and the statistics of the COVID-19 outbreak in South Africa are shown in Figs. 1 and 2, respectively.

Fig. 1 Timeline of major interventions

Fig. 2 COVID-19 epidemic in South Africa

The Data Science for Social Impact (DSFSI) research group at the University of Pretoria captures the COVID-19 data (number of cases) at national and provincial level. Missing values and anomalies in the provincial data are adjusted or imputed using data from the University of Cape Town COVID-19 dashboard.

As stated in Ribeiro et al. (2020) and Souch and Cossman (2020), results for the USA indicate that more rural states have lower testing rates and detect disproportionately fewer cases of COVID-19. Thus, distances between the provinces of South Africa are determined using the cities with enhanced risk of COVID-19 in each province. These cities are listed in Table 1.

Table 1 The cities with enhanced risk of COVID-19 in each province of South Africa

Methods

For investigating the spatial correlation between the 9 provinces of South Africa (Northern Cape, Eastern Cape, Free State, Western Cape, Limpopo, North West, KwaZulu-Natal, Mpumalanga, and Gauteng), we use Moran’s autocorrelation coefficient, also known as the Moran index (denoted by \(\mathcal{I}\)) in geographic health science. Furthermore, we make use of the generalized logistic function (GLF) for identifying an appropriate growth curve of COVID-19. Hence, this section is devoted to the definition of the Moran index (Moran’s \(\mathcal{I}\)) as well as the GLF.

Spatial correlation coefficient

The Moran index, originally defined by Moran (2018), is a measure of spatial association or spatial autocorrelation which can be used to find spatial hotspots or clusters and is available in many software applications. This index has been described as the measure of choice for scientists, specifically in environmental sciences, ecology, and public health (Bivand and Wong 2018). Some other indices include Getis’ G index, Geary’s C, local Ii and Gi, spatial scan statistics, and Tango’s C index (Zhang et al. 2008; Bivand and Wong 2018). The Moran index has both a local and a global representation. The global Moran’s \(\mathcal{I}\) is a global measure of spatial autocorrelation, while the local Moran’s \(\mathcal{I}\) index examines the individual locations, enabling hotspots to be identified based on comparisons to the neighbouring locations (Zhang et al. 2008). The local Moran’s \(\mathcal{I}\) has been successfully applied to hotspot identification for infection clusters such as those investigated by Milne et al. (2019), who studied bovine tuberculosis (bTB) breakdowns in Northern Irish cattle herds in order to assess the spatial association in the number and prevalence of chronic bTB across Northern Ireland. Other areas where this index has been successfully applied and is commonly used are diseases, mortality rates, environmental planning, and environmental sciences. It is important to note that the result can be affected by the definition of the weight function, data transformation, and the existence of outliers (Zhang et al. 2008).

Until now, few COVID-19-related studies have made use of the Moran index, and no research was found for South African cases at provincial level. Some studies that include the use of this index are (1) Kang et al. (2020), who explored the spatial epidemic dynamics of COVID-19 in mainland China in order to determine whether a spatial association of the COVID-19 infection existed; (2) Huang et al. (2020), who applied the Moran index to a spatial panel which showed that COVID-19 infection is spatially dependent and mainly spread from Hubei Province in Central China to neighbouring areas; (3) Chiyomaru and Takemoto (2020), who used a global dataset of COVID-19 cases as well as a global climate database and investigated how climate parameters could contribute to the growth rate of COVID-19 cases while simultaneously controlling for potential confounding effects using spatial analysis; (4) Dahlberg et al. (2020), who used data on all mobile phone users to examine the impact of the coronavirus outbreak under the Swedish regime of mild recommendations and restrictions on individual mobility, and whether the changes in geographical mobility vary over different socioeconomic strata; and (5) Adegboye et al. (2020), who investigated the influence of spatial proximities and travel patterns from Italy on the further spread of SARS-CoV-2 around the globe.

This index is an extension of the Pearson’s product-moment correlation coefficient for spatial pattern recognition. Observations in close proximity are more likely to be similar than those far apart (Cliff and Ord 1981; Paradis 2019). In order to formulate the Moran index for our purpose, assume we have \(d\) provinces and the pair \(({x}_{i},{x}_{j})\) is for the attribute (variable) \(x\) in provinces \(i,j=1,\cdots ,d\), respectively. Then, the spatial weight \({w}_{ij}\) quantifies the level of closeness between \({x}_{i}\) and \({x}_{j}\) and the Moran index is defined by:

$$\mathcal{I}=\frac{d}{{S}_{o}}\times \frac{\sum_{i=1}^{d}{\sum }_{j=1}^{d}{w}_{ij}\left({x}_{i}-\overline{x }\right)\left({x}_{j}-\overline{x }\right)}{{\sum }_{i=1}^{d}{\left({x}_{i}-\overline{x }\right)}^{2}}$$
(1)

where \(\overline{x }={d}^{-1}{\sum }_{i=1}^{d}{x}_{i}\) and \({S}_{o}={\sum }_{i=1}^{d}{\sum }_{j=1}^{d}{w}_{ij}\); \(i\ne j\).

Moran’s \(\mathcal{I}\) typically takes values in \(\left[-1,1\right]\), and its mathematical expectation under the null hypothesis of no spatial autocorrelation is \(-\frac{1}{d-1}\). Values which are significantly less than \(-\frac{1}{d-1}\) indicate negative spatial autocorrelation, while values which are significantly greater than \(-\frac{1}{d-1}\) indicate positive spatial autocorrelation.
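As a minimal illustration of Eq. (1), the following Python sketch computes the global Moran index for a small hypothetical example. The function names, the toy values, and the chain-shaped adjacency matrix are assumptions made purely for illustration; they are not the study data or the software used in this work.

```python
def morans_i(x, w):
    """Global Moran's I for values x (length d) and a d x d weight matrix w."""
    d = len(x)
    mean = sum(x) / d
    dev = [xi - mean for xi in x]
    # Only off-diagonal pairs (i != j) enter S_o and the cross-product term.
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    s0 = sum(w[i][j] for i, j in pairs)
    cross = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    total = sum(v * v for v in dev)
    return (d / s0) * (cross / total)


def expected_morans_i(d):
    """Expectation of Moran's I under no spatial autocorrelation."""
    return -1.0 / (d - 1)


# Hypothetical example: four regions in a row, each touching its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))    # smooth gradient: about 0.333
print(morans_i([1, -1, 1, -1], chain))  # alternating values: -1.0
print(expected_morans_i(9))             # nine provinces: -0.125
```

In this toy example, neighbouring regions with similar values give an index above the expectation, while alternating values give an index well below it.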

According to Gittleman and Kot (1990), there are two ways to identify the weights. In our context, we identify the \((i,j)\)-th element of the weight matrix \(W\), from taxonomic level classification viewpoint, as:

$${w}_{ij}=\begin{cases}1 & \text{if provinces } i \text{ and } j \text{ are connected}\\ 0 & \text{otherwise}\end{cases}$$

Using the phylogenetic tree classification (geographical distance), we assign the weights following:

$${w}_{ij}=\begin{cases}\dfrac{1}{{D}_{ij}^{\alpha }} & \text{if } {D}_{ij}\le \lambda \\ 0 & \text{if } {D}_{ij}>\lambda \end{cases}$$

where \({D}_{ij}\) is the distance between the province centre \(i\) and province centre \(j\), \(\lambda\) is a distance threshold, and \(\alpha\) is a power level parameter. See Chen (2012) for more detail and comparison between different weights.
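The two weighting schemes can be sketched in Python as follows. This is only an illustrative sketch: the helper names, the three-region example, and the distances (in arbitrary units) are hypothetical and are not taken from Table 1 or from the analysis in this study.

```python
def adjacency_weights(d, edges):
    """Binary 0-1 weights: w_ij = 1 if regions i and j are connected."""
    w = [[0.0] * d for _ in range(d)]
    for i, j in edges:
        w[i][j] = 1.0
        w[j][i] = 1.0  # adjacency is symmetric
    return w


def distance_weights(dist, alpha=1.0, lam=float("inf")):
    """Inverse-distance weights: w_ij = 1 / D_ij**alpha if D_ij <= lam, else 0."""
    d = len(dist)
    w = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # The diagonal stays at zero because only pairs i != j are used.
            if i != j and dist[i][j] <= lam:
                w[i][j] = 1.0 / dist[i][j] ** alpha
    return w


# Hypothetical three-region example.
w_adj = adjacency_weights(3, [(0, 1), (1, 2)])
dist = [
    [0, 100, 400],
    [100, 0, 300],
    [400, 300, 0],
]
w_dist = distance_weights(dist, alpha=1.0, lam=350)
print(w_adj)
print(w_dist)  # regions 0 and 2 are 400 apart, beyond lam, so their weight is 0
```

A larger \(\alpha\) makes the weights decay faster with distance, while a smaller \(\lambda\) restricts the comparison to nearby centres.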

Modeling population growth

In this section, we predict the number of infections via the logistic and Gompertz growth curve modeling. The logistic growth model (LGM) is commonly used for dynamic modeling in many branches of science including chemistry, physics, material science, forestry, disease progression, and sociology. For our purpose and generality, we follow the Richards’ differential equation (RDE) due to Richards (1959) given by:

$$\frac{dP(t)}{dt}=\frac{1}{\alpha }\left(1-{\left(\frac{P\left(t\right)}{K}\right)}^{\nu }\right)P(t)$$

with initial condition \(P\left({t}_{o}\right)={P}_{o}\), where \(K\) is the carrying capacity (the maximum capacity, or total population here) and \(\alpha ,\nu >0\). Solving the RDE gives the generalized logistic curve (GLC):

$$P\left(t\right)=\frac{K}{{\left(1+Q{e}^{-\frac{\nu \left(t-{t}_{o}\right)}{\alpha }}\right)}^{\frac{1}{\nu }}}$$
(2)

with \(Q={\left(\frac{K}{{P}_{o}}\right)}^{\nu }-1\).
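A short numerical check can make the link between the RDE and Eq. (2) concrete. The Python sketch below evaluates the GLC and compares a finite-difference derivative with the right-hand side of the RDE. The function names and all parameter values are hypothetical choices for illustration, not estimates from the provincial data.

```python
import math


def glc(t, K, P0, alpha, nu, t0=0.0):
    """Generalized logistic curve, Eq. (2)."""
    Q = (K / P0) ** nu - 1.0
    return K / (1.0 + Q * math.exp(-nu * (t - t0) / alpha)) ** (1.0 / nu)


def rde_rhs(P, K, alpha, nu):
    """Right-hand side of the Richards differential equation."""
    return (1.0 / alpha) * (1.0 - (P / K) ** nu) * P


# Hypothetical parameters (not fitted values).
K, P0, alpha, nu = 1000.0, 10.0, 5.0, 0.5

# The curve starts at P0 and approaches the carrying capacity K.
print(glc(0.0, K, P0, alpha, nu))
print(glc(1000.0, K, P0, alpha, nu))

# A central finite difference of P(t) should match the RDE right-hand side.
t, h = 20.0, 1e-5
slope = (glc(t + h, K, P0, alpha, nu) - glc(t - h, K, P0, alpha, nu)) / (2 * h)
print(slope, rde_rhs(glc(t, K, P0, alpha, nu), K, alpha, nu))
```

Setting `nu=1` in the same function gives the ordinary logistic curve.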

The typical logistic curve, which is widely used in modeling, is the special case of the GLC for \(\nu =1\). Furthermore, the Gompertz growth model (GGM) can be obtained for the limiting case \(\upnu \to {0}^{+}\). See Lei and Zhang (2004) for more details and applications of the GLC. Both models are sigmoid-shaped and bounded between two asymptotes, but they also have some different features. For instance, the inflection point in the LGM is located exactly midway between the asymptotes, and there is a radial symmetry with respect to this point, while these properties do not hold for the GGM. Thus, it is recommended that both of them be fitted to the actual data (Salehi et al. 2020).

While only a few studies have applied logistic growth models to COVID-19-specific research questions, only one combined the model with the Moran index to show that the infection is spatially dependent (Huang et al. 2020), and no studies were found for South African data at the provincial level. Some of the studies which applied only the logistic growth model include (1) Vattay (2020), who uses the logistic growth equation to describe the process on a macroscopic level, and (2) Zou et al. (2020), who review the epidemic virus growth and decline curves in China using the phenomenological logistic growth model.

Mathematical description of biological growth (i.e. population models) is very important in many research disciplines. Among these population models, especially noteworthy are the clear analytical solutions of the generalized logistic functions (Szparaga and Kocira 2018).

It is, however, worthwhile to point out that the Gumbel growth model is a useful model for extreme analysis and, according to Yu et al. (2020), is very competitive with the LGM.

Results and discussions

In this section, we start with a general inspection of the provincial distribution of the total confirmed cases and deaths given in Fig. 3. This figure also shows the total number of recoveries and the recovery ratio (the total number of recoveries divided by the total number of confirmed cases). From Fig. 3, it is observed that the heatmaps of confirmed cases and deaths agree, and therefore, more confirmed cases are followed by more deaths. Furthermore, it is observed that the hotspots are Eastern Cape, Western Cape, and Gauteng, with the former having the highest risk of infection.

Fig. 3 The heatmap of the provincial distribution of the total confirmed cases, deaths, and recoveries of COVID-19 up to December 29 in South Africa

In order to test the spatial autocorrelation of COVID-19 in South Africa, the interaction between provinces is estimated using Moran’s \(\mathcal{I}\) given by Eq. (1) from March 8, 2020, to December 29, 2020, based on both absolute and relative counts. The results are shown in Figs. 4, 5, 6, and 7. As observed from these figures, the Moran coefficients are mostly significant at the 5% significance level based on the adjacency 0–1 weight matrix (excluding some time lags), and their values lie roughly in the interval 0–1. Hence, there is a positive correlation among the confirmed cases according to the geographical structure. In other words, provinces with touching borders have positively affected each other. By comparison, no significant spatial correlation is detected based on geographic distance, which indicates that the spread in South Africa mainly proceeds between adjacent provinces and does not depend on the distance to the infection centre. So, adjacent provinces are at higher risk.

Fig. 4 The corresponding p-values of Moran’s test over time, on the basis of the absolute counts

Fig. 5 The difference between Moran’s observed and expected values over time, on the basis of the absolute counts

Fig. 6 The corresponding p-values of Moran’s test over time, according to the relative counts (absolute counts divided by 1 M residents)

Fig. 7 The difference between Moran’s observed and expected values over time, according to the relative counts

Additional analysis was carried out to validate the results based on the characteristics of the Moran index. Note that the expected value of the Moran index is \(-\frac{1}{d-1}=-\frac{1}{8}=-0.125\) in our case, since there are \(d=9\) provinces. Since the null hypothesis (that there is no spatial autocorrelation between the provinces) is not rejected for the geographical distance, we can argue that there is no evidence of negative autocorrelation here, since for random data one would expect the index to be negative more often than positive.
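One common way to judge whether an observed Moran index differs from what random data would give is a permutation test, sketched below in Python. This is a swapped-in illustration and an assumption on our part: the text does not state how the reported p-values were computed, and the function names, the five-region chain, and the values are hypothetical.

```python
import random


def morans_i(x, w):
    """Global Moran's I, Eq. (1), for values x and weight matrix w."""
    d = len(x)
    mean = sum(x) / d
    dev = [xi - mean for xi in x]
    pairs = [(i, j) for i in range(d) for j in range(d) if i != j]
    s0 = sum(w[i][j] for i, j in pairs)
    cross = sum(w[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (d / s0) * cross / sum(v * v for v in dev)


def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the values over the regions mimics "no spatial structure".
        rng.shuffle(shuffled)
        if morans_i(shuffled, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: five regions in a row with steadily increasing values.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
observed, p_value = permutation_p_value([1, 2, 3, 4, 5], chain)
print(observed, p_value)
```

With only a handful of regions the set of distinct permutations is small, so such p-values are coarse and should be read with care.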

The impact of President Cyril Ramaphosa’s decision to contain the outbreak through strict lockdown regulations is supported by the p-values in Fig. 5. However, a second wave is observed after the lockdown was eased to level 1.

Figure 8 shows the time series plots of the cumulative and daily confirmed cases of COVID-19 in all provinces until December 29. It seems that there are two clusters of provinces: GP, KZ, WC, and EC started their second wave at the end of October, while the others show no significant increase. In order to remove daily fluctuations and thus obtain more reliable predictive models, we have used the 14-day (bi-weekly) COVID-19 confirmed cases rather than the raw data displayed in Fig. 8. Figures 9, 10, and 11 display the observed cumulative confirmed cases, the fitted LGM given by Eq. (2) with \(\nu =1,\) the corresponding 95% confidence interval for each province, and the GGM given by Eq. (2) with \(\upnu \to {0}^{+}\). In addition, the accuracy of the LGM and GGM fitted to the provincial data is tested using three performance criteria, namely the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE), given by:

$$MAE=\frac{1}{m}\sum\nolimits_{t=1}^{m}\left|{x}_{t}-{\widehat{x}}_{t}\right|$$
$$RMSE=\sqrt{\frac{1}{m}\sum\nolimits_{t=1}^{m}{\left({x}_{t}-{\widehat{x}}_{t}\right)}^{2}}$$
$$MAPE=\frac{100}{m}\sum\nolimits_{t=1}^{m}\left|\frac{{x}_{t}-{\widehat{x}}_{t}}{{x}_{t}}\right|$$

where \({x}_{t}\) and \({\widehat{x}}_{t}\) are the actual and the predicted values, respectively, and \(m\) is the number of testing times. The above three criteria are computed using the testing times (the green points in Figs. 9, 10, and 11) and presented in Table 2. As observed from Figs. 9, 10, and 11 and Table 2, the suggested growth models fit well. The high values of the R-squared as well as the low values of MAPE also confirm the goodness of fit of the models for the provinces. Hence, we can rely on the predictions given as red and blue lines.
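The three performance criteria can be computed as in the following Python sketch. The function names and the example numbers are hypothetical and are not values from Table 2. The MAPE here follows the usual definition, in which each absolute error is divided by the corresponding actual value, so it assumes that all actual values are non-zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    m = len(actual)
    return sum(abs(x - p) for x, p in zip(actual, predicted)) / m


def rmse(actual, predicted):
    """Root mean square error."""
    m = len(actual)
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / m)


def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be non-zero)."""
    m = len(actual)
    return 100.0 / m * sum(abs((x - p) / x) for x, p in zip(actual, predicted))


# Hypothetical cumulative counts at three testing times.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 420.0]
print(mae(actual, predicted))   # about 13.33
print(rmse(actual, predicted))  # about 14.14
print(mape(actual, predicted))  # about 6.67 (percent)
```

Because MAPE is scale-free, it allows provinces with very different case counts to be compared, whereas MAE and RMSE are expressed in numbers of cases.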

Fig. 8 The time series plots of the cumulative and daily confirmed cases of COVID-19 in the provinces of South Africa until December 29

Fig. 9 Observed cumulative 14-day confirmed cases (the grey points), the fitted LGM (the red line) given by Eq. (2) with \(\nu =1\), the fitted GGM (the blue line) given by Eq. (2) with \(\upnu \to {0}^{+}\), and the corresponding 95% confidence intervals for EC, FS, and GP provinces

Fig. 10 Observed cumulative 14-day confirmed cases (the grey points), the fitted LGM (the red line) given by Eq. (2) with \(\nu =1\), the fitted GGM (the blue line) given by Eq. (2) with \(\upnu \to {0}^{+}\), and the corresponding 95% confidence intervals for KZ, LP, and MP provinces

Fig. 11 Observed cumulative 14-day confirmed cases (the grey points), the fitted LGM (the red line) given by Eq. (2) with \(\nu =1\), the fitted GGM (the blue line) given by Eq. (2) with \(\upnu \to {0}^{+}\), and the corresponding 95% confidence intervals for NC, NW, and WC provinces

Table 2 The performance criteria of the growth models LGM & GGM adopted on provinces

Conclusions

Despite the inaccuracies associated with medical predictions, identifying hotspots and logistic modelling are still invaluable for a better understanding of the spread in South Africa. The results of the Moran index showed that President Cyril Ramaphosa’s decision to contain the outbreak through strict lockdown regulations and the prohibition of inter-provincial travel played a positive role in tapering the counts. The results indicated that the spread in South Africa mainly proceeds between adjacent provinces and does not depend on the distance to the infection centre.

The logistic and Gompertz growth models show a good fit to the provincial data, with R-squared values above 0.9 and low values of MAPE. With South Africa phasing out the strict lockdown from the beginning of May 2020 and implementing the risk-adjusted strategy and economic activity plan, South African workers returned to their workplaces and COVID-19 cases rose again. This initial study highlights the importance of continued analysis and showcases the valuable input that can be obtained from the results of these analyses. As one of the reviewers mentioned, one of the limitations of this work is that adjacency does not necessarily lead to a higher risk of infection between provinces. Thus, as a future direction of our work, one may be interested in refining the spatial measures to suitably link adjacency with higher risk.