Abstract

The goal of this paper is to develop a statistical model suited to analyzing COVID-19 mortality rates in Somalia. Combining the log-logistic distribution with the tangent function yields a flexible extension, the log-logistic tangent (LLT) distribution, a new two-parameter distribution. The new distribution has a number of attractive statistical and mathematical properties, including simple forms for the failure rate function, reliability function, and cumulative distribution function. Maximum likelihood estimation (MLE) is used to estimate the unknown parameters of the proposed distribution. Numerical and graphical results of a Monte Carlo simulation are reported to evaluate the performance of the MLE method. In addition, the LLT model is compared with well-known two-parameter, three-parameter, and four-parameter competitors: the Gompertz, log-logistic, kappa, exponentiated log-logistic, Marshall–Olkin log-logistic, Kumaraswamy log-logistic, and beta log-logistic distributions. Several goodness-of-fit measures are used to determine whether the LLT distribution describes the COVID-19 mortality rate data better than the competing models.

1. Introduction

Models are at the heart of almost all statistical work. A statistical model is a family of probability distributions, and the family can be parametric, semiparametric, or nonparametric. Parametric models produce more efficient estimates with lower standard errors than nonparametric and semiparametric models [1], provided that the distributional assumption is correct. Probability distributions have been widely used to model lifetime data in a variety of fields, particularly the biomedical sciences and engineering. Because data are variable, the choice of statistical model has a significant impact on how well the phenomenon under consideration is described.

The generated family of distributions has a large influence on the quality of statistical analysis procedures, and much effort has gone into developing new statistical models. Many real datasets, however, do not follow any of the commonly used models. As a result, extending a family of distributions by introducing new parameters is a well-established technique in the statistical literature.

The log-logistic distribution is an excellent choice for analyzing data with unimodal or decreasing failure rates. However, it is not a good candidate when the failure rate is monotonically increasing, or nonmonotone with a bathtub or modified bathtub shape [2–5]. New extensions and generalizations of existing models are therefore required for accurate and precise data modelling, and numerous statistical techniques have been designed to modify classical models in order to achieve a better fit to the data of interest. Other techniques, such as the Sin-G family, Cos-G family, Tan-G family, and Sec-G family, provide a versatile generalization of an existing probability distribution without adding any extra parameters [6–10]. The techniques that extend the classical distributions without adding extra parameters are termed “new trigonometric classes of probability distribution” [11].

A random variable X is said to have a log-logistic distribution with a shape parameter and a scale parameter, denoted by X ∼ LLog(α, β), if its cumulative distribution function (cdf) is defined by the following equation:

The probability density function (pdf) is given by

The reliability (survival) function is given by

The failure (hazard) rate function is given by

The reversed hazard rate function (also known as the retro function) is given by

The cumulative hazard rate function is given by

where the vector of parameters consists of the shape and scale parameters.
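For orientation, one commonly used parameterization of the log-logistic quantities listed above can be written as follows. The roles of the symbols (α > 0 as the scale and β > 0 as the shape) and the function names F, f, S, h, r, and H are assumptions made for this sketch and may differ from the parameterization intended in this paper.

```latex
\begin{align*}
F(x;\alpha,\beta) &= 1 - \left[1 + (x/\alpha)^{\beta}\right]^{-1}, \qquad x > 0, \\
f(x;\alpha,\beta) &= \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{\left[1 + (x/\alpha)^{\beta}\right]^{2}}, \\
S(x;\alpha,\beta) &= 1 - F(x;\alpha,\beta) = \left[1 + (x/\alpha)^{\beta}\right]^{-1}, \\
h(x;\alpha,\beta) &= \frac{f(x;\alpha,\beta)}{S(x;\alpha,\beta)} = \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{1 + (x/\alpha)^{\beta}}, \\
r(x;\alpha,\beta) &= \frac{f(x;\alpha,\beta)}{F(x;\alpha,\beta)} = \frac{\beta}{x\left[1 + (x/\alpha)^{\beta}\right]}, \\
H(x;\alpha,\beta) &= -\log S(x;\alpha,\beta) = \log\left[1 + (x/\alpha)^{\beta}\right].
\end{align*}
```

Under this parameterization the hazard h is decreasing for β ≤ 1 and unimodal for β > 1, which matches the statement that the log-logistic model handles only decreasing or unimodal failure rates.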

It is well understood that the classical log-logistic distribution often fails to capture the phenomenon under investigation accurately. As a result, several modifications have been proposed and studied. Introducing one or more parameters into the classical log-logistic distribution yields a modified form of the distribution. Several of these modified distributions have been found to be more flexible than the classical log-logistic distribution and better able to model real-life data. An up-to-date survey of recent modifications of the log-logistic distribution can be found in [12].

Most modifications of the log-logistic model in the statistical literature have been derived by adding extra parameters to control the shape of the skewness (or asymmetry) and the kurtosis of the distribution; see the exponentiated LL distribution [13], beta LL distribution [14], gamma LL distribution [15], Marshall–Olkin LL distribution [16], transmuted LL distribution [17], cubic transmuted LL distribution [18], McDonald LL distribution [19], and alpha power transformed LL distribution [20, 21]. Other generalizations and modifications of the log-logistic distribution developed recently can be seen in [22, 23].

The majority of techniques for extending the classical LL model produce a heavy-tailed distribution. Unfortunately, the abovementioned generalizations of the log-logistic distribution have some limitations: (i) adding extra parameters to the distribution enhances flexibility, but such practices usually result in reparameterization issues; (ii) the number of model parameters is increased, which makes the parameters harder to estimate; (iii) some extending techniques reduce the tractability of the cdf, making manual computation of statistical properties more difficult; and (iv) other generalization techniques complicate the pdf, resulting in computational issues. In short, incorporating extra parameters into existing models increases flexibility, which is a desirable feature, but it also makes inference more difficult [24, 25]. Therefore, in this study, we modify the classical log-logistic distribution to obtain a two-parameter continuous distribution referred to as the log-logistic tangent (LLT) distribution.

The primary goal of this study is to introduce and investigate a versatile modification of the log-logistic distribution using the Tan-G family generator method. As previously stated, the proposed two-parameter distribution may be more flexible than other popular extensions of the log-logistic model. It will be shown that the proposed distribution accommodates both monotone and nonmonotone hazard rates.

The rest of the paper is organized as follows. The proposed family is discussed in Section 2. Section 3 presents the proposed distribution. Some mathematical and statistical properties of the proposed distribution are discussed in Section 4. Section 5 presents the estimation of the unknown parameters of the model. Section 6 discusses a Monte Carlo simulation for the proposed model. In Section 7, an application to a real-life dataset is presented and discussed. The comparison of some of the parametric probability distributions and their submodels is presented. Finally, concluding remarks and a work summary are presented in Section 8.

2. The Proposed Family

Several continuous probability distributions involving trigonometric functions have been developed in the statistics literature, with the tangent distribution being notable for its wide range of applications to real-life datasets from various disciplines. A useful survey in this context can be found in [11, 24–27]. In this study, we concentrate on the trigonometric classes of continuous probability distributions, which are described by a cdf involving trigonometric functions (tangent, cosine, sine, secant, and different mixtures of them). The Tan-G family developed in [11, 26–29] is the fundamental work.

Souza et al. [28] introduced a new method for extending the classical probability distributions, leading to greater flexibility in analyzing and modelling different data types. They considered a parent distribution, that is, an arbitrary continuous probability distribution with cdf G(x) and corresponding pdf g(x). The cdf of the associated tangent (Tan-G) family of distributions is defined by

Equation (7) can be written as

where G(x) is the cdf of the parent (baseline) distribution, and if G(x) has pdf g(x), the pdf of the class is given by

The survival function is obtained by

The failure rate function is given by

The reversed hazard rate function is given by

Also, the cumulative hazard rate function is given by
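As a reference point for the quantities listed above, the Tan-G family is usually written with cdf F(x) = tan(πG(x)/4). Taking that form as an assumption (it is not restated from this paper's displayed equations), the remaining functions follow by differentiation and simple algebra. Here G and g denote the parent cdf and pdf, and the names F, f, S, h, r, and H are labels chosen for this sketch.

```latex
\begin{align*}
F(x) &= \tan\!\left(\tfrac{\pi}{4}\,G(x)\right), \\
f(x) &= \frac{d}{dx}F(x) = \tfrac{\pi}{4}\,g(x)\,\sec^{2}\!\left(\tfrac{\pi}{4}\,G(x)\right), \\
S(x) &= 1 - \tan\!\left(\tfrac{\pi}{4}\,G(x)\right), \\
h(x) &= \frac{f(x)}{S(x)} = \frac{\tfrac{\pi}{4}\,g(x)\,\sec^{2}\!\left(\tfrac{\pi}{4}\,G(x)\right)}{1 - \tan\!\left(\tfrac{\pi}{4}\,G(x)\right)}, \\
r(x) &= \frac{f(x)}{F(x)} = \frac{\tfrac{\pi}{2}\,g(x)}{\sin\!\left(\tfrac{\pi}{2}\,G(x)\right)}, \\
H(x) &= -\log S(x) = -\log\!\left[1 - \tan\!\left(\tfrac{\pi}{4}\,G(x)\right)\right].
\end{align*}
```

Because G(x) runs from 0 to 1, the argument of the tangent runs from 0 to π/4, so F runs from 0 to 1 and is a valid cdf without any additional parameter. The simplification of r uses sec²(t)/tan(t) = 2/sin(2t).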

Recently, Souza et al. [28] proposed and studied the Burr-XII tangent distribution which serves as a potential lifetime model. Ampadu [29] introduced and studied the Weibull tangent distribution with applications to health science data. Hence, the Tan-G family is a special family for extending the classical well-known lifetime distributions without adding an extra parameter.

3. The Log-Logistic Tangent Distribution

In this section, the LLT distribution is obtained by taking the parent cdf G(x) to be the cdf of the log-logistic distribution.

The distribution function (cdf) of the log-logistic tangent distribution, for x > 0, can be expressed as

The corresponding probability density function to the abovementioned cdf is given by

The survival function is expressed as follows:

The hazard rate function is obtained by

The reversed hazard rate function is expressed as follows:

The cumulative hazard function can be given as follows:

In all of the abovementioned equations, the parameter vector consists of the two LLT parameters α and β.

Some possible shapes of the pdf, cdf, hazard function, survival function, and reversed hazard function of the log-logistic tangent distribution are displayed in Figures 1–7.

The hazard rate function of the proposed distribution can accommodate both monotone (increasing and decreasing) and nonmonotone (i.e., unimodal) hazard rates, as can be seen in Figures 4–6. Figure 7 shows the reversed hazard function of the LLT distribution.
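The functions of this section can be sketched numerically as follows. This is a minimal illustration, not the authors' code. It assumes the Tan-G form F(x) = tan(πG(x)/4) and a log-logistic baseline G(x) = (x/α)^β / (1 + (x/α)^β) with α as scale and β as shape; both the parameter roles and the function names are assumptions made for the example.

```python
import math

# Assumption: alpha is the scale and beta is the shape of the baseline.

def ll_cdf(x, alpha, beta):
    """Baseline log-logistic cdf G(x) for x > 0."""
    z = (x / alpha) ** beta
    return z / (1.0 + z)

def ll_pdf(x, alpha, beta):
    """Baseline log-logistic pdf g(x) for x > 0."""
    z = (x / alpha) ** beta
    return (beta / x) * z / (1.0 + z) ** 2

def llt_cdf(x, alpha, beta):
    """LLT cdf under the assumed Tan-G form: tan(pi/4 * G(x))."""
    return math.tan(math.pi / 4.0 * ll_cdf(x, alpha, beta))

def llt_pdf(x, alpha, beta):
    """LLT pdf: (pi/4) * g(x) * sec^2(pi/4 * G(x))."""
    c = math.cos(math.pi / 4.0 * ll_cdf(x, alpha, beta))
    return (math.pi / 4.0) * ll_pdf(x, alpha, beta) / (c * c)

def llt_sf(x, alpha, beta):
    """Survival (reliability) function 1 - F(x)."""
    return 1.0 - llt_cdf(x, alpha, beta)

def llt_hazard(x, alpha, beta):
    """Hazard (failure) rate f(x) / S(x)."""
    return llt_pdf(x, alpha, beta) / llt_sf(x, alpha, beta)

def llt_reversed_hazard(x, alpha, beta):
    """Reversed hazard rate f(x) / F(x)."""
    return llt_pdf(x, alpha, beta) / llt_cdf(x, alpha, beta)

def llt_cum_hazard(x, alpha, beta):
    """Cumulative hazard -log S(x)."""
    return -math.log(llt_sf(x, alpha, beta))
```

Evaluating `llt_hazard` on a grid of x values for several (α, β) pairs is one way to explore which hazard shapes the model produces under these assumptions.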

4. Some Statistical Properties of the Proposed Distribution

In this section, we derive some mathematical properties of the log-logistic tangent distribution, such as the quantile function, skewness and kurtosis, moments, and residual and reverse residual life, and illustrate them with numerical examples.

4.1. Quantile Function

The quantile function of the LLT distribution is useful in distribution theory, statistical simulation, and applications. To generate random samples, the simulation uses the quantile function by following the steps in Algorithm 1.

Let X be a Tan-G-distributed random variable. Data from the distribution can be generated from the quantiles as follows:

(1) Generate a value u from the uniform distribution on (0, 1)
(2) Specify the parameter values
(3) Obtain an outcome of X by evaluating the quantile function at u

The LLT distribution's quantile function is given by

The lower quartile (Q1), median (Q2), and upper quartile (Q3) of the LLT distribution can be derived from the quantile function by setting the probability level to 0.25, 0.5, and 0.75, respectively.

The quantiles of the LLT distribution for some parameter values are shown in Table 1.
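The inverse-transform steps of Algorithm 1 can be sketched as below. The quantile used here is obtained by inverting the assumed cdf F(x) = tan(πG(x)/4) with a log-logistic baseline whose scale is α and shape is β, which gives Q(u) = α [p / (1 − p)]^(1/β) with p = (4/π) arctan(u). The parameter roles and the function names `llt_quantile` and `llt_sample` are assumptions for this illustration.

```python
import math
import random

def llt_quantile(u, alpha, beta):
    """Quantile of the LLT distribution under the assumed parameterization."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    p = 4.0 / math.pi * math.atan(u)          # invert F = tan(pi/4 * G)
    return alpha * (p / (1.0 - p)) ** (1.0 / beta)  # invert the baseline cdf

def llt_sample(n, alpha, beta, rng=None):
    """Draw n values by inverse-transform sampling (Algorithm 1)."""
    rng = rng or random.Random()
    draws = []
    while len(draws) < n:
        u = rng.random()                       # step (1): u in [0, 1)
        if u > 0.0:                            # skip the boundary value 0
            draws.append(llt_quantile(u, alpha, beta))  # step (3)
    return draws

def llt_quartiles(alpha, beta):
    """Lower quartile, median, and upper quartile."""
    return tuple(llt_quantile(u, alpha, beta) for u in (0.25, 0.5, 0.75))
```

Passing a seeded `random.Random` instance as `rng` makes a simulation reproducible.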

4.2. Skewness and Kurtosis

Some properties of a continuous distribution can be studied through its asymmetry and kurtosis. The Moors kurtosis and Galton asymmetry (or skewness) of the two-parameter LLT model are defined by the following relationships:

where Q denotes the quantile function evaluated at different probability levels.

The equations above can be evaluated using the LLT quantile function. These measures have the advantage of being less sensitive to outliers and of existing even when the distribution has no moments.
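A short sketch of the two quantile-based measures follows. It uses the standard definitions, Galton skewness = [Q(3/4) − 2Q(1/2) + Q(1/4)] / [Q(3/4) − Q(1/4)] and Moors kurtosis = [Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)] / [Q(6/8) − Q(2/8)]. The LLT quantile in the example assumes the cdf tan(πG(x)/4) with a log-logistic baseline (α scale, β shape); that parameterization and all function names are assumptions.

```python
import math

def galton_skewness(q):
    """Galton skewness from a quantile function q(u)."""
    return (q(0.75) - 2.0 * q(0.5) + q(0.25)) / (q(0.75) - q(0.25))

def moors_kurtosis(q):
    """Moors kurtosis from a quantile function q(u), based on octiles."""
    return (q(0.875) - q(0.625) + q(0.375) - q(0.125)) / (q(0.75) - q(0.25))

def llt_quantile(u, alpha, beta):
    """Assumed LLT quantile: alpha = scale, beta = shape of the baseline."""
    p = 4.0 / math.pi * math.atan(u)
    return alpha * (p / (1.0 - p)) ** (1.0 / beta)

def llt_shape_measures(alpha, beta):
    """Return (Galton skewness, Moors kurtosis) for given LLT parameters."""
    q = lambda u: llt_quantile(u, alpha, beta)
    return galton_skewness(q), moors_kurtosis(q)
```

Both measures are ratios of quantile differences, so under this parameterization they do not depend on the scale α.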

4.3. Moments

Moments are essential in statistical modelling, particularly in applications. The rth moment of the LLT distribution is defined as

In fact, we have

The first five moments, followed by the standard deviation, coefficient of variation, skewness, and kurtosis for some parameter values, are shown in Table 2.
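Moments of this kind can also be approximated numerically. The sketch below uses the identity E[X^r] = ∫₀¹ Q(u)^r du, valid for any continuous distribution, with a midpoint rule. The quantile Q assumes the cdf tan(πG(x)/4) with a log-logistic baseline (α scale, β shape); under that assumption the rth moment is finite only for r < β. The parameterization, the grid size, and the function names are illustrative assumptions.

```python
import math

def llt_quantile(u, alpha, beta):
    """Assumed LLT quantile: alpha = scale, beta = shape of the baseline."""
    p = 4.0 / math.pi * math.atan(u)
    return alpha * (p / (1.0 - p)) ** (1.0 / beta)

def llt_moment(r, alpha, beta, n_grid=200_000):
    """Approximate E[X^r] as the integral of Q(u)^r over (0, 1), midpoint rule."""
    h = 1.0 / n_grid
    return h * sum(llt_quantile((i + 0.5) * h, alpha, beta) ** r
                   for i in range(n_grid))

def llt_summary(alpha, beta, n_grid=200_000):
    """Mean, sd, coefficient of variation, skewness, kurtosis (needs beta > 4)."""
    m1, m2, m3, m4 = (llt_moment(r, alpha, beta, n_grid) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / sd ** 3
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / var ** 2
    return {"mean": m1, "sd": sd, "cv": sd / m1, "skewness": skew, "kurtosis": kurt}
```

The midpoint rule converges slowly when r is close to β because the integrand is unbounded near u = 1, so values in that regime should be treated as rough approximations.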

4.4. Residual Life and Reverse Residual Life

The residual lifetime function (rlf) has broader uses in risk management and survival analysis. The rlf of the LLT random variable (r.v.) can be expressed as follows:

Furthermore, the reverse residual lifetime function (rrlf) of the LLT r.v. can be obtained as follows:

5. Estimation of the Parameters

The maximum likelihood approach is used in this section to estimate the unknown parameters of the log-logistic tangent distribution based on a complete sample. Let X_1, X_2, …, X_n represent independent random variables drawn from the LLT distribution. The sample’s likelihood function is defined as

The log-likelihood function can be written as follows:

By taking the first derivatives of the log-likelihood function in equation (28) with respect to the parameters α and β and setting the results to zero, we get

The MLEs of α and β are obtained by solving this system of nonlinear equations numerically. Under standard regularity conditions, the well-known theory of maximum likelihood applies and ensures good asymptotic properties [30].
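A numerical fit of this kind can be sketched with the standard library alone. The example below maximizes the log-likelihood directly rather than solving the score equations, using a compact Nelder–Mead search over (log α, log β); this optimizer is a substitute chosen for illustration, not the method stated in the paper. The likelihood assumes the cdf tan(πG(x)/4) with a log-logistic baseline (α scale, β shape), and all function names are hypothetical.

```python
import math
import random

def llt_loglik(alpha, beta, data):
    """LLT log-likelihood under the assumed parameterization."""
    total = 0.0
    for x in data:
        lz = beta * (math.log(x) - math.log(alpha))      # log (x/alpha)^beta
        softplus = max(lz, 0.0) + math.log1p(math.exp(-abs(lz)))  # log(1 + z)
        G = math.exp(lz - softplus)                      # baseline cdf z/(1+z)
        log_g = math.log(beta) - math.log(x) + lz - 2.0 * softplus
        total += (math.log(math.pi / 4.0) + log_g
                  - 2.0 * math.log(math.cos(math.pi / 4.0 * G)))
    return total

def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=500):
    """Minimal Nelder-Mead minimizer (reflection, expansion, contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        c = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        refl = [2.0 * c[i] - worst[i] for i in range(n)]
        if f(refl) < f(best):
            expd = [3.0 * c[i] - 2.0 * worst[i] for i in range(n)]
            simplex[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = [0.5 * (c[i] + worst[i]) for i in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:  # shrink every vertex toward the best one
                simplex = [best] + [[0.5 * (best[i] + p[i]) for i in range(n)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)

def llt_mle(data):
    """Return (alpha_hat, beta_hat); searches on the log scale to keep both positive."""
    start = [math.log(sorted(data)[len(data) // 2]), 0.0]  # log(median), log(1)
    neg = lambda t: -llt_loglik(math.exp(t[0]), math.exp(t[1]), data)
    a, b = nelder_mead(neg, start)
    return math.exp(a), math.exp(b)

def simulate_llt(n, alpha, beta, seed=0):
    """Synthetic data via the assumed quantile function, for checking the fit."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        p = 4.0 / math.pi * math.atan(rng.random() or 0.5)
        out.append(alpha * (p / (1.0 - p)) ** (1.0 / beta))
    return out
```

A quick self-check is to call `llt_mle(simulate_llt(500, 2.0, 3.0))` and confirm that the estimates land near the generating values. In practice a library optimizer with gradient information would usually be preferred over this hand-written search.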

6. Simulation Study

In this section, we use Monte Carlo (MC) simulation to evaluate the effectiveness of the Maximum Likelihood Estimation (MLE) method for estimating the log-logistic tangent distribution parameters. The simulation study is carried out to investigate the average bias (AB), mean square error (MSE), and root mean square error (RMSE) for the parameters of the proposed model. The simulation experiment was conducted by running a number of simulations with varying sample sizes and parameter values. The quantile function given in Eq. (21) was used to generate random samples for the LLT. The MC simulation study was iterated 750 times with sample sizes and the parameter scenarios in set I and in set II.

The MLEs are computed for each simulated dataset, say (α̂_i, β̂_i) for i = 1, 2, …, 750, and the AB, MSE, and RMSE of the parameters are calculated by

where the parameter is either α or β.
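The three accuracy measures can be computed from a list of replicate estimates as sketched below. The sketch assumes the signed definition of average bias, AB = mean(θ̂_i − θ), together with MSE = mean((θ̂_i − θ)²) and RMSE = √MSE; some authors use the absolute bias instead, so the definition and the function name are assumptions.

```python
import math

def accuracy_measures(estimates, true_value):
    """Average bias, MSE, and RMSE of replicate estimates of one parameter."""
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    ab = sum(errors) / n                       # signed average bias (assumed)
    mse = sum(e * e for e in errors) / n       # mean square error
    return {"AB": ab, "MSE": mse, "RMSE": math.sqrt(mse)}
```

In a Monte Carlo study this function would be called once per parameter and per sample size, with `estimates` holding the replicate MLEs for that setting.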

Table 3 shows the AB, MSE, and RMSE values of the parameters for various sample sizes. Based on these results, we conclude that the MLEs estimate the parameters well and that the estimates are reasonably stable and close to the true values for these sample sizes. In addition, Table 3 shows that the RMSE decreases as the sample size increases, as expected. Furthermore, the AB also decreases as the sample size increases. As a result, even if the sample size is small, the MLEs and their asymptotic results can be used to calculate confidence intervals for model parameters.

The simulation results for the aforementioned measures are depicted in Figures 8 and 9. These plots demonstrate that increasing the sample size n reduces the estimated biases. Furthermore, as the sample size n is increased, the estimated MSEs and RMSEs decay toward zero. These findings demonstrate the MLEs' efficiency as well as their consistency.

7. Applications to the COVID-19 Dataset

The main motivation for deriving the log-logistic tangent distribution is its use in data analysis, which makes it valuable in different disciplines, especially those concerned with mortality data. A number of distributions have recently been proposed for COVID-19 datasets; for more information on these, see [31–39].

The log-logistic tangent distribution was applied to a real-life COVID-19 mortality rate dataset from Somalia for demonstration purposes, and its performance was compared with that of other fitted models such as the log-logistic, Weibull, Gompertz, and kappa distributions. In addition to the log-likelihood, the most common information criteria, namely the Akaike Information Criterion (AIC), Hannan–Quinn Information Criterion (HQIC), Corrected Akaike Information Criterion (CAIC), and Bayesian Information Criterion (BIC), were used to select the most appropriate model. The Anderson–Darling (A∗) statistic, the Cramer–von Mises (W∗) statistic, and the Kolmogorov–Smirnov (KS) statistic, as well as the corresponding p value, are all recorded. The competing models are as follows:

(1) Log-logistic distribution:
(2) Gompertz distribution:
(3) Kappa distribution:

See also the other four recently modified log-logistic distributions with three parameters and four parameters [14, 16, 40, 41].

The AIC is

The BIC is

The CAIC is

The HQIC is

where ℓ denotes the maximized log-likelihood function, n the sample size, and k the number of model parameters. The following goodness-of-fit measures are considered.
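The four criteria can be computed from the maximized log-likelihood as in the sketch below. It uses the usual textbook forms: AIC = −2ℓ + 2k, BIC = −2ℓ + k log n, CAIC = −2ℓ + 2kn/(n − k − 1), and HQIC = −2ℓ + 2k log(log n). Whether these match the exact variants used in this paper is an assumption, and the function name is hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, corrected AIC, and HQIC from loglik, k parameters, n observations."""
    if n - k - 1 <= 0:
        raise ValueError("CAIC requires n > k + 1")
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "BIC": -2.0 * loglik + k * math.log(n),
        "CAIC": -2.0 * loglik + 2.0 * k * n / (n - k - 1),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
    }
```

For a two-parameter model fitted to 51 observations, the call would be `information_criteria(loglik, 2, 51)`, with `loglik` taken from the fitted model.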

The Anderson–Darling (A∗) test statistic is given by

The Cramer–von Mises (W∗) test statistic is given by

where x_(i) is the ith observation when the data are sorted in ascending order and n is the sample size; the fitted cdf is evaluated at these ordered observations.

The best model has the lowest AIC, CAIC, BIC, and HQIC, as well as the lowest A∗, W∗, and KS values. In addition, the best model is the one with the highest log-likelihood value, and the p values of the KS statistic are used to compare the competing distributions.
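The goodness-of-fit statistics can be computed from the fitted cdf evaluated at the ordered data, z_(i) = F(x_(i)). The sketch below uses the basic forms A² = −n − (1/n) Σ (2i − 1)[log z_(i) + log(1 − z_(n+1−i))], W² = 1/(12n) + Σ [z_(i) − (2i − 1)/(2n)]², and the usual two-sided KS distance. Small-sample corrections that are sometimes applied to obtain A∗ and W∗ are not included, and whether the paper applies them is not assumed here. The function name is hypothetical.

```python
import math

def gof_statistics(data, cdf):
    """Anderson-Darling, Cramer-von Mises, and KS statistics for a fitted cdf."""
    x = sorted(data)
    n = len(x)
    z = [cdf(v) for v in x]                    # fitted cdf at ordered observations
    a2 = -n - sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                  for i in range(1, n + 1)) / n
    w2 = 1.0 / (12.0 * n) + sum((z[i - 1] - (2 * i - 1) / (2.0 * n)) ** 2
                                for i in range(1, n + 1))
    ks = max(max(i / n - z[i - 1], z[i - 1] - (i - 1) / n)
             for i in range(1, n + 1))
    return {"A2": a2, "W2": w2, "KS": ks}
```

The `cdf` argument is any callable of one variable, for example a fitted LLT cdf with its estimated parameters bound in a lambda.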

7.1. Data I : Somalia COVID-19 Mortality Rate Dataset

The dataset contains COVID-19 mortality rates from Somalia for the period from 1 March 2021 to 20 April 2021 (see https://covid19.who.int/). These data are rough mortality rates. The dataset contains 51 observations, which are formed by using the daily cumulative cases (DCCs) and daily new deaths (DNDs). The data are as follows:

3.008, 2.963, 4.762, 5.263, 11.382, 5.330, 10.471, 2.857, 4.274, 8.633, 5.882, 10.280, 5.330, 8.730, 7.377, 8.696, 8.257, 4.294, 5.330, 6.769, 11.627, 8.547, 9.302, 7.742, 6.173, 6.015, 5.330, 9.770, 5.330, 4.294, 5.330, 8.120, 7.547, 7.563, 5.330, 6.875, 8.800, 11.429, 5.330, 9.898, 5.330, 5.330, 9.630, 3.750, 5.330, 11.808, 5.330, 5.330, 5.330, 5.785, and 3.265.

7.2. Exploratory Data Analysis

The primary goal of data analysis is to extract information from the data. We used five different techniques to perform exploratory data analysis in this work: (1) descriptive statistics; (2) box plot; (3) TTT plot; (4) histogram; and (5) time-series plot. Figure 10 displays the box, TTT, histogram, and time-series plots for our dataset.

The total-time-on-test (TTT) plot is a graphical method for determining the shape of the failure rate function. In many real-world applications, qualitative information about the shape of the failure rate function can aid in the selection of a particular distribution. The TTT plot for our dataset, shown in Figure 10, indicates an increasing failure rate.

Table 4 shows the descriptive statistics (central tendency and spread) of the COVID-19 mortality rate data from 1 March 2021 to 20 April 2021.
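The points of a scaled TTT plot can be computed as sketched below, using the standard empirical transform T(r/n) = [Σ_{i≤r} x_(i) + (n − r) x_(r)] / Σ x_(i) for r = 1, …, n. The data are the 51 observations listed in Section 7.1; the function and variable names are chosen for this illustration.

```python
# The 51 Somalia mortality-rate observations from Section 7.1.
SOMALIA_RATES = [
    3.008, 2.963, 4.762, 5.263, 11.382, 5.330, 10.471, 2.857, 4.274, 8.633,
    5.882, 10.280, 5.330, 8.730, 7.377, 8.696, 8.257, 4.294, 5.330, 6.769,
    11.627, 8.547, 9.302, 7.742, 6.173, 6.015, 5.330, 9.770, 5.330, 4.294,
    5.330, 8.120, 7.547, 7.563, 5.330, 6.875, 8.800, 11.429, 5.330, 9.898,
    5.330, 5.330, 9.630, 3.750, 5.330, 11.808, 5.330, 5.330, 5.330, 5.785,
    3.265,
]

def scaled_ttt(data):
    """Return the scaled TTT points (r/n, T(r/n)) for r = 1..n."""
    x = sorted(data)
    n = len(x)
    total = sum(x)
    points, running = [], 0.0
    for r in range(1, n + 1):
        running += x[r - 1]                       # sum of the r smallest values
        ttt = (running + (n - r) * x[r - 1]) / total
        points.append((r / n, ttt))
    return points

def fraction_above_diagonal(points):
    """Share of TTT points lying above the 45-degree line."""
    return sum(1 for u, t in points if t > u) / len(points)
```

A TTT curve that is concave and lies above the diagonal is the usual graphical signature of an increasing failure rate; plotting the output of `scaled_ttt(SOMALIA_RATES)` against the diagonal gives the corresponding picture.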

The proposed LLT distribution has the lowest AIC, CAIC, BIC, and HQIC values and the highest log-likelihood values in Table 5, so it is selected as the most suitable model among the competing distributions considered in this work.

Table 6 shows the parameter estimates and the p values for the Cramer–von Mises (W∗), Anderson–Darling (A∗), and Kolmogorov–Smirnov (K–S) tests for all competing distributions using the abovementioned dataset I. According to Table 6, the proposed LLT distribution has the lowest A∗, W∗, and K–S test statistics, as well as the highest p value. As a result, among the competing distributions considered in this study, the proposed LLT distribution is chosen as the most appropriate model.

The estimated pdf for the competing models is shown in Figure 11, the estimated cdf for the competing models is shown in Figure 12, and the estimated cdf, Kaplan–Meier, pdf, and PP plots for the proposed model are shown in Figure 13.

8. Conclusions

The two-parameter log-logistic model has found widespread use in the statistical sciences, especially in survival analysis, biomedical sciences, engineering, and actuarial sciences. In this paper, we proposed a new probability distribution that combines the log-logistic distribution with the Tan-G trigonometric class of distributions. This model, named the log-logistic tangent (LLT) distribution, was developed and studied in this work. The reliability function, pdf, cdf, failure rate function, reversed failure rate function, and cumulative failure rate function were derived, and expressions for the basic statistical properties of the LLT distribution were developed. The LLT distribution was applied to the COVID-19 mortality rate dataset and provided a better fit than well-known two-parameter, three-parameter, and four-parameter competitors, namely the Gompertz, log-logistic, kappa, exponentiated log-logistic, Marshall–Olkin log-logistic, Kumaraswamy log-logistic, and beta log-logistic distributions. The comparison was based on goodness-of-fit tests, selection criteria such as the AIC, BIC, CAIC, and HQIC values, and the log-likelihood value. We, therefore, conclude that the LLT distribution is the most adequate model among the ones considered, as well as a competitive model for describing datasets in other areas of application.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the Taif University Researchers Supporting Project, no. TURSP-2020/220, Taif University, Taif, Saudi Arabia.