Abstract

Coronavirus disease (COVID-19) has caused unprecedented devastation and the loss of millions of lives globally. Its contagious nature and fatality rate pose persistent challenges to physicians and healthcare support systems. Clinical diagnostic evaluation using reverse transcription-polymerase chain reaction and other approaches is currently in use. Chest X-ray (CXR) and CT images have been used effectively for screening, as they provide relevant data on the localized regions affected by the infection. A step towards automated screening and diagnosis using CXR and CT could be of considerable importance in these turbulent times. The main objective of this work is to probe a simple threshold-based segmentation approach to identify possible infection regions in CXR images and to investigate intensity-based, wavelet transform (WT)-based, and Laws-based texture features with statistical measures. Feature selection using Random Forest (RF) is then applied, and the selected features are used to build Machine Learning (ML) models with a Support Vector Machine (SVM) and an RF classifier to differentiate COVID-19 from viral pneumonia (VP). The results indicate that the intensity and WT-based features vary between the two pathologies, which are better differentiated with the combined features trained using SVM and RF classifiers. An Area Under the Curve (AUC) of 0.97 and an overall classification accuracy of 0.9 using the RF model indicate that the methodology is useful in characterizing COVID-19 and VP.

1. Introduction

The extremely infectious nature of coronavirus disease (COVID-19), which has been declared a pandemic, has led to a very high infection rate and to overburdened healthcare systems globally [1, 2]. COVID-19 pneumonia can progress to respiratory system collapse, multiple organ dysfunction, and even death. Chest X-ray (CXR) radiography is used predominantly for screening, assessing, and diagnosing different categories of pneumonia and has proved to be cost effective [3–6]. Researchers have concluded that CXR is useful in disease prognostic studies [7]. With the aid of certain characteristic CXR features, it has become possible for radiologists to diagnose viral pneumonia (VP) [3, 8].

The CXR characteristics of COVID-19 pneumonia include consolidation, ground-glass opacity, and spread across the peripheral and lower zones with bilateral involvement [6]. CXR has been employed for triage to determine the order in which patients are treated [9]. Machine learning (ML) currently plays an instrumental part in addressing several diagnostic challenges, including the detection of breast cancer, brain tumours, and lung cancer [10, 11]. The evolution and reach of deep learning (DL) have further widened the usability of artificial intelligence in medical informatics, including CXR processing [9]. Differentiating normal CXR images from different categories of pneumonia, including COVID-19, has been attempted with AlexNet-based DL [11, 12]. Handcrafted feature extraction using different transforms and texture computations in conjunction with ML-based models has also been investigated as an aid to CXR-based screening for COVID-19 [13, 14]. Several approaches exist for extracting features from images, including histogram-based, texture-based, transform-based, and key-point-based methods. The choice of features depends on the characteristics of the image, and the extracted features require further scientific evaluation. Texture-based feature extraction adopts several techniques to compute texture, for example grey-level co-occurrence matrix (GLCM)-based computation, Laws texture computation, fractal-based models, and Gabor-filter-based texture extraction [15, 16]. Essentially, texture is the information that reveals how frequently the intensity patterns in a given image repeat. Texture provides useful insight into the inherent characteristics of the image that can be used for image analysis [17].
It has proven its usability in object recognition, segmentation, and content-based image retrieval in a broad range of image processing applications, including medical images, remote sensing, and multimedia images [18]. Pixel intensity and texture play a key role in the visual recognition of subtle patterns in an image; the ability of the human visual system to process these stimuli is a primary skill for interacting efficiently with the surrounding environment [19]. A comprehensive literature survey was carried out for the problem statement and is listed in Table 1.

Processing and reproducing these human capabilities using computer systems has been a much-researched topic in the current era [20, 21]. Laws texture features have been applied extensively to build machine learning models that categorize different sets of images, including medical images such as ultrasound, microscopic, CT, and MRI images, across various pathologies [22, 23]. Laws, GLCM, wavelet, and Tamura features were extracted from microscopic biopsy images on the premise that these features are easily interpretable, and an ML model then classified the images as cancerous or noncancerous [24]. Histogram of oriented gradients (HOG), local binary patterns (LBP), Haralick, and other features were explored, and an individual ML model was constructed for each category of features to examine the usability of texture features for separating COVID-19 images from normal ones [25, 26]. A thresholded version of LBP texture features with an ML model and simple intensity-based statistics were explored to categorize different staining patterns of indirect immunofluorescence (IIF) microscopic images [27, 28]. Effective feature extraction requires efficient preprocessing of input images, and several approaches have been attempted with emphasis on preserving edge information along with conventional preprocessing techniques. Kumar et al. analysed the stages involved in the enhancement of microscopic images; their work covers the segmentation of background cells and feature extraction, and ends in classification [29].

Lakshmanna and Khare discussed the use of the Group Search Optimization algorithm to optimize the sequences obtained from a mining process [30]. A technique concatenating spatial pyramid Zernike moment-based shape features and Laws texture features was derived to capture the macro- and micro-details of each facial expression [31].

Dourado et al. proposed an approach and validated it on three medical databases: cerebral vascular accident images, a lung nodule image data set, and a skin image data set, for stroke type, malignant nodule, and melanocytic lesion classification, respectively [32]. Krishnamurthy et al. proposed an algorithm for liver diagnosis using ultrasound images, in which segmentation is an important component. The work opens a path for researchers to address the challenges at each phase from segmentation to classification [33]. Ohata et al. derived a technique for the automatic detection of COVID-19 infection in CXR images through transfer learning [34]. Hasoon et al. derived a two classification model to detect abnormal cases of COVID-19. Preprocessing is applied for image thresholding and noise removal, and morphological operations, ROI detection, and feature extraction are the salient features of the work [35].

Susaiyah et al. implemented image segmentation by means of multilevel Otsu thresholding. Validation using Dice's coefficient, the Jaccard index, and accuracy recommended the first of the four Otsu levels [36]. Parvaze and Ramakrishnan considered thirteen positive and intermediate intensity level images with homogeneous, centromere, and nucleolus patterns for LS-based segmentation to extract objects of interest [37]. An image denoising technique for ultrasound images was proposed for its structure-preserving ability and efficacy [38]. An anisotropic diffusion smoothing filter has been used to obtain a smoothing effect across the boundaries [39]. Another optimization algorithm derived from the PS algorithm was implemented for the mining of sequences; three parameters (length, weight, and RE) are used for identifying frequent patterns. Ramaniharan et al. implemented a technique to analyze shape changes using shape-based Laplace-Beltrami (LB) eigenvalue features, and highlighted machine learning as the optimum approach in this case [40]. Bhattacharya et al. implemented a deep learning technique for COVID-19 analysis [41]. Saleem et al. derived an approach with situation-aware BDI reasoning for identifying the early symptoms of COVID-19 by means of a smart watch [42]. Basha et al. used a WNNM-based denoising algorithm for accurate calculation in ultrasound images of COVID-19 [43, 44]. Different image compression techniques are further useful for data retrieval and multimedia transmission in the post-COVID scenario [45, 46]. Gadekallu et al. contributed procedures for the early detection of diabetic retinopathy using a PCA-Firefly-based deep learning model [47, 48].

Along with preprocessing, segmentation of the region of interest is necessary for designing effective strategies to delineate the tissue of interest [14, 49, 50].

Following an extensive literature survey, we formulated a comprehensive strategy to distinguish COVID-19 from VP in CXR images by computing first-order statistical features, wavelet-based features, and Laws texture features. The extracted features were subjected to feature selection using Random Forest (RF), and the selected features were used to train support vector machine (SVM) and RF classifiers. The salient feature of the work is the fusion of statistical, wavelet, and Laws texture features within the thresholded region.

The main contributions are as follows:
(i) Utilization of the multithreshold approach to segment and thereby extract the texture features
(ii) Visualization of the feature maps
(iii) Investigating the handcrafted features that could identify the desired pathology

Thus, the novelty in our work is the utilization of the most significant features to construct the machine learning model.

2. Methodology

The block diagram representing the step-by-step computation is shown in Figure 1. The input CXR images considered herein were obtained from the Kaggle repository [2, 10]. The goal of this study was to scrutinize the variations in CXR images pertaining to COVID-19 and VP; hence, only these two categories were examined. The dataset contained 3617 COVID-19 images and 1345 VP images. The input images of both categories were first converted into greyscale images. Next, the images were filtered using a median filter with a mask size of 3 × 3 to eliminate spurious intensities while preserving the edge information. Validation of the filtering approach and comparison with other methods were not attempted herein. The filtered images were subjected to multithreshold Otsu segmentation [12]. Three segmented masks were obtained in this process, and the mask with the highest mean intensity was selected for further analysis. We hypothesise that the highest-mean-intensity mask represents the region of interest (ROI), which includes the infection region in both COVID-19 and VP images. The segmented ROI was not validated owing to the unavailability of ground truth. Eight first-order statistical features were computed from the original images within the segmented ROI: mean, standard deviation, skewness, kurtosis, and the 5th, 10th, 90th, and 95th percentiles. The same eight features were obtained from the wavelet-transformed images in each of the four sub-bands LL, LH, HL, and HH. The eight statistical features were also computed from the Laws texture maps within the threshold masks, giving a feature vector of 72 features for each image. Biorthogonal wavelets were studied more closely owing to their multiresolution properties [8, 51].
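The preprocessing and segmentation steps described above can be sketched as follows. This is a minimal NumPy-only illustration rather than the implementation used in the study: the function names, the 64-bin histogram, and the exhaustive search over two cut points are assumptions made here to keep the example self-contained (libraries such as scikit-image provide `threshold_multiotsu` for the same purpose).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_3x3(img):
    """3 x 3 median filter; borders are handled by edge replication."""
    padded = np.pad(img, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (H, W, 3, 3)
    return np.median(windows, axis=(-2, -1))


def multi_otsu_two_thresholds(img, bins=64):
    """Three-class Otsu: choose the two cuts that maximise the
    between-class variance of the intensity histogram."""
    hist, edges = np.histogram(img, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    w = np.concatenate(([0.0], np.cumsum(p)))            # cumulative weight
    m = np.concatenate(([0.0], np.cumsum(p * centers)))  # cumulative first moment
    best_score, best_cuts = -1.0, (1, 2)
    for i in range(1, bins - 1):
        for j in range(i + 1, bins):
            score = 0.0
            for lo, hi in ((0, i), (i, j), (j, bins)):
                weight = w[hi] - w[lo]
                if weight > 0:
                    # weight * (class mean)^2; the global mean term is constant
                    score += (m[hi] - m[lo]) ** 2 / weight
            if score > best_score:
                best_score, best_cuts = score, (i, j)
    return edges[best_cuts[0]], edges[best_cuts[1]]


def brightest_mask(img):
    """Median-filter, split into three Otsu classes, and return the mask
    of the class with the highest mean intensity plus the filtered image."""
    filtered = median_filter_3x3(np.asarray(img, dtype=float))
    t1, t2 = multi_otsu_two_thresholds(filtered)
    labels = np.digitize(filtered, [t1, t2])  # 0, 1, 2 from dark to bright
    means = [filtered[labels == k].mean() if np.any(labels == k) else -np.inf
             for k in range(3)]
    return labels == int(np.argmax(means)), filtered
```

Because the three classes are ordered by intensity, the selected mask is normally the top class; the explicit comparison of class means simply mirrors the selection rule stated in the text.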

The mathematical representation of the wavelet transform is shown in (1), where “a” and “b” are the scale and translation parameters, respectively. The SVM-based classifier model operates by minimizing a cost function and can be represented as follows:
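For reference, standard textbook forms consistent with the description above are given here, assuming the continuous wavelet transform of a signal x(t) with mother wavelet ψ and a soft-margin linear SVM. The symbols ψ, w, w_0, C, and ξ_i are notation introduced for this illustration (w_0 is used for the SVM bias to avoid a clash with the translation parameter b).

```latex
W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\,
          \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{1}

\min_{\mathbf{w},\, w_0,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
  y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + w_0\right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0
```

Here x_i is the feature vector of image i, y_i ∈ {−1, +1} is its class label, and C controls the trade-off between margin width and training errors.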

The Random Forest-based classifier model builds a strong learner from an ensemble of learners by partitioning the data among the individual trees in the forest, as shown in the equation.

The 72 extracted features were then subjected to statistical analysis to measure the significance of each feature. Subsequently, Random Forest-based feature selection was applied to extract the most useful features, which were used to build classifier models using an SVM with a linear kernel and an RF classifier, with 60% of the data retained for training and 40% for evaluation. The classifier models were then validated using the held-out 40% of the data. For validation, performance measures were computed, including the receiver operating characteristic (ROC) curve, from which the AUC was computed [52]. The classifier performance measures were also computed from the counts TP (true positive), TN (true negative), FP (false positive), and FN (false negative).
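In terms of these counts, the standard definitions of the usual measures are as follows. This assumes that the measures of interest are accuracy, sensitivity, specificity, precision, and F1-score, and that COVID-19 is treated as the positive class; both are assumptions made for this illustration.

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Sensitivity} &= \frac{TP}{TP + FN}, \\
\text{Specificity} &= \frac{TN}{TN + FP}, &
\text{Precision}   &= \frac{TP}{TP + FP}, \\
F_1                &= \frac{2\,TP}{2\,TP + FP + FN}.
\end{aligned}
```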

The computation was performed using Python 3.6 and packages including scikit-learn. The steps are summarized in Algorithm 1.

Input: image x
(1) Preprocess the given image using a median filter
(2) Apply multithreshold segmentation to the filtered image to obtain 3 masks
(3) Select the segmented mask with the highest mean pixel intensity
(4) Extract wavelet features from the 4 sub-bands, Laws texture features, and basic statistical features
(5) Use Random Forest-based feature selection to obtain the best features
(6) Construct ML models using SVM and RF
(7) Perform the validation using training and test data
Output: Trained ML model-based prediction of viral pneumonia or COVID-19
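Steps (5) to (7) of the algorithm can be sketched with scikit-learn as follows. The data are synthetic stand-ins for the 72-dimensional feature vectors, and the `threshold="mean"` selection rule, the number of trees, the standardization before the SVM, the label coding, and the random seeds are all assumptions; the text specifies only RF-based selection, a linear-kernel SVM, an RF classifier, and a 60/40 split.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: 400 images x 72 features, first 10 features informative.
rng = np.random.default_rng(0)
n_samples, n_features = 400, 72
y = rng.integers(0, 2, n_samples)          # 1 = COVID-19, 0 = VP (assumed coding)
X = rng.normal(size=(n_samples, n_features))
X[:, :10] += 1.5 * y[:, None]

# 60% of the data for training, 40% held out for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Step 5: keep features whose RF importance exceeds the mean importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0), threshold="mean")
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 6: linear-kernel SVM (with feature scaling) and RF classifier.
svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
svm.fit(X_train_sel, y_train)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train_sel, y_train)

# Step 7: evaluate both models on the held-out data.
results = {
    "svm": {
        "accuracy": accuracy_score(y_test, svm.predict(X_test_sel)),
        "auc": roc_auc_score(y_test, svm.decision_function(X_test_sel)),
    },
    "rf": {
        "accuracy": accuracy_score(y_test, rf.predict(X_test_sel)),
        "auc": roc_auc_score(y_test, rf.predict_proba(X_test_sel)[:, 1]),
    },
}
```

Fitting the selector on the training split only keeps the held-out data out of the feature selection step, which avoids an optimistic bias in the reported measures.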

3. Results

Representative CXR images of COVID-19 and VP are shown in Figure 2 along with the respective histograms and gradient images. Figures 2(a) and 2(d) depict COVID-19 and VP images, respectively, whereas (b) and (e) represent the respective histograms, while (c) and (f) depict the respective gradient images. Both pathologies demonstrate the presence of infection spread but with varying intensities and possible density of high intensities observed in COVID-19 that could also be observed from the histogram. The gradient images reveal the edge information pertaining to various anatomical structures.

The median-filtered images of the representative COVID-19 and VP cases are shown in Figure 3 along with their histograms and gradient images. Figures 3(a) and 3(d) depict the COVID-19 and VP images filtered using a median filter, respectively, whereas (b) and (e) represent the respective histograms, while (c) and (f) depict the respective gradient images. Only marginal variation is observed after filtering when the histograms are compared with those in Figure 2. The gradient images appear to have enhanced edge information, representing sharper boundaries in both pathologies than the gradient images in Figure 2. In particular, as shown in Figure 3(f), the edge information is more pronounced.

The median-filtered images and the multithreshold Otsu segmentation outputs are depicted in Figure 4. The median-filtered images of COVID-19 and VP are shown in (a) and (f), respectively. The three regions segmented using multithreshold Otsu segmentation are shown in Figures 4(b) and 4(g), respectively. The three regions are binarised according to their labels, as shown in Figures 4(c)–4(e) for COVID-19 and 4(h)–4(j) for VP.

The segmented regions with the highest mean intensities correspond to label three in both COVID-19 and VP, as shown in Figures 4(e) and 4(j), respectively, and are considered for feature extraction.

The wavelet-decomposed images and histograms of the representative COVID-19 and VP cases are shown in Figure 5. Figures 5(a)–5(d) depict the LL, LH, HL, and HH sub-bands of the COVID-19 image, and Figures 5(e)–5(h) represent the corresponding histograms. Figures 5(i)–5(l) depict the LL, LH, HL, and HH sub-bands of the VP image, and Figures 5(m)–5(p) represent the corresponding histograms. Reasonable variations were observed in the LH, HL, and HH histograms of the COVID-19 and VP images. In particular, a higher number of positive coefficients could be observed in the LH and HH sub-bands of the COVID-19 image than in those of the VP image.
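A single-level two-dimensional decomposition into LL, LH, HL, and HH sub-bands can be illustrated with the short sketch below. It uses the Haar wavelet purely because it fits in a few lines of NumPy; this is a substitution made here, since the study refers to biorthogonal wavelets (PyWavelets, for example, exposes such families through `pywt.dwt2`). The function name and the sub-band naming convention are also assumptions, as conventions for LH and HL differ between libraries.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform.

    Returns (LL, LH, HL, HH), each half the size of the input.
    The image is cropped to even dimensions first.
    """
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:rows, :cols]
    a = img[0::2, 0::2]  # top-left pixel of each 2 x 2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-pass in both directions (approximation)
    lh = (a + b - c - d) / 2  # top minus bottom: responds to horizontal edges
    hl = (a - b + c - d) / 2  # left minus right: responds to vertical edges
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh
```

The detail sub-bands take both positive and negative values, which is why their histograms, rather than the images alone, are informative when comparing the two pathologies.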

The feature maps obtained using Laws texture computation are depicted in Figure 6. Figures 6(a)–6(d) represent the Laws feature maps for the COVID-19 image, while Figures 6(e)–6(h) represent the maps belonging to the VP image. On qualitative observation, there appears to be variation between the texture feature values of the COVID-19 and VP images.
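Laws texture maps of this kind can be sketched as follows, using the classical 5-tap level (L5), edge (E5), spot (S5), and ripple (R5) vectors. Which kernel pairs were used in the study is not stated, so the pairing, the 7 × 7 energy window, the reflective border handling, and the omission of local-mean removal are all assumptions of this illustration, as are the function names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Classical 1D Laws vectors of length 5.
KERNELS_1D = {
    "L5": np.array([1, 4, 6, 4, 1], dtype=float),    # level
    "E5": np.array([-1, -2, 0, 2, 1], dtype=float),  # edge
    "S5": np.array([-1, 0, 2, 0, -1], dtype=float),  # spot
    "R5": np.array([1, -4, 6, -4, 1], dtype=float),  # ripple
}


def _correlate_same(img, kernel):
    """2D cross-correlation with reflective padding, same output size."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    windows = sliding_window_view(padded, (kh, kw))  # shape (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def laws_energy_map(img, vertical, horizontal, energy_window=7):
    """Texture energy map for one pair of Laws vectors.

    The 5 x 5 mask is the outer product of the vertical and horizontal
    vectors; the energy is the local mean of the absolute filter response.
    """
    img = np.asarray(img, dtype=float)
    mask = np.outer(KERNELS_1D[vertical], KERNELS_1D[horizontal])
    response = _correlate_same(img, mask)
    box = np.full((energy_window, energy_window), 1.0 / energy_window**2)
    return _correlate_same(np.abs(response), box)
```

For example, `laws_energy_map(image, "L5", "E5")` responds to intensity changes along the horizontal direction, whereas `laws_energy_map(image, "E5", "L5")` responds to changes along the vertical direction.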

First-order, wavelet, and texture features were obtained directly from the segmented ROI of the CXR images; these features were then tested for statistical significance and used to build an ML model.
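The eight statistics computed per map can be sketched as below. The arithmetic of the feature vector is consistent with nine maps per image (the preprocessed image, four wavelet sub-bands, and four Laws maps, giving 9 × 8 = 72), although the exact grouping is inferred here rather than stated. The function names are hypothetical, and the kurtosis convention (Pearson, equal to 3 for a normal distribution) is an assumption.

```python
import numpy as np

FEATURE_NAMES = ("mean", "std", "skewness", "kurtosis", "p5", "p10", "p90", "p95")


def first_order_features(values):
    """Eight first-order statistics of a 1D or masked set of pixel values."""
    v = np.asarray(values, dtype=float).ravel()
    mean, std = v.mean(), v.std()
    z = (v - mean) / std if std > 0 else np.zeros_like(v)
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)  # Pearson definition (assumed convention)
    p5, p10, p90, p95 = np.percentile(v, [5, 10, 90, 95])
    return np.array([mean, std, skewness, kurtosis, p5, p10, p90, p95])


def feature_vector(maps_and_masks):
    """Concatenate the eight statistics over a list of (map, mask) pairs.

    A mask of None means the whole map is used.
    """
    parts = []
    for image_map, mask in maps_and_masks:
        image_map = np.asarray(image_map, dtype=float)
        pixels = image_map if mask is None else image_map[mask]
        parts.append(first_order_features(pixels))
    return np.concatenate(parts)
```

With nine (map, mask) pairs, `feature_vector` returns 72 values in the order given by `FEATURE_NAMES` for each map.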

The RF-based feature selection algorithm selected 39 of the 72 features as important, and these were used for formulating the classifier model. Representative values of the 8 features computed from the HH sub-band of the wavelet-transformed images are listed in Table 2.

The ROC plot generated using the SVM and RF classifier model for the test data is shown in Figure 7(a). The AUC of 0.97 obtained indicates that the RF classifier model can differentiate COVID-19 and VP to a large extent in comparison with the SVM. The confusion matrices for the SVM and RF are shown in Figures 7(b) and 7(c), respectively. A higher number of TPs was observed in the confusion matrix of the RF classifier, while marginally fewer FNs could be noticed in the confusion matrix of the SVM.
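The AUC can also be computed directly from classifier scores, without plotting the curve, through its rank interpretation: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below assumes that COVID-19 is coded as the positive class; the function name is hypothetical.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Area under the ROC curve from pairwise score comparisons."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    greater = (pos[:, None] > neg[None, :]).sum()  # correctly ordered pairs
    ties = (pos[:, None] == neg[None, :]).sum()    # tied pairs count as half
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

An AUC of 0.5 corresponds to scores that carry no information about the class, and 1.0 to scores that separate the two classes perfectly.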

The performance measures obtained using the two classifier models are listed in Table 3. In comparison, increased sensitivity, F1-score, and AUC can be seen from the RF classifier.

4. Discussion

Efficient screening and diagnosis methods for COVID-19 that exploit state-of-the-art image processing and ML-based approaches are needed. CXR images used by physicians for screening purposes provide information about the presence of the infection region and the spread of the infection. The median filter is a standard filtering process to reduce the variations in pixel intensities while preserving the edge information. A mask size of 3 × 3 was selected in this study to preserve the local morphology of the anatomical structures. In this work, the segmented masks generated from the preprocessed images were observed to be effective in segmenting the infection region; however, the threshold-based approach was also observed to over-segment, so that some noninfection regions were included. This might be due to the resemblance between the infection regions and certain anatomical regions with respect to pixel intensities. The histograms were analysed deliberately because the features obtained from the preprocessed images and the wavelet-decomposed images are histogram-based features.

Hence, the morphology of the histograms was of assistance in the comparative analysis. Certain intensity- and WT-based feature values were observed to be more effective for differentiating COVID-19 and VP. Even though both pathologies seem to be represented by bright regions, there exist subtle variations which are picked up by most features. The differentiation of COVID-19 from VP and other CXR images has been attempted with DL and other artificial intelligence-based methods in a broad range of studies [14]. Sekeroglu evaluated the transfer learning approach by means of pretrained networks such as VGG19 and Inception ResNet [57]. Pal described a random forest classifier as a combination of tree classifiers, in which each classifier is generated using a random vector sampled independently from the input vector, and each tree is used for classifying an input vector [58]. In another study, an audio signal of 4-second duration was used for feature extraction; it was transformed into a spectrogram, and the extracted features were classified using ML algorithms [59]. Intracranial haemorrhage (ICH) is a serious concern with high rates of mortality, and a deep learning technique has been proposed for it that depends on a massive number of slice labels for training [60]. Kuruoglu and Li proposed a technique using the unscented Kalman filter for estimating epidemiological parameters of COVID-19; the method addresses non-Gaussianity and nonlinearity while offering computational simplicity [61].

Rodrigues et al. applied feature extractors to the regions of interest (ROIs) that include nodules; the malignancy of the nodules is then analysed in the classification step using ML techniques [62]. The results of the proposed technique are compared with the works cited in references [53–56]. However, these analyses were performed using entire images in the DL sense.

In this work, handcrafted features were investigated from preprocessed images, wavelet-decomposed images, and Laws texture maps to understand the local inherent intensity variations. Combining the features and then selecting from the combined set was important for formulating the most feasible feature vector to build the ML model. The ML-based results further stress that a comprehensive pipeline is necessary to address the challenges of categorising the pathologies. Although both ML approaches provided good results, the RF-based model performed better overall. In comparison with the DL approaches, the RF model with the features extracted herein appears to have outperformed some of the studies in terms of accuracy, as shown in Table 4. The time elapsed to compute the different modules using a desktop with an Intel Core i5 processor, Python 3.7, and the scikit-learn ML package is presented in Table 5.

5. Conclusion

A carefully designed, comprehensive methodology utilising image analysis techniques and ML models can aid physicians and radiologists in performing efficient and accurate screening for COVID-19. Preprocessing and determining the effective region for extracting features might be essential, as this study suggests. In particular, the segmentation mask, even though not robust, can locate the majority of the local infection region, which might be critical in the pipeline design. The study, which integrates first-order features from preprocessed images, wavelet sub-bands, and Laws texture maps with the ML approach, serves to discriminate the effects of COVID-19 from those of viral pneumonia for effective and exact diagnosis to mitigate the spread of the infection. Identifying the handcrafted features is a very laborious process and is the main limitation of the work. In the future, researchers can incorporate texture features and other forms of features, including morphological features, to distinguish the pathologies.

Data Availability

The data used in this paper are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was funded by Princess Nourah bint Abdulrahman University Researchers Supporting under Project no. PNURSP2022R197, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.