1 Introduction

The emergence of Covid-19, which originated in China in November 2019 and subsequently spread across the world, has posed a significant threat to human life. It has been reported that more than 63.2 million people worldwide have already been infected, with approximately 1.47 million deaths. The World Health Organization (WHO) continuously provides the information that nations need to protect themselves against Covid-19 (Fong et al. 2021). Countries such as the United States, India, Brazil, Russia, France, Italy, and China have suffered the most from this threat (Salepci et al. 2020). No vaccine was available during the first 6–8 months of the pandemic, although many countries were working on its production; a few countries have since developed vaccines for Covid-19, which are still in the development or testing stage. People infected with Covid-19 show mild to moderate symptoms such as fever, cough, and shortness of breath. However, some patients develop severe pneumonic conditions in their lungs that can result in death (Elibol 2020; Padda et al. 2020; Smith et al. 2020; Sharma et al. 2020). Most of the people who died from Covid-19 had suffered severe chest congestion (pneumonia), which caused a significant reduction in oxygen level and, in turn, major heart attack (Chen et al. 2021). Pneumonia itself is a lung disease that leads to inflammation of the small air sacs within the lungs; the lungs may fill with a significant amount of fluid, making breathing difficult. Pneumonia may be caused by viral infections (such as Covid-19 or influenza), the common cold, and bacterial infections. With the arrival of Covid-19, it has become very challenging for medical experts to distinguish lung infections (viral/bacterial pneumonia versus Covid-19 pneumonia) from chest X-ray images (Bentivegna et al. 2020; Tabatabaei et al. 2020; Zhan et al. 2021). The lung infection caused by the novel coronavirus is called Novel Coronavirus Infected Pneumonia (NCIP) (Fong et al. 2021).

Apart from this, lung cancer is another disease that poses a significant threat to humans (Sun et al. 2016; Zhou et al. 2002). The WHO estimates that approximately 8 million people have suffered from lung cancer to date, although this number is lower than the number of lung diseases caused by Covid-19 and pneumonia within just 15 months. Several studies have addressed the early prediction of lung cancer using computer vision and soft computing methods (Kumar et al. 2015; Li et al. 2018; Makaju et al. 2018). Medical imaging techniques such as X-ray, magnetic resonance imaging (MRI), computed tomography (CT), and isotope imaging are regularly used for the detection of lung diseases. Among these, X-ray and CT images are used most frequently by radiologists and physicians to discover lung diseases. Hence, many doctors recommend the chest X-ray for lung disease analysis, especially during the Covid-19 period. For many decades, medical practitioners have used X-ray imaging to analyze and explore various abnormalities in human body organs (Khatri et al. 2020). Many studies have shown that the X-ray technique is a cost-effective method for disease diagnosis, revealing pathological alterations while remaining economical and non-invasive (Yasin et al. 2020). Lung infections appear in chest X-ray images in the form of consolidations, blunted costophrenic angles, broadly distributed nodules, cavitations, and infiltrates (Angeline et al. 2020). Therefore, radiologists detect several conditions such as pneumonia, pleurisy, nodules, effusion, infiltration, fractures, pneumothorax, and pericarditis using X-ray images (Padma and Kumari 2020; Rousan et al. 2020).

Detection and classification of lung diseases using chest X-ray images is a complex process for radiologists. Therefore, it has received significant attention from researchers developing automatic lung disease detection techniques (Avni et al. 2011; Jaeger et al. 2014; Pattrapisetwong and Chiracharit 2016). Over the past decade, many computer-aided diagnosis (CAD) systems have been introduced for lung disease detection using X-ray images, but such systems have failed to achieve the performance required for lung disease detection and classification. The lung infections associated with Covid-19 have made these tasks even more challenging for such CAD systems. It is essential to detect the appearance of pneumonia in the lungs and classify it into Covid-19, bacterial, and viral infections, since this classification enables appropriate medical attention for pneumonic patients. Several CAD systems have been presented for Covid-19, and automated image processing and deep learning techniques (Ge et al. 2019) have been developed for pneumonia detection using chest X-ray images. Because deep learning learns and extracts features fully automatically, it requires a long time to train on a complete dataset; therefore, such solutions are not reliable and robust as datasets grow. Deep learning techniques such as the convolutional neural network (CNN) have gained attention for lung disease detection due to their better accuracy and feature representation (Asuntha and Srinivasan 2020). However, computational complexity has not yet been discussed or addressed in this body of research: the computational complexity of a CNN is high due to the high-dimensional feature space computed for each X-ray image.

In this paper, we focus on the early detection and classification of lung diseases from raw X-ray images to support appropriate treatment, using a semi-automated approach that combines robust feature extraction with deep learning at minimum computational overhead. The need for robust and reliable detection and classification of lung diseases (such as Covid-19 and viral/bacterial pneumonia) motivates the proposed framework. The proposed model is called Fusion- and normalization-features-based RNN-LSTM (F-RNN-LSTM). This consolidated framework focuses on robustness and high performance with minimum computational effort. The input raw X-ray image is pre-processed by applying median filtering and histogram equalization. To extract the region of interest (ROI) from the enhanced image, a dynamic region-growing technique that handles images of different modalities and dimensions is proposed. Feature vectors are then built from the ROI images using visual, texture, intensity, and invariant moment features. We also propose a robust feature normalization technique to enhance detection accuracy. A number of machine learning and soft computing techniques are then applied for classification, including SVM, ANN, KNN, and an ensemble classifier. In addition, a deep learning approach using an RNN with the LSTM model is designed to estimate the probabilities of lung conditions and to make early predictions with higher accuracy and minimum computational effort. A thorough experimental evaluation of the proposed model has been performed against state-of-the-art Covid-19 detection methods.

The rest of the article is organized as follows: Sect. 2 presents a brief review of related works, and the motivations and research contributions are given in Sect. 3. Section 4 illustrates the proposed methodology, and Sect. 5 demonstrates the experimental results on different datasets. Finally, Sect. 6 summarizes the major findings and concludes this research work.

2 Related works

Recently, the accurate detection of lung diseases such as Covid-19 and pneumonia (viral/bacterial) has received significant attention, and computer vision and soft computing techniques have been used to design CAD systems for X-ray-based lung disease prediction. Researchers have relied heavily on deep learning approaches due to their superiority in automatic feature extraction and their high detection accuracy. In this section, we present a brief review of deep learning and computer vision techniques for pneumonia and Covid-19 detection using chest X-ray images. Furthermore, we summarize the research motivation and highlight the major contributions of this work.

2.1 Disease detection using X-ray images

A deep learning-based approach for chest disease detection using X-ray scans was proposed in Abiyev and Maaitah (2018). The authors designed and evaluated an automated CNN model for chest disease diagnosis on an X-ray image dataset and reported significantly better performance than other soft computing techniques in terms of training accuracy, testing accuracy, and training time. Pneumonia detection from chest X-ray images using computer vision and soft computing techniques was performed in Varela-Santos and Melin (2020): ROIs were extracted by segmenting the X-ray images, texture features were computed, and a neural network was applied for classification. A CNN was used for pneumonia detection from chest X-ray images in Lin et al. (2019); the authors prepared a dataset from the Kaggle repository and designed a ConvNet to process the input X-ray images. Another lung disease detection approach using computer vision methods and a cooperative CNN model was proposed in Wang et al. (2019), where a segmentation algorithm located the ROI in lung images, local and global features were extracted for effective pneumonia classification, and a cooperative CNN model performed the classification. A further deep learning-based approach was introduced in Thakur et al. (2021), where the authors used a CNN variant, VGG16, to classify pneumonia from a chest X-ray image dataset, applying transfer learning and fine-tuning during the learning phase. An approach for detecting pneumonia using X-ray images and a CNN was proposed in Angeline et al. (2020); the CNN was trained to classify each input X-ray image as normal or pneumonic. Accurate and efficient pneumonia detection from chest X-ray images was presented in Sarkar et al. (2020): the input X-ray image was first pre-processed using bilateral filtering and contrast enhancement, and deep residual learning with separable convolutional networks was then built for classification. Another CNN-based pneumonia detection system was introduced in Nath and Choudhury (2020), in which X-ray images of normal and abnormal conditions were used to train a model that detects the presence of pneumonia. A weighted soft computing method was proposed in Hashmi et al. (2020) using weighted predictions from conventional deep learning systems such as DenseNet121, MobileNetV3, Xception, and ResNet, with a supervised learning mechanism predicting the outcomes according to the dataset representation. An ensemble technique for pneumonia detection from chest X-ray images was proposed in Habib et al. (2020); the authors designed a deep CNN model called CheXNet with VGG-19 for feature extraction, ensembled these features for classification, and used methods such as the synthetic minority oversampling technique (SMOTE), random over-sampler (ROS), and random under-sampler (RUS) to address data irregularity.

Similarly, various deep learning-based studies have recently been proposed for Covid-19 detection using chest X-ray images. The deep learning model proposed in Abbas et al. (2020), called decompose, transfer, and compose (DeTraC), was used for the classification of Covid-19 from X-ray images and is robust against data irregularity problems. Covid-19 detection from chest X-ray images using deep learning was likewise proposed in Jain et al. (2021); the authors collected X-ray images of Covid-19 and normal patients, applied pre-processing and data augmentation, and designed a CNN model with automatic feature extraction for classification.

A CNN was again designed in Dansana et al. (2020) for the classification of pneumonia using VGG-19, a decision tree, and Inception_V2 on CT scan and X-ray images. An automatic framework for coronavirus disease detection and classification was proposed in Apostolopoulos and Mpesiana (2020); the authors built a dataset of chest X-ray images of normal and Covid-19 subjects and analyzed a CNN model for automatic disease prediction. An investigation-based approach was proposed in Pham (2021), where the authors fine-tuned pre-trained CNNs for Covid-19 classification from chest X-ray images, investigating fine-tuning of pre-trained CNNs as an Artificial Intelligence (AI) solution for rapid and effective Covid-19 detection. The expert-designed model COVIDetectioNet was proposed in Turkoglu (2021) for the classification of Covid-19 from chest X-ray images; it selects features from a combination of deep features produced by a pre-trained CNN-assisted AlexNet model with transfer learning, applies the Relief feature selection technique to pick robust features from all layers of the deep architecture, and then uses an SVM for classification. In Butt et al. (2020), the authors first reviewed various CNN models used for classifying lung conditions into Covid-19, viral pneumonia, and normal from chest scans, and then designed a CNN model to classify pneumonia and Covid-19 lung infections using chest CT scans. In Hira et al. (2020), a method for detecting and classifying Covid-19, bacterial pneumonia, viral pneumonia, and the normal class was proposed and applied to chest X-ray datasets of different sizes using a deep transfer learning approach. Two ensemble deep transfer learning systems were designed in Gianchandani et al. (2020) for Covid-19 detection using chest X-ray images; pre-trained models were used to enhance detection performance, and the detection of Covid-19, bacterial pneumonia, and viral pneumonia was performed.

More recently, a few deep learning models combining CNNs and transfer learning have been proposed for Covid-19 prediction using chest X-ray images. In Singh et al. (2020), a CNN-based model enhanced with a multi-objective adaptive differential evolution technique was proposed for Covid-19 detection from chest X-ray images. Another deep model based on densely connected convolutional networks, ResNet152V2, and VGG16 (Singh et al. 2021) was ensembled to improve accuracy, classifying chest X-ray images into Covid-19, pneumonia, tuberculosis, and healthy. A modified VGG16 and DenseNet201 with ResNet152V2 was proposed for multiclass and binary classification of chest X-ray images (Gianchandani et al. 2020): Covid-19 positive, pneumonia, and normal classes were used for multiclass classification, whereas Covid-19 positive and negative classes were used for binary classification.

3 Research motivations

Recently proposed methods for the detection and classification of lung diseases such as Covid-19, viral pneumonia, and bacterial pneumonia utilize deep learning models primarily based on CNNs. However, differentiating Covid-19-caused pneumonia from viral and bacterial pneumonia remains a challenging research problem due to the lack of sufficient experiments on scalable datasets. The following research gaps in recent methods for Covid-19 and pneumonia detection using X-ray scans are the key motivations for the proposed method.

  • Approximately 95% of CNN-based methods failed to consider the challenge of X-ray image quality enhancement. Therefore, infected areas of X-ray images were not accurately identified during automatic CNN feature extraction.

  • Existing CNN-based works fed the complete lung image into automatic feature extraction, but only the features of the infected lung regions are relevant for diagnosis. The lack of ROI estimation in chest X-ray images leads to high-dimensional and irrelevant features for classification, and it also prevents disease severity analysis due to the absence of ROI-specific features.

  • The high training time of CNNs is a challenging issue that leads to computationally inefficient solutions for the early detection of lung diseases.

  • Covid-19 and pneumonia detection methods using deep learning have been evaluated on small X-ray sample sets in which only 10–15% of the samples were used for testing and 80–85% for training and validation. To claim efficiency and reliability, such models require a better training-to-testing ratio.

  • The automatic feature extraction of CNNs relies on models pre-trained on ImageNet datasets that are irrelevant to lung disease prediction. As Covid-19-caused pneumonia is relatively new, the lack of sufficient medical data in CNN pre-trained models leads to unreliable feature extraction.

3.1 Contributions

To address the major research issues of existing Covid-19 prediction methods, this work proposes a novel framework for the early detection of lung diseases into the classes Covid-19, viral pneumonia, bacterial pneumonia, and normal using machine and deep learning. The proposed F-RNN-LSTM model uses robust computer vision and soft computing algorithms, and the deep learning technique RNN with LSTM is introduced to classify lung diseases with high accuracy and low training time. The major contributions of this research are as follows.

  • We propose a consolidated framework to detect lung disease using chest X-ray scans by enhancing the raw image, locating the lung ROI, extracting the ROI-specific features, and applying the soft computing methods for automatic classification.

  • The raw X-ray images are pre-processed for quality enhancement using median filtering, histogram equalization, and intensity adjustment. The enhanced image visualizes the infected areas more clearly and supports accurate prediction.

  • The enhanced X-ray images have been further used for ROI estimation using an adaptive segmentation algorithm. The segmentation algorithm extracts only lung-specific regions and discards other areas before applying the feature extraction.

  • The robust set of features such as Histogram of Oriented Gradient (HOG), texture, invariant moment and ROI intensity features have been extracted from the segmented image. After that, we apply the feature scaling (normalization) method to enhance detection accuracy. The normalized features can be used for disease severity analysis by medical practitioners effectively.

  • The RNN deep learning model takes the normalized features as a sequential input and builds an LSTM network followed by a fully connected layer, a SoftMax layer, and a classification layer. We designed the F-RNN-LSTM model with the aim of minimizing computation time and maximizing accuracy.

  • Soft computing methods such as SVM, KNN, ANN, and an ensemble classifier have been trained with the proposed computer vision pipeline for evaluation.

  • Results for the proposed model have been obtained on two scalable datasets using a proper training-to-testing ratio (70–30%). The outcome of the proposed F-RNN-LSTM model has been compared with existing state-of-the-art methods.

The proposed work focuses on the detection and classification of Covid-19 disease. It first provides an effective way to analyse lung diseases using soft computing and machine learning techniques and then introduces a new deep learning approach, F-RNN-LSTM, for accurate prediction. Additionally, it provides disease detection with severity analysis, which is vital for any CAD tool. The proposed model emphasizes the enhancement of raw chest X-ray images and the extraction of the infected portions, i.e., the ROI and its relevant features, which is preferable to the high-dimensional feature representation of CNNs and their variants. Feature scaling (normalization) is another key aspect of the proposed model; it not only improves classification accuracy but also helps in estimating disease severity. In real-world scenarios, automatic severity analysis becomes vital for appropriate treatment; however, existing techniques do not support severity analysis due to the lack of ROI-specific features. Vanishing gradients is one of the major issues in deep CNN models and affects both automatic feature extraction and classification, limiting classification accuracy significantly. The proposed model mitigates this problem by designing a sequential deep learning model that takes the normalized hybrid feature vector as input to the network and provides better detection and classification.

4 Proposed methodology

This section presents the step-by-step details of the proposed model for the robust and efficient classification of Covid-19 disease from input chest X-ray images. Figures 1 and 2 show the two proposed architectures using soft computing, machine learning, and deep learning techniques, the latter referred to as F-RNN-LSTM.

Fig. 1
figure 1

Framework of lung disease detection using soft computing techniques

Fig. 2
figure 2

Architecture of proposed F-RNN-LSTM for lung disease detection

The model shown in Fig. 1 takes raw chest X-ray images, enhances their quality, extracts the ROI of the lung regions, performs ROI/lung-specific feature extraction, fuses and normalizes the features, and finally applies soft computing and machine learning methods for detection and classification. These techniques (SVM, ANN, Ensemble, and KNN) take the pre-trained database (DB) as input and return the detection outcome, which provides the detection and classification of the particular lung disease.

Figure 2 shows the proposed deep learning-based approach, which achieves better efficiency and robustness for lung disease detection. We propose an RNN-LSTM deep learning architecture that is trained with the normalized and fused features of the samples and classifies lung diseases into the Covid-19, viral pneumonia, bacterial pneumonia, and normal categories. The pre-trained DB is used during probability estimation: the sequential features of the input images are partially learnt against the pre-trained DB, and the prediction probability is estimated with minimum processing time. This design of the deep training and detection model reduces both training and testing time in comparison with CNN-based models. Furthermore, after disease detection (shown as a Covid-19 detection in Fig. 2), a disease severity analysis is performed with the lung-specific features.

To the best of our knowledge, this is the first attempt to consolidate deep learning and severity analysis for the prediction of Covid-19 and pneumonia diseases. The severity analysis is possible because computer vision methods are used to estimate the lung-specific features (low-level and high-level). Along with the detection of lung diseases, severity analysis is vital for the appropriate treatment of patients. In this regard, chest congestion is estimated as low, mild, or high when the mean feature value is below 0.3, between 0.3 and 0.6, or above 0.6, respectively. These ranges have been estimated from the mean of the normalized hybrid feature vector, using min–max normalization. For both datasets, we computed the mean value for each image in the range 0–1 and correlated these values with the visual appearance of the infected regions. From this correlation and experimental analysis, we obtained the above ranges for low, mild, and high chest congestion.
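The severity rule described above can be summarised as a small decision function. The sketch below is a minimal MATLAB illustration of it, assuming the mean of the min–max normalised hybrid feature vector and the 0.3/0.6 thresholds reported in this section; the function name is hypothetical.

```matlab
% Minimal sketch of the severity rule, assuming min-max normalised features and
% the 0.3 / 0.6 thresholds; the function name is hypothetical.
function severity = congestionSeverity(Fnorm)
    m = mean(Fnorm);          % mean of the normalised 1x36 feature vector
    if m < 0.3
        severity = 'low';     % low chest congestion
    elseif m <= 0.6
        severity = 'mild';    % mild chest congestion
    else
        severity = 'high';    % high chest congestion
    end
end
```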

4.1 X-ray image quality enhancement

As the input chest X-ray scans are of low quality and contain noise, lung regions affected by congestion or fluid may be suppressed. Most recent techniques apply deep learning models directly without quality enhancement; however, such methods may not remain reliable in the long run. In the proposed F-RNN-LSTM model, the input chest X-ray image \(I\) is pre-processed using adaptive intensity-value adjustment, median filtering, and histogram equalization. The first operation adjusts the image intensity values of low-contrast X-ray images and is mainly used to enhance the contrast as:

$$I^{1} = imadjust(I)$$
(1)

where \({I}^{1}\) is the outcome of the contrast enhancement step using the function \(imadjust\).

Median filtering is then applied to remove the noise in the contrast-enhanced image; both the intensity adjustment and the X-ray acquisition itself introduce noise into the image. For the X-ray datasets considered here, median filtering gives more effective enhancement than adaptive bilateral filtering (Asuntha et al. 2020), average filtering, and Wiener filtering. Median filtering is a lightweight technique commonly used in image processing applications, as it is effective under the joint constraints of noise reduction and edge preservation. We use a 2D median filter that moves through the image pixel by pixel, replacing every value with the median value of its neighbouring pixels; the neighbourhood pattern is determined by the window size, and a 3 × 3 window is used in this work. The 2D median filter is applied on \({I}^{1}\) as:

$$I^{2} (i,j) = median\,\left\{ {I^{1} (k,l)\left| {(k,l)} \right. \in w(i,j)} \right\}$$
(2)

where \({I}^{2}\) is the outcome of the median filtering and \(w(i,j)\) is the 3 × 3 window centred at pixel \((i,j)\).

The outcome of the image enhancement step in the proposed model is demonstrated on two sample chest X-ray images in Fig. 3. The poor-quality X-ray images are enhanced effectively, preserving the edges and removing the noisy portions. We also tested other filtering approaches (adaptive bilateral filtering, average filtering, and Wiener filtering) and contrast enhancement techniques (histogram equalization and contrast-limited adaptive histogram equalization), but these suffered from loss of edge information, increased background-noise contrast, and loss of lung-region data. The outcome shows that the raw X-ray images are improved with optimal contrast and image quality; this enhancement helps to accurately detect the lung regions during the ROI extraction process.
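A minimal MATLAB sketch of this pre-processing step is given below, assuming the Image Processing Toolbox functions imadjust and medfilt2; the file name is a placeholder.

```matlab
% Minimal sketch of the pre-processing of Eqs. (1)-(2), assuming the Image
% Processing Toolbox; the file name is a placeholder.
I = imread('chest_xray.png');      % raw chest X-ray image
if size(I, 3) == 3
    I = rgb2gray(I);               % some scans are stored as RGB
end
I1 = imadjust(I);                  % Eq. (1): adaptive intensity / contrast adjustment
I2 = medfilt2(I1, [3 3]);          % Eq. (2): 3x3 median filtering
```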

Fig. 3
figure 3

Outcomes of chest X-ray image enhancement

4.2 ROI extraction

The next step of the proposed model is ROI extraction. For chest X-ray images, the ROI represents the lung area that may contain infections. In previous studies, the entire image was supplied to the network for feature extraction, which results in redundant and irrelevant features for classification and poor severity analysis. ROI extraction is performed by image segmentation, which detects the boundaries in the enhanced image and divides it into two regions: meaningful (white) and non-meaningful (black). The white areas represent the image ROI and are used in the feature extraction process. Although several techniques are available for image segmentation, none of the existing works has used ROI extraction on lung images to detect Covid-19 and pneumonia infections. Frequently used segmentation techniques include thresholding, binarization, K-means, watershed, and Fuzzy C-means, as well as segmentation based on optimization methods such as ant colony optimization (ACO), the genetic algorithm (GA), and the artificial bee colony (ABC). These methods suffer from problems such as over-segmentation, inaccurate ROI extraction, poor adaptability, and high computation time. This work therefore proposes an adaptive segmentation method based on region growing and morphological operations, aiming at accurate ROI extraction with minimum computation time.

figure a

Algorithm 1 shows a simple and accurate approach for adaptive ROI extraction from an enhanced chest X-ray image. The first step detects the edges by applying Canny edge detection to preserve the edge information. The segmentation starts from the edge-detected image, which is divided into \(N\) grids; for each grid, a dynamic thresholding approach performs the segmentation, and once all grids are segmented they are placed back into the original image. Post-processing is then applied to recover accurate and original intensity information in the ROI. In post-processing, we first perform the morphological operations of erosion and closing: morphological erosion discards small objects and islands and keeps only substantive regions, while morphological closing fills the small holes left after erosion and preserves the size and shape of the segmented regions. The segmented areas are finally filled with the original intensity values of the enhanced image for further processing. The key aspect of this approach is the adaptive threshold computed for each input image; dynamic thresholding is an effective technique for accurate ROI extraction while reducing background noise. To compute the dynamic threshold \(T\) of the image \({I}^{2}\), we use the following algorithm:

1. Compute the initial threshold \(t\) value as:

$$t = mean\,(mean(I^{2} ))$$
(3)

2. Divide the image into two parts \({p}^{1}\) and \({p}^{2}\):

$$p^{1} = \left\{ {\begin{array}{*{20}c} {I^{2} \left( {i, j} \right), I^{2} \left( {i, j} \right) \le t } \\ {0, Otherwise} \\ \end{array} } \right.$$
(4)
$$p^{2} = \left\{ {\begin{array}{*{20}c} {I^{2} \left( {i, j} \right), I^{2} \left( {i, j} \right) > t } \\ {0, Otherwise} \\ \end{array} } \right.$$
(5)

3. Compute the mean values of both parts \({p}^{1}\) and \({p}^{2}\):

$$m^{1} = mean\,(mean(p^{1} ))$$
(6)
$$m^{2} = mean\,(mean(p^{2} ))$$
(7)

4. Compute the final threshold value of \({I}^{2}\) as

$$T = (m^{1} + m^{2} )/2$$
(8)
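The sketch below illustrates Eqs. (3)–(8) in MATLAB on the enhanced image \(I^{2}\). It applies the final threshold globally and only summarises the grid-wise segmentation and morphological post-processing of Algorithm 1, so the structuring-element sizes and the omitted grid step are assumptions.

```matlab
% Sketch of the dynamic threshold of Eqs. (3)-(8) applied to the enhanced image I2.
% Algorithm 1's grid-wise step is summarised (threshold applied globally); the
% structuring-element sizes are assumptions.
I2d = im2double(I2);                     % work in the [0, 1] range
t   = mean(I2d(:));                      % Eq. (3): initial threshold
p1  = I2d .* (I2d <= t);                 % Eq. (4): part at or below t (zeros elsewhere)
p2  = I2d .* (I2d >  t);                 % Eq. (5): part above t (zeros elsewhere)
m1  = mean(p1(:));                       % Eq. (6)
m2  = mean(p2(:));                       % Eq. (7)
T   = (m1 + m2) / 2;                     % Eq. (8): final dynamic threshold

E    = edge(I2d, 'canny');               % edge map seeding Algorithm 1 (not used further here)
mask = I2d > T;                          % meaningful (white) regions
mask = imerode(mask, strel('disk', 2));  % discard small objects and islands
mask = imclose(mask, strel('disk', 5));  % fill small holes, keep region shape
I3   = I2d .* mask;                      % ROI filled with original enhanced intensities
```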

Figure 4 demonstrates the outcome of the proposed segmentation approach for accurate and robust ROI extraction from the enhanced chest X-ray images. The outcomes show the importance of both enhancement and ROI extraction: the ROIs of both test samples are extracted accurately regardless of their modalities. Although the input images are blurry and have low contrast in the lung regions, the proposed ROI extraction detects the lung regions correctly, which helps to estimate accurate lung-specific features for detection and severity analysis.

Fig. 4
figure 4

Illustrations of ROI extraction of raw X-ray images

4.3 Feature extraction and normalization

The ROI-extracted images are used for feature extraction. We consider different categories of features: visual, texture, intensity, and geometric moment features. The visual features are extracted by the HOG descriptor, and the texture features are extracted using the gray-level co-occurrence matrix (GLCM) with four offsets. Eight intensity features and eight geometric moment invariant features are also extracted from the ROI. In total, 36 features are extracted from each ROI image: 4 HOG features, 8 intensity features, 8 geometric moment features, and 16 texture features. These features were selected to build a lung-specific feature vector that improves accuracy with minimum computational effort. Other feature extraction techniques, such as the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), the discrete wavelet transform (DWT), and local binary patterns (LBP), may also be explored for efficient learning and classification.

4.3.1 HOG features

The HOG is a visual feature descriptor similar to SURF and SIFT. It represents the distribution of image gradients over different orientations, focuses on the shape/structure of an object, and extracts edge directions. It works as follows:

  • Divide the ROI image into blocks of size 16 × 16, each of which is divided into four cells of size 8 × 8.

  • Compute the gradient magnitudes and orientations within each cell.

  • Compute the histogram of gradient orientations for each cell.

  • Link the orientation histograms of the cells to represent each block.

  • Concatenate all block descriptors to form the HOG feature vector.

As HOG returns a large array of feature values, we apply statistical measures to reduce it to 4 HOG features: the mean, standard deviation, variance, and maximum.
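A possible MATLAB realisation of these four statistical HOG features is sketched below, assuming the Computer Vision Toolbox function extractHOGFeatures with 8 × 8 cells and 2 × 2-cell (16 × 16 pixel) blocks.

```matlab
% Sketch of the four statistical HOG features, assuming extractHOGFeatures from
% the Computer Vision Toolbox with 8x8 cells and 2x2-cell (16x16 pixel) blocks.
hog = extractHOGFeatures(I3, 'CellSize', [8 8], 'BlockSize', [2 2]);
F1  = [mean(hog), std(hog), var(hog), max(hog)];   % mean, std, variance, maximum
```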

4.3.2 Texture features

The well-known GLCM technique is used to extract 16 texture features from 4 GLCM properties: contrast, correlation, energy, and homogeneity. We first compute the GLCM of the ROI image using the four offsets [0 1; −1 1; −1 0; −1 −1] as:

$$Gm = glcm\,(I^{3} ,[0\;1;\; - 1\;1;\; - 1\;0;\; - 1\; - 1])$$
(9)

Using \(Gm\), the four texture properties are computed, each of size \(1\times 4\), which builds a \(1\times 16\) feature vector for each input ROI image. Let \(Gm\) be the GLCM matrix and \(L\) the maximum possible quantized value. Table 1 shows the texture features with their equations.

Table 1 Texture features
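A minimal MATLAB sketch of this step is shown below, assuming the Image Processing Toolbox functions graycomatrix and graycoprops with the four offsets of Eq. (9).

```matlab
% Sketch of the 16 GLCM texture features, assuming graycomatrix / graycoprops
% from the Image Processing Toolbox with the four offsets of Eq. (9).
offsets = [0 1; -1 1; -1 0; -1 -1];
Gm      = graycomatrix(I3, 'Offset', offsets);     % one GLCM per offset
props   = graycoprops(Gm, {'Contrast', 'Correlation', 'Energy', 'Homogeneity'});
F2      = [props.Contrast, props.Correlation, props.Energy, props.Homogeneity]; % 1x16
```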

4.3.3 Intensity features

The lung-specific features must include intensity features for reliable and efficient classification and severity analysis. We therefore extract 8 intensity features from the ROI images: mean, standard deviation, biased skewness, bias-corrected skewness, biased kurtosis, bias-corrected kurtosis, entropy, and maximum. Table 2 lists these parameters as applied to an ROI image \({I}^{3}\) of size \(m \times n\). The biased skewness tends to differ systematically from the population skewness (represented by 0), whereas the corrected skewness compensates for this systematic bias (represented by 1); the same distinction applies to kurtosis.

Table 2 Intensity features
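The following MATLAB sketch computes the eight intensity features of Table 2. It restricts the statistics to the non-zero ROI pixels and uses MATLAB's flag convention for skewness and kurtosis (1 = biased, 0 = bias-corrected), both of which are assumptions.

```matlab
% Sketch of the eight intensity features of Table 2, restricted to the non-zero
% ROI pixels (an assumption). MATLAB's flag convention is used for skewness and
% kurtosis: 1 = biased, 0 = bias-corrected.
v  = I3(I3 > 0);                          % intensity values inside the segmented ROI
F3 = [mean(v), std(v), ...
      skewness(v, 1), skewness(v, 0), ... % biased / bias-corrected skewness
      kurtosis(v, 1), kurtosis(v, 0), ... % biased / bias-corrected kurtosis
      entropy(v), max(v)];                % 8 intensity features
```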

4.3.4 Geometric invariant moments

Geometric moment features of different orders are extracted from the ROI image; a total of 8 geometric features are used in this work. An image on the \(i\)-\(j\) plane with non-zero elements possesses moments, and the geometric moment of order \((p, q)\) of a 2D ROI image is computed as:

$$m_{pq} = \mathop \sum \limits_{i = 1}^{m } \mathop \sum \limits_{j = 1}^{n} i^{p} j^{q} I^{3} \left( {i,j} \right)$$
(25)

Geometric moments of lower orders have an intuitive meaning: \(m_{00}\) is the "mass" of the ROI, and \(m_{10}/m_{00}\) and \(m_{01}/m_{00}\) define the centroid of the ROI image. The second-order moments \(m_{20}\) and \(m_{02}\) represent the "distribution of mass" of the ROI with respect to the coordinate axes. From the geometric moments, the central geometric moments of order \((p, q)\) are defined as:

$${\mu }_{pq}=\sum_{i=1}^{m }\sum_{j=1}^{n}{\left(i-\overline{i }\right)}^{p}{\left(j-\overline{j }\right)}^{q}{I}^{3}(i,j)$$
(26)

where, \(\overline{i }=\frac{{m}_{10}}{{m}_{00}}\) and \(\overline{j }=\frac{{m}_{01}}{{m}_{00}}\) are the coordinates of the object centroid. In this way, we have computed 8 moments as described below:

$${m}_{00}=\sum_{i=1}^{m }\sum_{j=1}^{n}{I}^{3}(i,j)$$
(27)
$${m}_{10}=\sum_{i=1}^{m }\sum_{j=1}^{n}{i I}^{3}(i,j)$$
(28)
$${m}_{01}=\sum_{i=1}^{m }\sum_{j=1}^{n}{j I}^{3}(i,j)$$
(29)
$${\mu }_{11}=\sum_{i=1}^{m }\sum_{j=1}^{n}\left(i-\overline{i }\right)\left(j-\overline{j }\right){ I}^{3}(i,j)$$
(30)
$${\mu }_{12}=\sum_{i=1}^{m }\sum_{j=1}^{n}\left(i-\overline{i }\right){\left(j-\overline{j }\right)}^{2}{I}^{3}(i,j)$$
(31)
$${\mu }_{21}=\sum_{i=1}^{m }\sum_{j=1}^{n}{\left(i-\overline{i }\right)}^{2}\left(j-\overline{j }\right){ I}^{3}(i,j)$$
(32)
$${\mu }_{22}=\sum_{i=1}^{m }\sum_{j=1}^{n}{\left(i-\overline{i }\right)}^{2}{\left(j-\overline{j }\right)}^{2}{ I}^{3}(i,j)$$
(33)
$${\mu }_{30}=\sum_{i=1}^{m }\sum_{j=1}^{n}{\left(i-\overline{i }\right)}^{3}{I}^{3}(i,j)$$
(34)
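A direct MATLAB sketch of Eqs. (25)–(34) for the ROI image \(I^{3}\) is given below; the index convention (i over rows, j over columns) follows the equations above.

```matlab
% Direct sketch of Eqs. (25)-(34) for the ROI image I3 (m rows, n columns);
% i indexes rows and j indexes columns, as in the equations above.
[m, n]  = size(I3);
[J, Ic] = meshgrid(1:n, 1:m);            % Ic(i,j) = i, J(i,j) = j
m00  = sum(I3(:));                                       % Eq. (27): "mass" of the ROI
m10  = sum(sum(Ic .* I3));                               % Eq. (28)
m01  = sum(sum(J  .* I3));                               % Eq. (29)
ibar = m10 / m00;   jbar = m01 / m00;                    % ROI centroid
mu11 = sum(sum((Ic - ibar)    .* (J - jbar)    .* I3));  % Eq. (30)
mu12 = sum(sum((Ic - ibar)    .* (J - jbar).^2 .* I3));  % Eq. (31)
mu21 = sum(sum((Ic - ibar).^2 .* (J - jbar)    .* I3));  % Eq. (32)
mu22 = sum(sum((Ic - ibar).^2 .* (J - jbar).^2 .* I3));  % Eq. (33)
mu30 = sum(sum((Ic - ibar).^3 .* I3));                   % Eq. (34)
F4   = [m00, m10, m01, mu11, mu12, mu21, mu22, mu30];    % 8 geometric moment features
```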

4.3.5 Feature normalization

After extracting the 4 visual features (\(F1\)), 16 texture features (\(F2\)), 8 intensity features (\(F3\)), and 8 geometric moment features (\(F4\)), we form one feature vector \(F\) as:

$$F=\{F1, F2, F3, F4\}$$
(35)

Since different kinds of features are extracted from the ROI image, the feature vector contains significant variations in range. Features with a larger range would play a decisive role in the training process of machine learning algorithms; because machine learning methods operate on numbers without understanding their significance, a feature set with large variations may lead to inaccurate classification results, and a raw set of features also increases the convergence time of neural networks. Feature normalization is therefore required to enhance the speed and accuracy of the training network. Normalization bounds the features in the range 0–1. Various normalization methods are available in the literature, such as min–max, standard vector, power transform, unit vector, and max-absolute normalization. We apply the min–max and robust normalization methods given in Eqs. (36) and (37), respectively.

$${F}^{\mathrm{min}\_max }= \frac{(F-\mathrm{min}\left(F\right))}{(max\left(F\right)-\mathrm{min}\left(F\right))}$$
(36)
$${F}^{robust}= -sign\left(F\right)\times log10|F|$$
(37)
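The feature fusion of Eq. (35) and the two scaling schemes of Eqs. (36)–(37) can be sketched in MATLAB as follows; the small epsilon guard for zero-valued features under Eq. (37) is an assumption.

```matlab
% Sketch of the feature fusion of Eq. (35) and the scaling of Eqs. (36)-(37);
% the eps guard for zero-valued features is an assumption.
F       = [F1, F2, F3, F4];                       % Eq. (35): fused 1x36 feature vector
Fminmax = (F - min(F)) ./ (max(F) - min(F));      % Eq. (36): min-max normalization
Frobust = -sign(F) .* log10(abs(F) + eps);        % Eq. (37): robust normalization
```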

4.4 Soft computing techniques

We use the conventional soft computing methods ANN, SVM, KNN, and Ensemble for detection and classification, and propose the deep learning method RNN using LSTM (RNN-LSTM) for automatic detection and accuracy enhancement of lung diseases. All methods (ANN, SVM, KNN, Ensemble, and RNN-LSTM) are applied to 70% of each dataset for training and validation and 30% for testing. For evaluation, the complete dataset is supplied to every soft computing method and then partitioned into a 70:30 ratio before training the network. As implementations of ANN, SVM, KNN, and Ensemble learning are already available and widely discussed in the literature, this section focuses on the RNN-LSTM approach only.

Recurrent neural networks: The basic RNN takes as input a temporal sequence of vectors \(({x}_{1}, {x}_{2}, ..., {x}_{T})\) and produces a high-level representation as a sequence of vectors \(({h}_{1}, {h}_{2}, ..., {h}_{T})\). These high-level representations are generated by a non-linear transformation of the input sequence from \(t = 1\) to \(T\) as:

$${h}_{t}=f\left(W{x}_{t}+H{h}_{t-1}+b\right)$$
(38)
$${y}_{t}=softmax\left({W}_{y}{h}_{t}+{b}_{y}\right)$$
(39)

where \(f\) is a non-linear function applied element-wise, \({y}_{t}\) gives the softmax probabilities of the events given the observations up to \({x}_{t}\), and \(W, H, b, {W}_{y}, {b}_{y}\) are the learned parameters. In a standard RNN, a common choice for \(f\) is \(tanh\) or the \(sigmoid\) function. RNNs with this choice of \(f\) suffer from the vanishing-gradient problem and are therefore poor at capturing the long temporal dependencies that are essential for early detection. A simple remedy is to substitute the tanh non-linearity with LSTM cells; in this work, we use an RNN with LSTM for disease detection and classification.

Long short-term memory cells: An LSTM is a system of neurons that implements a memory cell. The fundamental motivation for using LSTM is that the memory cell can preserve its state over time; when combined with an RNN, LSTM allows the recurrent network to retain long-term context dependencies. An LSTM consists of an input gate \(i\), an output gate \(o\), a forget gate \(f\), and a memory cell \(c\). At every time step \(t\), the LSTM computes its gate activations \(\{{i}_{t}, {f}_{t}\}\), updates its memory cell from \({c}_{t-1}\) to \({c}_{t}\), computes the output gate activation \({o}_{t}\), and finally outputs a hidden representation \({h}_{t}\). The inputs to the LSTM are the observation \({x}_{t}\) and the hidden representation from the previous time step \({h}_{t-1}\). The following update equations are applied in the LSTM:

$${i}_{t}=\sigma \left({W}_{i}{x}_{t}+{U}_{i}{h}_{t-1}+{V}_{i}{c}_{t-1}+{b}_{i}\right)$$
(40)
$${f}_{t}=\sigma \left({W}_{f}{x}_{t}+{U}_{f}{h}_{t-1}+{V}_{f}{c}_{t-1}+{b}_{f}\right)$$
(41)
$${c}_{t}={f}_{t}\odot {c}_{t-1}+{i}_{t}\odot \mathrm{tanh} \left({W}_{c}{x}_{t}+{U}_{c}{h}_{t-1}+{b}_{c}\right)$$
(42)
$${o}_{t}=\sigma \left({W}_{o}{x}_{t}+{U}_{o}{h}_{t-1}+{V}_{o}{c}_{t-1}+{b}_{o}\right)$$
(43)
$${h}_{t}={o}_{t}\odot \mathrm{tanh} \left({c}_{t}\right)$$
(44)

where \(\odot\) denotes the element-wise product, \(\sigma\) is the logistic function, and \(tanh\) is applied element-wise. \({W}_{*}\), \({V}_{*}\), \({U}_{*}\), and \({b}_{*}\) are the learned parameters, and the weight matrices \({V}_{*}\) are diagonal. The input and forget gates cooperate in updating the memory cell: the forget gate decides which part of the memory to forget, while the input gate writes new values, estimated from the current input, to the memory cell. The output gate, together with the memory cell, produces the hidden representation. Because the LSTM cell activation involves a summation over time and derivatives distribute over sums, the gradient in an LSTM does not vanish even over long time spans. The non-linear function of the RNN is therefore replaced by LSTM cells to capture long temporal dependencies. The recurrent LSTM operation in the F-RNN-LSTM framework is represented as:

$$(h_{t} ,c_{t} ) = lstm\,(x_{t} ,h_{t - 1} ,c_{t - 1} )$$
(45)
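A minimal MATLAB sketch of the F-RNN-LSTM classifier is given below, assuming the Deep Learning Toolbox. The 100 hidden units and the training hyperparameters (70 epochs, mini-batch size 27, gradient threshold 1, CPU execution) follow Sect. 5.3; treating the 36 normalised features as a length-36 sequence of scalar steps, the 'adam' solver, and the variable names are assumptions.

```matlab
% Minimal sketch of the F-RNN-LSTM classifier, assuming MATLAB's Deep Learning
% Toolbox; the sequence layout, the 'adam' solver, and variable names are assumptions.
inputSize      = 1;      % one feature value per time step
numHiddenUnits = 100;    % per Sect. 5.3
numClasses     = 4;      % Covid-19, viral pneumonia, bacterial pneumonia, normal

layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 70, ...
    'MiniBatchSize', 27, ...
    'GradientThreshold', 1, ...
    'ExecutionEnvironment', 'cpu');

% XTrain: N-by-1 cell array, each cell a 1-by-36 normalised feature sequence;
% YTrain: N-by-1 categorical vector of class labels.
net   = trainNetwork(XTrain, YTrain, layers, options);
YPred = classify(net, XTest);      % predicted lung-disease classes
```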

5 Simulation results

This section presents the experimental results and performance evaluation of the proposed lung disease detection model. The proposed model was implemented in MATLAB and evaluated on two publicly available lung disease datasets from the Kaggle repository (https://www.kaggle.com). We first investigate the proposed model with different soft computing and feature normalization methods; the proposed F-RNN-LSTM model is then compared with similar studies in terms of overall accuracy and training and detection time. The first dataset is the Covid-19 Radiography Database (C19RD) collected from Kaggle (https://www.kaggle.com/tawsifurrahman/covid19-radiography-database). C19RD is a recent collection of chest X-ray images assembled at Qatar University and consists of 2905 samples in three classes: normal (1341), Covid-19 (219), and viral pneumonia (1345). Chest X-Ray Images for Pneumonia (CXIP) (Kermany et al. 2018) is the second lung disease dataset, collected from Kaggle (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia); it consists of 5856 samples in three categories: normal (1583), bacterial pneumonia (2790), and viral pneumonia (1483). The first dataset is used to distinguish Covid-19 pneumonia from viral/bacterial pneumonia, and the second to distinguish viral pneumonia from bacterial pneumonia. Both datasets are divided into 70% training and 30% testing samples for the analysis of the proposed methodology. Tables 3 and 4 show further statistics of both datasets. Sections 5.1 and 5.2 provide the simulation results for the C19RD and CXIP datasets, respectively, and the proposed methodology is analyzed using detection accuracy, F1-score, precision, recall, and specificity. Section 5.3 presents the comparative analysis with existing methods in terms of accuracy and computational time.

Table 3 Statistics of C19RD dataset
Table 4 Statistics of CXIP dataset
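The 70/30 hold-out split used for both datasets can be sketched in MATLAB as follows; stratifying the partition by class label and the variable names are assumptions.

```matlab
% Sketch of the 70/30 hold-out split; stratification by class label and the
% variable names are assumptions.
cv     = cvpartition(labels, 'HoldOut', 0.30);    % labels: categorical class vector
XTrain = features(training(cv), :);   YTrain = labels(training(cv));
XTest  = features(test(cv), :);       YTest  = labels(test(cv));
% For the sequence classifier of Sect. 4.4, each row can be wrapped as a
% 1-by-36 sequence, e.g. XTrainSeq = num2cell(XTrain, 2);
```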

5.1 Experiments with C19RD dataset

The performance of the proposed model on the C19RD dataset is shown in Figs. 5, 6, 7, 8 and 9. Experimental results are reported for the different soft computing techniques (SVM, ANN, KNN, Ensemble, and RNN-LSTM) and feature normalization settings (without normalization, min–max normalization, and robust normalization). Figures 5, 6, 7, 8 and 9 demonstrate the detection accuracy, precision, recall, specificity, and F1-score, respectively. The quantitative observations for the C19RD dataset are presented in Tables 6, 7, 8, 9 and 10 (to two decimal places) for accuracy, precision, recall, specificity, and F1-score, respectively. These parameters are computed with the aid of four measures, namely True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN), which are defined in Table 5.

Fig. 5
figure 5

Experimental analysis using C19RD dataset: A features normalization analysis, B soft computing techniques analysis

Fig. 6
figure 6

Precision analysis using C19RD dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 7
figure 7

Recall analysis using C19RD dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 8
figure 8

Specificity analysis using C19RD dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 9
figure 9

F1-score analysis using C19RD dataset: A features normalization analysis and B soft computing techniques analysis

Table 5 Performance evaluation parameters

TP = correct samples identified as correct samples (correctly identified).

FP = incorrect samples identified as correct samples (incorrectly identified).

TN = incorrect samples identified as incorrect samples (correctly rejected).

FN = correct samples identified as incorrect samples (incorrectly rejected).
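These metrics follow the standard definitions below; the formulas are not listed explicitly in the text and are stated here for reference.

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN},\quad \mathrm{Precision}=\frac{TP}{TP+FP},\quad \mathrm{Recall}=\frac{TP}{TP+FN}$$

$$\mathrm{Specificity}=\frac{TN}{TN+FP},\quad \mathrm{F1}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$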

The results demonstrate the outcomes of the soft computing and feature normalization methods combined with the proposed computer vision approaches for image enhancement and ROI extraction on the C19RD dataset. Among the feature normalization techniques, robust feature normalization improved the accuracy, precision, recall, specificity, and F1-score compared with raw features and min–max normalization. Raw features without any normalization led to poor classification performance, which clearly highlights the importance of feature normalization in machine learning applications. The detection accuracy (Fig. 5 and Table 6) shows that min–max normalization improves the accuracy by approximately 4% and robust normalization by approximately 6% compared with raw features. A similar trend is observed for precision (Fig. 6 and Table 7), recall (Fig. 7 and Table 8), specificity (Fig. 8 and Table 9), and F1-score (Fig. 9 and Table 10) on the C19RD dataset.

Table 6 Detection accuracy performance for C19RD
Table 7 Precision analysis for C19RD dataset
Table 8 Recall performance for C19RD dataset
Table 9 Specificity computation for C19RD dataset
Table 10 F1-score analysis for C19RD dataset

We have also shown the results of the soft computing methods on the C19RD dataset. Among these methods, the proposed deep learning model F-RNN-LSTM shows the best lung disease classification performance. The KNN classifier performs worse than all other techniques for raw features, min–max normalization, and robust normalization, while the ANN classifier shows the second-best performance. Using the deep learning model RNN-LSTM, the detection accuracy and F1-score increase by approximately 4% relative to the ANN. The proposed F-RNN-LSTM model with robust normalization delivers the best classification performance, with the highest accuracy and F1-score for the C19RD dataset.

5.2 Experiments with CXIP dataset

Similar to the C19RD dataset, we evaluated the proposed soft computing and feature normalization methods on the CXIP dataset, which contains more chest X-ray images than C19RD. The results are shown in Figs. 10, 11, 12, 13 and 14, which demonstrate the performance of the soft computing and feature normalization techniques on the CXIP dataset. Figure 10 (Table 11) shows the detection accuracy for each soft computing and feature normalization approach. Robust normalization with the RNN-LSTM model provides higher accuracy than the other techniques: it scales the features into the range [0, 1] more effectively than the min–max approach, which, due to significant variations in the feature ranges, effectively discards a few features.

Fig. 10
figure 10

Accuracy analysis using CXIP dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 11
figure 11

Precision analysis using CXIP dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 12
figure 12

Recall analysis using CXIP dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 13
figure 13

Specificity analysis using CXIP dataset: A features normalization analysis and B soft computing techniques analysis

Fig. 14
figure 14

F1-score analysis using CXIP dataset: A features normalization analysis and B soft computing techniques analysis

Table 11 Detection accuracy analysis for CXIP dataset

The precision rates (Fig. 11 and Table 12) show that F-RNN-LSTM with robust feature normalization achieves the highest precision. Similarly, the recall rates (Fig. 12 and Table 13) show improvements of approximately 5% from the RNN-LSTM model and approximately 3% from robust feature normalization. The specificity performance on the CXIP dataset is shown in Table 14 and Fig. 13: robust normalization improves specificity by approximately 7% and RNN-LSTM by approximately 11%. Finally, the F1-score performance shown in Fig. 14 and Table 15 largely overlaps with the detection accuracy results, since the F1-score depends mainly on the precision and recall. Overall, the performance improvement due to the deep learning model RNN-LSTM and robust feature normalization is significant for lung disease classification.

Table 12 Precision analysis for CXIP dataset
Table 13 Recall analysis for CXIP dataset
Table 14 Specificity performance for CXIP dataset
Table 15 F1-score analysis for CXIP dataset

5.3 State-of-the-art evaluations

We have studied a number of recent CNN-based techniques for classifying lung diseases into pneumonia and Covid-19 using chest X-ray images. This section presents the state-of-the-art evaluation of the proposed F-RNN-LSTM model (RNN-LSTM with robust feature normalization) together with the efficient image quality enhancement and adaptive ROI extraction. We implemented and evaluated the existing methods Chest Disease Detection using CNN (CDD-CNN) (Abiyev et al. 2018), Covid-19 Detection using Deep Learning (CDDL) (Pham 2021), COVIDetectioNet (Turkoglu 2021), CNN using ResNet23 (CNN-RN) (Butt et al. 2020), ResNeXt-50 (Hira et al. 2020), and CNN using an Ensemble approach (CNN-E) (Gianchandani et al. 2020). All of these methods were implemented with common hyperparameters: number of epochs (70), minimum batch size (27), gradient threshold (1), and execution environment (CPU). The number of hidden units in the proposed RNN-LSTM model was set to 100; under this hyperparameter setting, we obtained the best classification performance. These methods were selected because they are closely related to the proposed model of lung disease detection using chest X-ray image datasets and have reported significant results on such datasets. All of these methods, together with the proposed model, were implemented and evaluated on an Intel i5 processor with 4 GB RAM. A comparative analysis of the proposed model with the aforementioned methods on the C19RD and CXIP datasets is given in Tables 16 and 17 in terms of detection accuracy and average training and detection time. The bold values in Tables 16 and 17 indicate the highest detection accuracy and the lowest training and detection time, which is the desired outcome and is achieved by the proposed F-RNN-LSTM approach.

Table 16 Comparative analysis using C19RD dataset
Table 17 Comparative analysis using CXIP dataset

The experimental results clearly indicate that the proposed F-RNN-LSTM model not only improves the detection accuracy but also significantly reduces the training and detection time. The recent state-of-the-art methods suffer from high computation cost and high detection time; in comparison, the proposed F-RNN-LSTM model reduces the computational effort by approximately 50% and improves the detection accuracy by approximately 2.5–3%. The computational effort is reduced because the RNN with LSTM uses sequential learning, which takes less time to compute the prediction probabilities of each class during training and disease detection. Furthermore, the detection accuracy of F-RNN-LSTM is improved due to the quality enhancement, the accurate and robust ROI extraction, and the lung-specific feature extraction, whereas the existing techniques rely on pre-defined deep learning models for high-dimensional feature extraction, which limits detection accuracy. Another purpose of extracting lung-specific features is to provide severity analysis for appropriate treatment, which is possible only with the proposed F-RNN-LSTM model. The outcome of the proposed model may be beneficial to Internet of Things (IoT)-enabled smart healthcare systems, where disease detection and severity analysis are highly required (Patel et al. 2020; Mahajan et al. 2018; Mahajan and Badarla 2019; Mahajan et al. 2020). The proposed method can be further enhanced by using a number of deep models and their ensembles on chest X-ray image datasets for Covid-19 detection (Singh et al. 2020, 2021).

6 Conclusions and future work

In this work, we have proposed an approach for lung disease detection from chest X-ray images for Covid-19 and pneumonia classification. The proposed framework is based on soft computing, machine learning, and deep learning techniques. This model differs from existing Covid-19 detection methods in that it considers only lung-specific features, obtained through proper image enhancement followed by ROI-based feature extraction and normalization. To enhance the classification performance with minimum computational effort, we have proposed the deep learning model F-RNN-LSTM. Results have been reported for soft computing and machine learning techniques such as SVM, ANN, and KNN as well as for F-RNN-LSTM on the publicly available C19RD and CXIP datasets. Due to the feature normalization and efficient feature vector representation, the proposed F-RNN-LSTM model shows effective performance in terms of detection accuracy, precision, recall, F1-score, and specificity. Moreover, a state-of-the-art evaluation of the proposed method has been performed against Chest Disease Detection using CNN (CDD-CNN) (Abiyev et al. 2018), Covid-19 Detection using Deep Learning (CDDL) (Pham 2021), COVIDetectioNet (Turkoglu 2021), CNN using ResNet23 (CNN-RN) (Butt et al. 2020), Se-ResNeXt-50 (Hira et al. 2020), and CNN using an Ensemble approach (CNN-E) (Gianchandani et al. 2020) in terms of detection accuracy and training and detection time. The proposed F-RNN-LSTM model provides better accuracy (approximately 95%) with low computational effort (approximately 50% of that of the compared methods). The proposed model can be further enhanced in terms of adaptive model building and severity analysis on additional datasets with more classes (more than 5) and deep models, as reported in the recent literature (Singh et al. 2020, 2021).