Introduction

Coronavirus Disease 2019 (COVID-19) is an infectious disease that started to proliferate from Wuhan, China, in December 2019 [1]. Within a short period of time, this disease ravaged every corner of the world, and the World Health Organization declared it a pandemic on 11 March 2020 [2]. The disease is caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) strain. By July 2020, cases had reached almost 12 million worldwide, deaths kept rising day by day, and the death toll stood at 562,039 [3]. From the Worldometers data, the total deaths and total cures (by month) are illustrated in Fig. 1 [3]. Observing the statistics and properties of COVID-19, it can be asserted that this life-threatening virus can spread from person to person via coughing, sneezing, or even close contact. As a result, it has become important to detect affected people early and isolate them to stop further spreading of the virus.

RT-PCR is a procedure of collecting samples from a region of a person’s body where the coronavirus is most likely to congregate, such as the nose or throat. The sample then goes through a process called extraction, which separates the genetic material from any virus that may be present. A particular chemical, along with a PCR machine (thermal cycler), is applied, which triggers a reaction that creates millions of copies of the SARS-CoV-2 genome. One of the chemicals produces a beam of light if SARS-CoV-2 is present in the sample. The beam of light is traced by the PCR machine, which indicates a positive test result for the presence of the coronavirus.

Though RT-PCR can distinctly identify coronavirus disease, it has a high false-negative rate, where a test result is reported as negative although the patient is actually positive. Furthermore, in many regions of the world, RT-PCR’s availability is limited. Hence, medical images such as Computed Tomography (CT) and X-ray images can be the next best alternative to detect this virus, as most medical facilities and hospitals commonly have the apparatus to generate such images. Also, CT or X-ray images are readily available where there is no RT-PCR. Moreover, RT-PCR is expensive and consumes a considerable amount of time for identification. In addition, proper training is required for health workers to collect samples for PCR, whereas it is relatively easy to produce and handle CT and X-ray images.

Fig. 1
figure 1

Total cases, total deaths, and total cured (by month) from Worldometers [4]

To work on these medical images, deep learning methods are the most conventional and arguably the primary direction. Deep learning is an emerging field that could play a significant role in the detection of COVID-19 in the future. To date, researchers have used machine learning/deep learning models to detect COVID-19 using medical images such as X-ray or CT images and obtained promising results. Many researchers also used transfer learning, attention mechanisms [5], and Gradient-weighted Class Activation Mapping (Grad-CAM) [6] to make their results more accurate. Shi et al. [7] and Ilyas et al. [8] discussed some artificial intelligence-based models for the diagnosis of COVID-19. In addition, Ulhaq et al. [9] reviewed papers on the diagnosis, prevention, control, treatment, and clinical management of COVID-19. Besides, Ismael et al. [10] applied different types of machine learning and deep learning techniques for COVID-19 detection on X-ray images. Furthermore, a majority voting-based ensemble classifier technique was employed by Chandra et al. [11]. However, as time goes by, researchers are finding advanced and improved architectures for the diagnosis of COVID-19. In this paper, we have tried to review these new methods alongside the basic structures of the earlier COVID-19 classification models. This survey covers research papers that are published or in pre-print format. Although including pre-prints is not the most favorable approach, due to the likelihood of below-standard research without peer review, we intend to share all proposals and information in a single place while giving importance to the automatic diagnosis of COVID-19 in X-ray and CT images of lungs.

The fundamental aim of this paper is to systematically summarize the workflow of existing research, accumulate the different sources of data sets of lung CT and X-ray images, and sum up the frequently used methods to automatically diagnose COVID-19 using medical images, so that a novice researcher can analyze previous works and find a better solution. The paper is organized as follows:

  • First, the data set sources and the different types of images used in the papers are described in “COVID-19 Data Set and Resource Description”.

  • Second, the data preprocessing and augmentation techniques, feature extraction methods, classification, segmentation, and evaluation results obtained by researchers are characterized in “Methodologies”.

  • Finally, a discussion is made to aid new researchers in finding future work on detecting COVID-19.

COVID-19 Data Set and Resource Description

The diagnosis of any disease is like the light at the end of the tunnel. In the case of the COVID-19 pandemic, the importance of early diagnosis and detection of the disease is beyond measure. The initial focus must be on the data with which to efficiently train a model. This data will help Machine Learning (ML) or Deep Learning (DL) algorithms diagnose COVID-19 cases. Due to the disadvantages of RT-PCR, researchers adopted an alternative method, the use of artificial intelligence on chest CT or X-ray images to diagnose COVID-19. Fundamentally, a chest CT image is an image taken using the computed tomography (CT) scan procedure, where X-ray images are captured from different angles and compiled to form a single image. A depiction of CT images (COVID-19 infected and normal) is illustrated in Fig. 2.

Fig. 2
figure 2

Lung CT-scan images a COVID-19 affected, b normal

Although a CT scan can be performed quickly, it is fairly expensive. As a result, many researchers adopted X-ray images instead of CT images to develop a COVID-19 detection model. A chest X-ray is a procedure of using X-rays to generate images of the chest. In addition, it is relatively economical and convenient to maintain. X-ray images of different people with COVID-19, viral pneumonia, and bacterial pneumonia, and of a person without any disease (normal), are shown in Fig. 3. Furthermore, an overview of the data set sources used in the existing papers is characterized, and data sets of both CT and X-ray images are illustrated and covered in this section.

Fig. 3
figure 3

X-ray images a COVID-19, b viral pneumonia, c bacterial pneumonia, d normal from COVID19-XRay-data set

Data Set and Its Sources

Nowadays, the exchange of information between researchers and physicians faces difficulties due to the lockdown phase. Hence, massive COVID-19 data are out of reach or difficult to find for many researchers. As a deep learning architecture needs a considerable number of images to learn a model appropriately and efficiently, existing COVID-19 automation research is still at a preliminary stage. However, some COVID-19 data sets have been proposed and employed by researchers and show exceptional results in detecting COVID-19-affected lungs. To assist beginner researchers, we have accumulated summary information on the data sets and their sources. A list of the data set sources from February 2020 to June 2020 is presented in Table 1. In the following, we will cover both CT and X-ray images and their fundamental attributes.

Table 1 Summary of different data sources used in the papers

Some of the most popular data sets were collected from the following hospitals. Xu et al. [3] collected their data set from the First Affiliated Hospital of Zhejiang University, the No. 6 People’s Hospital of Wenzhou, and the No. 1 People’s Hospital of Wenling. Song et al. [64] collected their data sets from three hospitals: Renmin Hospital of Wuhan University, and two affiliated hospitals (the Third Affiliated Hospital and Sun Yat-Sen Memorial Hospital) of the Sun Yat-sen University in Guangzhou. Chen et al. [73] built their data set from the Renmin Hospital of Wuhan University (Wuhan, Hubei province, China). Shi et al. [75] built their data set from three hospitals: Tongji Hospital of Huazhong University of Science and Technology, Shanghai Public Health Clinical Center of Fudan University, and China–Japan Union Hospital of Jilin University. Selvaraju et al. [6] used data from five different hospitals to build their data set, including Beijing Tsinghua Changgung Hospital, Wuhan No. 7 Hospital, Zhongnan Hospital of Wuhan University, Tianyou Hospital Affiliated to Wuhan University of Science and Technology, and Wuhan’s Leishenshan Hospital. Zheng et al. [62] took data from Union Hospital, Tongji Medical College, Huazhong University of Science and Technology.

CT Image Sources

As CT images are said to be more detailed than X-ray images, diagnosing COVID-19 and developing a model becomes more convenient when employing CT-scan images. For CT image-based works, four papers used the COVID-19 CT segmentation data set to develop a classification architecture. This data set contains one hundred axial CT images from forty patients [17,18,19,20]. Chen et al. [17] and Qiu et al. [19] achieved 89% and 83.62% accuracy, respectively, using this data set. Furthermore, two authors adopted the Lung Image Database Consortium (LIDC) data set and accomplished an accuracy above 90% [12, 13]. Besides these, some authors used data from the Societa Italiana di Radiologia Medica e Interventistica [15], Lung Segmentation and Candidate Points Generation [16], COVID-CT [21], and the HUG data set [14] for their purposes. A representation of these data set sources is characterized in Table 1 and depicted in Fig. 4 (by month). From the table, we can infer that the COVID-19 CT segmentation data set was used mostly in April 2020 and the Lung Image Database Consortium (LIDC) data set was used in March 2020 and June 2020 [17,18,19,20]. Some researchers also used other lung disease images apart from these most used data sets. Nevertheless, they collected CT images from different hospitals to build these data sets.

Fig. 4
figure 4

Bar chart showing seven publicly available CT data sets used from March, 2020 to June, 2020

X-ray Image Sources

X-ray image data sets are more available than CT data sets, as the cost of capturing an X-ray image is considerably lower than that of a CT image. Studying the existing literature, most of the authors used the COVID-chest X-ray data set [32, 42, 47, 63]. Moreover, the Kaggle RSNA Pneumonia Detection data set [35, 45, 50], the COVID-19 database [33, 52, 54], and Chest X-ray Images (Pneumonia) [27, 46, 59] were adopted to evaluate models. These are the most common data sets for chest X-ray-based COVID-19 research (Table 1). However, these data sets contain a limited number of COVID-19-infected lung images, which is insufficient to train a deep learning model, as the model can overfit the data. For this reason, most researchers utilized different preprocessing techniques to increase the data set size, one of them being data augmentation. Furthermore, some existing works are trained on a hybrid data set combining a COVID-19 data set and normal lung images from another repository. For X-ray-based works, Al-antari et al. [44] used the COVID-19 Radiography Database for alternative lung diseases. An illustration of the usage of the eighteen X-ray data sets is depicted in Fig. 5. From there it can be noticed that the COVID-chest X-ray data set was used by most of the authors, followed by Kaggle’s Chest X-ray Images (Pneumonia), which was used mostly in March 2020, April 2020, and June 2020. Some papers also used both CT and X-ray images from the COVID-19 X-rays and BIMCV COVID-19+ data sets. From both Figs. 4 and 5, it can be observed that BIMCV COVID-19+ emerged in June 2020 in terms of developing a COVID-19 classification model.

Fig. 5
figure 5

Bar chart showing 18 publicly available X-ray data sets used from March, 2020 to June, 2020

Types and Properties of Images in the Data Set

Diseases such as Pneumonia, Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), Influenza, and Tuberculosis affect the lungs in a similar way to COVID-19, which can lead to misclassification of X-ray and CT images. To avoid this problem, researchers have adapted their data sets to include images of diseases affecting similar regions as COVID-19. Moreover, it is important to correctly distinguish COVID-19 patients from people who do not have COVID-19. For this purpose, the authors also used normal lung images collected from healthy people. These data sets are developed by combining COVID-19 images, other lung disease images such as viral pneumonia [3, 22, 23, 28, 49], bacterial pneumonia [22, 23, 64,65,66,67], fungal pneumonia [68], SARS [60, 69,70,71], MERS [70], Influenza [13], and Tuberculosis [65, 71, 72], and images of healthy people. The distribution of different types of lung disease or normal images and the number of CT images used by papers are illustrated in Table 2. In the table, ‘Not specified (NS)’ indicates that a paper used that type of image but did not state the number explicitly, and ‘N/A’ indicates that the type of image was not used.

Table 2 Summary of different type of lung disease and normal (healthy patients) CT images used by papers

Furthermore, the number of different types of CT images used in the papers is presented in Fig. 6. From there it can be seen that the number of COVID-19 CT images used for classification is 121,700. The total numbers of normal CT images, Pneumonia CT images, and other lung disease CT images are 120,438, 50,268, and 15,999, respectively.

Fig. 6
figure 6

Total number of CT images of different diseases and normal CT images used from February 2020 to June 2020

In addition, the distribution of different types of X-ray images is depicted in Table 3, where the total number of different images used in fifty research works from February to June 2020 is represented. In Table 2, the distribution of different types of images was shown for thirty papers, also from February 2020 to June 2020. Moreover, ‘Not specified (NS)’ and ‘N/A’ are used in Table 3 with the same meaning as in Table 2.

Table 3 Summary of different type of lung disease and normal (healthy patients) X-ray images used by papers

A depiction of the total number of COVID-19, normal, Pneumonia, and other lung disease X-ray images used by papers is shown in Fig. 7. From the figure, it can be seen that the 50 papers used 21,062 COVID-19 images, 168,223 normal images, 127,456 Pneumonia images, and 114,094 other lung disease images in total. Comparing Figs. 6 and 7, it can be said that more CT images of COVID-19 were used than COVID-19 X-ray images.

Fig. 7
figure 7

Total number of X-ray images for different disease and normal patients used from February, 2020 to June, 2020

Methodologies

After data collection, several preeminent steps must be followed to diagnose COVID-19; hence, this section depicts the different techniques employed by different papers. First, preprocessing techniques along with their characteristics and properties are described. Second, feature extraction methods are thoroughly discussed. After that, segmentation methods and classification techniques are reviewed. Finally, the results obtained in the studied papers are briefly described. The workflow of diagnosing COVID-19 from X-ray images is demonstrated in Fig. 8.

Fig. 8
figure 8

Fundamental architecture of diagnosing COVID-19 from X-ray images

Preprocessing Techniques

There is a high chance that a COVID-19 data set contains some obscure, duplicate, or blurry images that degrade the performance of a model. Hence, it is necessary to perform preprocessing techniques on such redundant images. Various types of preprocessing techniques can be carried out based on the difficulties of the data set. One of the major problems of deep learning is overfitting. To minimize the effect of overfitting, data augmentation is used in the preprocessing stage. Resizing, scaling, cropping, flipping, and rotating are the most employed data augmentation techniques. Some of these data augmentation techniques are discussed below:

  • Resizing is necessary because the images are not always the same size, which poses an issue when training the model. To generalize the data set, all images are resized to a fixed dimension such as 224 \(\times \) 224 or 299 \(\times \) 299.

  • Flipping or Rotating is done to increase the sample size of the data sets. Mainly horizontal and vertical flipping are used, as depicted in Fig. 9a.

  • Scaling or Cropping is another frequently used augmentation technique. Not all portions of an image are necessary to use; therefore, to reduce redundancy, researchers used the cropping method, as illustrated in Fig. 9b.

  • Brightness or Intensity adjusting is used to increase or reduce the brightness of the images. An example is shown in Fig. 9c.
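The augmentation operations listed above can be sketched in a few lines of plain Python. This is an illustrative toy on a tiny grayscale image represented as nested lists, not code from any of the surveyed papers:

```python
def hflip(img):
    """Horizontal flip: reverse the pixels in each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return img[::-1]

def crop(img, top, left, height, width):
    """Crop a rectangular region of interest."""
    return [row[left:left + width] for row in img[top:top + height]]

def adjust_brightness(img, delta):
    """Shift every pixel by `delta`, clipping to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

flipped = hflip(image)                    # each row reversed
cropped = crop(image, 0, 0, 2, 2)         # top-left 2x2 patch
brighter = adjust_brightness(image, 200)  # bright pixels saturate at 255
```

Each transformed image is typically added to the training set alongside the original, multiplying the effective number of samples without collecting new data.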

As COVID-19 data sets are built with an insufficient number of COVID-infected images, Generative Adversarial Networks (GANs) can be employed to generate COVID-affected lung images, which can be a path to avoid overfitting or data insufficiency. A GAN is an unsupervised learning method structured on generative modeling embedded with deep learning architectures. It finds patterns and similarities in the input data set and generates new data similar to the input data. A GAN [93] increases the sample size of the data set, but the quality of the samples is not guaranteed.

Fig. 9
figure 9

Some examples of applying Pre-processing Techniques [a flipping by \(180^{\circ }\), b cropping, and c adjusting brightness]

Table 4 Summary of the preprocessing and augmentation methods used by the papers

A representation of the papers applying augmentation techniques in their models is characterized in Table 4, and the percentage usage of these augmentation techniques is depicted in Fig. 10. From there it can be seen that resizing and flipping have the highest percentages, at 27.9% and 27.0%, respectively. Scaling or Cropping, Contrast Adjusting, Brightness Adjusting, and GAN account for 22.1%, 12.3%, 7.4%, and 3.3%, respectively. Besides these techniques, some authors used various traditional image preprocessing techniques such as Histogram Equalization [70], the Adaptive Wiener Filter [80], Affine Transformation [29, 40], Histogram Enhancement [40], and Color Jittering [29].

Fig. 10
figure 10

Pie chart illustrates the augmentation techniques used by different papers (Here the percentage of usage of six different augmentation techniques is shown)

Segmentation

It is necessary to train a model with the most significant features, as unnecessary features or image regions degrade the performance of the model. Therefore, extracting the Region of Interest (ROI) is the preeminent task before the training stage. For that purpose, segmentation comes in handy, as it can segregate the irrelevant and unnecessary regions of an image. In digital image processing and computer vision, image segmentation is defined as the technique of partitioning a digital image into different segments based on some pre-defined criteria, where a segment is delineated as a set of pixels. Like other areas of medical image processing, segmentation boosts the effectiveness of COVID-19 detection by finding the ROI, such as the lung region. Areas of the image that are redundant and not related to the significant feature area (outside the lung) could degrade the model performance. Using segmentation methods, only ROI areas are preserved, which reduces the adverse effect of considering out-of-boundary features. Segmentation can be carried out manually by radiologists, but it takes a substantial amount of time. Several open-source automatic segmentation methods, such as region-based, edge-based, and clustering approaches, are feasible to adopt. In the following, we describe the prominent segmentation architectures and their properties.

The U-Net architecture is built on a Convolutional Neural Network (CNN) and is modified such that it can achieve better segmentation in the domain of medical imaging [55]. The main advantage of U-Net is that the location information from the downsampling path and the contextual information from the upsampling path are combined to obtain general information containing both context and localization, which is the key to predicting a better segmentation map. U-Net-based strategies were utilized in [12,13,14, 17, 18, 38, 40, 61, 62, 66, 73, 74, 76, 77, 80, 81, 94, 95] for efficient and automated lung segmentation, extracting the lung region as the ROI.
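The skip-connection idea behind U-Net can be illustrated with a hypothetical toy on 1-D feature maps (not the actual U-Net code from the cited works): the contracting path downsamples, the expanding path upsamples, and a skip connection concatenates encoder and decoder features channel-wise:

```python
def max_pool(x, k=2):
    """Downsample by taking the max over non-overlapping windows of size k."""
    return [max(x[i:i + k]) for i in range(0, len(x) - k + 1, k)]

def upsample(x, k=2):
    """Nearest-neighbour upsampling: repeat each value k times."""
    return [v for v in x for _ in range(k)]

def skip_connect(encoder_feat, decoder_feat):
    """Concatenate encoder (localization) and decoder (context) features
    along the channel axis; here each map is one channel, so the result
    is a 2-channel feature map fed to the next convolution."""
    assert len(encoder_feat) == len(decoder_feat)
    return [encoder_feat, decoder_feat]

x = [1, 4, 2, 8, 5, 7, 3, 6]     # input feature map
down = max_pool(x)                # contracting path
bottleneck = max_pool(down)       # deepest level
up = upsample(bottleneck)         # expanding path, back to len(down)
merged = skip_connect(down, up)   # skip connection merges the two paths
```

In the real architecture, each of these stages is a stack of 2-D convolutions on multi-channel feature maps, but the flow of information is the same.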

For CT images, to keep contextual information between slices, some researchers applied a 3D version of U-Net for lung segmentation named 3D U-Net [3, 76]. Due to the low contrast of the infected areas in CT images and the large variation in body shape and position across patients, finding the infected areas from chest CT scans is very challenging. Considering this issue, Narin et al. [27] developed a deep learning-based network named VB-Net. It is a modified 3D convolutional neural network based on V-Net [96]. Some other existing works adopted this segmentation method, which improves the performance of the model [75, 83]. SegNet is also an efficient architecture for pixel-wise semantic segmentation [97].

Segmentation methods, such as U-Net, Dense-Net, NABLA-N, SegNet, DeepLab, etc., were also used for the segmentation of lung images in different papers. The different segmentation methods used by different papers are illustrated in Table 5 and the number of papers in which a specific segmentation method is used is shown by a bar chart in Fig. 11.

Table 5 Summary of different segmentation methods used in COVID-19 detection
Fig. 11
figure 11

Bar chart showing number of times different segmentation models used in different papers

Feature Extraction Methods

Feature extraction is an essential step for classification, as the extracted features provide useful characteristics of the images. For image feature extraction, deep neural networks have extraordinary capabilities to extract important features from a large-scale data set. As a result, they are used extensively in computer vision, particularly the CNN, which is also known as a ConvNet. In the following, some of the feature extraction models are briefly described.

Convolutional Neural Network (CNN)

In visual imagery fields, CNN architectures are the most employed and adopted methods [100]. A CNN architecture is built with various types of network layers (pooling layers, convolutional layers, flatten layers, etc.) supporting the development and performance of a model.

The convolution layer is the core building block of a CNN. The layer’s parameters consist of a set of learnable kernels or filters, each of which has a small receptive field but extends through the full depth of the input volume. The non-linear layer is the layer where the change of the output is not proportional to the change of the input. This layer uses activation functions, added after each convolution layer, to introduce non-linearity into the data. Commonly used activation functions include the Rectified Linear Unit (ReLU) [101] and Tanh.

The pooling layer is another important part of a CNN architecture, where it is used to downsample the feature maps. Pooling can be done in several ways: Max Pooling, Min Pooling, Average Pooling, and Mean Pooling. The fully connected layer is the layer where every neuron of a layer is connected with every neuron of another layer. Traditional Multilayer Perceptron (MLP) neural networks and this layer share common principles.
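The three layer types described above can be sketched as follows. This is a minimal pure-Python illustration with made-up values (a 4×4 image and a 2×2 kernel), not an implementation used by any surveyed paper:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def relu(fmap):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2d(fmap, k=2):
    """Max pooling over non-overlapping k x k windows."""
    return [[max(fmap[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(0, len(fmap[0]) - k + 1, k)]
            for i in range(0, len(fmap) - k + 1, k)]

image = [[1, 2, 0, 1],
         [0, 1, 3, 2],
         [2, 0, 1, 0],
         [1, 3, 0, 1]]
edge_kernel = [[1, -1],
               [1, -1]]  # responds to horizontal intensity changes

# The usual CNN stage: convolve, apply the non-linearity, then pool.
features = max_pool2d(relu(conv2d(image, edge_kernel)))
```

A real network stacks many such stages, with many kernels per layer whose weights are learned by backpropagation rather than fixed by hand.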

Existing Pre-trained CNN Models

Most of the COVID-19 diagnosis architectures used various pre-trained CNN models. A representation of the usage of these pre-trained models is shown in Table 6 (CT images) and Table 7 (X-ray images). For CT images, the Residual Network (ResNet) [102], Densely Connected Convolutional Network (DenseNet) [103], Visual Geometry Group (VGG) [104], and SqueezeNet [49] architectures are the most adopted pre-trained architectures (Table 6), while ResNet [102], DenseNet [103], VGG [104], Inception [105, 106], and InceptionResNet [107] models are employed for X-ray images (Table 7). Some of the most used pre-trained CNN models are described in the following.

ResNet [102] is a CNN architecture designed to enable hundreds or thousands of convolutional layers. While previous CNN architectures saw a drop-off in the effectiveness of additional layers, ResNet can efficiently add a large number of layers, which leads to strong model performance. ResNet is convenient and efficient for data-driven approaches. It has different variants, such as ResNet18, ResNet50, ResNet152, and ResNet169, focusing on distinct perspectives. Moreover, studying the works, we can infer that ResNet is the most used architecture for both CT and X-ray-based COVID-19 research. Fourteen papers used ResNet in their proposed models for CT image-based works (Table 6), and twenty-seven papers used it for X-ray-based works (Table 7).

DenseNet [103] is one of the recent neural networks for visual object recognition. It is quite similar to ResNet but has some fundamental differences. This model ensures maximum information flow between the layers of the network, which helps to extract the optimal features. By matching feature map sizes throughout the network, the authors connected all layers directly to all of their subsequent layers: a Densely Connected Neural Network, or simply DenseNet. DenseNet improves the information flow between layers through this distinctive connectivity pattern. Unlike many other networks such as ResNet, DenseNet does not sum the output feature maps of a layer with the incoming feature maps but concatenates them. This architecture has different variants (DenseNet121, DenseNet169, DenseNet201, etc.) and has an input shape of \(224 \times 224\). In Table 6 (CT images), the DenseNet architecture is used by four papers, and in Table 7 (X-ray images), it is used by seventeen papers.
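The distinction between summing (ResNet) and concatenating (DenseNet) feature maps can be shown with a toy example. The feature values are hypothetical single-pixel, per-channel activations chosen for clarity:

```python
def resnet_combine(incoming, output):
    """Residual connection: element-wise sum; the channel count is unchanged."""
    return [i + o for i, o in zip(incoming, output)]

def densenet_combine(incoming, output):
    """Dense connection: channel-wise concatenation; the channel count grows,
    so later layers see the feature maps of all earlier layers directly."""
    return incoming + output

incoming = [5, 10]   # 2 channels entering the layer
output = [2, -3]     # 2 channels produced by the layer

res = resnet_combine(incoming, output)      # still 2 channels
dense = densenet_combine(incoming, output)  # now 4 channels
```

This growth in channel count is why DenseNet uses transition layers to keep the network from becoming too wide as depth increases.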

VGG [104] is another important CNN architecture for feature extraction. A VGG network consists of 16 or 19 convolutional layers and is very convenient to work with because of its very uniform architecture. In our survey, four papers used VGG to extract features from CT images for COVID-19 detection (Table 6), and fifteen papers used it in X-ray-based works (Table 7).

Inception [105, 106] is a transfer learning-based method consisting of two segments: feature extraction from images with the help of a CNN, and classification with softmax and fully connected layers. Various versions of the Inception architecture are used in the medical imaging field. Among these, InceptionV1, InceptionV2, InceptionV3, and InceptionV4 are the prominent ones, with an input image shape of \(299 \times 299\). Twelve papers used an Inception-based model for the X-ray-based classification of COVID-19 (Table 7), and only one paper [82] used this model to classify COVID-19 from CT images (Table 6).

InceptionResNet [107] has a similar architecture to InceptionV4. InceptionResNetV1 is a hybrid Inception variant with a computational cost similar to that of InceptionV3. InceptionResNetV2 is a convolutional neural network trained on more than a million images from the ImageNet [108] database. The network is 164 layers deep and can classify images into 1000 distinct categories. The eight papers given in Table 7 utilized this architecture for X-ray feature extraction.

Table 6 Summary of image feature extraction methods used by different papers for CT images

The number of different CNN models used for CT images is shown in Fig. 12 (by month). For feature extraction from CT images, researchers used various types of CNN models, of which ResNet is the most used architecture within these 5 months. In February 2020, three types of CNN models were used: two papers used ResNet, one paper used DenseNet, and another used VGG. During March 2020, four papers used ResNet, and AlexNet and FCN-8s were each used once. In April 2020, ResNet was used four times, three papers used VGG, and SqueezeNet, CrNet, EfficientNet, GoogLeNet, and Inception were each used once. Moreover, in May 2020, ResNet was used twice and SqueezeNet once, and finally, in June 2020, DenseNet and ResNet were used by one and two papers, respectively.

Fig. 12
figure 12

Bar chart for describing used CNN for CT images (by month)

Table 7 Summary of image feature extraction methods used by different papers for X-ray images

The different CNN models and the number of times each was used per month are shown for X-ray images in Fig. 13, where ResNet is the most used model for feature extraction. In our survey, during March 2020, ResNet was used six times, DenseNet two times, VGG three times, Inception four times, InceptionResNet three times, Xception two times, and AlexNet and SqueezeNet once each. In April 2020, ResNet was used ten times, DenseNet eight times, VGG six times, Inception three times, InceptionResNet three times, and Xception five times; AlexNet, GoogLeNet, and ShuffleNet were each used twice; SqueezeNet was used three times; and NASNet and EfficientNet were used once each. During May 2020, ResNet was used five times, DenseNet and Inception two times each, and VGG, SqueezeNet, and GoogLeNet only once each. Finally, in June 2020, ResNet was used seven times, DenseNet and VGG five times each, and Inception, InceptionResNet, Xception, AlexNet, GoogLeNet, and NASNet were used three, two, one, three, one, and two times, respectively.

Fig. 13
figure 13

Bar chart describing the use of different CNN models for X-ray images (by month)

Specialized CNN Methods for COVID-19

Some researchers developed several architectures especially for COVID-19 detection with a basic CNN backbone. These architectures have additional capabilities to classify images into multiple classes such as COVID-19, Viral Pneumonia, Bacterial Pneumonia, and Normal cases. In the primary stage, these models are trained on ImageNet, and then they are trained on CT or X-ray images of various lung diseases. In the following, a brief discussion of the ensemble or specialized CNN methods to detect COVID-19 is given.

The COVID-19 Detection Neural Network (COVNet) architecture was introduced by Li et al. [74] as a 3D deep learning architecture to detect COVID-19. This architecture can extract both 2D local and 3D global representative features. The COVNet architecture is built with a ResNet architecture as the base model. A max-pooling operation is applied to the features extracted from all slices of an image. Moreover, the resulting feature map is connected to a fully connected layer, and the authors used a softmax activation function to produce the probability score for each class (COVID-19, Community-Acquired Pneumonia (CAP), and non-pneumonia).
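The aggregation step described above can be sketched as follows. The per-slice feature vectors and the fully connected weights below are made up for illustration; they are not values from the COVNet paper:

```python
import math

def max_pool_slices(slice_feats):
    """Element-wise max over the feature vectors of all CT slices."""
    return [max(col) for col in zip(*slice_feats)]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Three slices, each already reduced to a 4-D feature vector (made-up values).
slices = [[0.1, 0.9, 0.2, 0.4],
          [0.7, 0.3, 0.8, 0.1],
          [0.2, 0.5, 0.6, 0.9]]
pooled = max_pool_slices(slices)  # one vector summarizing the whole volume

# Hypothetical fully connected layer for the three classes
# (COVID-19, CAP, non-pneumonia): one weight row per class.
weights = [[1.0, 0.5, -0.2, 0.3],
           [-0.4, 0.8, 0.6, -0.1],
           [0.2, -0.6, 0.1, 0.7]]
logits = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
probs = softmax(logits)  # per-class probability scores
```

The max pooling makes the prediction independent of the number of slices in a scan, which is what lets a 2D backbone such as ResNet serve a 3D volume.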

The COVID-Net architecture is specially adapted for COVID-19 detection from chest X-ray images [51]. It has high architectural diversity and selective long-range connectivity. The extensive use of a projection–expansion–projection design pattern is also observed in the COVID-Net architecture. COVID-Net is incorporated into a heterogeneous association of convolution layers. The proposed COVID-Net is pre-trained on the ImageNet data set and then applied to the COVIDx data set. Applying this architecture, the authors achieved an accuracy of about 93.3% on the COVIDx data set.

ChexNet is originally a DenseNet-121 type of deep network trained on chest X-ray images, introduced in [91]. This architecture was subsequently adapted to diagnose COVID-19. 1024-D feature vectors are extracted for the compact classifiers in ChexNet. They used the softmax activation function to classify COVID-19, normal, viral pneumonia, and bacterial pneumonia. The number of trainable parameters in this model is 6,955,906.

COVID-CAPS is a capsule-based network architecture invented by Afshar et al. [58]. This model consists of four convolutional layers and three capsule layers. The first layer is a convolutional layer followed by batch normalization. The second layer is also convolutional, followed by a pooling layer. Correspondingly, the third and fourth layers are convolutional, and the output of the fourth layer is reshaped into the first capsule layer. Three capsule layers are embedded in COVID-CAPS to perform routing. The last capsule layer contains the classification parameters for the two classes, COVID-19 positive and negative. This model has 295,488 trainable parameters and achieved an accuracy of 98.3%.

Detail-Oriented Capsule Networks (DECAPS) architecture is introduced by Mobiny et al. [21], which uses a ResNet with three residual blocks. This architecture is trained on CT images. This model obtained an area under the curve (AUC) of 98%. Besides these, some papers adopted different ensemble approaches, such as the Details Relation Extraction neural network (DRE-Net) [64], which applies ResNet-50 on a Feature Pyramid Network (FPN) to extract the top K details from each image. Furthermore, an attention module is combined to learn the importance of every detail. In the training stage, [75] and [13] employed the Least Absolute Shrinkage and Selection Operator (LASSO) to search for the optimal subset of clinical–radiological features for classification. Moreover, GLCM, HOG, and LBP are used by Sethy et al. [23]. In addition, Gozes et al. [12] used commercial off-the-shelf software that detects nodules and small opacities within a 3D lung volume as a subsystem.

Besides, some authors applied a transfer learning approach [66, 86, 88] with the basic CNN models for better results. Basically, transfer learning is a technique in which a model developed for one problem is reused, partially or fully, on a different but related problem to expedite training and improve performance. In deep learning, transfer learning [109] means reusing the weights of one or more layers from a pre-trained network architecture in a new model and either keeping the weights fixed, fine-tuning them, or adapting them entirely when training the model.
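The copy-then-freeze-or-fine-tune choice described above can be illustrated with a deliberately tiny toy model. This is a conceptual sketch, not any of the surveyed implementations: the two "layers", their weights, and the gradients are all made up, and the point is only that frozen layers keep their pre-trained weights while unfrozen ones are updated.

```python
import numpy as np

# Toy "pretrained" two-layer model: weights learned on a source task
# (standing in for ImageNet pretraining).
pretrained = {
    "conv": np.ones((3, 3)),   # early feature-extraction layer
    "head": np.ones((3, 1)),   # task-specific classification layer
}

def transfer(pretrained_weights, freeze=("conv",)):
    """Copy pretrained weights into a new model; mark some layers frozen."""
    model = {name: w.copy() for name, w in pretrained_weights.items()}
    trainable = {name: name not in freeze for name in model}
    return model, trainable

def sgd_step(model, trainable, grads, lr=0.1):
    """Apply a gradient step, but only to layers that are not frozen."""
    for name, g in grads.items():
        if trainable[name]:
            model[name] -= lr * g

# Freeze the feature extractor, fine-tune only the head on the new task.
model, trainable = transfer(pretrained, freeze=("conv",))
grads = {"conv": np.ones((3, 3)), "head": np.ones((3, 1))}
sgd_step(model, trainable, grads)

conv_unchanged = np.allclose(model["conv"], pretrained["conv"])
head_changed = not np.allclose(model["head"], pretrained["head"])
```

Freezing everything corresponds to using the network as a fixed feature extractor; freezing nothing corresponds to full fine-tuning, which is the more common choice in the surveyed papers when enough COVID-19 images are available.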

Interpretability

Fundamentally, a learning model consists of algorithms that try to learn patterns and relationships from the data source. To make the results obtained from machines interpretable, researchers use different techniques, such as Class Activation Mapping (CAM), Gradient-weighted Class Activation Mapping (Grad-CAM) based on a heatmap, Local Interpretable Model-agnostic Explanations (LIME) [110], and SHapley Additive exPlanations (SHAP) [111]. CAM is a method that creates heatmaps to show the important portions of an image, especially which regions are essential from the neural network's perspective. CAM has various versions, such as Score-CAM and Grad-CAM. The heatmap generated by CAM is a visualization that can be interpreted as showing where in the image the neural network is looking to make its decision. LIME tries to interpret models by approximating the predictions of the predictive model in specific regions. LIME discovers the set of superpixels with the strongest connection to the prediction label. It creates explanations by generating another data set of random perturbations, turning parts of the superpixels in the image on and off. The aim of SHAP is to explain the prediction for a feature vector by calculating the contribution of each distinct feature to the prediction. This is very important in image classification and object localization problems.
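The core of CAM is a single weighted sum: the heatmap for a target class is the sum of the last convolutional layer's feature maps, each weighted by that channel's classifier weight for the class. A minimal numpy sketch of that computation (feature-map sizes and weights are hypothetical placeholders):

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Compute a CAM heatmap as the class-weighted sum of feature maps.

    feature_maps:  (C, H, W) activations of the final conv layer.
    class_weights: (C,) weights connecting each channel to the class logit.
    """
    # Weighted sum over channels -> (H, W) spatial map of class evidence.
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    cam = np.maximum(cam, 0)        # keep positive evidence only (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()       # normalize to [0, 1] for display
    return cam

rng = np.random.default_rng(1)
fmaps = rng.random((8, 7, 7))       # 8 channels on a 7x7 spatial grid
weights = rng.random(8)
heatmap = class_activation_map(fmaps, weights)
```

In practice, the low-resolution heatmap (here 7×7) is upsampled to the input image size and overlaid on the X-ray or CT slice; Grad-CAM replaces the fixed classifier weights with gradient-derived channel weights, which removes CAM's requirement of a global-average-pooling architecture.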

In our survey, a few papers utilized CAM [112], and several papers [12, 14, 22, 36, 38,39,40,41,42, 47, 56, 66, 67, 74, 86] utilized Grad-CAM with heatmaps for a better understanding of the regions the model is focusing on. At the same time, heatmaps can also provide radiologists with more useful information and further help them. In papers [113, 114], LIME is used as an interpretability technique to explain the outcomes of different machine learning models for COVID-19 images. SHAP is used to visualize feature importance in [113, 115].

Classification

Almost all of the COVID-19 diagnosis models use a Convolutional Neural Network [96] as a feature extractor and softmax or sigmoid as a classifier. Some authors also attempted to augment the CNN with a sigmoid layer. The authors of [45] merged a CNN having a softmax layer with an SVM classifier [116]. Kassani et al. [59] used a CNN with a softmax layer along with a decision tree, random forest, XGBoost [117], AdaBoost [118], Bagging classifier [119], and LightGBM [120]. Furthermore, Ahishali et al. [91] also merged a CNN with KNN, a support estimator network, and an SVM classifier. Nonetheless, these models need a large amount of training data, which is scarce for COVID-19 images.

Essentially, there are two ways of classifying COVID-19 images: binary classification and multiclass classification. In binary classification, authors tried to separate COVID-19 and non-COVID-19 patients, but this technique is error-prone, as other types of lung disease (viral pneumonia, bacterial pneumonia, and Community-Acquired Pneumonia) can be classified as COVID-19. For that reason, many authors differentiate COVID-19, viral pneumonia, bacterial pneumonia, Community-Acquired Pneumonia, and normal images by classifying them with a softmax classifier. In terms of accuracy in detecting COVID-19 images, multiclass classifiers performed better than binary classifiers. A summary of the different classification techniques used by different papers is given in Tables 8 and 9.
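The two classification heads differ only in their output layer. The following sketch contrasts them with toy logits (the class names and logit values are illustrative, not from any surveyed model): a binary head emits one logit passed through a sigmoid, while a multiclass head emits one logit per class passed through a softmax.

```python
import numpy as np

def sigmoid(z):
    """Binary head: maps one logit to P(COVID-19)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multiclass head: maps a logit vector to a probability distribution."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Binary classification: a single logit decides COVID-19 vs. non-COVID-19.
p_covid = sigmoid(1.2)

# Multiclass classification: one logit per disease class.
classes = ["COVID-19", "viral pneumonia", "bacterial pneumonia", "normal"]
logits = np.array([2.0, 0.5, 0.1, -1.0])
probs = softmax(logits)
predicted = classes[int(np.argmax(probs))]
```

The multiclass head is what lets a model separate COVID-19 from the other pneumonias explicitly, rather than lumping everything non-normal into a single positive class.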

Table 8 Summary of classification methods used by different papers both for CT and X-ray images
Table 9 Summary of classification methods used by different papers both for CT and X-ray images (monthwise)

Some authors also tried to detect COVID-19 in several stages. First, the authors separated normal and pneumonia images. After that, they classified COVID-19 from the filtered pneumonia images. Several-stage classification helps the models learn hierarchical relationships. In papers [38, 40], the authors used several-stage classification rather than an end-to-end method to detect COVID-19, which outperforms several end-to-end techniques. On the flip side, the performance of multiclass classification relies on the data set. If the data set is small, the model cannot learn the hierarchical relationships between classes, such as pneumonia to viral pneumonia to COVID-19.
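The several-stage pipeline can be sketched as two chained classifiers. Everything below is a hypothetical stand-in: the feature names (`opacity_score`, `peripheral_ggo`) and thresholds are invented placeholders for real trained models, used only to show the control flow of filtering normal images before the COVID-19 decision.

```python
def stage_one(features):
    """Hypothetical first-stage classifier: normal vs. pneumonia."""
    return "pneumonia" if features["opacity_score"] > 0.5 else "normal"

def stage_two(features):
    """Hypothetical second-stage classifier, applied only to images the
    first stage flagged as pneumonia: COVID-19 vs. other pneumonia."""
    return "COVID-19" if features["peripheral_ggo"] else "other pneumonia"

def diagnose(features):
    """Several-stage pipeline: filter out normal images first, then
    distinguish COVID-19 among the remaining pneumonia images."""
    if stage_one(features) == "normal":
        return "normal"
    return stage_two(features)

result = diagnose({"opacity_score": 0.8, "peripheral_ggo": True})
```

A practical consequence of this structure is that the second-stage classifier trains only on pneumonia images, so the rare COVID-19 class is less diluted by the abundant normal class.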

Experimental Results of the Papers

Researchers used different evaluation metrics to analyze their COVID-19 models' performance. Among them, the most popular metrics for detecting COVID-19 are Accuracy, Precision, Recall/Sensitivity, F1 Score, Specificity, and Area Under Curve (AUC). In our work, we tried to record the performance under these metrics for all the papers, which is presented in Table 10 for CT and in Table 11 for X-ray images. In addition, we give the number of COVID-19 images out of the total images used for training, testing, and validation purposes. Some papers explicitly stated the train-test split of COVID-19 images, and for some papers, we calculated the split according to the ratio provided in the paper. Even so, for some papers, it is not clearly stated how they distributed their data set [16, 17, 20, 23, 75]. In addition, some papers explicitly stated the use of data for validation [27, 32, 33, 38, 40, 41, 44, 49, 52, 56, 63,64,65, 68, 70, 74, 81, 82, 84, 87, 89].
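All of these metrics except AUC follow directly from the four entries of a binary confusion matrix. A small self-contained helper (the confusion counts in the example are made up for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the evaluation metrics most often reported by the surveyed
    papers from the entries of a binary confusion matrix:
    tp/fp = true/false positives, fn/tn = false/true negatives."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Toy confusion matrix: 90 true positives, 10 false positives,
# 5 false negatives, 95 true negatives.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

The distinction matters for COVID-19 screening in particular: a model can post high accuracy on an imbalanced test set while its sensitivity (the fraction of actual COVID-19 cases it catches) remains poor, which is why the tables report these metrics separately.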

A summary of the results obtained by the studied models using CT images is illustrated in Table 10. These papers are listed with their Accuracy, AUC, Sensitivity, and Specificity, along with their distribution of COVID-19 images into training, testing, and validation sets. It can also be observed that CT image-based models gained a minimum accuracy of 79.50% in the paper [85] and a maximum accuracy of 99.56% in the paper [16].

Table 10 Summary of result evaluation for CT images

A summary of the results obtained by the studied models using X-ray images is illustrated in Table 11. Papers are listed with their Accuracy, AUC, Sensitivity, and Specificity, along with their distribution of COVID-19 images into training, testing, and validation sets. It can also be seen that X-ray image-based models gained a minimum accuracy of 89.82% in the paper [22] and a maximum accuracy of 99.94% in the paper [16]. For both Tables 10 and 11, the publication date and the total number of images used by the respective papers are provided. In addition, "cited by (number of papers)" indicates the total number of papers that have cited the specific paper up to July 10, 2020.

Table 11 Summary of result evaluation for X-ray images

After analyzing all the papers, it can be seen that most models cannot accurately distinguish between pneumonia and COVID-19 from medical images. All the mentioned papers focused only on medical images and did not consider features such as initial symptoms, travel history, laboratory assessment, contact history, or the distinction between severe and mild COVID-19 [3, 68, 75, 86]. Most papers worked on data sets that are not balanced, containing more COVID-19-negative images; hence, paper [58] suggested the use of a modified loss function to tackle imbalanced data.
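One common form of such a modified loss is a class-weighted binary cross-entropy, where errors on the scarce positive (COVID-19) class are penalized more heavily. The sketch below is a generic illustration of the idea, not the specific loss used in [58]; the labels, predicted probabilities, and weight value are invented for the example.

```python
import numpy as np

def weighted_cross_entropy(y_true, p_pred, pos_weight):
    """Binary cross-entropy with a weight on the positive class, so the
    scarce COVID-19 samples contribute more to the loss than the
    abundant negative samples."""
    p_pred = np.clip(p_pred, 1e-7, 1 - 1e-7)   # avoid log(0)
    loss = -(pos_weight * y_true * np.log(p_pred)
             + (1 - y_true) * np.log(1 - p_pred))
    return loss.mean()

y = np.array([1, 0, 0, 0])            # one positive among four samples
p = np.array([0.6, 0.2, 0.1, 0.3])    # model's predicted probabilities

plain = weighted_cross_entropy(y, p, pos_weight=1.0)     # standard BCE
weighted = weighted_cross_entropy(y, p, pos_weight=3.0)  # up-weight positives
```

With `pos_weight > 1`, the same imperfect prediction on the positive sample produces a larger loss, pushing the optimizer to trade some specificity for higher sensitivity on the minority class.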

A single CNN network is not able to obtain the higher dimensional fusion features that are decisive for classification, whereas modern pre-trained CNN models were fused to obtain higher dimensional fusion features, which overcomes the problem of insufficient features from a single CNN network model [15]. Due to this, most authors used pre-trained models instead of single CNN models to detect COVID-19 from medical images. The main advantage of using pre-trained models such as Inception, ResNet, and DenseNet is that they all have a strong capacity for detail extraction, but the problem with these models is that they erroneously focus on image edges, corners, and other image areas that are not related to COVID-19, as these models are pre-trained on non-medical images [79]. Another drawback of CNN-based models is that they work like a black box, giving no intuition into the important image features. These methods lack transparency and interpretability [74, 86]. Moreover, most pre-trained models require a lot of time to train due to the immense number of parameters, but Polsinelli et al. [56] addressed this problem with SqueezeNet, a lightweight model that reaches accuracy similar to modern CNN models.

Comparing Tables 10 and 11, it can be said that the X-ray image-based models performed better than the CT image-based models. The average Accuracy, Sensitivity, Specificity, and AUC for CT Image-based models are 90.69 %, 91.48%, 92.26%, and 94.46%, respectively, and for X-ray-based models are 96.00%, 91.09%, 96.45%, and 95.50%, respectively.

Conclusion

As COVID-19 is spreading worldwide expeditiously, accurate and fast detection of the disease has become the most essential objective in defending against this outbreak. In this article, we tried to present a comprehensive survey of AI-empowered methods that use medical images to combat this pandemic. The fundamental purpose of this survey is to present the current information so that researchers understand and are aware of the up-to-date knowledge and can build a model that detects COVID-19 accurately, at an economical cost, and relatively fast. We surveyed a total of 80 COVID-19 diagnosis architectures, among which 28 use CT images, 50 use X-ray images, and 2 use both CT and X-ray images. To date, none of these models has proved reliable enough to replace RT-PCR tests, and researchers are still trying to improve these techniques. From our survey, it is noticeable that X-ray image data sets are more widely available than CT image data sets, as a CT scan procedure is costlier and more time-consuming than an X-ray. Therefore, most researchers utilized chest X-ray images for diagnosing COVID-19. After analyzing the existing research in this domain, we found that there is a shortage of annotated medical images of COVID-19-affected people. Enriching the pool of quality annotated medical images of COVID-19-affected people can play a significant role in boosting the performance of the mentioned data-hungry models. Furthermore, we remarked that using segmentation as preprocessing has an extensive impact on model performance. We also observed that domain adaptation in transfer learning is a widely used technique that gives promising results. Furthermore, many researchers used Gradient-weighted Class Activation Mapping (Grad-CAM) with heatmaps to interpret the performance of their models.
Though this survey paper cannot claim to be an in-depth analysis of those studies, it presents a practical outlook and a valid comparison of the research in this field over these months, which can guide researchers toward future directions.