1 Introduction

The coronavirus emerged in Wuhan, China, in December 2019. In February 2020, the World Health Organization (WHO) officially designated the disease COVID-19 and recognized the outbreak as a public health emergency; in March 2020, it was declared a global pandemic (Kim 2021). Throughout the pandemic, the impact of COVID-19 varied significantly across regions of the world, with notable effects observed in the United States, Italy, and Spain. One of the key characteristics of COVID-19 is its rapid spread, facilitated by easy person-to-person transmission. Symptoms typically include cough, fever, and flu-like complaints, with the potential for severe complications such as pneumonia, difficulty breathing, and, in some cases, death. The healthcare systems of nations worldwide, including those considered highly developed, faced significant challenges as a result. Many countries implemented measures such as curfews, online schooling and work arrangements, and advisories for individuals to stay home and avoid contact with others. Additionally, large-scale interventions such as mask mandates and restrictions on inter-city and international travel were enacted to control the spread of the virus (Gülmez 2023a; Cheng et al. 2020; Coxon et al. 2020).

Identifying individuals with COVID-19 is a critical aspect of combating the spread of the virus. Polymerase chain reaction (PCR) testing is commonly employed for this purpose. However, there is a consensus among medical professionals and experts that the reliability of PCR testing may be limited: it is not uncommon for these tests to yield false results, indicating positive cases as negative or vice versa. As an alternative, clinicians may turn to radiological findings and chest X-ray images of patients as supplementary diagnostic tools in cases where PCR results are uncertain (Osman et al. 2020; Alhudhaif et al. 2021).

With the advent of computers and related technology, X-ray images can be used to diagnose disease with artificial intelligence approaches, and artificial intelligence is now frequently used in medicine. The development of CNNs and deep learning models has made it straightforward to extract information from images, and high levels of precision can be attained. Classical machine learning approaches struggle with visual data; CNNs have largely resolved this issue (Gupta et al. 2021).

When creating deep neural networks, several choices are possible. There are many variables, such as the network's depth, the number of neurons in each layer, the kinds of layers, and their parameters; consequently, the number of candidate configurations is effectively unlimited. Typically, researchers rely on expertise and trial and error to identify a good network. Alternatively, metaheuristic algorithms can be used to make good choices among the many possibilities (Xu et al. 2014). In daily life, decision-makers are often required to make choices with many objectives in mind. Making a solid choice is usually difficult, and the decision-maker needs decision-making aids. A solution is deemed a good option if no objective can be improved without degrading at least one of the others. Decision support techniques present such good candidates to the user, who then selects one; effective decision support mechanisms should do so within a reasonable period of time. MOO problems are those in which multiple objective functions compete with one another (Igel 2005; Ortaçay 2020).

Artificial intelligence (AI) has various application areas (Gülmez 2022a, 2023b, c; Gülmez and Kulluk 2023). CNN is a part of AI (Gülmez 2024). CNN models can be used for detecting COVID-19 from medical images such as chest X-rays. These models are trained on datasets of images to learn patterns associated with the disease, aiding radiologists and clinicians in diagnosis. Ongoing research aims to improve the accuracy and reliability of CNN-based approaches for COVID-19 detection. Less parameterized models, often referred to as simpler models, offer several advantages over their more complex counterparts in deep learning. These models require reduced computational resources for training, inference, and deployment. This leads to significant savings in terms of time, energy, and infrastructure costs, particularly beneficial when working with large datasets or deploying models on resource-constrained devices such as mobile phones or edge devices. Simpler models typically train faster compared to larger models due to their fewer parameters. This accelerated training process enables quicker experimentation and iteration, facilitating rapid prototyping and development of machine learning solutions. Simpler models are less prone to overfitting, a phenomenon where the model learns to memorize the training data rather than generalize to unseen data. Overfitting often occurs in highly parameterized models due to their increased capacity to capture complex patterns in the training data. Less parameterized models have a reduced capacity for memorization, leading to better generalization performance on unseen data. Simple models are often more interpretable and easier to understand compared to complex models. With fewer parameters, the decision-making process of the model becomes more transparent, allowing humans to comprehend how the model arrives at its predictions. This interpretability is particularly valuable in domains where model transparency and trust are critical, such as healthcare and finance. In scenarios where labeled data is limited or expensive to obtain, simpler models can be advantageous. These models tend to require less labeled data for training and can achieve competitive performance with smaller datasets. This property is especially relevant in niche or specialized domains where collecting large amounts of labeled data may be impractical or infeasible. The preference for simpler, less parameterized models stems from their efficiency, robustness, interpretability, and ability to generalize well to new data, making them a compelling choice in various machine learning applications (Gülmez and Kulluk 2019; Gülmez 2022b, 2023a, d).

In this paper, we address the challenge of enhancing the robustness and performance of neural networks in the presence of various disruptions and constraints. Song et al. (2023b) investigate event-triggered state estimation for reaction-diffusion neural networks (RDNNs) in the presence of Denial-of-Service (DoS) attacks, proposing a switching-like event-triggered strategy (SETS) to mitigate intermittent attacks while maintaining system performance. Zhuang et al. (2023) present an optimal ILC algorithm tailored for linear systems with nonuniform trial lengths and input constraints, offering improved constraint handling capabilities and monotonic convergence properties. Song et al. (2023a) explore bipartite synchronization for reaction-diffusion neural networks with cooperative-competitive interactions, leveraging a dual event-triggered control mechanism to reduce resource consumption while ensuring synchronization. By integrating insights from these diverse methodologies, we aim to contribute to the development of robust and efficient neural network systems capable of withstanding disruptions and constraints in real-world scenarios.

In this research, a deep CNN is used to identify COVID-19. The architecture of this deep network is discovered with MOO algorithms: NSGA-II, NSGA-III, R-NSGA-II, SMS-EMOA, MOEA/D, and the proposed Swarm Genetic Algorithm (SGA). Three objective functions define the MOO problem: multi-class cross entropy, error ratio (1 − accuracy), and the complexity of the network. The aim is to minimize all three.

This paper introduces a novel contribution to the field through the development of a bespoke multi-objective hyperparameter optimization algorithm specifically designed for COVID-19 detection from X-ray images. While CNNs are commonly utilized for image-based disease detection, the novelty of this approach lies in the integration of MOO principles to identify optimal CNN architectures. By considering objective functions such as multi-class cross entropy, error ratio, and CNN network complexity, the algorithm enables the simultaneous optimization of multiple performance metrics, thereby enhancing the robustness and generalizability of COVID-19 detection models. The comparative analysis with existing algorithms, including NSGA-III, NSGA-II, R-NSGA-II, SMS-EMOA, and MOEA/D, highlights the superior performance of our proposed SGA, underscoring the novelty and effectiveness of our approach in the context of COVID-19 detection from X-ray images. This paper represents a novel contribution to the field of medical imaging and deep learning-based disease diagnosis, with potential implications for improving healthcare outcomes and pandemic response strategies.

2 Literature review

In recent years, the field of multi-objective optimization has witnessed a surge in the development of novel algorithms aimed at efficiently solving complex real-world engineering problems. Rahman et al. (2022) proposed an optimizer that demonstrated competitive performance against established algorithms such as the Multi-Objective Water Cycle Algorithm (MOWCA), NSGA-II, and the Multi-Objective Dragonfly Algorithm (MODA), particularly showcasing superior solution quality in scenarios such as coil compression spring design. Abdullah et al. (2023) introduced the MOFDO algorithm, a multi-objective variant of the Fitness Dependent Optimizer equipped with comprehensive knowledge types. Evaluations on standard benchmark functions and real-world engineering problems, including welded beam design, demonstrate MOFDO's effectiveness in providing diverse and well-distributed feasible solutions, and comparative analyses against state-of-the-art algorithms like NSGA-III and the Multi-Objective Dragonfly Algorithm underscore its competitiveness across various optimization scenarios. These advancements highlight the importance of continuously exploring innovative optimization methodologies to address the evolving challenges posed by complex engineering problems.

The COVID-19 pandemic has brought significant challenges to various aspects of society, including healthcare, communication systems, and face recognition technology. Researchers have responded to these challenges by proposing novel methodologies and frameworks aimed at improving diagnostic accuracy, optimizing resources, and adapting to the new normal of mask-wearing. This literature review explores recent studies that contribute to these efforts.

Sayed (2022) introduced a hybrid approach, RSO-AlexNet-COVID-19, combining the rat swarm optimizer (RSO) and convolutional neural network (CNN) for the automated diagnosis of COVID-19 using CT and X-ray images. The study achieved a remarkable overall classification accuracy of 100% for CT images and 95.58% for X-ray image datasets, outperforming other CNN architectures.

Akingbesote et al. (2023) proposed a Pareto-optimized FaceNet model with data preprocessing techniques to enhance face recognition accuracy, particularly in the context of mask-wearing during the COVID-19 pandemic. The study demonstrated superior performance in recognizing both masked and unmasked faces, offering implications for real-world applications.

Dhiman et al. (2022) presented ADOPT, an automatic deep learning and optimization-based approach for COVID-19 detection using X-ray images. The study employed multi-objective optimization and deep learning techniques, achieving significant advancements in classification accuracy compared to existing methods.

Hajiakhondi-Meybodi et al. (2021) addressed the need for trustworthy and time-varying connection scheduling in wireless networks during the pandemic. Their framework, CQN-CS, utilized deep reinforcement learning to optimize connection scheduling between Femto Access Points (FAPs) and Unmanned Aerial Vehicles (UAVs), improving network performance metrics.

Kiziloluk and Sert (2022) proposed COVID-CCD-Net, a CNN-based system for the diagnosis of COVID-19 and colon cancer using chest X-ray and tissue microarray (TMA) images. Their approach optimized CNN hyperparameters, achieving accurate classification of COVID-19, normal, and viral pneumonia cases, as well as different regions in colorectal cancer images.

Shukla et al. (2021) developed a multiobjective genetic algorithm combined with a CNN for automated COVID-19 identification in chest X-ray images. The study demonstrated improved diagnostic accuracy, suggesting the model’s potential for real-time testing of patients.

Mohammedqasem et al. (2023) proposed a deep learning framework for medical datasets with high missing values, particularly focusing on COVID-19 diagnosis. Their hybrid approach, incorporating Data Missing Care (DMC) framework and Grid-Search optimization, achieved high accuracy in classifying COVID-19 patients despite missing data.

Liu et al. (2024) introduced GrMoNAS, a granularity-based multi-objective Neural Architecture Search (NAS) framework for efficient medical diagnosis. Their approach balanced diagnostic accuracy and computational efficiency, showing promising results across various medical scenarios, including COVID-19 diagnosis.

Muthumayil et al. (2021) introduced a Multi-objective Black Widow Optimization-based Convolutional Neural Network (MBWO-CNN) technique for the diagnosis and classification of COVID-19 using X-ray and CT images. Their approach involved preprocessing, feature extraction, parameter tuning, and classification, achieving a remarkable accuracy of 96.43%.

Singh et al. (2021) proposed a deep neural network-based screening model using chest X-ray images for identifying COVID-19-infected patients. By tuning the hyperparameters using Multi-objective Adaptive Differential Evolution (MADE), their model outperformed existing machine learning models in terms of various performance metrics.

Goel et al. (2022) presented Multi-COVID-Net, a two-step deep learning architecture optimized using the Multi-Objective Grasshopper Optimization Algorithm (MOGOA) for COVID-19 diagnosis from chest X-ray images. Their model demonstrated superior performance compared to state-of-the-art methods in classifying Non-COVID-19, COVID-19, and pneumonia patient images.

Çiğ et al. (2023) proposed an enhanced disease detection approach using Contrast Limited Adaptive Histogram Equalization (CLAHE) and Multi-Objective Cuckoo Search (MOCS) combined with Convolutional Neural Networks (CNNs). Their method achieved high accuracy rates in classifying chest X-ray images into healthy, unhealthy, and pneumonia categories.

Singha et al. (2022) introduced the Multi-Objective Black Widow Optimization-based Convolutional Neural Network (MBWO-CNN) method for diagnosing and classifying COVID-19 data. Their model, employing Extreme Learning Machine Auto Encoder (ELM-AE), achieved a maximum accuracy of 97.53%, showcasing its effectiveness in COVID-19 diagnosis.

Dhiman et al. (2021) proposed a Deep Learning and Optimization-Based Framework (DON) for the detection of COVID-19 using X-ray images. By employing multi-objective optimization and J48 decision tree classification, their model demonstrated superior performance compared to other CNN-based approaches.

Rajagopal et al. (2023) developed a Deep Convolutional Spiking Neural Network optimized with Arithmetic Optimization Algorithm (AOA) for lung disease detection using chest X-ray images. Their technique outperformed existing methods in terms of accuracy, precision, and F-score, showcasing its potential for diagnosing lung diseases, including COVID-19.

3 Algorithms

In this section, the algorithms and methods used in this paper are explained in the following subsections: MOO, NSGA-II, NSGA-III, R-NSGA-II, SMS-EMOA, MOEA/D, SGA, and CNN.

3.1 Multi-objective optimization

Although certain real-world problems can be reduced to a single objective, it is often difficult to characterize all aspects of a problem in terms of one objective; multiple goals usually provide a clearer picture of the task. MOO has been available for over two decades, and its applicability to real-world problems is continually expanding. In contrast to the abundance of approaches available for single-objective optimization, comparatively few strategies for MOO have been developed. In single-objective optimization the search space is typically well-ordered; once there are several potentially conflicting goals to be optimized concurrently, there is no longer a single optimal solution but rather a set of viable solutions of comparable quality, and the search space becomes only partially ordered. The outcome is a set of optimal trade-offs between the competing goals (Gülmez et al. 2024; Abraham and Jain 2005; Liang et al. 2019).

The MOO problem can be defined as (1) and (2) (Gunantara 2018).

$$\min \text{ or } \max \quad f_1(x), f_2(x), \ldots, f_n(x)$$
(1)
$$\text{subject to: } x \in U$$
(2)

where x is the vector of solution variables, n is the number of objective functions, U is the feasible set, and min/max indicates the direction of optimization for each objective. In MOO there is a multidimensional objective space for the objective function vector and a multidimensional decision space for the solution vector. For every solution x in the decision variable space, there exists a corresponding point in the objective space. Figure 1 depicts the mapping between the solution vector and the objective function vector (Gunantara 2018).

Fig. 1 Solution space and map to objective space (Lim et al. 2009)

During optimization, the Pareto approach treats the elements of the solution vectors as distinct (independent), and the idea of dominance is used to distinguish between dominated and non-dominated solutions. When no objective function can be improved without worsening another, the optimum is attained; this state is known as Pareto optimality. The collection of the best MOO solutions is known as the Pareto optimal set, and its members are called Pareto efficient or non-dominated solutions. A solution in which one objective function can still be enhanced without diminishing another is not Pareto optimal; such a solution is dominated (inferior). The problem is considered solved once a Pareto optimal set can be identified. Two terms associated with the Pareto optimal set must be noted: anchor points and the utopia point. An anchor point is obtained by optimizing a single objective function on its own, and the utopia point lies at the intersection of the optimal values of the objective functions. Dominated points, non-dominated points, and the utopia point can be seen in Fig. 2 (Gunantara 2018).

Fig. 2 Dominated, non-dominated points, and utopia points (Gunantara 2018)
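To make the dominance relation concrete, the following minimal Python sketch (an illustration; the paper contains no code) tests dominance and extracts the non-dominated set:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Illustrative (cross entropy, error ratio, parameter count) triples
points = [(0.3, 0.05, 2_000_000), (0.8, 0.20, 300_000), (0.9, 0.25, 2_500_000)]
print(pareto_front(points))  # the third point is dominated by the first two
```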

3.1.1 NSGA-II

NSGA-II is a widely used evolutionary algorithm for multi-objective optimization. It employs a non-dominated sorting mechanism to rank candidate solutions based on their dominance relationships. By maintaining a diverse population of solutions through elitist selection and crowding distance calculation, NSGA-II ensures a well-distributed set of Pareto-optimal solutions. This algorithm iteratively evolves a population of candidate solutions through selection, crossover, and mutation operators, facilitating the exploration and exploitation of the search space. NSGA-II has been applied to various optimization problems, including hyperparameter optimization for machine learning models, demonstrating its effectiveness and versatility (Deb et al. 2002; Ortaçay 2020).
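As an illustration, a run of NSGA-II with the pymoo library might look as follows (an assumption for illustration, since the paper does not name its software stack; the benchmark problem stands in for the CNN hyperparameter problem):

```python
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.problems import get_problem
from pymoo.optimize import minimize

problem = get_problem("dtlz2")           # stand-in for the three-objective CNN problem
algorithm = NSGA2(pop_size=40)           # population size is an illustrative choice
res = minimize(problem, algorithm, ("n_gen", 50), seed=1, verbose=False)
print(res.F.shape)                       # objective values of the Pareto set found
```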

3.1.2 NSGA-III

NSGA-III is an extension of NSGA-II that addresses the difficulty of handling three or more objectives in multi-objective optimization problems. It introduces a reference point-based approach to guide the evolution towards the Pareto front: a structured set of reference points is spread over the objective space, and individuals are selected based on their proximity to these points, ensuring a balanced distribution of solutions across the Pareto front. By incorporating reference points adaptively and maintaining diversity in the population, NSGA-III enhances the convergence and diversity of solutions compared to NSGA-II, particularly in problems with more than two objectives (Mutlu 2021).
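The distinguishing ingredient of NSGA-III is the set of reference directions; in pymoo this can be sketched as follows (illustrative only, not the authors' code):

```python
from pymoo.algorithms.moo.nsga3 import NSGA3
from pymoo.util.ref_dirs import get_reference_directions

# Das-Dennis reference directions for the three objectives of this study
ref_dirs = get_reference_directions("das-dennis", 3, n_partitions=12)
algorithm = NSGA3(ref_dirs=ref_dirs)  # selection is guided by proximity to ref_dirs
```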

3.1.3 R-NSGA-II

R-NSGA-II is a variant of NSGA-II in which the decision-maker expresses preferences through reference points. It utilizes a set of user-supplied reference points in objective space to steer the evolution towards the regions of the Pareto front that are of most interest, while an epsilon-based clearing mechanism maintains diversity in the population. By combining preference information with elitist selection, R-NSGA-II effectively balances convergence and diversity, making it suitable for complex optimization problems with numerous conflicting objectives (Filatovas et al. 2017).
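In R-NSGA-II the decision-maker supplies aspiration points directly in objective space; a hedged pymoo sketch follows, where the reference-point values are hypothetical:

```python
import numpy as np
from pymoo.algorithms.moo.rnsga2 import RNSGA2

# Hypothetical aspiration point: low cross entropy, low error, small network
ref_points = np.array([[0.1, 0.05, 1.0e5]])
algorithm = RNSGA2(ref_points=ref_points, pop_size=40, epsilon=0.01)
```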

3.1.4 SMS-EMOA

SMS-EMOA (S-Metric Selection Evolutionary Multi-objective Optimization Algorithm) is a steady-state evolutionary algorithm whose selection criterion is the hypervolume (S-metric) dominated by the population. In each iteration it generates a single offspring, applies non-dominated sorting, and discards the individual of the worst front that contributes least to the dominated hypervolume. By rewarding hypervolume contribution, SMS-EMOA balances convergence toward the Pareto front with the spread of solutions along it, facilitating the discovery of high-quality Pareto-optimal solutions. The algorithm has demonstrated effectiveness in various real-world optimization problems, particularly those with a moderate number of objectives where hypervolume computation remains tractable (Beume et al. 2007).
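The discarding step can be made concrete with pymoo's hypervolume indicator (an illustrative sketch, not the authors' code):

```python
import numpy as np
from pymoo.indicators.hv import HV

def least_hv_contributor(F, ref_point):
    """Index of the solution whose removal shrinks the hypervolume the least;
    SMS-EMOA removes this individual from the worst front each iteration."""
    hv = HV(ref_point=ref_point)
    total = hv(F)
    contribs = [total - hv(np.delete(F, i, axis=0)) for i in range(len(F))]
    return int(np.argmin(contribs))
```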

3.1.5 MOEA/D

MOEA/D is a decomposition-based evolutionary algorithm specifically designed for solving multi-objective optimization problems. It decomposes the original problem into a set of scalar subproblems and optimizes each subproblem simultaneously using a cooperative coevolutionary framework. MOEA/D maintains a population of solutions organized in subpopulations, where each subpopulation corresponds to a scalar subproblem. By balancing the exploration and exploitation of the search space through local and global search operators, MOEA/D effectively converges to a diverse set of Pareto-optimal solutions. This algorithm has been successfully applied to various optimization tasks, including hyperparameter optimization, where it efficiently explores the trade-offs between competing objectives to identify optimal model configurations (Zhang and Li 2007).
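Decomposition replaces the vector-valued objective with scalar subproblems; a common choice is the Tchebycheff scalarization, sketched below with illustrative weights and ideal point:

```python
import numpy as np

def tchebycheff(f, weights, z_ideal):
    """Scalarized cost of objective vector f for one subproblem's weight vector."""
    return float(np.max(np.asarray(weights) * np.abs(np.asarray(f) - np.asarray(z_ideal))))

# One hypothetical subproblem emphasizing cross entropy over error ratio and size
print(tchebycheff([0.4, 0.10, 2.0e5], weights=[0.6, 0.3, 0.1], z_ideal=[0.0, 0.0, 1.0e4]))
```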

3.1.6 Swarm genetic algorithm (SGA)

The Swarm Genetic Algorithm (SGA) is the new algorithm proposed for this problem. It is a swarm-based algorithm: in every iteration, every member of the swarm moves, and then new members are created by the crossover stage of a genetic algorithm. This is the main idea of the algorithm.

First, the number of members in the population is determined, and that many initial solutions are created. Every solution in the population is then calculated and evaluated.

After every solution is evaluated, the solutions are sorted by their objective values. Because this is a MOO problem, the sorting is not a strict total order: solutions on the same front cannot be ranked exactly against each other, and only the differences between different surface levels are determined. The first and best surface is the Pareto front, shown in blue in Fig. 3.

Fig. 3 The non-dominated solutions (blue) and the dominated solutions (red)

For every dominated solution that is not on the Pareto front, a random non-dominated solution is selected, and the dominated solution moves toward it. The movement is given by (3), and Fig. 4 illustrates it.

$$x_{new} = \frac{x_{old} + x_{target}}{2}$$
(3)
Fig. 4 Movement of the dominated solutions

After the movement of the solutions, the crossover of the genetic algorithm is applied. It is the two-point crossover of the classical genetic algorithm, through which new solutions are generated.

An elitist approach is used for the next generation: the best solutions survive and the worst solutions die. These steps are repeated until the termination criterion is met; a schematic sketch of one iteration is given below.
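The sketch below renders one SGA iteration in Python under the description above, assuming a real-valued encoding of the candidate solutions; the survival step is simplified to a domination-count ranking, so details may differ from the authors' implementation:

```python
import numpy as np

def dominates(a, b):  # as in Sect. 3.1 (all objectives minimized)
    return np.all(a <= b) and np.any(a < b)

def domination_counts(F):
    """For each solution, how many others dominate it (0 = on the Pareto front)."""
    n = len(F)
    return np.array([sum(dominates(F[j], F[i]) for j in range(n) if j != i)
                     for i in range(n)])

def sga_step(pop, evaluate, rng):
    """One SGA iteration. pop is an (n, d) array of real-valued solutions;
    evaluate maps a solution to its three objective values."""
    F = np.array([evaluate(x) for x in pop])
    counts = domination_counts(F)
    nd_idx = np.where(counts == 0)[0]                # current Pareto front
    for i in np.where(counts > 0)[0]:                # movement stage, Eq. (3)
        target = pop[rng.choice(nd_idx)]
        pop[i] = (pop[i] + target) / 2.0
    children = []                                    # two-point crossover stage
    for _ in range(len(pop) // 2):
        i, j = rng.choice(len(pop), size=2, replace=False)
        a, b = sorted(rng.choice(np.arange(1, pop.shape[1]), size=2, replace=False))
        child = pop[i].copy()
        child[a:b] = pop[j, a:b]
        children.append(child)
    merged = np.vstack([pop, np.array(children)])    # elitist survival:
    order = np.argsort(domination_counts(np.array([evaluate(x) for x in merged])))
    return merged[order[: len(pop)]]                 # keep the least-dominated
```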

3.2 CNN

CNN is one of the most effective pattern recognition methods. A deep CNN typically consists of convolutional layers, pooling layers, and a fully connected layer. These structures apply locally learned filters to extract visual information from the input image. Pooling reduces the size of the feature maps, which then serve as the input for the subsequent convolution. This procedure is repeated until all deep features are extracted. A classifier then reaches a decision based on these features: convolutional processing extracts the features, while a fully connected network acts as the classifier, often ending in a SoftMax output layer. Using these layers, several significant network topologies have been developed, including AlexNet, Xception, and GoogLeNet. Overfitting during training is among the most serious concerns with these structures; several methods have been proposed to prevent it, including data augmentation and dropout layers (Sarıgül et al. 2019). Figure 5 depicts a CNN sample.

Fig. 5 Convolutional neural network sample (Cho and Kim 2021)

A convolution layer, a structure comprised of a number of fixed-size filters, makes it feasible to apply complex functions to the input image using locally trained filters. The filter weights and biases are shared across the whole image; this weight-sharing mechanism enables the same feature to be represented over the entire picture. The local receptive field of a neuron is the image region to which it is attached, and its size is determined by the size of the filters. Each output value is obtained by applying an activation function such as ReLU or sigmoid to the weighted sum of the receptive field plus the filter bias (Sarıgül et al. 2019). Figure 6 depicts a typical convolutional layer.

Fig. 6 Convolutional layer sample (Lakhmiri et al. 2021)
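The weight-sharing idea can be illustrated with a single filter slid over a 2-D image (a didactic sketch; real convolutional layers stack many filters over multiple channels):

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Valid-padding, stride-1 convolution of one filter over a 2-D image,
    followed by a ReLU activation; the same weights are reused at every position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU
```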

Feature maps are produced by the convolution and activation functions prior to the pooling procedure. The smaller feature maps resulting from pooling provide summaries of the input features: a window is moved over the image and the selected operation is applied within it. The most frequent pooling techniques are maximum, average, and L2 pooling: averaging the input values produces average pooling, whereas taking their maximum yields maximum pooling. The key advantages of pooling include the reduced image size and the location-independent extraction of visual components (Sarıgül et al. 2019). Figure 7 shows maximum and average pooling samples.

Fig. 7 Pooling layer sample (Yani et al. 2019)
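The pooling operations of Fig. 7 can be expressed compactly as follows (a didactic sketch over non-overlapping 2 × 2 windows):

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling; trims edges not divisible by the window size."""
    H, W = x.shape
    x = x[: H - H % size, : W - W % size]
    blocks = x.reshape(x.shape[0] // size, size, x.shape[1] // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
print(pool2d(x, mode="max"))   # [[ 5.  7.] [13. 15.]]
print(pool2d(x, mode="avg"))   # [[ 2.5  4.5] [10.5 12.5]]
```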

After being convolved and pooled, the data is flattened into a one-dimensional vector, which the fully connected network uses as its input. There may be one or more hidden layers within the fully connected part. Each neuron multiplies the previous layer's outputs by its connection weights and adds a bias value; the result is passed through the activation function to the subsequent layer. The class is then established (Sarıgül et al. 2019).

4 Results and discussion

In this section, the dataset is introduced and the evaluation metrics are explained. Then, all the algorithms are evaluated and compared. Finally, a sensitivity analysis of the results is performed.

4.1 Dataset

In this study, a dataset consisting of 3 classes is used: COVID-19, normal, and viral pneumonia. The training set has 251 images (111 COVID-19, 70 normal, and 70 viral pneumonia), and the test set has 66 images (26 COVID-19, 20 normal, and 20 viral pneumonia) (Raikote 2020). The distribution of the dataset can be seen in Table 1.

Table 1 Class distribution of the dataset

Sample images from the dataset can be seen in Fig. 8.

Fig. 8 Sample images from the dataset

4.2 Study

4.2.1 Evaluation metrics

The evaluation metrics employed in this study are multi-class cross entropy, accuracy, and network complexity. For the multi-objective algorithms, all metrics are uniformly transformed into minimization criteria. The accuracy metric, defined in (4), calculates the proportion of true positive and true negative predictions against all predictions; since accuracy is a maximization metric, it is converted into the error ratio by subtracting it from 1, as in (5). Multi-class cross entropy, given in (6), is already a minimization objective; it quantifies the disparity between the predicted and actual probability distributions across multiple classes. In (6), K is the number of classes, y_k is the true probability that the sample belongs to class k, and ŷ_k is the probability the model assigns to the sample belonging to class k. Network complexity is measured by the number of parameters of the network and constitutes the third minimization objective.

$$\text{accuracy} = \frac{\text{true positive} + \text{true negative}}{\text{true positive} + \text{true negative} + \text{false positive} + \text{false negative}}$$
(4)
$$\text{error ratio} = 1 - \text{accuracy}$$
(5)
$$\text{multiclass cross entropy} = -\sum_{k=1}^{K} y_k \log \hat{y}_k$$
(6)
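For reference, the three objective values can be computed from one-hot labels and predicted class probabilities as in the sketch below (averaging the per-sample cross entropy over the dataset is our reading of Eq. (6)):

```python
import numpy as np

def objective_values(y_true, y_prob, n_params, eps=1e-12):
    """Return (multi-class cross entropy, error ratio, parameter count).
    y_true: one-hot labels, shape (N, K); y_prob: predicted probabilities."""
    ce = -np.mean(np.sum(y_true * np.log(y_prob + eps), axis=1))                # Eq. (6)
    accuracy = np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1))  # Eq. (4)
    return ce, 1.0 - accuracy, n_params                                         # Eq. (5)
```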

The selection of performance indicators plays a crucial role in evaluating the effectiveness and efficiency of predictive models. In this study, three key performance indicators were chosen: error ratio (1-accuracy), multi-class cross entropy, and the number of parameters of the model. The error ratio, calculated as 1 minus the accuracy, provides a complementary perspective to accuracy by quantifying the proportion of incorrect predictions. By considering both correct and incorrect predictions, the error ratio offers a more comprehensive understanding of the model’s predictive capabilities, particularly in scenarios where misclassification carries significant consequences. Additionally, the multi-class cross entropy metric measures the model’s predictive uncertainty by assessing the divergence between predicted and actual probability distributions across multiple classes. This metric is particularly relevant in multi-class classification tasks, providing insights into the model’s confidence levels and potential areas for improvement. Lastly, the number of parameters of the model serves as a measure of model complexity and resource utilization. By monitoring the number of parameters, researchers can assess the trade-off between model complexity and performance, ensuring that the model achieves an optimal balance between predictive accuracy and computational efficiency. Overall, the selection of these performance indicators reflects a comprehensive evaluation framework aimed at capturing different aspects of model performance and guiding model optimization efforts.

4.2.2 Deep CNN

The deep CNN architecture is shown in Fig. 9. It takes an input of size 150 × 150 × 3, followed by three convolutional and max-pooling layers, a fully connected network with a dropout layer, and finally three outputs corresponding to the classes. The sizes and parameters of the network are the quantities varied by the MOO algorithms.

Fig. 9 Deep CNN architecture
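As an illustration, the architecture of Fig. 9 can be sketched in Keras (a plausible rendering, not the authors' exact code; the filter counts, dense width, and dropout rate below are placeholder defaults for exactly the quantities the MOO algorithms tune):

```python
from tensorflow.keras import layers, models

def build_cnn(f1=32, f2=64, f3=128, dense_units=128, dropout_rate=0.5):
    """150x150x3 input, three conv + max-pool blocks, dropout, 3-class softmax."""
    model = models.Sequential([
        layers.Input(shape=(150, 150, 3)),
        layers.Conv2D(f1, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(f2, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(f3, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(dense_units, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(3, activation="softmax"),   # COVID-19 / normal / viral pneumonia
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

print(build_cnn().count_params())  # the third objective: network complexity
```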

4.2.3 Parameter alternatives

There are several parameters to set when training the CNN for COVID-19 detection. The alternatives are shown in Table 2.

Table 2 Parameter alternatives
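Since Table 2 is not reproduced here, the encoding below is purely hypothetical; it only illustrates how discrete alternatives could be mapped onto a vector that the MOO algorithms manipulate:

```python
# Hypothetical alternatives (placeholders, not the values of Table 2)
SEARCH_SPACE = {
    "filters_1":   [16, 32, 64],
    "filters_2":   [32, 64, 128],
    "filters_3":   [64, 128, 256],
    "dense_units": [64, 128, 256],
    "dropout":     [0.2, 0.3, 0.5],
}

def decode(indices):
    """Map a vector of option indices to a concrete hyperparameter setting."""
    return {k: opts[i % len(opts)]
            for (k, opts), i in zip(SEARCH_SPACE.items(), map(int, indices))}

print(decode([1, 2, 0, 1, 2]))  # one candidate configuration
```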

4.3 Results and comparison

In this study, six different algorithms are run to optimize the hyper-parameters of the deep neural network created for COVID-19 detection. The aim is to find the most suitable architecture for the deep neural network. Hyper-parameter optimization is applied with these six algorithms using three objective functions: multi-class cross entropy, error ratio, and the number of parameters.

First, SGA is run. As a result, 8 Pareto-front solutions are obtained; their distribution is shown in Fig. 10. When the graph is analyzed, multi-class cross entropy and error ratio show similar characteristics: these values increase and decrease approximately in parallel, while both are inversely proportional to the number of parameters. Moreover, the solutions are not tightly clustered; they are well scattered. Looking at the plot of the number of parameters against the error ratio, several different structures achieve an error ratio of 0.

Fig. 10 Pareto front of SGA

The NSGA-III algorithm is run and a total of 14 solutions are obtained. Figure 11 shows that multi-class cross entropy and error ratio are directly proportional, and the number of parameters is inversely proportional to both. Among these results, many different models with an error ratio of 0 are discovered; they remain on the Pareto front because they differ in their multi-class cross entropy values. Since the algorithm returns 14 different solutions, a large number of points can be observed. Compared with SGA, however, it is notably less successful at discovering small models (with fewer parameters).

Fig. 11 Pareto front of NSGA-III

When the NSGA-II algorithm is run, a total of 16 solutions are obtained, as shown in Fig. 12. Here again, multi-class cross entropy and error ratio are proportional, and both are inversely proportional to the number of parameters. NSGA-II returns a high total number of solutions and discovers models with very small sizes, but these models have a high error ratio. The solutions are very close to each other, so the algorithm contributes many near-duplicate points to the Pareto front.

Fig. 12 Pareto front of NSGA-II

When the R-NSGA-II algorithm is run, only 3 solutions are obtained, as shown in Fig. 13. In this result, too, multi-class cross entropy and error ratio are directly proportional, and both are inversely proportional to the number of parameters. With only 3 solutions, the algorithm is comparatively inefficient.

Fig. 13 Pareto front of R-NSGA-II

When the SMS-EMOA algorithm is run, a total of 8 solutions are obtained, as shown in Fig. 14. The same pattern holds: multi-class cross entropy and error ratio are directly proportional, and both are inversely proportional to the number of parameters. In general, the solutions are well distributed; as the model complexity decreases (fewer parameters), the error ratio increases.

Fig. 14 Pareto front of SMS-EMOA

When the MOEA/D algorithm is run, only 2 solutions are obtained, as shown in Fig. 15. A similar pattern appears here, but it is not very clear since there are only 2 solutions.

Fig. 15 Pareto front of MOEA/D

Figure 16 shows the comparison of the algorithms on the Pareto front. Blue represents SGA, red NSGA-III, green NSGA-II, yellow R-NSGA-II, purple SMS-EMOA, and black MOEA/D. When the graph is analyzed, two algorithms almost reach 0 in both error ratio and multi-class cross entropy: SGA and SMS-EMOA. In addition, there are many points with higher error ratio and multi-class cross entropy values. These two algorithms achieved good results with complex, highly parameterized models, and they located the endpoints of the Pareto front well. In the plot of the number of parameters against multi-class cross entropy, there is a blue point close to 0: SGA reached a good point with few parameters.

Fig. 16 Pareto front comparisons of the algorithms

Three-dimensional Pareto fronts are shown in Fig. 17, where f1 is multi-class cross entropy, f2 is error ratio, and f3 is the number of parameters. When the figure is examined, there are too many points to draw an easy conclusion, but some stand out: the yellow and green points, belonging to the R-NSGA-II and NSGA-II algorithms, give the least successful results.

Fig. 17 3-dimensional Pareto front

4.3.1 Hypervolume results

The hypervolume is a widely used performance metric in multi-objective optimization that quantifies the quality of a Pareto front, representing the trade-offs between conflicting objectives. It measures the volume of the objective space that is dominated by the solutions in the Pareto front; a larger hypervolume indicates a more desirable front, with solutions that are both diverse and well-distributed across the objective space. The hypervolume therefore serves as a crucial indicator of the effectiveness of different optimization algorithms. By comparing the hypervolumes generated by the various algorithms, their ability to produce diverse, high-quality solutions balancing the trade-offs between multi-class cross entropy, error ratio, and the complexity of the convolutional neural network architecture can be assessed. Ultimately, the hypervolume analysis provides valuable insights into the performance of the different optimization techniques, guiding the selection of the most suitable algorithm for achieving optimal results in COVID-19 detection.
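Concretely, the hypervolume of a front can be computed with pymoo's indicator once a reference point dominated by the whole front is fixed (the values below are illustrative; objectives on very different scales, such as raw parameter counts, would normally be normalized first):

```python
import numpy as np
from pymoo.indicators.hv import HV

F = np.array([[0.20, 0.05, 0.40],     # illustrative, normalized objective vectors
              [0.60, 0.02, 0.10],
              [0.90, 0.15, 0.05]])
hv = HV(ref_point=np.array([1.0, 1.0, 1.0]))
print(hv(F))   # larger is better: more of the objective space is dominated
```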

Table 3 shows the hypervolume comparison of the algorithms. The hypervolume metric serves as an indicator of the quality of Pareto fronts generated by each algorithm, with higher values representing better performance. The results demonstrate that the SGA attained the highest hypervolume score of 905.759, indicating its effectiveness in exploring the solution space and identifying diverse sets of Pareto-optimal solutions. Following closely, NSGA-III achieved a hypervolume of 835.602, showcasing its robustness in balancing convergence and diversity in the Pareto front. NSGA-II and SMS-EMOA also performed competitively, with hypervolume scores of 709.799 and 743.132, respectively. However, R-NSGA-II and MOEA/D exhibited relatively lower hypervolume scores of 633.317 and 520.339, suggesting potential limitations in their ability to explore the solution space comprehensively. Overall, the hypervolume comparison provides valuable insights into the relative performance of different algorithms and aids in selecting the most suitable approach for multi-objective hyperparameter optimization tasks. Considering all these results, SGA can be considered the best.

Table 3 Hypervolume comparison of the algorithms

4.4 Sensitivity analysis for multi-class cross entropy

A sensitivity analysis is conducted to explore the relationship between the number of parameters and the multi-class cross entropy in our model for COVID-19 detection from X-ray images. The sensitivity analysis is aimed at elucidating how variations in the number of parameters of the deep CNN architecture impact the performance of the model in terms of multi-class cross entropy. Through the application of Shapley values, a game-theoretic approach to assigning importance scores to each feature, the influence of individual parameters on the multi-class cross entropy metric is examined. By systematically varying the number of parameters and observing the corresponding changes in multi-class cross entropy, insights into the sensitivity of our model to architectural configurations are gained. This analysis provides valuable guidance for optimizing the CNN architecture to achieve better performance in COVID-19 detection from X-ray images, ultimately enhancing the accuracy and reliability of our diagnostic system.
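One way to produce such Shapley-value beeswarm plots is the shap library applied to a surrogate model that maps architectural settings to the objective value; the pipeline below is an assumption for illustration with synthetic data, not the authors' exact procedure:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# X: one row per evaluated architecture (e.g., a 'number of parameters' column);
# y: the corresponding multi-class cross entropy values (synthetic data here)
X = np.random.rand(100, 3)
y = 1.0 - 0.5 * X[:, 0] + 0.1 * np.random.rand(100)

surrogate = RandomForestRegressor(n_estimators=100).fit(X, y)
explainer = shap.Explainer(surrogate, X)
shap.plots.beeswarm(explainer(X))   # blue = low feature value, red = high
```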

Figure 18 shows a beeswarm graph of multi-class cross entropy against the number of parameters of the CNN model. In the graph, blue points indicate low values and red points indicate high values; in other words, the blue points are simpler CNN models with fewer parameters. As the model becomes more complex, the multi-class cross entropy value decreases and the model becomes more successful: the most successful models are the most complex ones.

Fig. 18 Beeswarm graph of multi-class cross entropy and number of parameters of the CNN model

4.5 Sensitivity analysis for error rate

A sensitivity analysis is undertaken to explore the impact of variations in the number of parameters on the error rate (1 − accuracy) metric in our model for COVID-19 detection from X-ray images. The primary objective is to elucidate how changes in the number of parameters of the deep CNN architecture influence the error rate of the model. By employing Shapley values, a game-theoretic technique for assigning importance scores to individual features, the influence of each parameter on the error rate metric is examined. Through systematic adjustments to the number of parameters and observation of the corresponding changes in the error rate, a comprehensive understanding of the model's sensitivity to architectural configurations is sought. This analysis provides valuable insights for optimizing the CNN architecture to reduce error rates and enhance the accuracy of COVID-19 detection from X-ray images, thereby bolstering the reliability of the diagnostic system.

Figure 19 shows a beeswarm graph of the error ratio against the number of parameters of the CNN model. Blue points denote low values and red points high values; simply put, the blue points are CNN models with fewer parameters. The error ratio diminishes as complexity grows, so the most intricate models are again the most successful. However, upon closer inspection, purple points are also visible, indicating that high success can be obtained with comparatively less complex models.

Fig. 19 Beeswarm graph of error rate and number of parameters of the CNN model

5 Conclusion

In conclusion, this research paper introduces a novel approach for hyperparameter optimization in the context of COVID-19 detection from X-ray images using CNNs. With the emergence of the COVID-19 pandemic, there is a pressing need for accurate and efficient diagnostic tools, and X-ray imaging has shown promise in this regard.

The study frames the problem as a MOO task, considering objective functions such as multi-class cross entropy, error ratio, and complexity of the CNN network. To identify the best trade-offs among these objectives, six different algorithms are employed: NSGA-III, NSGA-II, R-NSGA-II, SMS-EMOA, MOEA/D, and the proposed Swarm Genetic Algorithm (SGA). The results reveal that SGA outperforms the other algorithms in terms of generating Pareto optimal solution sets and achieving higher hypervolume values.

The findings suggest that SGA offers superior performance compared to existing algorithms for multi-objective hyperparameter optimization in the context of COVID-19 detection from X-ray images. Moreover, a sensitivity analysis is conducted to investigate the impact of varying the number of parameters of the CNN on model success, providing valuable insights into the robustness and generalizability of the proposed approach.

In summary, this research contributes to the ongoing efforts to develop efficient and accurate diagnostic tools for COVID-19 using deep learning techniques. The findings highlight the importance of hyperparameter optimization in enhancing the performance of CNN models for disease detection and pave the way for future research in this domain.

The limitations of the proposed approach in comparison to similar schemes warrant careful consideration. While this algorithm demonstrates superior performance in generating Pareto optimal solution sets compared to existing algorithms, such as NSGA-III, NSGA-II, R-NSGA-II, SMS-EMOA, and MOEA/D, there are certain constraints to be acknowledged. One limitation lies in the computational complexity associated with multi-objective hyperparameter optimization, particularly when dealing with large-scale datasets or complex CNN architectures. Additionally, the generalizability of this approach across diverse imaging modalities or clinical settings may be subject to further investigation. The sensitivity of the algorithm to variations in dataset characteristics, such as image quality or class distribution imbalance, should be carefully assessed. Addressing these limitations and refining this approach through ongoing research efforts will be crucial for ensuring its effectiveness and applicability in real-world scenarios.

Future studies in this domain can explore several avenues to further advance the research on hyperparameter optimization for COVID-19 detection from X-ray images using CNNs. While this study considers objective functions such as multi-class cross entropy, error ratio, and complexity of the CNN network, future research could investigate the efficacy of additional objective functions. Exploring alternative metrics may provide insights into different aspects of model performance and lead to further improvements in hyperparameter optimization.

In this study, six different algorithms are evaluated for multi-objective hyperparameter optimization. Future research could explore the integration of other optimization algorithms or hybrid approaches to enhance the diversity and effectiveness of the optimization process. Investigating novel algorithms or adapting existing ones to the specific requirements of COVID-19 detection tasks could yield promising results.

While this study focuses on X-ray images for COVID-19 detection, future research could extend the proposed approach to other imaging modalities, such as computed tomography (CT) scans or magnetic resonance imaging (MRI). Comparing the performance of hyperparameter optimization techniques across different imaging modalities could provide valuable insights into their generalizability and applicability in diverse clinical settings.

Transfer learning techniques have shown promise in leveraging pre-trained CNN models for tasks with limited labeled data, such as COVID-19 detection. Future research could explore the integration of transfer learning and domain adaptation methods into the hyperparameter optimization framework to further improve model performance and adaptability to different datasets and imaging conditions.

While this study utilizes publicly available datasets for experimentation, future research could conduct extensive evaluations on real-world clinical data collected from diverse healthcare settings. Assessing the performance of hyperparameter optimization techniques under real-world conditions, including variations in patient demographics, imaging protocols, and equipment characteristics, is essential for validating their effectiveness and robustness in clinical practice.

As deep learning models are increasingly deployed in clinical settings, the interpretability and explainability of model predictions become crucial. Future research could focus on developing interpretable models and evaluation metrics to enhance the trust and transparency of CNN-based diagnostic systems. Exploring techniques for visualizing and understanding the decision-making process of optimized CNN models could facilitate their acceptance and adoption by healthcare professionals.