A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count

https://doi.org/10.1016/j.cmpb.2021.106444

Highlights

  • A combined dynamic ensemble selection (DES) method is proposed for imbalanced data.

  • The combined DES method is applied to detect COVID-19 from complete blood count.

  • SMOTE-ENN is used to balance the data and remove noise.

  • Hybrid multiple clustering and bagging is used to generate candidate classifiers for DES.

  • The proposed combined DES method outperforms several advanced algorithms.

Abstract

Background

As blood testing is radiation-free, low-cost and simple to operate, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies take into consideration the imbalanced data distribution, which can impair the performance of a classifier.

Method

A novel combined dynamic ensemble selection (DES) method is proposed for imbalanced data to detect COVID-19 from complete blood count. The method combines data preprocessing with an improved DES. First, we use the hybrid synthetic minority over-sampling technique and edited nearest neighbor (SMOTE-ENN) to balance the data and remove noise. Second, to improve the performance of DES, we propose a novel hybrid multiple clustering and bagging classifier generation (HMCBCG) method that reinforces the diversity and local regional competence of the candidate classifiers.
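To make the idea of dynamic ensemble selection concrete, the following is a minimal sketch of a KNORA-Eliminate style selection rule, assuming that the "KNE" variant named in the Results refers to KNORA-Eliminate. It is not the authors' implementation and does not cover SMOTE-ENN or HMCBCG. The function name `knora_eliminate`, the toy classifiers `clf_a`, `clf_b`, `clf_c`, and the validation points are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def knora_eliminate(pool, X_dsel, y_dsel, x, k=3):
    """Predict the label of x using a KNORA-Eliminate style selection.

    pool   : list of callables, each mapping a feature tuple to a label
    X_dsel : feature tuples of the validation (selection) set
    y_dsel : labels of X_dsel
    """
    # Rank validation points by distance to x; the first k form the
    # region of competence.
    order = sorted(range(len(X_dsel)), key=lambda i: dist(X_dsel[i], x))
    for size in range(min(k, len(order)), 0, -1):
        region = order[:size]
        # Keep only classifiers that are correct on every neighbour.
        selected = [clf for clf in pool
                    if all(clf(X_dsel[i]) == y_dsel[i] for i in region)]
        if selected:
            break  # at least one locally competent classifier found
    else:
        # No classifier is correct even on the nearest neighbour:
        # fall back to the whole pool.
        selected = pool
    # Majority vote among the selected classifiers.
    votes = Counter(clf(x) for clf in selected)
    return votes.most_common(1)[0][0]


# Hypothetical toy pool: each classifier is competent in a different region.
def clf_a(p):
    return int(p[0] > 0.5)


def clf_b(p):
    return int(p[1] > 0.5)


def clf_c(p):
    return 0  # always predicts the majority (negative) class


pool = [clf_a, clf_b, clf_c]

# Hypothetical validation set: label 1 if either feature is large.
X_dsel = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.95),
          (0.9, 0.1), (0.8, 0.2), (0.95, 0.15),
          (0.1, 0.1), (0.2, 0.2), (0.15, 0.05)]
y_dsel = [1, 1, 1, 1, 1, 1, 0, 0, 0]

query = (0.12, 0.88)
static_vote = Counter(clf(query) for clf in pool).most_common(1)[0][0]
dynamic_vote = knora_eliminate(pool, X_dsel, y_dsel, query, k=3)
```

For the query point above, a static majority vote over the whole pool is dominated by the two classifiers that are wrong in that region, whereas the dynamic rule keeps only the classifier that is correct on all three nearest validation points. This is why the diversity and local competence of the candidate pool matter for DES.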

Results

Experiments with three popular DES methods show that HMCBCG performs better than bagging alone. HMCBCG+KNE achieves the best COVID-19 screening performance, with 99.81% accuracy, 99.86% F1, 99.78% G-mean and 99.81% AUC.
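The abstract does not state how the reported metrics are computed. As a rough illustration, the sketch below assumes the standard binary definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity). The function name `imbalance_metrics` and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def imbalance_metrics(tp, fn, fp, tn):
    """Compute accuracy, F1 and G-mean from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
metrics = imbalance_metrics(tp=5, fn=5, fp=5, tn=985)
```

With these made-up counts, accuracy is 0.99 even though half of the positive cases are missed, while F1 and G-mean are much lower. This is the usual reason for reporting F1 and G-mean alongside accuracy on imbalanced data.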

Conclusion

Compared to other advanced methods, our combined DES model can improve accuracy, G-mean, F1 and AUC of COVID-19 screening.

Keywords

COVID-19 screening
Imbalanced data
Dynamic ensemble selection
Hybrid multiple clustering and bagging
Candidate classifier generation
