Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 30, 2020
Date Accepted: Mar 11, 2021
Date Submitted to PubMed: Apr 9, 2021
Accurate Severe vs Non-severe COVID-19 Clinical Type Classification: a Multimodality Machine Learning Study
ABSTRACT
Background:
Effectively and efficiently diagnosing COVID-19 patients with the accurate clinical type is essential to achieve optimal outcomes for patients and to reduce the risk of overloading the health care system. Currently, severe and non-severe COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 invasion in the different types. In addition, these type-defining features may not be readily testable at the time of diagnosis.
Objective:
This study aimed to accurately differentiate severe and non-severe COVID-19 clinical types based on multiple medical features and provide reliable predictions for clinical decision support.
Methods:
In this study, we recruited 214 confirmed COVID-19 patients with the non-severe type and 148 with the severe type. The patients' clinical information (26 features) and laboratory testing results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between the two clinical types. Random forest (RF) machine learning models were developed and validated to differentiate the COVID-19 clinical types, based on all features in each modality as well as on the top 5 features from each modality combined.
Results:
Using clinical and laboratory results independently as input, the RF models achieved 90% and 95% predictive accuracy, respectively. The importance scores of the input features were further evaluated, and the top 5 features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes for the clinical modality; D-dimer, hsTNI, absolute neutrophil count, IL-6, and LDH for the laboratory modality, in descending order). Using these top 10 multimodal features as the only input, instead of all 52 features combined, the RF model achieved 99% predictive accuracy.
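The workflow described above (one RF model per modality, ranking features by importance, then retraining on the top 5 features from each modality) can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, uses synthetic placeholder data in place of the patient records, and picks 100 trees and 5-fold cross-validation arbitrarily, since the abstract does not state the hyperparameters or the validation scheme. The helper names `top_k_features` and `cv_accuracy` are hypothetical, and the printed accuracies say nothing about the reported 90%, 95%, and 99% figures.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic placeholder data only (NOT the study data).
# Group sizes and feature counts mirror the abstract: 214 non-severe (0),
# 148 severe (1), 26 clinical features and 26 laboratory features.
y = np.array([0] * 214 + [1] * 148)
n_patients, n_features = len(y), 26
X_clinical = rng.normal(size=(n_patients, n_features))
X_lab = rng.normal(size=(n_patients, n_features))
# Assumption for illustration: make the first 5 columns of each modality informative.
X_clinical[:, :5] += 1.0 * y[:, None]
X_lab[:, :5] += 1.5 * y[:, None]


def top_k_features(X, y, k=5, seed=0):
    """Fit an RF on one modality; return column indices of the k most important features."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]  # descending importance
    return order[:k]


def cv_accuracy(X, y, seed=0):
    """Mean accuracy of an RF over 5 stratified folds (an assumed validation scheme)."""
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


# Single-modality models using all 26 features each.
acc_clinical = cv_accuracy(X_clinical, y)
acc_lab = cv_accuracy(X_lab, y)

# Top 5 features per modality, combined into a 10-feature multimodal input.
top_clinical = top_k_features(X_clinical, y)
top_lab = top_k_features(X_lab, y)
X_combined = np.hstack([X_clinical[:, top_clinical], X_lab[:, top_lab]])
acc_combined = cv_accuracy(X_combined, y)

print(f"clinical only: {acc_clinical:.3f}")
print(f"laboratory only: {acc_lab:.3f}")
print(f"top 5 + top 5 combined: {acc_combined:.3f}")
```

In this sketch the features are ranked on the full data set before cross-validation, which can inflate the combined model's accuracy; a stricter variant would perform the feature selection inside each training fold.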
Conclusions:
These findings shed light on how the human body reacts to SARS-CoV-2 invasion as a whole and provide insights into effectively evaluating a COVID-19 patient's severity based on more common medical features when gold-standard features are not available. We suggest that clinical information can be used as an initial screening tool for self-evaluation and triage, while laboratory testing results can be applied when accuracy is the priority.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.