Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 7, 2020
Date Accepted: Mar 8, 2021
Date Submitted to PubMed: Apr 15, 2021

The final, peer-reviewed published version of this preprint can be found here:

Machine Learning Applied to Clinical Laboratory Data in Spain for COVID-19 Outcome Prediction: Model Development and Validation

Domínguez-Olmedo JL, Gragera-Martínez Á, Mata J, Pachón V

J Med Internet Res 2021;23(4):e26211

DOI: 10.2196/26211

PMID: 33793407

PMCID: 8048712

Machine Learning Applied to Spanish Clinical Laboratory Data for COVID-19 Outcome Prediction: Model Development and Validation

  • Juan L. Domínguez-Olmedo; 
  • Álvaro Gragera-Martínez; 
  • Jacinto Mata; 
  • Victoria Pachón

ABSTRACT

Background:

The pandemic caused by the SARS-CoV-2 virus will probably stand as the greatest health catastrophe of the modern era. The Spanish health care system has faced an unmanageable number of patients in a short period of time, leading to system collapse. Because diagnosis is not immediate and there is no effective treatment, tools are needed to identify patients at risk of severe disease complications and thus optimize material and human resources in health care. Currently, no tools are available to establish which patients have a worse prognosis than others.

Objective:

In this study, we aimed to process a sample of electronic health records of COVID-19 patients in order to develop a machine learning model to predict the severity of infection and mortality through clinical laboratory parameters. Early patient classification can help optimize material and human resources, and analysis of the most important features of the model could provide insights into the disease.

Methods:

After an initial performance evaluation based on a comparison with several other well-known methods, the extreme gradient boosting (XGBoost) algorithm was chosen as the predictive method for this study. In addition, SHAP (SHapley Additive exPlanations) was used to analyze the importance of the features of the resulting model.
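As a rough illustration of the idea behind SHAP, the sketch below computes exact Shapley values for a single prediction of a small toy function. It is not the authors' pipeline: the study applied the SHAP method to an XGBoost model, whereas `toy_model`, its coefficients, the three scaled inputs, and the all-zero baseline here are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    brute-force form is only practical for very small examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(v):
    """Hypothetical risk score over three made-up, scaled laboratory values."""
    ldh, crp, urea = v
    return 0.5 * ldh + 0.3 * crp + 0.2 * ldh * urea


x = [2.0, 1.0, 3.0]          # hypothetical patient (scaled values)
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["ldh", "crp", "urea"], phi)))
```

The attributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the additive property that lets per-feature contributions be ranked by importance. The interaction term is split evenly between the two features involved in it.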

Results:

After data preprocessing, 1823 patients with confirmed COVID-19 and 32 predictor features were selected. On bootstrap validation, the XGBoost classifier yielded a value of 0.97 (95% CI 0.96-0.98) for the area under the receiver operating characteristic curve, 0.86 (95% CI 0.80-0.91) for the area under the precision-recall curve, 0.94 (95% CI 0.92-0.95) for accuracy, 0.77 (95% CI 0.72-0.83) for F-score, 0.93 (95% CI 0.89-0.98) for sensitivity, and 0.91 (95% CI 0.86-0.96) for specificity. The four most relevant features for model prediction were lactate dehydrogenase (LDH), C-reactive protein, neutrophils, and urea.
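To show how interval estimates of this kind can be produced, the sketch below computes sensitivity, specificity, accuracy, and F-score from binary labels and predictions, then derives percentile bootstrap 95% CIs by resampling patients with replacement. The abstract does not describe the exact bootstrap procedure used in the study, which may also refit the model on each resample, so this is only one simplified variant. The labels and predictions are synthetic, and the function names are hypothetical.

```python
import random


def metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, and F-score for binary labels (1 = death)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f_score": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
    }


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for each metric, keeping the predictions fixed."""
    rng = random.Random(seed)
    n = len(y_true)
    samples = {name: [] for name in metrics(y_true, y_pred)}
    for _ in range(n_resamples):
        # Resample patients with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for name, value in resampled.items():
            samples[name].append(value)
    ci = {}
    for name, values in samples.items():
        values.sort()
        lower = values[int((alpha / 2) * (n_resamples - 1))]
        upper = values[int((1 - alpha / 2) * (n_resamples - 1))]
        ci[name] = (lower, upper)
    return ci


# Synthetic example: 20 deaths and 80 survivors (not data from the study)
y_true = [1] * 20 + [0] * 80
y_pred = [1] * 18 + [0] * 2 + [1] * 8 + [0] * 72

point = metrics(y_true, y_pred)
intervals = bootstrap_ci(y_true, y_pred)
for name in point:
    lower, upper = intervals[name]
    print(f"{name}: {point[name]:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With an imbalanced outcome such as mortality, the intervals for sensitivity and F-score tend to be wider than the interval for accuracy because they depend on the smaller positive class.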

Conclusions:

The predictive model obtained in this work achieved excellent results in discriminating patients with COVID-19 who died, mainly by using laboratory parameter values. Analysis of the resulting model identified the set of features with the most significant impact on the prediction, thereby relating them to a higher risk of mortality.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.