Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 1, 2020
Date Accepted: Oct 2, 2020
Date Submitted to PubMed: Oct 7, 2020

The final, peer-reviewed published version of this preprint can be found here:

Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation

Vaid A, Somani S, Russak AJ, De Freitas JK, Chaudhry FF, Paranjpe I, Johnson KW, Lee SJ, Miotto R, Richter F, Zhao S, Beckmann ND, Naik N, Kia A, Timsina P, Lala A, Paranjpe M, Golden E, Danieletto M, Singh M, Meyer D, O'Reilly PF, Huckins L, Kovatch P, Finkelstein J, Freeman RM, Argulian E, Kasarskis A, Percha B, Aberg JA, Bagiella E, Horowitz CR, Murphy B, Nestler EJ, Schadt EE, Cho JH, Cordon-Cardo C, Fuster V, Charney DS, Reich DL, Bottinger EP, Levin MA, Narula J, Fayad ZA, Just AC, Charney AW, Nadkarni GN, Glicksberg B

J Med Internet Res 2020;22(11):e24018

DOI: 10.2196/24018

PMID: 33027032

PMCID: 7652593

Machine Learning to Predict Mortality and Critical Events in COVID-19 Positive New York City Patients: A Cohort Study

  • Akhil Vaid
  • Sulaiman Somani
  • Adam J. Russak
  • Jessica K. De Freitas
  • Fayzan F. Chaudhry
  • Ishan Paranjpe
  • Kipp W. Johnson
  • Samuel J. Lee
  • Riccardo Miotto
  • Felix Richter
  • Shan Zhao
  • Noam D. Beckmann
  • Nidhi Naik
  • Arash Kia
  • Prem Timsina
  • Anuradha Lala
  • Manish Paranjpe
  • Eddye Golden
  • Matteo Danieletto
  • Manbir Singh
  • Dara Meyer
  • Paul F. O'Reilly
  • Laura Huckins
  • Patricia Kovatch
  • Joseph Finkelstein
  • Robert M. Freeman
  • Edgar Argulian
  • Andrew Kasarskis
  • Bethany Percha
  • Judith A. Aberg
  • Emilia Bagiella
  • Carol R. Horowitz
  • Barbara Murphy
  • Eric J. Nestler
  • Eric E. Schadt
  • Judy H. Cho
  • Carlos Cordon-Cardo
  • Valentin Fuster
  • Dennis S. Charney
  • David L. Reich
  • Erwin P. Bottinger
  • Matthew A. Levin
  • Jagat Narula
  • Zahi A. Fayad
  • Allan C. Just
  • Alexander W. Charney
  • Girish N. Nadkarni
  • Benjamin Glicksberg

ABSTRACT

Background:

Coronavirus disease 2019 (COVID-19) has infected millions of patients worldwide and has been responsible for several hundred thousand fatalities. This has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods for achieving this are lacking.

Objective:

We analyzed electronic health records from hospitalized COVID-19 positive patients admitted to the Mount Sinai Health System in New York City (NYC). We present machine learning models that predict the hospital course over clinically meaningful time horizons based on patient characteristics at admission, and we assess the performance of these models at multiple hospitals and time points.

Methods:

We used XGBoost and baseline comparator models to predict in-hospital mortality and critical events within time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized electronic health record (EHR) data from five hospitals in NYC for 4,098 COVID-19 positive patients admitted from March 15, 2020 to May 22, 2020. Models were first trained on patients from a single hospital (N=1514) admitted on or before May 1, externally validated on patients from four other hospitals (N=2201) admitted on or before May 1, and prospectively validated on all patients admitted after May 1 (N=383). Finally, we established model interpretability to identify and rank the variables that drive model predictions.
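The cohort partition described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Admission` record, the hospital labels "A" to "C", the helper names, and the rule that an event counts when it falls within the stated number of days after admission are all assumptions made for illustration. Only the May 1, 2020 cutoff and the 3, 5, 7, and 10 day windows come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CUTOFF = date(2020, 5, 1)      # split date stated in the Methods
WINDOWS = (3, 5, 7, 10)        # prediction windows in days, from the Methods


@dataclass
class Admission:
    patient_id: str            # hypothetical identifier
    hospital: str              # hypothetical site label
    admitted: date


def assign_split(adm: Admission, training_hospital: str) -> str:
    """Return 'train', 'external', or 'prospective' for one admission."""
    if adm.admitted > CUTOFF:
        return "prospective"   # any hospital, admitted after May 1
    if adm.hospital == training_hospital:
        return "train"         # single training hospital, on or before May 1
    return "external"          # the other hospitals, on or before May 1


def label_within(admitted: date, event: Optional[date], days: int) -> int:
    """1 if an event occurred within `days` of admission, else 0 (assumed rule)."""
    if event is None:
        return 0
    return int((event - admitted).days <= days)


# Toy records, invented purely for illustration.
admissions = [
    Admission("p1", "A", date(2020, 3, 20)),
    Admission("p2", "B", date(2020, 4, 28)),
    Admission("p3", "A", date(2020, 5, 1)),
    Admission("p4", "A", date(2020, 5, 10)),
    Admission("p5", "C", date(2020, 5, 22)),
]
splits = {a.patient_id: assign_split(a, training_hospital="A") for a in admissions}

# One patient admitted March 20 with an event on March 24 (4 days later).
labels = {w: label_within(date(2020, 3, 20), date(2020, 3, 24), w) for w in WINDOWS}

print(splits)
print(labels)
```

Splitting on both site and calendar date keeps the two validation sets disjoint from the training set: the external set tests transfer across hospitals, while the prospective set tests transfer forward in time.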

Results:

On the training set, the XGBoost classifier outperformed baseline models, with area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days, and area under the precision-recall curve (AU-PRC) of 0.45 at 3 days, 0.33 at 5 days, 0.44 at 7 days, and 0.48 at 10 days. XGBoost also performed well for critical event prediction, with AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days, and AU-PRC of 0.61 at 3 days, 0.62 at 5 days, 0.66 at 7 days, and 0.70 at 10 days. The trends in performance on both the external and prospective validation sets were similar to those on the training set. At 7 days, acute kidney injury on admission, elevated lactate dehydrogenase (LDH), tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction.
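For readers less familiar with the two metrics reported above, the following Python sketch computes both on a toy example. It is an illustration only, not the authors' evaluation code: the function names and the data are invented, AUC-ROC is computed through its pairwise-ranking definition, and AU-PRC is approximated by average precision, which is one common estimator and may differ from the one used in the study.

```python
def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank   # precision among the top `rank` predictions
    return total / hits


# Toy data: 2 events among 8 patients (25% prevalence), invented for illustration.
y_true = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]

auc = auc_roc(y_true, scores)            # 11 of 12 positive-negative pairs ranked correctly
ap = average_precision(y_true, scores)   # (1/1 + 2/3) / 2

print(round(auc, 3), round(ap, 3))       # prints: 0.917 0.833
```

A classifier that ranks at random scores about 0.5 on AUC-ROC regardless of prevalence, whereas its expected average precision is roughly the event prevalence. This is one reason the two metrics can diverge for the same model when the outcome is rare.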

Conclusions:

We trained and validated (both externally and prospectively) machine learning models for predicting mortality and critical events at different time horizons. These models identify at-risk patients and uncover underlying relationships that predict outcomes.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.
