Introduction

The current coronavirus disease 2019 (COVID-19) pandemic has strained healthcare delivery models across the world. In the US there are over 8 million cases, 5.4% of which have required hospitalization. Of the hospitalized patients, to date, 20% have required care in the intensive care unit (ICU)1. Based on current projections, by January 1st 2021 the number of ICU beds needed for COVID-19 patients will exceed the available ICU beds by 10.6%2,3. Given this constrained supply of ICU beds, states and counties have created detailed surge plans to ensure timely care of critically ill patients suffering from COVID-19. To sustain healthcare delivery through this pandemic, it is imperative to adopt a proactive approach towards the utilization of healthcare resources such as ICU beds and ventilators. Given the urgency of resource allocation and optimization, we sought to identify patient-level clinical characteristics at the time of admission that predict the need for ICU care and mechanical ventilation in COVID-19 patients.

Several studies have reported predictors of COVID-19 severity that are trained on data acquired at or around the time of admission4,5,6,7. The study described in this manuscript differs from these in several significant ways. First, instead of applying a single predictive model, we assess the performance of a cohort of models and then select the one that performs best. Second, we do not include any imaging data and rely only on socio-demographic data, data acquired from a physical exam, and lab marker data obtained from a blood draw. This combination may be relevant to under-resourced facilities where rapid imaging is not available. Third, we evaluate the relative benefit in predictive accuracy obtained from the lab marker data alone, and conclude that it is significant. Fourth, we also consider a reduced model with only five input features and report that it retains good predictive performance. This simplified model is easy to use and contains only quantitative features, thereby making it less prone to error and subjectivity. Finally, we train our model on data from Los Angeles County, while other studies are based on populations in other world regions. This is relevant since outcomes for COVID-19 are known to depend on demographics.

Methods

Data for this study was extracted from an Institutional Review Board (IRB) approved COVID-19 REDCap8 repository. Informed consent for the repository was waived by the USC IRB consistent with §45 CFR 46.116(f). The study was conducted in accordance with USC policies, IRB policies, and federal regulations. Subjects’ privacy and confidentiality were protected according to applicable HIPAA and USC IRB policies and procedures. The repository contained demographic, clinical, and laboratory data for all COVID-19 positive patients seen at the Keck Medical Center of USC, Verdugo Hills Hospital, and Los Angeles County + USC Medical Center. Repository data elements include data from three categories: (a) socio-demographic data including age, sex, travel, contact history, and co-morbidities; (b) presenting clinical data gleaned from symptoms and the results of an initial physical examination, including fever, dyspnea, respiratory rate, and blood oxygen saturation (SpO2); (c) blood panel profile including RT-PCR, Interleukin-6, d-Dimer, complete blood count, lipase, and C-reactive protein (CRP). The repository also includes the outcome data, namely, the need for ICU admission and mechanical ventilation. A description of all the input features, their type, and their median, minimum, and maximum values is presented in Tables 1, 2, 3, 4 and 5.

Table 1 Socio-demographic features used as input.
Table 2 Socio-demographic features used as input.
Table 3 Input features from presenting clinical data and the results of an initial physical examination.
Table 4 Input features from presenting clinical data and the results of an initial physical examination.
Table 5 Input features from blood panel profile.

The study cohort comprised 212 patients (123 males, 89 females) with an average age of 53 years (range 13–92 years), of whom 74 required intensive care at some point during their stay and 47 required mechanical ventilation. We note that only data obtained at the time of initial presentation, or within 24 hours of initial presentation, was included as input to the predictive models, and the need for ICU admission and mechanical ventilation at any time during hospitalization were selected as outcomes.

Features with more than 30% missing data were excluded from the analysis. For the retained features, missing values were imputed using an iterative imputation method. In this method, the feature to be imputed is treated as a function of a subset of other highly correlated features, and missing values are estimated using regression9. This subset of features is then iterated over to arrive at the final estimate. As part of this strategy, and in order to prevent data leakage, only the training samples were used to develop the regression models for imputation.
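The imputation strategy above can be illustrated with a short sketch. This is a minimal example using scikit-learn's `IterativeImputer`, which implements the regression-based iterative scheme described (an assumption: the study does not name the specific library); the feature matrix and missingness pattern are synthetic.

```python
# Sketch of the iterative imputation step: each feature with missing values
# is modeled as a regression on the other features, iterating to convergence.
# Data below is synthetic, not the study cohort.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
X_train[rng.random(X_train.shape) < 0.1] = np.nan  # ~10% values missing

# Fit on training samples only, to avoid data leakage, then apply the
# same fitted imputer to held-out data.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_train_imp = imputer.fit_transform(X_train)

X_test = rng.normal(size=(20, 4))
X_test[0, 1] = np.nan
X_test_imp = imputer.transform(X_test)
```

Fitting the imputer only on the training split, as in the last comment, mirrors the paper's precaution against leakage.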

The retained features were used to compute the correlation of the outcomes with the input features. Thereafter, the data was split into training (60%) and testing (20%) sets. Fivefold cross-validation was performed on the training set to train the supervised learning models (random forest, multilayer perceptron, support vector machine, gradient boosting, extra trees classifier, and AdaBoost) and tune their hyperparameters. Among these algorithms, the Random Forest10 (RF) classifier was found to be the most accurate and was selected for further analysis.
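The model-selection step can be sketched as follows, assuming scikit-learn implementations of the six candidate classifiers. The dataset, split fractions, and default hyperparameters here are placeholders, not the study's tuned configuration.

```python
# Sketch of model selection: compare candidate classifiers by mean
# cross-validated AUC on the training split. Synthetic data throughout.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=212, n_features=20, random_state=0)
# Hold out a test set; cross-validate only on the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

candidates = {
    "random forest": RandomForestClassifier(random_state=0),
    "extra trees": ExtraTreesClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
    "svm": SVC(probability=True, random_state=0),
}
results = {}
for name, clf in candidates.items():
    results[name] = cross_val_score(
        clf, X_train, y_train, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean CV AUC = {results[name]:.3f}")
```

In practice each candidate would also receive its own hyperparameter search inside the cross-validation loop, as described in the text.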

The tuned RF model was applied to testing data to compute the probability of ICU admission and mechanical ventilation. This was repeated with five different folds, yielding predicted probabilities for 212 subjects generated by five distinct RF models. These were used to generate an ROC curve and compute the area under the curve (AUC). The relative importance of the input features was evaluated by computing their Gini importance.
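The evaluation step above, pooling out-of-fold predicted probabilities across five RF fits into one ROC curve and reading feature importance from the forest, can be sketched with scikit-learn utilities on synthetic data:

```python
# Sketch of the evaluation: out-of-fold probabilities for all subjects,
# a pooled ROC/AUC, and Gini (mean decrease in impurity) importances.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=212, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Each of the five folds is predicted by a model trained on the other four,
# yielding one probability per subject from five distinct RF fits.
proba = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]
fpr, tpr, thresholds = roc_curve(y, proba)
auc = roc_auc_score(y, proba)
print(f"pooled AUC = {auc:.3f}")

# Gini importance from a fit of the forest; top ten feature indices.
rf.fit(X, y)
top10 = rf.feature_importances_.argsort()[::-1][:10]
```

`feature_importances_` in scikit-learn is exactly the Gini (impurity-based) importance referred to in the text.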

The analysis described above was first performed with input data from all categories, that is, socio-demographic data, presenting clinical data, and blood panel profile data. Thereafter, the blood panel profile data was excluded and the analysis was performed once again. This second analysis was done to assess the relative importance of the blood panel data in predicting the outcomes.

Results

In Fig. 1, we have plotted the AUC values for predicting the need for ICU care and mechanical ventilation for all the algorithms considered in this study. From this figure we observe that the algorithms based on decision trees, that is, Random Forest, Extra Trees Classifier, and Gradient Boosting, tend to perform better. This is likely because simpler algorithms like Support Vector Machines do not have sufficient capacity to capture the complexity of the prediction, while other algorithms like Multi-Layer Perceptrons (MLP) do not have sufficient data for efficient training, leading to issues with robustness and over-fitting. Further, among the algorithms based on decision trees, the Random Forest (RF) classifier is the most accurate and was considered for further analysis.

Figure 1
figure 1

Area under the curve (AUC) for the classifiers considered in the study for predicting the need for ICU (A) and mechanical ventilation (B).

For the RF predictor, we report an AUC of 0.80, 95% CI (0.73–0.86) in predicting the need for ICU care and an AUC of 0.83, 95% CI (0.76–0.90) for predicting the need for mechanical ventilation. At the optimal cut-point in the ROC curve11, the ICU predictor yields a Sensitivity of 0.73, Specificity of 0.74, a Positive Predictive Value (PPV) of 0.6 and a Negative Predictive Value (NPV) of 0.84, whereas the predictor for mechanical ventilation yields a Sensitivity of 0.72, Specificity of 0.73, a PPV of 0.44 and an NPV of 0.90 (see Table 6). These values demonstrate that we are able to accurately predict the need for intensive care and ventilation from data acquired at the time of admission. In terms of the AUC, the performance of the RF predictor is similar to results reported in studies from China4, New York7 and the Netherlands5 (AUC of 0.88, 0.8, and 0.77, respectively). We note that these studies differ from ours due to regional differences in the population and the viral strain. Further, some of these studies also included chest x-ray imaging features and tested a single type of ML algorithm (logistic regression or random forest). Deep learning models were also developed based on a cohort from China6, and these report an AUC of 0.89 for a coarse measure of disease severity that groups together patients receiving ICU care or mechanical ventilation and those ultimately succumbing to the disease.
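The operating-point metrics can be derived once a cut-point on the ROC curve is chosen. A minimal sketch follows, assuming Youden's J statistic as the cut-point criterion (the study cites a cut-point method11; Youden's J is one common choice, not necessarily the one used) with small synthetic labels and probabilities:

```python
# Pick the ROC threshold maximising Youden's J = sensitivity + specificity - 1,
# then derive Sensitivity, Specificity, PPV, and NPV from the confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve

def operating_point(y_true, proba):
    fpr, tpr, thr = roc_curve(y_true, proba)
    j = tpr - fpr                     # Youden's J at each candidate threshold
    t = thr[np.argmax(j)]
    tn, fp, fn, tp = confusion_matrix(y_true, proba >= t).ravel()
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "PPV": tp / (tp + fp),
            "NPV": tn / (tn + fn)}

# Tiny synthetic example: 8 subjects, true labels and predicted probabilities.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9])
print(operating_point(y, p))
```

The same four quantities appear in Table 6 for each predictor at its optimal operating point.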

When only socio-demographic and presenting clinical data were used as input (lab markers were excluded), the AUC value for predicting ICU need dropped to 0.68, 95% CI (0.60–0.75), and that for predicting ventilation dropped to 0.70, 95% CI (0.61–0.79). The values of Sensitivity, Specificity, PPV and NPV at the optimal point also dropped by about 0.1 (see Table 6). This indicates that the lab marker data provides significant additional information and is important in improving the accuracy of these predictions. A recent comprehensive survey of laboratory markers concluded that many of the markers included in this study are correlated with COVID-19 severity and should therefore be used in models for predicting disease severity12. However, our results also indicate that it is possible to make moderately accurate predictions with only socio-demographic and presenting clinical data. This is particularly useful when quick decisions are required and the time or resources necessary for acquiring lab marker data are not available.

Table 6 Performance of Random Forest Predictors at the optimal operating point.

The top ten features with the strongest correlation to ICU admission are shown in Fig. 2A, and the most important features for the RF classifier for ICU need are shown in Fig. 2B. Similarly, the top ten features with the strongest correlation to the need for mechanical ventilation are shown in Fig. 3A, and the most important features for the RF classifier for mechanical ventilation need are shown in Fig. 3B.

Figure 2
figure 2

(A) Ten most highly correlated features with the need for ICU care. (B) Ten features with the highest relative importance for predicting the need for ICU care.

Figure 3
figure 3

(A) Ten most highly correlated features with the need for mechanical ventilation. (B) Ten features with the highest relative importance for predicting the need for mechanical ventilation.

Taken together, this set represents features that strongly influence the likelihood of ICU admission and mechanical ventilation. We note that they belong to all three categories—socio-demographic data, presenting clinical data, and blood panel profile data—showing that all these types of data are necessary in making an accurate assessment of disease severity. Several of these features have been implicated in determining the severity of COVID-19 by other researchers7,13,14,15,16,17,18,19; however, few studies have considered them together and determined their relative importance.

Finally, we considered RF predictors that are trained using only the top five features for predicting ICU need. These are the values for CRP, d-Dimer, Procalcitonin, SpO2, and respiratory rate. Models based on this reduced set of features are easier to implement since they require less data. They are also more robust and less prone to subjective assessment, since all these features are quantitative measurements that can be made accurately. For the model designed to predict ICU need using these features we report an AUC of 0.79, 95% CI (0.72–0.85), and for the model designed to predict the need for mechanical ventilation we report an AUC of 0.83, 95% CI (0.77–0.90). Both these values are very close to those for the corresponding predictors that utilize all 72 features, indicating that little accuracy is lost by employing the simpler, more robust models. The Sensitivity, Specificity, PPV and NPV values for these reduced models are reported in the third and sixth rows of Table 6, and these are also quite close to those of the corresponding models that utilize all 72 features.
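The construction of such a reduced predictor can be sketched as follows: select the five features with the highest Gini importance from a full RF fit, then retrain and re-evaluate an RF on those columns alone. The data here is synthetic, and the selected column indices merely stand in for CRP, d-Dimer, Procalcitonin, SpO2, and respiratory rate.

```python
# Sketch of the reduced five-feature model: importance-based feature
# selection followed by retraining. Synthetic stand-in data (72 features,
# matching the full feature count in the study).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=212, n_features=72, n_informative=8,
                           random_state=0)

# Fit the full model and keep the five most important features.
rf_full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top5 = np.argsort(rf_full.feature_importances_)[::-1][:5]
X_reduced = X[:, top5]

# Retrain on the reduced feature set and score with out-of-fold probabilities.
rf_reduced = RandomForestClassifier(n_estimators=200, random_state=0)
proba = cross_val_predict(rf_reduced, X_reduced, y, cv=5,
                          method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)
print(f"reduced-model AUC = {auc:.3f}")
```

A reduced model of this kind trades a small amount of information for simpler data collection and fewer opportunities for measurement error.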

In Fig. 4, we plot the distribution of some of the most important input features, including lab markers, presenting symptoms, and socio-demographic data, for two sets of patients: those who require ICU care and those who do not. We observe that the distributions of Creatinine (an indicator of kidney function), C-reactive Protein (a measure of inflammatory response), d-Dimer (a measure of blood clot formation and breakdown), and Procalcitonin (elevated during infection and sepsis) among patients who require ICU care are spread over a larger range and have a higher average value. A similar trend is observed in the distribution of the respiratory rate. For SpO2 levels, we also observe a distribution spread over a wider range for patients admitted to the ICU; however, in this case this group has a lower average value. We also note that the presence of influenza-like symptoms roughly doubles the likelihood of requiring ICU care (from around 25% to 52%). Further, the percentage of males admitted to the ICU is much higher than the percentage of females (46% versus 20%).

Figure 4
figure 4

Distribution of (from top left to bottom right) Creatinine, C-reactive Protein (CRP), d-Dimer, Procalcitonin, influenza-like symptoms, respiratory rate, SpO2 level, and sex for patients admitted to ICU and those who are not.

Discussion

The results presented in this study demonstrate that data acquired at or around the time of admission of a COVID-19 patient to a care facility can be used to make an accurate assessment of their need for critical care and mechanical ventilation. Further, the important features in this data belong to three different sets, namely, socio-demographic data, presenting clinical data, and blood panel profile data. We report that in cases where the blood panel data is not available, useful predictions may still be made, albeit with some loss of accuracy. This would be relevant to situations where the time or resources to acquire this type of data are limited. Of all the machine learning models considered in this study, we found the random forest to be the most accurate and the most robust to data perturbation for both critical care and mechanical ventilation prediction. We also demonstrate that the values of just five features, namely, CRP, Procalcitonin, d-Dimer, SpO2, and respiratory rate, can be used to predict the need for critical care and mechanical ventilation with an accuracy comparable to using all 72 features. The list of important features identified in our study is also indicative of a disease that affects multiple systems in the body, including the respiratory, circulatory, and immune systems.