INTRODUCTION

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has produced a pandemic that has infected more than 20 million individuals and caused over 740,000 deaths worldwide as of August 2020.1 In the USA, COVID-19 is projected to cause over 300,000 deaths by the end of 2020.2

COVID-19 presents with a variable array of symptoms3,4,5,6 and has a highly variable effect on morbidity and mortality,7 making the disease a challenge to appropriately triage and manage.8 It is estimated that a large percentage of patients are asymptomatic,9,10,11,12 while about 15% become so ill that they require intensive care.13 Reports of rapid decompensation requiring intubation among patients who are otherwise young and healthy further cloud triage and management decisions and may lead clinicians to rely on anecdote or recency bias.14

The swift uptick in COVID-19 cases has put a severe strain on healthcare resources in hotspot regions. Given these resource constraints and the unpredictability of the COVID-19 disease course, clinicians face a substantial challenge in judging whether a patient’s likely clinical trajectory permits discharge.

To aid clinicians at the point of care with disposition decisions surrounding COVID-19 in the emergency department and inpatient ward, we present the derivation and validation of a data-driven clinical risk score to predict the 14-day occurrence of hypoxia, ICU admission, or death.

METHODS

Data Source

We extracted structured data from the common electronic health record (EHR) of Mass General Brigham (previously Partners HealthCare) institutions, a consortium of 10 hospitals and clinics in eastern Massachusetts.

The study protocol was deemed exempt by the Mass General Brigham institutional review board and was registered at clinicaltrials.gov (NCT04339387). We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD)15 reporting guideline from the Enhancing the Quality and Transparency of Health Research (EQUATOR) Network.16

Participants

We included all patients age 18 years and older who had at least one positive reverse transcription polymerase chain reaction test for SARS-CoV-2. Criteria for testing evolved rapidly at Mass General Brigham: testing was offered at first only to seriously ill patients and healthcare workers and was then slowly liberalized (eTable 1). We included patients with a new diagnosis beginning March 1, 2020, until April 12, 2020, when we passed our target enrollment of 1250 patients.

Outcome

We sought to study whether a patient with COVID-19 was suitable for discharge from a monitored setting. We developed a 14-day composite outcome comprising need for supplemental oxygen, need for intensive care unit (ICU)-level care, and death. We chose this composite endpoint, rather than the endpoint used in numerous case studies (mechanical ventilation, ICU, and/or death), to be highly clinically relevant in a scenario where inpatient beds are scarce or barriers exist to sending patients home with oxygen. Our goal was to develop a risk score that is most useful for clinicians in the emergency department or on the inpatient ward who are making decisions about discharge to home. Because patients present at different phases in their illness, we used a patient’s most recent data to guide management, as clinicians do when determining a patient’s suitability for discharge.

We opted for a 14-day follow-up period from time of presentation for patients who did not experience this composite endpoint. New research suggests that the median time to symptoms is 8 days and that 90% of patients develop symptoms within 14 days.17 In addition, we were not able to ascertain date of symptom onset. “Day zero” was defined as the date of either a positive SARS-CoV-2 test or presentation for evaluation, whichever was earlier, allowing additional lead time.
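
For readers who wish to reproduce this windowing logic on their own data, the following is a minimal sketch (not the authors’ SAS code); the column names test_date, presentation_date, and event_date are hypothetical.

```python
import pandas as pd

patients = pd.DataFrame({
    "test_date": pd.to_datetime(["2020-03-05", "2020-03-20"]),
    "presentation_date": pd.to_datetime(["2020-03-03", "2020-03-22"]),
    # first occurrence of hypoxia, ICU-level care, or death, if any
    "event_date": pd.to_datetime(["2020-03-10", None]),
})

# "Day zero" is the earlier of the positive test date and the presentation date.
patients["day_zero"] = patients[["test_date", "presentation_date"]].min(axis=1)

# Composite outcome: any qualifying event within 14 days of day zero.
window = pd.Timedelta(days=14)
patients["composite_14d"] = (
    patients["event_date"].notna()
    & (patients["event_date"] <= patients["day_zero"] + window)
)
print(patients[["day_zero", "composite_14d"]])
```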

Predictors

We a priori chose to focus on predictors that were objective, were readily available, and required little computation so that a risk score could be calculated by hand at the point of care. Though electronic health records and mobile apps allow clinicians to access sophisticated risk prediction models, these can be challenging to integrate, are not available in underserved settings, and do not afford transparency. We used reports from the literature to direct us toward predictors that appeared promising.4, 7, 18, 19 The risk score is meant to be used in real time when a patient is assessed for discharge from the emergency department or the inpatient ward, so we chose to use the last recorded measurement for each variable. We extracted age, sex, smoking status (ever-smoker), comorbidities (reflected dichotomously in order to derive a parsimonious score, including coronary artery disease, heart failure, hypertension, chronic obstructive pulmonary disease, asthma, obstructive sleep apnea, chronic kidney disease, end-stage renal disease, diabetes, cirrhosis, cancer, and organ transplantation), and body mass index (BMI) (kg/m2). We extracted the patient’s most recent vital signs, including temperature (degrees Celsius), respiratory rate (breaths per minute), pulse rate (beats per minute), oxygen saturation (%), and systolic blood pressure (mmHg; but not diastolic blood pressure, which would be collinear). We extracted common laboratory measures that had shown promise in prior case series, including creatinine (mg/dL), glucose (mg/dL), total bilirubin (mg/dL), and acute phase reactants including white blood cell count (K/μL), albumin (g/dL), lactate dehydrogenase (LDH) (U/L), high-sensitivity troponin (ng/L), C-reactive protein (CRP) (mg/L), procalcitonin (ng/mL), and D-dimer (ng/mL). We did not include medication treatments given their unknown consequences.20
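
As an illustration of the “last recorded measurement” rule, the sketch below pulls the most recent value of each predictor per patient from a hypothetical long-format table; the schema and variable names are assumptions, not the study’s actual EHR structure.

```python
import pandas as pd

labs = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "variable":   ["albumin", "albumin", "crp", "albumin"],
    "value":      [3.1, 2.7, 88.0, 4.0],
    "recorded_at": pd.to_datetime(
        ["2020-03-05 08:00", "2020-03-07 09:30", "2020-03-06 12:00", "2020-03-09 10:00"]
    ),
})

# Keep only the most recent measurement of each variable for each patient,
# then pivot so that each predictor becomes a column.
last_values = (
    labs.sort_values("recorded_at")
        .groupby(["patient_id", "variable"], as_index=False)
        .last()
        .pivot(index="patient_id", columns="variable", values="value")
)
print(last_values)
```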

Sample Size

Our dataset comprised a training cohort used to derive the risk score and a validation cohort used to validate it. To ensure that the sample size in the training (derivation) cohort was sufficient for the estimates and p values to be valid, we applied the rule that the number of events (positive for the composite outcome of hypoxia, ICU, or death) and non-events (negative for the composite) per covariate in the model should be at least 10.21,22,23 Because we categorized continuous variables into quartiles (for reasons discussed below), each of these predictors would act as three “covariates.” We aimed for a parsimonious model with no more than 5 predictors (and thus no more than 15 “covariates”). Thus, our derivation cohort would require at least 150 events and 150 non-events. With 1014 patients in the training cohort (759 events and 255 non-events), we had sufficient patients to validly estimate the beta coefficients in the logistic regression model.
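
The arithmetic behind this sample size target can be checked in a few lines; the counts and the 10-events-per-covariate threshold come from the text above.

```python
# Back-of-the-envelope check of the events-per-covariate rule (a sketch).
EVENTS_PER_COVARIATE = 10
predictors = 5                 # parsimonious model target
covariates_per_predictor = 3   # each quartile-categorized predictor yields 3 dummy terms

required = EVENTS_PER_COVARIATE * predictors * covariates_per_predictor
print(f"Minimum events and non-events required: {required}")   # 150

events, non_events = 759, 255  # training cohort counts reported in the text
assert events >= required and non_events >= required
```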

Statistical Analysis

We report descriptive statistics for all variables as means with 95% confidence intervals or as frequencies with percentages. The study sample was randomly divided into a 75% cohort for training and a 25% cohort for validation.24, 25
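
A minimal sketch of such a random 75%/25% split is shown below; the study’s analyses were performed in SAS, so this fragment with an illustrative cohort size is only meant to convey the idea.

```python
import numpy as np

rng = np.random.default_rng(2020)        # fixed seed for reproducibility
patient_ids = np.arange(1, 1001)         # illustrative cohort of 1000 patients
shuffled = rng.permutation(patient_ids)

n_train = int(round(0.75 * len(shuffled)))
train_ids, valid_ids = shuffled[:n_train], shuffled[n_train:]
print(len(train_ids), len(valid_ids))    # 750 250
```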

To facilitate creation of a “paper and pencil” risk score, we categorized continuous variables. In other circumstances, clinically relevant thresholds would be used to categorize such variables; however, uncertainty about the pathophysiology underlying rapid progression to acute respiratory distress syndrome, and about the unusual clinical characteristics of the disease in general, left us without confident thresholds, so we categorized each variable by its quartiles. We first examined bivariate comparisons of the categorized variables using a chi-squared test. We retained all variables with p < 0.1 for the manual forward selection process and removed any variable with more than 10% missingness. A priori, and to mimic the order in which data typically become available during an encounter, we performed manual stepwise forward logistic regression, beginning with age and sex, then vital signs, then comorbidities, and finally laboratory values. As noted above, to achieve a parsimonious model usable at the bedside, we decided a priori to include no more than 5 variables. We added a variable only if it made a more than negligible change to the model’s fit, as judged by the c-statistic (an estimate of the area under the receiver operating characteristic curve). We adjusted for clustering by hospital but found no difference in the model’s performance.
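
The sketch below illustrates, on synthetic data, the three modeling steps just described: quartile categorization, the chi-squared bivariate screen at p < 0.1, and manual forward selection that retains a variable only if it meaningfully improves the c-statistic. The data, column names, and the 0.01 improvement margin are illustrative assumptions, not the authors’ code or thresholds.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.normal(60, 15, n),
    "spo2": rng.normal(95, 3, n),
    "albumin": rng.normal(3.5, 0.6, n),
})
# Synthetic composite outcome loosely driven by the predictors.
logit = 0.03 * (df["age"] - 60) - 0.3 * (df["spo2"] - 95) - 1.5 * (df["albumin"] - 3.5)
df["event"] = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

# 1) Categorize each continuous predictor into quartiles.
for col in ["age", "spo2", "albumin"]:
    df[col + "_q"] = pd.qcut(df[col], 4, labels=False)

# 2) Bivariate chi-squared screen: keep variables with p < 0.1.
candidates = []
for col in ["age_q", "spo2_q", "albumin_q"]:
    p_value = chi2_contingency(pd.crosstab(df[col], df["event"]))[1]
    if p_value < 0.1:
        candidates.append(col)

# 3) Forward selection: add a variable only if the c-statistic improves enough.
def c_statistic(cols):
    X = pd.get_dummies(df[cols].astype("category"), drop_first=True)
    model = LogisticRegression(max_iter=1000).fit(X, df["event"])
    return roc_auc_score(df["event"], model.predict_proba(X)[:, 1])

selected, best_c = [], 0.5
for col in candidates:                  # in the prespecified clinical order
    trial_c = c_statistic(selected + [col])
    if trial_c - best_c > 0.01:         # "more than negligible" improvement
        selected.append(col)
        best_c = trial_c
print(selected, round(best_c, 3))
```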

We derived the risk score’s points by dividing each variable’s β coefficient by the sum of all coefficients in the model, multiplying by 100, and rounding to the nearest integer.26 The risk score calculated for each patient represented the summed points from all variables present, with a higher score indicating greater suitability for discharge.
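
Applied to placeholder coefficients (not the published model), the point-assignment rule looks like this:

```python
# Illustrative beta coefficients only; the published points appear in Table 2.
betas = {
    "age_46_60": 0.4,
    "age_61_73": 0.7,
    "spo2_low": 1.2,
    "albumin_low": 1.9,
}
total = sum(betas.values())
# Each category's points = (beta / sum of betas) * 100, rounded to an integer.
points = {name: round(100 * b / total) for name, b in betas.items()}
print(points)   # {'age_46_60': 10, 'age_61_73': 17, 'spo2_low': 29, 'albumin_low': 45}
```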

We calculated a risk score for each patient and chose the optimal cut point as the one that maximized the sum of the sensitivity and specificity for suitability for discharge.27 The c-statistic was used to assess the discriminative ability of the logistic regression model and the risk score in both the training and validation cohorts. We validated the risk score by calculating the c-statistic for the score’s ability to discriminate who is and is not suitable for discharge in the validation cohort (the remaining 25% of the original sample). The Hosmer-Lemeshow goodness-of-fit statistic was used to assess calibration of the logistic regression model in both the training and validation cohorts. Anticipating the risk score’s real-world use, we also examined its discriminative capacity separately for inpatients and emergency department patients by validating the model in these subsets of the population. As an added test, we checked for an interaction between a patient’s score and their site of care. Upon recommendation from our emergency medicine colleagues, we also tested a competing model without any laboratory values, presented in the Supplement.
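
The cut point rule, maximizing sensitivity plus specificity (Youden’s index), can be sketched as follows on synthetic scores; the published cut point of 30 comes from the study data, not from this illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
suitable = rng.integers(0, 2, 500)                  # 1 = suitable for discharge
score = 25 + 10 * suitable + rng.normal(0, 8, 500)  # synthetic risk scores

fpr, tpr, thresholds = roc_curve(suitable, score)
best = np.argmax(tpr - fpr)                         # maximizes sensitivity + specificity
print(f"cut point: {thresholds[best]:.0f}, "
      f"sensitivity: {tpr[best]:.2f}, specificity: {1 - fpr[best]:.2f}")
```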

All tests for significance used a 2-sided p value of 0.05 unless otherwise noted. We performed all analyses in SAS, version 9.4 (SAS Institute).

RESULTS

Patient Characteristics

A total of 2059 patients tested positive for SARS-CoV-2, 1326 met criteria for inclusion (733 did not have the necessary 14-day follow-up), and 1014 were included in the training cohort (Fig. 1). Patients in the training cohort had a mean age of 58 years (95% CI, 57 to 59) and were 56% male, 51% White, 41% employed, 43% privately insured, and 68% never-smokers (Table 1 and eTable 2). Most (65%) had at least one chronic condition, with hypertension (46%), diabetes (31%), chronic kidney disease (15%), and asthma (14%) the most common (eTable 3).

Figure 1. Sample selection schematic. COVID-19, coronavirus disease 2019.

Table 1 Training Cohort Patient Characteristics and Bivariate Differences (the Table Shows the 75% (Training) Cohort; See eTable 2 for the 25% (Validation) Cohort)

Of the patients in the training cohort, 255 (25%) were suitable for discharge, 783 (77%) were admitted, 728 (72%) required oxygen, 388 (38%) required ICU care, and 55 (5%) expired.

About a quarter of patients’ most recent vital signs were potentially concerning (Table 1). Specifically, 237 (24%) had a heart rate greater than 92, 226 (23%) had a respiratory rate greater than 24, 256 (26%) had a systolic blood pressure less than 109, and 302 (30%) had oxygen saturation less than 94%. A large percentage of patients had aberrations in acute phase reactants; for example, 245 (26%) had albumin less than 2.8 g/dL and 203 (25%) had LDH greater than 387 U/L (eTable 4).

Risk Score Development

Of the characteristics in Table 1, all except race/ethnicity (p = 0.50) and heart rate (p = 0.66) were significant in bivariate analysis. We added variables in a prespecified manual forward stepwise manner (eTable 5). The final model included age, oxygen saturation, and albumin, resulting in a score between 0 and 55 (Table 2; Fig. 2). Albumin was the most predictive factor (aOR of albumin > 3.7 g/dL compared to < 2.8 g/dL, 42 [95% CI, 18 to 96]); age was the least (aOR of age 18–45 compared to age > 73, 1.8 [95% CI, 0.9 to 3.4]). The risk score had good discriminative capacity (c-statistic, 0.89 [95% CI, 0.87 to 0.91]) and no evidence of lack of fit (p = 0.835). At a cut point of 30, it had a sensitivity of 83.2% and a specificity of 82.2% (Table 3); the corresponding false positive rate of 17.8% reflects a conservative score. We also considered other models with respiratory rate and without laboratory values, discussed below and in the online supplement.

Table 2 Multivariable Model and Associated Risk Score
Figure 2. Risk score tool for the point of care.

Table 3 Risk Score Performance

Risk Score Validation

In the validation cohort, there was no significant difference in the model’s discriminative capacity (c-statistic, 0.87 [95% CI, 0.81 to 0.93]) and no evidence of lack of fit (p = 0.435) (eTable 6; eFigure 1).

To ensure the risk score performed well for both inpatients and other patients, we also stratified the validation by inpatients and emergency department patients. There were no significant differences in model performance with respect to the c-statistic, although the model performed somewhat better in the emergency department (eTable 6). The relationship between the composite outcome and the score was similar for inpatients and emergency department patients; in particular, the interaction of score with inpatient status in a logistic regression model to predict the composite outcome was not significant for the training (p for the interaction effect, 0.772) or validation cohort (p for the interaction effect, 0.2772).
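
For illustration, an interaction test of this kind can be specified as follows; the data are synthetic and statsmodels is an assumed dependency, so this is only a sketch of the analysis described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 600
df = pd.DataFrame({
    "score": rng.uniform(0, 55, n),
    "inpatient": rng.integers(0, 2, n),     # 1 = inpatient, 0 = emergency department
})
# Synthetic composite outcome: lower scores carry more risk, regardless of site.
df["composite"] = (rng.uniform(size=n) < 1 / (1 + np.exp(0.1 * (df["score"] - 30)))).astype(int)

# "score * inpatient" expands to main effects plus the score-by-site interaction.
model = smf.logit("composite ~ score * inpatient", data=df).fit(disp=0)
print(model.pvalues["score:inpatient"])     # a non-significant p suggests no interaction
```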

DISCUSSION

We developed and validated a 3-item risk score to be used at the point of care to assist clinicians in decisions regarding discharge from a monitored setting for patients with COVID-19 (Fig. 2). The simple score, CHOSEN (COVID Home Safely Now), has robust discrimination, sensitivity, and specificity across all patients with COVID-19 in a monitored setting.

With over 600,000 infections in the USA, many hospital systems are severely strained. In the majority of cases, the patient appears clinically stable, a scenario in which CHOSEN could be useful. Physicians currently lack validated risk scores to aid bedside management. Our risk score builds on important work from the Brescia-COVID Respiratory Severity Scale, which is appropriate for patients in respiratory distress who must be re-evaluated every few hours or even every few minutes.28 Shi et al. have described a “host susceptibility score” that assigns one point each for age ≥ 50, male sex, and presence of hypertension, and that predicts death or a severe phenotype.29 The unclear definition of their endpoint, the absence of information about missing data, and the lack of description of model-building methodology make it difficult to ascertain why their results differ from ours. This and several other studies also lack a standardized follow-up period, which could lead to biased estimates and incorrect conclusions.30

Our work also builds upon several case series that have examined patient characteristics associated with various negative outcomes. These case series collected clinical data and laboratory values to identify risk factors linked to the progression of patients with COVID-19.3,4,5, 19 Older age has been found to be a risk factor, but there is no clear threshold below which age is protective, nor has age been compared with other clinical criteria. In contrast, we found age to be much less predictive than other factors. Several authors have linked comorbidities with poor outcomes.3, 7 However, comorbidities were not universal among seriously ill patients,3 and in our multivariable model the presence of a comorbidity was not significant. Our work concurs with others regarding acute phase reactants: we found albumin, a negative acute phase reactant, to be highly predictive, consistent with a smaller case series of patients with progressive disease.7, 30, 31

Available data suggest that COVID-19 departs from the typical viral pneumonia clinical course in that it often initially presents without clear signs of decompensation, only to progress catastrophically later. In support of this, we saw relatively narrow quartiles in the included variables, which speaks to the subtle but important changes in clinical presentation that occur with COVID-19. These highly nuanced changes may not be immediately discernible to a clinician in the moment, making a risk score like CHOSEN clinically useful.

We encountered challenges when trying to include respiratory rate, an often subjectively measured variable with significant measurement error.32 Given this known issue, it is not surprising that we were unable to divide patients evenly into quartiles; all modelers should approach this variable with caution (Table 1). We therefore present the more parsimonious and similarly performing model without respiratory rate (Fig. 2) but include a competing model with respiratory rate in the online supplement for clinicians to use at their own discretion (eTables 7–9; eFigure 2). We also include a competing model without any laboratory values based on feedback from emergency medicine colleagues, although this model did not perform as well (eTable 10).

Defining the outcome’s follow-up period was challenging due to variable findings in the literature regarding disease course, and our 14-day follow-up period may be insufficient. We considered a longer follow-up period but balanced its advantages against the goal of delivering a useful tool to clinicians as soon as possible. However, by defining day zero as a time point later than symptom onset (our day zero was the date of positive test or date of presentation) and including events upstream of death (any oxygen requirement and ICU-level care), we felt 14 days was a suitable period to capture the majority of events. The model may not perform well for patients who deteriorate after 14 days, but in times of extreme resource scarcity, it may be necessary for those patients to convalesce at home until they require hospital-level care. We also recognize that some institutions are discharging patients with COVID-19 on oxygen, and our model does not account for this as a disposition option. The 14-day follow-up requirement explains why 733 patients with COVID-19 were not included in the analyzed sample, a relatively large number because testing was ramping up during this period.

Our study has limitations. First, our data source was the EHR, which has known data quality issues, including inaccurate coding of comorbidities. To mitigate this, we pulled not only from a patient’s problem list but also from disease registries; moreover, in bedside management, a patient’s problem list is often what is quickly available to a clinician. We were not able to pull every a priori risk factor (e.g., ferritin). However, prior work has noted significant correlation among acute phase reactants, such that one is likely sufficient (e.g., albumin).7 We similarly were unable to extract unstructured data, such as date of symptom onset. Second, a patient may have required care at a facility outside of Mass General Brigham after first presenting for testing, limiting our ability to capture the composite endpoint and misclassifying the patient as “suitable for discharge.” However, in our prior studies using the same EHR with phone calls and a regional health information network to confirm utilization, we found that acute care at an outside system following discharge was exceedingly rare.33 Third, although possibly uncommon, a family could have expressed the desire for discharge to home with an advance directive specifying home-first palliative care; such a patient would have died and been counted as having reached the composite endpoint, even though this was the desired outcome in that circumstance. Finally, our sample is from a single region in New England and was collected early in the pandemic, when testing was limited and bed capacity was under extreme pressure. Our findings may therefore not generalize to other populations or later periods of the pandemic, although similar hotspots have occurred throughout the country and globe. Similarly, we note that triage decision-making has evolved and that local practice variation exists based on system pressures and discharge resources (e.g., supplemental oxygen or remote monitoring at home).34 CHOSEN may offer a guidepost, but clinicians should account for their local system pressures and discharge resources when making discharge decisions. We plan a prospective evaluation of this risk score to ensure validity and external validation in other populations and time periods to ensure generalizability, as derivation models may produce overly optimistic results compared with external validation cohorts.

CONCLUSIONS

A simple, 3-item, point-of-care risk score that includes age, oxygen saturation, and albumin predicts with good discriminative capacity, sensitivity, and specificity whether a patient with COVID-19 is suitable for discharge.