Dear editor,

Severity of illness scores are used for benchmarking and assessment of adjusted mortality in intensive care units (ICU). The Acute Physiology and Chronic Health Evaluation (APACHE) and the Simplified Acute Physiology Score (SAPS) 3 scores were recently evaluated in ICU patients with coronavirus disease 2019 (COVID-19) with conflicting findings [1,2,3,4]. While in the cohorts from the United Kingdom [1] and the United States [2] ICU scores underestimated the actual mortality and poorly stratified disease severity, analyses from Austria using the SAPS-3 with first-level customization suggested satisfactory performance [3]. In this study, we aimed to evaluate the performance of SAPS-3 in predicting hospital mortality in a large cohort of COVID-19 patients admitted to ICUs in Brazil.

We included all adult patients (> 16 years) with RT-PCR-confirmed SARS-CoV-2 infection admitted to 188 ICUs of 45 hospitals (Rede D’Or São Luiz) from February 26th, 2020, to April 30th, 2021. Anonymized information was obtained from an electronic system, which contains prospectively collected structured data for all ICU admissions (Epimed Monitor®, Rio de Janeiro, Brazil).

In addition to the standard equation (SE), we obtained recalibrated probabilities for COVID-19 patients after performing a first-level customization of the SAPS-3 equation (Supplementary Methods). We assessed discrimination for hospital mortality using the area under the receiver operating characteristic curve (AUROC) and overall accuracy using the Brier score, each with 95% confidence intervals (95% CI). Calibration was evaluated using the Hosmer–Lemeshow goodness-of-fit (GOF) test and the calibration belt method [5]. R 4.1 was used for all analyses.
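The steps described above can be illustrated with a minimal Python sketch. It is not the code used for this study (the analyses were run in R, and the Supplementary Methods are not reproduced here). The sketch assumes the published general SAPS-3 equation for hospital mortality and treats first-level customization as refitting an intercept and a slope on the logit of the standard probability; the function names and the Newton-Raphson fitting routine are illustrative choices, not part of the original analysis.

```python
import math


def saps3_standard_probability(score):
    """Predicted hospital mortality from the general SAPS-3 equation."""
    logit = -32.6659 + 7.3068 * math.log(score + 20.5958)
    return 1.0 / (1.0 + math.exp(-logit))


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def first_level_customization(standard_probs, died, max_iter=50, tol=1e-10):
    """Refit a and b in logit(p_new) = a + b * logit(p_std) by Newton-Raphson.

    Assumption: this two-parameter logistic recalibration stands in for the
    customization described in the Supplementary Methods.
    """
    z = [math.log(p / (1.0 - p)) for p in standard_probs]
    a, b = 0.0, 1.0  # start from the standard equation
    for _ in range(max_iter):
        p = [_sigmoid(a + b * zi) for zi in z]
        # Gradient of the log-likelihood with respect to (a, b)
        g0 = sum(y - pi for y, pi in zip(died, p))
        g1 = sum((y - pi) * zi for y, pi, zi in zip(died, p, z))
        # Entries of the 2x2 information matrix
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * zi for wi, zi in zip(w, z))
        h11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b, [_sigmoid(a + b * zi) for zi in z]


def auroc(probs, died):
    """Share of (non-survivor, survivor) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(probs, died) if y == 1]
    neg = [p for p, y in zip(probs, died) if y == 0]
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0 for d in pos for s in neg)
    return wins / (len(pos) * len(neg))


def brier_score(probs, died):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, died)) / len(died)
```

Because this form of customization is a monotone transformation of the standard probability (when the fitted slope is positive), it leaves the AUROC unchanged and can only affect calibration and the Brier score. The pairwise AUROC above is written for clarity; a rank-based implementation would be preferable for a cohort of this size.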

A total of 30,571 COVID-19 patients had complete hospital outcomes and were analyzed (Supplementary Table 1). Median age was 55 (interquartile range 42–69) years and 42% required advanced respiratory support. Overall, 4581 (15%) patients died in the hospital. Using the SAPS-3-SE, the predicted mortality was 15.7% and the standardized mortality ratio (SMR) was 0.95 (95% CI 0.93–0.98). The model's discrimination was satisfactory (AUROC = 0.835 [95% CI 0.828–0.841]; Brier score = 0.097 [95% CI 0.095–0.100]). However, calibration was inadequate for both the SAPS-3-SE and the COVID-19-customized equation. Calibration belts and curves demonstrated underestimation of mortality in the lower- to intermediate-risk groups and overestimation in the higher-risk group, which was unaffected by customization (Fig. 1; Supplementary sTable 2). These results were consistent when patients were stratified into three consecutive periods (Supplementary sTable 3 and sFigure 1).
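As a worked illustration of the SMR reported above, the following Python sketch divides observed by expected deaths and attaches an approximate 95% CI. The interval treats the observed deaths as Poisson-distributed and works on the log scale; this is one common approximation, assumed here for illustration, and the letter does not state which method was used. Expected deaths are reconstructed from the rounded predicted mortality (15.7%), so they are approximate.

```python
import math


def smr_with_ci(observed_deaths, expected_deaths, z=1.96):
    """SMR with an approximate CI, treating observed deaths as Poisson."""
    smr = observed_deaths / expected_deaths
    se_log = 1.0 / math.sqrt(observed_deaths)  # SE of log(SMR)
    return smr, smr * math.exp(-z * se_log), smr * math.exp(z * se_log)


# Figures taken from the text; expected deaths are an approximation
# because the predicted mortality is only reported to one decimal place.
n_patients = 30571
observed = 4581
expected = 0.157 * n_patients

smr, lower, upper = smr_with_ci(observed, expected)
print(f"SMR = {smr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Under these assumptions the sketch gives an SMR of 0.95 with limits of 0.93 and 0.98, in line with the reported values.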

Fig. 1

Calibration assessment of the SAPS-3 standard equation and the COVID-19-customized equation. Panel a: calibration belts for predicted probabilities. Panel b: calibration curves with predicted (blue line) and observed (red line) mortalities.

We compared the main results of the present study with those of previous studies (Supplementary sTable 4). Unlike previous reports, our results did not confirm that first-level customization improves the performance of SAPS-3 in predicting hospital mortality in ICU patients with COVID-19. Differences in the models' performance may be caused by differing admission policies and the timing of patient ICU admission. Our study may also not reflect the entirety of the Brazilian healthcare system, as we included data from a private healthcare network with almost unrestricted access to ICU care. Regardless, our findings reinforce that standard severity of illness scores should be used with caution for mortality prognostication or benchmarking of ICU performance in COVID-19 patients. Moreover, our results highlight the need for proper calibration of these scores to estimate risk-adjusted metrics such as the SMR in this population. Further work is warranted to improve current severity scores or develop COVID-19-specific prognostic measures.