Introduction

Coronavirus disease 2019 (COVID-19) is a pandemic infection caused by SARS-CoV-2; its clinical manifestations range from a mild flu-like illness to life-threatening disease1. The first reports describing the clinical characteristics of patients with COVID-19 came from China2,3, where the pandemic originated at the end of 2019. In Italy, since the first case was reported on 20th February 2020, the outbreak has rapidly assumed dramatic proportions, making Italy the first Western country to face this epidemic.

One of the most striking aspects of the data released by the World Health Organization (WHO) is the substantially different prognosis among countries; indeed, according to the latest situation report (18th May 2020), the case fatality rate (CFR) in Italy (31,908/225,435; 14.15%) is considerably higher than in China (4,645/84,494; 5.5%)4. This almost three-fold higher risk of death in Italy has been attributed to a higher prevalence of mortality risk factors: age, male gender, and comorbidities5. However, there are relevant differences even among Western countries: for instance, Germany reported a much lower fatality rate (4.54%) than France (20.04%), the United Kingdom (14.21%), or Spain (11.95%)4. The reasons for these discrepancies are still unclear and may include genetic factors, differences between countries in testing strategies and epidemiological reporting, and a different ability of local health systems to cope with the epidemic.
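The crude CFR quoted above is simply the number of deaths divided by the number of confirmed cases. The minimal Python sketch below (the function name is illustrative) reproduces the Italian and Chinese figures from the WHO counts in the text and the resulting ratio of roughly 2.6. Note that a crude CFR computed during an ongoing outbreak ignores cases whose outcome is still pending.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Crude CFR: deaths among confirmed cases, as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases

# Counts from the WHO situation report of 18th May 2020 cited in the text.
italy = case_fatality_rate(31_908, 225_435)
china = case_fatality_rate(4_645, 84_494)

print(f"Italy CFR: {italy:.2f}%")      # Italy CFR: 14.15%
print(f"China CFR: {china:.2f}%")      # China CFR: 5.50%
print(f"Ratio: {italy / china:.2f}")   # Ratio: 2.57
```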

To better elucidate this issue, it is particularly important to have a clear picture of the general features of patients diagnosed with COVID-19 in different countries. Data about the clinical features of Italian COVID-19 patients admitted to hospital are still lacking. The present study aims to fill this gap.

Methods

Study population

The study was conducted in three hospitals in Northern Italy (“Maggiore della Carità” University Hospital in Novara, “Santi Antonio e Biagio e Cesare Arrigo” Hospital in Alessandria, and “Sant’Andrea” Hospital in Vercelli). These hospitals are the referral centres for a large, homogeneous territory in Eastern Piedmont, one of the areas most severely hit by the COVID-19 outbreak, with a catchment area of around 900,000 inhabitants.

From a review of hospital administrative data, we selected all consecutive patients older than 18 years of age who were admitted to hospital after Emergency Room evaluation between 1st March 2020 and 28th April 2020, with a diagnosis of SARS-CoV-2 infection confirmed by reverse-transcriptase polymerase chain reaction (RT-PCR) on a nasopharyngeal swab.

On 9th April 2020, an electronic case report form was created using the Research Electronic Data Capture software (REDCap, Vanderbilt University) to collect clinical data retrospectively from a review of clinical records. Data entry was performed by clinicians involved in the management of COVID-19 patients. Each patient included in the study was assigned a unique pseudonymized code.

The following information was collected:

  • Patients’ demographics, symptoms, comorbidities, home medications, triage vitals, and complications during the hospital stay;

  • Outcomes: the outcome of the hospital stay (discharged or deceased). For the calculation of the CFR, the adverse outcome was defined as death from any cause during the hospital stay.

On 28th April, at the time of cut-off, clinical data for 486 of the 1697 COVID-19 patients admitted during the study period had been entered in the database; these data were used to identify predictors of mortality. Laboratory data were retrieved directly from the central laboratory system for all patients admitted to Novara Hospital (256/486, 53%) and were used to assess the association of laboratory variables with survival (see Fig. 1 for details).

Figure 1

The figure details the selection of the study population.

The study protocol was approved by the Institutional Review Board (Comitato Etico Interaziendale Novara; IRB code CE 97/20) and conducted in strict accordance with the principles of the Declaration of Helsinki. The competent authority (Comitato Etico Interaziendale di Novara) waived the requirement for prospective informed consent because of the retrospective nature of the study and the use of pseudonymized data.

Patients included in this study have also been evaluated in other reports.

Statistical analysis

Continuous variables were summarized by group as median [25th–75th percentile] and compared using the Wilcoxon rank-sum test. Categorical variables (dichotomous or nominal) were reported as frequencies and percentages and compared using the chi-square test.
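To show what these two tests compute, here is a minimal stdlib-only Python sketch. It is an illustration under stated assumptions, not the authors' code (the analyses were run in R): the rank-sum test uses the large-sample normal approximation with a tie correction and no continuity correction, and the chi-square test is shown for a 2 × 2 table without Yates' correction, whereas R's wilcox.test and chisq.test apply continuity corrections by default in these settings. The function names are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        ranks[values[i]] = (i + j) / 2 + 1  # average 1-based rank of a tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all observations identical
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no Yates correction) for [[a, b], [c, d]],
    e.g. rows = comorbidity present/absent, columns = died/survived."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)

print(rank_sum_test([1, 2, 3], [4, 5, 6]))
print(chi_square_2x2(10, 20, 20, 10))
```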

A univariable logistic regression analysis was carried out to evaluate the effects of covariates on in-hospital mortality.
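A univariable logistic model has a single covariate, and its exponentiated slope is the odds ratio for death per unit increase in that covariate. The minimal Newton-Raphson fit below is a plain-Python sketch with a hypothetical helper name and toy data; R's glm performs the equivalent fit by iteratively reweighted least squares.

```python
import math

def univariable_logistic(x, y, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, se_b1); se_b1 comes from the inverse information matrix.
    Assumes no complete separation (otherwise the estimates diverge).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^-1 * score (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Toy data (not from the study): binary exposure x and death y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1, se = univariable_logistic(x, y)
odds_ratio = math.exp(b1)
ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
print(f"OR = {odds_ratio:.2f}")  # OR = 9.00, the cross-product ratio (3*3)/(1*1)
print(f"95% CI {ci[0]:.2f}; {ci[1]:.2f}")
```

For a binary covariate the fitted odds ratio equals the 2 × 2 cross-product ratio, which makes the toy example easy to check by hand.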

A random forest feature selection algorithm was used to select the most relevant predictors of death to be included in the multivariable logistic regression analysis, using the mean decrease in accuracy as the variable importance measure. The algorithm was tuned by fourfold cross-validation. Feature selection was performed separately within three different sets of predictors:

  1. Set 1: anamnestic predictors.

  2. Set 2: laboratory variables.

  3. Set 3: anamnestic and laboratory predictors.
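The mean decrease in accuracy measures how much a fitted classifier's accuracy drops when the values of one predictor are randomly permuted, breaking that predictor's link with the outcome. In the R randomForest package this is computed per tree on out-of-bag samples (randomForest(..., importance = TRUE), then importance(fit, type = 1)). The stdlib-only Python sketch below shows the same idea in a simplified, model-agnostic form on a single evaluation set; the toy model and data are hypothetical.

```python
import random

def accuracy(model, X, y):
    """Share of rows whose predicted class equals the observed class."""
    return sum(model(row) == yi for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy for each predictor (column of X)."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy model: predicts death (1) when age > 50, ignores column 1.
def toy_model(row):
    return int(row[0] > 50)

X = [[30, 1], [40, 0], [45, 1], [55, 0], [60, 1], [70, 0], [35, 1], [65, 0]]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

A predictor the model never uses gets an importance of exactly zero, while permuting an informative one lowers accuracy.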

Three separate multivariable logistic regression analyses were then performed, one for each predictor set, including the relevant features identified by the random forest algorithm. The Set 1 model was fitted on the 486 patients with clinical data recorded; the Set 2 and Set 3 models were fitted on the 256 patients for whom laboratory data were available.

The predictive performance of the logistic regression models was evaluated with the 0.632 bootstrap validation procedure (1000 resamples), reporting Harrell's C statistic corrected for over-optimism6.
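For a binary outcome, Harrell's C is the proportion of (death, survivor) pairs in which the patient who died received the higher predicted risk, which equals the ROC AUC. The 0.632 estimator combines the apparent C (model evaluated on the data it was fitted on) with the average C on the out-of-bag patients left out of each bootstrap resample, weighted 0.368 and 0.632. The Python sketch below illustrates this; fit_toy is a hypothetical stand-in for refitting the logistic model, and the R rms package's validate() (which reports Somers' Dxy, with C = Dxy/2 + 0.5) may differ in implementation details.

```python
import random
from statistics import mean

def c_statistic(scores, y):
    """Harrell's C for a binary outcome: share of event/non-event pairs in
    which the event has the higher score; tied scores count as 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    pairs = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return pairs / (len(pos) * len(neg))

def fit_toy(x, y):
    """Hypothetical stand-in for model fitting: score by x, with the sign
    chosen so that events tend to score higher in the training data."""
    ev = [xi for xi, yi in zip(x, y) if yi == 1]
    ne = [xi for xi, yi in zip(x, y) if yi == 0]
    sign = 1.0 if (not ev or not ne or mean(ev) >= mean(ne)) else -1.0
    return lambda xi: sign * xi

def c_632(x, y, fit=fit_toy, n_boot=1000, seed=1):
    """0.632 bootstrap estimate: 0.368 * apparent C + 0.632 * mean OOB C."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(x, y)
    apparent = c_statistic([model(xi) for xi in x], y)
    oob_c = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        y_oob = [y[i] for i in oob]
        if 0 not in y_oob or 1 not in y_oob:
            continue  # C is undefined without both outcome classes
        m = fit([x[i] for i in idx], [y[i] for i in idx])
        oob_c.append(c_statistic([m(x[i]) for i in oob], y_oob))
    return 0.368 * apparent + 0.632 * mean(oob_c)

x = list(range(20))
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(round(c_632(x, y, n_boot=200), 3))
```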

The area under the ROC curve (AUC), with its 95% confidence interval, was computed for the continuous predictors that were significant in the multivariable models.
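The text does not state how the AUC confidence intervals were obtained (R's pROC package, for instance, defaults to DeLong's method). As one simple illustration, the Python sketch below computes the empirical AUC and an approximate 95% CI from the Hanley and McNeil (1982) standard error; the function name and example values are hypothetical.

```python
import math

def auc_hanley_mcneil(scores, y, z=1.96):
    """Empirical AUC with an approximate CI from the Hanley-McNeil SE."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    n1, n2 = len(pos), len(neg)
    # AUC = probability that a random event outranks a random non-event
    auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg) / (n1 * n2)
    q1 = auc / (2 - auc)          # P(two events both outrank one non-event)
    q2 = 2 * auc ** 2 / (1 + auc)  # P(one event outranks two non-events)
    var = (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2)
           + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2)
    half = z * math.sqrt(var)
    return auc, max(0.0, auc - half), min(1.0, auc + half)

print(auc_hanley_mcneil([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

With very small groups, as in this toy call, the interval is wide and is clipped to [0, 1]; the normal approximation is only reasonable for larger samples.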

Statistical analyses were conducted using R (version 3.5.2)7 with the randomForest8 and rms9 packages. All tests were two-tailed, with a significance threshold of 0.05.

Results

Analysis of mortality in the study population

We included a total of 1697 patients hospitalized because of COVID-19. The fatality rate in this population was 29.7% (504/1697; 95% CI 27.5%; 31.9%). The death incidence per 1000 person-days was 4.9 (95% CI 0.5; 9.3).
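The confidence interval quoted for the fatality rate is reproduced by the normal-approximation (Wald) interval for a proportion, p ± 1.96·√(p(1 − p)/n). The text does not state which method was used, so the Python sketch below (hypothetical function name) is a consistency check rather than the authors' code.

```python
import math

def wald_ci(events, n, z=1.96):
    """Proportion with normal-approximation (Wald) confidence limits."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

p, lo, hi = wald_ci(504, 1697)
print(f"{100 * p:.1f}% (95% CI {100 * lo:.1f}%; {100 * hi:.1f}%)")
# prints: 29.7% (95% CI 27.5%; 31.9%)
```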

The median time from symptom onset to hospital admission was 5 days [2–9]. Fig. 2 shows frequency histograms of the number of days to death from hospital admission and from symptom onset. Deaths were concentrated within the first 12 days of hospital stay and within 23 days from symptom onset; however, deaths occurred up to 40 days after hospital admission. Fig. 3 shows the survival curve for in-hospital mortality; the median in-hospital survival time was 8 days from admission (95% CI 7; 11).

Figure 2

Histogram frequency plot for days until death from hospitalization (panel A) or symptoms onset (panel B).

Figure 3

In-hospital survival curve. The median in-hospital survival time is equal to 8 days (95% CI 7; 11).

Clinical predictors of mortality

Predictors of mortality were evaluated in a subset of 486 patients. At the end of the observation period, 407/486 had reached one of the two outcomes (death or discharge), while 79 were still hospitalized; the CFR among patients with a known outcome was 29.9% (122 deaths), comparable to that observed in the whole population. Table 1 reports the main clinical features of the study population. Most patients were elderly males. Arterial hypertension was the most frequently reported comorbidity. Fever, cough, and dyspnea were the most common presenting complaints (61%, 59%, and 48% of patients, respectively).

Table 1 Clinical features of the study population.

We then compared the prevalence of underlying comorbidities between deceased patients and survivors: arterial hypertension, history of coronary artery disease (CAD), active cancer, atrial fibrillation, dementia, and chronic kidney disease were all more prevalent in patients who died during the hospital stay. Similarly, current smoking was positively associated with mortality. As for the clinical picture at admission, patients with a poorer prognosis had a higher median respiratory rate and more frequently presented with dyspnea.

The results of the univariable analysis are reported in Supplementary Table 1.

The clinical predictors of mortality to be included in the multivariable logistic regression model were identified by random forest feature selection (accuracy 82.5% with 1000 trees and mtry = 3). The variable importance measure for each predictor is shown in Fig. 4.
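In randomForest, mtry is the number of predictors randomly sampled as split candidates at each node, and ntree is the number of trees. Tuning by fourfold cross-validation means that each candidate setting is trained on three quarters of the patients and scored on the remaining quarter, rotating through the four folds, and the setting with the highest pooled accuracy is kept. The Python sketch below shows this loop; knn_toy is a hypothetical stand-in classifier with one tuning parameter, not a random forest.

```python
import random

def kfold_indices(n, k=4, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cv_accuracy(train, X, y, param, k=4, seed=0):
    """Pooled k-fold cross-validated accuracy for one tuning value."""
    correct = 0
    for test in kfold_indices(len(y), k, seed):  # same folds for every param
        held_out = set(test)
        tr = [i for i in range(len(y)) if i not in held_out]
        predict = train([X[i] for i in tr], [y[i] for i in tr], param)
        correct += sum(predict(X[i]) == y[i] for i in test)
    return correct / len(y)

def tune(train, X, y, grid, k=4):
    """Return the grid value with the highest cross-validated accuracy."""
    scores = {p: cv_accuracy(train, X, y, p, k) for p in grid}
    return max(scores, key=scores.get), scores

def knn_toy(X_train, y_train, n_neighbours):
    """Hypothetical stand-in classifier: majority vote of nearest neighbours."""
    def predict(x):
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i] - x))
        votes = [y_train[i] for i in nearest[:n_neighbours]]
        return int(2 * sum(votes) > len(votes))
    return predict

X = list(range(20))
y = [int(v >= 10) for v in X]
best, scores = tune(knn_toy, X, y, grid=[1, 3, 5])
print(best, scores)
```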

Figure 4

Random forest variable importance plot. The variables are ranked in order of relevance in predicting in-hospital mortality. The importance measure is the mean decrease in accuracy computed by the random forest classification algorithm. The random forest model achieved an accuracy of 82.5% with 1000 trees and mtry = 3.

As shown in Table 2, older age, smoking habit, obesity, and the concomitant presence of active cancer emerged as independent predictors of mortality.

Table 2 Multivariable models.

Laboratory predictors of mortality

To evaluate the association between laboratory variables at admission and prognosis, we considered a further subgroup of 256 patients, whose laboratory data are reported in Table 3.

Table 3 Laboratory characteristics of the study population.

The PaO2/FiO2 (P/F) ratio was significantly lower in patients who died than in survivors, indicating more severe respiratory failure at baseline (survivors 246 [184–300] vs. deceased 126 [100–202]; p < 0.001). In-hospital mortality was also associated with a higher neutrophil count and with increased serum creatinine, C-reactive protein (CRP), and lactate dehydrogenase (LDH). Lower platelet and lymphocyte counts were also associated with mortality. In a multivariable model including the most relevant laboratory variables (identified by random forest feature selection; accuracy 78% with 1000 trees and mtry = 3), the P/F ratio was the only variable that confirmed its predictive role.
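The P/F ratio divides the arterial partial pressure of oxygen (PaO2, in mmHg) by the fraction of inspired oxygen (FiO2, expressed as a fraction), so lower values indicate worse oxygenation. The minimal Python helper below (hypothetical name) shows the arithmetic; for example, a PaO2 of 63 mmHg while breathing 50% oxygen gives a ratio of 126.

```python
def pf_ratio(pao2_mmhg: float, fio2: float) -> float:
    """PaO2/FiO2 ratio. FiO2 may be given as a fraction (0.21-1.0)
    or as a percentage (21-100); percentages are converted to fractions."""
    if fio2 > 1.0:
        fio2 = fio2 / 100.0
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be between 0.21 and 1.0 (or 21-100%)")
    return pao2_mmhg / fio2

print(pf_ratio(63, 0.5))          # 126.0
print(pf_ratio(63, 50))           # 126.0, FiO2 given as a percentage
print(round(pf_ratio(80, 0.21)))  # 381, room air
```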

Finally, the P/F ratio was confirmed as a predictor of mortality, along with age, in a further multivariable logistic regression model that also included demographic and clinical variables (Table 2). The predictors for this model were selected using the random forest mean decrease in accuracy (accuracy 78% with 1000 trees and mtry = 4).

The continuous predictors that were significant in the multivariable models (age and P/F ratio) each achieved an acceptable AUC, between 0.7 and 0.8 (Table 4), according to Hosmer's classification10.
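The commonly cited rule of thumb from Hosmer and Lemeshow's Applied Logistic Regression grades discrimination as acceptable for 0.7 ≤ AUC < 0.8, excellent for 0.8 ≤ AUC < 0.9, and outstanding for AUC ≥ 0.9. Assuming this is the classification referred to above, the small Python helper below (hypothetical name) makes the cut-offs explicit.

```python
def discrimination_label(auc: float) -> str:
    """Grade an ROC AUC using the Hosmer-Lemeshow rule of thumb."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    if auc > 0.5:
        return "poor"
    return "no discrimination"  # AUC <= 0.5: no better than chance

for value in (0.72, 0.78):  # hypothetical values in the 0.7-0.8 range
    print(value, discrimination_label(value))
```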

Table 4 Area under curve (AUC) estimation for the multivariable significant continuous death predictors.

Discussion

The pandemic spread of SARS-CoV-2 infection has suddenly thrown the international scientific community into uncertainty; we are facing a novel virus with peculiar features, which is putting national health systems under unprecedented pressure. One of the most challenging aspects of COVID-19 is its heterogeneity, with clinical pictures ranging from very mild to rapidly fatal; it is currently unclear why some patients develop severe, life-threatening disease, although possible pathogenetic mechanisms include a hyper-inflammatory reaction and a state of hypercoagulability11,12. Similarly, we are still unable to predict which patients will deteriorate clinically and which will not. Adding to the uncertainty, the clinical course and prognosis of COVID-19 differ widely worldwide. The fatality rate reported in Southern Europe and the United States of America (USA), for instance, is much higher than in China or in Northern Europe4. It follows that findings obtained in a specific country cannot automatically be extended to other geographic regions, and that describing national cohorts might help explain this heterogeneity and better stratify patients.

Although Italy has been hit hard by the outbreak, especially in the North of the country, cohort studies describing the outcomes and general features of COVID-19 patients in our geographic area are still lacking. This study was designed to fill this gap. According to our data, in-hospital mortality in Northern Italy has been dauntingly high, close to 30%. In an outbreak, the infection fatality rate, i.e. the proportion of deaths among all infected individuals, is commonly difficult to ascertain; this is particularly true for SARS-CoV-2 infection, because the presence of asymptomatic infected subjects prevents accurate estimates for the general population. While some countries adopted a more stringent policy of intensive swab testing, others were less inclined to test scarcely symptomatic subjects. Moreover, the diagnostic accuracy of serological testing is still unclear. As the incidence of SARS-CoV-2 infection in the general population is unknown, we can only speculate that the infection fatality rate must be far lower than the CFR among hospitalized patients, since hospital admission is limited to patients with a severe clinical picture. However, the difference between our data and those observed in Chinese cohorts is striking: in the earliest reports, mortality among hospitalized patients was estimated at between 2.2 and 3.2%13,14. The ten-fold higher mortality observed in our cohort is probably related to a more severe clinical picture at hospital admission. Conceivably, the wider spread of the outbreak in Northern Italy meant that only patients with a more severe clinical picture were admitted to hospital. Moreover, the first reports from China were based on populations with a median age significantly lower than that of our cohort, which might partly explain this discrepancy.
In agreement with this hypothesis, in a recent retrospective Chinese study, in-hospital mortality (28%) was similar to that observed in our cohort, and older age, d-dimer levels, and a higher Sequential Organ Failure Assessment (SOFA) score on admission were associated with higher odds of in-hospital death15.

To the best of our knowledge, no cohort studies have investigated in-hospital mortality in Italy. Recently, a collaborative initiative described the outcomes and predictors of death in patients admitted to the ICU in Lombardy, reporting a high mortality rate: among the 1581 patients with ICU disposition data available, 920 were still in the ICU at data censoring, 256 had been discharged from the ICU, and 405 had died in the ICU16. However, there are no reports from non-ICU settings. The largest cohort reported to date in Western countries was recently published by Richardson et al. and includes 5700 COVID-19 inpatients in the New York City area17. Of these, 2634 had completed their hospital stay at data censoring, with a fatality rate of 21% (N = 553).

Mortality is therefore lower than in our cohort, but it should be kept in mind that the USA and Italy are facing different phases of the epidemic; the fatality rate might be higher during the peak of the outbreak, which was reached earlier in Italy, possibly explaining this discrepancy. In any case, mortality among hospitalized patients in Western countries is considerably higher than that initially reported in China.

It is also worth noting the distribution of deaths according to days from clinical onset and from hospital admission. Although most patients die in the first days after hospital admission, a non-negligible proportion die later; indeed, it is common experience among clinicians managing COVID-19 that, besides the subjects with severe disease at hospital admission, there is a group of patients who deteriorate clinically after some days of hospital stay.

We then aimed to evaluate which clinical predictors might identify patients at higher risk of mortality, first considering only potential clinical predictors. According to our data, age was confirmed as a strong independent predictor of mortality in both multivariable models that included clinical variables. The impact of age on the natural history of COVID-19 is well defined and has been confirmed in every geographic region13,14,15,17,18. The real impact of gender is less clear, even though more males than females die of COVID-19. According to data reported by the Italian National Health System, males account for around 60% of all COVID-19 deaths, similar to what we observed19. This suggests a protective effect of female gender; however, this difference does not seem to exist among inpatients, in whom age and underlying comorbidities are more relevant15. It is therefore reasonable to postulate that females are protected against the development of severe COVID-19, but that, once severe disease has developed, their risk of death is similar to that of males.

Among comorbidities, CAD, malignancies, chronic kidney disease, dementia, and hypertension were predictive of death at univariate analysis, but only malignancies and obesity were retained in the multivariable model. These therefore seem to be the comorbidities most predictive of death from COVID-19. The effect of CAD, chronic kidney disease, hypertension, and dementia may be masked by the impact of age, as these are all age-related diseases. Cancer has already been described as an independent predictor of death; in particular, in a recent case–control study from an Italian group, mortality among COVID-19 patients with cancer was significantly greater than in a control group20. Similarly, obesity has already been reported as an independent predictor of mortality in COVID-19, possibly because of the detrimental impact of adipose tissue-derived cytokines on the clinical course of the disease21,22. Its predictive role is all the more relevant considering the low prevalence of obesity among the elderly in our region (9.8% in Piedmont vs. 14.0% in Italy)23. The impact of smoking is more debated and less well defined: while some authors advocate a detrimental impact on patients’ prognosis24, others postulate a protective effect deriving from down-regulation of Angiotensin Converting Enzyme-2 (ACE-2) expression in the lungs25. According to our data, current smokers are at increased risk of mortality, although we acknowledge that this information, as well as details about smoking history, could not be retrieved for around one hundred patients, making it challenging to accurately evaluate the effect of this risk factor. Further studies on larger cohorts are therefore required to better elucidate this issue. However, our findings seem to be supported by a recent meta-analysis of 11,590 COVID-19 patients, according to which smokers have higher odds of COVID-19 progression than never smokers26.
Intriguingly, all the predictors of mortality that we identified are linked to a prothrombotic state; this might be particularly relevant in COVID-19, in which arterial and venous thrombosis seem to play a pivotal role in determining a worse prognosis27.

Besides demographics and medical history, laboratory and clinical data may help in risk stratification; although neutrophil count and creatinine predicted mortality at univariate analysis, the only laboratory predictor of death in the multivariable model was the P/F ratio. The severity of respiratory failure seems to be a driving element in defining prognosis; consistently, other relevant clinical predictors are dyspnea and a higher respiratory rate, whose association with greater in-hospital mortality has already been reported13,15.

Our work helps identify a clinical phenotype of patients at higher mortality risk in a large Italian cohort; in the near future, and particularly in the event of a further outbreak, this might help to identify patients requiring closer follow-up and monitoring to detect early clinical deterioration.

Our study has several limitations. First, clinical data were missing for part of the population, and laboratory data were available for only 256 patients. This is due to several concomitant causes, starting with the retrospective design of the study, which prevented us from collecting all the relevant data in a substantial proportion of subjects. Moreover, and even more importantly, this research was conducted during a national medical emergency that involved many clinicians in the management of a very high number of patients, making it very hard to focus on and dedicate time to research projects. The design of the study did not allow us to retrieve data to accurately stage the underlying diseases, potentially over- or underestimating the net effect of each comorbidity. Furthermore, as the criteria for hospitalization of COVID-19 patients differ across institutions, an inclusion bias cannot be excluded. Finally, as this is an observational study, residual confounding may exist.

Conclusions

In Italy, the COVID-19 outbreak caused a high in-hospital mortality, the main clinical predictors of which were age, current smoking, obesity, and a concomitant diagnosis of cancer. Among laboratory predictors, the P/F ratio, which mirrors the severity of respiratory failure, was the major factor associated with a poor prognosis.