Risk profiles for negative and positive COVID-19 hospitalized patients

https://doi.org/10.1016/j.compbiomed.2021.104753

Highlights

  • PAM clustering with consensus mapping for the unsupervised risk-profile discovery of COVID-19 subjects.

  • The unsupervised clustering models predict the clusters of COVID-19 patients with different combinations of risk factors.

  • The supervised decision trees were not able to find consistent decision rules from the discovered COVID-19 risk-profiles.

  • Gender, hypertension, diabetes, and obesity are potentially the main high-risk factors for COVID-19 mortality.

Abstract

COVID-19 is a viral infection that affects people differently: most cases develop mild symptoms, some require hospitalization, and a small number of patients die. Identifying risk factors is therefore critical for physicians making treatment decisions. The purpose of this article is to determine whether unsupervised analysis of risk factors in positive and negative COVID-19 subjects can aid in the identification of a set of reliable and clinically relevant risk profiles. Hospitalized patients with positive and negative test results were randomly selected from the Mexican Open Registry between March and May 2020. Thirteen risk factors, three distinct outcomes, and COVID-19 test results were used to categorize registry patients. As a result, the dataset was described by 6144 different risk profiles for each age group. An unsupervised learning method is proposed in this study to discover the most prevalent risk profiles. The data were partitioned into discovery (70%) and validation (30%) sets. The discovery set was analyzed using the partitioning around medoids (PAM) method, and the stable set of risk profiles was estimated using robust consensus clustering. The PAM models' reliability was validated by predicting the risk profile of subjects from the validation set and of patients admitted in November 2020. In the validation set, the clinical relevance of the risk profiles was evaluated by determining the prevalence of three patient outcomes: pneumonia diagnosis, ICU admission, or death. Six positive and five negative COVID-19 risk profiles were identified, with statistically significant differences between them. These results suggest that PAM clustering with consensus mapping is a viable method for the unsupervised discovery of risk profiles in subjects with severe respiratory health problems.

Introduction

Due to the rapid spread of the SARS-CoV-2 virus worldwide, the Coronavirus Disease 2019 (COVID-19) pandemic outbreak has become a public health emergency of international concern. The high mortality risk associated with COVID-19, which ranges between 2% and 20% depending on the availability and quality of medical resources and economic conditions [1,2], is one of the pandemic's primary concerns. Another issue is that many recovered patients experience long-term sequelae that impact their lives and may have economic consequences [3,4]. As a result, effective treatments are needed to improve or cure COVID-19 cases and control the disease's effects.

Identifying and characterizing the various risk profiles of infected subjects is a critical task in managing COVID-19. An accurate characterization of a subject's risk profile supports the prompt selection of effective treatment for that particular patient. Additionally, it may facilitate effective medical resource allocation and provide key information to identify and protect the most vulnerable populations [5]. Numerous studies have been conducted in this risk profiling field. COVID-19 studies identified the most important disease severity risk factors, including advanced age, male gender, obesity, and smoking, as well as comorbidities such as hypertension, diabetes, and hematologic, renal, cardiovascular, and respiratory diseases, all of which may have a significant impact on the prognosis of COVID-19-infected subjects [5–12]. Additionally, Gansevoort et al. found that subjects with chronic kidney disease have an extremely high risk of mortality from COVID-19 [13].

As previously stated, it is critical to identify risk factors and, more importantly, to have tools that can predict disease severity in at-risk populations; thus, various supervised approaches have been proposed to identify risk factors for COVID-19 progression. The most frequently used methods for modeling risk factors for disease severity prediction in patients with COVID-19 have been univariate and multivariate ordinal logistic regression models [14]. In comparison, Ji et al. used multivariate Cox regression to investigate the risk factors for COVID-19 progressing to a critical or fatal state [15]. These efforts have been conducted in various settings or with limited clinical data [16–18].

Additionally, supervised approaches are limited because many possible risk factors can be associated with the severity of the outcome. Each risk factor and its combinations generate many possible COVID-19 risk profiles; thus, an extensive data set is required to train complex statistical models accurately. A novel approach is proposed to address the risk factor combination issue, based on unsupervised data clustering for identifying robust patterns in subjects' risk presentations that are easily associated with disease severity and outcome [19]. This study aims to show that using clustering to identify patients' risk profiles can streamline data analysis for treatment decisions.

There are numerous algorithms for data clustering [20–23]. Certain algorithms can be thought of as statistical clustering strategies [24–26]. They are robust approaches that result in models that adequately describe data, with each model containing explicit factors that aid in data comprehension [27,28]. Additionally, novel algorithmic advances facilitate the discovery of robust data clusters in multidimensional data sets. Consensus clustering is one such technique [29]. Consensus clustering runs the clustering method of choice multiple times to discover the most reliable partitions of multidimensional data sets. Partitioning Around Medoids (PAM), in turn, is a robust statistical clustering algorithm that aims to find K medoids that minimize the sum of the observations' dissimilarities to their nearest medoid [30]. The proposed method combines consensus clustering with the PAM clustering algorithm to determine the risk profiles of patients.
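As a rough illustration of how PAM and consensus clustering can fit together, the following Python sketch runs a naive PAM on repeated subsamples of binary risk-factor profiles and accumulates a co-assignment (consensus) matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the Hamming dissimilarity, the random initialization (instead of PAM's usual BUILD step), the 80% subsampling fraction, the number of runs, and the toy data are all choices made here for illustration.

```python
import numpy as np

def hamming_matrix(X):
    """Pairwise Hamming dissimilarity: fraction of binary features that differ."""
    X = np.asarray(X)
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)

def pam(D, k, rng, max_iter=100):
    """Naive PAM on a dissimilarity matrix D.

    Assumption: medoids start at random; each iteration applies the single
    medoid/non-medoid swap that lowers the total cost the most.
    """
    n = D.shape[0]
    medoids = list(rng.choice(n, size=k, replace=False))
    cost = D[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        best = (cost, None, None)
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                c = D[:, trial].min(axis=1).sum()
                if c < best[0]:
                    best = (c, i, h)
        if best[1] is None:  # no improving swap left
            break
        cost = best[0]
        medoids[best[1]] = best[2]
    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, cost

def consensus_matrix(D, k, n_runs=30, frac=0.8, seed=0):
    """Fraction of runs in which two subjects share a cluster,
    counted only over runs where both were subsampled."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        _, labels, _ = pam(D[np.ix_(idx, idx)], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1.0
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)

# Toy data (made up, not registry data): two binary patterns, 10 subjects each.
X = np.array([[1, 1, 1, 0, 0, 0]] * 10 + [[0, 0, 0, 1, 1, 1]] * 10)
D = hamming_matrix(X)
C = consensus_matrix(D, k=2)

# One possible way to get a final partition: PAM on the dissimilarity 1 - C.
final_medoids, final_labels, _ = pam(1.0 - C, k=2, rng=np.random.default_rng(1))
print(final_labels)
```

Entries of `C` close to 1 mark pairs of subjects that are grouped together in nearly every run, which is the sense in which a partition is considered stable. The naive swap search is quadratic in the number of subjects per iteration, so a cohort of registry size would need a faster PAM variant or an existing library implementation.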

The purpose of this study is to determine whether the unsupervised discovery of risk profiles for COVID-19 and non-COVID-19 patients seeking medical attention can aid in identifying a subset of hospitalized patients at increased risk of one of three outcomes: 1) developing pneumonia; 2) requiring admission to an intensive care unit (ICU); or 3) dying as a result of the infection. To accomplish this, we used the Open Mexican Repository, which collects COVID-19 test results, outcomes (pneumonia diagnosis, hospitalization, and death), and known risk factors such as age, gender, pregnancy, smoking, obesity, and common comorbidities such as hypertension and diabetes.
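The validation idea stated in the abstract, predicting the risk profile of unseen subjects and then measuring how often each outcome occurs within each profile, can be sketched as follows. This is a hedged illustration only: the nearest-medoid assignment by Hamming distance, the function names, the column meanings, and all numbers are assumptions made for the example and are not taken from the registry or from the authors' code.

```python
import numpy as np

def assign_to_medoids(X_new, medoid_profiles):
    """Label each new subject with the index of its nearest medoid
    (Hamming distance over binary risk factors; an assumed choice)."""
    X_new = np.asarray(X_new)
    M = np.asarray(medoid_profiles)
    dist = (X_new[:, None, :] != M[None, :, :]).mean(axis=2)
    return dist.argmin(axis=1)

def outcome_prevalence(labels, outcomes, n_profiles):
    """Rows: risk profiles. Columns: outcomes.
    Each cell is the fraction of subjects in that profile with that outcome
    (NaN when a profile received no subjects)."""
    outcomes = np.asarray(outcomes, dtype=float)
    rows = []
    for c in range(n_profiles):
        mask = labels == c
        if mask.any():
            rows.append(outcomes[mask].mean(axis=0))
        else:
            rows.append(np.full(outcomes.shape[1], np.nan))
    return np.vstack(rows)

# Hypothetical medoids over three binary factors
# (columns assumed to be hypertension, diabetes, obesity).
medoid_profiles = [[1, 1, 0],
                   [0, 0, 0]]

# Made-up validation subjects and their outcomes
# (outcome columns assumed to be pneumonia, ICU admission, death).
X_val = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
y_val = [[1, 1, 0], [1, 0, 1], [0, 0, 0], [1, 0, 0]]

labels_val = assign_to_medoids(X_val, medoid_profiles)
prevalence = outcome_prevalence(labels_val, y_val, n_profiles=2)
print(labels_val)
print(prevalence)
```

Comparing the rows of `prevalence` shows whether a profile carries a visibly higher rate of pneumonia, ICU admission, or death. Deciding whether such differences are statistically significant would need a formal test, which this sketch does not attempt.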

Section snippets

Data preparation

The preliminary data for this study were obtained on May 9, 2020, from the COVID-19 Mexican Open Repository, maintained by the Mexican government's General Directorate of Epidemiology [31]. On June 8, the dataset was updated so that the recorded patient outcomes were as complete as possible. The dataset contained 128,148 subjects and included the following variables: patient ID, age, sex, exposure history, obesity, smoking, pregnancy, patient type (ambulatory/hospitalized), and other underlying comorbidities

Results

A cohort of 33,325 patients with positive and negative COVID-19 tests who were hospitalized from March to May was analyzed as the discovery set. Additionally, hospitalized subjects with positive and negative COVID-19 tests in November (N = 50,157) were investigated to add a new patient group to the test set. The characteristics of positive and negative COVID-19 hospitalized patients in the discovery and test sets are summarized in Table 1 and Table 2. Differences in their statistical significance

Discussion

This study discovered, described, and classified the risk profiles of hospitalized COVID-19 positive and negative subjects. Initially, a detailed combinatorial analysis of 6144 different risk profiles of hospitalized Mexican patients stratified by age was conducted. The detailed analysis identified the risk factors associated with the top ten profiles by gender, COVID-19 test result, and age. According to the analysis of positive patients, hypertension, diabetes, and obesity were prevalent among

Conclusion

This study demonstrated the use of consensus clustering in conjunction with PAM models to identify the most consistent risk profiles among COVID-19-infected and non-infected patients. Additionally, CART analysis was used to describe the relationship between newly discovered risk factors and each risk profile. The findings demonstrated that the proposed method could identify a small set of the most prevalent risk profiles for both data sets, and it may be a valuable tool for filtering out the most

Acknowledgments

This research was supported with funding from the Mexican National Council for Science and Technology (CONACYT). The authors are thankful to Dr. Víctor Treviño, Dr. Emmanuel Martínez and Dr. Santiago Conant-Pablos for their valuable comments and suggestions, which helped improve the quality of the article.

References (52)

  • Sana Salehi et al. Long-term pulmonary consequences of coronavirus disease 2019 (COVID-19): what we know and what to expect. J. Thorac. Imag. (2020)

  • Yuetian Yu. Identification of risk factors for mortality associated with COVID-19. PeerJ (2020)

  • Wei-jie Guan. Comorbidity and its impact on 1590 patients with covid-19 in China: a nationwide analysis. Eur. Respir. J. (2020)

  • Giacomo Grasselli. Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. JAMA Intern. Med. (2020)

  • Annemarie B. Docherty. Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. BMJ (2020)

  • Lindsay Kim. Risk factors for intensive care unit admission and in-hospital mortality among hospitalized adults identified through the US coronavirus disease 2019 (COVID-19)-associated hospitalization surveillance network (COVID-NET). Clin. Infect. Dis. (2020)

  • Tao Liu. Risk factors associated with COVID-19 infection: a retrospective cohort study based on contacts tracing. Emerg. Microb. Infect. (2020)

  • Zhaohai Zheng. Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis. J. Infect. (2020)

  • Elizabeth R. Lusczek. Characterizing COVID-19 clinical phenotypes and associated comorbidities and complication profiles. PloS One (2021)

  • Ron T. Gansevoort et al. CKD is a key risk factor for COVID-19 mortality. Nat. Rev. Nephrol. (2020)

  • Dong Ji. Prediction for progression risk in patients with COVID-19 pneumonia: the CALL Score. Clin. Infect. Dis. (2020)

  • Qiao Shi. Clinical characteristics and risk factors for mortality of COVID-19 patients with diabetes in Wuhan, China: a two-center, retrospective study. Diabetes Care (2020)

  • Ling Hu. Risk factors associated with clinical outcomes in 323 coronavirus disease 2019 (COVID-19) hospitalized patients in Wuhan, China. Clin. Infect. Dis. (2020)

  • Fahimeh Nezhadmoghadam. Robust Discovery of Mild Cognitive impairment subtypes and their Risk of Alzheimer's Disease conversion using unsupervised machine learning and Gaussian Mixture Modeling. (2020)

  • Mayra Z. Rodriguez. Clustering algorithms: a comparative approach. PloS One (2019)

  • Divya Pandove et al. Systematic review of clustering high-dimensional and large datasets. ACM Trans. Knowl. Discov. Data (2018)