Risk profiles for negative and positive COVID-19 hospitalized patients
Introduction
Due to the rapid worldwide spread of the SARS-CoV-2 virus, the Coronavirus Disease 2019 (COVID-19) pandemic has become a public health emergency of international concern. One of the pandemic's primary concerns is the high mortality risk associated with COVID-19, which ranges between 2% and 20% depending on the availability and quality of medical resources and on economic conditions [1,2]. Another is that many recovered patients experience long-term sequelae that affect their lives and may have economic consequences [3,4]. As a result, effective treatments are needed to improve or cure COVID-19 cases and to control the disease's effects.
Identifying and characterizing the various risk profiles of infected subjects is a critical task in managing COVID-19. The accurate characterization of a subject's risk profile is critical for the prompt selection of effective treatment for that particular patient. Additionally, it may facilitate effective medical resource allocation and provide critical information to identify and protect the most vulnerable populations [5]. Numerous studies have been conducted in this risk profiling field. COVID-19 studies identified the most critical disease severity risk factors, including advanced age, male gender, obesity, and smoking, as well as comorbidities such as hypertension, diabetes, hematologic, renal, cardiovascular, and respiratory diseases, all of which may have a significant impact on the prognosis of COVID-19 infected subjects [5,6,7,8,9,10,11,12]. Additionally, Gansevoort et al. discovered that subjects with chronic kidney disease have an extremely high risk of mortality from COVID-19 [13].
As previously stated, it is critical to identify risk factors and, more importantly, to have tools that can predict disease severity in at-risk populations; thus, various supervised approaches have been proposed to identify risk factors for COVID-19 progression. The most frequently used methods for modeling risk factors for disease severity prediction in patients with COVID-19 have been univariate and multivariate ordinal logistic regression models [14]. In comparison, Ji et al. used multivariate Cox regression to investigate the risk factors for COVID-19 progressing to a critical or fatal state [15]. These efforts have been conducted in various settings or with limited clinical data [16,17,18].
Additionally, supervised approaches are limited because many possible risk factors can be associated with the severity of the outcome. Each risk factor, and each combination of risk factors, generates a distinct possible COVID-19 risk profile; thus, an extensive data set is required to train complex statistical models accurately. A novel approach is proposed to address the risk factor combination issue, based on unsupervised data clustering for identifying robust patterns in subjects' risk presentations that are easily associated with disease severity and outcome [19]. This study aims to streamline data analysis for treatment decisions by using clustering to identify patients' risk profiles.
There are numerous algorithms for data clustering [20,21,22,23]. Certain algorithms can be thought of as statistical clustering strategies [24,25,26]. They are robust approaches that result in models that adequately describe data, with each model containing explicit factors that aid in data comprehension [27,28]. Additionally, novel algorithmic advances facilitate the discovery of robust data clusters in multidimensional data sets. Consensus clustering is one such technique [29]. Consensus clustering runs the clustering method of choice multiple times to discover the most reliable partitions of multidimensional data sets. Partitioning Around Medoids (PAM), in turn, is a robust statistical clustering algorithm that aims to find K medoids that minimize the sum of the observations' dissimilarities to their nearest medoid [30]. The proposed method combines consensus clustering with the PAM clustering algorithm to determine the risk profiles of patients.
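The two ideas described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy binary risk-factor vectors, the Hamming dissimilarity, the simplified greedy swap search, the 80% subsampling fraction, and the number of runs are all assumptions chosen for illustration. The function `total_cost` is the PAM objective (sum of dissimilarities to the nearest medoid), and `consensus_matrix` records how often each pair of subjects lands in the same cluster across subsampled runs.

```python
import random
from itertools import combinations

def hamming(a, b):
    """Number of positions where two binary risk-factor vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def total_cost(data, medoids, dist):
    """PAM objective: sum of each row's dissimilarity to its nearest medoid."""
    return sum(min(dist(row, data[m]) for m in medoids) for row in data)

def pam(data, k, dist=hamming, seed=0):
    """Simplified swap-based k-medoids; returns (medoid indices, labels)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(data)), k)
    best = total_cost(data, medoids, dist)
    improved = True
    while improved:  # stop when no single swap lowers the objective
        improved = False
        for pos in range(k):
            for cand in range(len(data)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(data, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist(row, data[medoids[j]]))
              for row in data]
    return medoids, labels

def consensus_matrix(data, k, runs=20, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        _, labels = pam([data[i] for i in idx], k, seed=rng.randrange(10**6))
        for (a, i), (b, j) in combinations(enumerate(idx), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[a] == labels[b]:
                together[i][j] += 1
                together[j][i] += 1
    return [[1.0 if i == j
             else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
             for j in range(n)] for i in range(n)]

# Hypothetical subjects; the six binary columns stand in for risk factors.
rows = [
    (1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0), (0, 1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 0), (0, 0, 0, 1, 0, 1), (0, 0, 0, 0, 1, 1),
]
medoids, labels = pam(rows, k=2)
cons = consensus_matrix(rows, k=2)
print(sorted(medoids), labels)
```

In this toy example, pairs of subjects whose consensus value is close to 1 are grouped together consistently across runs, which is the sense in which a partition is considered reliable. A real analysis would use a dissimilarity suited to mixed clinical data and would compare several values of K.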
The purpose of this study is to determine whether the unsupervised discovery of risk profiles for COVID-19 and non-COVID-19 patients seeking medical attention can aid in identifying a subset of hospitalized patients at increased risk of: 1) developing pneumonia; 2) requiring admission to an intensive care unit (ICU); or 3) dying as a result of the infection. To accomplish this, we used the Open Mexican Repository, which collects COVID-19 test results, outcomes (pneumonia diagnosis, hospitalization, and death), and known risk factors such as age, gender, pregnancy, smoking, obesity, and common comorbidities such as hypertension and diabetes.
Data preparation
The preliminary data for this study were obtained on May 9, 2020, from the COVID-19 Mexican Open Repository, maintained by the Mexican government's General Directorate of Epidemiology [31]. On June 8, the dataset was updated so that the recorded patient outcomes were as complete as possible. The dataset contained 128,148 subjects and included the following variables: patient ID, age, sex, exposure history, obesity, smoking, pregnancy, patient type (ambulatory/hospitalized), and other underlying comorbidities
Results
As the discovery set, a cohort of 33,325 patients with positive and negative COVID-19 tests who were hospitalized from March to May was analyzed. Additionally, hospitalized subjects with positive and negative COVID-19 tests in November (N = 50,157) were investigated to add a new patient group to the test set. The characteristics of positive and negative COVID-19 hospitalized patients in the discovery and test sets are summarized in Table 1 and Table 2. Differences in their statistical significance
Discussion
This study discovered, described, and classified the risk profiles of hospitalized COVID-19 positive and negative subjects. Initially, a detailed combinatory analysis of 6144 different risk profiles of hospitalized Mexican patients stratified by age was conducted. The detailed analysis identified the risk factors associated with the top ten profiles by gender, COVID-19 test result, and age. According to the analysis of positive patients, hypertension, diabetes, and obesity were prevalent among
Conclusion
This study demonstrated the use of consensus clustering in conjunction with PAM models to identify the most consistent risk profiles among COVID-19-infected and non-infected patients. Additionally, CART analysis was used to describe the relationship between newly discovered risk factors and each risk profile. The findings demonstrated that the proposed method could identify a small set of the most prevalent risk profiles for both data sets, and it may be a valuable tool for filtering out the most
Acknowledgments
This research was supported by funding from the Mexican National Council for Science and Technology (CONACYT). The authors thank Dr. Víctor Treviño, Dr. Emmanuel Martínez, and Dr. Santiago Conant-Pablos for their valuable comments and suggestions, which helped improve the quality of the article.
References (52)
- et al. COVID-19 cardiac injury: implications for long-term surveillance and outcomes in survivors. Heart Rhythm (2020)
- Clinical characteristics of different risk-profiles and risk factors for the severity of illness in patients with COVID-19 in Zhejiang, China. Infect. Dis. Poverty (2020)
- Risk factors for predicting mortality in elderly patients with COVID-19: a review of clinical data in China. Mech. Ageing Dev. (2020)
- Excess out-of-hospital mortality and declining oxygen saturation: the sentinel role of emergency medical services data in the COVID-19 crisis in Tijuana, Mexico. Ann. Emerg. Med. (2020)
- Classification and regression tree (CART) analysis of endometrial carcinoma: seeing the forest for the trees. Gynecol. Oncol. (2013)
- et al. Estimating the number of clusters in a dataset via consensus clustering. Expert Syst. Appl. (2019)
- Clustering ensemble based on sample's stability. Artif. Intell. (2019)
- Prevalence of obesity among adult inpatients with COVID-19 in France. Lancet Diabetes Endocrinol. (2020)
- et al. Early epidemiological analysis of the 2019-nCoV outbreak based on a crowdsourced data (2020)
- Epidemiological and clinical features of the 2019 novel coronavirus outbreak in China (2020)
- Long-term pulmonary consequences of coronavirus disease 2019 (COVID-19): what we know and what to expect. J. Thorac. Imag.
- Identification of risk factors for mortality associated with COVID-19. PeerJ
- Comorbidity and its impact on 1590 patients with covid-19 in China: a nationwide analysis. Eur. Respir. J.
- Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. JAMA Intern. Med.
- Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. BMJ
- Risk factors for intensive care unit admission and in-hospital mortality among hospitalized adults identified through the US coronavirus disease 2019 (COVID-19)-associated hospitalization surveillance network (COVID-NET). Clin. Infect. Dis.
- Risk factors associated with COVID-19 infection: a retrospective cohort study based on contacts tracing. Emerg. Microb. Infect.
- Risk factors of critical & mortal COVID-19 cases: a systematic literature review and meta-analysis. J. Infect.
- Characterizing COVID-19 clinical phenotypes and associated comorbidities and complication profiles. PLoS One
- CKD is a key risk factor for COVID-19 mortality. Nat. Rev. Nephrol.
- Prediction for progression risk in patients with COVID-19 pneumonia: the CALL Score. Clin. Infect. Dis.
- Clinical characteristics and risk factors for mortality of COVID-19 patients with diabetes in Wuhan, China: a two-center, retrospective study. Diabetes Care
- Risk factors associated with clinical outcomes in 323 coronavirus disease 2019 (COVID-19) hospitalized patients in Wuhan, China. Clin. Infect. Dis.
- Robust Discovery of Mild Cognitive impairment subtypes and their Risk of Alzheimer's Disease conversion using unsupervised machine learning and Gaussian Mixture Modeling
- Clustering algorithms: a comparative approach. PLoS One
- Systematic review of clustering high-dimensional and large datasets. ACM Trans. Knowl. Discov. Data