Abstract
Multidimensional host and viral factors determine the clinical course of COVID-19. While the virology of the disease is well studied, investigating host-related factors, including genome, transcriptome, metabolome, and exposome, can provide valuable insights into the underlying pathophysiology. We conducted integrative omics analyses to explore their intricate interplay in COVID-19. We used data from the UK Biobank (UKB), and employed single-omics, pairwise-omics, and multi-omics models to illustrate the effects of different omics layers. The dataset included COVID-19 phenotypic data as well as genome, imputed-transcriptome, metabolome and exposome data. We examined the main, interaction effects and correlations between omics layers underlying COVID-19. Single-omics analyses showed that the transcriptome (derived from the coronary artery tissue) and exposome captured 3–4% of the variation in COVID-19 susceptibility, while the genome and metabolome contributed 2–2.5% of the phenotypic variation. In the omics-exposome model, where individual omics layers were simultaneously fitted with exposome data, the contributions of genome and metabolome were diminished and considered negligible, whereas the effects of the transcriptome showed minimal change. Through mediation analysis, the findings revealed that exposomic factors mediated about 60% of the genome and metabolome’s effects, while having a relatively minor impact on the transcriptome, mediating only 7% of its effects. In conclusion, our integrative-omics analyses shed light on the contribution of omics layers to the variance of COVID-19.
Similar content being viewed by others
Background
Coronavirus disease 2019 (COVID-19) is a viral-induced inflammatory disease that causes acute respiratory distress syndrome (ARDS) in some patients1,2. Infectious diseases including COVID-19 are viewed differently from other complex diseases as they are largely determined by exposure to a microbial pathogen, with some contribution from host-related factors3. While prior knowledge suggests that multidimensional host, viral, and environmental factors determine the clinical course of COVID-194, advances in research on the disease have been largely driven by understanding of virus biology, infection mechanisms, and virulence5,6, with less emphasis on host-associated and external features7. It is important to note that not all people exposed to the virus will develop the same disease8,9, and given disease heterogeneity, understanding host-associated factors is a better way to look for possible interventions.
Genetic and environmental factors greatly contribute to phenotypic variation among individuals10,11. While genetic studies have demonstrated associations between genetic variations and complex traits, a comprehensive mechanistic understanding of these relationships remains uncertain12,13. This underscores the importance of delving into how multi-omics layers interact to unravel the details of disease etiology and progression. To capture the proportion of unexplained phenotypic variation, comprehensive analysis of the genome in conjunction with other multi-omics data (transcriptome, proteome, metabolome, exposome, etc.) is required to define phenotypic variation between populations and/or individuals, to maximize etiological understanding of disease12,14. Several omics approaches, such as genomics, transcriptomics, proteomics, metabolomics, and exposomes, each reveal the association of corresponding omics layers with complex traits13,15,16.
The use of multi-omics approaches in the analysis of complex traits has been relatively limited, with only a few studies exploring its potential12. Zhou and Lee17 employed an integrative analysis of genomic and exposomic data that enabled to capture a greater proportion of phenotypic variance for anthropometric traits such as body mass index (BMI) and height, resulting in improved accuracy in phenotypic prediction, compared to the genomic data alone. The authors also highlighted that both additive and non-additive effects of multi-omics contribute to shaping the phenotypic variance. Similarly, another integrative analysis of genome and exposome18 focused on mental health was applied, showing that in addition to the main effects, genome-exposome interactions played a significant role in mental health traits, including internalizing and externalizing symptoms. Moreover, researchers have now expanded the integrative multi-omics approach by incorporating additional layers of omics data such as transcriptomics, proteomics, metabolomics, lipidomics and methylomics. These comprehensive analyses have been used to dissect the influence of both genetic and environmental factors on various diseases like Type 2 diabetes, osteoarthritis, cardiovascular diseases, Alzheimer’s disease and systemic lupus erythematosus19,20,21,22.
Attempts have been made to explore the host genetic basis of COVID-19 through genome-wide association studies23,24, but the genome still captures the few portions of the disease phenotypic variance and little attention has been paid to other biological facets. Virus-host relationships are complex and difficult to fully characterize, so integrated multiple genome-wide omics approaches may be useful. Few studies have attempted to examine the potential role of molecular signatures (e.g., genome and transcriptome) and clinical covariates in association with the COVID-19 phenotype, but the results are either based on individual omics analyses or on the integration of a few omics layers25,26. The integrative analysis of multi-omics data may also enhance the understanding of the molecular dynamics underlying the pathophysiology of COVID-19, and may lead to novel strategies for early detection, prevention, and treatment of the disease.
In this study, we conducted a comprehensive multistage integrative multi-omics analysis covering genome, imputed gene-expression levels (transcriptome), metabolome and exposome (Supplementary note 1), to explore the interplay among these factors and their effects on COVID-19. We first estimated both additive and non-additive variance components for each omics layer, i.e. quantifying individual omics contribution to COVID-19 phenotypic variation. We also explored the correlations and interactions between individual omics effects on COVID-19. For the purpose of quantifying interaction and correlation effects, we applied a novel linear mixed model known as CORE-REML27, which can handle multiple variance-covariance structures and explicitly estimates the covariance between random effects. By comprehensively examining these multiple omics layers, we aimed to gain insights into the multi-omics data underlying COVID-19, uncover novel biomarkers, and understand the interrelationships between genetic, transcriptomic, metabolic, and environmental factors.
Methods
Ethics declarations
We used data from the UK Biobank (UKB) for our analyses. The UKB has approval from the Northwest Multi-centre Research Ethics Committee (MREC), National Information Governance Board for Health & Social Care (NIGB), and Community Health Index Advisory Group (CHIAG) (http://www.ukbiobank.ac.uk/ethics/). UKB has obtained informed consent from all participants. These ethical regulations cover the work in this study and our accession to the UKB data was under the reference number 14,575. All methods were performed in accordance with the relevant guidelines and regulations.
Phenotypic data and case definition
The present study used both phenotypic and genotypic data from UKB participants who underwent COVID-19 testing. Based on their test results, we classified participants into three categories: those who tested negative, those with infection presenting with moderate symptoms, and those with infection presenting with severe symptoms28,29. Severe cases were defined as individuals who received a positive clinical diagnosis (acute respiratory distress syndrome, sepsis, septic shock, and etc.) of COVID-19 and required hospital admission, intensive care unit admission with respiratory support (either non-invasive or invasive ventilation) or died from COVID-19 as the primary cause of death. In contrast, moderate cases were defined as individuals who received a clinical diagnosis of COVID-19 and were either inpatients or outpatients not requiring respiratory support, clinically diagnosed individuals in the general population, or self-reported quarantined individuals. Taking into account the clinical course of the infection, we defined the COVID-19 phenotype according to the ordinal disease classification proposed by the World Health Organization (WHO)28. Therefore, controls were COVID-19-negative individuals (coded 0), and cases (who tested positive for COVID-19) were further divided into moderate cases (coded 1) and severe cases (coded 2) based on clinical severity scores.
UK Biobank and population
The UKB is a population-based cohort of over 500,000 people (aged 40 to 69 years at recruitment) recruited from England, Scotland and Wales between 2006 and 2010. UKB provides comprehensive baseline measurements/questionnaires, biomarkers, demographic characteristics (e.g., age, gender, socioeconomic status, etc.) and longitudinal clinical phenotypes (cancer, death, hospitalization, etc.). UKB has also recently made available COVID-19 research data (including test results, death registrations, hospital admissions, primary care, and other data). As of March 3, 2022, a total of 434,119 COVID-19 tests have been conducted, of which nearly 9.5% (40,949) were positive results. A total of 144,278 people were tested for COVID-19, of whom 28,003 tested positive during the period. Specifically, this study analysed the largest ancestry group, the British white population (408,183 participants), to minimize genetic heterogeneity (Supplementary note 2).
Study design
This study aimed to investigate the extent to which genetic and non-genetic factors contribute to COVID-19 phenotypic variation. Specifically, we analysed the genome, transcriptome, metabolome, and exposome to understand their respective impacts. The details on the phenotypic data and each omics signature are provided below.
As shown in Fig. 1, our analysis began by estimating the single nucleotide polymorphism-based heritability (SNP-h2) of COVID-19 based on host genome information. Next, we explored the contributions of imputed transcriptome (tissue gene expression levels), biomarkers (including lipids and amino acids, and etc.), and exposomic characteristics (socio-demographic data, physical measures, behavioural factors, population structure to account for genetic similarity, diet intake, and medical conditions) using a linear mixed model. Additionally, we investigated the interaction effects of the exposome with the genome, transcriptome, metabolome, and exposome on COVID-19 (Fig. 1). This design allowed us to identify potential gene-environment interactions that could impact COVID-19 susceptibility and severity. Overall, this multi-omics approach allowed us to explore the complex interplay between genetic and non-genetic factors in COVID-19 phenotypic variation and identify potential targets for further research and intervention.
A diagram of study design for partitioning of COVID-19 phenotypic variance using multi-omics data. The phenotypic variance partitioning analyses were based on multi-omics layers, including genome (G), transcriptome (T), metabolome (M) and exposome (E). Phenotypic variance also captured by omics-exposome interactions, such as genome-exposome (GxE), transcriptome-exposome (TxE), and metabolome-exposome (MxE). Interactions between omics layers, genome-transcriptome (GxT), and transcriptome-metabolome (TxM) were also explored to examine variance components. Furthermore, it is worth noting that due to complex interplays between omics layers, part of the phenotypic variation can be explained by covariance structures, thus revealing omics correlations. Correlation structures were sought between genome & exposome (rG,E), genome & transcriptome (rG,T), transcriptome & exposome (rT,E), transcriptome & metabolome (rT,M), and metabolome & exposome (rM,E). Apart from metabolomics data (only 23520 samples were accessed), other omics data were obtained from 107857 UKB participants. Genome and metabolome interplay was not carried out because the effect of the genome was negligible, based on the effect of metabolome-matched cases on phenotypic variation.
Genotypic data and quality control
This study analysed individuals who were of White British ancestry (119,132 individuals) to minimise the effects of population stratification. Quality controls (QC) procedures have been carried out at the SNP and individual levels, and a total of 7,701,772 SNPs and 107,857 individuals remained for further analyses. For SNP-level QC, variants with info score < 60%, multi-character allele codes, minor allele frequency (MAF) < 1%, the Hardy-Weinberg equilibrium (HWE) P < 1e-7, SNP call rate < 95% and duplicate ID variants were excluded in down-stream analyses. For individual-level QC, individuals with a missing rate of genotype > 5%, gender mismatch, poor genotype quality or a sex chromosome aneuploidy, and non-white British subjects were excluded from the main analyses. Furthermore, we assessed the genetic relatedness between a pair of individuals after constructing a genetic relationship matrix (GRM) and applied relatedness cut off QC (> 5%), using genome-wide complex trait analysis (GCTA)30. Thus, out of a total of 144,278 COVID-19 test subjects, 119,132 were retained according to the quality control steps described above, and 107,857 (57362 females, 50495 males) were finally considered for downstream analysis after removal of pairwise relationship > 5% (Fig. 2). To estimate SNP-h2, we extracted 1,118,829 SNPs from the HapMap3 database, which is known for its high-quality SNPs17,31, aiming to improve computational performance and reliability.
Transcriptomic data and imputation of gene expression levels
To better understand the genetic factors underlying COVID-19, we utilized imputation to estimate genetically predicted gene expression levels using individual genotype data27,32,33,34,35. This process was carried out using MetaXcan, which is an updated version of PrediXcan. Specifically, we used tissue-specific elastic net models that incorporated around 200,000 cis-expression quantitative trait loci (cis-eQTLs) based on GTEx v8, the reference transcriptome dataset.
We generated transcriptome data for 18 tissues (e.g., coronary artery, whole blood, spleen and etc.) that were selected based on their potential roles in the pathogenesis of the disease36,37,38. We estimated gene expression levels for almost all genes across 18 tissues and used approximately 90% of eQTLs (SNPs) in COVID-19 genotype data to predict transcriptome information (Supplementary note 3: Table S2-S3).
Metabolomic data
Our study utilized metabolomic biomarkers consisting of 249 metabolites from 118,461 individuals from UKB39. These metabolites were measured in preserved blood samples collected from the UKB cohort between 2006 and 2010. This dataset provided information on the plasma concentration levels of circulating lipids, lipoprotein subclasses, fatty acid composition, and various other low-molecular metabolites. However, we only had access to metabolome data for 23,520 COVID-19 cases.
To measure these biomarkers, the UK Biobank used a high-throughput NMR-based metabolic biomarker profiling platform to analyse randomly selected EDTA plasma samples (aliquot 3). This allowed them to measure the metabolomic biomarkers efficiently and accurately in our study participants.
Exposomic data
Our analysis incorporates a comprehensive set of exposomic characteristics, including socio-demographic and population structure, dietary intake, and medical conditions.
These features were selected based on prior research that underscores their significant role in influencing susceptibility to COVID-19, as evidenced by the existing literature (Table 1). We collected and analysed exposomic data from 107,857 participants in the UK Biobank, along with their associated COVID-19 data. We evaluated the entirety of all exposomic features by constructing a single cohesive framework or variance-covariance matrix for assessing their global effect on COVID-19 susceptibility. Additionally, we assessed the individual effect of each exposomic variable on COVID-19 (Table S4).
Modelling and data analysis
We first estimated gene expression levels in 18 tissues using MetaXcan at the individual SNP level. The estimated tissue-specific transcriptome association signal with COVID-19 was then assessed using multiple linear regression. Five tissue models (e.g., coronary artery, whole blood, spleen, musculoskeletal, and adipose viscera) were selected based on the significance level and used for downstream comprehensive analysis. Then, generalized linear models were used to adjust the COVID-19 phenotype for fixed effects (genotype measurement batch and UKB assessment centre) before applying the linear mixed model framework. We also constructed GRM with the 1,118,829 SNPs based on the HapMap3 SNPs, transcriptomic relation matrix (TRM) with tissue-specific gene expression levels (1634–8490 genes), metabolomic relationship matrix (MRM) with 249 metabolites, and exposomic relationship matrix (ERM) with 43 exposomic features. We also generated matrices based on interaction or correlation between each pair of omics layers. We estimated the variance components due to additive and interaction effects of omics using advanced linear mixed model based on genome-based restricted maximum likelihood (GREML). Additionally, we applied CORE-GREML to estimate correlations between random omics effects. COVID-19 phenotypic variances explained by genome, transcriptome, metabolome, exposome, interaction, and covariance between pairs of omics layers are described below (Table 2). Given the sample size of the omics datasets, the variance component analysis offered adequate statistical power to capture the proportion of phenotypic variance (https://shiny.cnsgenomics.com/gctaPower/)52 (Figure S2). Data analyses were conducted using MTG2 (linear mixed models including CORE-GREML)53, PLINK version 254, MetaXcan (transcriptome imputation)55, and R-programming (data manipulation and visualization).
Additionally, mediation analysis was conducted to explore the pathways through which independent variables including genomic, transcriptomic, and metabolomic data affect the dependent variable COVID-19, via a mediator variable (MV), the exposome. This analysis utilized omics-specific risk scores56,57,58, including genomic, transcriptomic, metabolomic, and exposomic risk scores. These scores were derived by splitting the data into an 80% discovery set and a 20% test set to avoid overfitting, and were implemented in MTG253. To ensure result robustness, five-fold cross-validation59,60,61 was applied, and average estimates were used to assess the exposome’s mediating effect on the genome, transcriptome, and metabolome in relation to COVID-19 susceptibility. Individual based risk scores from genome, transcriptome, metabolome, and exposome datasets were loaded, and linear regression models were fitted to estimate the relationships between these variables. Mediation analysis was conducted with the R mediation package61 to estimate the average causal mediation effect (ACME), direct effect (DE), total effect and the proportion of mediation. Bootstrapping was used to obtain robust confidence intervals for the indirect effects.
Results
The effects of single omics on COVID-19 phenotypic variance
We first investigated the contributions of individual omics layers to the phenotypic variation of COVID-19 status, using a single-effect linear mixed model that fits each omics at a time. We found that genomic factors (SNP-h2) explained 2.5% (Standard error [Se] = 0.3%, P-value = 4.3e-15) of the phenotype variation (Fig. 3).
For transcriptomic effects, we identified significant effects in only 5 tissues: adipose viscera, coronary arteries, musculoskeletal, spleen, and whole blood. Interestingly, the highest proportion of phenotypic variation explained by imputed gene expression was found in coronary tissue at 3.4% (Se = 0.2%, P-value = 4.97e-62), whereas the effect of transcriptome in other tissues remained below 1% (Fig. 3).
In addition to gene expression, we also separately examined the contribution of metabolomic and exposomic data to COVID-19 phenotypic variation, estimated at 2.0% (Se = 0.3%, P-value = 4.2e-13) and 4.0% (Se = 0.8%, P-value = 2.7e-06), respectively (Fig. 3). Overall, exposome had the highest proportion of contribution to the phenotypic variation, followed by the coronary transcriptome, genome, and metabolome. Conversely, gene expression levels in musculoskeletal, visceral adipose, and whole blood had the least impact on COVID-19 phenotypic variation.
Contribution of individual omics data to COVID-19 phenotypic variance based on single-effect linear mixed models. The omics data include genome (G), transcriptome (T), metabolome, and exposome (E), and tissue-specific transcriptome data from coronary (CT), muscle (MT), spleen (ST), adipose (AT), and blood (BT) tissues. Exposome contributes the highest proportion to the phenotypic variance (4%), followed by imputed gene expression levels in coronary artery tissue (3.4%) and genome (2.5%). The x-axis represents the proportion of COVID-19 phenotypic variance explained by each omics data, while the y-axis represents the different tissues and omics data.
Pairwise cross-omics analysis on COVID-19
To gain a comprehensive understanding of phenotypic variance in COVID-19, we conducted a pairwise omics analysis to determine the contribution of each pair of omics to the phenotypic variance. Our analyses, illustrated in Fig. 4, demonstrates that combining transcriptome data from the coronary artery tissue and exposome in the model can capture nearly 7% of the COVID-19 phenotypic variation. Joint models that incorporate the genome and exposome, as well as the genome and transcriptome of the spleen tissue, explain 5% of the phenotypic variance each. We also found that the transcriptome data of the muscle skeletal tissue and metabolome significantly contribute to COVID-19, explaining 4.5% of the phenotypic variance. However, incorporating transcriptomes of other tissues with the genome did not increase the proportion of phenotypic variance explained, as compared to the genome alone. These findings suggest that analysing the exposome in conjunction with other omics layers can provide valuable insights into understand host-related etiologic factors of COVID-19, explaining up to 7% of the phenotypic variance, compared to other pairs of omics. Therefore, to estimate the extent to which the effects of omics layers are mediated by the exposome, we specifically analysed the omics layers (genome, transcriptome, and metabolome) influenced by environmental factors. By fitting multiple omics data types and exposomic effects in the same model, we can better comprehend how each omics layer contributes to COVID-19 phenotypic variance.
Contribution of pairwise omics to COVID-19 phenotypic variance. Pairwise omics analyses of COVID-19 were based on genome (G), coronary transcriptome (CT), spleen transcriptome (ST), muscle transcriptome (MT), metabolome (M) and exposome (E). In the analyses, we used a multi-effects linear mixed model that simultaneously fits pairwise omics layers to examine which parts of omics had a significant effect on COVID-19. In general, fitting expsomic and coronary transcriptomic data captures a larger proportion of COVID-19 phenotypic variance. The diagonal elements are the estimates from the single omics analysis, while the off-diagonals indicate the estimates from the pairwise omics analyses.
Omics effects on COVID-19 mediated by exposome
In Fig. 5, we present the results of our study regarding the phenotypic variance partitioning for COVID-19. Specifically, we investigated the interplay between the exposome and each of the genome, transcriptome, and metabolome in a combined model. Our analysis revealed that the additive effect of the exposome was relatively constant (~ 4%, p = 2.7e-06) across all omics combinations. Interestingly, we found that the genomic effect from the joint model fitting genome and exposome (G-E) was modest (~ 1%, p = 5.6e-04), while the metabolomic effect was also reduced but still significant (~ 0.4%, p = 3.5e-03). To further explore the role of the exposome in mediating gene expression, we also investigated the impact of exposome jointly along with transcriptome effects in each of five tissues (coronary arteries, spleen, musculoskeletal, whole blood, and adipose viscera). Our results showed that the additive effects of transcriptome in whole blood (~ 0.2%, p = 9.5e-02) and adipose visceral (~ 0.1%, p = 1.8e-01) became non-significant from joint models, although they exhibited sizeable significant signals in the single omics models (Fig. 3). However, we observed no marked changes in the effects of transcriptome in the other tissues (ranging from 3.4 to 3.1% for the coronary arteries, 0.6–0.5% for the spleen, and 0.6–0.5% for the musculoskeletal) upon considering the exposome layer in the model.
Supporting the results of omics-exposome models, we investigated how exposomic factors mediate the effects of the genome, transcriptome, and metabolome on COVID-19 susceptibility, with results validated through five-fold cross-validation (Fig. 6). Our findings reveal that exposomic factors significantly mediate the relationship between these omics layers and COVID-19 risk. Specifically, exposomic factors accounted for 46–83% of the genomic effects on COVID-19, with an average mediation effect of about 60%. This indicates a substantial impact of environmental and lifestyle factors on genetic susceptibility to the disease. Similarly, exposomic factors mediated 40–89% of the effects of the metabolome on COVID-19, with an average of approximately 60%, underscoring their crucial role in influencing metabolomic responses. In contrast, the mediating effect of exposomic factors on the transcriptome, particularly in coronary tissues, was relatively minor, accounting for only 4–8% (average 7%) of the transcriptomic impact on COVID-19. This suggests that while exposomic factors significantly influence genomic and metabolomic pathways, their effect on transcriptomic pathways, especially in coronary tissues, is limited. Mediation analysis for other tissue transcriptomes (such as adipose, blood, muscle, and spleen) was not performed due to their minimal impact on COVID-19.
Furthermore, we aimed to capture non-additive effects, including interactions and correlations among the omics datasets in relation to COVID-19; however, most estimations yielded no significant signals (Tables S5 and S6). Significant estimates were observed only between the genome and transcriptome (Figure S3-S4, & Table S9). Our analysis showed that the joint model, which incorporated the exposome and one molecular layer (genome, transcriptome, and metabolome) at a time, outperformed the single-omics models in capturing phenotypic variation, as demonstrated by higher model-fits (Table S7). Furthermore, paired-omics models provided accurate estimates compared to individual-only models (Table S8).
Contribution of omics data and exposome to COVID-19 phenotypic variance. The joint model fitting each omics data type with exposome estimated the relative contributions of each omics type, such as genome (G), transcriptome (T), and metabolome (M), while simultaneously accounting for the impact of environmental factors (E) on COVID-19 phenotypes, using a multi-effects linear mixed model. The x-axis represents the proportion of COVID-19 phenotypic variance explained by each omics data and by exposome data, and the y-axis represents different models including the genome-exposome (G-E), multiple tissue transcriptome-exposome (CT-E; MT-E; ST-E; AT-E; BT-E), and metabolome-exposome (M-E) models. The transcriptome-exposome models are based on tissue-specific gene expression profiles and exposome data, including coronary transcriptome-exposome (CT-E), muscle transcriptome – exposome (MT-E), spleen transcriptome-exposome (ST-E), adipose transcriptome -exposome (AT-E), and blood transcriptome (BT-E).
Mediation analysis of omics effects on COVID-19 Susceptibility by exposomic Factors. The five-fold cross-validation analysis (folds 1 through 5) was employed to determine the range of mediation effects. The mediation proportions represent the degree to which exposomic factors influence the relationship between each omics layer and susceptibility to COVID-19. The figure illustrates the proportion of the effects of genomic, coronary transcriptome, and metabolomic factors on COVID-19 susceptibility that are mediated by exposomic factors (left to right, respectively). Each fold presents the proportion of mediation along with 95% confidence intervals (CIs).
Analysis of transcriptome, metabolome and exposome for COVID-19
We also extended the multi-omics analysis by integrating transcriptomic, metabolomic, and exposome data from matched samples, aiming to measure the variance component of COVID-19. It is noted that due to the limited availability of metabolomics data (nearly 20% of samples), genomic information had to be excluded from the analysis because genomic effects were deemed insignificant given this reduced samples. Instead, by integrating the three layers into the multi-omics model, we sought to conduct variance partitioning analysis on COVID-19 (Table S10). The results demonstrated a strong exposome effect (~ 4% of phenotypic variance), with a slight reduction in the coronary transcriptome effect from 3.4 to 2.4%. The metabolomic data contributed a marginal effect, explaining only 0.3% of the phenotypic variation. Also, no apparent interaction or correlation between the transcriptome and metabolome effects were observed, indicating that only the main additive effects of these omics contribute to COVID-19.
Discussion
We applied statistical genetics models to study the effect of multiple omics on COVID-19. A single layer of omics can only provide limited insights into the biological mechanisms of a disease, implying consideration of several biological data is substantially helpful to fully dissect the complex molecular processes involved in disease development17,18. The complementary or antagonistic effects of biological signatures such as DNA, RNA, proteins, and metabolites on the development of complex traits or diseases have been studied12. However, previous COVID-19 studies have mainly investigated the contribution of host genetic risk factors7,24 and ignoring the potential role of other biological aspects, namely the transcriptome, metabolome, and exposome. Therefore, we employed a comprehensive integrative multi-omics analysis of COVID-19, incorporating four distinct omics layers involving the composite structure of SNP effects (genome), imputed gene expression level (transcriptome), metabolites (metabolome) and environmental features (exposome). Notably, we explored the variance components of COVID-19, specifically partitioning them into additive and non-additive omics effects, including interactions and correlations between omics layers.
We first systematically analysed the contribution of individual omics to COVID-19 by applying a single-omics model. Thus, the exposome and transcriptome (estimated using the coronary artery model) explained 3–4% of the variance in COVID-19, which is almost double the variance captured by the genomic (2.5%) and metabolomic (2%) effects. This has been demonstrated in omics-exposome models (i.e. simultaneous fitting of individual omics to the exposome), where the genome and metabolome shrink significantly when fitting with the exposome. This may highlight the role of the environment in mediating genomic and metabolomic impacts on COVID-19. In contrast, the transcriptome-induced phenotype variation remains stable. In particular, coronary arteries showed a strong COVID-19 expression signal in the tested tissue models used for transcriptomic analysis. Supporting evidence suggests that the relative expression of SARS-CoV-2 entry genes, including angiotensin-converting enzyme 2 (ACE2) and Basigin (BSG), is prominent in the endothelial layer of vascular tissues, including coronary arteries62,63. ACE2 and BSG are surface receptors that facilitate viral uptake by host cells, and their expression increases with age62,64,65. As the current study involved UKB cohort participants aged 40–69, this may be one of the factors contribute to the significant transcriptome effect on COVID-19 in this age group. Single-omics analysis highlights the individual contribution of omics components in COVID-19 susceptibility, with certain tissues and omics data having a greater impact on phenotypic variation than others.
The analysis also revealed that the exposome has a strong signal for COVID-19, not only by itself, but also by modulating other omics layers. The study specifically examined the global influence of dietary intake, medical conditions (cancer, diabetes, asthma, etc.), and sociodemographic as exposome characteristics. Thus, combined effects account for a substantial proportion of phenotypic variation, highlighting the contribution of exogenous actors to the pathophysiology of COVID-19. Observational studies also support this, and the analysis of individual exposures (regardless of overall environmental determinants) demonstrate a potential role for the progression of COVID-19 66,67,68. Collectively, the findings suggest that the exposome plays a crucial role in shaping disease phenotypes, and considering multiple molecular layers simultaneously can enhance our understanding of the mechanisms underlying COVID-19.
Through mediation analysis, study examined the significant role of exposomic factors in mediating the relationship between genomic, transcriptomic, and metabolomic influences on COVID-19 susceptibility69. The substantial mediation of genomic effects by exposomic factors underscores the importance of environmental and lifestyle factors in shaping genetic predispositions to COVID-1970. This finding emphasizes the complex interplay between genetic susceptibility and external influences, suggesting that interventions targeting environmental factors could potentially mitigate genetic risks associated with COVID-1971. Similarly, the significant mediation of metabolomic effects by exposomic factors further illustrates how environmental exposures can modify metabolic responses, reinforcing the crucial role of external factors in understanding COVID-19 susceptibility72. This highlights the need for a further analysis to explore how specific genetic and metabolic factors are mediated by environmental factors, aiding in the development of targeted strategies to avoid or modify external risk factors. Conversely, the relatively minor mediation of transcriptomic effects by exposomic factors suggests a more limited role of environmental factors in modulating transcriptomic pathways, particularly in coronary tissues. This finding indicates that while exposomic factors substantially influence genomic and metabolomic pathways, their impact on transcriptomic pathways may be less pronounced. It is possible that the transcriptomic response in specific tissues, such as the coronary artery, may be more directly regulated by intrinsic genetic and cellular mechanisms rather than external environmental factors. These results underscore the need for further research to elucidate the specific mechanisms through which exposomic factors influence transcriptomic responses and to explore potential tissue-specific interactions that may contribute to COVID-19 susceptibility. Overall, the results from the omics-exposome model and mediation analysis highlighted the critical role of the exposome in COVID-19.
It’s worth mentioning that the results of the study might not apply broadly to other ethnic groups since the study specifically used data from White British population only. Evidence shows that genetic variations can differ significantly between populations due to a combination of biological and environmental elements73,74. Genetic variants associated with phenotypes could vary in allele frequencies and effects across ancestral backgrounds75. This variation can affect gene-environment interactions, implying that the way genes and the environment interact plays a substantial role in determining an individual’s susceptibility or resistance to diseases such as COVID-19. Focusing on a single ethnic group, we risk overlooking these crucial differences, potentially resulting in an incomplete understanding of the biological mechanisms underlying COVID-19. Therefore, it is advised to conduct studies involving diverse populations to better understand the etiology of COVID1-19.
In addition to these generalizability concerns, the study has specific limitations. First, not all COVID-19 determinants were examined in the exposome analysis; however, considering existing knowledge, potential predictors consisting of 43 exposome features (e.g., medical conditions) were comprehensively evaluated. Second, another limitation arises from the accuracy constraints of the elastic-net model used for transcriptome imputation, as evidenced by significant gene expression being imputed in only a few tissues. Third, while the study investigated how exposomic features mediate the effects of the genome, transcriptome, and metabolome on COVID-19, it did not account for potential interaction effects between these omics layers in the mediation analysis. This is primarily because imputation generates predictions rather than direct measurements of gene expression, which can introduce biases in identifying genuine interactions76,77. As result, we restricted the analysis considering main effects to minimize imputation errors that could be amplified during interaction analysis. Lastly, the current integrative multi-omics approach does not include proteomic information. Given that the proteome is a key component of the omics landscape, future research incorporating proteomics-based methods will be crucial for further illuminating the variations in the clinical presentation of COVID-19.
Despite these limitations, the study has several notable strengths. First, the study integrated multi-omics data, including genome, transcriptome, metabolome, and exposome, each provides information on unique aspects of COVID-19. Bringing them together further provides new insights that cannot be obtained when analysing each of them independently, thereby helping to elucidate the underlying mechanisms of the disease. In other words, the study was able to reveal the interplay between molecular features, thus providing a new dimension for further understanding of biological clues. Second, in addition to the main omics effects, the study was able to dissect non-additive effects arising from complex interactions between omics layers, such as interaction and correlation effects. Furthermore, methodologically, this study applied novel linear mixed models (GREML and COR-GREML), which can accurately and efficiently integrate and interpret multidimensional omics data.
In conclusion, our study showed that an integrated analysis of COVID-19 using multi-omics data revealed the potential contribution of each layer of omics data to the disease. In particular, exposomes and transcriptomics appear to have independent contributions and explained considerable amount of variation in the clinical presentation of COVID-19. In contrast, the effects of genomics and metabolomics were small, and even adding the exposome into the model significantly attenuated the phenotypic variance explained by these two layers. We also observed a strong mediation effect of exposome on genome and metabolome effects on COVID-19, but a relatively weak mediating effect on transcriptomic influences. The results of study suggest that efficiently analysing and considering additional omics data (e.g., epigenome and proteome) may provide a more comprehensive biological insights into the variation of COVID-19 clinical presentation.
Data availability
The data used for this study are available from the UKB under an approved data request (https://www.ukbiobank.ac.uk/). Used tools along with code with related files can be accessed from the following web-resources.- MTG2 https://sites.google.com/site/honglee0707/mtg2 or from https://github.com/honglee0707/IGE.- PLINK https://www.cog-genomics.org/plink/. - MetaXcan https://github.com/hakyimlab/MetaXcan. - GCTA https://yanglab.westlake.edu.cn/software/gcta/. - LDSC https://github.com/bulik/ldsc.
Abbreviations
- ACME:
-
Average causal mediation effect (ACME)
- AT :
-
Imputed transcriptome from adipose-visceral
- BT :
-
Imputed transcriptome from whole blood
- CORE-GREML:
-
Covariance between random effects-genome-based restricted maximum likelihood
- COVID-19:
-
Coronavirus disease 2019
- CT :
-
Imputed transcriptome from coronary arteries
- DE:
-
Direct effect
- E:
-
Exposome
- ERM:
-
Exposomic relationship matrix
- G:
-
Genome
- GREML:
-
Genome-based restricted maximum likelihood
- GRM:
-
Genetic relationship matrix
- M:
-
Metabolome
- MRM:
-
Metabolomic relationship matrix
- MT :
-
Imputed transcriptome from muscle skeletal
- MV:
-
Mediator variable
- QC:
-
Quality control
- SNPs:
-
Single nucleotide polymorphisms
- SNP-h2 :
-
SNP-based heritability
- ST :
-
Imputed transcriptome from spleen
- T:
-
Transcriptome
- TRM:
-
Transcriptomic relationship matrix
- UKB:
-
UK Biobank
References
Karmouty-Quintana, H. et al. Emerging mechanisms of pulmonary vasoconstriction in SARS-CoV-2-induced acute respiratory distress syndrome (ARDS) and potential therapeutic targets. Int. J. Mol. Sci. 21, 8081 (2020).
Thapa, K. et al. COVID-19-Associated acute respiratory distress syndrome (CARDS): Mechanistic insights on therapeutic intervention and emerging trends. Int. Immunopharmacol. 101, 108328 (2021).
Zhang, X. et al. Viral and host factors related to the clinical outcome of COVID-19. Nature. 583, 437–440 (2020).
Zsichla, L. & Müller, V. Risk factors of severe COVID-19: A review of host, viral and environmental factors. Viruses. 15, 175 (2023).
V’kovski, P., Kratzel, A., Steiner, S., Stalder, H. & Thiel, V. Coronavirus biology and replication: Implications for SARS-CoV-2. Nat. Rev. Microbiol. 19, 155–170 (2021).
Sharma, H. N., Latimore, C. O. & Matthews, Q. L. Biology and pathogenesis of SARS-CoV-2: Understandings for therapeutic developments against COVID-19. Pathogens. 10, 1218 (2021).
Velavan, T. P. et al. Host genetic factors determining COVID-19 susceptibility and severity. EBioMedicine. 72, 103629 (2021).
Shelton, J. F. et al. Trans-ethnic analysis reveals genetic and non-genetic associations with COVID-19 susceptibility and severity. MedRxiv https://doi.org/10.1101/2020.09.04.20188318 (2020).
Chen, P. Z. et al. Heterogeneity in transmissibility and shedding SARS-CoV-2 via droplets and aerosols. Elife 10, e65774 (2021).
Ismail, S. & Essawi, M. Genetic polymorphism studies in humans. Middle East. J. Med. Genet. 1, 57–63 (2012).
Marderstein, A. R. et al. Leveraging phenotypic variability to identify genetic interactions in human phenotypes. Am. J. Hum. Genet. 108, 49–67 (2021).
Sun, Y. V. & Hu, Y. J. Integrative analysis of multi-omics data for discovery and functional studies of complex human diseases. Adv. Genet. 93, 147–190 (2016).
Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 1–15 (2017).
Akiyama, M. Multi-omics study for interpretation of genome-wide association study. J. Hum. Genet. 66, 3–10 (2021).
Manzoni, C. et al. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief. Bioinform. 19, 286–302 (2018).
Niedzwiecki, M. M. et al. The exposome: Molecules to populations. Annu. Rev. Pharmacol. Toxicol. 59, 107–127 (2019).
Zhou, X. & Lee, S. H. An integrative analysis of genomic and exposomic data for complex traits and phenotypic prediction. Sci. Rep. 11, 21495 (2021).
Choi, K. W. et al. Integrative analysis of genomic and exposomic influences on youth mental health. J. Child Psychol. Psychiatry. 63, 1196–1205 (2022).
Kreitmaier, P., Katsoula, G. & Zeggini, E. Insights from multi-omics integration in complex disease primary tissues. Trends Genet. (2022).
Leon-Mimila, P., Wang, J. & Huertas-Vazquez, A. Relevance of multi-omics studies in cardiovascular diseases. Front. Cardiovasc. Med. 6, 91 (2019).
Wang, S., Yong, H. & He, X. D. Multi-omics: opportunities for research on mechanism of type 2 diabetes mellitus. World J. Diabetes. 12, 1070 (2021).
Usova, E. et al.
Pairo-Castineira, E. et al. Genetic mechanisms of critical illness in COVID-19. Nature. 591, 92–98 (2021).
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature (2021).
Lipman, D., Safo, S. E. & Chekouo, T. Integrative multi-omics approach for identifying molecular signatures and pathways and deriving and validating molecular scores for COVID-19 severity and status. BMC Genom. 24, 1–17 (2023).
Sameh, M. et al. Integrated multiomics analysis to infer COVID-19 biological insights. Sci. Rep. 13, 1802 (2023).
Zhou, X., Im, H. K. & Lee, S. H. CORE GREML for estimating covariance between random effects in linear mixed models for complex trait analyses. Nat. Commun. 11, 4208 (2020).
Rubio-Rivas, M. et al. WHO ordinal scale and inflammation risk categories in COVID-19. Comparative study of the severity scales. J. Gen. Intern. Med. 37, 1980–1987 (2022).
Murray, M. F. et al. COVID-19 outcomes and the human genome. Genet. Sci. 22, 1175–1177 (2020).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Momin, M. M. et al. A method for an unbiased estimate of cross-ancestry genetic correlation using individual-level data. Nat. Commun. 14, 722 (2023).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Fryett, J. J., Inshaw, J., Morris, A. P. & Cordell, H. J. Comparison of methods for transcriptome imputation through application to two common complex diseases. Eur. J. Hum. Genet. 26, 1658–1667 (2018).
Chen, J., Fu, Z., Iraji, A., Calhoun, V. D. & Liu, J. Imputed gene expression versus single nucleotide polymorphism in predicting gray matter phenotypes. MedRxiv https://doi.org/10.1101/2023.05.05.23289592. (2023).
Liang, Y. et al. Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries. Genome Biol. 23, 1–18 (2022).
Alqutami, F., Senok, A. & Hachim, M. COVID-19 transcriptomic atlas: A comprehensive analysis of COVID-19 related transcriptomics datasets. Front. Genet. 12, 755222 (2021).
Park, J. et al. System-wide transcriptome damage and tissue identity loss in COVID-19 patients. Cell. Rep. Med. 3, 100522 (2022).
Mavrikaki, M., Lee, J. D., Solomon, I. H. & Slack, F. J. Severe COVID-19 is associated with molecular signatures of aging in the human brain. Nat. Aging. 2, 1130–1137 (2022).
Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat. Commun. 14, 604 (2023).
Schäfer, A. A., Santos, L. P., Quadra, M. R., Dumith, S. C. & Meller, F. O. Alcohol consumption and smoking during COVID-19 pandemic: Association with sociodemographic, behavioral, and mental health characteristics. J. Community Health. 47, 588–597 (2022).
Woodward, M., Peters, S. A. & Harris, K. Social deprivation as a risk factor for COVID-19 mortality among women and men in the UK Biobank: Nature of risk and context suggests that social interventions are essential to mitigate the effects of future pandemics. J. Epidemiol. Community Health. 75, 1050–1055 (2021).
Gao, M. et al. Associations between body-mass index and COVID-19 severity in 6.9 million people in England: A prospective, communitybased, cohort study. Lancet Diabetes Endocrinol. 9(6), 350–359. https://doi.org/10.1016/S2213-8587(21)00089-9 (2021).
Singh, R. et al. Association of obesity with COVID-19 severity and mortality: An updated systemic review, meta-analysis, and meta-regression. Front. Endocrinol. 13, 780872 (2022).
Rosoff, D. B., Yoo, J. & Lohoff, F. W. Smoking is significantly associated with increased risk of COVID-19 and other respiratory infections. Commun. Biology. 4, 1230 (2021).
Hu, J., Li, C., Wang, S., Li, T. & Zhang, H. Genetic variants are identified to increase risk of COVID-19 related mortality from UK Biobank data. Hum. Genomics. 15, 1–10 (2021).
Aman, F. & Masood, S. How Nutrition can help to fight against COVID-19 pandemic. Pakistan J. Med. Sci. 36, 121 (2020).
Vu, T. H. T., Rydland, K. J., Achenbach, C. J., Van Horn, L. & Cornelis, M. C. Dietary behaviors and incident COVID-19 in the UK Biobank. Nutrients 13, 2114 (2021).
Kim, H. et al. Plant-based diets, pescatarian diets and COVID-19 severity: A population-based case–control study in six countries. BMJ Nutr. Prev. Health. 4, 257 (2021).
Gęca, T., Wojtowicz, K., Guzik, P. & Góra, T. Increased risk of COVID-19 in patients with diabetes mellitus—current challenges in pathophysiology, treatment and prevention. Int. J. Environ. Res. Public Health. 19, 6555 (2022).
Freeman, V. et al. Are patients with cancer at higher risk of COVID-19-related death? A systematic review and critical appraisal of the early evidence. J. Cancer Policy 33, 100340 (2022).
Sharifi, Y. et al. Association between cardiometabolic risk factors and COVID-19 susceptibility, severity and mortality: a review. J. Diabetes Metabolic Disorders. 20, 1743–1765 (2021).
Visscher, P. M. et al. Statistical power to detect genetic (co) variance of complex traits using SNP data in unrelated samples. PLoS Genet. 10, e1004269 (2014).
Lee, S. H. & Van der Werf, J. H. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics. 32, 1420–1422 (2016).
Chang, C. C. et al. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4 https://doi.org/10.1186/s13742-015-0047-8 (2015).
hakyimlab. MetaXcan. Retrieved from (2024). https://github.com/hakyimlab/MetaXcan/tree/master..
Dawes, C. T., Okbay, A., Oskarsson, S. & Rustichini, A. A polygenic score for educational attainment partially predicts voter turnout. Proceedings of the National Academy of Sciences 118, e2022715118 (2021).
Li, J. et al. Orbitofrontal cortex volume links polygenic risk for smoking with tobacco use in healthy adolescents. Psychol. Med. 52, 1175–1182 (2022).
Yingxuan, E. et al. in AMIA Annual Symposium Proceedings. 422 (American Medical Informatics Association).
Imai, K., Keele, L. & Tingley, D. A general approach to causal mediation analysis. Psychol. Methods. 15, 309 (2010).
Gareth, J., Daniela, W., Trevor, H. & Robert, T. An Introduction to Statistical Learning: With Applications in R (Spinger, 2013).
Tingley, D., Yamamoto, T., Hirose, K., Keele, L. & Imai, K. Mediation: R package for causal mediation analysis (2014).
Ahmetaj-Shala, B. et al. Cardiorenal tissues express SARS-CoV-2 entry genes and basigin (BSG/CD147) increases with age in endothelial cells. Basic. Translational Sci. 5, 1111–1123 (2020).
Kulasinghe, A. et al. Transcriptomic profiling of cardiac tissues from SARS-CoV‐2 patients identifies DNA damage. Immunology. 168, 403–419 (2023).
Baker, S. A., Kwok, S., Berry, G. J. & Montine, T. J. Angiotensin-converting enzyme 2 (ACE2) expression increases with age in patients requiring mechanical ventilation. PLoS One 16, e0247060 (2021).
AlGhatrif, M. et al. Age-associated difference in circulating ACE2, the gateway for SARS-COV-2, in humans: results from the InCHIANTI study. GeroScience 43, 619–627 (2021).
Hu, H. et al. An external exposome-wide association study of COVID-19 mortality in the United States. Sci. Total Environ. 768, 144832 (2021).
Brandt, E. B. & Mersha, T. B. Environmental determinants of coronavirus disease 2019 (COVID-19). Curr. Allergy Asthma Rep. 21, 1–11 (2021).
Hu, H. et al. A spatial and contextual exposome-wide association study and polyexposomic score of COVID-19 hospitalization. Exposome 3, osad005 (2023).
Naughton, S. X., Raval, U., Harary, J. M. & Pasinetti, G. M. The role of the exposome in promoting resilience or susceptibility after SARS-CoV-2 infection. J. Expo. Sci. Environ. Epidemiol. 30, 776–777 (2020).
Dey, S. et al. in AMIA Annual Symposium Proceedings. 378 (American Medical Informatics Association).
Cormier, S. A., Yamamoto, A., Short, K. R., Vu, L. & Suk, W. A. Environmental impacts on COVID-19: Mechanisms of increased susceptibility. Annals Global Health 88 (2022).
Sh, A. et al. Metabolome and exposome profiling of the biospecimens from COVID-19 patients in India. Журнал микробиологии эпидемиологии и иммунобиологии, 397–415 (2021).
Bouchard, T. J. Jr & McGue, M. Genetic and environmental influences on human psychological differences. J. Neurobiol. 54, 4–45 (2003).
Creanza, N. & Feldman, M. W. Worldwide genetic and cultural change in human evolution. Curr. Opin. Genet. Dev. 41, 85–92 (2016).
Momin, M. M., Zhou, X., Hyppönen, E., Benyamin, B. & Lee, S. H. Cross-ancestry genetic architecture and prediction for cholesterol traits. Hum. Genet. 143, 635–648 (2024).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Acknowledgements
We would like to acknowledge the UKB participants and the UKB team as this study was conducted using UKB resources. Computational analysis was performed using servers (hscpl-statgen.cw.unisa.edu.au and hscpl-statgen2.cw.unisa.edu.au) provided by the University of South Australia. Data analysis was also conducted using the Gadi server under University of South Australia and the National Computational Merit Allocation Scheme (NCMAS) of Australian Government. We would like to thank the Statistical Genetics Group at the Australian Centre for Precision Health for their help in providing quality controlled genotypic and phenotypic data. SE’s PhD is funded by the University of South Australia and the Australian Government Research Training Program (RTPi). This research is supported by the Australian Research Council (DP190100766).
Author information
Authors and Affiliations
Contributions
SHL conceived the idea and supervised the study. SE and SHL conceptualized and designed the analyses. SE preformed data extraction, quality control and analysis. SE and SHL wrote the first draft of the manuscript. BB, EH and KWC contributed valuable insights and feedback, and meticulously edited and refined the manuscript. All authors discussed the results and contributed to the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Eshetie, S., Choi, K.W., Hyppönen, E. et al. Integrative multi-omics analysis to gain new insights into COVID-19. Sci Rep 14, 29803 (2024). https://doi.org/10.1038/s41598-024-79904-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-79904-z