Introduction

In December 2019, coronavirus disease 2019 (COVID-19) was discovered in Wuhan, China, and then spread across the country [1,2,3]. According to the Chinese guideline, COVID-19 is clinically divided into four types: mild, common, severe, and critically severe [4]. Mild COVID-19 shows no obvious CT abnormality relative to healthy individuals, whereas the other three types do. Yang et al [5] reported that the mortality of severe COVID-19 in Wuhan exceeded 60%. Therefore, early and correct evaluation of COVID-19 severity, followed by timely treatment, can improve patients’ prognosis.

CT plays an important role in the evaluation of COVID-19. In clinical practice, physicians read CT images by assessing the size, shape, position, and internal density of lesions. However, this visual assessment may fail to capture some of the diagnostic and prognostic information that a lesion contains. Texture analysis uses mathematical methods to extract meaningful characteristics of an image at different gray levels, so as to quantify the lesion’s heterogeneity [6]. This retrospective study was performed to analyze the correlations among CT texture features, clinical features, and clinical subtypes of COVID-19 and to explore the value of CT texture analysis in determining the severity of COVID-19.

Materials and methods

Patients and ethical approval

Our institutional review board (IRB) waived written informed consent for this retrospective study, which evaluated de-identified data and brought no potential risk to patients. To avert any potential breach of confidentiality, no link between the patients and the researchers was available.

The patients’ data were collected from the First Affiliated Hospital of University of Science and Technology of China and the Affiliated Infectious Disease Hospital. Between January 20, 2020 and February 20, 2020, patients were included if they met the following criteria: (1) a positive 2019-nCoV nucleic acid test and (2) a chest CT examination at the initial diagnosis (within 3 days after admission). Patients were excluded if they had no obvious lung CT abnormalities or had pneumonia caused by other common bacterial or viral pathogens. According to the clinical classification criteria, 81 patients were enrolled (60 common cases and 21 severe cases).

Scanning

CT was performed for all patients within 3 days after disease onset on a 128-slice CT scanner (NeuViz 128) without contrast agent. The scanning parameters were as follows: tube voltage, 120 kVp; tube current, 150 mA; rotation time, 0.8 s; slice thickness, 5 mm; slice interval, 5 mm; pitch, 1.2; and matrix, 512 × 512. Scans were acquired during a breath-hold at full inspiration.

Region-of-interest segmentation and feature extraction

All images were segmented on the LK2.1 software package (GE Healthcare). First, the images were resampled to a voxel size of 1 × 1 × 1 mm³, and a Gaussian filter was applied for denoising. Then, the lung was automatically segmented into five lobes, three-dimensional volumes of interest (VOIs) were created for each lobe, and the CT score was calculated [7]. If the automatic segmentation failed to create favorable volumes, an experienced radiologist manually delineated the VOIs, and another radiologist checked the segmentation. Any difference in opinion was resolved through discussion. Features were extracted with PyRadiomics. In total, 1042 features were extracted, including histogram, gray-level co-occurrence matrix (GLCM), gray-level size zone matrix (GLSZM), and gray-level run length matrix (GLRLM) features.
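To make the notion of a texture matrix concrete, the following minimal sketch builds a GLCM for a toy 2D image and derives one feature (contrast) from it. It is an illustration under simplifying assumptions, not the extraction pipeline used in this study: the image, the single horizontal offset, and the function names `glcm_horizontal` and `glcm_contrast` are hypothetical, whereas the study extracted features from 3D VOIs.

```python
from collections import Counter

def glcm_horizontal(image, symmetric=True):
    """Normalized co-occurrence of gray-level pairs for the offset (0, +1).

    Assumption: a single horizontal offset on a 2D image; real radiomics
    software aggregates several offsets in 3D.
    """
    counts = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[(left, right)] += 1
            if symmetric:
                # Count the pair in both directions so the matrix is symmetric.
                counts[(right, left)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def glcm_contrast(p):
    """Contrast: sum of p(i, j) * (i - j)^2 over all gray-level pairs."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())

# Hypothetical 3 x 3 image with gray levels 1 to 3.
toy = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 3],
]
p = glcm_horizontal(toy)
contrast = glcm_contrast(p)
```

Contrast grows when neighboring pixels differ strongly in gray level, so a homogeneous region contributes nothing to it.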

Statistical analysis of radiomics and clinical features

All statistical analyses were performed using R software (version 3.5.1; www.R-project.org). First, radiomics features and the CT score were compared between the two groups with the independent t test or Wilcoxon test. For clinical features described as continuous variables, the Mann–Whitney U test was used for non-normal distributions and the t test for normal distributions. For clinical features described as nominal variables, the chi-square test or Fisher’s exact test was used. Features with p < 0.05 were deemed statistically significant. Second, for radiomics features, univariate logistic analysis was performed to evaluate whether the features discriminated between the two groups (p < 0.05). Then, the minimum redundancy and maximum relevance (MRMR) algorithm was applied to further select the relevant and non-redundant features. For clinical factors, univariate logistic analysis was performed to find the discriminative features (p < 0.05). Third, for both radiomics features and clinical features, backward stepwise multivariate logistic regression was performed, and the likelihood ratio test was used to select the subset of the most predictive features and construct the predictive model. Fourth, to verify the reliability of texture analysis, 100-fold leave-group-out cross-validation (LGOCV) was performed. Finally, Spearman correlation analyses were performed to evaluate the correlations between the predictive radiomics features and clinical factors.
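The LGOCV step can be sketched as repeated random splits of the cohort into a training group and a held-out group. The sketch below is written in Python rather than R and rests on assumptions not stated in the text: the 70% training fraction, the stratification by class, the fixed seed, and the function name `lgocv_splits` are all hypothetical choices made for illustration.

```python
import random

def lgocv_splits(labels, n_repeats=100, train_fraction=0.7, seed=0):
    """Yield (train_indices, test_indices) for repeated stratified random splits.

    Assumptions: train_fraction and stratification are illustrative only;
    the study does not report them.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for _ in range(n_repeats):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Keep the class proportions roughly equal in both groups.
            cut = round(train_fraction * len(shuffled))
            train.extend(shuffled[:cut])
            test.extend(shuffled[cut:])
        yield sorted(train), sorted(test)

# Cohort sizes from the study: 60 common (0) and 21 severe (1) cases.
labels = [0] * 60 + [1] * 21
splits = list(lgocv_splits(labels))
```

In each repetition a model would be refitted on the training indices and scored on the held-out indices, and the accuracy, sensitivity, and specificity would then be averaged over the 100 repetitions.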

Results

Eighty-one patients diagnosed with COVID-19 (mean age, 51.35 ± 14.31 years) were included. Of these, 21 (26%) had severe disease and 60 (74%) had common disease.

The CT characteristics included ground-glass opacity (GGO) or GGO combined with fine grid or consolidation. Patients with severe disease exhibited diffuse lesions; in four patients, initial CT images showed involvement in all the segments of both lungs. Most lesions were observed in the peripheral zone of the lung field, particularly the lower lobe. Figures 1 and 2 show representative images of common and severe COVID-19, respectively.

Fig. 1

A common case of COVID-19. A man aged 53 years presented with a 3-day history of fever, cough, and sputum. a, b On the third day after disease onset, CT imaging revealed pure ground-glass opacity (GGO) and GGO with fine grid in the bilateral lobes. c, d The area of the lesions was delineated on the axial and reconstructed images. e Histogram map of the lesions

Fig. 2

A severe case of COVID-19. A man aged 47 years presented with a 7-day history of fever, cough, and sputum. a, b On the third day after disease onset, CT imaging revealed diffuse pure GGO and GGO with fine grid in the bilateral lobes. c, d The area of the lesions was delineated on the axial and reconstructed images. e Histogram map of the lesions

Of the 1042 radiomics features, 511 were statistically significant when assessed by the independent t test or Mann–Whitney U test (p < 0.05). Of these, 358 were selected by univariate logistic analysis (p < 0.05), and 20 were retained after MRMR analysis. The AUC values of these 20 radiomics features are shown in Fig. 3a. All of them showed good predictive performance, with AUC values > 0.70. Of the clinical factors, 16 features were statistically significant when assessed by the chi-square or Mann–Whitney U test (Table 1), and 12 clinical features were retained after univariate logistic regression (p < 0.05). The AUC values of these 12 features are shown in Fig. 3b. After backward stepwise multivariate logistic regression and selection with the likelihood ratio test, eight radiomics features (Table 2) and four clinical features (Table 3) were finally retained, and predictive models were constructed. Both the radiomics signature and the clinical model showed favorable predictive accuracy, with AUC values of 0.93 (0.86–1.00) and 0.95 (0.95–0.99), respectively (Fig. 3c, d).
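For readers less familiar with the AUC, it can be read as the probability that a randomly chosen severe case receives a higher score than a randomly chosen common case, with ties counted as one half. The sketch below computes it by direct pair counting. The function name `auc` and the toy scores are hypothetical and are not data from this study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical feature values for 3 severe and 4 common cases.
severe = [0.9, 0.8, 0.4]
common = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc(severe, common)  # 11 of 12 pairs are ranked correctly
```

On this scale, 0.5 corresponds to chance and 1.0 to perfect separation of the two groups.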

Fig. 3

a AUC values of the 20 radiomics features. All showed good performance (AUC > 0.70). b AUC values of the 12 clinical features. c The radiomics signature had an AUC value of 0.93 (0.86–1.00). d The corresponding AUC value for the clinical model was 0.95 (0.95–0.99). Both the radiomics signature and the clinical model showed favorable predictive accuracy

Table 1 Epidemiological history and clinical data of 81 confirmed patients
Table 2 Multivariate logistic regression of the predictive radiomics features
Table 3 Multivariate logistic regression of the predictive clinical features

The sensitivity, specificity, and accuracy were calculated based on the Youden index (Table 4). Both the clinical model and radiomics signature showed good performance in discriminating patients with common and severe COVID-19. In the clinical model, the accuracy, sensitivity, and specificity values were 0.91, 0.81, and 0.95, respectively; in the radiomics signature, these values were 0.90, 0.90, and 0.90, respectively.
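The Youden index selects the cut-off that maximizes J = sensitivity + specificity − 1. The sketch below shows one way to find that cut-off and the resulting metrics. The function names, the convention that a score at or above the threshold is called severe, and the toy scores are assumptions made for illustration, not the study’s data or code.

```python
def confusion_metrics(scores, labels, threshold):
    """Sensitivity, specificity, and accuracy when score >= threshold means severe (1)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(labels)
    return sensitivity, specificity, accuracy

def youden_threshold(scores, labels):
    """Return (J, threshold, sensitivity, specificity, accuracy) at the maximal J."""
    best = None
    for t in sorted(set(scores)):
        sens, spec, acc = confusion_metrics(scores, labels, t)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec, acc)
    return best

# Hypothetical model outputs: 3 severe (1) and 4 common (0) cases.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
j, cutoff, sens, spec, acc = youden_threshold(scores, labels)
```

Because the cut-off is chosen on the same data that it is evaluated on, the resulting metrics tend to be optimistic, which is one reason to check them with cross-validation.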

Table 4 Sensitivity, specificity, and accuracy of the clinical model and radiomics signature

The mean accuracy, sensitivity, and specificity values of the 100-fold LGOCV are shown in Table 5. Both the clinical and radiomics models showed good stability, suggesting that texture analysis was valuable for discriminating between common and severe COVID-19 and that the results were unlikely to be due to overfitting.

Table 5 Cross-validation of the clinical model and radiomics signature

The correlations between the predictive radiomics and clinical features are shown in Fig. 4. The association was defined as strong (r = 0.7–1), moderate (r = 0.4–0.69), or low (r = 0.1–0.39). The prediction score showed strong correlations with wavelet_LLL_glrlm_RunLengthNonUniformity (r = 0.80, p < 0.0001), log_sigma_5_0_mm_3D_glszm_SizeZoneNonUniformity (r = 0.71, p < 0.0001), wavelet_HHL_glrlm_RunLengthNonUniformity (r = 0.75, p < 0.0001), and wavelet_HLL_glrlm_RunLengthNonUniformity (r = 0.79, p < 0.001). Most other clinical features and radiomics features showed moderate correlations (Fig. 4).
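The correlation step and the strength categories defined above can be sketched as follows. Spearman’s coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The function names, the use of the absolute value of r, and the label "negligible" for |r| < 0.1 are assumptions added for illustration, and the example values are invented.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def strength(r):
    """Map r to the bands used in the text (assumption: applied to |r|)."""
    a = abs(r)
    if a >= 0.7:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.1:
        return "low"
    return "negligible"  # band not defined in the text; hypothetical label

# Invented values for one clinical factor and one radiomics feature.
clinical = [1, 2, 3, 4, 5]
radiomic = [2, 4, 5, 4, 10]
r = spearman(clinical, radiomic)
```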

Fig. 4

Correlations between the predictive radiomics and clinical features. There is a strong association between the correlation scores and radiomics features

Discussion

Chest CT can be used to rapidly diagnose COVID-19 and assess its severity [8]. On CT images, COVID-19 may manifest as GGO, GGO combined with a grid pattern, and/or consolidation, all of which change rapidly [9]. Disease severity can be judged from the extent of the lesions on CT images, but such visual judgment may be subjective and inaccurate.

In CT texture analysis, image features at different gray levels can be extracted. An image’s pixel and spatial parameters are used to quantitatively extract pathophysiological features of lesions that cannot be recognized by the naked eye and to reveal the internal heterogeneity of tissues [10,11,12]. Texture analysis is objective in that it evaluates lesions on the basis of the gray levels in the image [13].

COVID-19 is an acute and highly infectious disease [12]. Early prediction and timely treatment can improve patients’ prognosis. Texture analysis based on CT images has rarely been used to classify the clinical stages of COVID-19. In this study, a radiomics texture model and a clinical model were established for 81 patients to predict the clinical stages of COVID-19 early, and a correlational study was conducted to reveal the associations between CT and clinical variables in common and severe patients. Both the radiomics and clinical models showed favorable predictive accuracy, with AUC values of 0.93 (0.86–1.00) and 0.95 (0.95–0.99), respectively. The prediction model based on CT texture features achieved a sensitivity of 0.80 and a specificity of 0.95, which suggests that CT texture analysis can assess disease severity early and accurately.

It has been reported that the coronavirus acts mainly on lymphocytes, especially T lymphocytes, as SARS-CoV does. Therefore, a decrease in lymphocytes can be used as a reference index for the clinical diagnosis of new coronavirus infection [14,15,16,17]. CD3 represents the total count of T lymphocytes, and CD4 and CD8 represent the counts of their subsets. Among the clinical features measured in this research, the percentage and absolute count of lymphocytes and the counts of CD3, CD4, and CD8 were significantly lower in the severe group than in the common group and showed high diagnostic value for differentiating patients with different COVID-19 severity levels (all AUC > 0.70). Feng et al [18] used a semi-quantitative scoring system to estimate the pulmonary involvement of lung lesions, and the inflammation score was correlated with severity. Our results showed that the inflammation score was also effective in the differential diagnosis between the two groups (AUC = 0.88).

We used LGOCV to verify the reliability of the multivariate logistic regression model, and the results showed that both the imaging and clinical models had good stability. In the training and validation sets, the average accuracy, sensitivity, and specificity values resulting from 100-fold cross-validation were above 0.8, suggesting that the results of the model were not caused by overfitting.

In the imaging model, the Laplacian of Gaussian (LoG) filtered features log_sigma_3_0_mm_3D_glszm_GrayLevelNonUniformity and log_sigma_1_0_mm_3D_glszm_LargeAreaEmphasis appeared more than 50 times during the 100 rounds of cross-validation (Supplementary Materials). The glszm_GrayLevelNonUniformity feature measures the variability of gray-level intensity values in the image, with lower values indicating higher homogeneity of intensity values. The glszm_LargeAreaEmphasis feature measures the distribution of large-sized zones, with greater values indicating more large-sized zones and coarser textures. The wavelet features include wavelet_HHL_glrlm_RunLengthNonUniformity and wavelet_HLH_glszm_LargeAreaHighGrayLevelEmphasis. The glrlm_RunLengthNonUniformity feature measures the similarity of run lengths throughout the image, with a lower value indicating more homogeneity among run lengths, and glszm_LargeAreaHighGrayLevelEmphasis measures the proportion in the image of the joint distribution of larger-sized zones with higher gray levels. The CT score also appeared more than 50 times, which means that these features had high stability and diagnostic value for assessing disease severity. In the clinical model, the inflammation score, hsCRP, CD3, PCT, and other factors appeared more than 50 times during the 100 rounds of cross-validation (Supplementary Materials), indicating that these parameters were highly correlated with the severity of COVID-19.

In addition, Spearman correlation analysis was performed to evaluate the correlations between significant imaging and clinical features. The results showed that most of the clinical features (e.g., lymphocyte ratio; absolute lymphocyte count; CD3, CD4, and CD8 counts; CRP; and D-dimer) had moderate correlations with the imaging features (r > 0.4). Strong correlations were found between the inflammation score and some wavelet-transform features and GLSZM features (r > 0.7), indicating that these image features are closely related to disease severity and can be used for the clinical type classification of COVID-19 patients.
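As a concrete illustration of one of the texture features discussed above, the sketch below computes a simplified large area emphasis on toy 2D images. It is a minimal sketch under stated assumptions, not the PyRadiomics implementation: zones are taken as 8-connected groups of pixels with the same gray level, the feature is the mean of the squared zone sizes, no LoG or wavelet filtering is applied, and the function names and images are hypothetical.

```python
def zone_sizes(image):
    """Sizes of connected zones of equal gray level (assumed 8-connectivity in 2D)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level = image[r][c]
            stack = [(r, c)]
            seen[r][c] = True
            size = 0
            # Flood fill over neighbors that share the same gray level.
            while stack:
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] == level):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            sizes.append(size)
    return sizes

def large_area_emphasis(sizes):
    """Mean of squared zone sizes; larger zones dominate the value."""
    return sum(s * s for s in sizes) / len(sizes)

# Coarse texture: two large homogeneous zones of 8 pixels each.
coarse = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [2, 2, 2, 2],
]
# Fine texture: no pixel touches another pixel of the same gray level.
fine = [
    [1, 2, 1, 2],
    [3, 4, 3, 4],
    [1, 2, 1, 2],
    [3, 4, 3, 4],
]
lae_coarse = large_area_emphasis(zone_sizes(coarse))
lae_fine = large_area_emphasis(zone_sizes(fine))
```

The coarse image yields a much larger value than the fine one, which matches the interpretation that greater values indicate more large-sized zones and coarser textures.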

This study has some limitations. First, it is retrospective, and the sample size is small, especially in the severe group, which had only 21 cases; future studies should increase the number and type of samples. Second, some cases had small ground-glass lesions that may have been missed when the ROI was automatically delineated. Third, in clinical practice, COVID-19 should be differentiated from other pulmonary diseases and other pneumonias, but we did not apply CT texture analysis to the differential diagnosis of COVID-19 in this study. Finally, our results may be biased because we did not consider the effect of patients’ underlying diseases, such as chronic respiratory disease (e.g., emphysema or interstitial pneumonia), on the severity judgment. We will establish a multicenter study to explore the value of CT texture analysis in the differential diagnosis of COVID-19.

In summary, CT texture analysis can provide reliable and objective information for differentiating common from severe COVID-19.