Article

Factors Influencing Consumer Satisfaction of Fresh Produce E-Commerce in the Background of COVID-19—A Hybrid Approach Based on LDA-SEM-XGBoost

1 School of Economics and Management, Zhejiang University of Science and Technology, Hangzhou 310012, China
2 School of Economics and Management, Southwest Jiaotong University, Chengdu 610031, China
3 School of Science, Zhejiang University of Science and Technology, Hangzhou 310012, China
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(24), 16392; https://doi.org/10.3390/su142416392
Submission received: 18 October 2022 / Revised: 29 November 2022 / Accepted: 30 November 2022 / Published: 7 December 2022

Abstract

To clarify the factors influencing consumer satisfaction with fresh produce e-commerce in the context of COVID-19, a hybrid approach based on LDA-SEM-XGBoost was proposed and applied to online reviews. First, topic elements were extracted with the LDA topic model; then PLS-SEM was used to explore the paths between variables; finally, XGBoost models were applied to rank the importance of each topic variable for satisfaction. The results showed that epidemic factors had a significant impact on logistics factors, product factors, and platform factors, with the greatest impact on logistics factors. Logistics factors, product factors, platform factors, and epidemic factors had a significant impact on consumer satisfaction, with logistics factors having the greatest impact. The topic variables affecting fresh produce e-commerce consumer satisfaction were, in order: logistics time, shipping speed, product quality, delivery speed, after-sales strategy, logistics packaging, product price, the impact of COVID-19, marketing strategy, and product brand. Based on these findings, recommendations are made for the sustainable production and marketing of fresh produce.

1. Introduction

The emergence of COVID-19 has posed an unforeseen threat to public health, affecting the global economy and the global supply chain [1]. COVID-19 has severely disrupted the retail industry. Because of the pandemic and government regulatory policy, society was forced to shut down for an extended period of time, and routine daily activities were disrupted [2]. Furthermore, quarantine measures limited people’s ability to leave their homes, work, and purchase food and other necessities; many offline retailers experienced drastic decreases in visits compared to other industries [3]. Against this background, e-commerce has gained a new opportunity for development.
China is the largest business-to-consumer (B2C) e-commerce market and also has the highest number of online stores in the world [4]. E-commerce for fresh food has reduced information asymmetry in the fresh food market, broadened the purchasing channels for consumers, and reduced the intermediate links in fresh food circulation through the use of information and communication technology [5]. In addition, contactless delivery of fresh products reduces the risk of infection. The outbreak prompted people to change how they shop and subtly changed their consumption habits. According to statistics, e-commerce has become the main channel of shopping [6]. COVID-19 brought unexpected development opportunities to fresh e-commerce enterprises, and fresh e-commerce experienced explosive growth [7]. Take Shanghai as an example. When the epidemic became severe in February 2020, an increasing number of citizens purchased fresh food via e-commerce platforms, and the proportion of online orders increased significantly. According to the Shanghai Municipal Commission of Commerce (MCC), total fresh food e-commerce platform transactions in Shanghai reached 1.366 billion dollars in the first quarter of 2020, a year-on-year increase of 167%, and order volume increased by 80% [8].
The distribution of food and agricultural products has been severely affected by the manpower shortage caused by the virus and by the need to maintain physical separation in the manufacturing process, leading to problems in ensuring a consistent supply of food to consumers [9]. These problems limit the ability of the agricultural sector to operate properly, with serious knock-on effects on food quality, freshness, and health. Many farmers have fruit and vegetables that cannot be transported out in time after ripening and therefore rot [10,11]. Difficulties in selling agricultural products and severe overstocking have been reported around the globe. Farmers in developing countries, including China, India, and Ethiopia, were forced to leave their produce to rot.
However, in the current context, the impact of COVID-19 on fresh produce production and sales is yet to be further analyzed and proven. What exactly affects consumer satisfaction and how to promote sustainable development of fresh produce e-commerce is a question that needs to be addressed. To fill this gap, and considering that text is a major component of online consumer reviews, many studies have used content analysis to identify consumer preferences for text-based online reviews to reveal patterns and relationships in qualitative data [12,13,14]. In this article, we analyze the main factors affecting consumer satisfaction with fresh produce during COVID-19 by mining and extracting information from the text and exploring the relationships between the factors. Based on the final results of the study, we would like to suggest ideas for sustainable production and marketing of fresh produce.
This study makes two major contributions: (1) We extracted factors from the epidemic comments and considered some of the implications of the epidemic factors. (2) Using online user-generated content and an LDA-SEM-XGBoost hybrid method, we investigate the satisfaction of consumers for fresh agriculture products.
Section 2 reviews the literature and then presents the conceptual model and hypotheses. The methodology is described in Section 3, and the results are presented in Section 4. Section 5 discusses the results and their implications. Finally, Section 6 presents the conclusion and limitations.

2. Literature Review

2.1. Fresh E-Commerce in the Background of COVID-19

Due to COVID-19, the production and marketing of fresh produce have been greatly affected. Current research is multifaceted, and work on the impact of the epidemic on fresh produce supply chains occupies an important place. Marusak et al. (2011) identified increased supply chain resilience as a key driver for reducing vulnerability in disruptive times [15]. Xu et al. (2020) explored how regionalized food supply chains can improve the resilience of the US food supply system in response to large-scale disruptions such as the COVID-19 crisis [16]. A number of studies explored shopping behavior during COVID-19. Chen et al. (2021) collected data from adult internet users in Wuhan, China in 2020 to explore the impact of COVID-19 on fresh food shopping behavior [17]. The study showed that an increasing number of people in Wuhan shopped for fresh food online, at an increased cost and frequency, and that the experience of buying fresh food online during the lockdown promoted more online shopping. Scacchi et al. (2021) argued that lockdowns significantly influenced individual behavior and led to positive and sustainable food purchasing and consumption habits [18]. Some studies focus on food waste during COVID-19. De et al. (2021) explored trends and challenges in food packaging waste during the COVID-19 pandemic [19]. Jribi et al. (2020) explored the impact of COVID-19 on household food waste [20].

2.2. Fresh Produce E-Commerce Satisfaction

Fresh e-commerce is a recently developed form of e-commerce that has attracted many businesses [21]. This paper focuses on the public’s satisfaction with buying fresh produce online. In recent years, many scholars have studied fresh produce e-commerce satisfaction. Yanyan (2018) used a questionnaire to assess consumer satisfaction based on 20 factors such as agricultural product characteristics, website quality, and service quality, and the results showed that service quality is the most important factor influencing public satisfaction [4]. Huang et al. (2014) argued that consumer perception, product quality, and logistics speed are the main factors affecting consumers’ online shopping satisfaction [22]. Using Yantai cherries as a case, Liu and Kao (2022) found through hypothesis testing that purchase expectation has a negative but not significant effect on customer satisfaction, whereas product quality and logistics and delivery have a significant positive effect [23]. Yang et al. (2011) argued that logistics services are an important factor influencing consumers’ online purchases. They used structural equations to study the positive and negative correlations between logistics quality and consumer satisfaction, complaint rate, repurchase intention, and website image from the consumer’s perspective, in order to provide effective suggestions for online stores [24].

2.3. A Study of Consumer Satisfaction Based on Online Reviews

Earlier studies used questionnaires and studied consumer satisfaction with logistic regression, structural equation modeling, and other econometric methods. Using a survey of hotels in a popular tourist destination in Mexico, Berezan et al. (2013) used logistic regression to explore how sustainable hotel practices affect the satisfaction of hotel guests of different nationalities [25]. Resano et al. (2011) studied satisfaction with pork and derived products in five European countries using logistic regression [26]. Park and Kang (2022) applied structural equation modeling (SEM) to 659 responses obtained from Korean consumers to explain how green art can increase the satisfaction of green hotel consumers [27].
While statistical sampling methods may have good data quality and provide strong support for theoretical frameworks, sampling data sometimes involves biased selection, and the results may not be generalizable to the population as a whole [28]. With the popularity of the Internet, online reviews are very useful information for consumers and, therefore, online reviews largely influence consumers’ purchase decisions [29]. In addition to using questionnaires to study consumer satisfaction, based on consumer data crawled on the web or searched for on various channels, many scholars also use machine learning, such as text analysis, classification algorithms, and other models to study consumers’ online reviews, so as to explore consumers’ satisfaction with online shopping.
Based on 8229 reviews from the Google Travel website, Kim and Kim (2022) processed the data using text mining and semantic analysis and used regression and factor analysis to find important factors affecting customer satisfaction in terms of re-travel [29]. Jung and Suh (2019) used strengths analysis to examine the relative importance of each factor for employee job satisfaction and to provide input to business managers in managing their employees [30]. Nilashi et al. (2019) performed big data analysis based on machine learning techniques for text mining, clustering, and predictive learning, including latent Dirichlet allocation (LDA), to identify the voice of the customer [31]. Shah et al. (2021) analyzed UK patients’ satisfaction with healthcare services based on two factors, namely satisfaction (PS) and dissatisfaction (PD), and combined SentiNet and LDA text mining methods to detect the affective information contained in the semantics of patients’ healthcare [32]. Based on 2585 user reviews of 46 five-star hotel restaurants on TripAdvisor, Aktas-Polat and Polat (2021) used textual techniques to explore the three areas that are most relevant to delight, satisfaction, and dissatisfaction in fine dining experiences by adding the topic of staff as an antecedent [33]. Related approaches have also been used to explore fresh produce e-commerce consumer satisfaction. Hong et al. (2019) analyzed various online reviews using convolutional neural networks and text mining models, and the results showed that logistics convenience, communication, and reliability had significant effects on customer satisfaction, but information integrity had no effect on satisfaction [34]. Xin and Jiaying (2020) focused on service quality, using web crawlers to collect the fresh food data of JD Mall and NVivo software to organize the crawled variables for statistical analyses such as correlation and regression [35].
In conclusion, the impact of COVID-19 on the production and marketing of fresh produce in the current context is yet to be studied and proven. What factors affect consumer satisfaction and how to facilitate the long-term development of e-commerce for fresh produce are questions that must be addressed. In response, this paper makes two extensions. The first is to examine satisfaction with e-commerce for fresh produce in the context of an epidemic. The second is to experiment with online texts and to use an LDA-SEM-XGBoost-based approach to explore the influencing factors.

2.4. Research Hypothesis

Due to the high contagiousness of the virus, road traffic controls were implemented in many places, and villages were sealed off in some places. As a result, vegetables from growing areas could not be shipped out, and distributors faced poor logistics and limited access to transport [36]. Strict residential closures and community management were imposed, and people were asked to reduce going out and isolate themselves at home for a specified period of time. The virus also caused increased manpower shortages and the need to maintain physical isolation during the manufacturing process [9]. All of these factors had a significant impact on merchant shipments and distribution, and the following hypothesis was therefore formulated.
H1: Epidemic factors have a significant impact on logistics factors.
During the epidemic, labor for picking, processing, and shipping agricultural products was lacking [9], vegetables from some production areas could not be transported out because of village and road closures, and large quantities of agricultural products rotted in the fields in many places, resulting in serious stagnation in sales. This caused different degrees of unsold stock accumulation, a mismatch between supply and demand across production and marketing areas for meat, eggs, and milk, a widening price gap, and serious damage to fresh products with short shelf life such as poultry and leafy vegetables [10,11]. The following hypothesis is therefore proposed.
H2: Epidemic factors have a significant impact on product factors.
Due to the impact of the epidemic, the fresh food market has seen a rapid increase in trading volume. After-sales policies and service aspects are greatly compromised in parallel with the surge in consumer online shopping [37]. As a result, the following hypothesis is proposed.
H3: Epidemic factors have a significant impact on platform factors.
Online shopping is widely accepted by consumers and has even become the main way of daily shopping. Timely order acceptance and delivery times are inextricably linked to the consumer’s shopping experience [38]. Because fresh agricultural products are perishable and difficult to keep fresh, consumers pay close attention to their logistics and delivery. The speed of logistics and the degree of preservation of fresh agricultural products during the delivery process are potential factors affecting the satisfaction of fresh e-commerce consumers [39]. Therefore, the following hypothesis is proposed in this paper.
H4: Logistics factors have a significant impact on fresh produce e-commerce consumer satisfaction.
The product itself is most directly tied to consumers’ immediate interests and is the aspect that consumers care about most when shopping for fresh produce online. This paper focuses on product quality, product price, product packaging, and product branding. According to previous studies, these product attributes affect consumer satisfaction [39]. Perceived product quality is defined as the consumer’s judgment of the overall excellence or superiority of the product, and minimizing product cost and maximizing product quality should be considered major factors in the success of e-commerce [40,41,42], so the following hypothesis is proposed.
H5: Product factors have a significant impact on satisfaction with fresh produce.
The fresh produce platform is a vehicle for online transactions between consumers and fresh produce sellers. Consumers search for fresh produce according to their needs on the fresh produce platform, and when they confirm their orders, a buying and selling relationship is created between them and the fresh produce seller. In the process of fresh produce consumption, the strategy adopted by the fresh produce platform has an impact on consumer satisfaction. Many customers expect online shops to offer products and services at lower prices than traditional shops [43]. Discounts at the time of purchase influence consumers’ beliefs about price and ultimately their satisfaction [44]. In addition, several studies involving online commerce have shown that service quality has a positive effect on customer satisfaction [45,46,47]. Service quality determines whether customers will build a strong and loyal relationship with an online retailer, whose service meets the expectations of customers and thus increases their satisfaction [48]. Therefore, the following hypothesis is proposed.
H6: Platform factors have a significant impact on fresh produce e-commerce consumer satisfaction.
Combining the hypotheses that epidemic factors affect logistics, product, and platform factors with the hypotheses that logistics, product, and platform factors affect consumer satisfaction, the following hypothesis is proposed.
H7: Epidemic factors have a significant impact on fresh produce e-commerce consumer satisfaction.
The proposed research model is presented in Figure 1.

3. Methodology

To obtain the factors affecting consumer satisfaction from online reviews, and to identify both the impact of epidemic factors on the production, distribution, and sale of fresh produce and the impact of each influencing factor on consumer satisfaction, we propose a hybrid approach that combines topic modeling, partial least squares structural equation modeling (PLS-SEM), and extreme gradient boosting (XGBoost). We first used the JD platform (www.jd.com), one of the largest online shopping sites in China, as the data source to collect consumer reviews about fresh produce. Then, topic variables were derived from the review text using the latent Dirichlet allocation (LDA) technique. After the probability scores for all topics were computed, they were normalized to a scale from 1 to 5 and combined with the rating of each review (scale 1 to scale 5), and the resulting matrix was used as the dataset for the PLS-SEM analysis. Next, PLS-SEM was used to ascertain the relationships between the factors: first, the influence of epidemic factors on logistics factors, product factors, and platform factors; second, the effect of platform factors, logistics factors, product factors, and epidemic factors on consumer satisfaction. Finally, to further analyze the impact of each topic variable on consumer satisfaction, we used the XGBoost algorithm. The overall process consists of three steps, as shown in Figure 2. Section 3.1 specifies the data pre-processing and the topic modeling process, Section 3.2 presents the PLS-SEM method, and Section 3.3 describes the XGBoost method.

3.1. Data Pre-Processing and LDA

Topic modeling is a machine learning technique in the field of data mining and is one of the most widely used methods for mining review text. LDA is one of the most popular topic modeling approaches [49]. Compared with other text mining methods such as TF-IDF (term frequency-inverse document frequency), latent semantic analysis (LSA), or probabilistic LSA, LDA is considered to be more potent in semantic annotation, generalization ability, dimensionality reduction, and mixture modeling [50]. It is used for text analysis in a wide range of industries or fields such as finance, aviation, products, research, and hospitality [50,51,52,53,54].
In order to analyze the text data using LDA, a five-stage data pre-processing procedure was carried out: text de-duplication, phrase deletion (removal of very short reviews), text splitting (word segmentation), stop-word removal, and n-grams. As shown in Figure 3, the raw review text was first subjected to this series of steps so that potential topics could be identified. The first step removes duplicate text. Large numbers of worthless, repeated reviews arise when customers do not comment for a long time and the platform posts a default positive review, or when other people’s review content is copied and pasted. The second step is phrase deletion. When a review contains very few words, its content is often unclear and carries no actual meaning, for example “okay”, “praise”, or “good”. Such reviews, consisting of only two or three words, do not contribute to the results of text analysis, and removing them improves the data quality. In the third step, we used the “jieba” word segmentation tool to split the Chinese text into words. In the fourth step, we used a common stop-word list to remove stop words. Some meaningless words remained in the results, mainly modal particles and degree adverbs such as “the”, “had”, “what”, “specifically”, and “on the other hand”, and these were added to the stop-word list. For the sake of generality, the names of the fresh produce involved in this paper, such as “fruit”, “vegetables”, “pork”, and “fish”, were also added to the stop-word list. Finally, bigrams were added to the data in order to extract more meaningful word pairs. A bigram is a sequence of two adjacent words: for example, “machine” and “learning” can be combined to form the bigram “machine learning”.
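The five pre-processing stages can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the whitespace `tokenize` function is a stand-in for a word segmenter such as jieba, and the stop-word list, the `MIN_TOKENS` threshold, and the choice to keep every adjacent word pair as a bigram are all assumptions made for the example.

```python
# Hypothetical stop-word list; the study used a common list plus custom
# additions such as product names.
STOP_WORDS = {"the", "had", "what", "fruit", "vegetables", "pork", "fish"}
MIN_TOKENS = 4  # assumed threshold below which a review counts as too short


def tokenize(text):
    # Stand-in for a word segmenter such as jieba: split on whitespace.
    return text.lower().split()


def preprocess(reviews, stop_words=STOP_WORDS, min_tokens=MIN_TOKENS):
    seen = set()
    docs = []
    for review in reviews:
        # Step 1: text de-duplication (exact duplicates only).
        if review in seen:
            continue
        seen.add(review)
        # Step 3: text splitting (done before step 2 so length is in tokens).
        tokens = tokenize(review)
        # Step 2: drop very short reviews such as "okay" or "good".
        if len(tokens) < min_tokens:
            continue
        # Step 4: stop-word removal.
        tokens = [t for t in tokens if t not in stop_words]
        # Step 5: append bigrams built from adjacent remaining tokens.
        bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
        docs.append(tokens + bigrams)
    return docs


reviews = [
    "good",
    "delivery was fast and the fruit arrived fresh",
    "delivery was fast and the fruit arrived fresh",
    "packaging had ice bags so pork stayed cold",
]
docs = preprocess(reviews)
```

In practice a bigram step usually keeps only word pairs that co-occur frequently; keeping all adjacent pairs here just keeps the sketch short.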
The core principle behind LDA is that documents are represented as random mixes over latent topics, each of which is defined by a word distribution. Each document is represented by a probability distribution over a set of topics, with each topic being represented by a probability distribution over a set of words [55]. The specific model structure is shown in Figure 4.
Here, α is the parameter of the prior distribution of the document-topic distribution θ, β is the parameter of the prior distribution of the topic-word distribution φ, z and w denote the topics generated by the model and the resulting topic words, respectively, M is the number of documents, N is the number of words in a document, and K is the number of topics. The topic model generates text as follows: first, a document of length N is selected; then the document-topic distribution θ and the topic-word distribution φ are sampled from the prior distribution with parameter α and the prior distribution with parameter β, respectively; finally, each topic z and word w are sampled from the multinomial distributions with parameters θ and φ, respectively. The joint distribution of the model is:
$$p(w, z, \theta_m, \varphi_k \mid \alpha, \beta) = \prod_{n=1}^{N} p(\theta_m \mid \alpha)\, p(z_{m,n} \mid \theta_m)\, p(\varphi_k \mid \beta)\, p(w_{m,n} \mid \varphi_{z_{m,n}})$$
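To make the generative process above concrete, the following sketch samples a tiny synthetic corpus. It is illustrative only: the vocabulary, the hyperparameter values, and the helper names are assumptions, and symmetric Dirichlet priors are drawn by normalizing Gamma variates using only the standard library.

```python
import random


def sample_dirichlet(concentration, size, rng):
    # Symmetric Dirichlet draw via normalized Gamma(concentration, 1) variates.
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    # Draw one index according to the given probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(M, N, K, vocab, alpha, beta, seed=0):
    rng = random.Random(seed)
    # phi_k ~ Dirichlet(beta): one word distribution per topic.
    phi = [sample_dirichlet(beta, len(vocab), rng) for _ in range(K)]
    corpus = []
    for _ in range(M):
        # theta_m ~ Dirichlet(alpha): topic mixture of document m.
        theta = sample_dirichlet(alpha, K, rng)
        doc = []
        for _ in range(N):
            z = sample_categorical(theta, rng)   # topic z_{m,n} given theta_m
            w = sample_categorical(phi[z], rng)  # word w_{m,n} given phi_{z_{m,n}}
            doc.append(vocab[w])
        corpus.append(doc)
    return corpus


# Hypothetical vocabulary and hyperparameters, chosen only for illustration.
vocab = ["fast", "fresh", "cheap", "packaging", "epidemic"]
corpus = generate_corpus(M=3, N=6, K=2, vocab=vocab, alpha=0.5, beta=0.5)
```

Fitting LDA runs this process in reverse: given only the observed words, it infers the θ and φ that most plausibly generated them.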

3.2. PLS-SEM

SEM methods are widely used in marketing and management research to analyze cause–effect relations between latent constructs. PLS-SEM is an SEM method that aims to maximize the explained variance of the dependent latent constructs [56]. PLS-SEM avoids many of the limiting assumptions underlying covariance-based SEM approaches (e.g., LISREL, EQS), such as multivariate normality and large sample sizes. PLS-SEM examines the relationships among latent and observed variables. A latent variable is an unobserved concept that is the target of the analysis. An observed variable is a measurable indicator that is related to a latent variable, for example through causal or co-occurrence links.
The PLS-SEM model consists of two main components: the measurement model, which describes the relationship between the latent and observed variables, and the structural model, which describes the relationships among the latent variables. The parameters are estimated in two steps: first, estimates of the latent variables are obtained through iterations; second, linear regression using partial least squares is applied to obtain the estimates of the parameters of the structural and measurement models.
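With $k$ latent variables, there are $k$ groups of observed variables, the $i$-th group containing $m_i$ variables, and each group can be expressed as:
$$X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{im_i}\}, \quad i = 1, 2, 3, \ldots, k$$
It is also assumed that the relationships among latent variables, and between latent and observed variables, are linear, and that each observed variable is associated with a unique latent variable. The equation of the measurement model is then:
$$x_{ij} = \lambda_{ij}\,\xi_i + \sigma_{ij} \quad (i = 1, 2, 3, \ldots, k;\ j = 1, 2, 3, \ldots, m_i)$$
The equation of the structural model is:
$$\xi_i = \sum_{j \neq i} \beta_{ij}\,\xi_j + \varepsilon_i$$
where $\xi_i$ is the $i$-th latent variable, $\lambda_{ij}$ is the loading of observed variable $x_{ij}$ on $\xi_i$, $\beta_{ij}$ is the path coefficient from $\xi_j$ to $\xi_i$, and $\sigma_{ij}$ and $\varepsilon_i$ are error terms.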

3.3. XGBoost

To further analyze the impact of each influencing factor on consumer satisfaction, we used XGBoost to analyze the importance of each topic variable. We chose the XGBoost model for two reasons. First, XGBoost is one of the most popular gradient boosting tree algorithms. It has been widely employed in business due to its high performance in problem-solving and minimal requirement for feature engineering [57,58]. Second, XGBoost offers high accuracy, speed, support for parallelization, and protection against overfitting, and has been reported to perform better than other machine learning algorithms (RF, SVM, etc.) [59,60].
The XGBoost (Extreme Gradient Boosting) model is a distributed and efficient gradient-boosting algorithm whose main base learner is CART (Classification and Regression Trees). The steps of the XGBoost algorithm are as follows.
In the first step, for a given data set $D = \{(X_i, y_i)\}$, the following function is used to predict the samples:
$$\hat{y}_i = \phi(X_i) = \sum_{k=1}^{K} f_k(X_i), \quad f_k \in F$$
where $F = \{f(X) = \omega_{q(X)}\}$ $(q: \mathbb{R}^m \rightarrow T,\ \omega \in \mathbb{R}^T)$, $K$ is the number of decision trees, $F$ denotes the decision tree space, and $q(X)$ denotes the mapping of sample $X$ to a leaf node of the tree, whose corresponding leaf score is $\omega_{q(X)}$.
In the second step, the objective function is regularized:
$$L(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$
where $\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert \omega \rVert^2$ and $T$ is the number of leaf nodes. After $t$ iterations we have $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(X_i)$, so the objective function can be written as:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(X_i)\right) + \Omega(f_t)$$
In the third step, a second-order Taylor expansion gives the approximation:
$$L^{(t)} \simeq \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(X_i) + \frac{1}{2} h_i f_t^2(X_i) \right] + \Omega(f_t)$$
where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to $\hat{y}_i^{(t-1)}$. Removing the constant term simplifies this to:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(X_i) + \frac{1}{2} h_i f_t^2(X_i) \right] + \Omega(f_t)$$
In the fourth step, the objective function is optimized:
$$\tilde{L}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
where $G_j = \sum_{i \in I_j} g_i$, $H_j = \sum_{i \in I_j} h_i$, and $I_j$ is the set of samples assigned to leaf $j$.
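As a small worked example of the fourth step, the sketch below computes the per-leaf sums G_j and H_j, the standard optimal leaf weight -G_j / (H_j + λ), and the resulting structure score for a fixed two-leaf tree. It assumes a squared-error loss l = ½(y − ŷ)², for which g_i = ŷ_i − y_i and h_i = 1. That loss, the function names, and the toy numbers are assumptions for illustration; the study itself used a classifier.

```python
def leaf_stats(y_true, y_prev):
    # For squared-error loss 0.5 * (y - y_hat)^2 (assumed here):
    # g_i = y_hat_i - y_i and h_i = 1 for every sample in the leaf.
    G = sum(p - y for y, p in zip(y_true, y_prev))
    H = float(len(y_true))
    return G, H


def optimal_weight(G, H, lam):
    # Leaf weight that minimizes G*w + 0.5*(H + lam)*w^2.
    return -G / (H + lam)


def structure_score(leaves, lam, gamma):
    # leaves is a list of (G_j, H_j) pairs; lower scores mean a better tree.
    fit = sum(G * G / (H + lam) for G, H in leaves)
    return -0.5 * fit + gamma * len(leaves)


# Toy data: previous-round predictions are all 0.0, the tree has two leaves.
leaf_a = leaf_stats(y_true=[1.0, 2.0], y_prev=[0.0, 0.0])  # (-3.0, 2.0)
leaf_b = leaf_stats(y_true=[4.0], y_prev=[0.0])            # (-4.0, 1.0)
w_a = optimal_weight(*leaf_a, lam=1.0)   # 1.0
w_b = optimal_weight(*leaf_b, lam=1.0)   # 2.0
score = structure_score([leaf_a, leaf_b], lam=1.0, gamma=0.0)  # -5.5
```

With λ = 1 the leaf weights are shrunk toward zero compared with the plain leaf means (1.5 and 4.0), which is the regularizing effect of the Ω term.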
We applied the XGBoost algorithm as follows. First, we used accuracy to validate the models and assess their quality. The dataset was divided into two subsets to ensure the effectiveness of the training process and to find the best-fitted model based on accuracy. We trained the XGBoost model on 80% of the data (training set) to find the best combination of parameters. The remaining 20% (test set) was used to check the performance of the best-fitted model and confirm its accuracy. Next, the relevance of features was assessed with the trained XGBoost classifier. We applied the permutation importance method on the test set: the values in the column of a single feature are permuted, the trained model is rerun, and the resulting change in accuracy is taken as the importance score [61,62]. This approach is more accurate than Gini-based importance [63]. The computation was done with the rfpimp package in Python, and the resulting feature importance is presented as a histogram.
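The permutation importance idea can be sketched without any external package. The code below is an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for the trained XGBoost classifier, the test data are invented, and averaging over `n_repeats` shuffles is a choice made for this example rather than a description of what rfpimp does internally.

```python
import random


def accuracy(model, X, y):
    # Fraction of rows whose predicted label matches the true label.
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one feature column while leaving the others intact.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            # Importance is the drop in accuracy caused by the shuffle.
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def toy_model(row):
    # Hypothetical "trained" classifier that only looks at feature 0.
    return 1 if row[0] >= 3 else 0


X_test = [[1, 5], [2, 1], [4, 2], [5, 4], [1, 3], [5, 1]]
y_test = [0, 0, 1, 1, 0, 1]
baseline, importances = permutation_importance(toy_model, X_test, y_test)
```

Because the toy model ignores feature 1, shuffling that column never changes a prediction and its importance is exactly zero, whereas shuffling feature 0 lowers accuracy.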

4. Results

4.1. Data Crawling and Pre-Processing

In this paper, we used the Octopus crawler software to collect data from four categories in the fresh produce section of the JD platform (www.jd.com (accessed on 27 February 2022)): fresh fruit; seafood and aquatic products; selected meat; and vegetables and eggs. We obtained the product reviews of the top-ranked products in each of the four categories and then applied the text pre-processing steps, including text de-duplication, phrase deletion, text splitting, stop word removal, and n-grams. The final sample consisted of 9243 reviews.

4.2. Topic Extraction

For all reviews of fresh produce, we used LDA to analyze the terms and topics across the corpus and generate a topic model. Each topic can be interpreted as a set of words that are thematically or semantically coherent, and the words with the highest probability under a topic usually indicate what the topic is about. In this study, three topic model diagnostic measures suggested by McCallum [64], namely exclusivity, coherence, and corpus distance, were used to evaluate the topics. Exclusivity measures the degree to which a topic’s top words do not appear as top words in other topics. A high value indicates that the words are more likely to appear only under that particular topic. Coherence, a log probability value that is typically negative, indicates how often a topic’s top words co-occur: values closer to zero indicate words that tend to co-occur more often, whereas large negative values indicate words that are not often seen together. Finally, corpus distance treats all reviews as a single general topic and measures the distance of each topic from it. Smaller values denote a more general topic, while larger values denote a more specific topic.
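The coherence diagnostic can be illustrated with a UMass-style score, which sums log ratios of document co-occurrence counts over pairs of a topic's top words. The sketch below assumes that particular formula with a smoothing constant of 1; the exact smoothing and word ordering in the diagnostic tool used in the study may differ, and the toy documents are invented.

```python
import math


def umass_coherence(top_words, docs, smoothing=1.0):
    # top_words: a topic's top words, ordered from most to least probable.
    # Every top word is assumed to occur in at least one document.
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_occurrence = doc_freq(top_words[m], top_words[l])
            score += math.log((co_occurrence + smoothing)
                              / doc_freq(top_words[l]))
    return score


docs = [
    ["delivery", "fast", "fresh"],
    ["delivery", "fast"],
    ["price", "cheap"],
    ["delivery", "price"],
]
coherent = umass_coherence(["delivery", "fast"], docs)     # log(3/3) = 0.0
incoherent = umass_coherence(["delivery", "cheap"], docs)  # log(1/3)
```

The pair that appears together in two of the three "delivery" reviews scores 0, while the pair that never co-occurs scores about -1.10, matching the reading that values closer to zero indicate more coherent topics.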
Following earlier research [65,66,67], after removing some redundant topics, we categorized all the reviews into 10 topic variables: Shipping speed (T1), Logistics speed (T2), Delivery speed (T3), Logistics packaging (T4), Product quality (T5), Product price (T6), Product brand (T7), Marketing strategy (T8), After-sales strategy (T9), and Impact of COVID-19 (T10). Finally, as shown in Table 1, we grouped these 10 topics into four categories, namely logistics factors, product factors, platform factors, and epidemic factors. The keyword weights for each topic and the relevant metrics for the topic model diagnostics are presented in Table 1 and Table 2.
Logistics factors include the speed and timeliness of the delivery of fresh produce. During the epidemic prevention period, traffic control in many areas, road closures, and low logistics resumption rates disrupted the transport of produce, making the timely delivery of goods a major concern for consumers. In addition, the outer packaging and the means of preserving the freshness of the goods are also key to ensuring the quality of logistics and transport. The product factor contains information on the taste, freshness, and price of the fresh produce. The epidemic has had a serious impact on the supply chain and logistics, and it has also affected the quality of the products, making good-quality products at a good price an urgent demand of consumers. In addition, information about the product brand also appears in consumer reviews, and a good brand will be trusted and followed by customers for a long time. The platform factor encompasses marketing promotions and after-sales service. Discount campaigns are popular with consumers, and after-sales service and attitude influence consumer perceptions, which is of increasing concern during the epidemic. The epidemic factor includes terms such as “outbreak” and “period”, representing the ongoing COVID-19 epidemic.

4.3. Path Model Construction Using PLS-SEM

After classifying all topics under latent variables, we used PLS-SEM to further confirm the causal relationships, i.e., to identify the impact of epidemic factors on the production, distribution, and sale of fresh produce and the impact of each influencing factor on consumer satisfaction. In the LDA model, each document in the corpus is represented as a probability distribution over topics. To make these probabilities comparable with the rating of each review (on a scale from 1 to 5), we normalized the generated probabilities to a scale from 1 to 5. We used the logistics factor, the product factor, the platform factor, and the epidemic factor, together with the satisfaction factor, as latent variables, and the 10 topics, together with the consumer satisfaction rating, as observed variables in the PLS-SEM analysis. The theoretical framework was tested with the PLS-SEM approach using a two-step process comprising measurement model analysis and structural model analysis. First, the measurement model was used to test the relationship between each construct and its indicators. Then, the causal relationships among the constructs, i.e., the hypotheses of the theoretical model, were tested.
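The text does not state which normalization was used to map topic probabilities onto the 1 to 5 scale. One plausible reading is a per-topic min-max rescaling, sketched below; the function name `rescale_to_likert` and the handling of a constant column are assumptions made for illustration.

```python
def rescale_to_likert(values, low=1.0, high=5.0):
    """Min-max rescale a list of topic probabilities onto [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information,
        # so every value is mapped to the midpoint of the scale.
        return [(low + high) / 2 for _ in values]
    return [low + (high - low) * (v - lo) / (hi - lo) for v in values]
```

Applied to the probabilities of one topic across all reviews, the smallest probability maps to 1 and the largest to 5, so each topic variable shares the range of the star rating.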

4.4. Measurement Model

The measurement model was assessed in terms of single-item reliability, convergent validity, and discriminant validity. As shown in Table 3, single-item reliability was analyzed through factor loadings; all loadings exceeded 0.5 [68], indicating that all items passed the test. To assess convergent validity, we examined the AVE and CR values in Table 3: all AVE values were greater than the threshold of 0.5 [69] and all CR values were greater than 0.7 [70], indicating acceptable convergent validity and reliability. In addition, this study tested discriminant validity using the Fornell and Larcker criterion, which compares the square root of the AVE of each construct with the correlations between that construct and all other constructs. As the square root of each AVE in this dataset was greater than the correlations with the other latent variables [71] (see Table 4), there is sufficient discriminant validity for all constructs.
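The three checks described above follow standard formulas for standardized loadings, and a small sketch may help readers reproduce them. The function names are hypothetical, and any loadings or correlations passed in are the reader's own; no values from Table 3 or Table 4 are reproduced here.

```python
import math


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: squared sum of loadings over itself plus the summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def passes_fornell_larcker(ave_by_construct, correlations):
    """True if sqrt(AVE) of each construct exceeds all of its correlations.

    `correlations` maps a pair of construct names to their correlation.
    """
    for (a, b), r in correlations.items():
        if abs(r) >= math.sqrt(ave_by_construct[a]):
            return False
        if abs(r) >= math.sqrt(ave_by_construct[b]):
            return False
    return True
```

A construct would pass the thresholds cited above when `average_variance_extracted` returns more than 0.5 and `composite_reliability` returns more than 0.7.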

4.5. Structural Model

Furthermore, the predictive quality of the model was assessed by calculating its cross-validated predictive relevance, based on the Stone-Geisser Q² value, which ranged from 0.184 to 0.224 (see Table 5). All Stone-Geisser Q² values are greater than zero, indicating that the model has predictive relevance. The model fit was also evaluated using the standardized root mean square residual (SRMR), which compares the observed correlation matrix with the model-implied correlation matrix. SRMR is defined as the square root of the mean of the squared differences between the model-implied and empirical correlation matrices. The value of SRMR in this research is 0.076, which is below the maximum level of 0.100 [72]. This result (see Table 5) indicates that the overall model is acceptable. The value of NFI is 0.913 (see Table 5); NFI values closer to 1 indicate better model fit [73]. All VIF scores ranged from 1.104 to 1.979 (see Table 5), below the maximum level of 5 [74]. Thus, the collinearity is acceptable according to Shiau and Chau [75].
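As a rough illustration of two of the fit statistics discussed here, the sketch below computes SRMR from a pair of correlation matrices and VIF from an R² value. The function names are hypothetical, and averaging over the lower triangle including the diagonal is one common convention that is assumed here; software packages may differ in this detail.

```python
import math


def srmr(observed, implied):
    """Root mean square of residuals between two correlation matrices.

    Uses the lower triangle including the diagonal (assumed convention).
    """
    p = len(observed)
    squared = [
        (observed[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    ]
    return math.sqrt(sum(squared) / len(squared))


def vif(r_squared):
    """Variance inflation factor of a predictor, given the R² obtained
    when regressing that predictor on the remaining predictors."""
    return 1.0 / (1.0 - r_squared)
```

Under the thresholds cited above, a model would be considered acceptable when `srmr` returns less than 0.100 and every `vif` value is below 5.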

4.6. Hypothesis Testing

Building on the measurement model results shown in Table 3 and Table 4, we tested the seven hypotheses using PLS-SEM; the results are reported in Table 5. The path coefficients (β) and t-values in Table 5 demonstrate that all of the hypotheses are supported. In addition, according to Hair et al. [76], the effect size (f²) is classified as weak (≥0.02), moderate (≥0.15), or strong (≥0.35). H1 predicted an effect of the epidemic factors on the logistics factors; this effect was significant and larger than the effect of the epidemic factors on the other factors (β = 0.281, p < 0.001). For H2, the effect of epidemic factors on product factors was also significant (β = 0.232, p < 0.001). For H3, the effect of epidemic factors on platform factors was also significant (β = 0.096, p < 0.01). For H4, there was a significant effect of logistics factors on satisfaction with fresh produce, and this effect was larger than that of the other factors (β = 0.144, p < 0.001). For H5, there was a significant effect of product factors on satisfaction with fresh produce (β = 0.121, p < 0.001). For H6, the platform factors had a significant effect on satisfaction with fresh produce (β = 0.102, p < 0.001). For H7, the epidemic factors had a significant negative impact on consumer satisfaction (β = −0.113, p < 0.01). H1 through H7 were all supported by the PLS-SEM results. Table 5 and Figure 5 show the results of the hypothesis tests.
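The effect size thresholds cited above can be applied with the usual f² formula, which compares the R² of an endogenous construct with and without a given predictor. The sketch below is illustrative only: the function names and the "negligible" label for values below 0.02 are assumptions, and no R² values from the study are used.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size f² of one predictor on an endogenous construct."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def effect_size_label(f2):
    """Map f² to the weak / moderate / strong bands quoted in the text."""
    if f2 >= 0.35:
        return "strong"
    if f2 >= 0.15:
        return "moderate"
    if f2 >= 0.02:
        return "weak"
    return "negligible"  # assumed label for values below the weak band
```

For instance, a predictor whose removal lowers R² from 0.30 to 0.25 has an f² of about 0.07, which falls in the weak band.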

4.7. XGBoost-Based Ranking of Topic Variables

The ten topic variables, T1 to T10, are considered. The XGBoost model, as shown in Figure 4, has ten input features (T1, T2, T3, T4, T5, T6, T7, T8, T9, T10) and one output feature (consumer satisfaction). After a tuning process that included determining the best parameter combination and validating its performance, the model achieved an accuracy of 0.76 on the test dataset with 100 boosting rounds, a learning rate of 0.3, a subsample ratio of 1, and a maximum depth of 6.
As shown in Figure 6, the XGBoost results indicate that the order of influence on fresh produce e-commerce consumer satisfaction is Logistics time (T2) > Shipping speed (T1) > Product quality (T5) > Delivery speed (T3) > After-sales strategy (T9) > Logistics packaging (T4) > Product price (T6) > Impact of COVID-19 (T10) > Marketing strategy (T8) > Product brand (T7). The XGBoost results also differ somewhat from those of PLS-SEM, in that Product quality and After-sales strategy show a higher impact. The difference may be due to the oversimplification inherent in the linear, compensatory structural equation model compared with the non-linear XGBoost. This also reflects the value of combining structural equation modeling with XGBoost analysis. The influence of each topic variable on fresh produce satisfaction can be further refined based on the importance ranking, providing a basis for decisions regarding fresh produce.

5. Discussion and Implication

H1 hypothesized a link between epidemic factors and logistics factors. We found that the epidemic factors had a significant impact on logistics factors and that this impact was more pronounced than their impact on the other factors. The reasons are that fresh produce e-commerce operations are not yet mature enough to cope with market changes, market inventory is insufficient, and logistics transit is poor. In the current context of COVID-19, policymakers should be aware that sustainable food supply and logistics systems are essential when dealing with unexpected shocks. In this regard, the emergency response mechanism for fresh agricultural products should be accelerated and improved, and consideration should be given to establishing a corresponding emergency command center for agricultural products, allocating fresh agricultural products in emergency situations, and carrying out multi-channel stockpiling to establish a sustainable supply system. In addition, the adoption of intelligent technology in the logistics industry should be accelerated: unmanned parks and unmanned terminals should be constructed, and intelligent operations, unmanned technology, logistics robots, and other equipment should be applied to build a sustainable logistics system.
Hypothesis H2 tested the relationship between epidemic factors and product factors. We found that epidemic factors had a significant impact on product factors. Fruits and vegetables require strict storage conditions and are prone to spoilage and loss of freshness when stored at room temperature for long periods of time. The shortage of labor in the logistics industry caused by the epidemic, as well as traffic risk control measures, has deeply affected the transportation of goods, further affecting the quality of fresh produce. On this basis, vaccination of procurement and transportation personnel and regular COVID-19 testing of key personnel could be prioritized to ensure the stability of the relevant labor supply. At the same time, when an epidemic occurs, appropriate road control measures should be taken to safeguard the flow of produce and ensure its freshness and quality, thereby ensuring supply chain stability.
Hypothesis H3 examined the relationship between epidemic factors and platform factors. The results indicate that the epidemic affects platform factors. The epidemic has caused consumers to switch from offline shopping to online purchasing, with a large increase in online fresh produce users, and platform activities and after-sales service are playing an increasingly important role in the production and sale of fresh produce. In response, consideration should first be given to increasing online promotions, such as celebrity-led promotions, to market good-quality and inexpensive produce and help sales. Second, to avoid losing users, a good after-sales service experience is key. It is particularly important to maintain a good service attitude and to respond to consumer concerns in a timely manner, especially with regard to logistics delivery and distribution during the epidemic. Overall, platforms should give full consideration to the consumer experience, meet consumers’ individual needs, and promote the sustainable marketing of fresh produce e-commerce platforms during COVID-19.
Hypothesis H4 tested the relationship between logistics factors and consumer satisfaction. We found that logistics factors have the largest effect on consumer satisfaction among the factors studied. Logistics is an extremely critical factor affecting consumer satisfaction, and consumers shopping online through O2O fresh produce e-commerce expect merchants to deliver in a timely and accurate manner. In response, it is particularly urgent to build a modern logistics system. First, the industrialization of the rural digital economy should be accelerated by implementing strategies such as “Internet+” agriculture and “digital countryside”, accelerating the construction of e-commerce platforms and cold chain logistics, building networks and urban and rural service centers, and solving problems such as the poor circulation of agricultural products. Second, the in-depth integration of online and offline channels should be promoted by supporting cold-chain home delivery, online shop self-pick-up, convenience store delivery, community direct distribution, and other delivery methods, thereby opening up the “last mile” channel for the circulation of agricultural products. Third, green and efficient express packaging technology should be developed to reduce product loss, save costs, and protect the environment.
Hypothesis H5 investigated the relationship between product factors and consumer satisfaction. We found that product factors have a significant effect on consumer satisfaction. The reason is that freshness reflects the quality of the fresh produce itself and is an important factor in consumers’ evaluation of that quality. Green, organic, pollution-free, fresh, and good-tasting produce is the most attractive to consumers. In this regard, merchants should strictly control the quality of their fresh produce, because quality assurance is the core concern of consumers. If the quality of fresh produce sold by merchants does not meet consumers’ expectations, consumers’ trust and satisfaction will decline even if merchants provide good services and low prices. This is why businesses need to carefully screen their fresh produce from the start of the supply chain, maintain consistent product quality online and offline, and provide consumers with a diverse selection of fresh, organic, green, safe, reliable, and well-packaged fresh produce. It is especially important to strengthen the supervision and control of the food production process to ensure food safety and nutritional value, to eliminate artificial additives, and to use environmentally friendly production and packaging. For example, the October Paddy Organic Rice on the JD platform is designed to enhance the consumer experience by strictly controlling quality, creating a higher-quality brand, and stimulating consumption.
Hypothesis H6 investigated the link between platform factors and consumer satisfaction. According to our findings, the platform factors had a significant effect on consumer satisfaction. The reason is that on an e-commerce platform, different merchants selling the same type of goods are easily clustered together by search engines and price comparison functions, so price and after-sales service are important criteria for consumers when choosing a merchant. In response, after-sales service should be improved to recover user experiences that have fallen short of expectations. After-sales service is very important for fresh produce e-commerce. In this regard, consideration can be given to absorbing surplus rural labor and increasing the number of customer service staff so that any questions and suggestions raised by consumers can be answered promptly. In addition, the after-sales process should be optimized and returns and exchanges should be carefully audited to establish a good reputation for the platform.
Hypothesis H7 tested the relationship between epidemic factors and consumer satisfaction. We found that the epidemic factors had a significant negative effect on consumer satisfaction. In this regard, it is especially important to strengthen supply chain guarantees and improve product quality and the customer after-sales experience.
From the XGBoost model, we find that two factors, product quality and after-sales strategy, show a high impact on consumer satisfaction. This reflects that product quality is the core competitiveness of fresh products and that after-sales service provides a considerable degree of psychological security. In this regard, it is suggested that the future transport arrangements of logistics enterprises be further improved, covering not only the selection of transport equipment, the monitoring of the transport process, and the maintenance of transport tools and equipment, but also the supervision of the quality of fresh e-commerce products throughout the whole process. An efficient transportation plan will guarantee the quality of products in emergency situations. Platforms should also provide an immediate and effective after-sales guarantee to solve product problems promptly and improve customer satisfaction.

6. Conclusions and Limitations

During COVID-19, the logistics and transportation of fresh produce, product quality, and platform operations were all severely affected. Exploring the specific impact of COVID-19 and the factors that influence consumer satisfaction can help companies grow and become more resilient in the COVID-19 environment. Text mining of reviews has proven to be a useful tool for consumer satisfaction analysis. In contrast to traditional questionnaire-based surveys, text mining can retrieve knowledge from the burgeoning volumes of unstructured data. Consumer review data represent immediate consumer responses and are available at low cost over long periods. Economists are already aware of the impact of factors such as logistics and product on satisfaction with fresh produce and have studied it. However, the impact of COVID-19 on fresh produce production and marketing needs further research. What factors affect consumer satisfaction and how to facilitate the long-term development of e-commerce for fresh produce are questions that must be addressed. To this end, this study proposes a hybrid methodology based on LDA-SEM-XGBoost to study consumer satisfaction with fresh produce in the context of COVID-19.
The specific findings of this paper are as follows. First, we crawled consumer review data from JD, China’s largest e-commerce consumer platform, and after five data pre-processing steps (text de-duplication, phrase deletion, text splitting, stop word removal, and n-gram extraction), we obtained a final sample of 9243 reviews. The LDA topic model was then applied to extract 10 topic variables (shipping speed, logistics time, delivery speed, logistics packaging, product quality, product price, product brand, marketing strategy, after-sales strategy, and impact of COVID-19), which were classified into four major factors: logistics, product, platform, and epidemic. Next, PLS-SEM was developed to explore the causal relationships between the factors, including the impact of the epidemic factors on the logistics, product, and platform factors, and the impact of each factor on consumer satisfaction. Finally, the XGBoost model was applied to further rank the importance of each topic variable for consumer satisfaction. The results showed that epidemic factors had a significant effect on the logistics factors (β = 0.281, p < 0.001), the product factors (β = 0.232, p < 0.001), and the platform factors (β = 0.096, p < 0.01), with the greatest effect on the logistics factors. Logistics factors (β = 0.144, p < 0.001), product factors (β = 0.121, p < 0.001), platform factors (β = 0.102, p < 0.01), and epidemic factors (β = −0.113, p < 0.01) all had an impact on satisfaction, with the logistics factors having the greatest impact. The topic variables affecting consumer satisfaction with fresh produce e-commerce are, in order: logistics time (T2) > shipping speed (T1) > product quality (T5) > delivery speed (T3) > after-sales strategy (T9) > logistics packaging (T4) > product price (T6) > impact of COVID-19 (T10) > marketing strategy (T8) > product brand (T7). Based on the above findings, we have made some recommendations for the production and marketing of fresh produce.
However, the study still has some shortcomings. It focuses only on fresh produce as a whole, which is a broad concept that includes fresh fruit, seafood, aquatic products, selected meat, vegetables, and eggs, so future research could be refined to examine the differences that may exist between the various types of produce. In addition, in terms of methodology, the linear relationship between factors and consumer satisfaction was not considered. In the future, methods such as logistic regression and SHAP values could be used to investigate this issue.

Author Contributions

G.G. designed this article; D.L. and J.Z. wrote this article. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of Humanities and Social Science Project (No.22YJAZH023).

Institutional Review Board Statement

Ethical review and approval were waived for this study according to Article 13(6) of the Law of the People’s Republic of China on the Protection of Personal Information and Article 27 of the Security Management Measures issued by the Internet Information Office of China. The research uses data that are legally and publicly available on online platforms, collected from legal and public channels, not obviously against the will of the subjects of the personal information, and anonymized; it does not involve personal privacy or ethical issues.

Informed Consent Statement

Patient consent was waived for this study according to Article 13(6) of the Law of the People’s Republic of China on the Protection of Personal Information and Article 27 of the Security Management Measures issued by the Internet Information Office of China. The research uses data that are legally and publicly available on online platforms, collected from legal and public channels, not obviously against the will of the subjects of the personal information, and anonymized; it does not involve personal privacy.

Data Availability Statement

Data can be provided upon reasonable request.

Acknowledgments

We would like to thank the reviewers, whose comments greatly improved the manuscript.

Conflicts of Interest

The authors declare they have no known competing financial interests or personal relationships that could influence the work reported in this paper.

References

  1. Pu, M.; Chen, X.; Zhong, Y. Overstocked Agricultural Produce and Emergency Supply System in the COVID-19 Pandemic: Responses from China. Foods 2021, 10, 3027.
  2. Sheth, J. Impact of COVID-19 on consumer behavior: Will the old habits return or die? J. Bus. Res. 2020, 117, 280–283.
  3. Chetty, R.; Friedman, J.; Hendren, N.; Stepner, M. The Opportunity Insights Team. The Economic Impacts of COVID-19: Evidence from a New Public Database Built Using Private Sector Data. NBER Work. Pap. 2020, 27431, 1–84.
  4. Yanyan, W. Empirical Analysis of Factors Influencing Consumers’ Satisfaction in Online Shopping Agricultural Products in China. J. Electron. Commer. Organ. 2018, 16, 64–77.
  5. Zeng, Y.; Jia, F.; Wan, L.; Guo, H. E-commerce in agri-food sector: A systematic literature review. Int. Food Agribus. Manag. Rev. 2017, 20, 439–460.
  6. Han, B.R.; Sun, T.; Chu, L.Y.; Wu, L. COVID-19 and E-Commerce Operations: Evidence from Alibaba. Manuf. Serv. Oper. Manag. 2022, 24, 1388–1405.
  7. Cang, Y.-M.; Wang, D.-C. A comparative study on the online shopping willingness of fresh agricultural products between experienced consumers and potential consumers. Sustain. Comput. Inform. Syst. 2021, 30, 100493.
  8. Beckman, J.; Countryman, A.M. The Importance of Agriculture in the Economy: Impacts from COVID-19. Am. J. Agric. Econ. 2021, 103, 1595–1611.
  9. Lin, Y.; Marjerison, R.K.; Choi, J.; Chae, C. Supply Chain Sustainability during COVID-19: Last Mile Food Delivery in China. Sustainability 2022, 14, 1484.
  10. Agrahari, R.; Mohanty, S.; Vishwakarma, K.; Nayak, S.K.; Samantaray, D.; Mohapatra, S. Update vision on COVID-19: Structure, immune pathogenesis, treatment and safety assessment. Sensors Int. 2021, 2, 100073.
  11. Andersen, K.G.; Rambaut, A.; Lipkin, W.I.; Holmes, E.C.; Garry, R.F. The proximal origin of SARS-CoV-2. Nat. Med. 2020, 26, 450–452.
  12. Baek, H.; Ahn, J.; Choi, Y. Helpfulness of Online Consumer Reviews: Readers’ Objectives and Review Cues. Int. J. Electron. Commer. 2012, 17, 99–126.
  13. Singh, N.; Hu, C.; Roehl, W.S. Text mining a decade of progress in hospitality human resource management research: Identifying emerging thematic development. Int. J. Hosp. Manag. 2007, 26, 131–147.
  14. Stepchenkova, S.; Kirilenko, A.P.; Morrison, A.M. Facilitating Content Analysis in Tourism Research. J. Travel Res. 2008, 47, 454–469.
  15. Marusak, A.; Sadeghiamirshahidi, N.; Krejci, C.C.; Mittal, A.; Beckwith, S.; Cantu, J.; Morris, M.; Grimm, J. Resilient regional food supply chains and rethinking the way forward: Key takeaways from the COVID-19 pandemic. Agric. Syst. 2021, 190, 103101.
  16. Xu, Z.; Elomri, A.; Kerbache, L.; El Omri, A. Impacts of COVID-19 on global supply chains: Facts and perspectives. IEEE Eng. Manag. Rev. 2020, 48, 153–166.
  17. Chen, J.; Zhang, Y.; Zhu, S.; Liu, L. Does COVID-19 Affect the Behavior of Buying Fresh Food? Evidence from Wuhan, China. Int. J. Environ. Res. Public Health 2021, 18, 4469.
  18. Scacchi, A.; Catozzi, D.; Boietti, E.; Bert, F.; Siliquini, R. COVID-19 Lockdown and Self-Perceived Changes of Food Choice, Waste, Impulse Buying and Their Determinants in Italy: QuarantEat, a Cross-Sectional Study. Foods 2021, 10, 306.
  19. de Oliveira, W.Q.; de Azeredo, H.M.C.; Neri-Numa, I.A.; Pastore, G.M. Food packaging wastes amid the COVID-19 pandemic: Trends and challenges. Trends Food Sci. Technol. 2021, 116, 1195–1199.
  20. Jribi, S.; Ben Ismail, H.; Doggui, D.; Debbabi, H. COVID-19 virus outbreak lockdown: What impacts on household food wastage? Environ. Dev. Sustain. 2020, 22, 3939–3955.
  21. Lingyu, M.; Lauren, C.; Zhijie, D. Strategic Development of Fresh E-Commerce with Respect to New Retail. In Proceedings of the 2019 IEEE 16th International Conference on Networking, Banff, AB, Canada, 9–11 May 2019; pp. 373–378.
  22. Huang, J.; Xu, W.; Wei, H.; Wan, H. Study on Consumers’ Satisfaction Degree and Influencing Factors of Online Shopping for Agricultural Products. In Proceedings of the 2014 International Conference on Mechatronics, Electronic, Industrial and Control Engineering (MEIC-14), Shenyang, China, 15–17 November 2014.
  23. Liu, X.; Kao, Z. Research on influencing factors of customer satisfaction of e-commerce of characteristic agricultural products. Procedia Comput. Sci. 2022, 199, 1505–1512.
  24. Yongqing, Y.; Nan, L.; Meijian, L.; Shanshan, L. Study on the Effects of Logistics Service Quality on Consumers’ Post-Purchase Behavior of Online Shopping. Int. J. Adv. Inf. Sci. Serv. Sci. 2011, 3, 241–247.
  25. Berezan, O.; Raab, C.; Yoo, M.; Love, C. Sustainable hotel practices and nationality: The impact on guest satisfaction and guest intention to return. Int. J. Hosp. Manag. 2013, 34, 227–233.
  26. Resano, H.; Pérez-Cueto, F.J.A.; Sanjuán, A.I.; de Barcellos, M.D.; Grunert, K.G.; Verbeke, W. Consumer satisfaction with dry-cured ham in five European countries. Meat Sci. 2011, 87, 336–343.
  27. Park, J.-E.; Kang, E. The Mediating Role of Eco-Friendly Artwork for Urban Hotels to Attract Environmental Educated Consumers. Sustainability 2022, 14, 3784.
  28. Bornstein, M.H.; Jager, J.; Putnick, D.L. Sampling in developmental science: Situations, shortcomings, solutions, and standards. Dev. Rev. 2013, 33, 357–370.
  29. Kim, Y.-J.; Kim, H.-S. The Impact of Hotel Customer Experience on Customer Satisfaction through Online Reviews. Sustainability 2022, 14, 848.
  30. Jung, Y.; Suh, Y. Mining the voice of employees: A text mining approach to identifying and analyzing job satisfaction factors from online employee reviews. Decis. Support Syst. 2019, 123, 113074.
  31. Nilashi, M.; Abumalloh, R.A.; Alghamdi, A.; Minaei-Bidgoli, B.; Alsulami, A.A.; Thanoon, M.; Asadi, S.; Samad, S. What is the impact of service quality on customers’ satisfaction during COVID-19 outbreak? New findings from online reviews analysis. Telemat. Inform. 2021, 64, 101693.
  32. Shah, A.M.; Yan, X.; Tariq, S.; Ali, M. What patients like or dislike in physicians: Analyzing drivers of patient satisfaction and dissatisfaction using a digital topic modeling approach. Inf. Process. Manag. 2021, 58, 102516.
  33. Aktas-Polat, S.; Polat, S. Discovery of factors affecting tourists’ fine dining experiences at five-star hotel restaurants in Istanbul. Br. Food J. 2021, 124, 221–238.
  34. Hong, W.; Zheng, C.; Wu, L.; Pu, X. Analyzing the Relationship between Consumer Satisfaction and Fresh E-Commerce Logistics Service Using Text Mining Techniques. Sustainability 2019, 11, 3570.
  35. Xin, X.; Jiaying, C. Research on the Influence of E-commerce service quality of fresh Agricultural products on customer satisfaction. E3S Web Conf. 2020, 189, 01022.
  36. Du, H.; Chen, X. Study on vegetable emergency support technology under epidemic. IOP Conf. Ser. Earth Environ. Sci. 2020, 615, 012007.
  37. Al-Hawari, A.R.R.S.; Balasa, A.P.; Slimi, Z. COVID-19 Impact on Online Purchasing Behaviour in Oman and the Future of Online Groceries. Eur. J. Bus. Manag. Res. 2021, 6, 74–83.
  38. Din, A.U.; Han, H.; Ariza-Montes, A.; Vega-Muñoz, A.; Raposo, A.; Mohapatra, S. The Impact of COVID-19 on the Food Supply Chain and the Role of E-Commerce for Food Purchasing. Sustainability 2022, 14, 3074.
  39. Bian, X.; Yao, G.; Shi, G. Social and natural risk factor correlation in China’s fresh agricultural product supply. PLoS ONE 2020, 15, e0232836.
  40. Chen-Yu, J.; Kim, J.; Lin, H.-L. Antecedents of product satisfaction and brand satisfaction at product receipt in an online apparel shopping context. J. Glob. Fash. Mark. 2017, 2000, 1–13.
  41. Chen, Z.; Dubinsky, A.J. A conceptual model of perceived customer value in e-commerce: A preliminary investigation. Psychol. Mark. 2003, 20, 323–347.
  42. Keeney, R.L. The Value of Internet Commerce to the Customer. Manag. Sci. 1999, 45, 533–542.
  43. Maxwell, S.; Maxwell, N. Channel reference prices: The potentially damaging effects of Napster. In Proceedings of the 2001 Fordham University Behavioral Pricing Conference, New York, NY, USA, 2 June 2021; Volume 32, pp. 104–110.
  44. Biswas, A.; Blair, E.A. Contextual effects of reference prices in retail advertisements. J. Mark. 1991, 55, 1.
  45. Haque, A.; Khatibi, A.; Mahmud, S.A. Factors determinate customer shopping behaviour through internet: The Malaysian case. Aust. J. Basic Appl. Sci. 2009, 3, 3452–3463.
  46. Kim, S.; Stoel, L. Apparel retailers: Website quality dimensions and satisfaction. J. Retail. Consum. Serv. 2004, 11, 109–117.
  47. Seyed, R.; Farzana, Y.; Ahasanul, H.; Ali, K. Study on consumer perception toward e-ticketing: Empirical study in Malaysia. Indian J. Commer. Manag. Stud. 2011, 2, 3–13.
  48. Khristianto, W.; Kertahadi, I.; Suyadi, I. The influence of information, system and service on customer satisfaction and loyalty in online shopping. Int. J. Acad. Res. 2012, 4, 28–32.
  49. Jelodar, H.; Wang, Y.; Yuan, C.; Feng, X. Latent Dirichlet Allocation (LDA) and Topic modeling: Models, applications, a survey. Multimed. Tools Appl. 2018, 78, 15169–15211.
  50. Bastani, K.; Namavari, H.; Shaffer, J. Latent Dirichlet allocation (LDA) for topic modeling of the CFPB consumer complaints. Expert Syst. Appl. 2019, 127, 256–271.
  51. Çallı, L.; Çallı, F. Understanding Airline Passengers during COVID-19 Outbreak to Improve Service Quality: Topic Modeling Approach to Complaints with Latent Dirichlet Allocation Algorithm. Transp. Res. Rec. J. Transp. Res. Board 2022.
  52. Wang, W.; Feng, Y.; Dai, W. Topic analysis of online reviews for two competitive products using latent Dirichlet allocation. Electron. Commer. Res. Appl. 2018, 29, 142–156.
  53. Çalli, L.; Çalli, F.; Çalli, B.A. Yönetim Bilişim Sistemleri Disiplininde Hazırlanan Lisansüstü Tezlerin Gizli Dirichlet Ayrımı Algoritmasıyla Konu Modellemesi. MANAS Sos. Araştırmalar Derg. 2021, 10, 2355–2372.
  54. Hu, N.; Zhang, T.; Gao, B.; Bose, I. What do hotel customers complain about? Text analysis using structural topic model. Tour. Manag. 2019, 72, 417–426.
  55. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
  56. Hair, J.F.; Ringle, C.M.; Sarstedt, M. PLS-SEM: Indeed a Silver Bullet. J. Mark. Theory Pract. 2011, 19, 139–152.
  57. Ji, S.; Wang, X.; Zhao, W.; Guo, D. An Application of a Three-Stage XGBoost-Based Model to Sales Forecasting of a Cross-Border E-Commerce Enterprise. Math. Probl. Eng. 2019, 2019, 1–15.
  58. Alotaibi, Y.; Malik, M.N.; Khan, H.H.; Batool, A.; Islam, S.U.; Alsufyani, A.; Alghamdi, S. Suggestion Mining from Opinionated Text of Big Social Media Data. Comput. Mater. Contin. 2021, 68, 3323–3338. [Google Scholar] [CrossRef]
  59. Huang, Y.-P.; Yen, M.-F. A new perspective of performance comparison among machine learning algorithms for financial distress prediction. Appl. Soft Comput. 2019, 83, 105663. [Google Scholar] [CrossRef]
  60. Chu, K.-S.; Oh, C.-H.; Choi, J.-R.; Kim, B.-S. Estimation of Threshold Rainfall in Ungauged Areas Using Machine Learning. Water 2022, 14, 859. [Google Scholar] [CrossRef]
  61. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  62. Strobl, C.; Boulesteix, A.-L.; Kneib, T.; Augustin, T.; Zeileis, A. Conditional variable importance for random forests. BMC Bioinform. 2008, 9, 307. [Google Scholar] [CrossRef] [Green Version]
  63. Strobl, C.; Boulesteix, A.-L.; Zeileis, A.; Hothorn, T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinform. 2007, 8, 25. [Google Scholar] [CrossRef] [Green Version]
  64. McCallum, A. Topic Model Diagnostics. UMASS. 2018. Available online: http://mallet.cs.umass.edu/diagnostics.php (accessed on 1 February 2021).
  65. Hu, S.-Y.; Piao, S.-Y.; Li, Z.-R.; Sun, Y.-C.; Jin, X.-Y.; Lee, J.-I. Research on Factors Influencing Chinese Consumers’ Intention to Buy Agricultural Fresh Products Online—Evidence from Tangshan City. J. Korean Soc. Int. Agric. 2020, 32, 309–314. [Google Scholar] [CrossRef]
  66. Koutsimanis, G.; Getter, K.; Behe, B.; Harte, J.; Almenar, E. Influences of packaging attributes on consumer purchase decisions for fresh produce. Appetite 2012, 59, 270–280. [Google Scholar] [CrossRef] [PubMed]
  67. Lee, J.-W.; Kim, J.-J. Study on the Selection Determinants on Consumers Purchasing Agricultural Products via Direct Market. East Asian J. Bus. Manag. 2020, 8, 43–56. [Google Scholar] [CrossRef]
  68. Leguina, A. A primer on partial least squares structural equation modeling (PLS-SEM). Int. J. Res. Method Educ. 2015, 38, 220–221. [Google Scholar] [CrossRef]
  69. Bagozzi, R.P.; Yi, Y. On the evaluation of structural equation models. J. Acad. Mark. Sci. 1988, 16, 74–94. [Google Scholar] [CrossRef]
  70. Markus, K.A. Principles and Practice of Structural Equation Modeling by Rex B. Kline. Struct. Equ. Model. A Multidiscip. J. 2012, 19, 509–512. [Google Scholar] [CrossRef]
  71. Wong, C.-H.; Tan, G.W.-H.; Loke, S.-P.; Ooi, K.-B. Adoption of mobile social networking sites for learning? Online Inf. Rev. 2015, 39, 762–778. [Google Scholar] [CrossRef]
  72. Hu, L.-T.; Bentler, P.M. Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychol. Methods 1998, 3, 424–453. [Google Scholar] [CrossRef]
  73. Bentler, P.M.; Bonett, D.G. Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull. 1980, 88, 588–606. [Google Scholar] [CrossRef]
  74. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Linear Regression Analysis, 4th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012; p. 821. [Google Scholar]
  75. Shiau, W.-L.; Chau, P.Y. Understanding behavioral intention to use a cloud computing classroom: A multiple model comparison approach. Inf. Manag. 2016, 53, 355–365. [Google Scholar] [CrossRef]
  76. Hair, J.F.; Ringle, C.M.; Sarstedt, M. Partial Least Squares Structural Equation Modeling: Rigorous Applications, Better Results and Higher Acceptance. Long Range Plan. 2013, 46, 1–12. [Google Scholar] [CrossRef]
Figure 1. The conceptual framework used.
Figure 2. Research Framework.
Figure 3. Text mining process.
Figure 4. LDA graphic model.
Figure 5. LDA graphic model. Notes: **: p < 0.010; ***: p < 0.001.
Figure 6. Feature importance ranking based on XGBoost.
Table 1. Topic extraction results.

| Latent Variables | Variable Number | Variable Name | Top Keywords |
|---|---|---|---|
| Logistics factors | T1 | Shipping speed | Shipping (0.173), speed (0.071), satisfaction (0.069), logistics (0.066), shopping (0.065) |
| | T2 | Logistics time | Express (0.103), logistics (0.083), soon (0.081), purchase (0.079), speed (0.057) |
| | T3 | Delivery speed | Delivery (0.117), send to (0.045), arrival (0.042), express (0.030), today (0.028) |
| | T4 | Logistics packaging | Packaging (0.101), received (0.046), inside (0.037), ice pack (0.037), foam (0.020) |
| Product factors | T5 | Product quality | Delicious (0.090), special (0.061), fresh (0.046), taste (0.025), texture (0.020) |
| | T6 | Product price | Price (0.146), good (0.127), affordable (0.099), value for money (0.064), satisfied (0.051) |
| | T7 | Product brand | Brands (0.069), multiple (0.049), purchases (0.045), this (0.020), fits (0.019) |
| Platform factors | T8 | Marketing strategy | Events (0.095), bargains (0.058), deals (0.049), bargains (0.041), promotions (0.039) |
| | T9 | After-sales strategy | Customer service (0.034), merchants (0.031), after sales (0.027), bad reviews (0.024), discovery (0.018) |
| Epidemic factors | T10 | Impact of COVID-19 | Epidemic (0.278), outbreak (0.044), period (0.033), now (0.028), fresh product (0.026) |
Table 2. Topic model diagnostic measurements.

| Variable Name | Coherence | Exclusivity | Corpus Distance |
|---|---|---|---|
| Shipping speed | −259.83 | 0.57 | 1.70 |
| Logistics time | −231.97 | 0.64 | 1.60 |
| Delivery speed | −278.34 | 0.73 | 1.68 |
| Logistics packaging | −277.65 | 0.61 | 1.77 |
| Product quality | −251.38 | 0.57 | 1.58 |
| Product price | −235.21 | 0.52 | 1.54 |
| Product brand | −297.87 | 0.65 | 1.52 |
| Marketing strategy | −269.58 | 0.69 | 1.75 |
| After-sales strategy | −225.16 | 0.54 | 1.83 |
| Impact of COVID-19 | −283.14 | 0.62 | 1.76 |
Table 3. Measurement validation.

| Latent Variables | Measurement Items | Standardized Factor Loading | AVE | CR |
|---|---|---|---|---|
| Logistics factors | T1 | 0.699 | 0.510 | 0.807 |
| | T2 | 0.735 | | |
| | T3 | 0.724 | | |
| | T4 | 0.699 | | |
| Product factors | T5 | 0.554 | 0.608 | 0.817 |
| | T6 | 0.841 | | |
| | T7 | 0.899 | | |
| Platform factors | T8 | 0.895 | 0.762 | 0.865 |
| | T9 | 0.850 | | |
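The AVE and CR columns of Table 3 can be approximately recovered from the standardized loadings alone. The sketch below assumes the conventional formulas (AVE as the mean of squared loadings, and CR as the squared sum of loadings divided by that quantity plus the summed error variances 1 − λ²); the source does not state which formulas were used, and the function and variable names are illustrative only. Because the published loadings are rounded to three decimals, the recomputed values may differ from the table in the last digit.

```python
# Standardized loadings copied from Table 3 (rounded to three decimals in the source).
loadings = {
    "Logistics factors": [0.699, 0.735, 0.724, 0.699],
    "Product factors": [0.554, 0.841, 0.899],
    "Platform factors": [0.895, 0.850],
}


def average_variance_extracted(lams):
    """AVE: mean of the squared standardized loadings (assumed formula)."""
    return sum(lam * lam for lam in lams) / len(lams)


def composite_reliability(lams):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of (1 - loading^2))."""
    squared_sum = sum(lams) ** 2
    error_variance = sum(1 - lam * lam for lam in lams)
    return squared_sum / (squared_sum + error_variance)


# Map each latent variable to its recomputed (AVE, CR) pair.
results = {
    name: (average_variance_extracted(lams), composite_reliability(lams))
    for name, lams in loadings.items()
}

for name, (ave, cr) in results.items():
    print(f"{name}: AVE = {ave:.3f}, CR = {cr:.3f}")
```

Under these assumptions the recomputed values agree with Table 3 to within about 0.001, and every construct clears the commonly used thresholds of AVE > 0.5 and CR > 0.7.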
Table 4. Discriminant validation.

| | Logistics Factors | Product Factors | Platform Factors |
|---|---|---|---|
| Logistics factors | 0.714 | | |
| Product factors | 0.349 | 0.780 | |
| Platform factors | 0.458 | 0.302 | 0.873 |
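The diagonal of Table 4 matches the square roots of the AVE values in Table 3, which suggests the table follows the Fornell-Larcker criterion: each construct's square-root AVE should exceed its correlations with the other constructs. The sketch below checks this under that assumption; the dictionary layout and the function name are hypothetical and not taken from the source.

```python
import math

# AVE values from Table 3 and inter-construct correlations from Table 4.
ave = {
    "Logistics factors": 0.510,
    "Product factors": 0.608,
    "Platform factors": 0.762,
}
correlations = {
    ("Product factors", "Logistics factors"): 0.349,
    ("Platform factors", "Logistics factors"): 0.458,
    ("Platform factors", "Product factors"): 0.302,
}


def fornell_larcker_holds(ave_values, corr_values):
    """True if every correlation is below the sqrt(AVE) of both constructs involved."""
    for (first, second), r in corr_values.items():
        limit = min(math.sqrt(ave_values[first]), math.sqrt(ave_values[second]))
        if abs(r) >= limit:
            return False
    return True


# Square roots of AVE, i.e. the values expected on the diagonal of Table 4.
sqrt_ave = {name: math.sqrt(value) for name, value in ave.items()}

for name, value in sqrt_ave.items():
    print(f"{name}: sqrt(AVE) = {value:.3f}")
print("Fornell-Larcker criterion satisfied:", fornell_larcker_holds(ave, correlations))
```

With the reported figures, the largest off-diagonal correlation (0.458) is well below the smallest diagonal entry (0.714), so the criterion holds for all three constructs.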
Table 5. Hypothesis testing.

| Hypothesis | f² | Coefficient (β) | t Statistic | p-Value | VIF | Result |
|---|---|---|---|---|---|---|
| H1 (EF→LF) | 0.37 | 0.281 | 15.46 | <0.001 | 1.118 | Supported |
| H2 (EF→PDF) | 0.31 | 0.232 | 14.43 | <0.001 | 1.104 | Supported |
| H3 (EF→PFF) | 0.11 | 0.096 | 4.65 | <0.01 | 1.953 | Supported |
| H4 (LF→SAT) | 0.21 | 0.144 | 9.66 | <0.001 | 1.431 | Supported |
| H5 (PDF→SAT) | 0.25 | 0.121 | 9.32 | <0.001 | 1.479 | Supported |
| H6 (PFF→SAT) | 0.18 | 0.102 | 8.21 | <0.001 | 1.554 | Supported |
| H7 (EF→SAT) | 0.17 | −0.113 | −4.37 | <0.01 | 1.979 | Supported |

Note: Q² (LF) = 0.224; Q² (PDF) = 0.205; Q² (PFF) = 0.212; Q² (SAT) = 0.184; SRMR = 0.076; NFI = 0.913.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Guan, G.; Liu, D.; Zhai, J. Factors Influencing Consumer Satisfaction of Fresh Produce E-Commerce in the Background of COVID-19—A Hybrid Approach Based on LDA-SEM-XGBoost. Sustainability 2022, 14, 16392. https://doi.org/10.3390/su142416392
