Introduction
Recently, COVID-19 has broken out in more than 200 countries around the world, infecting and killing millions of people [1], and it has become a major global public health event. To make matters worse, the pandemic is likely to continue for a long time, as the number of infected people is still rising rapidly. Because traditional education gathers people at a higher density than most other sectors, teachers and students are more likely to be infected by and to spread the virus. Therefore, COVID-19 will have a greater impact on the traditional education model. To reduce cross-infection and transmission among teachers and students, governments have had to close universities, which seriously affects the education of hundreds of millions of students and the normal operation of schools in most countries worldwide [2]. The wide spread of the virus has created a high demand for online learning. Students also need timely ideological guidance to overcome the difficulties caused by COVID-19. Therefore, how to use network technology to help freshmen cope with COVID-19 and reduce its impact on education has become a new hot topic.
Moreover, the impact of COVID-19 will be disastrous for universities that have implemented enrollment reform without distinguishing majors, a model that now covers millions of students in many countries. Generally, such students receive 1–2 years of general education [3], during which they gain a clear understanding of the available majors and of their own strengths. Only after that do they know how to choose their majors and learning materials. As a result, such freshmen have much more difficulty adapting to the impact of COVID-19 under the traditional teaching model. Taking China as an example, most universities have adopted this new enrollment model for millions of students. Therefore, it is urgent to find ways to quickly discover students’ talents from their historical data.
Generally, with the wide application of learning management systems (LMSs), data mining or recommendation technology has become very popular. These technologies can help teachers and students obtain better learning feedback through the Internet [4], [5]. In particular, clustering, association rule mining and other methods can be used to analyze the data in online forums and learning management systems to obtain a highly interpretable student-performance model [6], [7]. Due to the specific purpose and function of traditional data mining algorithms, they are not suitable for direct application in the education field. This means that a preprocessing algorithm has to be used first and then some specific data mining methods can be applied to the problems. A. Dutt et al. [8] summarized the clustering algorithm used in data preprocessing in educational data mining (EDM), which provided the basis for applying a subsequent data mining algorithm. Compared with the active search of data mining, the passive push of recommendation algorithms is more suitable for freshmen who do not know enough about themselves. Some researchers have developed course recommendation systems using a collaborative filtering algorithm to facilitate students’ personalized learning experience [9]. Additionally, the recommendation algorithm has been successfully applied to student development and training [10].
To date, existing methods mainly mine homogeneous information and ignore the relationships among heterogeneous information in student history data. However, this heterogeneous information contains many student characteristics. Therefore, this paper proposes a personalized recommendation framework based on a weighted heterogeneous educational network (WHEN). The framework fully considers the differences among learning content, learning methods and learning environment. In this way, freshmen can quickly find their talents. The framework is mainly composed of the following parts: first, to effectively integrate heterogeneous information in the field of education, this paper proposes the WHEN; second, a series of semantically rich extended metapaths are defined in the WHEN to realize multiple data mining tasks; third, a graph embedding method is used to learn the representations of individual students and recommended items, and a new random walk is proposed; and finally, the learned representations are combined with a matrix decomposition algorithm to recommend learning resources and majors to freshmen.
The main innovations and contributions of this paper can be summarized as follows:
In this paper, we propose a new framework that can help freshmen who are not divided into majors to identify their talents and then recommend suitable majors and learning materials. This will effectively reduce the impact of COVID-19 on students.
This paper proposes a new WHEN structure, which fully considers the heterogeneity of data in student information. The WHEN addresses the limitation that traditional methods can only mine homogeneous data.
This paper proposes a WHEN-based embedding method guided by extended metapaths to uncover the structural and semantic information of WHEN. Moreover, a new random walk method based on extended metapaths is proposed, which contributes to learning a more effective embedding representation for nodes in WHEN.
The rest of this paper is organized as follows. Section 2 presents related work. The framework for personalized recommendation is discussed in Section 3. The experiments and results are presented in Section 4. Section 5 concludes this paper.
Related Work
In this section, we summarize the relevant work from three aspects: general enrollment, educational data mining, and graph embedding-based recommendation.
A. General Enrollment
General enrollment, or enrollment in large categories, means that colleges and universities do not recruit students by specific majors but merge multiple similar majors and recruit students by one major category. From the perspective of the development of colleges and universities worldwide, many world-renowned universities, such as Yale University, Stanford University, and the University of Michigan, do not recruit students based on majors, which suggests that “general enrollment” is an inevitable trend in higher education. Though most universities in the United Kingdom require all students to select their major when they apply, Scotland allows flexibility in major selection during a student’s first and second year of study. At Sorbonne University in France, students of the Faculty of Science and Engineering choose a first-year survey program with three to four science disciplines, which ensures that students have a larger context in the sciences. At the end of the first year, students choose their major and their minor. In China, the talent training model of “general enrollment” is gradually being adopted by universities [11]. From 2001 to 2018, more than half of the traditional “Project 211” colleges and universities implemented general enrollment reform, with nearly half a million new students enrolled each year. The reform of the talent training mode in colleges, known as “broad enrollment and split training”, has become a part of deepening higher education reform [12], [13]. Researchers have analyzed the causes of the problems arising in this reform and proposed solutions. Junyi Wan et al. [14] believed that scores should not be the only criterion for professional diversion and that multiple indicators should be set. Jiaojiao Li et al. [15] noted that in the general education stage, the proportion of compulsory and elective courses is not coordinated, so it is necessary to establish a more reasonable curriculum system. Chongyi Fan et al. [16] established a diversion mechanism in line with students’ personal interests to eliminate blindness in the diversion of computer students. Zhengqing Luo et al. [17] developed a professional diversion program to improve the quality of talent cultivation and promote the growth of professional capabilities.
Enrollment without division into majors is becoming a new mode of enrollment reform. However, there is little research on how to help such students improve their learning efficiency when the time available for professional learning is reduced, or on how to help them identify their strengths more accurately so that they can find suitable learning materials based on these strengths.
B. Educational Data Mining
Educational data mining is an interdisciplinary field of research using data mining techniques in educational settings, with the aim of better understanding how students learn to improve their academic level and explain educational phenomena [5]. Since the popularization of computers, many computer-based educational information systems have emerged, such as the student information system (SIS) and the learning management system (LMS). Those systems collect large quantities of student data. Most EDM work is based on historical data in LMSs [7], [18]. The commonly used methods in EDM include classification and regression, clustering, association rule mining [19], [20] and social network analysis [21]. Garcia et al. [19] proposed a collaborative recommendation method based on the information mined by association rules. Rabbany et al. [21] introduced the concept of social networks into EDM and used social network analysis technology to analyze student participation in online courses. The main EDM applications include student modeling [22], decision support systems [9], [23] and adaptive systems. Vialardi et al. [23] devised a method to predict student performance in a course and recommend the course in which they are most likely to be successful. O’Mahony et al. [9] used collaborative filtering recommendation algorithms to recommend online courses to students.
At present, most studies focus on the data mining task of course recommendation [24]–[26]. For students who are not divided into majors, there is little research on how to discover their strengths and recommend suitable learning materials.
C. Graph Embedding-Based Recommendation
Graph embedding has been proposed as an information modeling method that learns a low-dimensional, real-valued, dense vector for each node in a graph. Inspired by word2vec [27], a model from the natural language processing field, DeepWalk [28] was first proposed to learn the feature representation of nodes in a homogeneous graph. Similarly, both node2vec [29] and LINE [30] are graph embedding methods used in homogeneous graphs. However, networks with many heterogeneous relationships are complex and diverse. Metapath2vec [31] was proposed to learn node representations in a heterogeneous information network (HIN), which makes it possible to apply the graph embedding method to different recommendation tasks.
Researchers have put much effort into graph embedding-based recommendation, aiming to improve recommendation system performance with additional information. Chuan Shi et al. [32] proposed a recommendation method called HERec, which effectively improves the performance of recommendation systems and alleviates the cold start problem. However, HERec is a general-purpose recommendation method, which means that it lacks consideration for domain-specific information. There have been many studies on the application of graph embedding-based recommendations in specific scenarios. Xiao Ma et al. [33] proposed a recommendation method called HGRec to solve the recommendation problem of scientific papers. Xijun He et al. [34] proposed a patent technology transaction recommendation model (PSR-VEC), which effectively addresses the inactive trade phenomenon of patent products in the market. Jianxing Zheng et al. [35] used heterogeneous information in social networks to fully explore users’ potential interests and behavioral motivations, which helps model users and provides personalized recommendation services through the GUP method. Hao Wu et al. [36] improved recommendation performance by using a hybrid network representation learning method that learns effective information from user cotagging networks and social networks. Since the recommendation performance based on matrix factorization is affected by sparsity and scalability, XueJian Zhang et al. [37] proposed a recommendation method called ISRM_NE, which realizes top-N recommendation and improves system performance in terms of interpretability and scalability.
COVID-19 will have a significant impact on the education of freshmen who are not divided into majors. Traditional methods cannot deal with this new situation because they do not consider the individual differences among students. Graph embedding is a promising way to recommend learning materials and majors to such students; however, methods based on metapath2vec cannot solve recommendation problems on weighted heterogeneous information. Therefore, we mainly study how to fully consider the personalized differences and needs of students by exploiting heterogeneous information in the educational environment for personalized recommendation.
Proposed Framework
In Figure 1, it can be seen that the proposed framework consists of five stages: data preprocessing, WHEN construction, metapath selection, WHEN-based embedding and personalized recommendation.
Data preprocessing. A hierarchical questionnaire is designed to collect educational data. We extract 6 types of objects, 5 types of relations and 2 types of constraints from the noisy information.
WHEN construction. After describing the nodes and edges in detail, a formal definition of the WHEN and WHEN schema is given. Based on the above definition, the WHEN is constructed to integrate various student information.
Metapath selection. WHEN is projected into a set of subnetworks. Extended metapaths are selected from every subnetwork.
WHEN-based embedding. A new random walk strategy is proposed to sample each given metapath. The optimization objective of this method is to maximize the co-occurrence probability of nodes and their neighboring nodes in the sampled sequences. Multiple embeddings on the extended metapath are merged into one as the final result.
Personalized recommendation. Using the node representation obtained in the previous step, we optimize the performance of the matrix factorization algorithm.
A. Stage 1: Data Preprocessing
Using a questionnaire designed for 114 students, we obtain personal information and learning experience for each student. Personal information includes majors of interest, academic performance and financial status. Learning experience is composed of five categories: innovation, research, study, practice and social interaction. Each category reflects one ability. After preprocessing, the individual information of each student can be expressed as follows:\begin{equation*} \left\{ \left\{ P, E, \left\{ M_{1}, \ldots, M_{k} \right\} \right\}, \left\{ \left( A_{1}, C_{1} \right), \left( A_{2}, C_{2} \right), \ldots, \left( A_{m}, C_{n} \right) \right\} \right\} \tag{1}\end{equation*}
B. Stage 2: WHEN Construction
To construct the WHEN, six kinds of objects are extracted from the processed data to represent six different types of nodes. In addition, the relationships among them are determined, which are represented as five types of edges, while two of them have attribute value constraints. As shown in Figure 2, each circle represents one type of node set. The color identification in the figure indicates an attribute value constraint relationship, while black indicates a normal relationship.
1) Node
As shown in Figure 2, the six objects are as follows: students (S), academic performance (P), economic conditions (E), preferred major (M), activity experience (A) and activity category (C). Academic performance (P) refers to the learning situation of a student in a specific course. The academic performance of a student is quantified by dividing the grades in each course. As shown in Figure 2, John’s performance in math class is average; that is, John’s score in math class is 0-70. This type of node helps us integrate information about the student’s academic performance into the network. Economic condition (E) refers to the monthly cost of living of a certain student. We delimit the range of monthly cost of living and indicate it with different nodes to include the student’s economic condition in the network. Major (M) refers to the major that most students prefer when majors are divided, activity (A) refers to the activities that students have participated in during the whole freshman year, and category (C) refers to all types of activities. From the perspective of student ability assessment, activities are divided into five categories: innovation, scientific research, study, practice and social interaction.
2) Edge
As shown in Figure 2, green is used to identify a relation with constraints, and black represents a normal relation. In the relation between students and majors, different attribute values are used to represent different preference ranks. For example, John’s first preference major is software engineering (SE), so this edge is assigned the highest weight. His second preference, information security (IS), corresponds to the second-highest weight. The relation between students and activities indicates the degree to which a student is interested in an activity. The more interested a student is in the activity, the closer the weight is to 1, and vice versa.
3) WHEN
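To make the node and edge description concrete, the following is a minimal Python sketch of how such a network could be stored. The class name `WHEN`, the node labels (for example `hackathon`) and all numeric weights are illustrative assumptions and are not taken from the dataset or from the authors' implementation.

```python
from collections import defaultdict


class WHEN:
    """Toy container for a weighted heterogeneous educational network (illustrative only)."""

    def __init__(self):
        self.node_type = {}            # node id -> type letter (S, P, E, M, A, C)
        self.adj = defaultdict(dict)   # node id -> {neighbor id: weight or None}

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v, weight=None):
        # Undirected edge; weight=None marks a normal relation without an attribute value.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def neighbors(self, node, ntype, weight=None):
        # Neighbors of one node type, optionally restricted to a single attribute value.
        return [x for x, w in self.adj[node].items()
                if self.node_type[x] == ntype and (weight is None or w == weight)]


g = WHEN()
for node, ntype in [("John", "S"), ("SE", "M"), ("IS", "M"),
                    ("math:average", "P"), ("hackathon", "A"), ("innovation", "C")]:
    g.add_node(node, ntype)

g.add_edge("John", "SE", weight=2)           # first preference: highest weight (assumed value)
g.add_edge("John", "IS", weight=1)           # second preference (assumed value)
g.add_edge("John", "hackathon", weight=0.8)  # interest degree in [0, 1] (assumed value)
g.add_edge("John", "math:average")           # normal relation
g.add_edge("hackathon", "innovation")        # activity belongs to a category

first_choice = g.neighbors("John", "M", weight=2)
```

Storing the attribute value directly on the edge keeps constrained relations (student-major, student-activity) and normal relations in one structure, which is convenient when a walk must filter neighbors by both node type and weight.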
The WHEN can be represented as
4) WHEN Schema
The schema of the constructed WHEN shown in Figure 3(a) can be defined as
C. Stage 3: Metapath Selection
Inspired by the work in [38], we project the WHEN schema, a relatively complex and unusual network structure, into a series of ordered subnets with relatively simple and common structures (bipartite network and stellate network). This work effectively helps us to find more meaningful metapaths. We first explain how to project the WHEN schema, and then we present the extended metapath found from each subnet.
1) Projected Subnetwork
For a network schema, denoted as
2) Extended Metapath
The extended metapath [39] can be expressed as
Figure 3(c) shows partial metapaths selected from three projected subnetworks, including three weighted metapaths and three unweighted metapaths. Taking the weighted metapath in Figure 3(c) as an example, the scoring range between
D. Stage 4: WHEN-Based Embedding
As shown in Figure 4, a novel graph embedding method is proposed. It is composed of three steps: (1) generate a random walk sequence based on the extended metapath, (2) learn the embedding representations of nodes according to the optimization objective, and (3) fuse node representations on multiple metapaths.
1) Random Walk Strategy Based on an Extended Metapath
Generally, random walks based on metapaths are used to generate the sequence of nodes in the WHEN. For a given unweighted HEN, i.e., \begin{align*} P(n_{t+1}=&x | n_{t}=v, \rho) \\=&\begin{cases} \dfrac {1}{\left |{N^{A_{t+1}}(v)}\right |}, & (v, x) \in E {~\text {and }} \varphi (x)=A_{t+1}\\ 0, & {~\text {otherwise }} \end{cases}\tag{2}\end{align*}
For a given WHEN denoted as \begin{align*} P(n_{t+1}=&x | n_{t}=v, \rho) \\=&\begin{cases} \dfrac {1}{\left |{N_{w_{t}}^{A_{t+1}}(v)}\right |}, & (v, x) \in E {, } \varphi (x)=A_{t+1} {, }\\ & w_{t}=w_{l-t} {, } w_{t} \in \delta _{t}\left ({R_{\mathrm {t}}}\right) \\ 0, & {~\text {otherwise }} \end{cases}\tag{3}\end{align*}
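The transition rules in Eqs. (2) and (3) can be sketched in Python as follows. This is one possible reading, not the authors' code: the function name `metapath_walk`, the dictionary layout of `adj`, and the interpretation of the symmetry condition as "the weight of a step must equal the weight already used on its mirrored step" are all assumptions.

```python
import random


def metapath_walk(adj, node_type, start, metapath, allowed=None, rng=random):
    """Sample one walk guided by `metapath`, a list of node types such as ["S", "A", "S"].

    adj:     {node: {neighbor: weight or None}}
    allowed: optional list with one entry per step; each entry is the set of weights
             permitted on that step, or None for an unweighted relation (Eq. (2)).
    Returns the node sequence, or None if no admissible neighbor exists.
    """
    steps = len(metapath) - 1
    allowed = allowed or [None] * steps
    walk, used = [start], [None] * steps
    for t in range(steps):
        mirror = steps - 1 - t                      # step that must carry the same weight
        candidates = []
        for x, w in adj.get(walk[-1], {}).items():
            if node_type[x] != metapath[t + 1]:     # type constraint phi(x) = A_{t+1}
                continue
            if allowed[t] is not None:
                if w not in allowed[t]:             # attribute value constraint
                    continue
                if mirror < t and allowed[mirror] is not None and w != used[mirror]:
                    continue                        # symmetric weight constraint (assumed reading)
            candidates.append((x, w))
        if not candidates:
            return None
        x, w = rng.choice(candidates)               # uniform over admissible neighbors
        walk.append(x)
        used[t] = w
    return walk


# Hypothetical toy network: three students rate one activity with weights 1, 1 and 2.
node_type = {"s1": "S", "s2": "S", "s3": "S", "a1": "A"}
adj = {"s1": {"a1": 1}, "s2": {"a1": 1}, "s3": {"a1": 2},
       "a1": {"s1": 1, "s2": 1, "s3": 2}}

walk = metapath_walk(adj, node_type, "s3", ["S", "A", "S"], allowed=[{1, 2}, {1, 2}])
```

In this toy example, a weighted S-A-S walk started at `s3` can only return to `s3`, because no other student is connected to `a1` with weight 2, whereas the unweighted version may end at any of the three students.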
2) Optimization Objective
Ignoring weighted information, the network can be expressed as \begin{align*}&\arg \max _{\theta } \sum _{v \in V}~\sum _{t \in A} \sum _{c_{t} \in N_{t}(v)} \log \mathrm {p}\left ({c_{t} | v; \theta }\right) \tag{4}\\&\mathrm {p}\left ({c_{t} | v; \theta }\right)=\frac {e^{x_{c_{t}} \cdot x_{v}}}{\sum _{u \in V} e^{x_{u} \cdot x_{v}}}\tag{5}\end{align*}
In each iteration of optimization, all nodes are traversed, which leads to low efficiency of the whole model. Therefore, we refer to word2vec [27] and adopt the negative sampling method to update only a small part of the model weight in each sample training to reduce the calculation burden and improve the quality of node embeddings. Given a negative sample size \begin{align*}&\hspace {-2.2pc}\log \sigma \left ({X_{c_{t}} \cdot X_{v}}\right)+\sum _{m=1}^{M} E_{u^{m} \sim P(u)}\left [{\log \sigma \left ({-X_{u^{m}} \cdot X_{v}}\right)}\right] \tag{6}\\&\sigma (x)=\frac {1}{1+e^{-x}}\tag{7}\end{align*}
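As a rough illustration of Eqs. (6) and (7), the snippet below evaluates the negative-sampling objective for a single node-context pair and takes one gradient-ascent step on the center-node vector. The expectation in Eq. (6) is approximated by the M drawn samples; the function names, dimensions, learning rate and random vectors are hypothetical choices for the sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def objective(x_v, x_c, x_neg):
    """Eq. (6) for one (v, c_t) pair; x_neg has shape (M, d), one row per negative sample."""
    return np.log(sigmoid(x_c @ x_v)) + np.sum(np.log(sigmoid(-(x_neg @ x_v))))


def grad_center(x_v, x_c, x_neg):
    """Gradient of the objective with respect to the center-node vector x_v."""
    pos = (1.0 - sigmoid(x_c @ x_v)) * x_c
    neg = (sigmoid(x_neg @ x_v)[:, None] * x_neg).sum(axis=0)
    return pos - neg


rng = np.random.default_rng(0)
d, M = 8, 5                                   # embedding size and negative sample size (assumed)
x_v, x_c = rng.normal(size=d), rng.normal(size=d)
x_neg = rng.normal(size=(M, d))

before = objective(x_v, x_c, x_neg)
x_v_new = x_v + 0.01 * grad_center(x_v, x_c, x_neg)   # one gradient-ascent step
after = objective(x_v_new, x_c, x_neg)
```

Only the vectors of the context node and the M negative samples enter the computation, which is why each update touches a small part of the model rather than all nodes in V.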
3) Embedding Fusion on Multiple Metapaths
After optimization, the same node from different input sequences is mapped into different vector spaces. In the directed graph, i.e., \begin{equation*} e_{v}=g\left ({\left \{{e_{v}^{(l)}}\right \}_{l=1}^{|P|}}\right)\tag{8}\end{equation*}
E. Stage 5: Personalized Recommendation
On the recommendation-oriented WHEN, user-item pairs are identified as student-major and student-activity. The recommendation task predicts the students’ ratings of majors and activities. Considering the good performance of the classic matrix factorization algorithm in the recommendation system, node embeddings are integrated into the matrix factorization algorithm, and the user’s item ratings are defined as formula (9):\begin{equation*} \widehat {r_{u, i}}=x_{u}^{T} \cdot y_{i}+\alpha \cdot e_{u}^{(U)^{T}} \cdot \gamma _{i}^{I}+\beta \cdot \gamma _{u}^{U^{T}} \cdot e_{i}^{(I)}\tag{9}\end{equation*}
Then, a specific fusion function \begin{equation*} g\left ({\left \{{e_{v}^{(l)}}\right \}}\right)=\sigma \left ({\sum _{l=1}^{|\mathcal {P}|} w_{v}^{(l)} \sigma \left ({\mathbf {M}^{(l)} e_{v}^{(l)}+\boldsymbol {b}^{(l)}}\right)}\right)\tag{10}\end{equation*}
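The fusion function in Eq. (10) can be sketched with NumPy as below. The shapes, the number of metapaths and all parameter values are assumptions made for illustration; in the framework these parameters would be learned rather than fixed by hand.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse(embeddings, weights, mats, biases):
    """Eq. (10): weighted sum of per-metapath nonlinear transforms, followed by a sigmoid.

    embeddings: list of |P| vectors e_v^(l), each of shape (d,)
    weights:    list of |P| scalars w_v^(l)
    mats:       list of |P| matrices M^(l), each of shape (D, d)
    biases:     list of |P| vectors b^(l), each of shape (D,)
    """
    total = sum(w * sigmoid(M @ e + b)
                for e, w, M, b in zip(embeddings, weights, mats, biases))
    return sigmoid(total)


# Hypothetical example: two metapaths, 3-dimensional inputs, 2-dimensional fused output.
e = [np.array([0.2, -0.1, 0.4]), np.array([0.0, 0.3, -0.2])]
w = [0.6, 0.4]
M = [np.ones((2, 3)) * 0.1, np.ones((2, 3)) * -0.1]
b = [np.zeros(2), np.zeros(2)]

fused = fuse(e, w, M, b)
```

Because the outer sigmoid is applied elementwise, every component of the fused embedding lies in (0, 1).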
Finally, the fusion function formula (10) is substituted into formula (9) to obtain the following objective, and the stochastic gradient descent (SGD) method is adopted to train the parameters of the recommendation model:\begin{align*} \kappa=&\sum _{\left ({u, i, r_{u, i}}\right) \in \mathrm {R}}\left ({r_{u, i}-\widehat {r_{u, i}}}\right)^{2}+\lambda \sum _{u}\left ({\left \|{x_{u}}\right \|_{2}+\left \|{y_{i}}\right \|_{2}}\right. \\&\left.{+\left \|{\gamma _{u}^{U}}\right \|_{2}+\left \|{\gamma _{i}^{I}}\right \|_{2}+\left \|{\Theta ^{(U)}}\right \|_{2}+\left \|{\Theta ^{(I)}}\right \|_{2}}\right)\tag{11}\end{align*}
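A minimal sketch of the rating predictor in Eq. (9) and an SGD update derived from Eq. (11) is given below. It makes several simplifying assumptions that are not stated in the paper: the fused embeddings `e_u` and `e_i` are held fixed (the fusion parameters are not updated), squared L2 norms are used for regularization, and the initial values, learning rate and regularization strength are arbitrary.

```python
import numpy as np


def predict(x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta):
    """Eq. (9): latent-factor term plus two embedding-based terms."""
    return x_u @ y_i + alpha * (e_u @ gamma_i) + beta * (gamma_u @ e_i)


def sgd_step(r, x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta, lr=0.05, lam=0.01):
    """One SGD update for a single observed rating r (fused embeddings kept fixed)."""
    err = r - predict(x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta)
    return (x_u + lr * (err * y_i - lam * x_u),
            y_i + lr * (err * x_u - lam * y_i),
            gamma_u + lr * (err * beta * e_i - lam * gamma_u),
            gamma_i + lr * (err * alpha * e_u - lam * gamma_i))


# Hypothetical single training example with rating 4.0.
x_u, y_i = np.full(4, 0.1), np.full(4, 0.1)
gamma_u, gamma_i = np.zeros(4), np.zeros(4)
e_u, e_i = np.full(4, 0.5), np.full(4, 0.5)
alpha = beta = 1.0
r = 4.0

initial_error = abs(r - predict(x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta))
for _ in range(200):
    x_u, y_i, gamma_u, gamma_i = sgd_step(r, x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta)
final_error = abs(r - predict(x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta))
```

A full implementation would iterate over all observed student-major and student-activity ratings and would also back-propagate through the fusion function to update its parameters.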
Experiment
A. Dataset
We collected information from 114 students to build the WHEN. More details about the dataset are shown in Table 2. The WHEN consists of six types of nodes: 114 nodes for students, 35 nodes for activities, 4 nodes for majors, 5 nodes for activity types, 3 nodes for economic status and 20 nodes for academic performance. There are five different types of edges: 956 student-activity edges, 114 student-economy edges, 228 student-major edges, 570 student-performance edges and 35 activity-category edges. The dataset is divided into training and test sets for the following experiments.
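As a quick consistency check on these counts, the totals and per-student averages work out as follows. The interpretation of the ratios (two preferred majors and five performance records per student) is an inference from the numbers, not a statement from the dataset description.

```latex
\begin{align*}
|V| &= 114 + 35 + 4 + 5 + 3 + 20 = 181 \\
|E| &= 956 + 114 + 228 + 570 + 35 = 1903 \\
\frac{228}{114} &= 2, \qquad \frac{570}{114} = 5, \qquad \frac{956}{114} \approx 8.39
\end{align*}
```

The last ratio indicates that each student is linked to roughly eight of the 35 activities on average.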
B. Evaluation Metrics
We use the RMSE, MAE, precision, recall and F1-score, which are defined in Equations (12-16), respectively, to evaluate the performance of our method.
Root Mean Square Error:
\begin{equation*} RMSE=\sqrt {\frac {1}{|T|} \sum _{(u, i) \in T}\left ({r_{u, i}-\widehat {r_{u, i}}}\right)^{2}}\tag{12}\end{equation*}
Mean Absolute Error:
\begin{equation*} MAE=\frac {1}{|T|} \sum _{(u, i) \in T}\left |{r_{u, i}-\widehat {r_{u, i}}}\right |\tag{13}\end{equation*}
where $T$ represents the training set, $r_{u,i}$ represents the actual user rating, and $\widehat {r_{u, i}}$ represents the predicted rating.
Precision:
\begin{equation*} \mathrm {P}=\frac {Relevant \; items}{Total \; recommended \; items}\tag{14}\end{equation*}
Recall:
\begin{equation*} \mathrm {R}=\frac {Relevant \; items}{Total \; relevant \; items}\tag{15}\end{equation*}
F1-Score:
\begin{equation*} \mathrm {\textit {F1}}=\frac {2PR}{P+R}\tag{16}\end{equation*}
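The five metrics in Eqs. (12)-(16) can be computed with a few lines of Python. In this sketch, "relevant items" in the numerators of Eqs. (14) and (15) is read as the recommended items that are also relevant; the function names and the sample inputs are illustrative.

```python
import math


def rmse(actual, predicted):
    """Eq. (12): root mean square error over paired ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Eq. (13): mean absolute error over paired ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(recommended, relevant):
    """Eqs. (14)-(16) for one recommendation list and one set of relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ratings and recommendation lists.
error_rmse = rmse([3, 4], [1, 4])
error_mae = mae([3, 4], [1, 4])
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here two of the four recommended items are relevant, so the precision is 0.5, while two of the three relevant items are retrieved, so the recall is about 0.67.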
C. Effectiveness Evaluation
In this section, we evaluate the effectiveness of our model from three aspects: metapaths, the graph embedding method and the recommendation method.
1) Metapaths
To validate the effectiveness of the constructed metapaths, a contrast experiment is designed to compare the activity recommendation performance by adding different metapaths one by one. We use precision, recall, and F1-score as metrics to evaluate recommendation performance.
In this experiment, SAS, SACAS, SMS_2, SMS_3, ACA, SES and SPS are fused in turn. During the fusion process, the changes in the precision, recall and F1-score curves are shown in Figure 5. As each metapath is incorporated, the precision of our method increases gradually, while the recall and F1-score fluctuate slightly. Among the 7 metapaths, there are 4 weighted metapaths SAS, SACAS, SMS_2 and SMS_3, corresponding to
2) Graph Embedding Method
In this section, we evaluate the performance of the graph embedding method compared with other methods. We used a WHEN-based embedding method to embed each node of the heterogeneous network into a low-dimensional vector space. Five other graph embedding methods for comparison are briefly introduced as follows:
Graph factorization (GF) [40]: The adjacency matrix is decomposed to obtain the low-dimensional dense representation of nodes.
DeepWalk [28]: A method of learning the representation of nodes in a network, which introduces deep learning into the field of network embedding.
Node2vec [29]: Based on DeepWalk, this method uses two biased random walks (i.e., BFS and DFS) to better explore neighborhoods.
LINE [30]: This method explicitly preserves both first-order and second-order proximities. It is suitable for a variety of networks, including directed, undirected, binary or weighted edges.
SDNE [41]: This method uses a deep learning model to capture the nonlinear relationship between nodes.
For the above methods, GF is a traditional method, while DeepWalk, node2vec and LINE belong to the shallow model, mainly using the random walk method. Our method also belongs to the shallow model. SDNE belongs to the deep model, which mainly adopts the deep neural network method. Different graph embedding methods are applied to two recommendation tasks, including student-activity recommendations and student-major recommendations.
As shown in Figure 6(a) and 6(b), the performance was compared on two tasks: student-activity and student-major recommendations. The detailed data are given in Table 3. WHEN-based embedding performs better on both tasks than the baselines, ranging from GF to DeepWalk. In the student-activity recommendation task, compared with GF, our method improves RMSE by approximately 84% and MAE by approximately 82%, which is similar to DeepWalk. In the student-major recommendation task, compared with GF, our method improves RMSE and MAE by approximately 60% and 68%, respectively, while node2vec only improves RMSE and MAE by approximately 34% and 26%, respectively. DeepWalk and node2vec are more suitable for homogeneous networks, which contain only one type of node and edge. This means that these methods ignore the semantics of the metapaths shown in Table 1 when dealing with the WHEN. In contrast, our method is more interpretable. LINE and SDNE exploit the first-order proximity and second-order proximity to characterize the local and global network structure. Because they focus on the similarity of the network structure, these two methods cannot distinguish metapaths that have the same structure but contain different semantics, such as
Figure 6. Comparison of RMSE and MAE between WHEN-based embedding and baselines on two recommendation tasks. The data are presented in the form of a histogram for easy observation.
3) Recommendation Method
To evaluate the performance of our model, we use precision, recall, F1-score, RMSE and MAE as evaluation criteria. More details about the compared methods are shown as follows:
HERec [32]: A general heterogeneous network embedding-based approach for HIN based recommendation.
User-based CF [42]: User-based collaborative filtering: This algorithm recommends to the user items that other users with similar interests like.
Item-based CF [43]: Item-based collaborative filtering: This algorithm recommends items that are similar to the user’s previous favorite items.
SVD [44]: A latent factor model for dimensionality reduction, which alleviates matrix sparsity and improves operational efficiency.
SVD++ [45]: An extension of SVD considering implicit ratings.
Detailed experimental data are given in Table 4 and illustrated in Figure 7. User-based CF and item-based CF are both classical recommendation algorithms. User-based CF has lower RMSE values and higher precision, recall and F1-score values than item-based CF. The WHEN is a student-centric network that describes similar users, which leads user-based CF to outperform item-based CF. SVD and SVD++ are matrix decomposition methods. The RMSE and MAE values of SVD++ are smaller than those of SVD, while SVD has a higher recall and F1-score than SVD++. Compared with SVD, the precision of SVD++ is improved by 27%. This suggests that implicit feedback can significantly improve the accuracy of recommendation results. HERec and our method are graph embedding-based recommendation methods. Our approach improves RMSE and MAE by 91% and 86%, respectively, which is better than HERec. The precision and recall of our method are improved by 10 and 7 percentage points, respectively, which is also better than HERec. The F1-score reaches 0.58, which is 17.76% higher than that of HERec. As mentioned previously, HERec is based on a heterogeneous information network. Its essence is to learn embeddings for users and items, while other types of objects are only used as a bridge to construct the homogeneous neighborhood. Therefore, HERec might lose some important information when building a homogeneous neighborhood. In contrast, our approach learns embeddings for different types of nodes directly from heterogeneous neighborhoods, which takes more heterogeneous information into account. Overall, matrix decomposition methods perform better than collaborative filtering methods, and graph embedding-based recommendation methods perform best. Compared with the other methods, our method achieves the highest precision, recall and F1-score values, reaching 85.13%, 44.44% and 58%, respectively.
The precision value indicates that 85.13% of the activities recommended by the framework match students’ preferences, while 14.87% do not. The recall value indicates that 44.44% of the students’ preferred activities are in the recommended list, while 55.56% of their preferred activities are not. The F1-score is used to consider precision and recall together.
Figure 7. Recommendation method performance evaluation. The data are presented in the form of a histogram for easy observation.
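As a worked check, substituting the reported precision and recall into Eq. (16) reproduces the reported F1-score:

```latex
\begin{equation*}
F1 = \frac{2PR}{P+R} = \frac{2 \times 0.8513 \times 0.4444}{0.8513 + 0.4444} = \frac{0.7566}{1.2957} \approx 0.584
\end{equation*}
```

The value agrees with the 58% stated above, and it shows that the relatively low recall, rather than the precision, limits the F1-score.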
In summary, our method is more suitable for the educational environment than other recommendation algorithms. By constructing the WHEN, we comprehensively considered various heterogeneous information for each student, including behaviors, interests and economic conditions. Through WHEN-based embedding, rich semantic information can be mined on the specific extended metapaths, which ultimately helps us improve the recommendation performance. Therefore, the new method proposed in this paper can not only help students find online learning resources by using the network according to the historical information of students but also help students who are not divided into majors choose their own majors. It is an effective method to help students fight COVID-19 by using advanced networking technologies.
Conclusion
COVID-19 is sweeping the world and has brought unprecedented disruption to the thinking and daily lives of new students. Overcoming these challenges is highly meaningful, and providing personalized online training for freshmen while helping them choose their majors reasonably has become a challenging topic. Because heterogeneous information can be modeled flexibly, this paper proposed a new framework that integrates educational heterogeneous information through the WHEN and extracts additional information through WHEN-based embedding. Experiments show the effectiveness and reliability of the framework. The core of this framework is a graph embedding method based on the WHEN. This method is suitable not only for the WHEN but also for other fields with attribute value constraints, such as movie rating networks. In the future, we will attempt to integrate text and images into the WHEN to capture richer educational information.