
A Novel Framework With Weighted Heterogeneous Educational Network Embedding for Personalized Freshmen Recommendation Under the Impact of COVID-19 Storm



Abstract:

The global explosion of COVID-19 has brought unprecedented challenges to traditional higher education, especially for freshmen who have not yet chosen a major and cannot determine where their real talents lie; thus, it is difficult for them to make sound choices based on their skills. Generally, existing methods mainly mine isomorphic information, ignoring relationships among heterogeneous information. Therefore, this paper proposes a new framework that gives freshmen appropriate recommendations by mining heterogeneous educational information. This framework is composed of five stages: after data preprocessing, a weighted heterogeneous educational network (WHEN) is constructed according to the heterogeneous information in students' historical data. Then, the WHEN is projected into different subnets, on which metapaths are defined. Next, a WHEN-based embedding method is proposed, which helps mine the weighted heterogeneous information on multiple extended metapaths. Finally, with the information mined, a matrix factorization algorithm is used to recommend learning resources and majors for freshmen. Extensive experimental results show that the proposed framework achieves better results than other baseline methods, indicating that the proposed method is effective and can provide great help to freshmen during the COVID-19 storm.
Published in: IEEE Access ( Volume: 9)
Page(s): 67129 - 67142
Date of Publication: 26 April 2021
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Recently, COVID-19 has broken out in more than 200 countries around the world, causing millions of people to become infected or die [1], and it has become a major global public health event. To make matters worse, the pandemic will continue for a long time, as the number of infected people is still rising rapidly. Compared with other sectors, traditional education involves a high density of people, so teachers and students are more likely to be infected by and to spread the virus. Therefore, COVID-19 has a greater impact on the traditional education model. To reduce cross-infection and transmission among teachers and students, governments have had to close universities, which seriously affects the education of hundreds of millions of students and the normal operation of schools in most countries worldwide [2]. The wide spread of the virus has created a high demand for online learning, and students need timely guidance to overcome the difficulties caused by COVID-19. Therefore, how to use network technology to help freshmen resist COVID-19 and reduce its impact on education has become a pressing topic.

Moreover, the impact of COVID-19 is particularly severe for universities that implement enrollment reform without distinguishing majors, a model adopted by millions of students in many countries. Generally, such students receive 1–2 years of general education [3] before gaining a clear understanding of the available majors and of their own strengths; only then do they know how to choose their majors and learning materials. As a result, these freshmen have much more difficulty adapting to the impact of COVID-19 under the traditional teaching model. Taking China as an example, millions of students in most universities have adopted this new enrollment model. Therefore, quickly discovering students' talents from their historical data is an urgent problem.

Generally, with the wide application of learning management systems (LMSs), data mining and recommendation technologies have become very popular. These technologies can help teachers and students obtain better learning feedback through the Internet [4], [5]. In particular, clustering, association rule mining and other methods can be used to analyze the data in online forums and learning management systems to obtain highly interpretable student-performance models [6], [7]. Because traditional data mining algorithms are designed for specific purposes and functions, they are not suitable for direct application in the education field; a preprocessing algorithm has to be used first, and only then can specific data mining methods be applied to the problem. Dutt et al. [8] summarized the clustering algorithms used for data preprocessing in educational data mining (EDM), which provided a basis for applying subsequent data mining algorithms. Compared with the active search of data mining, the passive push of recommendation algorithms is more suitable for freshmen who do not know enough about themselves. Some researchers have developed course recommendation systems using collaborative filtering algorithms to facilitate students' personalized learning experience [9]. Additionally, recommendation algorithms have been successfully applied to student development and training [10].

To date, existing methods mainly mine isomorphic information, ignoring the relationships among heterogeneous information in students' historical data. However, this heterogeneous information contains many student characteristics. Therefore, this paper proposes a personalized recommendation framework based on a weighted heterogeneous educational network (WHEN). The framework fully considers the differences among learning content, learning methods and learning environments so that freshmen can quickly find their talents. The framework is mainly composed of the following parts: first, to effectively integrate heterogeneous information in the field of education, this paper proposes the WHEN; second, a series of semantically rich extended metapaths are defined on the WHEN to support multiple data mining tasks; third, a graph embedding method with a new random walk strategy is used to learn the representations of individual students and recommended items; and finally, the learned representations are combined with a matrix factorization algorithm to recommend learning resources and majors for freshmen.

The main innovations and contributions of this paper can be summarized as follows:

  1. In this paper, we propose a new framework that can help freshmen who are not divided into majors to identify their talents and then recommend suitable majors and learning materials. This will effectively reduce the impact of COVID-19 on students.

  2. This paper proposes a new WHEN structure, which fully considers the heterogeneity of data in student information. WHEN solves the problem that traditional methods can only mine isomorphic data.

  3. This paper proposes a WHEN-based embedding method guided by extended metapaths to uncover the structural and semantic information of WHEN. Moreover, a new random walk method based on extended metapaths is proposed, which contributes to learning a more effective embedding representation for nodes in WHEN.

The rest of this paper is organized as follows. Section 2 presents related work. The framework for personalized recommendation is discussed in Section 3. The experiments and results are presented in Section 4. Section 5 concludes this paper.

SECTION II.

Related Work

In this section, we summarize the relevant work from three aspects: general enrollment, educational data mining, and graph embedding-based recommendation.

A. General Enrollment

Enrollment in large categories means that colleges and universities do not recruit students by specific majors but merge multiple similar majors and recruit students by one major category. From the perspective of the development of colleges and universities worldwide, many world-renowned universities, such as Yale University, Stanford University, and the University of Michigan, do not recruit students based on majors, which suggests that "general enrollment" is an inevitable trend in higher education. Though most universities in the United Kingdom require all students to select their major when they apply, Scotland allows flexibility in major selection during a student's first and second year of study. At Sorbonne University in France, students of the Faculty of Science and Engineering choose a first-year survey program with three to four science disciplines to ensure that they gain a broader context in the sciences; at the end of the first year, they choose their major and their minor. In China, the talent training model of "general enrollment" is gradually being adopted by universities [11]. From 2001 to 2018, more than half of the traditional "Project 211" colleges and universities implemented general enrollment reform, with nearly half a million new students enrolled each year. The reform of the talent training mode in colleges, known as "broad enrollment and split training", has become a part of deepening higher education reform [12], [13]. Researchers have analyzed the causes of problems in this reform and identified solutions. Junyi Wan et al. [14] believed that scores should not be the only criterion for professional diversion and that multiple indicators should be set. Jiaojiao Li et al. [15] noted that in the general education stage, the proportion of compulsory and elective courses is not well coordinated, so a more reasonable curriculum system needs to be established. Chongyi Fan et al. [16] established a diversion mechanism in line with students' personal interests to eliminate blindness in the diversion of computer students. Zhengqing Luo et al. [17] developed a professional diversion program to improve the quality of talent cultivation and promote the growth of professional capabilities.

Enrollment without dividing students into majors is becoming a new mode of enrollment reform. However, there is little research on how to help such students improve their learning efficiency while reducing the time spent on major-specific study, or on how to let them identify their strengths more accurately so that they can find suitable learning materials based on those strengths.

B. Educational Data Mining

Educational data mining is an interdisciplinary field of research using data mining techniques in educational settings, with the aim of better understanding how students learn to improve their academic level and explain educational phenomena [5]. Since the popularization of computers, many computer-based educational information systems have emerged, such as the student information system (SIS) and the learning management system (LMS). Those systems collect large quantities of student data. Most EDM work is based on historical data in LMSs [7], [18]. The commonly used methods in EDM include classification and regression, clustering, association rule mining [19], [20] and social network analysis [21]. Garcia et al. [19] proposed a collaborative recommendation method based on the information mined by association rules. Rabbany et al. [21] introduced the concept of social networks into EDM and used social network analysis technology to analyze student participation in online courses. The main EDM applications include student modeling [22], decision support systems [9], [23] and adaptive systems. Vialardi et al. [23] devised a method to predict student performance in a course and recommend the course in which they are most likely to be successful. O’Mahony et al. [9] used collaborative filtering recommendation algorithms to recommend online courses to students.

At present, most studies focus on course recommendation [24]–[26]. For students who are not divided into majors, there is little research on how to discover their specialties and recommend suitable learning materials.

C. Graph Embedding-Based Recommendation

To learn low-dimensional, real-valued and dense vectors for the nodes in a graph, graph embedding has been proposed as a new information modeling method. Inspired by word2vec [27], a model from the natural language processing field, DeepWalk [28] was first proposed to learn feature representations of nodes in a homogeneous graph. Similarly, both node2vec [29] and LINE [30] are graph embedding methods for homogeneous graphs. However, networks with many heterogeneous relationships are complex and diverse. Metapath2vec [31] was proposed to learn node representations in a heterogeneous information network (HIN), which makes it possible to apply graph embedding methods to different recommendation tasks.

Researchers have put much effort into graph embedding-based recommendation, aiming to improve recommendation system performance with additional information. Chuan Shi et al. [32] proposed a recommendation method called HERec, which effectively improves the performance of recommendation systems and alleviates the cold start problem. However, HERec is a general-purpose recommendation method, which means that it lacks consideration for domain-specific information. There have been many studies on the application of graph embedding-based recommendations in specific scenarios. Xiao Ma et al. [33] proposed a recommendation method called HGRec to solve the recommendation problem of scientific papers. Xijun He et al. [34] proposed a patent technology transaction recommendation model (PSR-VEC), which effectively solves the inactive trade phenomenon of patent products in the market. Jianxing Zheng et al. [35] used heterogeneous information in social networks to fully explore users’ potential interests and behavioral motivations, which helps model users and provides personalized recommendation services through the GUP method. Hao Wu et al. [36] improved recommendation performance by using a hybrid network presentation learning method that learns effective information from user cotagging networks and social networks. Since the recommendation performance based on matrix factorization is affected by sparsity and scalability, XueJian Zhang et al. [37] proposed a recommendation method called ISRM_NE, which realizes top-N recommendation and improves system performance in terms of interpretability and scalability.

COVID-19 will have a significant impact on the education of freshmen who are not divided into majors. Traditional methods cannot deal with this new situation because they do not consider individual differences among students. It is therefore meaningful to use graph embedding to provide students with learning material and major suggestions. However, methods based on metapath2vec cannot solve recommendation problems that involve weighted heterogeneous information. Therefore, we mainly study how to fully consider the personalized differences and needs of students by exploiting heterogeneous information in the educational environment for personalized recommendation.

SECTION III.

Proposed Framework

In Figure 1, it can be seen that the proposed framework consists of five stages: data preprocessing, WHEN construction, metapath selection, WHEN-based embedding and personalized recommendation.

  • Data preprocessing. A hierarchical questionnaire is designed to collect educational data. We extract 6 types of objects, 5 types of relations and 2 types of constraints from the noisy information.

  • WHEN construction. After describing the nodes and edges in detail, a formal definition of the WHEN and WHEN schema is given. Based on the above definition, the WHEN is constructed to integrate various student information.

  • Metapath selection. WHEN is projected into a set of subnetworks. Extended metapaths are selected from every subnetwork.

  • WHEN-based embedding. A new random walk strategy is proposed to sample each given metapath. The optimization objective of this method is to maximize the co-occurrence probability of nodes and their neighboring nodes in the sampled sequences. Multiple embeddings on the extended metapath are merged into one as the final result.

  • Personalized recommendation. Using the node representation obtained in the previous step, we optimize the performance of the matrix factorization algorithm.

FIGURE 1. Framework.

A. Stage 1: Data Preprocessing

Using a questionnaire designed for 114 students, we obtain personal information and learning experience for each student. Personal information includes majors of interest, academic performance and financial status. Learning experience is composed of five categories: innovation, research, study, practice and social interaction. Each category reflects one ability. After preprocessing, the individual information of each student can be expressed as follows:
\begin{equation*}
\left\{ \{P, E, \{M_{1}, \ldots, M_{k}\}\},\ \{(A_{1}, C_{1}), (A_{2}, C_{2}), \ldots, (A_{m}, C_{n})\} \right\} \tag{1}
\end{equation*}

where P is academic performance, E is economic state, and \{M_{1}, \ldots, M_{k}\} is an ordered set of majors of interest. The second set is the learning experience of this student. Each experience is represented by a binary pair (A_{m}, C_{n}), where A_{m} is the m^{th} activity and C_{n} is the n^{th} category of activity (n \in \{1,2,3,4,5\}).
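For concreteness, the following sketch shows one plausible way to store such a preprocessed record in Python; the field names and example values are illustrative assumptions rather than the authors' actual schema.

```python
# A minimal sketch of one preprocessed student record following formula (1).
# Field names and example values are illustrative assumptions.
student_record = {
    "performance": {"math": "0-70", "english": "70-85"},   # P: binned course grades
    "economy": "medium",                                    # E: monthly living-cost range
    "preferred_majors": ["SE", "IS", "CS"],                 # ordered majors of interest M_1..M_k
    "experiences": [                                        # (A_m, C_n) activity/category pairs
        ("math_modeling_contest", "innovation"),
        ("volunteer_teaching", "social_interaction"),
    ],
}
```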

B. Stage 2: WHEN Construction

To construct the WHEN, six kinds of objects are extracted from the processed data to represent six different types of nodes. In addition, the relationships among them are determined, which are represented as five types of edges, while two of them have attribute value constraints. As shown in Figure 2, each circle represents one type of node set. The color identification in the figure indicates an attribute value constraint relationship, while black indicates a normal relationship.

FIGURE 2. WHEN instance.

1) Node

As shown in Figure 2, the six objects are as follows: students (S), academic performance (P), economic conditions (E), preferred major (M), activity experience (A) and activity category (C). Academic performance (P) refers to the learning situation of a student in a specific course; it is quantified by dividing the grades in each course into ranges. As shown in Figure 2, John's performance in math class is average; that is, John's score in math class is 0-70. This type of node helps us integrate information about the student's academic performance into the network. Economic condition (E) refers to the monthly cost of living of a student. We divide the monthly cost of living into ranges and represent them with different nodes to include the student's economic condition in the network. Major (M) refers to the major that a student prefers when majors are divided, activity (A) refers to the activities that a student has participated in during the whole freshman year, and category (C) refers to the types of activities. From the perspective of student ability assessment, activities are divided into five categories: innovation, scientific research, study, practice and social interaction.

2) Edge

As shown in Figure 2, green is used to identify a relation with constraints, and black represents a normal relation. In the relation between students and majors, different attribute values are used to represent different meanings. For example, John’s first preference major is software engineering (SE), so the highest weight is assigned. The second preference, information security (IS), corresponds to the second-highest weight. The relation between students and activities indicates the degree to which a student is interested in an activity. The more students that are interested in the activity, the closer the weight is to 1, and vice versa.

3) WHEN

The WHEN can be represented as G_{WHEN}=(V,E,M), where V is the object set, E is the relation set, and M is the attribute value set on the relations. In the constructed WHEN, V=S \cup A \cup M \cup P \cup E \cup C, where S, A, M, P, E and C are the object sets of six types: student, activity, major, academic performance, economic state and activity category, respectively. For a relation type R connecting object types X and Y, i.e., X \stackrel{R/R^{-1}}{\longleftrightarrow} Y, we use E^{X \leftrightarrow Y} to denote its relation set. Therefore, in the constructed WHEN, E=E^{S \leftrightarrow A} \cup E^{S \leftrightarrow M} \cup E^{S \leftrightarrow P} \cup E^{S \leftrightarrow E} \cup E^{A \leftrightarrow C} represents the heterogeneous set of relations in G_{WHEN}. In a recommendation system, we need to focus on user-item relations. In our recommendation framework, the user-item relations can be represented as S \stackrel{R/R^{-1}}{\longleftrightarrow} A and S \stackrel{R/R^{-1}}{\longleftrightarrow} M. We use \langle u,i \rangle for user-item pairs and r_{\langle u,i \rangle} for user-item ratings. Given a user-item pair \langle u,i \rangle, r_{\langle u,i \rangle} reflects how interested user u is in item i; the higher r_{\langle u,i \rangle} is, the more interested the user is. For example, in the constructed WHEN, discrete values are used to indicate students' ratings for majors and activities, with 3 for the most interesting major, 2 for the second most interesting major, and so on. Given r_{\langle s,m \rangle} \in M_{1}=\{1,2,3\} and r_{\langle s,a \rangle} \in M_{2}=\{1\}, M=M_{1} \cup M_{2} represents the set of scoring records on the different types of relations in the WHEN.
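As an illustration, the WHEN could be assembled as an attributed graph along the following lines; the networkx library and the attribute names ("ntype", "rtype", "weight") are our own choices, not part of the paper.

```python
import networkx as nx

# A minimal sketch of assembling a WHEN G = (V, E, M) as an attributed graph.
# Node/edge attribute names ("ntype", "rtype", "weight") are our own conventions.
G = nx.Graph()

# Nodes of the six types: student, activity, major, performance, economy, category.
G.add_node("S:John", ntype="S")
G.add_node("A:math_modeling_contest", ntype="A")
G.add_node("M:SE", ntype="M")
G.add_node("P:math_0-70", ntype="P")
G.add_node("E:medium", ntype="E")
G.add_node("C:innovation", ntype="C")

# Weighted relations (with attribute-value constraints) and unweighted relations.
G.add_edge("S:John", "M:SE", rtype="S-M", weight=3)                     # first-choice major
G.add_edge("S:John", "A:math_modeling_contest", rtype="S-A", weight=1)  # participated activity
G.add_edge("S:John", "P:math_0-70", rtype="S-P")                        # academic performance
G.add_edge("S:John", "E:medium", rtype="S-E")                           # economic state
G.add_edge("A:math_modeling_contest", "C:innovation", rtype="A-C")      # activity category
```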

4) WHEN Schema

The schema of the constructed WHEN, shown in Figure 3(a), can be defined as S_{WHEN}=(A,R,W), where A, R, and W represent the sets of object types, relation types, and attribute value types, respectively. The schema S_{WHEN} is a meta template for the WHEN G=(V,E,M) with an object type mapping function \phi: V \rightarrow A, a relation type mapping function \psi: E \rightarrow R, and an attribute value type mapping function \theta: M \rightarrow W. Specifically, A=\{S,A,M,P,E,C\}, where S, A, M, P, E and C represent the six node types; R=\{R_{1},R_{2},R_{3},R_{4},R_{5}\}, where R_{1}, \ldots, R_{5} represent the five edge types between nodes of different types; and W=\{W_{1},W_{2}\}, where W_{1} and W_{2} represent attribute value constraints on the different types of weighted links. It is worth noting that if |W|=0, the network degenerates into an unweighted HEN. A specific instance of a WHEN is shown in Figure 2.

FIGURE 3. Metapath selection.

C. Stage 3: Metapath Selection

Inspired by the work in [38], we project the WHEN schema, a relatively complex and unusual network structure, into a series of ordered subnets with relatively simple and common structures (bipartite network and stellate network). This work effectively helps us to find more meaningful metapaths. We first explain how to project the WHEN schema, and then we present the extended metapath found from each subnet.

1) Projected Subnetwork

For a network schema, denoted as S=(A,R), its projected subnetwork schema [38] can be represented as S'=(A', R'), where A' \subset A and R' \subset R. We use P for pivotal-type nodes and S for supportive-type nodes; nodes in S are associated with nodes in P. A' includes one pivotal type of node and several supportive types of nodes, and R' contains heterogeneous links between different nodes.

P-S is used to denote a projected subnetwork. As shown in Figure 3(b), by successively selecting the sequence of pivotal node types \{C,A,S\}, the WHEN is projected into an ordered set of subnets, represented as (1) C-\{A\}, (2) A-\{S,C\}, and (3) S-\{P,E,M,A\}. In each subnet, the pivotal-type nodes are marked in red. From the perspective of network structure, subnet (1) is a bipartite graph, and subnet (3) is similar to a stellate (star-shaped) network. It is worth mentioning that in network projection, a WHEN may be projected into a sequence containing both WHENs and unweighted HENs; for example, subnet (2) and subnet (3) are WHENs, while subnet (1) is an unweighted HEN. Next, we describe the process of selecting a metapath from each subnet.
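One plausible reading of this projection step, reusing the graph and the "ntype" attribute from the construction sketch above, is the following; the helper name and the exact filtering rule are our own assumptions rather than the procedure of [38].

```python
def project_by_pivot(G, pivot_type):
    """Keep pivot-type nodes plus every node directly linked to a pivot-type node."""
    keep = {
        n for n in G.nodes
        if G.nodes[n]["ntype"] == pivot_type
        or any(G.nodes[m]["ntype"] == pivot_type for m in G.neighbors(n))
    }
    return G.subgraph(keep).copy()

# With the pivotal sequence {C, A, S}, this yields subnets C-{A}, A-{S, C} and S-{P, E, M, A}.
subnets = [project_by_pivot(G, p) for p in ["C", "A", "S"]]
```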

2) Extended Metapath

The extended metapath [39] can be expressed as \rho: A_{1} \stackrel{\delta_{1}(R_{1})}{\longleftrightarrow} A_{2} \stackrel{\delta_{2}(R_{2})}{\longleftrightarrow} A_{3} \cdots \stackrel{\delta_{l-1}(R_{l-1})}{\longleftrightarrow} A_{l}, which makes the attribute value constraints on the relations explicit. If there is an attribute value on a relation, the attribute value function \delta(R) represents the range of that attribute value; otherwise, it is an empty set. If some attribute value function \delta(R) on a metapath is not empty, the metapath is called a weighted metapath; if all attribute value functions \delta(R) on a metapath are empty sets, the metapath is called an unweighted metapath.

Figure 3(c) shows some of the metapaths selected from the three projected subnetworks, including three weighted metapaths and three unweighted metapaths. Taking the weighted metapath in Figure 3(c) as an example, the scoring range between S and M is \{1,2,3\}; S_{1} \stackrel{3}{\leftrightarrow} M_{1} denotes that major M_{1} is the first choice of student S_{1}, and S_{1} \stackrel{3}{\leftrightarrow} M_{1} \stackrel{3}{\leftrightarrow} S_{2} denotes that both student S_{1} and student S_{2} choose major M_{1} as their first choice. In Table 1, we show all the selected extended metapaths and explain their semantics.
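A minimal way to encode such extended metapaths in code, assuming the node-type labels used above, is sketched below; the class and the constraint convention (an empty set meaning an unweighted relation) are our own.

```python
from typing import NamedTuple

# A minimal sketch of an extended metapath: a sequence of node types plus one
# attribute-value constraint set per relation (an empty set means unweighted).
class ExtendedMetapath(NamedTuple):
    node_types: tuple    # e.g. ("S", "M", "S")
    constraints: tuple   # one set per relation; set() means no weight constraint

SMS_3 = ExtendedMetapath(("S", "M", "S"), ({3}, {3}))      # same first-choice major
SAS   = ExtendedMetapath(("S", "A", "S"), ({1}, {1}))      # joined the same activity
SPS   = ExtendedMetapath(("S", "P", "S"), (set(), set()))  # same academic-performance node
```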

TABLE 1. Extended Metapath and Its Corresponding Semantics

D. Stage 4: WHEN-Based Embedding

As shown in Figure 4, a novel graph embedding method is proposed. It is composed of three steps: (1) generate a random walk sequence based on the extended metapath, (2) learn the embedding representations of nodes according to the optimization objective, and (3) fuse node representations on multiple metapaths.

FIGURE 4. WHEN-based embedding.

1) Random Walk Strategy Based on an Extended Metapath

Generally, random walks based on metapaths are used to generate node sequences in the WHEN. For a given unweighted HEN G=(V,E) and a metapath \rho: A_{1} \stackrel{R_{1}}{\longrightarrow} \cdots A_{t} \stackrel{R_{t}}{\longrightarrow} A_{t+1} \cdots \stackrel{R_{l-1}}{\longrightarrow} A_{l}, the random walk path is generated by formula (2):
\begin{equation*}
P(n_{t+1}=x \mid n_{t}=v, \rho)=
\begin{cases}
\dfrac{1}{\left|N^{A_{t+1}}(v)\right|}, & (v, x) \in E \text{ and } \varphi(x)=A_{t+1}\\
0, & \text{otherwise}
\end{cases}
\tag{2}
\end{equation*}

where n_{t} is the t^{th} node in the sequence, the type of v is A_{t}, and N^{A_{t+1}}(v) is the set of first-order neighbors of vertex v with type A_{t+1}.

For a given WHEN G=(V,E,M) and a symmetric extended metapath \rho: A_{1} \stackrel{R_{1}}{\longleftrightarrow} \cdots A_{t} \stackrel{R_{t}}{\longleftrightarrow} A_{t+1} \cdots A_{l-t} \stackrel{R_{l-t}}{\longleftrightarrow} A_{l-t+1} \cdots \stackrel{R_{l-1}}{\longleftrightarrow} A_{l} (with A_{t+1}=A_{l-t}), the random walk path can be generated by formula (3):
\begin{equation*}
P(n_{t+1}=x \mid n_{t}=v, \rho)=
\begin{cases}
\dfrac{1}{\left|N_{w_{t}}^{A_{t+1}}(v)\right|}, & (v, x) \in E,\ \varphi(x)=A_{t+1},\ w_{t}=w_{l-t},\ w_{t} \in \delta_{t}(R_{t})\\
0, & \text{otherwise}
\end{cases}
\tag{3}
\end{equation*}

where \delta_{t}(R_{t}) is the set of attribute values on the relation connecting A_{t} and A_{t+1}, and w_{t} and w_{l-t} are attribute values on the two symmetric relations A_{t} \stackrel{\delta_{t}(R_{t})}{\longleftrightarrow} A_{t+1} and A_{l-t} \stackrel{\delta_{l-t}(R_{l-t})}{\longleftrightarrow} A_{l-t+1}, respectively. N_{w_{t}}^{A_{t+1}}(v) is the set of first-order neighbors of vertex v with type A_{t+1} whose relation to v carries the attribute value w_{t}. If the attribute value constraint function \delta(R) is an empty set, the extended metapath degenerates into an unweighted metapath, and the random walk method degenerates into formula (2). If \delta(R) is not empty, the random walk strategy walks according to the specified weights, as described by formula (3). The entire random walk process follows the predefined metapath pattern until the desired length is reached.
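A simplified sketch of the walk in formula (3) is given below, reusing the graph and metapath representations from the earlier sketches; the symmetric-weight requirement w_{t}=w_{l-t} is folded into the per-relation constraint sets, and dead ends simply end the walk early.

```python
import random

# Simplified extended-metapath-guided random walk (formula (3)), reusing the
# networkx graph G and the ExtendedMetapath class from the earlier sketches.
def metapath_walk(G, start, metapath, walk_length):
    walk, step = [start], 0
    types, constraints = metapath.node_types, metapath.constraints
    while len(walk) < walk_length:
        v = walk[-1]
        nxt_idx = (step % (len(types) - 1)) + 1        # cycle through the metapath pattern
        next_type = types[nxt_idx]
        allowed = constraints[step % len(constraints)]  # empty set => unweighted step
        candidates = [
            x for x in G.neighbors(v)
            if G.nodes[x]["ntype"] == next_type
            and (not allowed or G[v][x].get("weight") in allowed)
        ]
        if not candidates:       # dead end: stop the walk early
            break
        walk.append(random.choice(candidates))
        step += 1
    return walk
```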

2) Optimization Objective

Ignoring weighted information, the network can be expressed as G=(V,E). The corresponding meta template is S=(A,R) with the object type mapping function \phi: V \rightarrow A and link type mapping function \psi: E \rightarrow R. Following metapath2vec, to learn the feature representation of a given vertex v \in V, we maximize the probability of its heterogeneous context N_{t}(v) via formulas (4) and (5):
\begin{align*}
&\arg \max_{\theta} \sum_{v \in V} \sum_{t \in A} \sum_{c_{t} \in N_{t}(v)} \log p\left(c_{t} \mid v; \theta\right) \tag{4}\\
&p\left(c_{t} \mid v; \theta\right)=\frac{e^{X_{c_{t}} \cdot X_{v}}}{\sum_{u \in V} e^{X_{u} \cdot X_{v}}}\tag{5}
\end{align*}

where N_{t}(v) represents the node set of type t in the first-order neighborhood of vertex v, and p(c_{t} \mid v; \theta) is commonly defined as the softmax function in formula (5), in which X \in R^{|V| \times d} is the latent representation matrix and X_{v}, the v^{th} row of X, is the embedding of vertex v. In our framework, a large number of heterogeneous node sequences are obtained through the random walk process described in formula (3). Figure 4(1) shows part of the sequences generated during the experiment. We learn the node embeddings by optimizing formula (4).

In each iteration of the optimization, the softmax in formula (5) sums over all nodes, which makes the whole model inefficient. Therefore, following word2vec [27], we adopt negative sampling to update only a small part of the model weights for each training sample, reducing the computational burden and improving the quality of the node embeddings. Given a negative sample size M, the optimization objective becomes formula (6):
\begin{align*}
&\log \sigma\left(X_{c_{t}} \cdot X_{v}\right)+\sum_{m=1}^{M} \mathbb{E}_{u^{m} \sim P(u)}\left[\log \sigma\left(-X_{u^{m}} \cdot X_{v}\right)\right] \tag{6}\\
&\sigma(x)=\frac{1}{1+e^{-x}}\tag{7}
\end{align*}

where P(u) is a predefined distribution from which a negative node is drawn M times.
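In practice, the sampled walks can be fed to any skip-gram implementation with negative sampling. The sketch below uses gensim's Word2Vec as a stand-in for the objective in formulas (4)-(7); it is not the authors' implementation, and the hyperparameter values are arbitrary placeholders.

```python
from gensim.models import Word2Vec

# Walks from metapath_walk() are treated as "sentences" of node identifiers and
# trained with skip-gram plus negative sampling as a stand-in for formulas (4)-(7).
walks = [metapath_walk(G, s, SAS, walk_length=20)
         for s in G.nodes if G.nodes[s]["ntype"] == "S"
         for _ in range(10)]

model = Word2Vec(
    sentences=walks,
    vector_size=64,   # embedding dimension d
    window=3,         # context window over the walk
    sg=1,             # skip-gram
    negative=5,       # M negative samples per positive pair
    min_count=0,
    epochs=5,
)
john_vec = model.wv["S:John"]   # embedding of a student node on this metapath
```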

3) Embedding Fusion on Multiple Metapaths

After optimization, the same node from different input sequences is mapped into different vector spaces. For a vertex v \in V in the graph G=(V,E), let P denote the selected metapath set containing vertex v, and let e_{v}^{(l)} denote the embedding of vertex v on the l^{th} metapath. Thus, \{e_{v}^{(l)}\}_{l=1}^{|P|} is the set of embeddings of vertex v on each metapath. To make full use of these node embeddings to enhance the recommendation performance, a fusion function g(\cdot) is adopted to fuse the node embedding set into one representation:
\begin{equation*}
e_{v}=g\left(\left\{e_{v}^{(l)}\right\}_{l=1}^{|P|}\right)\tag{8}
\end{equation*}

where e_{v} is the final representation of vertex v, which contains richer information drawn from multiple semantic metapaths. A specific fusion function is defined in the next stage. Figure 4(3) shows the process of merging several node embedding sets into one final embedding.

E. Stage 5: Personalized Recommendation

On the recommendation-oriented WHEN, user-item pairs are identified as student-major and student-activity, and the recommendation task is to predict students' ratings of majors and activities. Considering the good performance of the classic matrix factorization algorithm in recommendation systems, node embeddings are integrated into the matrix factorization algorithm, and the predicted rating of user u for item i is defined as formula (9):
\begin{equation*}
\widehat{r_{u, i}}=x_{u}^{T} \cdot y_{i}+\alpha \cdot {e_{u}^{(U)}}^{T} \cdot \gamma_{i}^{I}+\beta \cdot {\gamma_{u}^{U}}^{T} \cdot e_{i}^{(I)}\tag{9}
\end{equation*}

where x_{u} \in R^{D} and y_{i} \in R^{D} are the latent factors of user u and item i, respectively; e_{u}^{(U)} and e_{i}^{(I)} are the fused embeddings of user u and item i, respectively; \gamma_{i}^{I} and \gamma_{u}^{U} are the latent factors paired with the embeddings e_{u}^{(U)} and e_{i}^{(I)}, respectively; and \alpha and \beta are parameters that can be adjusted to better balance the three terms.
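The predictor in formula (9) amounts to a sum of three inner products, as the following sketch (with placeholder dimensions) illustrates.

```python
import numpy as np

# A minimal sketch of the rating predictor in formula (9); all vectors are 1-D
# numpy arrays of equal length, and alpha/beta are tunable scalars.
def predict_rating(x_u, y_i, e_u, gamma_i, gamma_u, e_i, alpha=1.0, beta=1.0):
    return x_u @ y_i + alpha * (e_u @ gamma_i) + beta * (gamma_u @ e_i)

# Toy usage with random D-dimensional vectors (D is a placeholder).
D = 16
r_hat = predict_rating(*[np.random.rand(D) for _ in range(6)])
```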

Then, a specific fusion function g(\cdot) is given to realize personalized nonlinear fusion:
\begin{equation*}
g\left(\left\{e_{v}^{(l)}\right\}\right)=\sigma\left(\sum_{l=1}^{|P|} w_{v}^{(l)} \sigma\left(\mathbf{M}^{(l)} e_{v}^{(l)}+\boldsymbol{b}^{(l)}\right)\right)\tag{10}
\end{equation*}

where \sigma(\cdot) is the sigmoid function, P is the metapath set containing vertex v, M^{(l)} \in R^{D \times d} is the transformation matrix on the l^{th} metapath, b^{(l)} \in R^{D} is the bias vector on the l^{th} metapath, and w_{v}^{(l)} is the weight of vertex v on the l^{th} metapath.
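A direct numpy transcription of formula (10) could look as follows; the parameter shapes follow the text, and the function names are our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Fusion of per-metapath embeddings per formula (10).
# embeddings: list of d-dim vectors e_v^(l); Ms: D x d matrices; bs: D-dim biases;
# ws: per-metapath scalar weights for this vertex.
def fuse(embeddings, Ms, bs, ws):
    total = sum(w * sigmoid(M @ e + b) for e, M, b, w in zip(embeddings, Ms, bs, ws))
    return sigmoid(total)
```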

Finally, the fusion function in formula (10) is substituted into formula (9) to obtain the following objective, and the stochastic gradient descent (SGD) method is adopted to train the parameters of the recommendation model:
\begin{align*}
\kappa=&\sum_{\left(u, i, r_{u, i}\right) \in R}\left(r_{u, i}-\widehat{r_{u, i}}\right)^{2}+\lambda \sum_{u}\Big(\left\|x_{u}\right\|_{2}+\left\|y_{i}\right\|_{2}\\
&+\left\|\gamma_{u}^{U}\right\|_{2}+\left\|\gamma_{i}^{I}\right\|_{2}+\left\|\Theta^{(U)}\right\|_{2}+\left\|\Theta^{(I)}\right\|_{2}\Big)\tag{11}
\end{align*}

where \widehat{r_{u,i}} is the predicted rating from formula (9), \lambda is a regularization parameter, and \Theta^{(U)} and \Theta^{(I)} are the parameters of the embedding fusion functions corresponding to user u and item i, respectively.
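For illustration, one simplified SGD update on the latent factors for a single observed rating might look like the following; the fusion parameters \Theta are held fixed here for brevity, and the learning rate and regularization values are placeholders.

```python
import numpy as np

# One simplified SGD step on the latent factors for one observed rating r,
# following the squared-error plus L2 terms of formula (11).
# predict_rating() is the sketch given after formula (9); arrays must be float.
def sgd_step(r, x_u, y_i, gamma_u, gamma_i, e_u, e_i, alpha, beta, lr=0.01, lam=0.1):
    err = r - predict_rating(x_u, y_i, e_u, gamma_i, gamma_u, e_i, alpha, beta)
    grad_x  = err * y_i          - lam * x_u
    grad_y  = err * x_u          - lam * y_i
    grad_gi = err * alpha * e_u  - lam * gamma_i
    grad_gu = err * beta  * e_i  - lam * gamma_u
    x_u     += lr * grad_x
    y_i     += lr * grad_y
    gamma_i += lr * grad_gi
    gamma_u += lr * grad_gu
    return err ** 2   # squared error for this sample
```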

SECTION IV.

Experiment

A. Dataset

We collected information from 114 students to build the WHEN. More details about the dataset are shown in Table 2. The WHEN consists of six types of nodes: 114 student nodes, 35 activity nodes, 4 major nodes, 5 activity-type nodes, 3 economic-status nodes and 20 academic-performance nodes. There are five types of edges: 956 student-activity edges, 114 student-economy edges, 228 student-major edges, 570 student-performance edges and 35 activity-category edges. The dataset is divided into training and test sets for the following experiments.

TABLE 2. Dataset Information

B. Evaluation Metrics

We use the RMSE, MAE, precision, recall and F1-score, defined in formulas (12)-(16), respectively, to evaluate the performance of our method.

  • Root Mean Square Error:
\begin{equation*}
RMSE=\sqrt{\frac{1}{|T|} \sum_{(u, i) \in T}\left(r_{u, i}-\widehat{r_{u, i}}\right)^{2}}\tag{12}
\end{equation*}


  • Mean Absolute Error:
\begin{equation*}
MAE=\frac{1}{|T|} \sum_{(u, i) \in T}\left|r_{u, i}-\widehat{r_{u, i}}\right|\tag{13}
\end{equation*}

    where T represents the training set, r_{u,i} represents the actual user rating, and \widehat{r_{u,i}} represents the predicted rating.

  • Precision:
\begin{equation*}
P=\frac{\textit{Relevant items}}{\textit{Total recommended items}}\tag{14}
\end{equation*}


  • Recall:
\begin{equation*}
R=\frac{\textit{Relevant items}}{\textit{Total relevant items}}\tag{15}
\end{equation*}


  • F1-Score:
\begin{equation*}
F1=\frac{2PR}{P+R}\tag{16}
\end{equation*}


where P represents the proportion of relevant activities or majors among the total recommended activities or majors, and R represents the proportion of relevant activities or majors that appear in the recommendation list out of all relevant activities or majors.
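These metrics can be computed directly from the predicted ratings and the recommendation lists, as in the following sketch; the function names and the list-based relevance convention are our own.

```python
import numpy as np

# Small sketch of the five metrics over lists of predictions and recommendations.
def rmse(r_true, r_pred):
    r_true, r_pred = np.asarray(r_true, dtype=float), np.asarray(r_pred, dtype=float)
    return float(np.sqrt(np.mean((r_true - r_pred) ** 2)))

def mae(r_true, r_pred):
    r_true, r_pred = np.asarray(r_true, dtype=float), np.asarray(r_pred, dtype=float)
    return float(np.mean(np.abs(r_true - r_pred)))

def precision_recall_f1(recommended, relevant):
    hits = len(set(recommended) & set(relevant))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```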

C. Effectiveness Evaluation

In this section, we evaluate the effectiveness of our model from three aspects: metapaths, the graph embedding method and the recommendation method.

1) Metapaths

To validate the effectiveness of the constructed metapaths, a comparative experiment is designed to evaluate activity recommendation performance as different metapaths are added one by one. We use precision, recall, and F1-score as metrics to evaluate recommendation performance.

In this experiment, SAS, SACAS, SMS_2, SMS_3, ACA, SES and SPS are fused in turn. During the fusion process, the changes in the precision, recall and F1-score curves are shown in Figure 5. As each metapath is incorporated, the precision of our method increases gradually, while the recall and F1-score fluctuate slightly. Among the 7 metapaths, there are 4 weighted metapaths, SAS, SACAS, SMS_2 and SMS_3, corresponding to S \stackrel{1}{\leftrightarrow} A \stackrel{1}{\leftrightarrow} S, S \stackrel{1}{\leftrightarrow} A \leftrightarrow C \leftrightarrow A \stackrel{1}{\leftrightarrow} S, S \stackrel{2}{\leftrightarrow} M \stackrel{2}{\leftrightarrow} S and S \stackrel{3}{\leftrightarrow} M \stackrel{3}{\leftrightarrow} S, respectively, and 3 unweighted metapaths, ACA, SES and SPS, corresponding to A \leftrightarrow C \leftrightarrow A, S \leftrightarrow E \leftrightarrow S and S \leftrightarrow P \leftrightarrow S, respectively. Focusing on the precision curve, it is obvious that the addition of each metapath gradually improves the precision value. With the single metapath SAS, the precision reaches 56.76%, the recall reaches its maximum of 46.60% and the F1-score is 51.18%. After that, the other three weighted metapaths are incorporated in turn, increasing precision by 5.41, 4.06 and 4.05 percentage points, respectively. SACAS results in a slight decrease in recall, but the F1-score increases slightly at the same time. SMS_2 causes a sharp decrease in the recall and F1-score curves, while SMS_3 has the opposite effect. According to Table 1, SMS_3 and SMS_2 indicate that students have the same first and second preferences, respectively, when choosing majors. A possible explanation for the strong contrast between the effects of SMS_2 and SMS_3 on the recall and F1-score curves is that students tend to focus more on their primary choice of major rather than secondary choices, so SMS_2 contains more noisy information. With the incorporation of the three unweighted metapaths, the precision of the recommendation is further improved: ACA, SES and SPS improve precision by 6.76, 4.05 and 4.05 percentage points, respectively. ACA yields the largest precision improvement among the 7 incorporated metapaths, increasing precision from 70.17% to 77.03% while reducing recall from 41.70% to 38.80%, which ultimately reduces the F1-score from 52.34% to 51.61%. Although SES and SPS yield relatively smaller precision improvements, they both slightly improve the recall value and ultimately contribute to the improvement of the F1-score. SES increases the recall from 38.80% to 39.27%, which in turn raises the F1-score from 51.61% to 52.91%. With the final metapath SPS incorporated, precision reaches its maximum of 85.13%, the F1-score reaches its highest value of 58.39%, and the corresponding recall is 44.44%. The experimental data show that SES and SPS improve all three values, while ACA has a greater effect on precision. In our recommendation task, the most important thing is to recommend accurate rather than comprehensive items, because if students are not interested in the recommended activities, their time will be greatly wasted. Therefore, it is worthwhile to sacrifice some recall in exchange for an increase in precision, with the F1-score showing an overall upward trend.
Thus, experimental results demonstrate the effectiveness of the chosen metapaths.

FIGURE 5. Metapath effectiveness.

2) Graph Embedding Method

In this section, we evaluate the performance of the graph embedding method compared with other methods. We used a WHEN-based embedding method to embed each node of the heterogeneous network into a low-dimensional vector space. Five other graph embedding methods for comparison are briefly introduced as follows:

  • Graph factorization (GF) [40]: The adjacency matrix is decomposed to obtain the low-dimensional dense representation of nodes.

  • DeepWalk [28]: A method of learning the representation of nodes in a network, which introduces deep learning into the field of network embedding.

  • Node2vec [29]: Based on DeepWalk, this method uses two biased random walks (i.e., BFS and DFS) to better explore neighborhoods.

  • LINE [30]: This method explicitly preserves both first-order and second-order proximities. It is suitable for a variety of networks, including directed, undirected, binary or weighted edges.

  • SDNE [41]: This method uses a deep learning model to capture the nonlinear relationship between nodes.

Among the above methods, GF is a traditional factorization-based method; DeepWalk, node2vec and LINE are shallow models, with DeepWalk and node2vec relying mainly on random walks. Our method also belongs to the shallow models. SDNE is a deep model that mainly adopts deep neural networks. The different graph embedding methods are applied to two recommendation tasks: student-activity recommendation and student-major recommendation.

As shown in Figure 6(a) and 6(b), the performance was compared in two aspects: student-activity and student-major recommendations. The detailed data are given in Table 3. Obviously, WHEN-based embedding performs better on both tasks than the baselines, ranging from GF to DeepWalk. In the student-activity recommendation task, compared with GF, our method improves RMSE by approximately 84% and MAE by approximately 82%, which is similar to DeepWalk. In the student-major recommendation task, compared with GF, our method improves RMSE and MAE by approximately 60% and 68%, respectively, while node2vec only improves RMSE and MAE by approximately 34% and 26%, respectively. DeepWalk and node2vec are more suitable for homogeneous networks, which contain only one type of node and edge. This means that these three methods ignore the semantics of metapaths shown in Table 1 when dealing with the WHEN. In contrast, our method is more interpretable. LINE and SDNE exploit the first-order proximity and second-order proximity to characterize the local and global network structure. Focusing on the similarity of the network structure, the above two methods cannot distinguish metapaths that have the same structure but contain different semantics, such as S \stackrel {3}{\leftrightarrow } M \stackrel {3}{\leftrightarrow } S and S \stackrel {3}{\leftrightarrow } M \stackrel {2}{\leftrightarrow } S . Taking full account of the weight information in WHEN, our method achieves better RMSE and MAE values than the baselines. Experimental results show that the graph embedding method used in our framework is effective.

TABLE 3. Comparison between WHEN-based embedding and baselines on two recommendation tasks. The best RMSE and MAE values for each task are highlighted in bold.

FIGURE 6. Comparison of RMSE and MAE between WHEN-based embedding and baselines on the two recommendation tasks.

3) Recommendation Method

To evaluate the performance of our model, we use precision, recall, F1-score, RMSE and MAE as evaluation criteria. More details about the compared methods are shown as follows:

  • HERec [32]: A general heterogeneous network embedding-based approach for HIN based recommendation.

  • User-based CF [42]: User-based collaborative filtering, which recommends to the target user items that other users with similar interests like.

  • Item-based CF [43]: Item-based collaborative filtering, which recommends items similar to the user's previously favored items.

  • SVD [44]: A latent factor model for dimensionality reduction, which alleviates matrix sparsity and improves operational efficiency.

  • SVD++ [45]: An extension of SVD considering implicit ratings.

Detailed experimental data are given in Table 4. As shown in Figure 7, user-based CF and item-based CF are both classical recommendation algorithms. User-based CF has lower RMSE values and higher precision, recall and F1-score values than item-based CF. The WHEN is a student-centric network that describes similar users, which leads user-based CF to outperform item-based CF. SVD and SVD++ are matrix decomposition methods. The RMSE and MAE values of SVD++ are smaller than those of SVD, while SVD has a higher recall and F1-score than SVD++. Compared with SVD, the precision of SVD++ is improved by 27%, which implies that implicit feedback can significantly improve the accuracy of recommendation results. HERec and our method are graph embedding-based recommendation methods. Compared with HERec, our approach improves RMSE and MAE by 91% and 86%, respectively, and improves precision and recall by 10 and 7 percentage points, respectively. The F1-score reaches 0.58, which is 17.76% higher than that of HERec. As mentioned previously, HERec is based on a heterogeneous information network, but its essence is to learn embeddings only for users and items, with other types of objects used merely as bridges to construct homogeneous neighborhoods. Therefore, HERec might lose some important information when building the homogeneous neighborhood, whereas our approach learns embeddings for different types of nodes directly from heterogeneous neighborhoods and thus considers more heterogeneous information. Overall, the matrix decomposition methods perform better than the collaborative filtering methods, and the graph embedding-based recommendation methods perform best. Compared with the other methods, our method achieves the highest precision, recall and F1-score values, reaching 85.13%, 44.44% and 58%, respectively. The precision value indicates that 85.13% of the activities recommended by the framework are among students' preferences, while 14.87% are not. The recall value indicates that 44.44% of the students' preferred activities appear in the recommended list, while 55.56% do not. The F1-score comprehensively considers precision and recall.

TABLE 4. Recommendation method performance evaluation. Lower MAE and RMSE and higher precision, recall and F1-score indicate better performance. The best results are shown in bold.

FIGURE 7. Recommendation method performance evaluation.

In summary, our method is more suitable for the educational environment than the other recommendation algorithms. By constructing the WHEN, we comprehensively consider various kinds of heterogeneous information for each student, including behaviors, interests and economic conditions. Through WHEN-based embedding, rich semantic information can be mined along the specific extended metapaths, which ultimately improves the recommendation performance. Therefore, the proposed method can not only help students find online learning resources based on their historical information but also help students who are not divided into majors choose suitable majors. It is an effective way to help students fight COVID-19 by using advanced networking technologies.

SECTION V.

Conclusion

COVID-19, sweeping across the world, has brought unprecedented disruption to freshmen's studies and lives, and overcoming these challenges is highly meaningful. Providing personalized online training for freshmen and helping them choose majors reasonably has become a challenging topic. Exploiting the flexibility of modeling heterogeneous information, this paper proposed a new framework that integrates heterogeneous educational information through the WHEN and extracts additional information by WHEN-based embedding. Experiments show the effectiveness and reliability of the framework. The core of this framework is a graph embedding method based on the WHEN. This method is not only suitable for the WHEN but also applicable to other domains with attribute value constraints, such as movie rating networks. In the future, we will attempt to integrate text and images into the WHEN to obtain richer educational information.
