1. Introduction
Education is a fundamental right of every citizen and contributes to the development of a country [1]. In Pakistan, the government created the Higher Education Commission (HEC) in 2002 to provide better-quality higher education to future generations [2]. The educational sector of Pakistan has been working to counter the novel challenges that stand in the way of achieving precision education [3]. Cook, Kilgus and Burns [4] pointed out that precision education is “a tactic to investigate and practice which is concerned with adapting preventive and interposition practices to individuals on the basis of best accessible evidence”.
Digital learning platforms play an essential role in achieving precision education because they collect students’ educational data along with various types of interactions, such as their performance, learning patterns, and behavior [5]. Precision education has become a prime concern in higher education, specifically at the university level, for many reasons. Improving the literacy rate has therefore become essential to achieving precision education.
A huge amount of student data is now available, and these data need to be put to valuable use. Data mining (DM) can meet this need by providing techniques to explore unseen facts and figures in students’ information [6]. Siemens and Long [7] identified two areas that make use of the bulk of educational data [8] gathered through digital learning platforms: learning analytics (LA) and educational data mining (EDM). Consequently, inspecting subgroups of students, their attitudes toward study, and their online learning patterns has drawn attention from the EDM and LA research communities.
Educational data mining (EDM, hereafter) is defined as the field of systematic investigation concerned with developing approaches for making discoveries within the unique kinds of data that come from educational settings, and then using those approaches to understand how students perform within different learning environments [9]. Some examples of specific fields where EDM is seeing widespread use are shown in Figure 1. Computer-based education, deep learning, computer science, learning analytics, statistics, and pattern recognition are the fields highlighted.
EDM has been utilizing DM methods for pattern mining for quite some time. Applying DM methods to the study of students’ conduct yields effective results by allowing educators to foresee the likelihood of student attrition [10]. Figure 2 depicts the process through which DM is implemented in the academic setting. Advisors lay out the blueprint for the entire curriculum, and students then interact with the plan shared with them by their mentors. Subsequently, DM algorithms are applied to the resulting educational data to mine previously unknown facts and figures, which yield useful recommendations about students. DM techniques are being deployed in both traditional and online learning to obtain beneficial results.
Educational institutions in the 21st century are inevitably moving towards offering more courses online [11]. Online education began in the 1990s with the advent of the Internet and the World Wide Web (WWW) [12]. Education delivery and learning models are changing [13] as a result of the ongoing development of information technology [14]. A report published by the U.S. Department of Education compared results obtained in traditional classroom settings with those obtained in online settings and found that the latter were either superior or comparable to the former [15]. Digital learning platforms (DLP) that facilitate online education include Massive Open Online Courses (MOOC), Google Meet, Google Classroom, Small Private Online Courses (SPOC), Learning Management Systems (LMS), and Zoom [16].
Online learning has enabled students to obtain quality education from any place, eliminating the time barrier and communication gap between the educator and the pupil [17]. The COVID-19 pandemic has reshaped the whole world, and the influence and practice of online education environments have increased significantly as a result [18]. Some studies reported positive responses [19], whereas others reported negative attitudes [20] among students toward online education [21] during the COVID-19 period. A report on students in Pakistan stated that 77% of students held negative opinions of online learning during the pandemic and 84% reported reduced teacher–student communication [22]. Alongside this gap, problems such as poor internet connections and learners’ lack of interest in their studies have emerged. Thus, the sudden transition of education toward online learning has posed many challenges for both learners and teachers [23]. In this study, we consider several research questions:
Some limitations of the literature that are present in the majority of current studies are outlined below:
Recent research has focused extensively on customizing prediction models to individual courses.
Building a model for each individual course is inefficient because of the overhead of maintaining multiple models. Therefore, a generic model is necessary.
Previous studies considered only a small number of attributes, which has been identified as a scalability problem.
Existing research has not used hybrid models to achieve precision education, although such models are essential for predicting students’ academic outcomes with superior accuracy.
Due to a lack of data samples necessary for precise prediction, the models used in prior studies tended to overfit the data they were given.
The significant contributions made by this study are briefly summarized below.
The proposed work develops a generic model that performs well in predicting learners’ outcomes in online learning during the COVID-19 period by considering various features that are not course-dependent.
The proposed study uses a hybrid ensemble machine learning model that trains several supervised weak learners (SVM, logistic regression, KNN, naïve Bayes, and decision tree) to build a robust and efficient model.
A large dataset was collected through a survey completed by university students in Pakistan, primarily at the bachelor’s, master’s, and PhD levels, in order to develop a portable model from sufficient data samples.
Three meta-heuristic algorithms (PSO, HHO, and HGSO) were used for feature selection and a variational auto-encoder (VAE) was used for feature extraction to obtain the attributes that most strongly influence valid predictions.
The hybrid ensemble machine learning model achieves enhanced accuracy in predicting the performance of students in advanced studies, thereby supporting precision education.
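As a rough illustration of how the outputs of several weak learners can be merged into one prediction, the sketch below applies hard majority voting to lists of labels. Voting is only an assumed combination rule, since the contributions above do not state how the hybrid ensemble merges its base learners. The function name majority_vote and the sample labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by hard voting."""
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")
    combined = []
    for sample_votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count;
        # ties resolve in favour of the label seen first (Python 3.7+).
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of the five base learners for four students.
svm = ["pass", "fail", "pass", "pass"]
lr = ["pass", "fail", "fail", "pass"]
knn = ["fail", "fail", "pass", "pass"]
nb = ["pass", "pass", "fail", "pass"]
dt = ["pass", "fail", "fail", "fail"]

ensemble = majority_vote([svm, lr, knn, nb, dt])
print(ensemble)  # ['pass', 'fail', 'fail', 'pass']
```

With an odd number of base learners and two classes, a tie cannot occur, which is one practical reason to combine five learners rather than four.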
2. Literature Review
The COVID-19 pandemic has affected the education sector worldwide. A recent study [24] analyzed factors that potentially contribute to predicting students’ satisfaction with electronic means of obtaining education during the COVID-19 phase. It also examined how various DM techniques help to find the most appropriate attributes that influence student performance, and it provided an e-learning classification model for the in-depth examination of students. The dataset was collected through a survey of students at three schools in Iraq. A total of 1120 responses were collected, 1000 of which were used after pre-processing. The questionnaire consisted of three parts: demographic information, feasibility and effectiveness of e-learning platforms, and student satisfaction with e-learning tools. In total, 35 attributes of the dataset were taken as the basis for predicting the performance of school students during the COVID-19 period. The WEKA tool was used for the analysis. After pre-processing, classification algorithms were applied to train the student performance prediction model; in the second phase, the trained model was applied to make predictions on the student dataset. The classifiers used in this study included random tree, decision tree, random forest, naïve Bayes, bagging, REP tree and KNN. The model successfully predicted the performance of the students, and the highest accuracy, achieved with KNN, was 96.8%.
To determine the influence of COVID-19 on the psychological well-being of learners during the lockdown period, another study highlighted the importance and use of online tools and digital technologies during the COVID-19 period [25]. It scrutinized the influence of physical distancing, quarantine, and seclusion on college students’ psychological and mental health. The author performed a SWOT analysis to highlight the challenges encountered by students in online teaching throughout COVID-19. An online questionnaire was used to acquire data from students in Arab countries on various attributes, e.g., their study patterns, sleep habits, psychological state, and demography. A total of 1766 responses were used. After the collected data were pre-processed, various machine learning classifiers were trained to build a model for making predictions about students. The algorithms were used to predict the real influence of online learning tools before and after the COVID-19 period. A 70:30 ratio was applied for training and testing, respectively. Chi-squared and ANOVA tests were used to validate the efficiency of the model. The study concluded that there is a positive relationship between online learning and student performance.
Research has also been conducted on how student satisfaction was affected by online teaching during the COVID-19 period [26]. This paper contributed to predicting the academic performance of students in order to find out how the effectiveness of online learning (OL) systems can be enhanced. To extract information about student satisfaction levels and online learning during COVID-19, the study proposed a real-time dataset. The dataset was gathered through an online questionnaire completed by students of seven educational institutions in Egypt for the 2021–2022 academic year, and it holds students’ reviews of OL. A total of 18,691 responses containing 20 features were used to build the model. The dataset was pre-processed to eliminate erroneous data, and 11 diverse meta-heuristic algorithms were then applied to select the best features. Later, two machine learning classifiers, Support Vector Machine (SVM) and k-NN, were trained on the data taken from Kafrelsheikh University and Mansoura University in Mansoura, Egypt. The whole experiment was conducted in Python. Certain performance metrics were applied to evaluate model performance. The reported precision was 100%, which the study took as evidence that the model is sufficiently robust.
Student learning behavior in in-class courses during COVID-19 was identified in [27]. This study focused on tracking how various learning behaviors affect the performance of students, and it was directed towards a small population of students. The dataset was assembled through a survey of undergraduate mechanical engineering students, whose responses were collected via a mobile app. A total of 133 responses, covering four different sections, were considered. The dataset was split in a ratio of 30% to 70% for testing and training, respectively. It comprised student information such as class attendance and class participation. One important factor, homework, was left out and not considered in this study. The collected data were pre-processed, during which the students’ grades were converted to letters, and the SMOTE technique was then used to balance the sampled dataset. The model was trained with various machine learning classifiers, including support vector machine, decision tree, logistic regression, ensemble learning, random forest, and k-nearest neighbors. Because the dataset was small, 10-fold cross-validation was used to detect overfitting. Moreover, grid search was used to optimize the performance of each classifier. Ensemble learning showed the best performance, with 84% accuracy, compared with the other classifiers.
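The 10-fold cross-validation mentioned for this study can be sketched as follows. This is a minimal illustration and not the cited authors’ code: it only generates the index partitions, and the function name k_fold_indices, the shuffling seed, and the choice to shuffle at all are assumptions. The sample count of 133 matches the number of responses reported above.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(133, k=10))
print(len(splits), sorted({len(test) for _, test in splits}))  # 10 [13, 14]
```

Each model would be fitted on train_idx and scored on test_idx in turn; a large gap between training and test scores across folds is the usual sign of overfitting on a dataset this small.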
A study [28] aimed to improve the effect of online learning on students’ learning performance by providing them with timely personalized feedback that protects them from the risk of dropping out. The study contributed to the prediction of students’ learning performance in online education, for which it proposed a deep learning model known as PT-GRU. Two online datasets, ZJOOC and WorldUC, were used; both comprised students’ data on online courses. A total of 62 participants, who were enrolled in a Chinese university, were considered. Each course comprised 10 lessons. In total, 259 records were taken from ZJOOC and 7543 records from WorldUC, and both datasets were split in a ratio of 20% to 80% for testing and training the model. Four classifiers were applied to the two datasets: two machine learning classifiers, decision tree and random forest, and two deep learning models. A quasi-experiment was then conducted using the PT-GRU model. The highest accuracy on the ZJOOC dataset was 71.15%, achieved by GRU, and the highest accuracy on the WorldUC dataset was 81.44%, achieved by LSTM. The results indicated that the model successfully provided personalized feedback to the university students.
Crucial factors that influence the performance of university students, together with the effect of their social media use during the COVID-19 pandemic, were identified in [29]. In this study, the theory of constructivism was applied with constructs linked to the increased use of social media for collaborative learning and student interaction in online learning during the pandemic. The dataset was collected through an online questionnaire from higher education students in Saudi Arabia. The questionnaire consisted of 27 questions, and each variable was graded on a five-point Likert scale from 1 to 5. In total, 491 responses were received from students; after pre-processing removed erroneous data, 480 responses remained. Of the 27 questions, 4 were used to analyze online learning, 6 to analyze the interaction of students with their mentors and peers, 4 to predict the performance of students, and 4 to assess student satisfaction during the pandemic. Structural equation modelling (SEM) was used to analyze the dataset and discover the relationships between the dependent and independent variables. For model validation, three types of goodness-of-fit metrics were applied. The results revealed a positive relationship between student learning, student satisfaction, and the interaction of learners with mentors and peers.
An automated system was built in a recent study [30] to predict students’ grades in online education from the available performance data of learners throughout the COVID-19 pandemic. The study used student records covering the period 2006–2017. A total of 1000 records of undergraduate and graduate students across 15 different courses were used. The IITR-APE dataset was used, with various parameters serving as the basis for accurate performance prediction. First, the dataset was pre-processed to remove outliers. Next, a variational auto-encoder was applied to extract the most informative features from the dataset, and the extracted features were then used to predict grades. The models used in this research included random forest, linear regression, XGBoost, extra tree, multi-layer perceptron and KNN. The models were evaluated using mean absolute error (MAE), R2-score, root-mean-squared error (RMSE), and mean squared error (MSE). The study reported that deep learning models are best for making accurate predictions. Of all the applied models, the extra tree model achieved the best results: 0.720 for R2, 5.943 for MAE, 77.709 for MSE, and 8.781 for RMSE.
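For readers unfamiliar with the four evaluation metrics named above, the following sketch computes them from their standard definitions. It is illustrative only: the function name regression_metrics and the grade values are hypothetical and are not taken from the IITR-APE dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 is undefined when every true value is identical.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical true and predicted grades for four students.
metrics = regression_metrics([60, 70, 80, 90], [62, 68, 83, 85])
print(metrics)
```

For these values the MAE is 3.0, the MSE is 10.5, the RMSE is about 3.24, and R2 is about 0.916. Lower MAE, MSE, and RMSE and a higher R2 indicate a better fit.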
Different coping patterns were detected in [31] among undergraduate and graduate students who faced various types of anxiety and stress while obtaining online education through virtual classrooms. The dataset was collected through a Qualtrics questionnaire from postsecondary institutions in the US. A total of 517 responses were used: 423 from females, 91 from males, and 3 from respondents who reported their gender as non-binary. The dataset was collected between May and July 2020, when COVID-19 was at its peak. The survey asked students a total of 25 questions on a Likert scale. Principal axis factoring was then used to extract the best features from the dataset. This study was reported to be the first to use association rule mining in this setting, transforming the data into a market basket analysis framework to mine useful patterns about students. The model was implemented with the FP-growth algorithm, an improvement on the Apriori data mining algorithm. The dataset was scanned twice: the support of each item was found first, and the FP-tree was then constructed. After the FP-tree was built, FP-growth mined the frequent itemsets with a divide-and-conquer strategy. The outcome was 78 and 14 strong association rules for the graduate and undergraduate groups, respectively. The study concluded that undergraduate students were more consistent during online learning throughout the pandemic period.
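The strength of an association rule is usually judged by its support and confidence. The sketch below computes both by brute-force counting over a handful of transactions; it does not build an FP-tree and is not the FP-growth algorithm itself. The function names and the coping-strategy items are hypothetical and are not drawn from the cited survey.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of consequent given antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical survey answers, one set of reported coping strategies per student.
responses = [
    {"exercise", "peer_support"},
    {"exercise", "peer_support", "sleep"},
    {"peer_support"},
    {"exercise", "sleep"},
    {"exercise", "peer_support"},
]

rule_support = support(responses, {"exercise", "peer_support"})
rule_confidence = confidence(responses, {"exercise"}, {"peer_support"})
print(rule_support, rule_confidence)
```

Here three of the five students report both strategies (support 0.6), and three of the four who report exercise also report peer support (confidence 0.75). A rule is typically called strong when both values exceed user-chosen thresholds.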
The use of digital platforms was traced in a study [32] to identify the regulatory factors for online education throughout the COVID-19 pandemic. Four datasets were used, drawn from the data of 589 students at “X-University” and from Microsoft Teams. Dataset I comprised seven courses and six attributes across 589 instances. Dataset II consisted of five courses, eight attributes, and 259 student records. Dataset III consisted of 4 subjects, 12 attributes, and 280 student records. Dataset IV consisted of 2 subjects, 10 attributes, and 91 student records. A decision tree (J48) classifier was used with 10-fold cross-validation, considering two dominant factors, “Mid-term” and “Final-term”, of which Final-term was taken as the root node. The classifier was then used to define a set of rules for each dataset. While mining hidden patterns from the student data, it became evident that only three to four attributes are enough for a valid classification. The results indicated that the most influential attributes were “Mid-term” and “Final-term” and that the remaining attributes had little impact on learners during the pandemic period.
Learning patterns and student behavior in an Ebook system were identified in [33], with the goals of mining useful patterns from students’ data, making useful predictions, and achieving precision education. The dataset was collected from undergraduate university students and covered a single course, “Accounting Information Systems”, with 113 entities. BookRoll provided the Ebook system used by the students. To identify students’ learning behavior, various indicators were extracted from the data. The collected data and the extracted indicators were normalized to numerical values between 0 and 1. Agglomerative hierarchical clustering was then applied to identify learning patterns. The differences among the resulting subgroups on four indicators were verified with the Mann–Whitney U and Kruskal–Wallis tests. The study found that the comprehensive learning approach was successful in predicting students’ behavior. Table 1 below summarizes some recent studies, their contributions, the techniques used, their results, and their limitations.
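The normalization of indicators to values between 0 and 1, mentioned for the Ebook study [33], is commonly done with min-max scaling. The sketch below shows that common approach under the assumption that it matches the cited study, which does not name its scaling method here. The function name min_max_normalize and the reading-time values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant indicator carries no spread; map it to 0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical reading times in minutes for three students.
scaled = min_max_normalize([12, 30, 48])
print(scaled)  # [0.0, 0.5, 1.0]
```

Scaling every indicator to the same range keeps indicators with large raw values from dominating the distance computations used in hierarchical clustering.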
5. Conclusions and Future Recommendations
Achieving a higher graduation rate through superior education has necessitated the adoption of more precise methods of teaching. To accomplish this, measures must be taken to ensure that students are performing well in their studies. Multiple data mining techniques can be used to uncover instructive patterns that can be applied in the classroom. Additionally, ML models contribute to the success of precision education.
The proposed study has used machine learning methods to achieve precision education in online learning, specifically during the COVID-19 period, within Pakistan. Many recent studies have developed models for student performance prediction; however, these models lack generalizability, and they tend to overfit because they consider too few features and too few data samples.
To resolve the aforementioned issues, the current study used an online questionnaire to collect data from Pakistani students. A total of 12,000 responses were collected and 10,000 were used to build the model. Feature selection and extraction were performed on these data using three meta-heuristic algorithms for selection and one technique (VAE) for extraction. In total, 25 attributes that have some potential influence on the performance of students were selected. The dataset was split in a ratio of 70% to 30% for training and testing. First, the learners’ dataset was used in the training, testing, and validation phases to build models with the individual classifiers: DT, NB, SVM, KNN, and LR. Of these, SVM gave the highest accuracy, 87.5%. Next, a hybrid ensemble learning model composed of ML classifiers was applied, and the accuracy increased to 98.6%, with an error rate of 1.4%. This study has therefore contributed a generalized model capable of predicting learners’ academic performance in online education during COVID-19, which helps to provide early interventions to weak students and to maximize the pass rate. The present study has some limitations. Although it tried to consider a large dataset, the dataset still lacks many diverse academic fields that could make the model more generalizable. Moreover, the study applied ML classifiers only; other DM classifiers could also be tested to check whether they improve the efficiency of the model.
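The 70%/30% split and the relationship between accuracy and error rate reported above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the random seed, and the toy labels are hypothetical, and the study’s actual splitting procedure (for example, whether it was stratified) is not described here.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle records and return (train, test) lists of the requested sizes."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# 10,000 placeholder record ids, matching the number of responses used.
train, test = train_test_split(range(10_000))
print(len(train), len(test))  # 7000 3000

# Toy labels: three of four predictions are correct.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
error_rate = 1 - acc
print(acc, error_rate)  # 0.75 0.25
```

The error rate is simply one minus the accuracy, which is why an accuracy of 98.6% corresponds to the 1.4% error rate quoted above.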
In the future, the proposed work can be extended as follows:
Achieving a better accuracy rate in precision education in higher education for the post-COVID-19 period.
Increasing the size of the dataset and the number of attributes to improve model performance.
Applying several other classifiers in the hybrid ensemble learning model.
Considering a wider range of academic fields when training models, to make them more general in providing diverse feedback to students.
Comparing the performance of students from developed and developing countries after the COVID-19 pandemic.
Extending this research to several other countries by considering their datasets and applying deep learning models for analysis.
Focusing on both asynchronous and synchronous pedagogical approaches across a wide range of educational disciplines.