Abstract
Our emotional, psychological, and social well-being are all parts of our mental health. Stress, despair, and anxiety can disrupt an individual's routine and affect their mental health. Preserving and restoring mental health is essential for each person as well as for communities and society as a whole. In addition to triggering a global health emergency, the COVID-19 pandemic has provoked a strong emotional and psychological reaction in many people. The pandemic's uncertainty, disruptions, and social changes have amplified stress, fear, and depression, which are common responses to crises. Because of the ongoing pandemic, data collection for the COVID-19-related depression and anxiety assessment was limited to online methods. In the field of mental health evaluation, machine learning has emerged as a promising strategy for identifying and understanding symptoms of anxiety and depression. This paper applies K-Nearest Neighbors (kNN), Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM) techniques to data on the prevalence of anxiety and depression among Bangladeshi university students during the COVID-19 pandemic. It examines how the size of the dataset affects the prediction accuracy of the various machine learning models. The findings shed light on the scalability and generalizability of the different machine learning methods and confirm that the accuracy of the models improved consistently and significantly as the dataset size grew. The performance of the classification models is further assessed using the F1 score, precision, and recall.
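The kind of experiment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only a plain-Python kNN (one of the four techniques named), trains it on nested subsets of increasing size, and reports accuracy, precision, recall, and F1 on a fixed test set. The data are synthetic Gaussian blobs, not the survey datasets, and the names `make_data`, `knn_predict`, `scores`, and `learning_curve` are hypothetical helpers introduced only for this illustration.

```python
import random


def make_data(n, seed=0):
    """Return n synthetic (features, label) pairs; label 1 stands in for 'symptomatic'."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumption: each class is a 2-D Gaussian blob; class 1 is shifted by 1.5.
        features = (rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0))
        data.append((features, label))
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    def dist(item):
        (a, b), _ = item
        return (a - x[0]) ** 2 + (b - x[1]) ** 2

    nearest = sorted(train, key=dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def scores(y_true, y_pred):
    """Return (accuracy, precision, recall, F1) for binary labels, positive class = 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def learning_curve(sizes, n_test=200, k=5, seed=0):
    """Score kNN on a fixed test set for each training-set size in `sizes`."""
    pool = make_data(max(sizes) + n_test, seed)
    test, train_pool = pool[:n_test], pool[n_test:]
    y_true = [label for _, label in test]
    results = {}
    for n in sizes:
        train = train_pool[:n]  # nested subsets: each larger set contains the smaller
        y_pred = [knn_predict(train, x, k) for x, _ in test]
        results[n] = scores(y_true, y_pred)
    return results


if __name__ == "__main__":
    for n, (acc, prec, rec, f1) in learning_curve([20, 50, 100, 200, 400]).items():
        print(f"n={n:4d}  acc={acc:.3f}  precision={prec:.3f}  recall={rec:.3f}  F1={f1:.3f}")
```

Holding the test set fixed while only the training subset grows keeps the scores comparable across sizes. A fuller study would repeat the run over several random seeds and over each classifier before drawing conclusions about a data collection threshold.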




Data availability
Dataset-1: https://doi.org/10.7910/DVN/N5BUJR (Islam et al., 2020b)
Dataset-2: https://doi.org/10.7910/DVN/FCDGEB (Qasrawi et al., 2022a, b)
Dataset-3: https://doi.org/10.5522/04/20183858 (Carollo et al., 2022a)
References
Bansal, G., Rajgopal, K., Chamola, V., Xiong, Z., & Niyato, D. (2022). Healthcare in Metaverse: A survey on current metaverse applications in healthcare. IEEE Access, 10, 119914–119946. https://doi.org/10.1109/ACCESS.2022.3219845
Ben-David, A. (2007). A lot of randomness is hiding in accuracy. Engineering Applications of Artificial Intelligence, 20(7), 875–885. https://doi.org/10.1016/j.engappai.2007.01.001
Breck, E., Polyzotis, N., Roy, S., Whang, S., & Zinkevich, M. (2019). Data validation for machine learning. MLSys.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Brogle, S. E., Kerksieck, P., Bauer, G. F., & Morstatt, A. I. (2024). Managing boundaries for well-being: A study of work-nonwork balance crafting during the COVID-19 pandemic. Current Psychology. https://doi.org/10.1007/s12144-024-06118-x
Callahan, A., & Shah, N. H. (2017). Machine learning in healthcare. In Key advances in clinical informatics (pp. 279–291). Elsevier. https://doi.org/10.1016/B978-0-12-809523-2.00019-4
Carollo, A., Bizzego, A., Gabrieli, G., Wong, K. K. Y., Raine, A., & Esposito, G. (2022a). Dataset-self-perceived loneliness and depression during the COVID-19 Pandemic: a two-wave replication study. https://doi.org/10.5522/04/20183858.v1
Carollo, A., Bizzego, A., Gabrieli, G., Wong, K. K. Y., Raine, A., & Esposito, G. (2022b). Self-perceived loneliness and depression during the Covid-19 pandemic: A two-wave replication study. UCL Open Environment.
Chen, Z. S., Kulkarni, P., Galatzer-Levy, I. R., Bigio, B., Nasca, C., & Zhang, Y. (2022). Modern views of machine learning for precision psychiatry. Patterns, 3(11), 100602. https://doi.org/10.1016/j.patter.2022.100602
Costello, E. J. (2016). Early detection and prevention of mental health problems: Developmental epidemiology and systems of support. Journal of Clinical Child & Adolescent Psychology, 45(6), 710–717. https://doi.org/10.1080/15374416.2016.1236728
Disher, N., Vranas, K. C., Golden, S. E., Slatore, C. G., Tuepker, A., & Nugent, S. (2024). Findings from a qualitative study about ICU physicians’ wellbeing during the COVID-19 pandemic. Current Psychology, 43(21), 19569–19580. https://doi.org/10.1007/s12144-024-05722-1
Dyar, M. D., & Ytsma, C. R. (2021). Effect of data set size on geochemical quantification accuracy with laser-induced breakdown spectroscopy. Spectrochimica Acta Part B: Atomic Spectroscopy, 177, 106073. https://doi.org/10.1016/j.sab.2021.106073
Eid, J., Bøhn, E. K., Guderud, M. R., Rath, T. M., & Sætrevik, B. (2024). A qualitative study of the psychological effects of quarantine as an infection control measure in Norway. Current Psychology. https://doi.org/10.1007/s12144-024-06162-7
Fulmer, R., Joerin, A., Gentile, B., Lakerink, L., & Rauws, M. (2018). Using psychological artificial intelligence (Tess) to relieve symptoms of depression and anxiety: Randomized controlled trial. JMIR Mental Health, 5(4), e64. https://doi.org/10.2196/mental.9782
Ganguly, C., Nayak, S., & Gupta, A. K. (2022). Mental health impact of COVID-19 and machine learning applications in combating mental disorders: a review. In Artificial Intelligence, machine learning, and mental health in pandemics (pp. 1–51). Elsevier. https://doi.org/10.1016/B978-0-323-91196-2.00016-8
Gao, B., Shen, Q., Luo, G., Xu, Y., & Lu, J. (2024). Why does COVID-19 make me depressed? The longitudinal relationships between fear of COVID-19 and depressive symptoms: A moderated mediation model. Current Psychology. https://doi.org/10.1007/s12144-024-05944-3
Goutte, C., & Gaussier, E. (2005). A probabilistic interpretation of precision, recall and f-score, with implication for evaluation (pp. 345–359). https://doi.org/10.1007/978-3-540-31865-1_25
Grolinger, K., Hayes, M., Higashino, W. A., L’Heureux, A., Allison, D. S., & Capretz, M. A. M. (2014). Challenges for MapReduce in big data. 2014 IEEE world congress on services, 182–189. https://doi.org/10.1109/SERVICES.2014.41
Hale, T., Webster, S., Petherick, A., Phillips, T., & Kira, B. (2020). Oxford COVID-19 government response tracker (OxCGRT). Last Updated, 8, 30.
Hawkins, D. M. (2004). The Problem of Overfitting. Journal of Chemical Information and Computer Sciences, 44(1), 1–12. https://doi.org/10.1021/ci0342472
Imlawi, J., & Alsharo, M. (2017). Evaluating classification accuracy: The impact of resampling and dataset size. International Journal of Business Information Systems, 24(1), 91. https://doi.org/10.1504/IJBIS.2017.080947
Islam, M. A., Barna, S. D., Raihan, H., Khan, M. N. A., & Hossain, M. T. (2020a). Depression and anxiety among university students during the COVID-19 pandemic in Bangladesh: A web-based cross-sectional survey. PloS One, 15(8), e0238162.
Islam, M. A., Barna, S. D., Raihan, H., Khan, M. N. A., & Hossain, M. T. (2020b). Data_Bangladesh_COVID-19 (2 ed.). Harvard Dataverse. https://doi.org/10.7910/DVN/N5BUJR
Jakkula, V. (2006). Tutorial on support vector machine (svm). School of EECS Washington State University, 37(2.5), 3.
Janiesch, C., Zschech, P., & Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31(3), 685–695. https://doi.org/10.1007/s12525-021-00475-2
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9. Journal of General Internal Medicine, 16(9), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
Linda T. (n.d.). What is machine learning and how does it work? In-depth guide. Retrieved December 7, 2023, from https://ieeexplore.ieee.org/abstract/document/7906512
Manova, V., Grosso, F., Khoury, B., & Pagnini, F. (2024). Social anxiety: Topics and emotions shared on Reddit before and during the coronavirus pandemic. Current Psychology. https://doi.org/10.1007/s12144-024-05891-z
Murthy, S. K. (1998). Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4), 345–389. https://doi.org/10.1023/A:1009744630224
Navarro-Prados, A. B., Jiménez García-Tizón, S., Meléndez, J. C., & López, J. (2024). Predictors of stress among nursing home staff during COVID-19 pandemic. Current Psychology. https://doi.org/10.1007/s12144-024-05851-7
Pecora, G., Laghi, F., Baumgartner, E., Di Norcia, A., & Sette, S. (2024). The role of loneliness and positivity on adolescents’ mental health and sleep quality during the COVID-19 pandemic. Current Psychology. https://doi.org/10.1007/s12144-024-05805-z
Peñacoba-Puente, C., Luque-Reca, O., Griffiths, M. D., García-Hedrera, F. J., Carmona-Monge, F. J., & Gil-Almagro, F. (2024). The effects of fear of COVID-19 among Spanish healthcare professionals in three years after the pandemic onset via validation of the FCV-19S: A prospective study. Current Psychology. https://doi.org/10.1007/s12144-024-06113-2
Qasrawi, R., Amro, M., VicunaPolo, S., Al-Halawa, D. A., Agha, H., Seir, R. A., Hoteit, M., Hoteit, R., Allehdan, S., & Behzad, N. (2022a). Machine learning techniques for predicting depression and anxiety in pregnant and postpartum women during the COVID-19 pandemic: A cross-sectional regional study. F1000Research, 11(390), 390.
Qasrawi, R., Hoteit, M., Allehdan, S., Boukari, K., & Tayyem, R. (2022b). Pregnancy and Mental Health Data during COVID19 (V1 ed.). Harvard Dataverse. https://doi.org/10.7910/DVN/FCDGEB
Sarker, I. H. (2021). Machine learning: Algorithms, real-world applications and research directions. SN Computer Science, 2(3), 160. https://doi.org/10.1007/s42979-021-00592-x
Shahinfar, S., Meek, P., & Falzon, G. (2020). How many images do I need? Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring. Ecological Informatics, 57, 101085. https://doi.org/10.1016/j.ecoinf.2020.101085
Sharifani, K., & Amini, M. (2023). Machine learning and deep learning: A review of methods and applications. World Information Technology and Engineering Journal, 10(07), 3897–3904.
Sharma, N., Sharma, R., & Jindal, N. (2021). Machine Learning and Deep Learning Applications-A Vision. Global Transitions Proceedings, 2(1), 24–28. https://doi.org/10.1016/j.gltp.2021.01.004
Sordo, M., & Zeng, Q. (2005). On sample size and classification accuracy: A performance comparison (pp. 193–201). https://doi.org/10.1007/11573067_20
Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder. Archives of Internal Medicine, 166(10), 1092. https://doi.org/10.1001/archinte.166.10.1092
Stockwell, D. R. B., & Peterson, A. T. (2002). Effects of sample size on accuracy of species distribution models. Ecological Modelling, 148(1), 1–13. https://doi.org/10.1016/S0304-3800(01)00388-X
Talevi, D., Socci, V., Carai, M., Carnaghi, G., Faleri, S., Trebbi, E., di Bernardo, A., Capelli, F., & Pacitti, F. (2020). Mental health outcomes of the CoViD-19 pandemic. Rivista Di Psichiatria, 55(3), 137–144.
Vigo, D., Thornicroft, G., & Atun, R. (2016). Estimating the true global burden of mental illness. The Lancet Psychiatry, 3(2), 171–178. https://doi.org/10.1016/S2215-0366(15)00505-2
Wang, L. (2019). Research and implementation of machine learning classifier based on KNN. IOP Conference Series: Materials Science and Engineering, 677(5), 052038. https://doi.org/10.1088/1757-899X/677/5/052038
Yan, L., Ding, X., Gan, Y., Wang, N., Wu, J., & Duan, H. (2024). The impact of loneliness on depression during the COVID-19 pandemic in China: A two-wave follow-up study. Current Psychology. https://doi.org/10.1007/s12144-024-05898-6
Funding
The present study was accomplished without any outside financial support.
Ethics declarations
Ethical approval
The datasets utilized in this study are secondary and were collected by other researchers. As such, we have depended on them to adhere to the standards of their institutions in regard to ethical approval, safeguarding subjects, and obtaining informed consent.
Competing interest
The authors declare no conflict of interest.
About this article
Cite this article
Arora, P., Dahiya, S. Unveiling the impact of dataset size on machine learning models for anxiety and depression prediction amid the COVID-19 pandemic: determining optimal data collection thresholds. Curr Psychol (2025). https://doi.org/10.1007/s12144-025-07432-8
DOI: https://doi.org/10.1007/s12144-025-07432-8