Unveiling the impact of dataset size on machine learning models for anxiety and depression prediction amid the COVID-19 pandemic: determining optimal data collection thresholds

Current Psychology

Abstract

Emotional, psychological, and social well-being are all part of mental health. Stress, despair, and anxiety can disrupt an individual’s routine and affect their mental health. Preserving and restoring mental health is essential for each person as well as for communities and society as a whole. Beyond causing a global health emergency, the COVID-19 pandemic has triggered a strong emotional and psychological reaction in many people. The pandemic’s uncertainty, disruptions, and social changes have amplified stress, fear, and depression, which are common responses to crises. Because of the ongoing pandemic, data collection for the COVID-19-related depression and anxiety assessment was limited to online methods. In mental health evaluation, machine learning has emerged as a promising strategy for identifying and understanding symptoms of anxiety and depression. This paper applies K-Nearest Neighbors (kNN), Random Forest (RF), Decision Tree (DT), and Support Vector Machine (SVM) techniques to data on the prevalence of anxiety and depression among Bangladeshi university students during the COVID-19 pandemic. It examines how dataset size affects the prediction accuracy of these machine learning models. The findings illuminate the scalability and generalizability of the different machine learning methods and show that model accuracy improved consistently and significantly as the dataset size increased. The performance of the classification models is further assessed using the F1 score, precision, and recall.
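The kind of experiment the abstract describes, training a classifier on progressively larger samples and scoring it with accuracy, precision, recall, and F1, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic two-class Gaussian blobs rather than the survey datasets, only a hand-written kNN stands in for the four models, and every name (`make_data`, `knn_predict`, `evaluate`, `learning_curve`), the choice of `k=5`, and the training sizes are hypothetical.

```python
import math
import random


def make_data(n, rng):
    """Synthetic stand-in data: 2-D Gaussian blobs centred at (0, 0) and (2, 2)."""
    data = []
    for i in range(n):
        label = i % 2                      # alternate labels for balanced classes
        centre = 2.0 * label
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def evaluate(train, test, k=5):
    """Return accuracy, precision, recall, and F1 for the positive class (label 1)."""
    tp = fp = fn = tn = 0
    for point, truth in test:
        pred = knn_predict(train, point, k)
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(test),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def learning_curve(sizes, n_test=200, seed=0):
    """Score the classifier on a fixed test set for each training-set size."""
    rng = random.Random(seed)              # fixed seed keeps runs reproducible
    pool = make_data(max(sizes), rng)
    test = make_data(n_test, rng)
    results = []
    for n in sizes:
        metrics = evaluate(pool[:n], test)  # train on the first n pooled samples
        metrics["n_train"] = n
        results.append(metrics)
    return results


if __name__ == "__main__":
    for row in learning_curve([20, 50, 100, 200, 400]):
        print(
            f"n={row['n_train']:4d}  acc={row['accuracy']:.3f}  "
            f"prec={row['precision']:.3f}  rec={row['recall']:.3f}  f1={row['f1']:.3f}"
        )
```

Holding the test set fixed while only the training size changes keeps the scores comparable across sizes. On real data, the size beyond which the metrics stop improving would suggest a data collection threshold, though that point depends on the dataset and the model.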

Data availability

Dataset-1: https://doi.org/10.7910/DVN/N5BUJR (Islam et al., 2020b)

Dataset-2: https://doi.org/10.7910/DVN/FCDGEB (Qasrawi et al., 2022a, b)

Dataset-3: https://doi.org/10.5522/04/20183858 (Carollo et al., 2022a)

Funding

The present study was accomplished without any outside financial support.

Author information

Corresponding author

Correspondence to Sonika Dahiya.

Ethics declarations

Ethical approval

The datasets utilized in this study are secondary and were collected by other researchers. As such, we have depended on them to adhere to the standards of their institutions in regard to ethical approval, safeguarding subjects, and obtaining informed consent.

Competing interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Arora, P., Dahiya, S. Unveiling the impact of dataset size on machine learning models for anxiety and depression prediction amid the COVID-19 pandemic: determining optimal data collection thresholds. Curr Psychol (2025). https://doi.org/10.1007/s12144-025-07432-8

  • DOI: https://doi.org/10.1007/s12144-025-07432-8
