Abstract
This paper presents a cross-dataset transfer learning approach for cough-based COVID-19 detection that uses data augmentation to improve model performance. The approach significantly improves on baseline methods. An ablation study shows that, among the hyperparameters examined, the mixup alpha parameter is particularly important for performance. The final model achieves an unweighted accuracy of 88.19%. We also compare our results with previous studies on the same evaluation set to offer insights into cough-based detection methods.
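The two quantities named in the abstract, the mixup alpha parameter and unweighted accuracy, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard mixup formulation, in which the mixing coefficient is drawn from Beta(alpha, alpha), and it assumes that "unweighted accuracy" means the mean of per-class recalls; neither detail is confirmed in this section. The function names `mixup_pair` and `unweighted_accuracy` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def mixup_pair(
    x1: Sequence[float], y1: Sequence[float],
    x2: Sequence[float], y2: Sequence[float],
    alpha: float, rng: random.Random,
) -> Tuple[List[float], List[float], float]:
    """Blend two examples and their one-hot labels.

    Assumption: lam ~ Beta(alpha, alpha), as in the standard mixup recipe.
    A small alpha keeps lam near 0 or 1 (weak mixing); a larger alpha
    pushes lam towards 0.5 (strong mixing).
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def unweighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls (assumed definition of unweighted accuracy)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-dimensional "features" with one-hot labels for two classes.
    x, y, lam = mixup_pair([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], 0.4, rng)
    print("mixed features:", x, "mixed label:", y)

    # Imbalanced toy labels: 4 negatives, 2 positives.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    print(unweighted_accuracy(y_true, y_pred))  # 0.625 = (3/4 + 1/2) / 2
```

In the toy labels, plain accuracy would be 4/6, whereas the unweighted figure is 0.625 because each class contributes equally regardless of its size. This is why such a metric is often preferred for imbalanced screening data.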











Data availability
The datasets generated and/or analyzed during the current study were obtained as follows.
Coswara: This dataset is available at the Coswara-Data repository: https://github.com/iiscleap/Coswara-Data. We used commit ID 401b516 during the study.
COUGHVID: This dataset is available at the COUGHVID repository: https://zenodo.org/records/7024894. We used version 3.0 during the study.
ComParE-CCS: This dataset is not publicly available. It was provided by the organizers of the Computational Paralinguistic Challenge (ComParE) 2022 COVID-19 Cough sub-challenge for that challenge. Please contact the authors of [31] to obtain the dataset.
References
Hoda MN (2022) Editorial. Int J Inf Technol (Singapore) 14(7):3287–3290. https://doi.org/10.1007/s41870-022-01134-1
Yamin M (2020) Counting the cost of COVID-19. Int J Inf Technol (Singapore) 12(2):311–317. https://doi.org/10.1007/s41870-020-00466-0
Milling M, Pokorny FB, Bartl-Pokorny KD, Schuller BW (2022) Is speech the new blood? Recent progress in AI-based disease detection from audio in a nutshell. Front Digit Health 4(May):1–7. https://doi.org/10.3389/fdgth.2022.886615
Gupta R, Chaspari T, Kim J, Kumar N, Bone D, Narayanan S (2016) Pathological speech processing: State-of-the-art, current challenges, and future directions. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6470–6474. IEEE, Shanghai. https://doi.org/10.1109/ICASSP.2016.7472923
Pramono RXA, Imtiaz SA, Rodriguez-Villegas E (2016) A cough-based algorithm for automatic diagnosis of pertussis. PLoS ONE 11(9):1–20. https://doi.org/10.1371/journal.pone.0162128
Al-khassaweneh M, Abdelrahman RB (2013) A signal processing approach for the diagnosis of asthma from cough sounds. J Med Eng Technol 37(3):165–171. https://doi.org/10.3109/03091902.2012.758322
Swarnkar V, Abeyratne UR, Chang AB, Amrulloh YA, Setyati A, Triasih R (2013) Automatic identification of wet and dry cough in pediatric patients with respiratory diseases. Ann Biomed Eng 41(5):1016–1028. https://doi.org/10.1007/s10439-013-0741-6
Bertini F, Allevi D, Lutero G, Calzà L, Montesi D (2021) An automatic Alzheimer’s disease classifier based on spontaneous spoken English. Comput Speech Lang 72:101298. https://doi.org/10.1016/j.csl.2021.101298
Shuvo SB, Ali SN, Swapnil SI, Hasan T, Bhuiyan MIH (2021) A lightweight CNN model for detecting respiratory diseases from lung auscultation sounds using EMD-CWT-based hybrid scalogram. IEEE J Biomed Health Inf 25(7):2595–2603. https://doi.org/10.1109/JBHI.2020.3048006. arXiv:2009.04402
Han J, Xia T, Spathis D, Bondareva E, Brown C, Chauhan J, Dang T, Grammenos A, Hasthanasombat A, Floto A, Cicuta P, Mascolo C (2022) Sounds of COVID-19: exploring realistic performance of audio-based digital testing. Npj Digit Med 5(1):16. https://doi.org/10.1038/s41746-021-00553-x. arXiv:2106.15523
Anthes E (2020) Alexa, do I have COVID-19? Nature 586(7827):22–25. https://doi.org/10.1038/d41586-020-02732-4
Adjuik TA, Ananey-Obiri D (2022) Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach. Int J Inf Technol (Singapore) 14(7):3291–3299. https://doi.org/10.1007/s41870-022-00949-2
Khanday AMUD, Rabani ST, Khan QR, Rouf N, Mohi Ud Din M (2020) Machine learning based approaches for detecting COVID-19 using clinical text data. Int J Inf Technol (Singapore) 12(3):731–739. https://doi.org/10.1007/s41870-020-00495-9
Singh D, Singh BK, Behera AK (2023) A real-time correlation model between lung sounds & clinical data for asthmatic patients. Int J Inf Technol 15(1):39–44. https://doi.org/10.1007/s41870-022-01138-x
Quatieri TF, Talkar T, Palmer JS (2020) A framework for biomarkers of COVID-19 based on coordination of speech-production subsystems. IEEE Open J Eng Med Biol 1:203–206. https://doi.org/10.1109/OJEMB.2020.2998051
Islam R, Abdel-Raheem E, Tarique M (2022) A study of using cough sounds and deep neural networks for the early detection of Covid-19. Biomed Eng Adv 3:100025. https://doi.org/10.1016/j.bea.2022.100025
Vahedian-azimi A, Keramatfar A, Asiaee M, Atashi SS, Nourbakhsh M (2021) Do you have COVID-19? An artificial intelligence-based screening tool for COVID-19 using acoustic parameters. J Acoust Soc Am 150(3):1945–1953. https://doi.org/10.1121/10.0006104
Bartl-Pokorny KD, Pokorny FB, Batliner A, Amiriparian S, Semertzidou A, Eyben F, Kramer E, Schmidt F, Schönweiler R, Wehler M, Schuller BW (2021) The voice of COVID-19: acoustic correlates of infection in sustained vowels. J Acoust Soc Am 149(6):4377–4383. https://doi.org/10.1121/10.0005194
Hamidi M, Zealouk O, Satori H, Laaidi N, Salek A (2023) COVID-19 assessment using HMM cough recognition system. Int J Inf Technol 15(1):193–201. https://doi.org/10.1007/s41870-022-01120-7
Hasan I, Dhawan P, Rizvi SAM, Dhir S (2023) Data analytics and knowledge management approach for COVID-19 prediction and control. Int J Inf Technol (Singapore) 15(2):937–954. https://doi.org/10.1007/s41870-022-00967-0
Mohammed EA, Keyhani M, Sanati-Nezhad A, Hejazi SH, Far BH (2021) An ensemble learning approach to digital corona virus preliminary screening from cough sounds. Sci Rep 11(1):1–11. https://doi.org/10.1038/s41598-021-95042-2
Chowdhury NK, Kabir MA, Rahman MM, Islam SMS (2022) Machine learning for detecting COVID-19 from cough sounds: an ensemble-based MCDM method. Comput Biol Med 145(March):105405. https://doi.org/10.1016/j.compbiomed.2022.105405
Casanova E, Candido Jr A, Fernandes Jr RC, Finger M, Gris LRS, Ponti MA, Pinto da Silva DP (2021) Transfer learning and data augmentation techniques to the COVID-19 identification tasks in ComParE 2021. In: Interspeech 2021, pp 446–450. ISCA. https://doi.org/10.21437/Interspeech.2021-1798. https://www.isca-speech.org/archive/interspeech_2021/casanova21_interspeech.html
Sharma G, Umapathy K, Krishnan S (2022) Audio texture analysis of COVID-19 cough, breath, and speech sounds. Biomed Signal Process Control. https://doi.org/10.1016/j.bspc.2022.103703
Atmaja BT, Zanjabila, Suyanto, Sasou A (2023) Comparing hysteresis comparator and RMS threshold methods for automatic single cough segmentations. Int J Inf Technol. https://doi.org/10.1007/s41870-023-01626-8
Suyanto Z, Atmaja BT, Asmoro WA (2024) Performance improvement of Covid-19 cough detection based on deep learning with segmentation methods. J Appl Data Sci 5(2):520–531
Wang CC, Pan CA, Hung JW (2008) Silence feature normalization for robust speech recognition in additive noise environments. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH, pp 1028–1031
Atmaja BT, Akagi M (2020) The effect of silence feature in dimensional speech emotion recognition. In: 10th international conference on speech prosody 2020, pp 26–30. ISCA, Tokyo. https://doi.org/10.21437/SpeechProsody.2020-6
Orlandic L, Teijeiro T, Atienza D (2021) The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms. Sci Data 8(1):156. https://doi.org/10.1038/s41597-021-00937-4
Haritaoglu ED, Rasmussen N, Tan DCH, J, JR, Xiao J, Chaudhari G, Rajput A, Govindan P, Canham C, Chen W, Yamaura M, Gomezjurado L, Broukhim A, Khanzada A, Pilanci M (2022) Using deep learning with large aggregated datasets for COVID-19 classification from cough, 1–10 arXiv:2201.01669
Schuller BW, Batliner A, Bergler C, Mascolo C, Han J, Lefter I, Kaya H, Amiriparian S, Baird A, Stappen L, Ottl S, Gerczuk M, Tzirakis P, Brown C, Chauhan J, Grammenos A, Hasthanasombat A, Spathis D, Xia T, Cicuta P, Rothkrantz LJM, Zwerts JA, Treep J, Kaandorp CS (2021) The INTERSPEECH 2021 computational paralinguistics challenge: COVID-19 cough, COVID-19 speech, escalation & primates. In: Interspeech 2021, pp 431–435. ISCA. https://doi.org/10.21437/Interspeech.2021-19
Sharma N, Krishnan P, Kumar R, Ramoji S, Chetupalli SR, Nirmala R, Kumar Ghosh P, Ganapathy S (2020) Coswara - A database of breathing, cough, and voice sounds for COVID-19 diagnosis. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH 2020-Octob, 4811–4815 arXiv:2005.10548. https://doi.org/10.21437/Interspeech.2020-2768
McFee B, Lostanlen V, McVicar M, Metsai A, Balke S, Thomé C, Raffel C, Malek A, Lee D, Zalkow F, Lee K, Nieto O, Mason J, Ellis D, Yamamoto R, Seyfarth S, Battenberg E, Morozov V, Bittner R, Choi K, Moore J, Wei Z, Hidaka S, Nullmightybofo, Friesch P, Stöter F-R, Hereñú D, Kim T, Vollrath M, Weiss A (2020) librosa/librosa: 0.7.2. https://doi.org/10.5281/ZENODO.3606573
Guo J, Sainath TN, Weiss RJ (2019) A Spelling Correction Model for End-to-end Speech Recognition. In: ICASSP 2019 - 2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5651–5655. IEEE, Brighton, UK. https://doi.org/10.1109/ICASSP.2019.8683745
Choi K, Wang Y (2021) Listen, Read, and Identify: Multimodal Singing Language Identification. In: Proc of the 22nd Int Society for Music Information Retrieval Conf, pp 121–127
Liu Z-T, Xiao P, Li D-Y, Hao M (2019) Speaker-Independent Speech Emotion Recognition Based on CNN-BLSTM and Multiple SVMs. In: International conference on intelligent robotics and applications, pp 481–491
Choi K, Fazekas G, Sandler M, Cho K (2018) A comparison of audio signal preprocessing methods for deep neural networks on music tagging. In: 2018 26th European signal processing conference (EUSIPCO), pp 1870–1874. IEEE, Rome, Italy. https://doi.org/10.23919/EUSIPCO.2018.8553106
Yang Y-Y, Hira M, Ni Z, Astafurov A, Chen C, Puhrsch C, Pollack D, Genzel D, Greenberg D, Yang EZ, Lian J, Hwang J, Chen J, Goldsborough P, Narenthiran S, Watanabe S, Chintala S, Quenneville-Belair V (2022) Torchaudio: building blocks for audio and speech processing. In: ICASSP 2022 - 2022 IEEE international conference on acoustics, speech and signal processing (ICASSP), vol 2022-May, pp 6982–6986. IEEE, https://doi.org/10.1109/ICASSP43922.2022.9747236
Kong Q, Cao Y, Iqbal T, Wang Y, Wang W, Plumbley MD (2020) PANNs: large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Trans Audio Speech Lang Process 28(1):2880–2894. https://doi.org/10.1109/TASLP.2020.3030497. arXiv:1912.10211
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. In: 7th International conference on learning representations, ICLR arXiv:1711.05101
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system arXiv:1603.02754. https://doi.org/10.1145/2939672.2939785
Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2018) MixUp: beyond empirical risk minimization. In: 6th international conference on learning representations, ICLR 2018 - Conference Track Proceedings, pp 1–13
Park DS, Chan W, Zhang Y, Chiu CC, Zoph B, Cubuk ED, Le QV (2019) Specaugment: a simple data augmentation method for automatic speech recognition. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH, vol 2019-Septe, pp 2613–2617. https://doi.org/10.21437/Interspeech.2019-2680
Snyder D, Chen G, Povey D (2015) MUSAN: a music, speech, and noise corpus arXiv:1510.08484
Halevy A, Norvig P, Pereira F (2009) The unreasonable effectiveness of data. IEEE Intell Syst 24(2):8–12. https://doi.org/10.1109/MIS.2009.36
Goodfellow I, Bengio Y, Courville A (2015) Deep Learning Book. MIT Press, Cambridge
Atmaja BT, Sasou A (2022) Effects of data augmentations on speech emotion recognition. Sensors 22(16):5941. https://doi.org/10.3390/s22165941
Coppock H, Akman A, Bergler C, Gerczuk M, Brown C, Chauhan J, Grammenos A, Hasthanasombat A, Spathis D, Xia T, Cicuta P, Han J, Amiriparian S, Baird A, Stappen L, Ottl S, Tzirakis P, Batliner A, Mascolo C, Schuller BW (2023) A summary of the ComParE COVID-19 challenges. Front Digit Health 5:1–2. https://doi.org/10.3389/fdgth.2023.1058163. arXiv:2202.08981
Illium S, Müller R, Sedlmeier A, Popien CL (2021) Visual transformers for primates classification and covid detection. Proc Ann Conf Int Speech Commun Assoc INTERSPEECH 6:4341–4345. https://doi.org/10.21437/Interspeech.2021-273
Solera-Ureña R, Botelho C, Teixeira F, Rolland T, Abad A, Trancoso I (2021) Transfer learning-based cough representations for automatic detection of COVID-19. Proc Ann Conf Int Speech Commun Assoc INTERSPEECH 6:4336–4340. https://doi.org/10.21437/Interspeech.2021-1702
Acknowledgements
B.T.A. and A.S. are supported by the New Energy and Industrial Technology Development Organization (NEDO) Japan Project No. JPNP20006 and JSPS KAKENHI Grant Number 24K02967. Z., S., and W. A. A. are supported by project number 1014/PKS/ITS/2022, funded by the Directorate of Research and Community Service, Sepuluh Nopember Institute of Technology (ITS), Indonesia. The authors would like to thank Dr. Dhany Arifianto of VibrasticLab ITS for allowing us to use his computational resources for this study.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Atmaja, B.T., Zanjabila, Suyanto et al. Cross-dataset COVID-19 transfer learning with data augmentation. Int. j. inf. tecnol. (2025). https://doi.org/10.1007/s41870-025-02433-z