Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 9, 2021
Date Accepted: Nov 9, 2021
Date Submitted to PubMed: Dec 23, 2021

The final, peer-reviewed published version of this preprint can be found here:

Predicting New Daily COVID-19 Cases and Deaths Using Search Engine Query Data in South Korea From 2020 to 2021: Infodemiology Study

Husnayain A, Shim E, Fuad A, Su ECY

J Med Internet Res 2021;23(12):e34178

DOI: 10.2196/34178

PMID: 34762064

PMCID: 8698803

Predicting New Daily COVID-19 Cases and Deaths Utilizing Search Engine Query Data in South Korea from 2020 to 2021: Infodemiology Study

  • Atina Husnayain; 
  • Eunha Shim; 
  • Anis Fuad; 
  • Emily Chia-Yu Su

ABSTRACT

Background:

Given the ongoing coronavirus disease 2019 (COVID-19) pandemic, accurate predictions could greatly help health resource management for future waves. However, as a new disease, COVID-19 has shown dynamics that are difficult to predict. External factors, such as internet search data, need to be included in prediction models to improve their accuracy. However, it remains unclear whether incorporating online search volumes into these models improves predictive performance over long prediction periods.

Objective:

This study aimed to analyze whether search engine query data are important variables that should be included in models predicting new daily COVID-19 cases and deaths over short- and long-term periods.

Methods:

We used country-level case-related data, NAVER search volumes, and mobility data obtained from Google and Apple for South Korea from January 20, 2020, to July 31, 2021. The data were grouped into four subsets (3, 6, 12, and 18 months). In each subset, the first 80% of the data were used as the training set, and the remaining 20% served as the test set. Generalized linear models (GLMs) with normal, Poisson, and negative binomial distributions were developed, along with linear regression (LR) models with lasso, adaptive lasso, and elastic net regularization. The root mean squared error (RMSE) was defined as the loss function and used to assess model performance. All analyses and visualizations were conducted in SAS Studio, which is part of SAS OnDemand for Academics.
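The evaluation design described above (a chronological 80/20 split within each subset, scored by RMSE) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' SAS workflow: months are approximated as 30 days, each subset is assumed to start on the first study day (January 20, 2020), and a naive persistence forecast (repeat the last training value) stands in for the GLMs and regularized LR models. All function names and the demo data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: months approximated as 30 days, and every subset is assumed to
# start on the first study day (January 20, 2020).
SUBSET_DAYS = {"3m": 90, "6m": 180, "12m": 360, "18m": 540}


def chronological_split(
    series: Sequence[float], train_frac: float = 0.8
) -> Tuple[List[float], List[float]]:
    """Return (train, test): the first `train_frac` of the days, then the rest.

    Nothing is shuffled, so every test day falls after every training day.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    cut = int(len(series) * train_frac)
    return list(series[:cut]), list(series[cut:])


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal in length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_persistence(series: Sequence[float]) -> Dict[str, float]:
    """Test-set RMSE of a naive forecast (repeat the last training value).

    The persistence forecast is a HYPOTHETICAL stand-in for the study's GLM
    and regularized LR models; windows longer than the series are skipped.
    """
    scores: Dict[str, float] = {}
    for name, days in SUBSET_DAYS.items():
        if len(series) < days:
            continue
        train, test = chronological_split(series[:days])
        forecast = [train[-1]] * len(test)
        scores[name] = rmse(test, forecast)
    return scores


if __name__ == "__main__":
    # Hypothetical daily counts with a weekly pattern; NOT the study's data.
    demo = [float((day % 7) * 10 + day // 10) for day in range(540)]
    for name, score in evaluate_persistence(demo).items():
        print(f"{name}: test RMSE = {score:.2f}")
```

Keeping the split chronological rather than random means no model is trained on days that come after its test days, which would otherwise overstate accuracy in a forecasting task.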

Results:

GLMs with different types of distribution functions may have been beneficial in predicting new daily COVID-19 cases and deaths in the early stages of the outbreak. Non-normally distributed cases and deaths were better predicted using the Poisson or negative binomial distribution. Over longer periods, as cases and deaths became more normally distributed, LR models with regularization may outperform the GLMs. This study also found that the models performed better in predicting new daily deaths than new daily cases. In addition, an evaluation of feature effects showed that NAVER search volumes were useful variables for predicting new daily COVID-19 cases, particularly in the first six months of the outbreak. Searches related to logistical needs, particularly for “thermometer” and “mask strap,” showed higher feature effects in that period. For longer prediction periods, NAVER search volumes remained an important variable, although with a lower feature effect. This finding suggests that changes in search term usage should be considered to maintain predictive performance.
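The results tie the choice of GLM family to the shape of the count distribution. One common way to screen for this before fitting, offered here only as a sketch and not as the procedure the study used, is to compare the sample variance with the mean (a Poisson model assumes they are equal) and to check skewness. The function names, the cut-off values, and the example counts below are all hypothetical.

```python
import statistics
from typing import Sequence


def dispersion_index(counts: Sequence[float]) -> float:
    """Sample variance divided by the mean; close to 1 for Poisson-like counts."""
    mean = statistics.fmean(counts)
    if mean <= 0:
        raise ValueError("counts must have a positive mean")
    return statistics.variance(counts) / mean


def skewness(values: Sequence[float]) -> float:
    """Moment-based (Fisher-Pearson) skewness; 0 for a symmetric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def suggest_family(
    counts: Sequence[float],
    skew_cutoff: float = 0.5,            # HYPOTHETICAL threshold
    overdispersion_cutoff: float = 1.5,  # HYPOTHETICAL threshold
) -> str:
    """Rough screening rule, not the study's model-selection procedure."""
    if abs(skewness(counts)) < skew_cutoff:
        return "normal"  # roughly symmetric: normal GLM or regularized LR
    if dispersion_index(counts) > overdispersion_cutoff:
        return "negative_binomial"  # variance well above the mean
    return "poisson"  # skewed, variance close to the mean


if __name__ == "__main__":
    early = [0, 0, 10, 0, 20]  # hypothetical sparse, bursty daily counts
    print(suggest_family(early))  # prints negative_binomial
```

A dispersion index well above 1 means the counts vary more than a Poisson model allows, which is the usual reason to prefer a negative binomial GLM; a near-symmetric series is where a normal error model becomes reasonable.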

Conclusions:

NAVER search volumes were important variables in both short- and long-term predictions, with higher feature effects for predicting new daily COVID-19 cases in the first six months of the outbreak. Similar results were found for death predictions.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.
