Accepted for/Published in: JMIR Formative Research
Date Submitted: Feb 25, 2022
Date Accepted: Nov 30, 2022
Date Submitted to PubMed: Dec 2, 2022
Using Twitter Data to Estimate Prevalence of Mental Disorder Symptoms in the United States During the COVID-19 Pandemic: Ecological Cohort Study
ABSTRACT
Background:
Existing research and national surveillance data suggest an increase in the prevalence of mental disorders during the coronavirus disease 2019 (COVID-19) pandemic. Social media platforms such as Twitter could be a source of data for prediction because of their real-time nature, high availability, and broad geographical coverage. However, few studies have validated the accuracy of Twitter-based prevalence estimates for mental disorders by comparing them with CDC-reported prevalence.
Objective:
This study aims to assess the correlations between Twitter-based prevalence of mental disorder symptoms (i.e., anxiety and depressive disorder symptoms) and prevalence based on national surveillance data, and to identify the temporal trend of these correlations.
Methods:
State-level prevalence estimates of anxiety and depressive symptoms were retrieved from the national Household Pulse Survey (HPS) through the Centers for Disease Control and Prevention (CDC) for April 2020 to July 2021. Tweets were retrieved from the Twitter streaming API during the same period and used to estimate the prevalence of mental disorder symptoms for each state using keyword analysis. Stratified linear mixed models were employed, and the magnitude and significance of the model parameters were used to evaluate the correlations. Temporal trends of the correlations were tested by adding a time variable to the model. Geospatial differences were compared based on random effects.
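As a rough illustration of the keyword-based estimation step, the following Python sketch computes, for each state, the share of tweets that match at least one keyword, and then a Pearson correlation against survey figures. This is a minimal sketch under stated assumptions, not the study's pipeline: the keyword list, the `(state, text)` input shape, and the function names `keyword_prevalence` and `pearson_r` are all hypothetical, and the study itself evaluated correlations with stratified linear mixed models rather than this simple calculation.

```python
import math
import re
from collections import defaultdict

# Hypothetical keyword list for illustration only; not the study's lexicon.
ANXIETY_KEYWORDS = ["anxious", "anxiety", "panic", "worried"]


def keyword_prevalence(tweets, keywords):
    """Return {state: share of tweets containing at least one keyword}.

    `tweets` is an iterable of (state, text) pairs (an assumed input shape).
    """
    # Whole-word, case-insensitive match on any of the keywords.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    totals = defaultdict(int)
    hits = defaultdict(int)
    for state, text in tweets:
        totals[state] += 1
        if pattern.search(text):
            hits[state] += 1
    return {state: hits[state] / totals[state] for state in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_tweets = [
        ("NY", "I feel anxious today"),
        ("NY", "Nice weather"),
        ("TX", "Panic attack again"),
        ("TX", "So worried about everything"),
        ("CA", "Lunch time"),
    ]
    twitter_prev = keyword_prevalence(toy_tweets, ANXIETY_KEYWORDS)
    # Made-up survey values, aligned by state, to show the comparison step.
    survey_prev = {"NY": 0.30, "TX": 0.35, "CA": 0.25}
    states = sorted(twitter_prev)
    r = pearson_r(
        [twitter_prev[s] for s in states],
        [survey_prev[s] for s in states],
    )
    print(twitter_prev, round(r, 3))
```

A simple share-of-tweets measure like this ignores differences in tweet volume and user demographics across states, which is one reason a model with state-level random effects, as used in the study, may be preferable for the actual analysis.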
Results:
The Pearson correlations between the overall CDC-reported and Twitter-based prevalence for anxiety and depressive disorder symptoms were 0.587 (P<0.001) and 0.368 (P<0.001), respectively. Stratified by the four phases defined by HPS (i.e., April 2020, August 2020, October 2020, and April 2021), linear mixed models showed that Twitter-based prevalence for anxiety disorder symptoms had a positive and significant correlation with CDC-reported prevalence in phases 2 and 3, while a significant correlation for depressive disorder symptoms was identified in phases 1 and 3.
Conclusions:
Positive correlations were identified between Twitter-based and CDC-reported prevalence, and temporal trends in these correlations were found. Geospatial differences in the prevalence of mental disorder symptoms were found between the northern and southern United States. Findings from this study could inform future investigations into leveraging social media platforms to estimate mental disorder symptoms, as well as the provision of immediate prevention measures to improve health outcomes.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.