Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 12, 2022
Date Accepted: Sep 20, 2022
Date Submitted to PubMed: Oct 11, 2022

The final, peer-reviewed published version of this preprint can be found here:

Sexually Transmitted Disease–Related Reddit Posts During the COVID-19 Pandemic: Latent Dirichlet Allocation Analysis

Johnson A, Bhaumik R, Nandi D, Roy A, Mehta SD

Sexually Transmitted Disease–Related Reddit Posts During the COVID-19 Pandemic: Latent Dirichlet Allocation Analysis

J Med Internet Res 2022;24(10):e37258

DOI: 10.2196/37258

PMID: 36219757

PMCID: 9624277

Sexually Transmitted Disease-Related Reddit Posts During the COVID-19 Pandemic: Latent Dirichlet Allocation Analysis

  • Amy Johnson
  • Runa Bhaumik
  • Debarghya Nandi
  • Abhishikta Roy
  • Supriya D Mehta

ABSTRACT

Background:

Sexually transmitted diseases (STDs) are common and costly, impacting approximately one in five people annually. Reddit, the sixth most used internet site in the world, is a user-generated social media discussion platform that may be useful for monitoring conversations about STD symptoms and exposure.

Objective:

This study sought to identify patterns in, and gain insights into, STD-related discussions on Reddit over the course of the COVID-19 pandemic.

Methods:

We extracted posts from Reddit from March 2019 through July 2021. We used latent Dirichlet allocation (LDA), a machine learning text-mining method, to identify the most common topics discussed in the posts. We then used word clouds, qualitative topic labelling, and spline regression to characterize the content and distribution of the topics observed.
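To illustrate what fitting an LDA topic model involves, the following is a minimal sketch in plain Python using collapsed Gibbs sampling. It is not the authors' pipeline: the abstract does not name the software, preprocessing, or hyperparameters used, so the function name fit_lda, the values of alpha and beta, the sampler itself, and the toy tokenized posts are all assumptions made for illustration. The toy example uses 2 topics for brevity.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, n_top=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative; hyperparameters are assumed)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # total tokens assigned to topic k
    assign = []                                            # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to P(topic | doc) * P(word | topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Most frequently assigned words per topic, used for qualitative labelling.
    top_words = [
        sorted(vocab, key=lambda w: -topic_word[k][word_id[w]])[:n_top]
        for k in range(n_topics)
    ]
    return theta, top_words

# Invented toy posts, already tokenized; not data from the study.
toy_posts = [
    ["test", "clinic", "result", "test"],
    ["rash", "symptom", "itch", "rash"],
    ["clinic", "test", "appointment"],
    ["symptom", "itch", "rash", "pain"],
]
theta, top_words = fit_lda(toy_posts, n_topics=2)
```

Here theta holds the estimated topic mix of each post and top_words lists the words most often assigned to each topic, which is the kind of output a qualitative labelling step would start from.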

Results:

Our extraction yielded 24,311 posts. LDA coding showed that using 8 topics for each time period achieved high coherence values (pre-COVID=0.41, pre-vaccine=0.42, post-vaccine=0.44). While most topic categories remained the same over time, the relative proportion of topics changed and new topics emerged. Spline regression revealed that, for some key terms, the percentage of posts varied in ways that coincided with the pre- and post-COVID-19 periods, while for others it was uniform across the study periods.
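As a rough illustration of how spline regression can expose a change in trend at period boundaries, the sketch below fits a continuous piecewise-linear (linear spline) model by ordinary least squares. The abstract does not state the spline type, the knot locations, or the software used, so the linear form, the knots at month 12 and month 21 (counting March 2019 as month 0), and the synthetic percentages are assumptions; none of the numbers come from the study.

```python
def design_row(t, knots):
    """Linear-spline basis at time t: intercept, slope, and one hinge term per knot."""
    return [1.0, float(t)] + [max(0.0, t - k) for k in knots]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_linear_spline(ts, ys, knots):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    rows = [design_row(t, knots) for t in ts]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)

# Synthetic monthly series: 29 months, March 2019 (month 0) through July 2021 (month 28).
months = list(range(29))
knots = [12, 21]                    # assumed period boundaries, not taken from the paper
true_coefs = [2.0, 0.5, -0.8, 0.6]  # intercept, base slope, slope change at each knot
pct_posts = [
    sum(c * x for c, x in zip(true_coefs, design_row(t, knots))) for t in months
]

coefs = fit_linear_spline(months, pct_posts, knots)
```

Each hinge coefficient is the change in monthly slope after its knot, so a coefficient near zero corresponds to a term whose trend did not shift at that boundary.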

Conclusions:

Our study’s use of Reddit is a novel way to gain insights into STD symptoms experienced, potential exposures, testing decisions, common questions, and behavior patterns (e.g., during lockdown periods). For example, a reduction in STD screening may result in negative health outcomes due to missed cases, which also affect onward transmission. As Reddit use is anonymous, users may discuss sensitive topics in greater detail and more freely than in clinical encounters. Data from anonymous Reddit posts may be leveraged to enhance understanding of the distribution of disease and the need for targeted outreach and screening programs. This study demonstrates that using Reddit is feasible and useful for enhancing understanding of sexual behaviors, STD experiences, and needed health engagement with the public.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.