Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Apr 10, 2020
Date Accepted: May 15, 2020
Date Submitted to PubMed: May 19, 2020
#COVID-19: A Public Coronavirus Twitter Dataset Tracking Social Media Discourse about the Pandemic
ABSTRACT
Background:
At the time of this writing, the novel coronavirus (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much conversation about these phenomena now occurs online, e.g., on social media platforms like Twitter.
Objective:
In this paper, we describe a multilingual coronavirus (COVID-19) Twitter dataset that we are making available to the research community via our COVID-19-TweetIDs GitHub repository: https://github.com/echen102/COVID-19-TweetIDs.
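The repository's name suggests that it distributes tweet IDs rather than full tweet text. The sketch below shows one way a researcher might gather those IDs locally before retrieving the full tweets. It assumes plain-text files with one numeric ID per line; that layout and the helper name `load_tweet_ids` are hypothetical and are not taken from the repository's documentation.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def load_tweet_ids(paths: Iterable[Path]) -> list[str]:
    """Read tweet IDs from several text files (hypothetical layout: one ID per line).

    IDs are kept as strings so that tools storing numbers as 64-bit floats
    cannot silently lose precision. Duplicates across files are dropped and
    the first occurrence keeps its position.
    """
    seen: set[str] = set()
    ids: list[str] = []
    for path in paths:
        for line in path.read_text(encoding="utf-8").splitlines():
            tweet_id = line.strip()
            if not tweet_id.isdigit():  # skip blank or malformed lines
                continue
            if tweet_id not in seen:
                seen.add(tweet_id)
                ids.append(tweet_id)
    return ids
```

Keeping the IDs as strings and deduplicating them early is a cautious choice: overlapping collection windows can repeat an ID, and some spreadsheet and JSON tools round large integers.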
Methods:
We have been collecting this dataset since January 28, 2020, using Twitter’s Streaming API [13] and Tweepy [17] to follow certain keywords and accounts that were trending when the collection began. We also used Twitter’s Search API [16] to query for past tweets, so the earliest tweets in our collection date back to January 21, 2020.
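Following keywords on a stream amounts to a matching rule applied to each incoming tweet. The sketch below is a simplified local approximation of the phrase semantics that Twitter documented for the v1.1 filter endpoint's `track` parameter: commas separate alternative phrases (logical OR), and spaces separate terms that must all appear (logical AND), ignoring case. The tokenizer and the example keywords are illustrative assumptions. They are not the authors' keyword list or Twitter's exact matcher.

```python
from __future__ import annotations

import re


def compile_track(track: str) -> list[frozenset[str]]:
    """Split a track string: commas separate phrases (OR), spaces separate terms (AND)."""
    phrases = []
    for phrase in track.split(","):
        terms = frozenset(t.lower() for t in phrase.split())
        if terms:
            phrases.append(terms)
    return phrases


def matches(text: str, phrases: list[frozenset[str]]) -> bool:
    """True if every term of at least one phrase appears as a token (case-insensitive)."""
    # Simplified tokenizer: runs of word characters, '#', '@' and '-'.
    tokens = set(re.findall(r"[#@\w-]+", text.lower()))
    # Let "#coronavirus" or "@coronavirus" also satisfy the bare term "coronavirus".
    tokens |= {t.lstrip("#@") for t in tokens}
    return any(phrase <= tokens for phrase in phrases)


# Illustrative keywords only, not the collection's actual tracking list.
phrases = compile_track("coronavirus, social distancing")
print(matches("Distancing is a social duty", phrases))  # True: both terms, any order
print(matches("Stay home", phrases))                    # False: no phrase satisfied
```

Representing each phrase as a set of terms turns the AND condition into a subset test, which keeps the matcher short and independent of word order.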
Results:
Since the inception of our collection, we have been actively maintaining and updating our GitHub repository on a weekly basis. To date, we have published over 72 million tweets, more than 60% of which are in English. This manuscript also presents a basic analysis showing that Twitter activity responds to coronavirus-related events.
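Figures such as the share of English tweets, or a daily activity curve that can be compared against news events, come from simple aggregations over the collected tweets. The sketch below assumes that tweets have been retrieved as dictionaries carrying the `lang` and `created_at` fields of Twitter's v1.1 tweet JSON, with timestamps assumed to be in UTC. The function names are hypothetical, and this is not the analysis code behind the manuscript.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

# Timestamp layout of v1.1 created_at strings, e.g. "Tue Jan 21 10:00:00 +0000 2020".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def language_shares(tweets: list[dict]) -> dict[str, float]:
    """Fraction of tweets per 'lang' value; a missing field counts as 'und' (undetermined)."""
    counts = Counter(t.get("lang", "und") for t in tweets)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}


def daily_volume(tweets: list[dict]) -> dict[str, int]:
    """Number of tweets per calendar day of created_at, keyed by ISO date, in date order."""
    counts: Counter[str] = Counter()
    for t in tweets:
        created = datetime.strptime(t["created_at"], CREATED_AT_FORMAT)
        counts[created.date().isoformat()] += 1
    return dict(sorted(counts.items()))
```

Plotting the output of `daily_volume` gives the kind of time series in which spikes can be lined up against real-world events. Any such alignment is still a qualitative reading rather than a causal test.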
Conclusions:
We hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This dataset could also help track scientific misinformation and unverified rumors about the coronavirus, or help researchers understand fear and panic, among many other possible uses.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.