Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jul 17, 2020
Date Accepted: Nov 16, 2020
Date Submitted to PubMed: Nov 18, 2020
Hate Detection in COVID-19 Tweets in the Arab Region using Deep Learning and Topic Modeling
ABSTRACT
Background:
The massive scale of social media platforms requires an automatic solution for detecting hate speech. Such solutions help reduce the manual analysis of content. Most of the past literature has cast hate speech detection as a supervised text classification task, using either classical machine learning methods or, more recently, deep learning methods. However, work investigating this problem in Arabic cyberspace is still limited compared with the published work in English.
Objective:
This study aims to identify hate speech posted by Twitter users in the Arab region in relation to the COVID-19 pandemic and to discover the main issues discussed in these tweets.
Methods:
We used the ArCOV-19 dataset, an ongoing collection of Arabic tweets related to the novel coronavirus disease (COVID-19), starting from January 27, 2020. Tweets were analyzed for hate speech using a pretrained convolutional neural network (CNN) model; each tweet was given a score between 0 and 1, with 1 being the most hateful. We also used non-negative matrix factorization (NMF) to discover the main issues and topics in hate tweets.
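The NMF step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the function names (`nmf`, `top_terms`), the raw term-count weighting, and the choice of 2 topics are all assumptions made for illustration, and the factorization uses plain multiplicative updates in pure Python where a real pipeline would more likely use a library. The CNN scoring step is not shown.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frob_error(V, W, H):
    """Squared Frobenius distance between V and its reconstruction W @ H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, R) for v, r in zip(rv, rr))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (docs x terms) into W (docs x k) and
    H (k x terms) with multiplicative updates (Frobenius objective)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        num = matmul(transpose(W), V)
        den = matmul(matmul(transpose(W), W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        num = matmul(V, transpose(H))
        den = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

def top_terms(H, vocab, n):
    """Return the n highest-weighted vocabulary terms for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]

# Hypothetical placeholder documents (assumption; not data from the study).
docs = [
    "virus travel border",
    "virus border closure travel",
    "politics conflict region",
    "politics conflict government region",
]
vocab = sorted({w for d in docs for w in d.split()})
# Document-term matrix of raw counts (assumption; the weighting is not stated).
V = [[float(d.split().count(w)) for w in vocab] for d in docs]

W, H = nmf(V, k=2)
for i, terms in enumerate(top_terms(H, vocab, 3)):
    print(f"topic {i}: {terms}")
```

Each row of `H` weights the vocabulary for one topic, and each row of `W` gives a document's loading on the topics, so the top-weighted terms per row of `H` serve as a readable label for that topic.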
Results:
Analysis of Twitter data from the Arab region showed that non-hate tweets far outnumbered hate tweets: hate tweets made up 3.2% of COVID-19-related tweets. The majority of hate tweets (71.4%) fell in the low level of hate. Saudi Arabia was the Arab country with the most COVID-19 hate tweets during the pandemic. The second time period (Mar 1-Mar 30) had the largest number of hate tweets, accounting for 51.9% of all hate tweets. Contrary to what was anticipated, the spread of COVID-19 hate speech on Twitter in the Arab region was not consistent with the spread of the pandemic. The study also identified the topics discussed in hate tweets during the pandemic. Of the 7 extracted topics, 6 were related to hate against China and Iran. Arab users also discussed topics related to political conflicts in the Arab region during the COVID-19 pandemic.
Conclusions:
For nations around the world, the COVID-19 pandemic was a serious public health challenge. During COVID-19, frequent use of social media can contribute to the spread of hate speech. Online hate speech can have a negative impact on society and may correlate directly with real hate crimes, which heightens the threat faced by those targeted by hate speech and abusive language. This study is the first to analyze hate speech in the context of Arabic COVID-19 tweets in the Arab region.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.