Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jul 17, 2020
Date Accepted: Nov 16, 2020
Date Submitted to PubMed: Nov 18, 2020

The final, peer-reviewed published version of this preprint can be found here:

Alshalan R, Al-Khalifa H, Alsaeed D, Al-Baity H, Alshalan S

Detection of Hate Speech in COVID-19–Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach

J Med Internet Res 2020;22(12):e22609

DOI: 10.2196/22609

PMID: 33207310

PMCID: 7725497

Hate Detection in COVID-19 Tweets in the Arab Region Using Deep Learning and Topic Modeling

  • Raghad Alshalan
  • Hend Al-Khalifa
  • Duaa Alsaeed
  • Heyam Al-Baity
  • Shahad Alshalan

ABSTRACT

Background:

The massive scale of social media platforms requires an automatic solution for detecting hate speech. Such solutions help reduce the manual analysis of content. Most of the past literature has cast hate speech detection as a supervised text classification task, using either classical machine learning methods or, more recently, deep learning methods. However, work investigating this problem in Arabic cyberspace remains limited compared with the published work on English.

Objective:

This study aims to identify hate speech related to the COVID-19 pandemic posted by Twitter users in the Arab region and to discover the main issues discussed in those tweets.

Methods:

We used the ArCOV-19 dataset, an ongoing collection of Arabic tweets related to the novel coronavirus disease (COVID-19), starting from January 27, 2020. Tweets were analyzed for hate speech using a pretrained convolutional neural network (CNN) model, which assigned each tweet a score between 0 and 1, with 1 indicating the most hateful text. We also used non-negative matrix factorization (NMF) to discover the main issues and topics in hate tweets.
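To illustrate the topic modeling step, the following minimal sketch factors a small document-term matrix with NMF and lists the highest-weighted terms per topic. It is not the authors' pipeline: the toy vocabulary and counts, the function names, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration. In practice a library implementation applied to TF-IDF features of the tweets would be the more likely choice.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (docs x terms) into w (docs x k) and
    h (k x terms) with multiplicative updates for the Frobenius objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def top_terms(h, vocab, n_top=2):
    """List the highest-weighted vocabulary terms for each topic row of h."""
    return [
        [vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n_top]]
        for row in h
    ]


# Hypothetical toy data: rows are documents, columns follow `vocab`.
vocab = ["virus", "china", "lockdown", "school"]
counts = [
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
]
w, h = nmf(counts, k=2)
topics = top_terms(h, vocab)
```

Each row of `h` is one topic expressed as term weights, and each row of `w` gives the topic mixture of one document, so reading off the largest entries of `h` is what yields a human-readable topic label.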

Results:

Analysis of Twitter data from the Arab region showed that non-hate tweets far outnumbered hate tweets: hate tweets made up 3.2% of COVID-19-related tweets. Most hate tweets (71.4%) fell into the low level of hate. Saudi Arabia was the Arab country that spread the most COVID-19 hate tweets during the pandemic. The second time period (March 1 to March 30) had the largest number of hate tweets, accounting for 51.9% of all hate tweets. Contrary to expectations, the spread of COVID-19 hate speech on Twitter in the Arab region was not consistent with the spread of the pandemic. The study also identified the topics discussed in hate tweets during the pandemic: of the 7 extracted topics, 6 were related to hate against China and Iran. Arab users also discussed topics related to political conflicts in the Arab region during the COVID-19 pandemic.
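Proportions like those reported above can be derived from per-tweet scores by thresholding and binning. The sketch below shows one way to do this. The abstract does not state the cut points used, so the threshold, the level boundaries, the function names, and the example scores are all hypothetical.

```python
from collections import Counter

# Hypothetical cut points; the abstract does not state the values used.
HATE_THRESHOLD = 0.5
LEVEL_BOUNDS = [(0.7, "low"), (0.9, "medium"), (1.0, "high")]


def hate_level(score):
    """Map a classifier score in [0, 1] to 'non-hate' or a hate level."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < HATE_THRESHOLD:
        return "non-hate"
    for upper, name in LEVEL_BOUNDS:
        if score <= upper:
            return name


def summarize(scores):
    """Return the share of hate tweets and the share of each level among them."""
    levels = Counter(hate_level(s) for s in scores)
    total = len(scores)
    hate = total - levels["non-hate"]
    by_level = {}
    if hate:
        by_level = {name: 100.0 * levels[name] / hate for _, name in LEVEL_BOUNDS}
    return {"hate_pct": 100.0 * hate / total, "level_pct_of_hate": by_level}


# Made-up scores for illustration only.
scores = [0.05, 0.10, 0.55, 0.62, 0.95, 0.30, 0.20, 0.80]
summary = summarize(scores)
```

Note that the two percentages use different denominators: the hate share is taken over all tweets, while each level share is taken over hate tweets only, matching how the 3.2% and 71.4% figures are phrased.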

Conclusions:

The COVID-19 pandemic has posed a serious public health challenge to nations around the world. During the pandemic, frequent use of social media can contribute to the spread of hate speech. Online hate speech can have a negative impact on society and may correlate directly with real-world hate crimes, which increases the threat associated with being targeted by hate speech and abusive language. This study is the first to analyze hate speech in Arabic COVID-19 tweets in the Arab region.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.
