Article

Monitoring People’s Emotions and Symptoms from Arabic Tweets during the COVID-19 Pandemic

1
Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore 54890, Pakistan
2
College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
*
Author to whom correspondence should be addressed.
Information 2021, 12(2), 86; https://doi.org/10.3390/info12020086
Submission received: 13 January 2021 / Revised: 8 February 2021 / Accepted: 15 February 2021 / Published: 19 February 2021

Abstract

Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China, in late December 2019. It swept through most of the world’s countries, causing confirmed cases and deaths. The World Health Organization (WHO) declared the disease a pandemic on 11 March 2020 due to its widespread transmission. Governments all around the world declared public health crises, both regionally and nationwide. Citizens have gone through a wide range of emotions, such as fear of food shortages, anger at the performance of governments and health authorities in facing the virus, and sadness over the deaths of friends or relatives. We present a system that monitors citizens’ concerns using emotion detection in Twitter data. We also track public emotions and link these emotions with COVID-19 symptoms. We aim to show the effect of emotion monitoring on improving people’s daily health behavior and reducing the spread of negative emotions that affect the mental health of citizens. We collected and annotated 5.5 million tweets posted between January and August 2020. A hybrid approach combining rule-based and neural network techniques was used to annotate the collected tweets. The rule-based technique was used to classify 300,000 tweets relying on Arabic emotion and COVID-19 symptom lexicons, while the neural network was used to expand the sample of tweets annotated with the rule-based technique. We used a long short-term memory (LSTM) deep learning model to classify all of the tweets into six emotion classes and two types (symptom and non-symptom tweets). The monitoring system shows that most of the tweets were posted in March 2020. The anger and fear emotions have the highest number of tweets and user interactions after the joy emotion. The results of user interaction monitoring show that people use likes and replies to interact with non-symptom tweets, while they use re-tweets to propagate tweets that mention any of the COVID-19 symptoms.
Our study should help governments and decision-makers to dispel people’s fears and discover new symptoms associated with the symptoms that were declared by the WHO. It can also help in the understanding of people’s mental and emotional issues to address them before the impact of disease anxiety becomes harmful in itself.

1. Introduction

The COVID-19 disease has been declared a pandemic due to the quick spread of the virus all over the world. It profoundly affects all aspects of society, including mental and physical health. COVID-19’s rapid spread created a global public health crisis that made governments limit some activities, including non-essential economic and social activities, and there were disruptions such as border closures, loss of jobs, airline shutdowns, and financial breakdowns. These insecurities and disturbances cause citizens to feel anxious, fearful, and depressed. In any public health crisis, tracking and monitoring systems help the concerned decision-makers to create situational awareness of disease spread and to identify newly affected areas in order to initiate appropriate and timely responses. The damage and direct impact of a crisis, such as economic losses [1], hospitalized patients [2], and death rates [3], are often monitored. However, there are few studies or systems that provide awareness of citizens’ feelings towards the pandemic, especially for low-resourced languages such as Arabic [4,5,6,7].
Infodemiology is the science that uses information in an electronic medium, such as social media platforms, to help public health officials and policy-makers assess and forecast epidemics and outbreaks [8]. It has been used in different applications related to the COVID-19 pandemic, including tracking the spread of the new COVID-19 disease [8], monitoring COVID-19 trends in Google Trends [9], determining the correlation between internet searches for some symptoms and the confirmed COVID-19 case count [10], and investigating the information about the prevention of COVID-19 on the internet [11].
Twitter, a social media platform, is one of the popular platforms used to conduct social media research in academia and industry [12]. It is used as a data source in many applications, including the monitoring of mental health [13], disease [14], transportation [15], patient safety [16], and customer opinions [17], etc. The ease of obtaining data from Twitter through its APIs attracts more researchers to conduct their research on Twitter’s data.
To understand the psychological implications of a pandemic, the emotions involved, such as fear and anger, must be taken into account. In this research, we used Twitter data to analyze the emotional reactions of citizens during the COVID-19 pandemic. Concerns may also be raised due to uncertainty, political changes, and the contradictory actions of different sectors of government. We analyzed six types of emotion to identify citizens’ concerns and to create situational awareness of emotional stress and anxiety, which can affect healthy individuals and intensify the symptoms of those with pre-existing psychiatric disorders. We classified the dataset into six different emotions. These emotions were identified by the psychologist Paul Ekman as being universally experienced in all human cultures (https://www.verywellmind.com/an-overview-of-the-types-of-emotions-4163976 (accessed on 18 February 2021)). We also compare these types of emotion with tweets that mention any of the disease symptoms, so that citizens’ fears can be studied and analyzed by doctors and psychiatrists.
The majority of existing monitoring systems were developed for languages such as English and other European languages. To the best of our knowledge, this is the first study in Arabic to monitor people’s emotions during the COVID-19 pandemic from textual data on social media. The major contributions of this research are as follows:
  • The building and annotating of large Arabic emotion and symptom corpora from Twitter.
  • The development of a system for monitoring people’s emotions and linking these emotions with tweets that mention any of the COVID-19 symptoms.
The rest of the paper is organized as follows: In Section 2, we present a literature survey of the existing monitoring systems and discuss their approaches and performance. In Section 3, we present our methodology for the corpus generation process and show in detail the data collection, emotional annotation, statistical information about the corpus, and deep learning classifier. In Section 4, we present the experimental results and analysis of the rule-based and automatic annotation on our corpus and the emotion monitoring over time, including the evolution and spikes of emotions, tracking emotions over time, and tracking user interaction. In Section 5, we present a discussion about the proposed emotion dataset and monitoring system. In Section 6, we conclude with a discussion of our work and present our future plan.

2. Related Work

Emotion detection is one of the natural language processing and text analytics fields used to discover people’s feelings in written texts [18]. It has been applied in different tracking systems, including disasters and social media monitoring. In this section, we will show the latest research on emotion monitoring and tracking with the main focus on the extraction of emotion from Arabic text and COVID-19.
Tracking citizens’ concerns during the COVID-19 pandemic has been studied in [19]. The authors presented an approach to measure and track citizens’ concern levels by analyzing the sentiment in Twitter data using the ratio of negative and very negative tweet counts over the total number of tweets on the dataset. They studied 30,000 tweets from March 14, 2020 to show the degree of concern by US states. Medical news and network analysis on Twitter data in Korea have been proposed in [20]. The study investigated information transmission networks and news sharing related to COVID-19 tweets. The study aimed to show how COVID-19 issues circulated on Twitter through network analysis. It has also shown that monitoring public conversations and spreading media news can assist public health professionals to make fast decisions. Tracking mental health and symptom mentions on Twitter during the COVID-19 pandemic has been studied in [21]. The study showed that social media can be used to measure the mental health of a country during a public health crisis. It can also enable the early detection of disease symptoms. A real-time dashboard (https://bit.ly/penncovidmap (accessed on 18 February 2021)) was developed to monitor the change in anxiety, top symptom mentions on Twitter in the US, and the common and most discussed topics related to healthcare. Emotion analysis using English Twitter data during the COVID-19 pandemic was presented in [22]. They used the National Research Council Canada (NRC) Word–Emotion Association Lexicon to classify the collected data into eight basic emotion categories (anger, fear, disgust, anticipation, joy, sadness, surprise, and trust). By using emotion analysis, authorities can better understand the mental health of the people.
Arabic Twitter data was used to study depressive emotions [23]. The authors collected data from the Gulf region, targeting people who self-declared a diagnosis of depression. They collected another set of data as a control group, labeled as non-depressed. They then built a predictive model using different supervised learning algorithms to predict whether a user’s tweets indicate depression. Their feature set included symptoms of clinical depression and online depression-related behavior. The data was unbalanced, with 27 depressed people and 62 non-depressed people. They used the classical metrics for measuring the performance of their model (precision, recall, F-measure, and accuracy).
In [24], the authors highlighted the importance of looking after mental health during the psychosocial crisis caused by the COVID-19 outbreak. They argued that mental health and wellbeing are essential parts of healthcare; consequently, studying and mitigating these issues is vital for a stable and healthy community. They also explored some factors that might contribute to mental health issues: the uncertainty of this new illness; the unpredictability of new risks including self-isolation, social distancing, and quarantine; impaired social functioning; interpersonal issues; the perpetuation of emotional and behavioral disorders and psychological problems; predisposed mental health issues; and the tendency of being easily affected by traumatic events.
The magnitude of the novel coronavirus (COVID-19) pandemic has led to considerable economic hardship, stress, anxiety, and concerns about the future. Social media can provide a place for measuring the pulse of mental health in communities. It plays a vital role in recording the reactions, opinions, and mental health features of its users. This was found in [25], which studied the changes in psycholinguistic features before and after the lockdowns in Wuhan and Lombardy and compared the frequencies of word categories before and after lockdown.
In [26], the study analyzed 167,073 tweets collected from the beginning of February to mid-March 2020. The authors studied the word frequencies and applied the Latent Dirichlet Allocation (LDA) technique to identify the most common topics in these tweets. Their analysis found 12 topics, which were later clustered into four main groups: the virus origin; its sources; its impact on people, countries, and the economy; and ways of mitigating the risk of infection.
The study of people’s emotions shifting from fear to anger was presented in [27]. The authors used over 20 million tweets during the COVID-19 outbreak to study people’s sentiment and its evolution. A list of topics, sentiments, and emotion attributes was used to annotate a dataset containing 63 million tweets [28]. The study reported basic descriptive statistics about the discussed topics, sentiments, and emotions and their temporal distributions. Monitoring depression trends on Twitter during the COVID-19 pandemic has been proposed in [29]. The authors collected a large English Twitter depression dataset containing 2575 users with their past tweets. They trained three classification models to examine the effect of depression on people’s Twitter language. Deep-learning model scores with psychological text features and user demographic information were combined to investigate these features with depression signals. Understanding the mindset of Indian people during the lockdown using Twitter data has been proposed in [30]. The authors used Python and R statistical software to analyze the collected dataset.
COVID-19 places more pressure on doctors and other health care workers [31]. Such pressure brings a high risk of psychological distress for doctors. To reduce mental health stigma in clinical workplaces, the study suggested supporting doctors and their families during the pandemic. An investigation of the mental health status of health staff was carried out in [32] to identify the population requiring psychological intervention. A survey was conducted to investigate three mental health issues: psychological distress, depressive symptoms, and anxious symptoms. The study recommended that the health workers in high-risk departments should receive psychiatric interventions. The health authorities have to consider setting up mental health teams for dealing with mental health problems and give psychological help and support to patients and health care workers [33]. The study suggested using web applications to monitor and evaluate the stress, anxiety, and depression levels of health care workers and provide treatment and diagnosis for them. Another research study conducted a questionnaire for 145 healthcare professionals working on COVID-19 wards in Italy [34]. The study asked healthcare workers to provide sociodemographic and clinical information in order to understand quality-of-life and mental health issues including anxiety and depression. The results show that healthcare professionals reported higher levels of depression and stress symptoms. The study suggested that healthcare workers developed mental health symptoms when working with COVID-19-infected people.
For the detection of the symptoms from Arabic text on social media, a research study has been presented in [35]. The authors used tweets in Arabic to identify the most common symptoms reported by COVID-19-infected people. The top three symptoms that were reported were fever, headaches, and anosmia. The analysis of the spread of influenza in the UAE has been proposed in [36]. The analysis was performed based on Arabic Twitter data. The authors proposed a system that can filter and extract features from the tweets and classify them into three categories: self-reporting, non-self-reporting, and non-reporting. The tweets were used to predict the number of future hospital visits using a linear regression classifier. Different topics about the spread of COVID-19 have been investigated in [37]. The study examined how Arabic people reacted to the COVID-19 pandemic on Twitter from different perspectives, including symptoms and negative emotions such as sadness.
As social media become important and ubiquitous for social networking, they support backchannel communications and allow for wide-scale interaction [38]. Social media content can be used to build tracking systems in many applications. The tracking of sentiment on news entities over time [39] is one of the systems that attract researchers in monitoring socio-political issues. Sentiment-spike detection has been presented in [40]. The authors used Twitter data and analyzed the sentiment towards 70 entities from different domains. Tracking health trends from social media has been studied in [41]. The authors introduced an open platform that uses crowd-sourced labeling of public social media content. Tourism is another domain for which tracking systems have been built; specifically, tracking systems have been used to monitor tourist comments written on social media [42]. The authors used a sentiment analysis lexicon to track the opinions of tourists in Tunisia. Monitoring people’s opinions before or during elections is useful for tracking and analyzing campaigns using Twitter data [43]. The study identified the topics that were most causal for public opinion and showed the usefulness of measuring public sentiment during the election. Tracking and monitoring earthquake disasters from Chinese social media content on Weibo was proposed in [44]. The authors focused on how to detect disasters from massive amounts of data on a micro-blogging stream. They used sentiment analysis to filter the negative messages to carry out incident discovery in a post-disaster situation.
To summarize, building a tracking system on top of social media content is very useful for governments and decision-makers. Most of the proposed work has been done for English data with fewer contributions in Arabic. The research related to the emotions and symptoms in Arabic text did not focus on the monitoring and tracking of emotions and symptom evolution over time. That motivated us to conduct our research on Arabic data to help health authorities, governments, and decision-makers to understand people’s emotions during the COVID-19 pandemic.

3. Methodology

The architecture of the proposed monitoring system is presented in Figure 1. The proposed framework starts with data collection from Twitter. Arabic emotion lexicons were used to annotate a list of 300,000 tweets using a rule-based approach. We then use deep learning classifiers to ensure the quality of the annotated tweets. The best deep learning model was used to annotate the large unlabeled dataset. A list of COVID-19 symptom keywords was extracted and prepared to be used in symptom mentions classification. The last step was to store the annotated dataset in a database to be used to perform monitoring tasks. This work intended to build an Arabic temporal corpus that can be used for the monitoring of people’s emotions and symptom mentions during the COVID-19 pandemic. To address the need, the development of the corpus has four steps: 1) data collection, 2) data preprocessing, 3) emotion annotation, and 4) symptom tweet detection.

3.1. Data Collection

We used Twint, an advanced Twitter scraping tool, to collect data from Twitter and build our dataset. The tool has an option to filter tweets based on the language; therefore, we selected the Arabic language to retrieve Arabic tweets only. The collected tweets spanned the period from 1 January 2020 to 31 August 2020. To collect the relevant Twitter data, we explored the trending and most popular hashtags in Arab countries. A list of hashtags was prepared to be used as search keywords to retrieve relevant tweets from Twitter during the pandemic, as shown in Table 1. This approach aims to retrieve all of the tweets that talk about COVID-19 in the context of Arab countries. Along with these hashtags, we used multiple queries that join terms related to COVID-19 with the names of all Arab countries, such as كورونا_السعودیة# “#Corona_Saudia”, كورونا_الكویت# “#Corona_Kuwait”, etc.
The result of data collection using popular hashtags and keywords is a dataset that contains around 5.5 million unique tweets that are posted by 1.4 million users with an average of approximately four tweets per user. Table 2 shows the top 20 users in terms of the number of tweets in our dataset. The table shows that 14 accounts from the top 20 list are verified accounts.

3.2. Corpus Statistics

Figure 2 and Figure 3 show the distribution of tweets in our dataset on a daily and monthly basis, respectively. The day with the most tweets was 21 March 2020, with more than 150,000 tweets, while March was the month with the most tweets, with more than 1.44 million.
The spike on 21 March is due to several reasons, such as the following:
  • There was a total of 51,448 hashtags on that day.
  • There was a total of 4819 unique hashtags on that day.
  • By observing the top hashtags, we found that most of them were from Saudi Arabia, where shops were ordered to close and a curfew was then issued in large cities of the Kingdom.
  • One day before that date, the Saudi government suspended all domestic flights, buses, taxis, and trains for 14 days.
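The daily and monthly distributions can be obtained with a simple count over tweet dates. The following is a minimal sketch, assuming each tweet carries a date string in `YYYY-MM-DD` form; the function name and the sample values are hypothetical and are not taken from the corpus.

```python
from collections import Counter

def tweet_counts(dates):
    """Count tweets per day and per month from 'YYYY-MM-DD' strings."""
    daily = Counter(dates)
    monthly = Counter(d[:7] for d in dates)  # keep only 'YYYY-MM'
    return daily, monthly

# Hypothetical toy sample, not data from the corpus.
sample = ["2020-03-20", "2020-03-21", "2020-03-21", "2020-04-02"]
daily, monthly = tweet_counts(sample)

# The busiest day is the most common date in the sample.
peak_day, peak_count = daily.most_common(1)[0]
```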

3.3. Data Preprocessing

As the collected tweets were crawled from social media, the data are expected to be noisy and should be cleaned before performing any NLP task in order to get better results. The first step in text preprocessing is to remove URLs, hashtags, mentions, and media. Additionally, we performed a normalization process to unify the shape of Arabic letters that can be written in different forms. Furthermore, the repeated-characters problem was resolved by removing the extra repeated letters to return the word to its correct spelling. For example, “كورووووونا” (“Corona”) is replaced with “كورونا” by replacing the multiple occurrences of the character “و” with a single character. In text posted on social media, diacritics are mostly used as decoration, except in text from the Holy Quran. These diacritics were removed from the tweets. Finally, duplicated tweets had already been excluded during the crawling process by checking each tweet’s ID.
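The preprocessing steps described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name, the regular expressions, the set of normalized letters (only alef variants here), and the rule that a run of three or more identical Arabic letters collapses to one are all assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[@#]\S+")                  # mentions and hashtags
# Arabic diacritics (fathatan .. sukun) plus the tatweel elongation mark.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Assumption: a run of 3+ identical Arabic letters is an elongation.
REPEAT_RE = re.compile(r"([\u0621-\u064A])\1{2,}")
# Assumption: unify alef variants (hamza above, hamza below, madda) to bare alef.
ALEF_MAP = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})

def clean_tweet(text):
    """Apply the cleaning steps in order and squeeze whitespace."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    text = text.translate(ALEF_MAP)
    text = REPEAT_RE.sub(r"\1", text)
    return " ".join(text.split())
```

Collapsing a run to a single letter matches the example in the text, but it would also shorten words that legitimately contain a tripled letter, so a production pipeline may need a more careful rule.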

3.4. Emotion Tweets Annotation

3.4.1. Rule-based Emotion Annotation

To annotate part of the dataset, we first used the six emotions of the Arabic emotion lexicon (https://github.com/motazsaad/emotion-lexiconfrom (accessed on 18 February 2021)) [37]. Details about this lexicon are shown in Table 3. The lexicon words and phrases were used to retrieve tweets from the dataset using a rule-based (if-else) approach. We extracted 50,000 tweets for each emotion category. After performing emotion classification using the rule-based technique, we selected a sample of 300 tweets and annotated them manually to ensure the quality of the rule-based annotation technique. We also used an LSTM deep neural network to classify all tweets that were annotated using this technique. The experiment and results are presented in Section 4.
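A rule-based annotator of this kind can be sketched as a lexicon lookup. The sketch below is illustrative only: the two-emotion toy lexicon, the function name, whole-token matching, and the tie-breaking rule (ambiguous tweets are left unlabeled) are assumptions, since the text does not specify how phrases or tweets matching several emotions are handled.

```python
from collections import Counter

# Hypothetical toy lexicon; the real lexicon covers six emotions.
EMOTION_LEXICON = {
    "fear": {"خوف", "قلق"},
    "joy": {"فرح", "سعادة"},
}

def rule_based_label(tweet, lexicon=EMOTION_LEXICON):
    """Return the emotion with the most lexicon hits, or None."""
    tokens = tweet.split()
    hits = Counter()
    for emotion, words in lexicon.items():
        hits[emotion] = sum(1 for tok in tokens if tok in words)
    ranked = hits.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None                      # no lexicon word found
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None                      # assumption: skip ambiguous tweets
    return ranked[0][0]
```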

3.4.2. Automatic Emotion Annotation

We used FastText (https://fasttext.cc/ (accessed on 18 February 2021)), a neural network library, to automatically annotate the unlabeled tweets (around 5.2 million), as illustrated in Figure 4. FastText is an open-source NLP library developed by Facebook AI. It is a fast tool for building NLP models and generating live predictions. The model was tuned with the following parameters: a learning rate of 0.5, word n-grams of size 1, and 50 epochs. We selected FastText because we are dealing with a large dataset, for which prediction using traditional machine learning algorithms is very slow.
We fed the FastText algorithm with the tweets that had been annotated using emotion lexicons as described in the previous section. A FastText model was built using these input tweets, and the unlabeled tweets were passed through the resulting model to predict their emotion classes and the probabilities of the predicted classes, as shown in Figure 4. To ensure the quality of the new annotation approach, we defined a threshold value: tweets with a probability greater than or equal to the threshold were added to the newly labeled tweets to expand the rule-based dataset. Tweets with a probability below the threshold were ignored, as these tweets have a lower confidence value and are mostly neutral. The threshold value was selected carefully after performing several experiments, as described in the experiment section. We also trained a deep learning classifier to test the quality of the expanded dataset. Algorithm 1 shows the pseudocode explaining how the automatic emotion annotation works.
Algorithm 1 Automatic Emotion Annotation Algorithm
Data: LabeledData (300K tweets), UnlabeledData (5.2M tweets)
Result: NewLabeledData
TrainingSet = LabeledData;
NewLabeledData = empty list;
ThresholdValue = 0.8;
//train the fastText model on TrainingSet
fastTextModel = TrainClassifier(TrainingSet);
for each tweet t in UnlabeledData do
   //predict the most likely emotion class of t from fastTextModel
   emotionClass = fastTextModel.predict(t);
   //predict the probability of that emotion class for t from fastTextModel
   emotionClassProbability = fastTextModel.predict-prob(t);
   if (emotionClassProbability ≥ ThresholdValue) then
      NewLabeledData.Add(t, emotionClass);
   end
end
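The thresholding loop of Algorithm 1 can be sketched in Python as below. To keep the sketch self-contained, the trained model is replaced by a stand-in function; `expand_labels`, `toy_predict`, and the example probabilities are hypothetical and only illustrate the filtering logic.

```python
def expand_labels(unlabeled, predict, threshold=0.8):
    """Keep only predictions whose probability meets the threshold.

    `predict` is any callable returning (label, probability) for a tweet.
    """
    new_labeled = []
    for tweet in unlabeled:
        label, prob = predict(tweet)
        if prob >= threshold:
            new_labeled.append((tweet, label))
    return new_labeled

# Stand-in for a trained classifier; hypothetical, for illustration only.
def toy_predict(tweet):
    return ("fear", 0.93) if "virus" in tweet else ("joy", 0.55)

result = expand_labels(["the virus scares me", "nice weather"], toy_predict)
```

In practice, `predict` would wrap the trained FastText model; the second tweet is dropped here because its confidence of 0.55 falls below the 0.8 threshold.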

3.5. Symptom Tweets Annotation

To detect tweets that had symptom keywords, we prepared a list of words that represent the COVID-19 symptoms manually, after reading and translating the symptoms keywords from the WHO website. For example, the symptom word “fever” “حمى” was considered as a keyword to extract all tweets mentioning this word along with its derivations such as الحمى،بالحمى،والحمى. We collected more than 500 keywords to be used as a lexicon for the COVID-19 symptoms. We then used a rule-based approach to classify the tweets into symptom or non-symptom using a set of “if-then” rules. A tweet was considered to be a symptom tweet if it contained one or more keywords. The following is a list of COVID-19 symptoms as mentioned on the WHO website:
Most common symptoms:
  • Fever
  • Tiredness
  • Dry cough
Less common symptoms:
  • Loss of smell or taste
  • Pains and aches
  • Headache
  • Sore throat
  • Diarrhea
  • Conjunctivitis
  • Rash on skin
  • Discoloration of fingers or toes
Serious symptoms:
  • Difficulty breathing or shortness of breath
  • Chest pain or pressure
  • Loss of speech or movement
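The symptom rule described in this section can be sketched as a token lookup against the symptom lexicon. In this sketch the derivations are generated by attaching three prefixes to a base keyword, which reproduces the forms given for “حمى”; this generation step, the prefix list, and the function names are assumptions, as the lexicon in the text was prepared manually.

```python
# Assumption: derivations are formed with these common prefixes.
PREFIXES = ["", "ال", "بال", "وال"]

def build_lexicon(base_keywords, prefixes=PREFIXES):
    """Expand each base keyword with every prefix."""
    return {p + kw for kw in base_keywords for p in prefixes}

def is_symptom_tweet(tweet, lexicon):
    """A tweet is a symptom tweet if it contains one or more keywords."""
    return any(tok in lexicon for tok in tweet.split())

# Hypothetical mini-lexicon built from a single base keyword ("fever").
lexicon = build_lexicon(["حمى"])
```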

3.6. Deep Learning Architecture

Deep neural networks are used to classify images, speech, and text using multiple processing layers with non-linear transformations. They can model high-level abstraction in data. LSTM is a special type of recurrent neural network (RNN) that is suitable for solving problems that require sequential information such as text classification tasks, using units with internal states that can remember information for long periods.
The neural network used in this research employs word embeddings with LSTM to perform the contextual text classification tasks. The preprocessed tweets were fed to the neural network, and we used the emotion classes to perform supervised learning on the tweets. Figure 5 shows the neural network layers and we describe each layer below.

3.6.1. Embedding Layer

The first layer in the network is the embedding layer that converts integer indices in the input into a dense real valued vector of fixed size. The embedding layer learns a mapping that embeds each word in the discrete vocabulary to a lower dimensional continuous vector space. The use of this layer enables the extraction of semantic features from the input without performing a manual definition of features. The output of the embedding layer is fed to the next layer in the network.

3.6.2. LSTM Layer

The LSTM layer is the second layer in the network. The LSTM unit is a memory cell composed of four main components: 1) input gate, 2) self-recurrent connection, 3) forget gate, and 4) output gate. The input gate decides what new information to store in the cell state. The forget gate allows the cell state to remember or forget its previous state by controlling the cell’s self-recurrent connection. Similar to the input gate, the output gate either allows or prevents the cell state from affecting other units.
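For reference, the gates described above are commonly written as follows. This is the standard textbook formulation and not necessarily the exact variant used in this work; the symbols are assumptions introduced here: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $U$, $b$ learned parameters.

\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(self-recurrent cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(output)}
\end{aligned}
\]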

3.6.3. Dropout Layer

To avoid overfitting, dropout, a regularization technique, is applied by randomly dropping a fraction of the units during training. This layer prevents units from co-adapting.

3.6.4. Fully Connected Layer

This layer is fully connected to all of the activations in the dropout layer. It is used to learn non-linear combinations of the high-level features learned by the previous layers.

3.6.5. Loss Layer

The last layer in the network is the loss layer. This layer determines how the deviations of the predicted classes from the actual classes are penalized. Since we are interested in both the multi-class and binary classification of tweets, we use categorical_crossentropy and binary_crossentropy as the loss functions in emotion and symptom classification, respectively.
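To make the two loss functions concrete, the sketch below computes each for a single example in plain Python. It is an illustration of the standard definitions, with categorical cross-entropy applying to the multi-class emotion task and binary cross-entropy to the symptom task; the function signatures and the clipping constant `eps` are assumptions.

```python
import math

def binary_crossentropy(y_true, p, eps=1e-12):
    """Loss for one binary example; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1 - eps)        # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def categorical_crossentropy(y_onehot, probs, eps=1e-12):
    """Loss for one multi-class example with a one-hot target."""
    return -sum(y * math.log(max(q, eps)) for y, q in zip(y_onehot, probs))

# Symptom task: true class 1, predicted probability 0.5.
symptom_loss = binary_crossentropy(1, 0.5)
# Emotion task (three classes shown for brevity): true class is index 1.
emotion_loss = categorical_crossentropy([0, 1, 0], [0.2, 0.5, 0.3])
```

Both examples assign probability 0.5 to the true class, so both losses equal ln 2.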

4. Experiments

4.1. Dataset

We used a dataset consisting of 5.5 million tweets to perform both emotion and symptom classification as described in the “data collection” section. Table 4 shows statistical details about the dataset. The dataset contains more than 100 million words and 2.65 million unique words.

4.2. Experimental Setup

In this research, we performed multiclass classification (emotion detection) and binary classification (symptom and non-symptom detection). An LSTM deep learning classifier was used for training and classification in our experiments. We mapped each tweet in the corpus into word embeddings, a popular technique when working with text, using a sequence length of 300 and an embedding dimension of 100. We also limited the vocabulary to the 20,000 most frequent words. As described before, LSTM is a special kind of recurrent neural network (RNN) architecture used in deep learning. It was designed to avoid the long-term dependency problem by remembering information for long periods. This makes LSTM suitable for problems that require sequential information, such as text processing tasks.
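The input preparation described above (a vocabulary capped at the most frequent words and fixed-length sequences) can be sketched without any deep learning library. The function names, the reserved indices 0 and 1, and padding at the end of the sequence are assumptions; the text does not state these details.

```python
from collections import Counter

def build_vocab(texts, max_words=20000):
    """Map the max_words most frequent tokens to integer indices."""
    counts = Counter(tok for t in texts for tok in t.split())
    # Index 0 is reserved for padding, 1 for out-of-vocabulary words.
    return {w: i + 2 for i, (w, _) in enumerate(counts.most_common(max_words))}

def encode(text, vocab, seq_len=300):
    """Convert a text to a fixed-length list of indices (truncate, then pad)."""
    ids = [vocab.get(tok, 1) for tok in text.split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))

# Toy example with a short sequence length for readability.
vocab = build_vocab(["a b a", "b c a"])
encoded = encode("a b z", vocab, seq_len=5)
```

The resulting integer sequences are what an embedding layer consumes, turning each index into a 100-dimensional vector.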

4.3. Evaluation Metrics

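All the results are reported using the accuracy, precision, recall, and F1-measure as follows:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{F1-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]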
where TP is the number of cases in which we predicted YES and the actual output was also YES, FP is the number of cases in which we predicted YES and the actual output was NO, TN is the number of cases in which we predicted NO and the actual output was NO, and FN is the number of cases in which we predicted NO and the actual output was YES.
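The four metrics can be computed directly from the confusion counts. The sketch below is a small illustration; the function name, the zero-division guards, and the counts in the example are assumptions and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=40, tn=45, fp=10, fn=5)
```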

4.4. Emotion Classification Results

4.4.1. Rule-Based Classification Results

In this section, the output of the rule-based classification, containing 300,000 tweets and representing all emotion categories (six emotion categories and 50,000 tweets for each category), was used to perform the first experiment. The sample tweets represent 5.46% of the dataset. The tweets were used to train an LSTM deep learning model (80% training and 20% testing). The results of the rule-based emotion classification are shown in Figure 6.
To evaluate the results obtained from the rule-based annotation, we randomly selected a list of 360 tweets (60 tweets in each emotion class) from the large dataset. Two human annotators were asked to annotate these tweets and were given a list of guidelines explaining how to annotate the sample correctly. The two annotators assigned the same emotion class to 300 of the tweets. We compared the results of the rule-based annotation with these 300 tweets, which served as the human-annotated ground truth. The confusion matrix of this comparison is shown in Table 5.
Table 6 shows that our rule-based emotion classification achieved an F1-score of approximately 83% when the rule-based annotation was compared with the human annotation on the 300 tweets.
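The link between Table 5 and Table 6 can be checked with a short script. The cell values below are read from Table 5; treating rows as the rule-based labels and columns as the human labels is an assumption, chosen because it reproduces the per-class values in Table 6. The function name is illustrative.

```python
LABELS = ["Anger", "Disgust", "Fear", "Joy", "Sadness", "Surprise"]

# Cell values from Table 5. ASSUMPTION: rows are rule-based labels,
# columns are human (ground-truth) labels.
CONFUSION = [
    [40, 3, 2, 1, 1, 3],
    [4, 38, 2, 2, 3, 1],
    [1, 4, 41, 3, 1, 0],
    [2, 2, 0, 44, 1, 1],
    [2, 1, 1, 3, 42, 1],
    [1, 1, 1, 3, 1, 43],
]


def per_class_scores(matrix, labels):
    """One-vs-rest precision, recall and F1 for each class."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(matrix[i])               # row total: TP + FP
        actual = sum(row[i] for row in matrix)   # column total: TP + FN
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        scores[label] = (precision, recall, f1)
    return scores


scores = per_class_scores(CONFUSION, LABELS)
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

Under this reading, each row sums to 50 tweets, and the macro-averaged F1 comes out at roughly 0.83, in line with the average reported in Table 6.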

4.4.2. Automatic Classification Results

The automatic annotation was explained in the annotation section. Tweets with a confidence value greater than or equal to the defined threshold were added to the automatically annotated dataset. We perform this step to avoid neutral tweets and to consider only tweets that have a high probability of belonging to one of the emotion categories, which helps to ensure the quality of the developed corpus. We used the LSTM classifier to perform several experiments to find the best threshold value. Table 7 shows the distribution of emotion tweets after applying the threshold condition to the predicted emotion-class probabilities of the tweets in our dataset. The table shows that increasing the threshold value decreases the number of tweets in the dataset, and vice versa. Threshold values of 90, 80, and 70% yield about 1.21 million, 1.67 million, and 2.17 million tweets, respectively. Each dataset was split into 80% for training and 20% for testing, and one classification experiment was conducted on each of the three datasets. Figure 6 shows that the classification results improved after expanding the rule-based emotion dataset and that increasing the threshold value improves the quality of the classification results.
Figure 6 shows that a 90% threshold value gives the highest results with the fewest tweets, whereas decreasing the threshold value gives lower results and more tweets. The results show that we can improve the F1-score from 90% using the rule-based technique to 97% using the automatic annotation technique.
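The threshold step described in this section can be sketched as a simple filter over the classifier's class probabilities. The function name, the tweet ids, and the probability values are hypothetical; the sketch only illustrates why a higher threshold keeps fewer tweets.

```python
def filter_by_confidence(predictions, threshold):
    """Keep tweets whose top class probability is at least `threshold`."""
    kept = []
    for tweet_id, probabilities in predictions:
        label = max(probabilities, key=probabilities.get)
        if probabilities[label] >= threshold:
            kept.append((tweet_id, label))
    return kept


# Hypothetical classifier outputs for three tweets.
predictions = [
    ("t1", {"anger": 0.95, "fear": 0.03, "joy": 0.02}),
    ("t2", {"anger": 0.10, "fear": 0.82, "joy": 0.08}),
    ("t3", {"anger": 0.72, "fear": 0.18, "joy": 0.10}),
]

# Number of tweets retained at each threshold used in the paper.
sizes = {t: len(filter_by_confidence(predictions, t)) for t in (0.9, 0.8, 0.7)}
```

In this toy input, the 90% threshold keeps one tweet, 80% keeps two, and 70% keeps all three, mirroring the trade-off between dataset size and label confidence.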

4.5. Symptom Classification Results

For the symptom classification, we first used a total of 200,000 tweets to train CNN and LSTM models. These sample tweets represent 3.64% of the dataset and were labeled with a rule-based approach using a dictionary of symptom words, as described in the methodology. Table 8 shows the result of symptom classification using an LSTM deep learning classifier after splitting the dataset into 80% for training and 20% for testing.
As in automatic emotion classification, we used a neural network to automatically classify the unlabeled tweets in our dataset. Table 9 shows the distribution and percentage of symptom and non-symptom tweets. Symptom tweets represent 37% of the dataset, slightly more than one-third, with more than 2 million tweets.
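The dictionary-based labeling used to seed the symptom classifier can be illustrated with a minimal sketch. The three-word lexicon (fever, cough, headache), the function name, and plain whitespace tokenization are assumptions made for illustration; they are not the lexicon or matching rules used in this study, and real Arabic text would likely need normalization and handling of attached prefixes.

```python
# Hypothetical three-word lexicon: fever, cough, headache.
SYMPTOM_LEXICON = {"حمى", "سعال", "صداع"}


def label_symptom(tweet, lexicon=SYMPTOM_LEXICON):
    """Return 'symptom' if any whitespace-separated token is in the lexicon."""
    tokens = set(tweet.split())
    return "symptom" if tokens & lexicon else "non-symptom"


# Toy tweets: "I have a cough and fever" / "stay at home".
examples = ["عندي سعال و حمى", "خليك في البيت"]
labels = [label_symptom(t) for t in examples]
```

Tweets labeled this way can serve as training data for a classifier that then labels the remaining tweets.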

4.6. Monitoring System

Emotion monitoring from social media data helps in tracking the distribution of emotions, symptoms, and user interactions.

4.6.1. Monitoring Emotion Distribution

In this section, we show the distribution of tweets across the six emotion categories, as depicted in Figure 7. 21 March had the highest number of tweets in all emotion categories (Anger = 9571, Disgust = 5311, Fear = 8279, Joy = 23,098, Sadness = 5869, Surprise = 4120). It is also clear that the volume of emotional tweets decreased over time: people’s feelings varied but were at their highest during March.
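A daily emotion distribution of this kind can be produced by grouping labeled tweets by date. The sketch below uses hypothetical (date, emotion) pairs and illustrative variable names; it is not the monitoring system's code.

```python
from collections import Counter, defaultdict

# Hypothetical (date, emotion) pairs standing in for labeled tweets.
labeled = [
    ("2020-03-20", "joy"), ("2020-03-21", "joy"), ("2020-03-21", "fear"),
    ("2020-03-21", "anger"), ("2020-03-22", "fear"),
]

# daily[date][emotion] -> number of tweets with that emotion on that date.
daily = defaultdict(Counter)
for day, emotion in labeled:
    daily[day][emotion] += 1

# Day with the largest total number of emotion tweets.
peak_day = max(daily, key=lambda d: sum(daily[d].values()))
```

The same grouping, applied per emotion, gives the six time series of the kind plotted in Figure 7.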

4.6.2. Monitoring Symptom Distribution

The symptom and non-symptom tweet distributions are shown in Table 9. As shown in Figure 8, 21 March had the highest number of tweets that mention symptom keywords, with 56,242 tweets.

4.6.3. Monitoring User Interactions

In this section, we illustrate the tracking of the user interaction with all tweets in our dataset and the user interaction with emotions and symptoms tweets. In Figure 9, most of the user interactions are likes and re-tweets with 49.2 million and 15.4 million, respectively. The total number of replies is 5.8 million. The highest number of interactions (likes, re-tweets, and replies) was on 21 March with 1.46 million, 500,000, and 200,000, respectively.
In Figure 10, we show the user interactions with emotion categories. The joy category constitutes 36% of the total tweets in our dataset; it is the highest in terms of the number of likes, replies, and re-tweets with 22 million, 4 million, and 7 million, respectively. The second largest category in terms of the number of likes, replies, and re-tweets is the anger category with 8 million, 1.2 million, and 2.4 million, respectively. The third largest category is the fear category with 6.4 million, 1 million, and 1.9 million likes, replies, and re-tweets, respectively. The categories disgust, sadness, and surprise came in fourth, fifth, and sixth, respectively.
In Figure 11, we show the user interaction with non-symptom and symptom tweets. Non-symptom tweets received 25.9 million likes, 4.4 million replies, and 7.1 million re-tweets, while symptom tweets received 23.4 million likes, 4.1 million replies, and 8.3 million re-tweets. One finding is that users interact with non-symptom tweets mainly through likes and replies, whereas symptom tweets receive more re-tweets, which suggests that users propagate tweets that mention any of the COVID-19 symptoms to warn others.
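Because likes dominate both groups in absolute terms, the comparison is easier to see as shares of each group's total interactions. The sketch below uses the counts reported above (in millions); the function and variable names are illustrative.

```python
# Interaction counts in millions, as reported in the text for Figure 11.
interactions = {
    "non-symptom": {"likes": 25.9, "replies": 4.4, "re-tweets": 7.1},
    "symptom": {"likes": 23.4, "replies": 4.1, "re-tweets": 8.3},
}


def shares(counts):
    """Convert raw interaction counts into fractions of the group's total."""
    total = sum(counts.values())
    return {kind: value / total for kind, value in counts.items()}


share_by_type = {name: shares(counts) for name, counts in interactions.items()}
```

Re-tweets make up about 23% of interactions with symptom tweets versus about 19% for non-symptom tweets, while likes and replies take a slightly larger share for non-symptom tweets.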

5. Use Cases

As we are concerned with a few emotion categories, such as fear and anger, along with tweets that mention COVID-19 symptoms, we will answer the following questions:
  • What do people fear?
  • Why are people angry?
  • What are the symptoms that cause people anxiety and fear?
To answer the above questions, we need to extract the discussed topics during the pandemic in both the fear and anger emotion categories and the topics of symptom tweets as well to understand people’s concerns during the pandemic.

5.1. Anger Emotion Tweets

As the anger emotion came second in terms of the number of tweets and user interactions, we show the most discussed topics in this category in order to understand why people are angry. We selected the tweets posted in March, as there are 1.44 million tweets. We used Latent Dirichlet Allocation (LDA), a topic modeling technique, to extract the topics discussed during March. We used perplexity to estimate the optimal number of topics, as shown in Figure 12. The graph shows that eight topics gave the highest coherence score, 0.364.
The following are the top eight most discussed topics:
  • An increase in COVID-19-infected people due to the outbreak of the corona virus epidemic.
  • The call to stay at home to curb the spread of the epidemic.
  • Wars continuing despite the spread of the epidemic in some countries, such as Yemen.
  • Anger and accusations of China spreading the virus.
  • The possibility of death from infection with the virus and neglect of governments.
  • The carelessness of people during the time of the pandemic.
  • Anger over China’s deliberate transmission of the pandemic to Muslims.
  • The risk of transmitting the pandemic via arrivals from Iran.
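For reference, the perplexity criterion mentioned above for choosing the number of topics is commonly defined over a held-out set of M documents as shown below. This is the standard textbook definition, given here as an assumption about the measure used rather than a formula taken from this study; w_d denotes the words of document d and N_d its length. Lower perplexity is better, whereas a coherence score is better when higher.

```latex
\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```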

5.2. Fear Emotion Tweets

The fear category came third in terms of the number of tweets and user interactions. Similar to the detection of anger topics, we use LDA to detect the discussed topics during March. We used perplexity to estimate the optimal number of topics, as shown in Figure 13. The graph shows that the highest coherence value was obtained with four topics.
The following are the four detected topics:
  • The government’s role in fighting the epidemic.
  • The collapse of countries’ economies due to the closure of borders.
  • The fear of infection and death from the virus and praying to God to lift the epidemic.
  • Fear of the long stay-at-home quarantine.

5.3. Symptom Tweets

We used LDA to detect the topics discussed in symptom tweets during March. We used perplexity to estimate the optimal number of topics, as shown in Figure 14. The graph shows that the optimal number of topics is four, with a coherence score of 0.415.
The following are the detected topics:
  • Discussion about disease and treatment.
  • Discussion about ways to prevent corona.
  • Prayers for healing for the COVID-19-infected people.
  • Exposure to some symptoms of infection with the corona virus, such as throat symptoms.

6. Discussion and Limitation

The major contribution of our research is the development of the first Arabic emotion dataset for monitoring people’s mental health issues during the COVID-19 pandemic. We address these issues using COVID-19 messages on Twitter. In this research, we are concerned only with the analysis of six emotion categories in tweets posted between 1 January and 31 August 2020.
A combined approach of rule-based (lexicon-based) annotation and automatic annotation (neural networks) was useful for annotating our dataset. The results showed several essential points. First, the rule-based annotation approach annotated part of our dataset, producing a total of 300,000 tweets distributed among six emotion classes. The manual verification of the rule-based annotation gives an 83% F1-score, which is promising given the size of the dataset. Second, the automatic annotation using neural networks was used to expand the dataset annotated with the rule-based approach. The expansion gives better results as the required probability of an emotion class for a tweet increases. It also shows that we can improve on the rule-based emotion classification by 5.9, 4.8, and 0.9% using threshold values of 90, 80, and 70%, respectively.
The monitoring system shows that most of the emotion and symptom tweets in our dataset were posted in March. It also shows that, after joy, most tweets reflect people’s anger and fear. The system also showed that people interact with COVID-19 non-symptom tweets using likes and replies, while they use re-tweets to share the tweets that mention COVID-19 symptoms. The study concludes with multiple points that explain the causes of anger and fear during the pandemic.

7. Conclusion and Future Work

In this research, we present a monitoring system for people’s emotions and symptom mentions during the COVID-19 pandemic. We use Twitter as a data source to collect 5.5 million tweets spanning from January to August 2020. Each tweet in the dataset was labeled with an emotion category (anger, disgust, fear, joy, sadness, or surprise) and as a symptom or non-symptom tweet. We used rule-based and neural network approaches to annotate our dataset using emotion and symptom lexicons. We used these annotated tweets to build multiple deep learning models using an LSTM neural network to verify the quality of the datasets. After annotating all of the tweets in our dataset, we built a monitoring system to track people’s emotions and symptom mentions in order to understand the change in people’s emotions and extract more symptoms.
Our findings facilitate the understanding of people’s emotions during the epidemic and can be used to track people’s emotions which lead to or indicate mental health issues. The monitoring system should help governments, health authorities, and decision-makers to reassure people and dispel their fear of the pandemic, in cooperation with psychologists, psychiatrists, and doctors.
In the future, because our emotion categorization is limited to six emotion classes, we plan to add more emotion categories. We also plan to build a web-based monitoring system that will crawl and annotate tweets in real time. The system will be helpful in monitoring people’s emotions during any future epidemic.

Author Contributions

Conceptualization, A.A.-L.; investigation, M.A.; methodology, A.A.-L.; project administration, M.A.; resources, A.A.-L.; software, A.A.-L.; supervision, M.A.; validation, M.A.; visualization, A.A.-L.; writing—original draft, A.A.-L.; writing—review & editing, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Prince Sultan University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and datasets are freely available for research purposes at this link: https://github.com/yemen2016/COVID-19-Arabic-Emotion-Dataset (accessed on 18 February 2021).

Acknowledgments

The authors thank Prince Sultan University, Saudi Arabia, for providing the funding to carry out this work.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

  1. Kanitkar, T. The COVID-19 lockdown in India: Impacts on the economy and the power sector. Glob. Transit. 2020, 2, 150–156. [Google Scholar] [CrossRef]
  2. Iadecola, C.; Anrather, J.; Kamel, H. Effects of COVID-19 on the Nervous System. Cell 2020, 183, 16–27.e1. [Google Scholar] [CrossRef]
  3. Weinberger, D.M.; Chen, J.; Cohen, T.; Crawford, F.W.; Mostashari, F.; Olson, D.; Pitzer, V.E.; Reich, N.G.; Russi, M.; Simonsen, L.; et al. Estimation of Excess Deaths Associated With the COVID-19 Pandemic in the United States, March to May 2020. JAMA Intern. Med. 2020, 180, 1336–1344. [Google Scholar] [CrossRef] [PubMed]
  4. Alyami, M.; Henning, M.; Krägeloh, C.U.; Alyami, H. Psychometric Evaluation of the Arabic Version of the Fear of COVID-19 Scale. Int. J. Ment. Heal. Addict. 2020, 1–14. [Google Scholar] [CrossRef]
  5. Al-Musharaf, S. Prevalence and Predictors of Emotional Eating among Healthy Young Saudi Women during the COVID-19 Pandemic. Nutrients 2020, 12, 2923. [Google Scholar] [CrossRef]
  6. Olimat, S.N. COVID-19 Pandemic: Euphemism and Dysphemism in Jordanian Arabic. GEMA Online J. Lang. Stud. 2020, 20, 268–290. [Google Scholar] [CrossRef]
  7. Essam, B.A.; Abdo, M.S. How Do Arab Tweeters Perceive the COVID-19 Pandemic? J. Psycholinguist. Res. 2020, 1–15. [Google Scholar] [CrossRef]
  8. Mavragani, A. Tracking COVID-19 in Europe: Infodemiology Approach. JMIR Public Heal. Surveill. 2020, 6, e18941. [Google Scholar] [CrossRef] [Green Version]
  9. Sousa-Pinto, B. Assessment of the Impact of Media Coverage on COVID-19–Related Google Trends Data: Infodemiology Study. J. Med. Int. Res. 2020, 22, e19611. [Google Scholar] [CrossRef]
  10. Rajan, A.; Sharaf, R.; Brown, R.S.; Sharaiha, R.Z.; Lebwohl, B.; Mahadev, S. Association of Search Query Interest in Gastrointestinal Symptoms With COVID-19 Diagnosis in the United States: Infodemiology Study. JMIR Public Heal. Surveill. 2020, 6, e19354. [Google Scholar] [CrossRef]
  11. Hernández-García, I.; Giménez-Júlvez, T. Assessment of health information about COVID-19 prevention on the internet: Infodemiological study. JMIR Public Health Surveill. 2020, 6, e18717. [Google Scholar] [CrossRef] [Green Version]
  12. Ahmed, W.; Bath, P.A.; DeMartini, G. Chapter 4: Using Twitter as a Data Source: An Overview of Ethical, Legal, and Methodological Challenges. In Virtue Ethics in the Conduct and Governance of Social Science Research; Emerald Publishing Limited: Bingley, UK, 2017. [Google Scholar]
  13. McClellan, C.; Ali, M.M.; Mutter, R.; Kroutil, L.; Landwehr, J. Using social media to monitor mental health discussions − evidence from Twitter. J. Am. Med. Inf. Assoc. 2017, 24, 496–502. [Google Scholar] [CrossRef] [Green Version]
  14. Sinnenberg, L.E.; DiSilvestro, C.L.; Mancheno, C.; Dailey, K.; Tufts, C.; Buttenheim, A.M.; Barg, F.; Ungar, L.; Schwartz, H.; Brown, D.; et al. Twitter as a Potential Data Source for Cardiovascular Disease Research. JAMA Cardiol. 2016, 1, 1032–1036. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Mai, E.; Hranac, R. Twitter interactions as a data source for transportation incidents. In Proceedings of the Transportation Research Board 92nd Annual Meeting, Washington, DC, USA, 13–17 January 2013. [Google Scholar]
  16. Nakhasi, A.; Bell, S.G.; Passarella, R.J.; Paul, M.J.; Dredze, M.; Pronovost, P.J. The Potential of Twitter as a Data Source for Patient Safety. J. Patient Saf. 2019, 15, e32–e35. [Google Scholar] [CrossRef] [PubMed]
  17. Nawaz, H.; Ali, T.; Al-laith, A.; Ahmad, I.; Tharanidharan, S.; Nazar, S.K.A. Sentimental Analysis of Social Media to Find Out Customer Opinion. In Proceedings of the International Conference on Intelligent Technologies and Applications, Bahawalpur, Pakistan, 23–25 October 2018. [Google Scholar]
  18. Shivhare, S.N.; Khethawat, S. Emotion detection from text. arXiv 2012, arXiv:1205.4944. [Google Scholar]
  19. Chun, S.A.; Li, A.C.-Y.; Toliyat, A.; Geller, J. Tracking Citizen’s Concerns during COVID-19 Pandemic. In Proceedings of the The 21st Annual International Conference on Digital Government Research, Seoul, Korea, 15–19 June 2020; Association for Computing Machinery (ACM): New York, NY, USA, 2020. [Google Scholar]
  20. Park, H.W.; Park, S.; Chong, M. Conversations and Medical News Frames on Twitter: Infodemiological Study on COVID-19 in South Korea. J. Med Internet Res. 2020, 22, e18897. [Google Scholar] [CrossRef]
  21. Guntuku, S.C.; Sherman, G.; Stokes, D.C.; Agarwal, A.K.; Seltzer, E.; Merchant, R.M.; Ungar, L.H. Tracking Mental Health and Symptom Mentions on Twitter During COVID-19. J. Gen. Intern. Med. 2020, 35, 2798–2800. [Google Scholar] [CrossRef]
  22. Mathur, K.P.; Vaidya, S. Emotional Analysis using Twitter Data during Pandemic Situation: COVID-19. In Proceedings of the 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 10–12 June 2020. [Google Scholar]
  23. Almouzini, S.; Khemakhem, M.; Alageel, A. Detecting Arabic Depressed Users from Twitter Data. Procedia Comput. Sci. 2019, 163, 257–265. [Google Scholar] [CrossRef]
  24. Mukhtar, S. Pakistanis’ mental health during the COVID-19. Asian J. Psychiatry 2020. [Google Scholar] [CrossRef] [PubMed]
  25. Su, Y.; Xue, J.; Liu, X.; Wu, P.; Chen, J.; Chen, C.; Liu, T.; Gong, W.; Zhu, T. Examining the Impact of COVID-19 Lockdown in Wuhan and Lombardy: A Psycholinguistic Analysis on Weibo and Twitter. Int. J. Environ. Res. Public Heal. 2020, 17, 4552. [Google Scholar] [CrossRef]
  26. Abd-Alrazaq, A.; Alhuwail, D.; Househ, M.; Hamdi, M.; Shah, Z. Top Concerns of Tweeters During the COVID-19 Pandemic: Infoveillance Study. J. Med Internet Res. 2020, 22, e19016. [Google Scholar] [CrossRef] [Green Version]
  27. Lwin, M.O.; Lu, J.; Sheldenkar, A.; Schulz, P.J.; Shin, W.; Gupta, R.; Yang, Y. Global Sentiments Surrounding the COVID-19 Pandemic on Twitter: Analysis of Twitter Trends. JMIR Public Heal. Surveill. 2020, 6, e19447. [Google Scholar] [CrossRef]
  28. Gupta, R.K.; Vishwanath, A.; Yang, Y. COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes. Arxiv 2020, arXiv:2007.06954, 2020. [Google Scholar]
  29. Zhang, Y. Monitoring Depression Trend on Twitter during the COVID-19 Pandemic. arXiv 2020, arXiv:2007.00228, 2020. [Google Scholar]
  30. Dimple, C.; Parul, G.; Payal, G. COVID-19 pandemic lockdown: An emotional health perspective of Indians on Twitter. Int. J. Soc. Psychiatry 2020. [Google Scholar] [CrossRef]
  31. Galbraith, N.; Boyda, D.; McFeeters, D.; Hassan, T. The mental health of doctors during the COVID-19 pandemic. BJPsych Bull. 2020, 1–4. [Google Scholar] [CrossRef] [PubMed]
  32. Liu, Z.; Han, B.; Jiang, R.; Huang, Y.; Ma, C.; Wen, J.; Zhang, T.; Wang, Y.; Chen, H.; Ma, Y. Mental health status of doctors and nurses during COVID-19 epidemic in China. 2020. preprint; SSRN 3551329. [Google Scholar] [CrossRef]
  33. Spoorthy, M.S.; Pratapa, S.K.; Mahant, S. Mental health problems faced by healthcare workers due to the COVID-19 pandemic–A review. Asian J. Psychiatry 2020, 51, 102119. [Google Scholar] [CrossRef]
  34. Di Tella, M.; Romeo, A.; Benfante, A.; Castelli, L. Mental health of healthcare workers during the COVID -19 pandemic in Italy. J. Eval. Clin. Pr. 2020, 26, 1583–1587. [Google Scholar] [CrossRef] [PubMed]
  35. Alanazi, E.; Alashaikh, A.; AlQurashi, S.; Alanazi, A. Identifying and Ranking Common COVID-19 Symptoms From Tweets in Arabic: Content Analysis. J. Med. Internet Res. 2020, 22, e21329. [Google Scholar] [CrossRef]
  36. Alkouz, B.; Al Aghbari, Z. Analysis and prediction of influenza in the UAE based on Arabic tweets. In Proceedings of the 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), Shanghai, China, 9–12 March 2018. [Google Scholar]
  37. Saad, M.K. Mining Documents and Sentiments in Cross-Lingual Context. Mining Documents and Sentiments in Cross-Lingual Context; Université de Lorraine: Nancy, France, 2015. [Google Scholar]
  38. Sutton, J.N.; Palen, L.; Shklovski, I. Backchannels on the front lines: Emergency uses of social media in the 2007. In Southern California Wildfires; University of Colorado: Boulder, CO, USA, 2008. [Google Scholar]
  39. Al-Laith, A.; Shahbaz, M. Tracking sentiment towards news entities from arabic news on social media. Futur. Gener. Comput. Syst. 2021. [Google Scholar] [CrossRef]
  40. Giachanou, A.; Mele, I.; Crestani, F. Explaining sentiment spikes in twitter. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA, 24–28 October 2016. [Google Scholar]
  41. Alves, A.L.F. A spatial and temporal sentiment analysis approach applied to Twitter microtexts. J. Inf. Data Manag. 2015, 6, 118. [Google Scholar]
  42. Chaabani, Y.; Toujani, R.; Akaichi, J. Sentiment analysis method for tracking touristics reviews in social media network. In Proceedings of the International Conference on Intelligent Interactive Multimedia Systems and Services, Gold Coast, Australia, 20–22 May 2018. [Google Scholar]
  43. Contractor, D. Tracking political elections on social media: Applications and experience. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
  44. Bai, H.; Yu, G. A Weibo-based approach to disaster informatics: Incidents monitor in post-disaster situation via Weibo text negative sentiment analysis. Nat. Hazards 2016, 83, 1177–1196. [Google Scholar] [CrossRef]
Figure 1. Emotions and Symptoms Monitoring System.
Figure 2. Tweets Distribution (Daily).
Figure 3. Tweets Distribution (Monthly).
Figure 4. Automatic Emotion Annotation.
Figure 5. Deep Learning Architecture.
Figure 6. LSTM Classification Results.
Figure 7. Emotion Distribution.
Figure 8. Symptom Distribution.
Figure 9. User Interaction.
Figure 10. User Interaction (Emotions).
Figure 11. User Interaction (Symptoms).
Figure 12. The optimal number of topics (Anger Tweets).
Figure 13. The optimal number of topics (fear tweets).
Figure 14. The optimal number of topics (Symptom Tweets).
Table 1. List of hashtags.
| Sr. # | Hashtag | Translation | Sr. # | Hashtag | Translation |
|---|---|---|---|---|---|
| 1 | #كورونا | #Coronavirus | 9 | #كورونا_قطر | #corona_Qatar |
| 2 | #كورونا_المستجد | #new_corona | 10 | #كورونا_الاردن | #corona_Jordan |
| 3 | #كورونا_الجديد | #new_corona | 11 | #كورونا_السعودية | #corona_Saudi_Arabia |
| 4 | #الحجر_الصحي | #Quarantine | 12 | #كورونا_الكويت | #corona_Kuwait |
| 5 | #الحجر_المنزلي | #Quarantine | 13 | #كورونا_لبنان | #corona_Lebanon |
| 6 | #خليك_في_البيت | #Stay_home | 14 | #كورونا البحرين | #corona_Bahrain |
| 7 | #كورونا_العراق | #corona_Iraq | 15 | #كورونا_مصر | #corona_Egypt |
| 8 | #كورونا_ايران | #corona_Iran | 16 | #كورونا_اليمن | #corona_Yemen |
Table 2. Top 20 users in our dataset.
| Sr. # | Account | Title | Total Tweets | Is Verified? | Followers |
|---|---|---|---|---|---|
| 1 | @corona_news | اخبار كورونا فيروس | 11,952 | No | 15.6K |
| 2 | @aawsat_news | صحيفة الشرق الاوسط | 11,579 | Yes | 4.3M |
| 3 | @aljawazatksa | الجوازات السعودية | 10,043 | Yes | 1.6M |
| 4 | @newssnapnet | NewsSnap | 6837 | No | 3.1K |
| 5 | @menafnarabic | MENAFN.com Arabic | 6447 | No | 1.3K |
| 6 | @newsemaratyah | اخبار الامارات UAE NEWS | 5954 | Yes | 185.5K |
| 7 | @aljoman_center | مركز الجُمان | 5669 | Yes | 20.6K |
| 8 | @misrtalateen | صحيفة مصر تلاتين | 5324 | No | 1K |
| 9 | @alahram | الأهرامAlAhram | 5238 | Yes | 5.6M |
| 10 | @alahramgate | بوابة الأهرام | 5213 | Yes | 158.8K |
| 11 | @rtarabic | RTARABIC | 5181 | Yes | 5.3M |
| 12 | @alainbrk | العين الإخبارية - عاجل | 4926 | Yes | 71.8K |
| 13 | @alroeya | صحيفة الرؤية | 4818 | Yes | 748.9K |
| 14 | @kuna_ar | كـــــــــــونا KUNA | 4417 | Yes | 993K |
| 15 | @libanhuit | Liban8 | 4192 | Yes | 19.2K |
| 16 | @alghadtv | قناة الغد | 4011 | Yes | 153.3K |
| 17 | @arabi21news | عربي21 | 4002 | Yes | 919.3K |
| 18 | @ch23news | Channel 23 | 3916 | No | 5.5K |
| 19 | @newselmostaqbal | المستقبل | 3778 | No | 486 |
| 20 | @emaratalyoum | الإمارات اليوم | 3741 | Yes | 2.2M |
Table 3. Emotion lexicons statistics.
| Emotion | Arabic Words/Phrase |
|---|---|
| Anger | 748 |
| Disgust | 155 |
| Fear | 425 |
| Joy | 1156 |
| Sadness | 522 |
| Surprise | 201 |
| Total | 3207 |
Table 4. Corpus Statistics.
| Title | Number |
|---|---|
| Total Tweets | 5,499,318 |
| Total Words | 100,788,175 |
| Unique Words | 2,657,173 |
| Unique Users | 1,402,874 |
| Average Words per Tweet | 18.3 |
Table 5. Confusion matrix of the comparison between ground truth and automatic annotation.
| | Anger | Disgust | Fear | Joy | Sadness | Surprise |
|---|---|---|---|---|---|---|
| Anger | 40 | 3 | 2 | 1 | 1 | 3 |
| Disgust | 4 | 38 | 2 | 2 | 3 | 1 |
| Fear | 1 | 4 | 41 | 3 | 1 | 0 |
| Joy | 2 | 2 | 0 | 44 | 1 | 1 |
| Sadness | 2 | 1 | 1 | 3 | 42 | 1 |
| Surprise | 1 | 1 | 1 | 3 | 1 | 43 |
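Table 6. Results evaluation.
| # | Class | Precision | Recall | F1 Score |
|---|---|---|---|---|
| 1 | Anger | 0.80 | 0.80 | 0.80 |
| 2 | Disgust | 0.76 | 0.78 | 0.77 |
| 3 | Fear | 0.82 | 0.87 | 0.85 |
| 4 | Joy | 0.88 | 0.79 | 0.83 |
| 5 | Sadness | 0.84 | 0.86 | 0.85 |
| 6 | Surprise | 0.86 | 0.88 | 0.87 |
| | Average | 0.826 | 0.83 | 0.828 |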
Table 7. Emotion tweets distribution.
| Sr. # | Emotion | Threshold (90%) | Threshold (80%) | Threshold (70%) |
|---|---|---|---|---|
| 1 | Anger | 213,189 | 297,781 | 381,629 |
| 2 | Disgust | 206,025 | 330,530 | 417,140 |
| 3 | Fear | 231,869 | 326,931 | 421,748 |
| 4 | Joy | 241,264 | 283,606 | 406,080 |
| 5 | Sadness | 197,406 | 280,685 | 358,142 |
| 6 | Surprise | 119,198 | 151,022 | 185,202 |
| | Total Tweets | 1,208,951 | 1,670,555 | 2,169,941 |
Table 8. Symptom Classification Results.
| Deep Learning Classifier | Accuracy | F1-Score (Macro Avg) | F1-Score (Weighted Avg) |
|---|---|---|---|
| LSTM | 0.75 | 0.75 | 0.75 |
Table 9. Symptom and non-symptom tweets distribution.
| # | Type | Number of Tweets | % |
|---|---|---|---|
| 1 | Symptom Tweets | 2,034,748 | 37% |
| 2 | Non-symptom Tweets | 3,464,570 | 63% |
| | Total Tweets | 5,499,318 | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Al-Laith, A.; Alenezi, M. Monitoring People’s Emotions and Symptoms from Arabic Tweets during the COVID-19 Pandemic. Information 2021, 12, 86. https://doi.org/10.3390/info12020086