
Analyzing research trends of sentiment analysis and its applications for Coronavirus disease (COVID-19): A systematic review

Abstract

The COVID-19 epidemic is one of the worst disasters to have affected people worldwide. It has affected whole societies physically, financially, and emotionally. Sentiment analysis is an important step towards handling a pandemic effectively. This work presents a systematic literature review of sentiment analysis of the Indian population towards COVID-19 and its vaccination. Recent existing works are drawn from four primary databases: ACM, Web of Science, IEEE Xplore, and Scopus. A total of 40 publications from January 2020 to August 2022 are selected for systematic review after applying inclusion and exclusion criteria. The existing works are analyzed in terms of the challenges their authors encountered with the collected datasets. Three main families of techniques, namely lexical, machine learning, and deep learning, are used for sentiment analysis. The performance of the applied techniques is compared. Directions for future research are highlighted along with recommendations.

1 Introduction

SARS-CoV-2 caused a new global epidemic, known as COVID-19, which horrified and shook the entire globe [11, 18]. Millions of people were infected by this harmful virus during the pandemic. The economic growth of many countries was also affected by the strict travel restrictions implemented by governments [13]. One source records a total of 581,831,612 infections and 6,413,423 fatalities worldwide due to COVID-19 [66], while another reports 564,126,546 confirmed cases and 6,371,354 deaths globally [28]. India is one of the most affected countries: a total of 43,847,065 confirmed cases and 525,930 deaths were reported in India from March 2020 to June 2022. Figure 1(a) and (b) depict the total confirmed cases and deaths from COVID-19 in India up to 1 June 2022. COVID-19 is transmitted from person to person through aerosols and droplets produced by coughing, and through contact with contaminated objects or surfaces.

Fig. 1

(a) Confirmed cases (b) Deaths.

The virus has placed enormous strain on healthcare infrastructure around the world [5]. The scientific research community has also faced challenges in understanding and tracing the behaviour of the virus. Vaccination is one of the most important and safest strategies for controlling this pandemic [2]. COVID-19 vaccination was launched in India on 16 January 2021 for healthcare and front-line workers [22, 35], followed by a phase-wise vaccination programme for the general public. Mass vaccination is challenging because of the growing anti-vaccination sentiment shared by people. Social media allows users to express their opinions freely and openly [36]. Twitter is one such platform, on which users communicate by posting messages of up to 280 characters, including emojis. Because of the severity of the COVID-19 outbreak, many people expressed their thoughts and feelings about the virus and the vaccine on Twitter [68]. Many countries implemented total lockdowns and stay-at-home campaigns, and users shared their opinions about COVID-19 and its vaccine on Twitter during the lockdown period. As a result, Twitter became the most prominent medium for discussing the effects of COVID-19 and its vaccination. Sentiment analysis is a way to evaluate the emotions expressed by users [37]. Such analysis divides opinions into categories such as positive, negative, and neutral [23]. Emotions can be categorized by applying (i) classical lexical methods, (ii) machine learning, and (iii) deep learning techniques. This study provides a comprehensive assessment of the techniques applied by existing authors for sentiment analysis of COVID-19 and its vaccines.

1.1 Contributions

Numerous studies on COVID-19 sentiment classification have been conducted. This is the first review to examine Indian sentiment toward both COVID-19 and its vaccination. The major contributions of this work are:

  • - To collect articles from numerous online sources and organise them by machine learning and deep learning model.

  • - To present the details of existing work in a concise and understandable manner.

  • - To provide a classification and recommendation system for a better understanding of existing work.

This study aims to identify the data sources, data volumes, and techniques applied by existing authors for sentiment analysis, as well as directions for future studies regarding COVID-19 and its vaccine. The rest of this paper is structured as follows: Section 2 introduces the article selection process. The methodologies applied in existing work are briefly described in Section 3. Section 4 discusses the open issues and future research directions. Finally, Section 5 concludes this work with recommendations.

2 Capturing techniques of relevant articles

A complete online investigation is conducted over peer-reviewed journals indexed in four major online databases, namely Web of Science, IEEE, ACM, and Scopus. These digital libraries give broad access to published articles. A systematic mapping technique is applied to examine articles related to COVID-19 and its vaccination. The steps followed to collect relevant articles are shown in Fig. 2: (i) article collection, (ii) selection of articles based on inclusion and exclusion criteria, and (iii) screening of articles for identification and eligibility. Figure 2 also shows the screening procedure and inclusion criteria. Each step is briefly described in the following sections.

Fig. 2

Flow of data collection method.

2.1 Article collection

In this investigation, all COVID-19 and vaccination-related papers published from January 2020 to August 2022 are retrieved without linguistic restrictions. The article search starts from scholarly repositories, using a set of queries combined with the AND and OR Boolean operators [26]. A sample search query is: “COVID-19” OR “COVID19” OR “coronavirus” OR “sars-cov-2” OR “SARS 2” AND “Sentiment Classification” AND “Sentiment Analysis” AND “Opinion Mining” OR “vaccine” OR “vaccination”. The search is further narrowed by publication year (2020, 2021, 2022), language, duplication, and document type. A total of 354 articles are collected from the literature search, including 56 from Web of Science, 97 from IEEE, 143 from Scopus, and 39 from ACM.

2.2 Inclusion and exclusion criteria of papers

The most relevant papers are then selected by applying the inclusion and exclusion criteria shown in Table 1. A total of 317 abstracts are included for review after filtering out duplicate articles.

Table 1

Criteria for inclusion and exclusion used in this investigation

Inclusion criteria
1. Publication year(s): either 2020 or 2021
2. Written in English
3. Original research work
4. The article should propose or use artificial intelligence-based learning models.
5. The article should either suggest or use COVID-19 or its vaccination sentiment analysis methods.

Exclusion criteria
1. Not published in 2020 or 2021
2. Articles written in languages other than English
3. Referring to websites, conferences, reviews, book chapters, and literature surveys
4. Related to diseases other than COVID-19
5. Only COVID-19-related article

2.3 Screening of articles for identification and eligibility

Article screening is done by (i) checking titles and abstracts, and (ii) a comprehensive study of the selected articles. First, the abstracts and titles of the remaining 317 papers are reviewed. Full-text reading is then performed to create a final set of articles that fulfil the inclusion criteria defined for this investigation. Finally, a total of 40 papers are included for analysis after excluding 277 publications.
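
The selection funnel described in Sections 2.1 to 2.3 can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative, and it assumes that the whole difference between the 354 collected articles and the 317 reviewed abstracts comes from duplicate removal.

```python
# Counts reported in Sections 2.1 to 2.3 of this review.
collected = 354               # articles returned by the database search
after_dedup = 317             # abstracts kept after filtering duplicates
excluded_on_screening = 277   # publications excluded during screening

# Assumption: the gap between the first two counts is entirely duplicates.
duplicates_removed = collected - after_dedup
included = after_dedup - excluded_on_screening

print(f"duplicates removed: {duplicates_removed}")
print(f"papers included:    {included}")
```

The final count agrees with the 40 papers stated above.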

3 Discussions and outcomes

Which approaches have been used for the development of COVID-19 and its vaccine sentiment analysis tools?

All collected articles are divided into three main categories: (i) articles covering lexicon-based sentiment analysis, (ii) articles addressing machine learning-based emotion analysis, and (iii) publications based on hybrid models of lexicon and machine learning techniques. Table 2 summarizes the techniques applied by the existing authors for sentiment analysis and classification, and Table 3 summarizes the datasets used by the various authors for experimental analysis. These works are briefly described in the following subsections.

Table 2

Techniques adopted for opinion mining by existing authors in the literature

| Ref | Lexicon | Learning model |
|---|---|---|
| [62] | Textblob | HTML5 |
| [59] | Minimum Redundancy Maximum Relevancy (mRMR) | BERT |
| [60] | Textblob | Ensemble Classifier |
| [39] | Textblob | SVM and Logistic Regression |
| [16] | - | BERT |
| [61] | Textblob | LDA |
| [6] | Textblob | LSTM |
| [1] | VADER | LDA |
| [9] | SentiStrength | BERT and GraphBERT |
| [44] | - | Random forest |
| [45] | R | LDA |
| [24] | TextBlob and VADER | Linear SVC classifier |
| [58] | VADER | - |

Table 3

Summary of datasets used by existing authors

| Ref | Source | Size | Duration |
|---|---|---|---|
| [49] | Twitter | 73,760 | 2020 |
| [8] | Twitter | 24,000 | 25th March to 28th March 2020 |
| [48] | Twitter | 24,998 | 5th April to 6th April 2020 |
| [20] | Instagram | - | April-May 2021 |
| [1] | Times of India and The Hindu newspapers | 2,170 | May and June 2020 |
| [62] | Twitter | 194370 | 6th May 2020 |
| [59] | Twitter | 596,784 | 20th Jan to 25th April 2020 |
| [60] | Twitter | 3100 | 23rd March to 1st Nov 2020 |
| [25] | Twitter | 8,84,111 | 25th March to 9th June 2020 |
| [21] | Newspapers | 100,000 | January to December 2020 |
| [16] | Github [50] | 3090 | 23rd March to 15th July 2020 |
| [61] | Twitter | 189,888 | March to April 2020 |
| [43] | Twitter | 11,815 | 4th January to 22nd March 2021 |
| [6] | Twitter | 10767 | 23rd March to 2nd April 2021 |
| [1] | Twitter | 2170 | May and June 2020 |
| [9] | Twitter | 36,231,457 | 16th January 2021 to 30th November 2021 |
| [44] | Twitter | 48,913 | March 2021 to June 2021 |
| [45] | Twitter | 50,000 | 23rd March 2020 to 21st May 2020 |
| [34] | Sentiment140 [4] | 128,096 | 23rd March 2020 to 13th May 2020 |
| [24] | Twitter | 12,741 | 5th April 2020 to 17th April 2020 |
| [58] | Twitter | 401,037 | 3rd May 2021 to 29th August 2021 |

3.1 Lexicon-based method

A lexicon-based method for evaluating sentiment does not require human supervision [3, 54]. It determines the sentiment orientation from the polarity value of a phrase [55]. Praveen et al. determined Indian public opinion towards COVID-19 vaccination by analyzing 73,760 tweets [49]. The authors used a topic modelling technique to understand people's difficulties with COVID-19 vaccination. They found that only 17% of tweets were unfavourable, while 47% were neutral. Many Indians were hesitant to get vaccinated for fear of adverse reactions. Barkur et al. examined Indian feelings about lockdown using 24,000 tweets gathered between 25 March and 28 March 2020 [8]. Unfavourable emotions towards lockdown, such as anxiety, disgust, and sorrow, were evaluated. Another study used a multifaceted approach to explore the effect of the COVID-19 epidemic on Indian emotions [48]; its authors reported positive attitudes among Indians toward lockdown and COVID-19. Venigalla et al. proposed a web portal based on real-time tweets to reflect Indian sentiment during COVID-19 [62]. This platform allows visitors to check the general sentiment of the people of a specific state on a specific date and time.

Limitation: The current portal shows state-level sentiment for only a few cities.

Gupta et al. created a unique emotional care strategy to examine heterogeneous linguistic data regarding COVID-19 [25]. Their study examined eight basic feelings across topics such as ecology, security, healthcare, education, and the economy. The Twitter API is used to collect tweets from several regions of India, published between 25 March and 9 June 2020. Only English tweets with the hashtags #COVID-19, #Lockdown, #LockdownDiaries, and #CoronaVirus are examined. The National Research Council (NRC) emotion lexicon is employed for sentiment evaluation; it contains scores for emotions such as anger, anticipation, trust, sorrow, pleasure, disgust, fear, and surprise. They observed a 9%-15% reduction in happiness, with 16% of people feeling sad and 18% scared about the health sector. Sv et al. investigated Indians' perspectives on the adverse effects of COVID-19 vaccines using a topic modelling technique [61]. A total of 189,888 tweets containing the terms “COVID Vaccine” and “Side effects” are collected using the Python Twint library. The authors analyzed Indian emotions regarding the risks associated with COVID-19 vaccination, and 78.5% of tweets were found to be neutral or favourable. The experiments indicated fear at the workplace, which leads to negative sentiment among Indians regarding COVID-19 vaccination. Mir et al. applied the rule-based VADER technique to analyze Indian opinions on COVID-19 vaccination [43]. In addition, they identified the common keywords Indians used on Twitter to express their thoughts on vaccination. A total of 11,815 tweets are collected using the “Covid19vaccine” and “Coronavirusvaccine” hashtags from January to March 2021. A total of 162 tweets are used for analysis after removing 2700 duplicate tweets. A total of 639, 521, and 241 tweets are classified as positive, neutral, and negative, respectively. Most tweets are positive towards COVID-19 vaccination, which indicates widespread support. Dubey et al. examined Indian Twitter users’ opinions about two COVID-19 vaccines, Covishield and Covaxin, using the NRC lexicon [19]. They examined two datasets of tweets posted between 14 and 18 January 2021 with #Covishield and #Covaxin. The authors found favourable attitudes and trust towards both vaccines. Agarwal et al. studied news stories about internal migration in India during the COVID-19 lockdown period [1]. They analyzed news articles published in the Times of India and The Hindu newspapers. A total of 2,170 separate news stories published between May 2020 and June 2020 are collected using the terms “migrants,” “migration,” “lockdown,” “COVID-19,” and “pandemic”. The tone and perspective of the news items are determined using the VADER module. Most articles are neutral, with a small percentage showing strong negative or positive polarity. Chehal et al. examined the emotions of Indians and their perception of e-commerce during the second and third lockdown periods using a Twitter dataset [15]. A lower percentage of negative emotions is reported during the third lockdown than during the second. The authors also analyzed people's online retailing tendencies and found that priorities shifted from nutrition items, clothing, and home goods to baby goods, beauty aids, games, and sports equipment during the third lockdown. Misra et al. acquired and examined information about reverse migration in India via Twitter mining [45]. They retrieved almost 50,000 tweets from March 2020 to May 2020 using the Twitter API and trending hashtags such as #IndianMigrantWorkers. Emotions are identified using the NRC Emotion Lexicon after noise is removed from the collected data.

Limitation: To obtain Twitter data, the researchers used only the popular hashtags #IndianMigrantWorker and #MigrantWorker, which do not reflect the entire population. Tweets posted in other languages are not included in the analysis; a different perspective could be obtained by also using tweets posted in other Indian languages.

Sing et al. evaluated sentiment about COVID-19-associated mucormycosis (CAM) during the second wave in India [58]. A total of 401,037 Twitter posts are collected between 3 May and 29 August 2021 using the Twitter API. Using the VADER tool, a higher percentage of positive than negative emotion is found.

3.2 Machine learning

The second category of research applies machine learning techniques to the sentiment analysis of social media data regarding COVID-19 and its immunization. Machine learning techniques use labelled training data to obtain predictive information about target opinions. Chintalapudi et al. collected Indian tweets posted between 23 March 2020 and 15 July 2020 [16]. The collected data is categorised as happy, angry, scared, or sad using the Bidirectional Encoder Representations from Transformers (BERT) technique. The performance of BERT is also compared with Long Short-Term Memory (LSTM), support vector machine (SVM), and logistic regression (LR) models; BERT outperforms the other models with 89% accuracy. Kumar et al. applied a hybrid model of BiLSTM and a convolutional neural network (CNN) to the publicly accessible Sentiment140 dataset and labelled Indian COVID-19 tweets, achieving 90% accuracy [34].

Limitation: The authors used English text only for sentiment analysis. Text in other languages could also be used to improve correctness.

3.3 Hybrid models

A hybrid model combines lexical analysis and machine learning to analyze unlabelled data [17]. In this method, unlabelled data is annotated with lexicon-based algorithms before machine learning algorithms are trained and tested [41].

Singh et al. investigated an emotion detection technique using COVID-19 tweets from all over the world [59]. A Twitter scraper API is used to extract data from 20 January to 25 April 2020 with the #corona&virus, #COVID19, and #COVID2019 hashtags. Relevant features are selected with the maximum Relevance and Minimum Redundancy (mRMR) technique, and sentiments are classified with a BERT model at 94% accuracy. Sunitha et al. suggested an emotion analysis approach for evaluating real-time COVID-19-related tweets [60]. Approximately 3100 tweets are gathered between March 2020 and November 2021 from Indian and European citizens. Next, fastText, Word2Vec, GloVe, and TF-IDF techniques are used for feature extraction from the preprocessed data. An ensemble classifier categorises the emotions as anger, sadness, joy, or fear. The suggested model classified the emotions of Indians and Europeans with accuracies of 97.28% and 95.2%, respectively. Majumder et al. conducted a comparative study of sentiment analysis using SVM and LR models [39]. Indian COVID-19 tweets are gathered from March 2020 to June 2020. All collected data is converted to lowercase before hyperlinks and punctuation are removed. Next, a label encoding approach is employed to obtain labelled data in numeric format. Borah et al. employed a multi-modal deep learning approach to analyze 36,231,457 tweets related to the COVID-19 vaccine from 51,682 Indians [9]. All tweets are collected using the #ReadyToVaccinate, #Covishield, #CovidVaccine, and #Covaxin hashtags. Analysis is done using the SentiStrength tool, which assigns each tweet a value between -4 and +4, where -4 and +4 denote extreme negative and extreme positive sentiment, respectively. The textual data and the network topology are encoded using BERT and GraphBERT.

Limitation: Sentiment analysis is applied only to tweets posted by urban residents; the perspective of rural residents is not included in the analysis. Furthermore, only a limited set of hashtags determines which tweets are included in the dataset.

Gupta et al. evaluated 12,741 tweets to analyze the sentiments of Indian Twitter users using natural language processing (NLP) and machine learning techniques [24]. A Linear SVC classifier with unigrams is used for classification and achieves a maximum accuracy of 84.4%. The analysis shows a positive attitude among Indians towards the government's lockdown decision.

Ghasiya et al. identified COVID-19-related issues and sentiments published in newspapers between January and December 2020 [21]. A total of 100,000 news articles are scraped with the keywords COVID-19 and Coronavirus from eight major newspapers in four countries. Their work is divided into two parts: (i) topic modelling and (ii) sentiment classification. According to the topic modelling, all four countries share similar issues of sports, education, and the economy. They then used the state-of-the-art RoBERTa model to determine the sentiment of headlines and achieved 90% validation accuracy. Their findings indicate more positive than negative news in South Korea and the UK.

Xie et al. investigated the reaction of Chinese microblog users to COVID-19 using text mining techniques [67]. They used a web crawler to collect 719,570 Weibo posts. The analysis shows that people supported the front-line soldiers during the COVID-19 outbreak and that positive messages outnumbered negative ones.

Ermatita et al. proposed a multi-modal fusion neural network for COVID-19 sentiment analysis of Instagram text and posters [20]. Integrated inputs of images and captions are fed to modified deep learning architectures with multi-modal graph layers and self-attention. The authors obtained 87% accuracy with the multi-modal fusion neural network.

Gupta et al. proposed a novel emotional care scheme for analysing real-time COVID-19 textual data [25]. They analyzed eight emotions towards categories such as politics, market, education, health, lockdown, and nature. According to this textual analysis, the feeling of 'joy' is reported at a low level towards everything (9-15%) except nature (17%).

3.4 Sentiment analysis approaches

The techniques applied by the existing authors for COVID-19 sentiment analysis are briefly described in the following sections.

3.4.1 Lexicon-based sentiment analysis

Lexicons are collections of words or terms used for sentiment analysis in natural language processing. A lexicon can be created manually or extracted automatically from a text corpus, and it contains both positive and negative words or phrases with polarity scores. Several sentiment lexicon resources are described below.

(i) SentiWordNet: a lexical resource that assigns sentiment scores to each word based on its WordNet senses.

(ii) VADER (Valence Aware Dictionary and sEntiment Reasoner): a rule-based sentiment analysis tool that uses a lexicon of words and emoticons to determine text sentiment.

(iii) TextBlob: a simple Python-based API which also provides part-of-speech tagging and noun phrase extraction. This method is based on a modified version of the Naive Bayes algorithm and provides a sentiment polarity score between -1 and +1. Highly negative, highly positive, and neutral sentiment are denoted by -1, +1, and 0, respectively. The polarity score can be computed as:

(1)
$$P(S \mid W) = \frac{P(W \mid S)\,P(S)}{P(W)}$$
Here, $P(S \mid W)$ is the probability of sentiment $S$ given the words $W$, $P(W \mid S)$ is the probability of the words $W$ given sentiment $S$, $P(S)$ is the prior probability of sentiment $S$, and $P(W)$ is the probability of the words $W$.
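
The lexicon idea above can be illustrated with a minimal sketch. It is not the VADER, SentiWordNet, or TextBlob implementation: the eight-word lexicon, its scores, the averaging rule, and the 0.05 neutrality threshold are all hypothetical choices made for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1]. Real tools ship far
# larger, curated lexicons and additional rules (negation, intensifiers).
TOY_LEXICON = {
    "safe": 0.6, "effective": 0.7, "hope": 0.5, "thank": 0.6,
    "fear": -0.6, "death": -0.8, "scared": -0.7, "shortage": -0.5,
}

def polarity(text):
    """Average the lexicon scores of matched words; 0.0 if nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(score, threshold=0.05):
    """Map a polarity score to positive / negative / neutral (assumed cut-off)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

for tweet in ["The vaccine is safe and effective",
              "Scared of the shortage",
              "Vaccination starts on Monday"]:
    print(tweet, "->", label(polarity(tweet)))
```

Because no human-labelled training data is involved, such a scorer can also be used to annotate tweets before a supervised model is trained, as in the hybrid approaches of Section 3.3.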

3.4.2 Logistic regression

Logistic regression is a statistical classification model that assesses the relationship between a categorical dependent variable and one or more independent variables [52]. Proper feature selection can improve the accuracy and generalizability of the model. Logistic regression is expressed mathematically as:

(2)
$$p(y = 1 \mid x, w) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$$
Here, $p(y = 1 \mid x, w)$ denotes the predicted probability of the positive class given input features $x$ and model weights $w$; $e$ denotes the exponential function and $b$ the bias term.
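
Equation (2) translates directly into code. The following is a minimal sketch of the prediction step only (no training); the function name and the example weights are illustrative assumptions.

```python
import math

def predict_proba(x, w, b):
    """Probability of the positive class: 1 / (1 + exp(-(w.x + b)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical two-feature example, e.g. counts of positive and negative words.
weights = [0.8, -1.1]
bias = 0.1
print(predict_proba([3.0, 0.0], weights, bias))  # mostly positive words
print(predict_proba([0.0, 3.0], weights, bias))  # mostly negative words
```

A text is typically assigned to the positive class when this probability exceeds 0.5, which corresponds to $w^{\top} x + b > 0$.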

3.4.3 Naive Bayes

Naive Bayes is a probabilistic classifier that estimates the probability that an item belongs to each class [40]. Only a small quantity of data is needed to train this classifier [47]. It often performs well because of its simplicity and stable foundation. Bayes' theorem is expressed mathematically as [33, 38]:

(3)
$$C(F \mid R) = \frac{C(R \mid F) \cdot C(F)}{C(R)}$$
Here, $C(F \mid R)$ denotes the probability of class $F$ given document $R$; $C(F)$ is the prior probability of class $F$; $C(R)$ is the evidence, i.e. the probability of the text $R$ to be categorised; and $C(R \mid F)$ is the probability of document $R$ under the distribution of class $F$.
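
A worked instance of Eq. (3) on a toy corpus is sketched below. The four training documents are invented, and the word-independence assumption with add-one (Laplace) smoothing is a common choice that the reviewed papers do not necessarily share.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, label). Returns priors, per-class counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def posterior(tokens, priors, word_counts, vocab):
    """Return C(F | R) for every class F, normalised over the classes."""
    log_joint = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        lp = math.log(prior)                      # log C(F)
        for t in tokens:                          # log C(R | F), words independent
            # Add-one smoothing: an assumption, not taken from the source.
            lp += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
        log_joint[c] = lp
    # Dividing by C(R) is equivalent to normalising the joint scores.
    m = max(log_joint.values())
    unnorm = {c: math.exp(v - m) for c, v in log_joint.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Invented toy corpus.
docs = [(["vaccine", "safe"], "positive"),
        (["vaccine", "works"], "positive"),
        (["vaccine", "fear"], "negative"),
        (["side", "effects", "fear"], "negative")]
model = train(docs)
print(posterior(["safe"], *model))
```

With these counts the class priors are equal, so the word likelihoods alone decide the prediction.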

3.4.4 Random forest classifier

This classifier builds a forest of decision trees to handle complex classification and regression problems [10]. It generates numerous decision trees during training and combines their predictions. The trees split the data using the Gini index or entropy as the impurity measure, which are expressed mathematically as:

(4)
$$\text{Gini impurity} = 1 - \sum_{i=1}^{k} p_i^{2}$$

(5)
$$\text{Entropy} = -\sum_{i=1}^{k} p_i \log_2 p_i$$

Here, $k$ denotes the number of classes and $p_i$ the proportion of samples belonging to the $i$th class. Random forest constructs its decision trees by minimizing the impurity measure at each node so that new instances are classified accurately.
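
Equations (4) and (5) can be evaluated for the labels that reach a tree node. The sketch below is a minimal illustration; the function names and the example labels are assumptions.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion p_i of samples in each class present at the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_impurity(labels):
    """Eq. (4): 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Eq. (5): -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

mixed = ["pos", "pos", "neg", "neg"]   # evenly split node
pure = ["pos", "pos", "pos"]           # node with a single class
print(gini_impurity(mixed), entropy(mixed))
print(gini_impurity(pure), entropy(pure))
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why a split that lowers them is preferred.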

3.4.5 Convolutional neural network (CNN)

A CNN is a feed-forward neural network with four components, (i) hidden layers, (ii) a convolution layer, (iii) a ReLU layer, and (iv) a pooling layer, that extract features from the data [46]. The ReLU and pooling layers are standard components that operate on a grid-like structure to extract essential features from the input data. Each feature is generated from a window of words $x_t$ as:

(6)
$$S = T(Q \cdot x_t + p)$$
Here, $Q$ and $p$ denote the filter weights and bias, respectively, and $T$ is the nonlinear activation function of the convolution.
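
A minimal sketch of Eq. (6) for a single filter is given below, assuming that $T$ is ReLU and that the window $x_t$ is the concatenation of consecutive word vectors. The embeddings and filter values are made up for illustration.

```python
def relu(v):
    return max(0.0, v)

def conv_features(embeddings, Q, p, window=2):
    """Slide one filter over consecutive windows of word vectors.

    embeddings: list of word vectors; Q: flat filter weights of length
    window * embedding_dim; p: scalar bias. Returns S for every window.
    """
    feats = []
    for t in range(len(embeddings) - window + 1):
        # x_t: the window of word vectors, flattened into one vector.
        x_t = [v for vec in embeddings[t:t + window] for v in vec]
        feats.append(relu(sum(q * x for q, x in zip(Q, x_t)) + p))
    return feats

def max_pool(feats):
    """Keep the strongest response of the filter over the whole sentence."""
    return max(feats)

# Three words with made-up 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = conv_features(sentence, Q=[1.0, -1.0, 0.5, 0.5], p=-0.5)
print(features, max_pool(features))
```

In a full model many such filters run in parallel, and their pooled outputs form the feature vector passed to the classifier.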

3.4.6 LSTM

LSTM is a type of recurrent neural network (RNN) commonly used for text classification. It uses three main gates, (i) input, (ii) forget, and (iii) output, to control the flow of information. The input gate controls the amount of information added to the memory cell at each time step. The forget gate determines how much of the old information the memory cell retains. Finally, the output gate controls how much of the current memory cell information is used to generate the output at each time step. The three gates are denoted mathematically as [12, 63]:

(7)
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$
(8)
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$
(9)
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$
Here, $x_t$ is the input at time step $t$ and $h_{t-1}$ is the hidden state from the previous time step. The input, forget, and output gate activations at time step $t$ are denoted $i_t$, $f_t$, and $o_t$, respectively; $W$ and $b$ are the weight matrices and bias vectors of each gate, and $\sigma$ denotes the sigmoid function.
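
Equations (7) to (9) share one form, so a single helper can compute any of the three gates. The sketch below covers the gate activations only, not the memory cell update; the helper names and the tiny weights are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gate(W_x, W_h, b, x_t, h_prev):
    """sigma(W_x x_t + W_h h_{t-1} + b), applied element-wise."""
    from_input = matvec(W_x, x_t)
    from_state = matvec(W_h, h_prev)
    return [sigmoid(a + r + bias)
            for a, r, bias in zip(from_input, from_state, b)]

# One hidden unit and a one-dimensional input, with made-up parameters.
x_t, h_prev = [1.0], [0.0]
i_t = gate([[2.0]], [[1.0]], [-2.0], x_t, h_prev)   # input gate,  Eq. (7)
f_t = gate([[0.0]], [[0.0]], [5.0], x_t, h_prev)    # forget gate, Eq. (8)
o_t = gate([[0.0]], [[0.0]], [-5.0], x_t, h_prev)   # output gate, Eq. (9)
print(i_t, f_t, o_t)
```

Each activation lies between 0 and 1, so a gate acts as a soft switch: values near 1 let information through and values near 0 block it.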

3.4.7 BiLSTM

BiLSTM is a variation of the LSTM network that integrates both past and future inputs at each time step. A standard LSTM classifier works well with variable-length sequences but cannot use contextual information from future tokens [51]. BiLSTM instead uses a bidirectional LSTM layer to uncover patterns by traversing the input in both directions: the first and second layers of the network traverse the text in forward and reverse order, respectively [32]. Finally, the output layer combines the historical and prospective context of each sequence point. This bidirectional architecture improves the capability of the model to understand the meaning of a given text. The model is expressed mathematically as:

(10)
$$\overrightarrow{h}_t = \mathrm{LSTM}(\overrightarrow{h}_{t-1}, x_t)$$
(11)
$$\overleftarrow{h}_t = \mathrm{LSTM}(\overleftarrow{h}_{t+1}, x_t)$$
(12)
$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$$
(13)
$$y = \mathrm{softmax}(W_{hy} h_t + b_y)$$
Here, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the hidden states of the forward and backward LSTMs at time $t$, respectively, and $x_t$ is the LSTM input at time $t$. The concatenation of the two hidden states is represented by $[\,;\,]$. The weight matrix and bias vector of the fully connected layer are $W_{hy}$ and $b_y$, and $y$ is the predicted class probability.
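
The data flow of Eqs. (10) to (13) is sketched below. To keep the example short, the LSTM cell is replaced by a hypothetical single tanh recurrence (`step`), so the code illustrates only the two traversal directions, the concatenation, and the softmax output, not a real LSTM.

```python
import math

def step(h_prev, x_t):
    """Stand-in for an LSTM cell (hypothetical): one scalar tanh recurrence."""
    return math.tanh(0.5 * h_prev + x_t)

def bidirectional(xs, cell=step):
    """Return h_t = (forward state, backward state) for every position t."""
    n = len(xs)
    fwd, bwd = [0.0] * n, [0.0] * n
    h = 0.0
    for t in range(n):               # forward pass depends on h_{t-1}, Eq. (10)
        h = cell(h, xs[t])
        fwd[t] = h
    h = 0.0
    for t in reversed(range(n)):     # backward pass depends on h_{t+1}, Eq. (11)
        h = cell(h, xs[t])
        bwd[t] = h
    return [(fwd[t], bwd[t]) for t in range(n)]   # concatenation, Eq. (12)

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

states = bidirectional([1.0, 0.0, -1.0])
# Eq. (13) with made-up weights: two classes scored from the last h_t.
W_hy, b_y = [[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.0]
h_last = states[-1]
y = softmax([sum(w * h for w, h in zip(row, h_last)) + b
             for row, b in zip(W_hy, b_y)])
print(states, y)
```

The backward state at the first position changes when the last input changes, while the forward state does not; this is the future context that a unidirectional LSTM cannot use.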

3.4.8 Gated recurrent unit (GRU)

This model regulates the internal flow of information with a gating mechanism consisting of update and reset gates [64]. Both gates determine the amount of data accepted or discarded from the previous level. Mathematically, the reset gate and update gate are defined as [69]:

(14)
$$q_a = \sigma(T_q \cdot [p_{a-1}, m_a])$$
(15)
$$l_a = \sigma(T_l \cdot [p_{a-1}, m_a])$$
Here, $q_a$ and $l_a$ are the reset and update gates at time step $a$, respectively; $p_{a-1}$ is the hidden state at the previous time step; $m_a$ is the input at time step $a$; and $T_q$ and $T_l$ are weight matrices learned during training. The output of the model at time step $a$ is calculated as:
(16)
$$p_a = (1 - l_a) \odot p_{a-1} + l_a \odot \tanh(T_p \cdot [q_a \odot p_{a-1}, m_a])$$
Here, $T_p$ is another weight matrix learned during training and $\odot$ denotes element-wise multiplication. The output $p_a$ is passed through a final softmax layer to obtain the predicted class probabilities [7].
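
One GRU time step following Eqs. (14) to (16) can be sketched for a scalar hidden state. The function name and the weight values are illustrative assumptions, and each $T$ is reduced to a pair of weights applied to $[p_{a-1}, m_a]$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def gru_step(p_prev, m_a, T_q, T_l, T_p):
    """One scalar GRU step; each T_* holds two weights for [state, input]."""
    q_a = sigmoid(dot(T_q, [p_prev, m_a]))                 # reset gate, Eq. (14)
    l_a = sigmoid(dot(T_l, [p_prev, m_a]))                 # update gate, Eq. (15)
    candidate = math.tanh(dot(T_p, [q_a * p_prev, m_a]))   # proposed new state
    return (1.0 - l_a) * p_prev + l_a * candidate          # Eq. (16)

# With all-zero weights both gates equal 0.5 and the candidate state is 0.
print(gru_step(0.8, 1.0, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
# A strongly negative update-gate weight keeps the previous state almost intact.
print(gru_step(0.8, 1.0, [0.0, 0.0], [0.0, -50.0], [1.0, 1.0]))
```

The update gate $l_a$ interpolates between the previous state and the candidate, while the reset gate $q_a$ controls how much of the previous state enters the candidate.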

4 Discussions

Two significant aspects are examined in this work: (i) the challenges encountered by existing researchers, and (ii) the relevance and benefits of COVID-19 sentiment analysis. The challenges, motivations, and recommendations are briefly described in the following sections.

4.1 Challenges

Existing researchers have faced several technological challenges while assessing COVID-19 vaccine data. The data type, its annotation, and data pre-processing are the main challenges. Accurate data plays a key role in performing accurate analysis and reaching valid conclusions. It is difficult for natural language processing to accurately analyze data containing irony, sarcasm, and slang. The credibility and originality of the collected data are two crucial challenges for sentiment analysis on social media. Regrettably, only a few authors have used data from Instagram, Facebook, or other social media platforms. Collected data must be labelled for clarity, and manual annotation of a huge volume of text is a difficult task. Many authors have used the VADER and TextBlob techniques for data labelling. The datasets used by existing authors are shown in Table 3. Table 3 shows that many authors employed only a subset of the available data for their studies, so the resulting models may not be applicable to a wide range of situations. The availability of verified COVID-19 datasets is a major challenge for the scientific community. In the early stages of research, models were trained and tested on small datasets because annotated datasets were unavailable.

Table 4 summarises existing COVID-19 and vaccine-related research that uses machine learning and deep learning models.

Table 4

Comparison of existing models

| Ref | Model | Accuracy (%) |
|---|---|---|
| [57] | LSTM-RNN | 84.56 |
| [23] | Linear SVC | 98.15 |
| [42] | CNN-Bi-LSTM | 99.33 |
| [31] | RNN | 93.02 |
| [56] | Naïve Bayes | 91 |
| [30] | H-SVM | 96 |
| [53] | ETC | 93 |
| [29] | LSTM | 81.15 |
| [14] | Logistic Regression | 81 |
| [27] | LSTM+FastText | 82.4 |
| [65] | BERT | 75.65 |
| [53] | LSTM | 93 |

Table 4 shows that the accuracy obtained by the various machine learning models lies between 75% and 99%. The highest accuracy, 99.33%, is achieved by a hybrid deep learning model of CNN and BiLSTM.

4.2 Motivations

Data analysis plays an important role in various fields. Analyzing and evaluating people's sentiments towards serious diseases is an important application of natural language processing and data mining. Sentiment analysis is a simple, efficient, and effective way to evaluate public opinion on illnesses and their transmission.

4.3Recommendations

The recommendations mostly indicate future tactics that can be implemented for advanced sentiment analysis research of various themes. The data analysis outcome depends on characteristics of data sets and applied techniques. Hindi and other Indian languages can also be used for sentiment analysis. Some additional recommendations that can be considered for advancement of sentiment analysis research:

  • - Multi-lingual Sentiment Analysis: multiple language based sentiment analysis can provide more comprehensive view of sentiments across different cultures and geographies. Therefore, development of sentiment analysis techniques for different Indian languages is a thrust area of research.

  • - Fine-grained Sentiment Analysis: Most existing sentiment analysis methods classify texts into positive, negative, and neutral. However, this approach oversimplifies the complexity of human emotions. Fine-grained sentiment analysis such as happiness, sadness, anger, and fear, can be considered.

  • - Domain-Specific Sentiment Analysis: Sentiment analysis techniques developed for one domain may not perform better on another domain due to different language, context, and cultural norms. Therefore, developing domain-specific sentiment analysis model can improve its accuracy and effectiveness for specific applications, such as customer reviews, political speeches, or social media conversations.

  • - Combination of Multiple Data Sources: Data from multiple sources such as social media, news articles, and survey responses can provide a more comprehensive and diverse data set for sentiment analysis. Integrating these data sources can lead to more accurate and reliable sentiment analysis results.

  • - Integration of Deep Learning: Deep learning techniques have shown promising results for sentiment analysis. Therefore, integrating deep learning techniques with traditional machine learning approaches can further improve the accuracy and effectiveness of sentiment analysis.
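
As a minimal sketch of the fine-grained recommendation above, the following Python example labels a text with one of the four emotions named in the list and then collapses that label to the coarse positive/negative/neutral scheme. The word list `EMOTION_LEXICON`, the mapping `POLARITY`, and the function names are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies, and a practical system would rely on a curated lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion); for illustration only.
EMOTION_LEXICON = {
    "happy": "happiness", "relieved": "happiness", "grateful": "happiness",
    "sad": "sadness", "lonely": "sadness", "grief": "sadness",
    "angry": "anger", "furious": "anger", "outraged": "anger",
    "afraid": "fear", "scared": "fear", "worried": "fear",
}

# Assumed mapping from fine-grained emotions back to coarse polarity.
POLARITY = {
    "happiness": "positive",
    "sadness": "negative",
    "anger": "negative",
    "fear": "negative",
}


def emotion_counts(text):
    """Count how many lexicon words of each emotion occur in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)


def fine_grained_label(text):
    """Return the most frequent emotion, or 'neutral' if no lexicon word matches."""
    counts = emotion_counts(text)
    if not counts:
        return "neutral"
    return counts.most_common(1)[0][0]


def coarse_label(text):
    """Collapse the fine-grained emotion into positive/negative/neutral."""
    return POLARITY.get(fine_grained_label(text), "neutral")
```

In this sketch, two tweets that a three-class scheme would both mark as negative can still be told apart as fear or anger, which is the extra detail the fine-grained recommendation asks for.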

4.4Future implications of research about COVID

Several potential outcomes are outlined below:

  • - Massive amounts of data can be used to investigate various systems.

  • - Sentiment analysis can be applied across multiple languages.

  • - Income level and demographics can also be considered when analysing the public’s sentiments.

  • - Sentiment analysis can be performed separately for different age groups.

  • - Collection of accurate spatial data is crucial for effective geo-referencing.

  • - During a crisis, more targeted analysis can be performed to support policymakers, governments, and communities.
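
One possible way to break sentiment down by age group or another demographic attribute, as suggested in the list above, is sketched here in Python. The record fields (`age_group`, `label`), the sample records, and the function `sentiment_share_by_group` are assumptions made for illustration only and do not come from the reviewed data sets.

```python
from collections import defaultdict


def sentiment_share_by_group(records, group_key):
    """Return, for each group, the fraction of records carrying each sentiment label."""
    counts = defaultdict(lambda: defaultdict(int))
    for record in records:
        counts[record[group_key]][record["label"]] += 1

    shares = {}
    for group, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[group] = {label: n / total for label, n in label_counts.items()}
    return shares


# Hypothetical, already-labelled records (not real data).
records = [
    {"age_group": "18-30", "label": "positive"},
    {"age_group": "18-30", "label": "negative"},
    {"age_group": "60+", "label": "negative"},
    {"age_group": "60+", "label": "negative"},
]

shares = sentiment_share_by_group(records, "age_group")
```

The same function could be called with a different `group_key`, for example an income bracket or a region, provided the records carry that field.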

5Conclusions

This work presents a systematic literature review of sentiment analysis of COVID-19 and its vaccination in India over the past three years. The applications of lexicon-based, machine learning, and deep learning techniques for sentiment analysis of COVID-19 and its vaccination are analyzed. The data sources, available data sets, and methods applied by various authors are discussed. The common problems with the collected data and the limitations of the existing works are also presented. It is concluded that data sets in multiple languages can be utilized for effective and accurate analysis. Hybrid models of deep and machine learning can be utilized to analyze massive data from various sources.

References

[1] 

Agarwal S. and Sarkar S. , Topical analysis of migration coverage during lockdown in india by mainstream print media, Plos One 17: ((2022) ), e0263787.

[2] 

Aggrawal P. , Jolly B.L.K. , Gulati A. , Sethi A. , Kumaraguru P. and Sethi T. , Psychometric analysis and coupling of emotions between state bulletins and twitter in india during covid-19 infodemic, Frontiers in Communication 6: ((2021) ), 695913.

[3] 

Alamoodi A.H. , Zaidan B.B. , Zaidan A.A. , Albahri O.S. , Mohammed K. , Malik R.Q. , Almahdi E.M. , Chyad M.A. , Tareq Z. , Albahri A.S. , et al., Sentiment analysis and its applications in fighting covid-19 and infectious diseases: A systematic review, Expert systems with applications 167: ((2021) ), 114155.

[4] 

Alec Go and Richa Bhayani L.H. , A twitter sentiment analysis tool, 2020. URL: https://help.sentiment140.com/home

[5] 

Aljedaani W. , Saad E. , Rustam F. , de la Torre Diez I. and Ashraf I. , Role of artificial intelligence for analysis of covid-19 vaccination-related tweets: Opportunities, challenges, and future trends, Mathematics 10: ((2022) ), 3199.

[6] 

Aryal R.R. and Bhattarai A. , Sentiment analysis on covid-19 vaccination tweets using naive bayes and lstm, Advances in Engineering and Technology: An International Journal 1: ((2021) ), 57–70.

[7] 

Atef S. and Eltawil A.B. , Assessment of stacked unidirectional and bidirectional long short-term memory networks for electricity load forecasting, Electric Power Systems Research 187: ((2020) ), 106489.

[8] 

Barkur G. , Kamath G.B. , et al., Sentiment analysis of nationwide lockdown due to covid 19 outbreak: Evidence from india, Asian journal of psychiatry 51: ((2020) ), 102089.

[9] 

Borah A. , Detecting covid-19 vaccine hesitancy in india: a multimodal transformer based approach, Journal of Intelligent Information Systems (2022), 1–17.

[10] 

Breiman L. , Random forests, Machine Learning 45: ((2001) ), 5–32.

[11] 

Cao L. and Liu Q. , Covid-19 modeling: A review, medRxiv, (2022).

[12] 

Cao Y. , Long M. , Wang J. , Yang Q. and Yu P.S. , Deep visual-semantic hashing for cross-modal retrieval, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2016), pp. 1445–1454.

[13] 

Chahar S. and Roy P.K. , Covid-19: A comprehensive review of learning models, Archives of Computational Methods in Engineering (2021), 1–26.

[14] 

Chakraborty K. , Bhatia S. , Bhattacharyya S. , Platos J. , Bag R. and Hassanien A.E. , Sentiment analysis of covid-19 tweets by deep learning classifiers—a study to show how popularity is affecting accuracy in social media, Applied Soft Computing 97: ((2020) ), 106754.

[15] 

Chehal D. , Gupta P. and Gulati P. , Covid-19 pandemic lockdown: An emotional health perspective of indians on twitter, International Journal of Social Psychiatry 67: ((2021) ), 64–72.

[16] 

Chintalapudi N. , Battineni G. and Amenta F. , Sentimental analysis of covid-19 tweets using deep learning models, Infectious Disease Reports 13: ((2021) ), 329–339.

[17] 

Chiny M. , Chihab M. , Bencharef O. and Chihab Y. , Lstm, vader and tf-idf based hybrid sentiment analysis model, International Journal of Advanced Computer Science and Applications 12: ((2021) ).

[18] 

Cruz-Cardenas J. , Zabelina E. , Guadalupe-Lanas J. , Palacio-Fierro A. and Ramos-Galarza C. , Covid-19, consumer behavior, technology, and society: A literature review and bibliometric analysis, Technological Forecasting and Social Change 173 (2021), 121179. URL: https://www.sciencedirect.com/science/article/pii/S0040162521006120, doi: https://doi.org/10.1016/j.techfore.2021.121179

[19] 

Dubey A.D. , Public sentiment analysis of covid-19 vaccination drive in india, Available at: SSRN 3772401. (2021).

[20] 

Ermatita E. , Abdiansah A. , Rini D.P. and Febry F. , Sentiment analysis of covid-19 using multimodal fusion neural networks, TEM Journal 11: ((2022) ), 1316–1321.

[21] 

Ghasiya P. and Okamura K. , Investigating covid-19 news across four nations: A topic modeling and sentiment analysis approach, IEEE Access 9: ((2021) ), 36645–36656.

[22] 

Ghosh S. , Shankar S. , Chatterjee K. , Chatterjee K. , Yadav A.K. , Pandya K. , Suryam V. , Agrawal S. , Ray S. , Phutane V. , et al., Covishield (azd1222) vaccine effectiveness among healthcare and frontline workers of indian armed forces: Interim results of vin-win cohort study, Medical Journal Armed Forces India 77: ((2021) ), S264–S270.

[23] 

Gulati K. , Kumar S.S. , Boddu R.S.K. , Sarvakar K. , Sharma D.K. and Nomani M. , Comparative analysis of Machine learning-based classification models using sentiment classification of tweets related to covid-19 pandemic, Materials Today: Proceedings 51: ((2022) ), 38–41.

[24] 

Gupta P. , Kumar S. , Suman R.R. and Kumar V. , Sentiment analysis of lockdown in india during covid-19: A case study on twitter, IEEE Transactions on Computational Social Systems 8: ((2020) ), 992–1002.

[25] 

Gupta V. , Jain N. , Katariya P. , Kumar A. , Mohan S. , Ahmadian A. and Ferrara M. , An emotion care model using multimodal textual analysis on covid-19, Chaos, Solitons & Fractals 144: ((2021) ), 110708.

[26] 

Hall K. , Chang V. and Jayne C. , A review on natural language processing models for covid-19 research, Healthcare Analytics (2022), 100078.

[27] 

Imran A.S. , Daudpota S.M. , Kastrati Z. and Batra R. , Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on covid-19 related tweets, IEEE Access 8: ((2020) ), 181074–181090.

[28] 

India W. , Who india weekly covid-19 situational report, edition 116 published july 22, (2022). URL: https://cdn.who.int/media/docs/default-source/wrindia/situation-report/india-situation-report-116.pdf?sfvrsn=1a5f2a59_2

[29] 

Jelodar H. , Wang Y. , Orji R. and Huang S. , Deep sentiment classification and topic discovery on novel coronavirus or covid-19 online discussions: Nlp using lstm recurrent neural network approach, IEEE Journal of Biomedical and Health Informatics 24: ((2020) ), 2733–2742.

[30] 

Kaur H. , Ahsaan S.U. , Alankar B. and Chang V. , A proposed sentiment analysis deep learning algorithm for analyzing covid-19 tweets, Information Systems Frontiers (2021), 1–13.

[31] 

Khan M. and Malviya A. , Big data approach for sentiment analysis of twitter data using hadoop framework and deep learning, in: 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), IEEE, (2020), pp. 1–5.

[32] 

Kůrková V. , Manolopoulos Y. , Hammer B. , Iliadis L. and Maglogiannis I. , Artificial Neural Networks and Machine learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4–7, 2018, Proceedings, Part III. volume 11141: . Springer, ((2018) ).

[33] 

Kumar A. , Singh J.P. , Rana N.P. and Dwivedi Y.K. , Multichannel convolutional neural network for the identification of eyewitness tweets of disaster, Information Systems Frontiers (2022), 1–16.

[34] 

Kumar V. , Spatiotemporal sentiment variation analysis of geotagged covid-19 tweets from india using a hybrid deep learning model, Scientific Reports 12: ((2022) ), 1–14.

[35] 

Kumari A. , Ranjan P. , Chopra S. , Kaur D. , Kaur T. , KalanidhiA. K.B. , Goel T. , Singh A. , Baitha U. , Prakash B. and Vikram N.K. , What indians think of the covid-19 vaccine: A qualitative study comprising focus group discussions and thematic analysis, Diabetes & Metabolic Syndrome: Clinical Research & Reviews 15: ((2021) ), 679–682. URL: https://www.sciencedirect.com/science/article/pii/S1871402121000953, doi: https://doi.org/10.1016/j.dsx.2021.03.021

[36] 

Lamsal R. , Design and analysis of a large-scale covid-19 tweets dataset, Applied Intelligence 51: ((2021) ), 2790–2804.

[37] 

Li J. and Hovy E. , Reflections on sentiment/opinion analysis, in: A practical guide to sentiment analysis, Springer, (2017), pp. 41–59.

[38] 

Lindley D.V. , Fiducial distributions and bayes’ theorem, Journal of the Royal Statistical Society, Series B (Methodological) (1958), 102–107.

[39] 

Majumder S. , Aich A. and Das S. , Sentiment analysis of people during lockdown period of covid-19 using svm and logistic regression analysis. Available at: SSRN 3801039. (2021).

[40] 

McKeown K. , Agarwal A. and Biadsy F. , Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams, (2009).

[41] 

Mendon S. , Dutta P. , Behl A. and Lessmann S. , A hybrid approach of Machine learning and lexicons to sentiment analysis: enhanced insights from twitter data of natural disasters, Information Systems Frontiers 23: ((2021) ), 1145–1168.

[42] 

Mengistie T.T. and Kumar D. , Deep learning based sentiment analysis on covid-19 public reviews, in: 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, (2021), pp. 444–449.

[43] 

Mir A.A. and Sevukan R. , Sentiment analysis of indian tweets about covid-19 vaccines, Journal of Information Science 01655515221118049, (2022).

[44] 

Mishra S. , Verma A. , Meena K. and Kaushal R. , Public reactions towards covid-19 vaccination through twitter before and after second wave in india, Social Network Analysis and Mining 12: ((2022) ), 1–16.

[45] 

Misra P. and Gupta J. , Impact of covid 19 on indian migrant workers: decoding twitter data by text mining, The Indian Journal of Labour Economics 64: ((2021) ), 731–747.

[46] 

Patel S. , A comprehensive analysis of convolutional neural network models, International Journal of Advanced Science and Technology 29: ((2020) ), 771–777.

[47] 

Patil T.R. , Msss performance analysis of naive bayes and j48 classification algorithm for data classification, Intl Journal of Computer Science and Applications 6: ((2013) ).

[48] 

Prabhu A.N. , Kamath G.B. , Pai D.V. , et al., Keeping the country positive during the covid 19 pandemic: Evidence from india, Asian Journal of Psychiatry 51: ((2020) ), 102118.

[49] 

Praveen S. , Ittamalla R. and Deepak G. , Analyzing the attitude of indian citizens towards covid-19 vaccine–a text analytics study, Diabetes & Metabolic Syndrome: Clinical Research & Reviews 15: ((2021) ), 595–599.

[50] 

Preda G. , covid-19-tweets data-set, (2020). URL: https://github.com/gabrielpreda/covid-19-tweets

[51] 

Qiu Q. , Xie Z. , Wu L. and Tao L. , Dictionary-based automated information extraction from geological documents using a deep learning algorithm, Earth and Space Science 7: ((2020) ), e2019EA000993.

[52] 

Roy K. , Kar S. and Das R.N. , Chapter 6 – selected statistical methods in qsar, in: K. Roy, S. Kar, R.N. Das (Eds.), Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment, Academic Press, Boston, (2015), pp. 191–229. URL: https://www.sciencedirect.com/science/article/pii/B9780128015056000065, doi: https://doi.org/10.1016/B978-0-12-801505-6.00006-5

[53] 

Rustam F. , Khalid M. , Aslam W. , Rupapara V. , Mehmood A. and Choi G.S. , A performance comparison of supervised Machine learning models for covid-19 tweets sentiment analysis, Plos One 16: ((2021) ), e0245909.

[54] 

Saad E. , Din S. , Jamil R. , Rustam F. , Mehmood A. , Ashraf I. and Choi G.S. , Determining the efficiency of drugs under special conditions from users’ reviews on healthcare web forums, IEEE Access 9: ((2021) ), 85721–85737.

[55] 

Sallam R.M. , Hussein M. and Mousa H.M. , Improving collaborative filtering using lexicon-based sentiment analysis, International Journal of Electrical and Computer Engineering 12: ((2022) ), 1744.

[56] 

Samuel J. , Ali G.M.N. , Rahman M.M. , Esawi E. and Samuel Y. , Covid-19 public sentiment insights and Machine learning for tweets classification, Information 11: ((2020) ), 314.

[57] 

Singh C. , Imam T. , Wibowo S. and Grandhi S. , A deep learning approach for sentiment analysis of covid-19 reviews, Applied Sciences 12: ((2022) a), 3709.

[58] 

Singh M. , Dhillon H.K. , Ichhpujani P. , Iyengar S. and Kaur R. , Twitter sentiment analysis for covid-19 associated mucormycosis, Indian Journal of Ophthalmology 70: ((2022) b), 1773–1779.

[59] 

Singh M. , Jakhar A.K. and Pandey S. , Sentiment analysis on the impact of coronavirus in social life using the bert model, Social Network Analysis and Mining 11: ((2021) ), 1–11.

[60] 

Sunitha D. , Patra R.K. , Babu N. , Suresh A. and Gupta S.C. , Twitter sentiment analysis using ensemble based deep learning model towards covid-19 in india and european countries, Pattern Recognition Letters 158: ((2022) ), 164–170.

[61] 

Sv P. , Tandon J. , Hinduja H. , et al., Indian citizen’s perspective about side effects of covid-19 vaccine–a Machine learning study, Diabetes & Metabolic Syndrome: Clinical Research & Reviews 15: ((2021) ), 102172.

[62] 

Venigalla A.S.M. , Chimalakonda S. and Vagavolu D. , Mood of india during covid-19-an interactive web portal based on emotion analysis of twitter data, in: Conference companion publication of the 2020 on computer supported cooperative work and social computing, pp. 65–68. (2020).

[63] 

Wang M. , Xie J. , Tan Z. , Su J. , Xiong D. and Li L. , Towards linear time neural machine translation with capsule networks, arXiv preprint arXiv:1811.00287, (2018).

[64] 

Wang S. , Shao C. , Zhang J. , Zheng Y. and Meng M. , Traffic flow prediction using bi-directional gated recurrent unit method, Urban Informatics 1: ((2022) ), 16.

[65] 

Wang T. , Lu K. , Chow K.P. and Zhu Q. , Covid-19 sensing: negative sentiment analysis on social media in china via bert model, IEEE Access 8: ((2020) ), 138162–138169.

[66] 

WHO, Covid-19 weekly epidemiological update, edition 104 published 10 august, (2022). URL: https://help.sentiment140.com/home

[67] 

Xie R. , Chu S.K.W. , Chiu D.K.W. and Wang Y. , Exploring public response to covid-19 on weibo with lda topic modeling and sentiment analysis, Data and Information Management 5: ((2021) ), 86–99.

[68] 

You G. , Gan S. , Guo H. and Dagestani A.A. , Public opinion spread and guidance strategy under covid-19: A sis model analysis, Axioms 11: ((2022) ), 296.

[69] 

Zhu X. , Zhang M. , Hong Y. and He R. , Natural Language Processing and Chinese Computing: 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14–18, 2020, Proceedings, Part I. volume 12430: . Springer Nature, ((2020) ).