Graph-based joint pandemic concern and relation extraction on Twitter
Introduction
The outbreak of coronavirus (COVID-19) in 2019 has caused a rapid increase in both infection and death rates around the world. As the pandemic moved into its second, third, and even fourth waves, it caused devastating loss of human life, impacted the global economy, transformed our daily lives, and posed a threat to our society (Killgore, Cloonen, Taylor, & Dailey, 2020). According to studies of past pandemic outbreaks, e.g., Zika (Fu et al., 2016, Glowacki et al., 2016), Ebola (Lazard et al., 2015, Van Lent et al., 2017), and H1N1 (Chew and Eysenbach, 2010, Szomszor et al., 2011), social media platforms such as Twitter have proven to be a popular channel for spreading information, especially information related to public opinions and concerns (Damiano & Catellier JR, 2020). This is because people tend to learn more about a pandemic by reading newsfeeds and interpreting the comments of others through social networks (Hu et al., 2019, Li et al., 2018). Twitter, a popular and informative social network platform, allows people to post and interact with messages known as “tweets”, and to communicate and express opinions about the latest events (Killgore et al., 2020). User-generated tweets can therefore serve as valuable indicators of the issues likely to arise during a pandemic, so it is important to make use of tweets and investigate what people are discussing. Public concerns directly affect the attitudes and behaviours of society. Thus, effectively extracting public concerns and analysing the relationships among them helps people understand the anxiety and fears of society during the pandemic. Furthermore, analysing public concerns can reveal potential social crises, which contributes significantly to social management and control.
Motivated by this background, great effort has been dedicated to mining social media data and exploring opinions towards pandemic outbreaks (da Silva, Tsigaris, & Erfanmanesh, 2021). Most existing research can be categorised into traditional survey methods, e.g., surveys and questionnaires (Nelson, Pettitt, Flannery, & Allen, 2020), and machine learning model-based methods, e.g., topic modelling (Kassab et al., 2020, Van Der Vegt and Kleinberg, 2020). Existing studies can extract fundamental public concerns, e.g., “social distancing”, “hand sanitiser” and “face masks”, but they require intensive human effort to label large datasets, which makes them inefficient. Moreover, in an emerging epidemic such as COVID-19, traditional approaches such as questionnaires and clinical tests can neither collect enough data for deep learning model training nor rapidly produce a model for concern detection. Therefore, it is vital to design an end-to-end model that can automatically analyse social media data and detect public concerns without requiring large-scale data to be labelled manually.
Deep learning methods are increasingly applied to extracting valuable information. However, most methods rely heavily on data labelled by annotators, which requires considerable time and financial resources (Kipf & Welling, 2017). Moreover, noisy and imbalanced social media data prevent deep learning-based methods from generalising (Rathan, Hulipalled, Venugopal, & Patnaik, 2018). In many existing studies, the proposed models cannot track real-time statistics of public concerns related to pandemics because they require a labelled dataset (Hou et al., 2020, Jahanbin et al., 2020, Lazard et al., 2015, Li et al., 2020). To mitigate this issue, preliminary research was conducted to mine public concerns by proposing an Automated Concern Exploration (ACE) framework (Shi et al., 2021). That framework can detect concerns from tweets automatically and construct a concern knowledge graph to present the interconnections of the extracted concern entity set. However, several limitations remain to be addressed: (1) only the BERT embedding of tweets is used, which cannot capture the regional dependency features of words in tweets that would improve concern extraction; (2) the relation between concerns in a single tweet posted by a user is not detected, although it is critical for revealing meaningful information about public concerns; and (3) the framework employs a rule-based method, which generalises poorly and is difficult to transfer to future pandemics.
In this paper, we propose and develop an end-to-end model with a Concern Graph (CG) and concern states to simultaneously identify public concerns and the corresponding relations. “Public concern” is formally defined with consideration of its type and degree, and a concern graph is constructed to represent the regional features, improving the effectiveness of concern identification. Furthermore, the proposed method can extract concern relations by integrating concern states with a Graph Convolutional Network (GCN) (Kipf & Welling, 2017). Extensive experiments are conducted to evaluate the proposed method using both manual-labelled and auto-labelled datasets. The experimental results demonstrate that our method outperforms state-of-the-art models.
The novelties of our research work are as follows. To the best of our knowledge, the proposed method is the first to apply a deep learning-based method to detect public concerns, helping authorities rapidly understand people’s anxiety and fears about COVID-19. Furthermore, concern relations are extracted along with concerns, helping to identify potential social crises. We are also the first to define a concern graph, which contributes to the detection of concerns and the corresponding relationships and improves the performance of the proposed method. Our contributions in this research work are summarised below:
- A concern graph data structure is defined to capture the inherent structural information of concerns more efficiently.
- A novel end-to-end model, consisting of a Concern Graph (CG) and shared concern states, is presented to jointly extract concerns and relations.
- The proposed model is evaluated on manual-labelled and auto-labelled data, and the results indicate that the proposed method is effective for auto-labelled data.
Related work
In this section, existing studies related to public concern mining and detection are first reviewed. Then, modern Named Entity Recognition (NER) and Relation Extraction (RE) approaches are inspected and compared, since concern detection, as defined in this paper, aims to identify concern entities and the corresponding relations. Finally, the GCN and its variants are reviewed, since recent studies have widely adopted GCN in NER and RE.
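For readers unfamiliar with GCN, the following is a minimal sketch of a single propagation step in the form given by Kipf & Welling (2017), ReLU(D^-1/2 (A + I) D^-1/2 H W). It is written in plain Python for illustration and is not the authors' implementation; the names `matmul` and `gcn_layer`, the three-node toy graph, and the feature and weight values are all assumptions made for this example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation of the adjacency matrix.
    norm = [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph: a path 0 - 1 - 2 with 2-dimensional node features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic easy to follow

for row in gcn_layer(adj, feats, weight):
    print(row)
```

Each output row mixes a node's own features with those of its neighbours, which is the sense in which a graph layer captures local (regional) structure that a purely sequential encoder does not.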
Preliminaries
In this section, the relevant definitions are presented, including public concerns, concern relations, and graphs. In addition, the concern detection problem is formally formulated.
Graph-based concern and relation extraction
For a set of tweets, the goal of our method is to identify public concerns and concern relations. The model for jointly extracting concerns and relations with the concern graph is illustrated in Fig. 3. The proposed method consists of four main components, i.e., an embedding layer, an encoding layer, a concern decoding layer, and a concern relation extraction layer. Each component is described in detail below. The embedding layer is introduced in Section 4.1, followed
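As an illustration of what a concern decoding step might output, the sketch below turns per-token tags into concern spans. The snippet above does not state the tagging scheme, so the BIO scheme used here (common in NER) is an assumption, as are the function name `decode_bio` and the example tweet.

```python
def decode_bio(tokens, tags):
    """Collect token spans labelled B (begin) / I (inside); O marks non-concern tokens."""
    spans = []
    current = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new span starts; close any span still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O", or a stray "I" with no open span: close and reset.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hypothetical tweet and tags, for illustration only.
tokens = ["Shops", "ran", "out", "of", "hand", "sanitiser", "and", "face", "masks"]
tags = ["O", "O", "O", "O", "B", "I", "O", "B", "I"]
print(decode_bio(tokens, tags))  # ['hand sanitiser', 'face masks']
```

A relation extraction step would then operate on pairs of such spans found in the same tweet.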
Experiments
In this section, extensive experiments are conducted to evaluate the proposed approach using COVID-19 Twitter datasets. First, COVID-19 dataset collection and pre-processing are described. Second, the proposed approach is compared against six state-of-the-art baselines in terms of precision, recall, and F1 score. Third, quantitative analytical results and ablation studies are presented following the experimental results. Finally, a case study is given to illustrate the effectiveness
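For reference, precision, recall, and F1 for an extraction task can be computed as in the sketch below. It assumes exact-match scoring over sets of extracted items, which the snippet above does not confirm; the function name `precision_recall_f1` and the example sets are illustrative only.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted items against gold items using exact set matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted concerns for a handful of tweets.
gold = {"social distancing", "hand sanitiser", "face masks"}
predicted = {"social distancing", "face masks", "lockdown"}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The same function applies to relations if each relation is represented as a hashable tuple, e.g. (concern, relation type, concern).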
Conclusion and future work
In this paper, an end-to-end model is presented to simultaneously extract concerns and concern relations from a social media dataset on COVID-19. GCN and Bi-LSTM are combined to learn sequential and regional dependency features from tweets. In order to capture more features of the model input, the influence of graph structure on concern and relation extraction is explored. The sequential and regional features from the dataset are concatenated, enabling the embedding vectors to represent
CRediT authorship contribution statement
Jingli Shi: Writing – original draft, Conceptualization, Data curation, Methodology, Software. Weihua Li: Conceptualization, Project administration, Writing – review & editing, Supervision. Sira Yongchareon: Writing – review, Formal analysis, Resources, Validation. Yi Yang: Writing – review, Project administration, Resources. Quan Bai: Formal analysis, Project administration, Writing – review, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work was supported by the Callaghan Innovation [CSITR1902, 2020], New Zealand’s Innovation Agency.
References (60)
- et al. (2018). Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications.
- et al. (2016). How people react to Zika virus outbreaks on Twitter? A computational content analysis. American Journal of Infection Control.
- et al. (2016). Identifying the public’s concerns and the Centers for Disease Control and Prevention’s reactions during a health crisis: An analysis of a Zika live Twitter chat. American Journal of Infection Control.
- et al. (2021). Joint extraction of entities and overlapping relations using source-target entity labeling. Expert Systems with Applications.
- et al. (2020). Joint extraction of entities and relations using graph convolution over pruned dependency trees. Neurocomputing.
- et al. (2020). Loneliness: A signature mental health concern in the era of COVID-19. Psychiatry Research.
- et al. (2015). Detecting themes of public concern: a text mining analysis of the Centers for Disease Control and Prevention’s Ebola live Twitter chat. American Journal of Infection Control.
- et al. (2021). Named entity recognition for extracting concept in ontology building on Indonesian language using end-to-end bidirectional long short term memory. Expert Systems with Applications.
- Akbik, A., Blythe, D., & Vollgraf, R. (2018). Contextual string embeddings for sequence labeling. In Proceedings of the...
- Battaglia, P., Pascanu, R., Lai, M., Rezende, D. J., & Kavukcuoglu, K. (2016). Interaction networks for learning about...
- Topics, trends, and sentiments of tweets about the COVID-19 pandemic: Temporal infoveillance study. Journal of Medical Internet Research.
- In the eyes of the beholder: Sentiment and topic analyses on social media use of neutral and controversial terms for COVID-19.
- Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLoS One.
- A content analysis of coronavirus tweets in the United States just prior to the pandemic declaration. Cyberpsychology, Behavior and Social Networking.
- Assessment of public attention, risk perception, emotional and behavioural responses to the COVID-19 outbreak: Social media surveillance in China.
- Context-aware influence diffusion in online social networks.
- Bidirectional LSTM-CRF models for sequence tagging.
- Using Twitter and web news mining to predict COVID-19 outbreak. Asian Pacific Journal of Tropical Medicine.
- On nonnegative matrix and tensor decompositions for COVID-19 Twitter dynamics.
- Track Iran’s national COVID-19 response committee’s major concerns using two-stage unsupervised topic modeling. International Journal of Medical Informatics.
- A county-level dataset for informing the United States’ response to COVID-19. Hospital.
- Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news. Journal of Information Science.