Graph-based joint pandemic concern and relation extraction on Twitter

https://doi.org/10.1016/j.eswa.2022.116538

Highlights

  • Inherent features captured by the concern graph improve model performance.

  • A novel deep learning model for sequential and regional dependency feature learning.

  • A novel end-to-end model for concerns and relations extraction.

  • Generic and effective concerns extraction evaluated by using real-world data.

Abstract

Public concern detection provides potential guidance to the authorities for crisis management before or during a pandemic outbreak. Detecting people’s concerns and attention from online social media platforms has been widely acknowledged as an effective approach to relieve public panic and prevent a social crisis. However, detecting concerns in time from massive volumes of social media information is a major challenge, especially when sufficient manually labelled data is unavailable during public health emergencies, e.g., COVID-19. In this paper, we propose a novel end-to-end deep learning model to identify people’s concerns and the corresponding relations based on Graph Convolutional Networks and Bi-directional Long Short-Term Memory integrated with Concern Graphs. In addition to the sequential features from BERT embeddings, the regional features of tweets are extracted by the Concern Graph module, which not only benefits concern detection but also makes our model highly noise-tolerant. Thus, our model can address the issue of insufficient manually labelled data. We conduct extensive experiments to evaluate the proposed model using both manually labelled and automatically labelled tweets. The experimental results show that our model outperforms the state-of-the-art models on real-world datasets.

Introduction

The outbreak of coronavirus (COVID-19) in 2019 has caused a rapid increase in both infection and death rates around the world. Especially when the pandemic moved into the second, third, or even fourth wave, it caused devastating loss of human life, impacted the global economy, transformed our daily lives, and posed a threat to our society (Killgore, Cloonen, Taylor, & Dailey, 2020). According to studies on past pandemic outbreaks, e.g., Zika (Fu et al., 2016; Glowacki et al., 2016), Ebola (Lazard et al., 2015; Van Lent et al., 2017), and H1N1 (Chew & Eysenbach, 2010; Szomszor et al., 2011), social media platforms, e.g., Twitter, have proven to be a popular channel for spreading information, especially information related to public opinions and concerns (Damiano & Catellier JR, 2020). This is because people tend to learn more details about the pandemic by reading newsfeeds and interpreting the comments of others through social networks (Hu et al., 2019; Li et al., 2018). Twitter, a popular and informative social network platform, allows people to post and interact with messages known as “tweets”. They can also communicate and express opinions about the latest events (Killgore et al., 2020). User-generated tweets turn out to be predictive, i.e., valuable indicators of issues that are likely to arise during the pandemic. Therefore, it is important to make use of tweets and investigate what people are discussing during the pandemic. The attitudes and behaviours of our society are directly affected by public concerns. Thus, effectively extracting public concerns and analysing the corresponding relationships will help people understand the anxiety and fears of society in this pandemic situation. Furthermore, analysing public concerns can also reveal a potential social crisis, which contributes significantly to social management and control.

Motivated by this background, great effort has been dedicated to mining social media data and exploring opinions towards pandemic outbreaks (da Silva, Tsigaris, & Erfanmanesh, 2021). Most existing research works can be categorised into traditional survey methods, e.g., surveys and questionnaires (Nelson, Pettitt, Flannery, & Allen, 2020), and machine learning model-based methods, e.g., topic modelling (Kassab et al., 2020; Van Der Vegt & Kleinberg, 2020). Existing studies can extract fundamental public concerns, e.g., “social distancing”, “hand sanitiser” and “face masks”, but they require intensive human effort to label large datasets, which is inefficient. Moreover, in an epidemic emergency, e.g., COVID-19, traditional approaches such as questionnaires and clinical tests can neither collect enough data for deep learning model training nor rapidly produce a model for concern detection. Therefore, it is vital to design an end-to-end model that is capable of automatically analysing social media data and detecting public concerns without requiring large-scale data to be labelled manually.

Deep learning methods are increasingly applied to valuable information extraction. However, most methods rely heavily on data labelled by annotators, which requires considerable time and financial resources (Kipf & Welling, 2017). Moreover, noisy and imbalanced social media data prevent deep learning-based methods from generalising (Rathan, Hulipalled, Venugopal, & Patnaik, 2018). In many existing studies, the proposed models cannot track real-time statistics of public concerns related to pandemics because they require a labelled dataset (Hou et al., 2020; Jahanbin et al., 2020; Lazard et al., 2015; Li et al., 2020). To mitigate this issue, preliminary research was conducted to mine public concerns by proposing an Automated Concern Exploration (ACE) framework (Shi et al., 2021). The proposed framework can detect concerns from tweets automatically and construct a concern knowledge graph to present the interconnections of the extracted concern entity set. However, several limitations remain to be addressed: (1) only the BERT embedding of tweets is used, which cannot capture the regional dependency features of words that would improve concern extraction; (2) the relation between concerns in a single tweet posted by a user is not detected, although it is critical for revealing meaningful information about public concerns; (3) the framework employs a rule-based method, which generalises poorly and is difficult to transfer to future pandemics.

In this paper, we propose and develop an end-to-end model with a Concern Graph (CG) and concern states to simultaneously identify public concerns and the corresponding relations. “Public concern” is formally defined with consideration of its type and degree, and a concern graph is constructed to represent the regional features, improving the effectiveness of concern identification. Furthermore, the proposed method can extract concern relations by integrating concern states with a Graph Convolutional Network (GCN) (Kipf & Welling, 2017). Extensive experiments are conducted to evaluate the proposed method using both manually labelled and automatically labelled datasets. The experimental results demonstrate that our method outperforms state-of-the-art models.

The novelties of our research work are as follows. To the best of our knowledge, the proposed method is the first to apply a deep learning-based method to detect public concerns, which helps the authorities rapidly understand people’s anxiety and fears about COVID-19. Furthermore, concern relations are extracted along with concerns, helping to identify any potential social crisis. We are also the first to define a concern graph, which contributes to the detection of concerns and the corresponding relations and improves the performance of the proposed method. Our contributions in this research work are summarised below:

  • A concern graph data structure is defined to capture the inherent structural information of concerns more efficiently.

  • A novel end-to-end model, consisting of a Concern Graph (CG) and shared concern states, is presented to jointly extract concerns and relations.

  • The proposed model is evaluated on manually labelled and automatically labelled data, and the results indicate that the proposed method is effective for automatically labelled data.

Section snippets

Related work

In this section, existing studies related to public concern mining and detection are reviewed first. Then, modern Named Entity Recognition (NER) and Relation Extraction (RE) approaches are inspected and compared, since concern detection, as defined in this paper, explores concern entities and the corresponding relations. Finally, the GCN and its variants are reviewed, since GCN has been widely adopted for NER and RE in recent studies.

Preliminaries

In this section, the relevant definitions are presented, including public concerns, concern relations, and graphs. In addition, the concern detection problem is formally formulated.
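The formal definitions are not included in this preview. As a rough illustration only, the three notions named above could be represented as follows. The field names `kind`, `degree`, and `label`, the example values, and the undirected adjacency map are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concern:
    text: str      # surface form in the tweet, e.g. "face masks"
    kind: str      # hypothetical concern type label
    degree: float  # hypothetical strength score in [0, 1]


@dataclass(frozen=True)
class ConcernRelation:
    head: Concern
    tail: Concern
    label: str     # hypothetical relation label


def to_adjacency(relations):
    """Build an undirected adjacency map over concern texts."""
    adjacency = {}
    for rel in relations:
        adjacency.setdefault(rel.head.text, set()).add(rel.tail.text)
        adjacency.setdefault(rel.tail.text, set()).add(rel.head.text)
    return adjacency


# Toy values for illustration; they are not drawn from the paper's data.
masks = Concern("face masks", "prevention", 0.8)
shortage = Concern("shortage", "supply", 0.6)
adjacency = to_adjacency([ConcernRelation(masks, shortage, "affected_by")])
```

Keeping concerns and relations as immutable records makes them usable as set members and dictionary keys, which is convenient when comparing extracted items against labelled ones.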

Graph-based concern and relation extraction

For a set of tweets $T$, the goal of our method is to identify public concerns $C=\{c_1,\dots,c_n\}$ and concern relations $R=\{r_1,\dots,r_n\}$. In this section, the joint extraction of concerns and relations model with concern graph is illustrated in Fig. 3. The proposed method consists of four main components, i.e., embedding layer, encoding layer, concern decoding layer, and concern relation extraction layer. Each component is described in detail below. The embedding layer is introduced in Section 4.1, followed
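To make the graph side of the encoding step concrete, here is a minimal pure-Python sketch of one graph convolution using the widely used propagation rule of Kipf & Welling (2017), ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by per-token concatenation of sequential and regional features as mentioned in the conclusion. It is a sketch under assumptions: the token chain graph, the stand-in Bi-LSTM outputs, and the identity weight matrix are invented for illustration, and the authors' actual layer may differ.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, value) for value in row] for row in propagated]


# Assumed toy input: three tokens linked in a chain 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
sequential = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in for Bi-LSTM outputs
weights = [[1.0, 0.0], [0.0, 1.0]]                 # identity, for readability

regional = gcn_layer(adj, sequential, weights)
# Concatenate sequential and regional features token by token.
combined = [seq + reg for seq, reg in zip(sequential, regional)]
```

With identity weights the layer only mixes each token's features with those of its neighbours, which makes the effect of the normalised adjacency easy to inspect before a learned weight matrix is introduced.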

Experiments

In this section, extensive experiments are conducted to evaluate the proposed approach by using COVID-19 Twitter datasets. First, COVID-19 dataset collection and pre-processing are described. Second, the proposed approach is compared against six state-of-the-art baselines in terms of precision, recall, and F1 score. Third, quantitative analytical results and ablation studies are presented following the experimental results. Finally, a case study is given to illustrate the effectiveness
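For reference, precision, recall, and F1 over extracted concerns can be computed as in the sketch below. It assumes exact-match scoring on (tweet id, concern text) pairs, which is an assumption of this example; the matching criterion used in the paper is not stated in this preview, and the toy sets are invented.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted items against gold items using exact set matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy (tweet_id, concern) pairs for illustration only.
gold = {(0, "face masks"), (0, "social distancing"), (1, "hand sanitiser")}
predicted = {(0, "face masks"), (1, "hand sanitiser"), (1, "lockdown")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

The same function applies to relations if each item is a (tweet id, head, relation, tail) tuple instead of a concern pair.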

Conclusion and future work

In this paper, an end-to-end model is presented to simultaneously extract concern and concern relations from the social media dataset of COVID-19. GCN and Bi-LSTM are jointly combined to learn sequential and regional dependency features from tweets. In order to capture more features of model input, the influence of graph structure for concern and relation extraction is explored. The sequential and regional features from the dataset are concatenated, enabling the embedding vectors to represent

CRediT authorship contribution statement

Jingli Shi: Writing – original draft, Conceptualization, Data curation, Methodology, Software. Weihua Li: Conceptualization, Project administration, Writing – review & editing, Supervision. Sira Yongchareon: Writing – review, Formal analysis, Resources, Validation. Yi Yang: Writing – review, Project administration, Resources. Quan Bai: Formal analysis, Project administration, Writing – review, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by Callaghan Innovation [CSITR1902, 2020], New Zealand’s Innovation Agency.

References (60)

  • Chandrasekaran, R. et al. (2020). Topics, trends, and sentiments of tweets about the COVID-19 pandemic: Temporal infoveillance study. Journal of Medical Internet Research.
  • Chen, L. et al. (2020). In the eyes of the beholder: Sentiment and topic analyses on social media use of neutral and controversial terms for COVID-19.
  • Chew, C. et al. (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLoS One.
  • Culotta, A., & Sorensen, J. (2004). Dependency tree kernels for relation extraction. In Proceedings of the 42nd annual...
  • Damiano, A. et al. (2020). A content analysis of coronavirus tweets in the United States just prior to the pandemic declaration. Cyberpsychology, Behavior and Social Networking.
  • Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized...
  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for...
  • Eberts, M., & Ulges, A. (2020). Span-Based Joint Entity and Relation Extraction with Transformer Pre-Training. In...
  • Fu, T.-J., Li, P.-H., & Ma, W.-Y. (2019). GraphRel: Modeling text as relational graphs for joint entity and relation...
  • Hamilton, W., Ying, Z., & Leskovec, J. (2017). Inductive representation learning on large graphs. In Proceedings of the...
  • Hou, Z. et al. (2020). Assessment of public attention, risk perception, emotional and behavioural responses to the COVID-19 outbreak: Social media surveillance in China.
  • Hu, Y. et al. Context-aware influence diffusion in online social networks.
  • Huang, Z. et al. (2015). Bidirectional LSTM-CRF models for sequence tagging.
  • Jahanbin, K. et al. (2020). Using Twitter and web news mining to predict COVID-19 outbreak. Asian Pacific Journal of Tropical Medicine.
  • Kassab, L. et al. (2020). On nonnegative matrix and tensor decompositions for COVID-19 Twitter dynamics.
  • Katiyar, A., & Cardie, C. (2017). Going out on a limb: Joint extraction of entity mentions and relations without...
  • Kaveh-Yazdy, F. et al. (2020). Track Iran’s national COVID-19 response committee’s major concerns using two-stage unsupervised topic modeling. International Journal of Medical Informatics.
  • Killeen, B.D. et al. (2020). A county-level dataset for informing the United States’ response to COVID-19. Hospital.
  • Kim, E.H.-J. et al. (2016). Topic-based content and sentiment analysis of Ebola virus on Twitter and in the news. Journal of Information Science.
  • Kipf, T. N., & Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of...