Fighting The Infodemic: Advanced Techniques For Misinformation Management

Multi-lingual COVID-19 Misinformation Dataset

  • Dataset Characteristics: Multi-lingual, COVID-19, News, Fact-Checking
  • Attribute Characteristics: Categorical, Integer, Real
  • Number of Instances: 45,904
  • Date Range: 2019-12-12 to 2022-10-10
  • Languages Included: 123
  • News Publishers Included: 56
  • Fact-Checking Sites Included: 235
  • Tweets Included: 90,000
  • News Category: Yes
  • News Link: Yes
  • Missing Values: Yes
  • Progress: Weekly Update

As part of our project, we have created what we believe is the largest multilingual COVID-19 misinformation dataset. It combines news metadata with the surrounding social context: tweets, likes, retweets, and related engagement data.
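To make the description concrete, a single record could be modeled roughly as below. This is a hypothetical schema, not the dataset's published format: the class and field names (`NewsRecord`, `Tweet`, `news_link`, `likes`, `retweets`) are illustrative assumptions loosely based on the listed attributes (news category, news link, tweets), and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tweet:
    # Hypothetical per-tweet fields for the social context of a news item.
    tweet_id: str
    text: str
    likes: int = 0
    retweets: int = 0


@dataclass
class NewsRecord:
    # Hypothetical field names; not the dataset's actual column names.
    title: str
    news_link: str
    category: str
    language: str
    label: str  # "fake" or "real"
    tweets: List[Tweet] = field(default_factory=list)

    def engagement(self):
        """Count attached tweets and sum their likes and retweets."""
        return {
            "tweets": len(self.tweets),
            "likes": sum(t.likes for t in self.tweets),
            "retweets": sum(t.retweets for t in self.tweets),
        }


# Placeholder record with two attached tweets.
record = NewsRecord(
    title="Example claim",
    news_link="https://example.org/check",
    category="Health",
    language="en",
    label="fake",
    tweets=[Tweet("1", "...", likes=10, retweets=3), Tweet("2", "...", likes=5)],
)
print(record.engagement())  # {'tweets': 2, 'likes': 15, 'retweets': 3}
```

Keeping tweets nested under each news record makes per-article engagement a simple aggregation; a flat table keyed by a news ID would work equally well.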

Data Collection Process:

  1. We surveyed and cross-checked a range of fact-checking resources to collect fake news claims, including the Google Fact Check API, the Poynter Institute, and the IFCN. Each week, we collected news claims and the corresponding fact-check articles from these sources.
  2. To balance the proportion of real and fake news, we added articles from trustworthy news outlets. We identified reliable news sites through Media Bias/Fact Check and manually cross-checked the articles drawn from them.
  3. We enriched the dataset with the social context of each fact-checked news article, tracking social engagement and user reactions on platforms such as Twitter. Relevant posts were located using keywords and data from the Google Fact Check API and Poynter.
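Steps 1 and 2 can be sketched offline as follows, assuming the public `claims:search` endpoint of the Google Fact Check Tools API and its JSON response shape (`claims[].text`, `claimReview[].textualRating`, `publisher.name`, `url`). The rating set `FALSE_RATINGS`, the placeholder allowlist `TRUSTED_DOMAINS`, the helper names, and the 1:1 balancing rule are illustrative assumptions, not the project's actual pipeline. The request URL is built but never sent, so no API key or network access is needed.

```python
import random
from urllib.parse import urlencode, urlparse

# Claim-search endpoint of the Google Fact Check Tools API.
CLAIM_SEARCH_URL = "https://factchecktools.googleapis.com/v1alpha1/claims:search"

# Placeholder values (assumptions): real textual ratings vary by fact-checker
# and language, and the project's allowlist came from Media Bias/Fact Check
# plus manual review.
FALSE_RATINGS = {"false", "mostly false", "misleading", "fake"}
TRUSTED_DOMAINS = {"reuters.com", "apnews.com"}


def build_claim_search_url(query, api_key, language_code="en", max_age_days=7):
    """Build (but do not send) a claims:search request; 7 days mirrors a weekly crawl."""
    params = {
        "query": query,
        "languageCode": language_code,
        "maxAgeDays": max_age_days,
        "key": api_key,
    }
    return f"{CLAIM_SEARCH_URL}?{urlencode(params)}"


def parse_fake_claims(response):
    """Step 1: keep claim reviews whose textual rating marks the claim as false."""
    rows = []
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            rating = review.get("textualRating", "").strip().lower()
            if rating in FALSE_RATINGS:
                rows.append({
                    "text": claim.get("text", ""),
                    "url": review.get("url", ""),
                    "publisher": review.get("publisher", {}).get("name", ""),
                    "label": "fake",
                })
    return rows


def is_trusted(url):
    """True if the URL's host (minus a leading 'www.') is on the allowlist."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host in TRUSTED_DOMAINS


def balance_with_real(fake_rows, candidates, seed=0):
    """Step 2: add up to len(fake_rows) trusted articles, labeled "real"."""
    trusted = [c for c in candidates if is_trusted(c["url"])]
    k = min(len(fake_rows), len(trusted))
    picked = random.Random(seed).sample(trusted, k)
    return fake_rows + [dict(c, label="real") for c in picked]


# Toy, offline example; claim texts and URLs are placeholders.
sample_response = {"claims": [
    {"text": "Claim A", "claimReview": [{"publisher": {"name": "ExampleCheck"},
        "url": "https://example.org/a", "textualRating": "False"}]},
    {"text": "Claim B", "claimReview": [{"publisher": {"name": "ExampleCheck"},
        "url": "https://example.org/b", "textualRating": "True"}]},
]}
fake = parse_fake_claims(sample_response)
dataset = balance_with_real(fake, [
    {"text": "Trusted report", "url": "https://www.reuters.com/a"},
    {"text": "Unvetted blog post", "url": "https://blog.example.net/b"},
])
print([(row["text"], row["label"]) for row in dataset])
# [('Claim A', 'fake'), ('Trusted report', 'real')]
```

Matching fake and real counts one-to-one is just one simple reading of "balance"; the text does not state which ratio the project targeted. A fixed `seed` keeps the weekly sampling reproducible.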

Dataset Characteristics:

  • Number of Instances: 45,904
  • Languages Included: 52
  • Tweets Included: 90,000
  • News Publishers Included: 56
  • Fact-Checking Sites Included: 235
  • Progress: Weekly Update

Fact-Checking Tool: We have also developed a fact-checking tool built on a pre-trained BERT model that classifies news as fake or real with 97% accuracy. The tool lets users debunk multilingual COVID-19 fake news and search for relevant news records by query, with the aim of providing verified news and curbing the spread of misinformation. The fact-checking tool is accessible on our project websi