Fighting The Infodemic: Advanced Techniques For Misinformation Management
Multi-lingual COVID-19 Misinformation Dataset
| Dataset Characteristics: | Multi-lingual, COVID-19, News, Fact-Checking | Number of Instances: | 45,904 | News Category | Yes |
|---|---|---|---|---|---|
| Attribute Characteristics: | Categorical, Integer, Real | Tweets Included: | 90,000 | News Link | Yes |
| Date | From 2019-12-12 to 2022-10-10 | Missing Values? | Yes | News Publishers Included: | 56 |
| Languages Included: | 123 | Fact-Checking Sites Included: | 235 | Progress: | Weekly Update |
As part of our project, we have created the largest multilingual COVID-19 misinformation dataset. It consists of news metadata and the surrounding social context, including tweets, likes, retweets, and more.
Data Collection Process:
- We extensively researched and cross-checked various fact-checking resources to collect fake news. These resources include the Google Fact Check API, Poynter Institute, and IFCN, among others. We collected news claims and fact-checked articles from these sources on a weekly basis.
- To balance the proportion of real news and fake news, we added trustworthy news articles to the dataset. We identified reliable news sites through Media Bias/Fact Check and manually cross-checked the articles from these sources.
- We enriched the dataset with the social context of the fact-checked news articles. This involved tracking social engagement and user reactions to the news on platforms like Twitter, using relevant keywords and data from the Google Fact Check API and Poynter.
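The first collection step can be illustrated with a minimal sketch, assuming the claims search endpoint of the Google Fact Check Tools API and the response fields it documents. This is not the project's actual collection code: the function names, the record layout, and the sample response are hypothetical and only show the general shape of turning API results into flat records. No network request is made here.

```python
from urllib.parse import urlencode

# Claims search endpoint of the Google Fact Check Tools API.
SEARCH_ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def build_search_url(query, language_code, api_key, page_size=50):
    """Build a claims:search request URL for one keyword and one language."""
    params = {
        "query": query,
        "languageCode": language_code,
        "pageSize": page_size,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def flatten_claims(response):
    """Turn one parsed API response (a dict) into one flat record per review.

    The output field names are an assumption made for this sketch, not the
    schema of the published dataset.
    """
    records = []
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            records.append({
                "claim": claim.get("text"),
                "claimant": claim.get("claimant"),
                "fact_checker": review.get("publisher", {}).get("name"),
                "fact_check_url": review.get("url"),
                "rating": review.get("textualRating"),
                "language": review.get("languageCode"),
            })
    return records


# Made-up response used only to show the expected shape; not real data.
sample_response = {
    "claims": [
        {
            "text": "Example claim about COVID-19",
            "claimant": "Example claimant",
            "claimReview": [
                {
                    "publisher": {"name": "Example Fact Checker"},
                    "url": "https://example.org/fact-check/1",
                    "textualRating": "False",
                    "languageCode": "en",
                }
            ],
        }
    ]
}

records = flatten_claims(sample_response)
for record in records:
    print(record["rating"], record["language"], record["claim"])
```

Running the same query once per language code on a weekly schedule would mirror the weekly update described above. The records could then be labeled and combined with articles from the reliable publishers to balance the two classes.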
Dataset Characteristics:
- Number of Instances: 45,904
- Languages Included: 52
- Tweets Included: 90,000
- News Publishers Included: 56
- Fact-Checking Sites Included: 235
- Progress: Weekly Update
Fact-Checking Tool: We have also developed a fact-checking tool built on a pre-trained BERT model that achieves 97% accuracy in classifying news as fake or real. The tool lets users debunk multi-lingual COVID-19 fake news and search for relevant news records based on their queries. It aims to provide verified news and combat the spread of misinformation. The fact-checking tool is accessible on our project websi