The paper 'From Twitter to Reddit: Exploring Data Sources for Computational Counterspeech' focuses on computational approaches to counterspeech, taking a computer science perspective.
Counterspeech datasets are collected from social media platforms such as Twitter, YouTube, and Reddit using keywords, hashtags, and predefined counterspeech accounts. Some counterspeech is instead written by crowd workers or expert counterspeech writers.
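The collection strategy described above (matching on keywords, hashtags, and known accounts) can be illustrated with a minimal sketch. The `Post` type, the seed lists, and the sample posts are all hypothetical and are not taken from the paper; real studies choose their own seeds and typically pull posts through a platform API rather than from an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str


# Hypothetical seed lists for illustration only.
KEYWORDS = {"counterspeech"}
HASHTAGS = {"#nohate"}
ACCOUNTS = {"example_counter_account"}


def is_candidate(post: Post) -> bool:
    """Return True if the post matches any keyword, hashtag, or seed account."""
    # Lowercase and strip trailing punctuation so "#NoHate!" matches "#nohate".
    tokens = [tok.strip(".,!?") for tok in post.text.lower().split()]
    has_keyword = any(tok in KEYWORDS for tok in tokens)
    has_hashtag = any(tok in HASHTAGS for tok in tokens)
    from_account = post.author.lower() in ACCOUNTS
    return has_keyword or has_hashtag or from_account


# Made-up sample posts.
posts = [
    Post("alice", "Please stop, this is counterspeech in action."),
    Post("bob", "Spread kindness #NoHate!"),
    Post("Example_Counter_Account", "Facts matter."),
    Post("carol", "Nice weather today."),
]

candidates = [p for p in posts if is_candidate(p)]
```

A filter like this only yields candidate posts; whether each one is actually counterspeech still has to be decided by annotation.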
Most datasets offer binary annotations (counterspeech vs. non-counterspeech), while some annotate different types of counterspeech. The datasets cover hate incidents related to phenomena such as Islamophobia and prejudice during COVID-19.
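One way to picture the two annotation schemes is a record with a binary label and an optional type label. This is a sketch under assumptions: the field names and the `"type_a"` placeholder are hypothetical, and the paper's actual type taxonomy is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    text: str
    is_counterspeech: bool
    # Optional fine-grained label; the value used below is a placeholder,
    # not a category name from the paper.
    counterspeech_type: Optional[str] = None

    def __post_init__(self) -> None:
        # A type label only makes sense for items marked as counterspeech.
        if self.counterspeech_type is not None and not self.is_counterspeech:
            raise ValueError("type label given for a non-counterspeech item")


def to_binary(annotation: Annotation) -> int:
    """Collapse any annotation to the binary scheme (1 = counterspeech)."""
    return int(annotation.is_counterspeech)


typed = Annotation("Example reply", True, counterspeech_type="type_a")
plain = Annotation("Unrelated comment", False)
labels = [to_binary(typed), to_binary(plain)]
```

Collapsing typed annotations to the binary label, as `to_binary` does, is one plausible way to combine datasets that use different schemes.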
The datasets are primarily in English, with a few covering other languages such as Italian, French, German, and Tamil. The paper is available on arXiv under a CC BY-SA 4.0 license.