paper
arXiv cs.CL
November 18th, 2025 at 5:00 AM

ToxSyn: Reducing Bias in Hate Speech Detection via Synthetic Minority Data in Brazilian Portuguese

arXiv:2506.10245v2 Announce Type: replace Abstract: The development of robust hate speech detection systems remains limited by the lack of large-scale, fine-grained training data, especially for languages beyond English. Existing corpora typically rely on coarse toxic/non-toxic labels, and the few that capture hate directed at specific minority groups critically lack the non-toxic counterexamples (i.e., benign text about minorities) required to distinguish genuine hate from mere discussion. We introduce ToxSyn, the first Portuguese large-scale corpus explicitly designed for multi-label hate speech detection across nine protected minority groups. Generated via a controllable four-stage pipeline, ToxSyn includes discourse-type annotations to capture rhetorical strategies of toxic language, such as sarcasm or dehumanization. Crucially, it systematically includes the non-toxic counterexamples absent in all other public datasets. Our experiments reveal a catastrophic, mutual generalization failure between social-media domains and ToxSyn: models trained on social media struggle to generalize to minority-specific contexts, and vice-versa. This finding indicates they are distinct tasks and exposes summary metrics like Macro F1 can be unreliable indicators of true model behavior, as they completely mask model failure. We publicly release ToxSyn at HuggingFace to foster reproducible research on synthetic data generation and benchmark progress in hate-speech detection for low- and mid-resource languages.

#ai
#research
#product

Score: 2.80

Engagement proxy: 0

Canonical link: https://arxiv.org/abs/2506.10245