paper
arXiv cs.AI
November 18th, 2025 at 5:00 AM

GreatSplicing: A Semantically Rich Splicing Dataset

arXiv:2310.10070v3 Announce Type: replace-cross Abstract: In existing splicing forgery datasets, the insufficient semantic variety of spliced regions causes trained detection models to overfit semantic features rather than learn genuine splicing traces. Meanwhile, the lack of a reasonable benchmark dataset has led to inconsistent experimental settings across existing detection methods. To address these issues, we propose GreatSplicing, a manually created, large-scale, high-quality splicing dataset. GreatSplicing comprises 5,000 spliced images and covers spliced regions across 335 distinct semantic categories, enabling detection models to learn splicing traces more effectively. Empirical results show that detection models trained on GreatSplicing achieve low misidentification rates and stronger cross-dataset generalization compared to existing datasets. GreatSplicing is now publicly available for research purposes at the following link.

#ai
#research

Score: 2.80

Engagement proxy: 0

Canonical link: https://arxiv.org/abs/2310.10070