DFBench: The Speech Deepfake Detection Benchmark 2025

DFBench provides a standardized evaluation for audio deepfake detection systems. This leaderboard focuses on detecting speech deepfakes, e.g. the output of text-to-speech and speech-to-speech models.

Objectives:

  • Allow fair comparison between deepfake detection models on unseen test data (no fine-tuning on the test data is possible)
  • Advance the state-of-the-art in synthetic media identification

Leaderboard: Speech Deepfake Detection

All values are accuracy in percent (%).

| Rank | Model | Accuracy | Accuracy on Real | Accuracy on Fake | Accuracy on WAV | Accuracy on FLAC | Accuracy on MP3 | Accuracy on WEBM | Accuracy on M4A | Accuracy on OGG |
|------|-------|----------|------------------|------------------|-----------------|------------------|-----------------|------------------|-----------------|-----------------|
| 1 | AASIST + Wav2Vec2 | 89.0 | 84.5 | 93.5 | 90.9 | 90.3 | 89.9 | 87.2 | 86.9 | 89.0 |
| 2 | TCM-ADD + Wav2Vec2 | 88.8 | 85.4 | 92.2 | 89.9 | 89.0 | 89.9 | 87.0 | 87.3 | 89.7 |
| 3 | Nes2Net + Wav2Vec2 | 85.3 | 75.5 | 95.0 | 87.4 | 85.6 | 86.2 | 82.2 | 84.4 | 86.1 |
| 4 | XLS + Wav2Vec2 | 81.3 | 68.4 | 94.3 | 82.7 | 81.5 | 82.4 | 79.5 | 80.0 | 82.0 |
| 5 | AASIST + HuBERT | 78.6 | 59.8 | 97.4 | 81.5 | 81.4 | 79.4 | 76.0 | 75.4 | 77.8 |
| 6 | XLS + HuBERT | 77.6 | 57.3 | 97.8 | 79.7 | 80.6 | 77.9 | 75.9 | 74.1 | 77.3 |
| 7 | MHFA + Data2Vec | 76.2 | 59.3 | 93.1 | 78.5 | 76.6 | 77.9 | 75.2 | 73.2 | 76.0 |
| 8 | MHFA + WavLM | 75.2 | 62.3 | 88.1 | 75.1 | 77.2 | 75.7 | 74.7 | 71.7 | 76.8 |
| 9 | AASIST + Data2Vec | 75.2 | 55.0 | 95.3 | 75.7 | 77.6 | 76.5 | 73.0 | 72.6 | 75.6 |
| 10 | XLS + Data2Vec | 73.1 | 49.6 | 96.5 | 73.6 | 74.7 | 74.1 | 71.5 | 71.0 | 73.5 |
| 11 | MHFA + Wav2Vec2 | 72.5 | 54.0 | 90.9 | 73.9 | 73.9 | 73.9 | 70.1 | 70.5 | 72.7 |

The leaderboard is updated as new submissions are validated. All results are evaluated on the official test dataset.
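For reference, the reported metrics (overall accuracy, per-class accuracy on real and fake speech, and per-format accuracy) can be reproduced from a model's predictions along the following lines. This is a minimal sketch, not the official scoring script; the prediction-file layout and the column names (label, prediction, format) are assumptions for illustration.

```python
import pandas as pd

def accuracy(df: pd.DataFrame) -> float:
    """Percentage of rows where the predicted label matches the ground truth."""
    return (df["prediction"] == df["label"]).mean() * 100

# Hypothetical prediction file with one row per test utterance:
#   label      -> ground truth, "real" or "fake"
#   prediction -> model output, "real" or "fake"
#   format     -> audio container of the test file, e.g. "wav", "mp3"
df = pd.read_csv("predictions.csv")

print(f"Accuracy:         {accuracy(df):.1f}")
print(f"Accuracy on real: {accuracy(df[df['label'] == 'real']):.1f}")
print(f"Accuracy on fake: {accuracy(df[df['label'] == 'fake']):.1f}")

# Per-format breakdown, matching the leaderboard columns.
for fmt in ["wav", "flac", "mp3", "webm", "m4a", "ogg"]:
    print(f"Accuracy on {fmt.upper():4s}: {accuracy(df[df['format'] == fmt]):.1f}")
```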