DFBench: The Speech Deepfake Detection Benchmark 2025
DFBench provides a standardized evaluation for audio deepfake detection systems. This leaderboard focuses on speech deepfake detection, i.e. detecting the output of text-to-speech and speech-to-speech models.
Objectives:
- Allow fair comparison between deepfake detection models on unseen test data (fine-tuning on the test data is not possible)
- Advance the state-of-the-art in synthetic media identification
Leaderboard: Speech Deepfake Detection
Rank | Model | Accuracy | Accuracy on Real | Accuracy on Fake | Accuracy on WAV | Accuracy on FLAC | Accuracy on MP3 | Accuracy on WEBM | Accuracy on M4A | Accuracy on OGG |
---|---|---|---|---|---|---|---|---|---|---|
1 | AASIST + Wav2Vec2 | 89.0 | 84.5 | 93.5 | 90.9 | 90.3 | 89.9 | 87.2 | 86.9 | 89.0 |
2 | TCM-ADD + Wav2Vec2 | 88.8 | 85.4 | 92.2 | 89.9 | 89.0 | 89.9 | 87.0 | 87.3 | 89.7 |
3 | Nes2Net + Wav2Vec2 | 85.3 | 75.5 | 95.0 | 87.4 | 85.6 | 86.2 | 82.2 | 84.4 | 86.1 |
4 | XLS + Wav2Vec2 | 81.3 | 68.4 | 94.3 | 82.7 | 81.5 | 82.4 | 79.5 | 80.0 | 82.0 |
5 | AASIST + HuBERT | 78.6 | 59.8 | 97.4 | 81.5 | 81.4 | 79.4 | 76.0 | 75.4 | 77.8 |
6 | XLS + HuBERT | 77.6 | 57.3 | 97.8 | 79.7 | 80.6 | 77.9 | 75.9 | 74.1 | 77.3 |
7 | MHFA + Data2Vec | 76.2 | 59.3 | 93.1 | 78.5 | 76.6 | 77.9 | 75.2 | 73.2 | 76.0 |
8 | MHFA + WavLM | 75.2 | 62.3 | 88.1 | 75.1 | 77.2 | 75.7 | 74.7 | 71.7 | 76.8 |
9 | AASIST + Data2Vec | 75.2 | 55.0 | 95.3 | 75.7 | 77.6 | 76.5 | 73.0 | 72.6 | 75.6 |
10 | XLS + Data2Vec | 73.1 | 49.6 | 96.5 | 73.6 | 74.7 | 74.1 | 71.5 | 71.0 | 73.5 |
11 | MHFA + Wav2Vec2 | 72.5 | 54.0 | 90.9 | 73.9 | 73.9 | 73.9 | 70.1 | 70.5 | 72.7 |
The leaderboard is updated upon validation of new submissions. All results are accuracies, reported in percent, computed on the official test dataset: overall, per class (real/fake), and per audio format.
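For reference, the per-column numbers can be reproduced from a prediction file and a ground-truth file in a few lines of Python. This is a minimal sketch, not the official scoring script: the ground-truth labels are not released, and the `score` helper and its file layout are assumptions based on the `filename,label` submission format described below.

```python
import csv
from collections import defaultdict

def score(pred_csv, truth_csv):
    """Overall, per-class, and per-format accuracy (in percent).

    Both files use the benchmark's `filename,label` layout; a
    ground-truth file is hypothetical, since test labels are private.
    """
    with open(truth_csv, newline="") as f:
        truth = {r["filename"]: r["label"] for r in csv.DictReader(f)}
    with open(pred_csv, newline="") as f:
        preds = {r["filename"]: r["label"] for r in csv.DictReader(f)}

    correct, total = defaultdict(int), defaultdict(int)
    for name, gold in truth.items():
        fmt = name.rsplit(".", 1)[-1].upper()   # WAV, FLAC, MP3, ...
        for key in ("overall", gold, fmt):      # e.g. "overall", "fake", "MP3"
            total[key] += 1
            correct[key] += int(preds.get(name) == gold)

    return {k: 100.0 * correct[k] / total[k] for k in total}
```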
Model Submission Process
Official Benchmark Test Dataset: DFBench/DFBench_Speech25
The test dataset comprises 4524 audio files. The test data is unlabeled. Each audio file is either:
- Real: An authentic, unmodified audio file
- Fake: AI-generated or synthetically modified content

Since there are no labels, you cannot (and should not) train your model on the test data.
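The identifier above follows the Hugging Face Hub naming scheme, so the test set can presumably be pulled with the `datasets` library. A minimal loading sketch, assuming the dataset is hosted on the Hub and exposes a `test` split with an `audio` column (the split and column names are assumptions, not part of the official instructions):

```python
from datasets import load_dataset

# Assumption: the test set lives on the Hugging Face Hub under this ID,
# with a "test" split and an "audio" column (not officially confirmed).
ds = load_dataset("DFBench/DFBench_Speech25", split="test")

for example in ds:
    audio = example["audio"]          # decoded by `datasets`
    waveform = audio["array"]         # numpy float array
    sample_rate = audio["sampling_rate"]
    # run your detector on (waveform, sample_rate) here
```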
Submission Requirements
File Format
Submit predictions as a CSV file with the following structure: `filename,label`.
- `filename`: Exact filename as provided in the dataset
- `label`: Binary classification result (`real` or `fake`)
For example:
filename,label
0.mp3,fake
1.ogg,real
2.flac,fake
...
4523.wav,fake
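One way to produce a file in this exact layout, given a mapping from test filenames to predicted labels. This is a sketch; `predictions` and `write_submission` are illustrative names, not part of the benchmark:

```python
import csv

def write_submission(predictions, out_path="submission.csv"):
    """Write `filename,label` rows in the required format.

    `predictions` maps each test filename (e.g. "0.mp3") to the
    string "real" or "fake"; how you obtain it is up to your model.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])
        for filename, label in predictions.items():
            if label not in ("real", "fake"):
                raise ValueError(f"invalid label {label!r} for {filename}")
            writer.writerow([filename, label])

write_submission({"0.mp3": "fake", "1.ogg": "real"})  # toy example
```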
Submission Process
- Generate predictions for all 4524 test audio files
- Format results according to specification above
- Send your CSV file to: submission@df-bench.com. The name of the file should correspond to the leaderboard model name, e.g. `Model_This_name.csv` will be included as `Model This name` in the leaderboard. A sketch of basic pre-submission checks appears below.
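Since each group is limited to one submission per month (see Notes), it is worth sanity-checking the CSV locally before sending it. The checks below are reasonable guesses at what the automated validation enforces, not the official validator:

```python
import csv

EXPECTED_ROWS = 4524           # size of the official test set
VALID_LABELS = {"real", "fake"}

def check_submission(path):
    """Local sanity checks; the official validation criteria are not
    public, so these are plausible guesses, not guarantees."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        assert reader.fieldnames == ["filename", "label"], "header must be filename,label"
        rows = list(reader)
    assert len(rows) == EXPECTED_ROWS, f"expected {EXPECTED_ROWS} rows, got {len(rows)}"
    names = [r["filename"] for r in rows]
    assert len(set(names)) == len(names), "duplicate filenames"
    bad = [r for r in rows if r["label"] not in VALID_LABELS]
    assert not bad, f"{len(bad)} rows with labels outside {VALID_LABELS}"
    print(f"{path}: looks well-formed")
```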
Evaluation Timeline
- Submissions are processed within 5-7 business days
- Approved submissions are added to the public leaderboard
Notes
- Each research group may submit one set of scores per month
- All submissions undergo automated validation before leaderboard inclusion
- The authors reserve the right not to publish, or to remove, a submission at their discretion
- Submissions may be excluded if found to violate ethical guidelines, contain malicious content, or appear fraudulent
- Benchmark maintainers may adjust evaluation protocols as the dataset and task evolve
- No warranties are provided regarding benchmark results, which are intended strictly for research and comparison purposes
For technical inquiries regarding the evaluation process, please contact the benchmark maintainers through the submission email.