DPDFNet-8 BVC — Pretrained vs Fine-Tuned
200-sample room-acoustic test set · DNSMOS P.835 scenario benchmark · 16 kHz and 8 kHz G.711
Test Set — 200 Samples (Room-Acoustic)
| Metric | Pretrained 16k | Fine-tuned 16k | Pretrained 8k | Fine-tuned 8k |
| Near SI-SDR ↑ | +10.68 dB | +13.31 dB | +7.86 dB | +11.15 dB |
| Far SI-SDR ↓ | −20.65 dB | −26.17 dB | −21.96 dB | −26.24 dB |
| SI-SDR improvement ↑ | +5.68 dB | +8.31 dB | +2.87 dB | +6.15 dB |
| PESQ-WB ↑ | 1.749 | 2.244 | 1.527 | 2.036 |
| STOI ↑ | 0.876 | 0.916 | 0.812 | 0.910 |
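For readers unfamiliar with the SI-SDR rows, the metric can be sketched as follows. This is a minimal, dependency-free illustration of the common scale-invariant SDR definition (zero-mean signals, optimal scaling of the reference). The function name `si_sdr` and the small `eps` guard are assumptions made for this sketch, and the benchmark's own implementation may differ. It also presumes that "Near" and "Far" mean the same formula evaluated against the near and far reference signals respectively.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean so a DC offset counts as neither target nor distortion.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference: the optimally scaled target.
    ref_energy = sum(r * r for r in ref)
    scale = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target is distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)

    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Toy check: a reference plus a small orthogonal disturbance.
reference = [1.0, -1.0, 1.0, -1.0]
disturbance = [0.1, 0.1, -0.1, -0.1]
estimate = [r + d for r, d in zip(reference, disturbance)]
score_db = si_sdr(estimate, reference)
```

Because the target is rescaled before the ratio is taken, multiplying `estimate` by any nonzero gain leaves the score unchanged. That is why SI-SDR suits comparing models whose output levels differ.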
DNSMOS P.835 — 16 kHz (SIG / BAK / OVL)
| Scenario | No processing | Pretrained | Fine-tuned ep20 |
| Drama 9 dB (~5 m) | 1.68 / 1.21 / 1.30 | 3.12 / 3.88 / 2.81 | 3.37 / 4.04 / 3.10 |
| Call center 1.0 m | 3.40 / 3.09 / 2.61 | 3.43 / 4.08 / 3.15 | 3.53 / 4.12 / 3.24 |
| Call center 0.5 m | 3.41 / 2.94 / 2.54 | 3.38 / 3.84 / 3.00 | 3.45 / 3.99 / 3.11 |
DNSMOS P.835 — 8 kHz Round-Trip (SIG / BAK / OVL)
| Scenario | Pretrained | Fine-tuned ep20 | SIG change vs 16 kHz (pretrained) | SIG change vs 16 kHz (ep20) |
| Drama 9 dB | 3.16 / 3.89 / 2.85 | 3.24 / 4.01 / 2.96 | +0.04 | −0.12 |
| Call center 1.0 m | 2.98 / 4.05 / 2.74 | 3.45 / 4.10 / 3.17 | −0.45 | −0.07 |
| Call center 0.5 m | 2.99 / 3.76 / 2.60 | 3.33 / 3.96 / 3.00 | −0.38 | −0.12 |
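The last two columns of the 8 kHz table are the 8 kHz SIG score minus the 16 kHz SIG score for the same scenario and model. The sketch below recomputes them from the rounded values printed in the two DNSMOS tables. The dictionary layout and the name `sig_change` are illustrative only. Recomputing from rounded scores can differ from the published column by 0.01, presumably because the published deltas were taken from unrounded scores (an assumption).

```python
# (SIG, BAK, OVL) triples copied from the 16 kHz and 8 kHz DNSMOS tables.
DNSMOS_16K = {
    "Drama 9 dB": {"pretrained": (3.12, 3.88, 2.81), "fine_tuned": (3.37, 4.04, 3.10)},
    "Call center 1.0 m": {"pretrained": (3.43, 4.08, 3.15), "fine_tuned": (3.53, 4.12, 3.24)},
    "Call center 0.5 m": {"pretrained": (3.38, 3.84, 3.00), "fine_tuned": (3.45, 3.99, 3.11)},
}
DNSMOS_8K = {
    "Drama 9 dB": {"pretrained": (3.16, 3.89, 2.85), "fine_tuned": (3.24, 4.01, 2.96)},
    "Call center 1.0 m": {"pretrained": (2.98, 4.05, 2.74), "fine_tuned": (3.45, 4.10, 3.17)},
    "Call center 0.5 m": {"pretrained": (2.99, 3.76, 2.60), "fine_tuned": (3.33, 3.96, 3.00)},
}

# Deltas as published in the last two columns of the 8 kHz table.
PUBLISHED = {
    "Drama 9 dB": {"pretrained": 0.04, "fine_tuned": -0.12},
    "Call center 1.0 m": {"pretrained": -0.45, "fine_tuned": -0.07},
    "Call center 0.5 m": {"pretrained": -0.38, "fine_tuned": -0.12},
}


def sig_change(scenario, model):
    """8 kHz SIG minus 16 kHz SIG, rounded to two decimals."""
    sig_8k = DNSMOS_8K[scenario][model][0]
    sig_16k = DNSMOS_16K[scenario][model][0]
    return round(sig_8k - sig_16k, 2)


for scenario in DNSMOS_16K:
    for model in ("pretrained", "fine_tuned"):
        recomputed = sig_change(scenario, model)
        published = PUBLISHED[scenario][model]
        print(f"{scenario:18s} {model:11s} recomputed {recomputed:+.2f} published {published:+.2f}")
```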
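Key finding: Fine-tuning improves every metric at both 16 kHz and 8 kHz. Far SI-SDR falls by 5.5 dB at 16 kHz (−20.65 dB to −26.17 dB) and by 4.3 dB at 8 kHz; the 16 kHz change corresponds to a factor of about 3.6 in power. PESQ-WB rises by about 0.5 MOS at both rates, a perceptually meaningful gain. In the two call-center scenarios, pretrained SIG falls by 0.38–0.45 MOS after the 8 kHz narrowband round trip (the Drama scenario is essentially unchanged at +0.04), whereas fine-tuned SIG falls by only 0.07–0.12 across all three scenarios.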
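As a quick check on how a decibel difference maps to a linear factor, the conversion can be written out. The helper names below are illustrative. Whether a "times better" figure refers to power or amplitude is a choice of convention, so both are shown.

```python
def db_to_power_ratio(delta_db):
    """Linear power ratio for a level difference in dB."""
    return 10.0 ** (delta_db / 10.0)


def db_to_amplitude_ratio(delta_db):
    """Linear amplitude ratio for a level difference in dB."""
    return 10.0 ** (delta_db / 20.0)


# Far SI-SDR at 16 kHz: pretrained minus fine-tuned, from the test-set table.
far_delta_db = -20.65 - (-26.17)

power_factor = db_to_power_ratio(far_delta_db)
amplitude_factor = db_to_amplitude_ratio(far_delta_db)
print(f"{far_delta_db:.2f} dB = {power_factor:.2f}x power, {amplitude_factor:.2f}x amplitude")
```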
8 kHz robustness: Going from 16 kHz to 8 kHz, fine-tuned STOI drops by only 0.006 (0.916 to 0.910), versus 0.064 for the pretrained model (0.876 to 0.812). The fine-tuned model already handles G.711 telephony well, thanks to the 50% round-trip augmentation used during training.
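The companding part of such a round-trip augmentation might look like the sketch below. It rests on several assumptions that the report does not confirm: that "50%" means each training sample is degraded with probability 0.5, that the codec variant is μ-law rather than A-law, and that the continuous μ-law curve with 8-bit quantization is an acceptable stand-in for the segmented encoding real G.711 uses. The resampling to 8 kHz and back, which a full round trip also needs, is omitted here. The names `mulaw_round_trip` and `maybe_degrade` are hypothetical.

```python
import math
import random

MU = 255.0  # mu-law companding constant


def mulaw_round_trip(samples):
    """Compress floats in [-1, 1] with mu-law, quantize to 8 bits, then expand."""
    log_scale = math.log1p(MU)
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip to the valid input range
        compressed = math.copysign(math.log1p(MU * abs(x)) / log_scale, x)
        code = round((compressed + 1.0) / 2.0 * 255.0)  # integer code in 0..255
        decoded = code / 255.0 * 2.0 - 1.0
        expanded = math.copysign(math.expm1(abs(decoded) * log_scale) / MU, decoded)
        out.append(expanded)
    return out


def maybe_degrade(samples, rng, probability=0.5):
    """Apply the companding round trip with the given per-sample-clip probability."""
    if rng.random() < probability:
        return mulaw_round_trip(samples)
    return list(samples)


rng = random.Random(0)  # seeded so the augmentation decisions are repeatable
clip = [0.5 * math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(80)]
augmented = maybe_degrade(clip, rng)
```

Passing the random generator in explicitly keeps the augmentation reproducible across runs, which helps when comparing checkpoints such as ep20 against the pretrained baseline.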