Background Voice Cancellation — Trained by Ruiqiang Huang

Ruiqiang Huang fine-tuned DPDFNet-8 into a Background Voice Cancellation (BVC) model: 20 epochs on room-acoustic call center data (Libri2Mix + pyroomacoustics) raised the SI-SDR improvement from +5.7 dB to +8.3 dB and cut background voice leakage by 5.5 dB relative to the pretrained baseline. The model was evaluated at both native 16 kHz (wideband VoIP) and 8 kHz G.711 (narrowband telephony).

Results Report

DPDFNet-8 BVC — Pretrained vs Fine-Tuned

200-sample room-acoustic test set · DNSMOS P.835 scenario benchmark · 16 kHz and 8 kHz G.711

Test Set — 200 Samples (Room-Acoustic)

| Metric | Pretrained 16 kHz | Fine-tuned 16 kHz | Pretrained 8 kHz | Fine-tuned 8 kHz |
|---|---|---|---|---|
| Near SI-SDR ↑ | +10.68 dB | +13.31 dB | +7.86 dB | +11.15 dB |
| Far SI-SDR ↓ | −20.65 dB | −26.17 dB | −21.96 dB | −26.24 dB |
| SI-SDR improvement | +5.68 dB | +8.31 dB | +2.87 dB | +6.15 dB |
| PESQ-WB ↑ | 1.749 | 2.244 | 1.527 | 2.036 |
| STOI ↑ | 0.876 | 0.916 | 0.812 | 0.910 |
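
For readers unfamiliar with the metric, here is a minimal sketch of how SI-SDR and SI-SDR improvement are commonly computed. It is not the author's evaluation code. It assumes that "Near SI-SDR" scores the model output against the foreground (near) speaker, that "Far SI-SDR" scores it against the background (far) speaker, and that the improvement is the output's near SI-SDR minus the unprocessed mixture's; the function names `si_sdr` and `si_sdr_improvement` are hypothetical.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR (in dB) of `estimate` with respect to `reference`."""
    # Remove any DC offset so it does not bias the projection.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: the optimally scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything that is not explained by the scaled reference counts as error.
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))


def si_sdr_improvement(output: np.ndarray, mixture: np.ndarray,
                       reference: np.ndarray) -> float:
    """SI-SDR gained by processing (assumed definition): output minus mixture."""
    return si_sdr(output, reference) - si_sdr(mixture, reference)
```

Under these assumptions, subtracting the "SI-SDR improvement" row from the "Near SI-SDR" row gives roughly 5 dB in every column, which is consistent with an unprocessed mixture averaging about 5 dB SI-SDR against the foreground speaker.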

DNSMOS P.835 — 16 kHz (SIG / BAK / OVL)

| Scenario | No processing | Pretrained | Fine-tuned ep20 |
|---|---|---|---|
| Drama 9 dB (~5 m) | 1.68 / 1.21 / 1.30 | 3.12 / 3.88 / 2.81 | 3.37 / 4.04 / 3.10 |
| Call center 1.0 m | 3.40 / 3.09 / 2.61 | 3.43 / 4.08 / 3.15 | 3.53 / 4.12 / 3.24 |
| Call center 0.5 m | 3.41 / 2.94 / 2.54 | 3.38 / 3.84 / 3.00 | 3.45 / 3.99 / 3.11 |

DNSMOS P.835 — 8 kHz Round-Trip (SIG / BAK / OVL)

| Scenario | Pretrained | Fine-tuned ep20 | SIG change vs 16 kHz (pretrained) | SIG change vs 16 kHz (ep20) |
|---|---|---|---|---|
| Drama 9 dB | 3.16 / 3.89 / 2.85 | 3.24 / 4.01 / 2.96 | +0.04 | −0.12 |
| Call center 1.0 m | 2.98 / 4.05 / 2.74 | 3.45 / 4.10 / 3.17 | −0.45 | −0.07 |
| Call center 0.5 m | 2.99 / 3.76 / 2.60 | 3.33 / 3.96 / 3.00 | −0.38 | −0.12 |

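Key finding: Fine-tuning improves every reported metric at both 16 kHz and 8 kHz. Far SI-SDR drops by 5.5 dB at 16 kHz (about 3.5× lower in power terms) and by 4.3 dB at 8 kHz. PESQ-WB gains about 0.5 MOS at both rates, a perceptually meaningful improvement. Under the 8 kHz round trip, pretrained SIG falls by 0.38–0.45 MOS in the two call center scenarios (the Drama scenario is essentially unchanged at +0.04), while fine-tuned SIG falls by only 0.07–0.12 across all three scenarios.
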
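
The link between a dB difference and a "times better" figure is plain arithmetic, sketched below from the Far SI-SDR values in the test-set table. The helper name `db_to_power_ratio` is hypothetical, and reading the Far SI-SDR difference as a ratio of leaked background power is an interpretation rather than something the report states.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear power ratio."""
    return 10.0 ** (db / 10.0)


# Far SI-SDR values from the test-set table (16 kHz columns).
far_pretrained_db = -20.65
far_finetuned_db = -26.17

# Extra suppression delivered by fine-tuning, in dB and as a power ratio.
extra_suppression_db = far_pretrained_db - far_finetuned_db
power_ratio = db_to_power_ratio(extra_suppression_db)

print(f"{extra_suppression_db:.2f} dB -> {power_ratio:.2f}x in power")
```

The same calculation on the 8 kHz columns (−21.96 dB vs −26.24 dB) gives a 4.28 dB difference, a smaller ratio than at 16 kHz.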
8 kHz robustness: Fine-tuned STOI drops by only 0.006 at 8 kHz (0.916 to 0.910), versus 0.064 for the pretrained model (0.876 to 0.812). The fine-tuned model already handles G.711 telephony well, thanks to the 50% round-trip augmentation used during training.

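
The report does not show how the round-trip augmentation is implemented, so the following is only a rough sketch of one way to do it. It assumes "50%" means each 16 kHz training clip is passed through the round trip with probability 0.5, and that the G.711 variant is μ-law (it could equally be A-law). The continuous μ-law formula with a 7-bit magnitude only approximates G.711's segmented codec, and a simple windowed-sinc FIR stands in for a production resampler. All function names are hypothetical.

```python
import numpy as np


def _halfband_lowpass(taps: int = 63) -> np.ndarray:
    """Hamming-windowed sinc low-pass with cutoff at a quarter of the sample rate."""
    n = np.arange(taps) - (taps - 1) / 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(taps)
    return h / h.sum()  # normalise to unity gain at DC


def mulaw_roundtrip(x: np.ndarray, mu: float = 255.0) -> np.ndarray:
    """Compress, quantise to sign + 7-bit magnitude, and expand again."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.log1p(mu * np.abs(x)) / np.log1p(mu)
    quantised = np.round(compressed * 127.0) / 127.0
    return np.sign(x) * np.expm1(quantised * np.log1p(mu)) / mu


def g711_roundtrip(x16k: np.ndarray) -> np.ndarray:
    """16 kHz -> 8 kHz -> mu-law -> 16 kHz (input assumed longer than the filter)."""
    h = _halfband_lowpass()
    # Anti-alias filter, then keep every second sample (16 kHz -> 8 kHz).
    x8k = np.convolve(x16k, h, mode="same")[::2]
    y8k = mulaw_roundtrip(x8k)
    # Zero-stuff and interpolate back to 16 kHz; gain of 2 restores the level.
    stuffed = np.zeros(2 * len(y8k))
    stuffed[::2] = y8k
    y16k = np.convolve(stuffed, 2.0 * h, mode="same")
    return y16k[: len(x16k)]


def maybe_g711(x16k: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    """Apply the round trip with probability p (assumed reading of '50%')."""
    return g711_roundtrip(x16k) if rng.random() < p else x16k
```

In a training loop this would be called once per clip, for example `clip = maybe_g711(clip, rng)` with `rng = np.random.default_rng(seed)`, so that roughly half of the clips carry narrowband codec artefacts while staying at the model's native 16 kHz rate.
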
Listening Test

Author: Ruiqiang Huang  ·  Audio generated by scripts/generate_web_samples.py --n 5 --seed 42  ·  RNNoise via librnnoise (xiph.org)  ·  DPDFNet-8 (3.54M params)  ·  Config: configs/dpdfnet8_general_voip.yaml

Mixture
RNNoise — classic NS, no BVC
DPDFNet-8 pretrained — no BVC
DPDFNet-8 fine-tuned ep20 — BVC
Target (foreground only)