Publication detail

Dereverberation and Beamforming in Far-Field Speaker Recognition

MOŠNER, L. MATĚJKA, P. NOVOTNÝ, O. ČERNOCKÝ, J.

Original title

Dereverberation and Beamforming in Far-Field Speaker Recognition

Language

en

Original abstract

This paper deals with far-field speaker recognition. On a corpus of NIST SRE 2010 data retransmitted in a real room with multiple microphones, we first demonstrate how room acoustics cause significant degradation of a state-of-the-art i-vector-based speaker recognition system. We then investigate several techniques to improve performance, ranging from probabilistic linear discriminant analysis (PLDA) re-training, through dereverberation, to beamforming. We found that weighted prediction error (WPE) based dereverberation combined with a generalized eigenvalue beamformer with power spectral density (PSD) weighting masks generated by neural networks (NN) provides results approaching the clean close-microphone setup. Further improvement was obtained by re-training PLDA or the mask-generating NNs on simulated target data. The work shows that a speaker recognition system working robustly in the far-field scenario can be developed.
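
The mask-weighted generalized eigenvalue (GEV) beamforming step named in the abstract can be illustrated with a minimal NumPy sketch. It is written under stated assumptions and is not the authors' implementation: the function names `masked_psd`, `gev_weights` and `apply_beamformer`, the (frequency, time, microphone) array layout, and the diagonal loading factor `eps` are all hypothetical. The speech and noise masks are taken as given inputs here, whereas the paper generates them with neural networks; WPE dereverberation is not shown.

```python
import numpy as np


def masked_psd(stft, mask):
    """Mask-weighted spatial PSD (covariance) matrix for each frequency bin.

    stft: complex array of shape (F, T, M), one STFT per microphone.
    mask: real array of shape (F, T) with values in [0, 1].
    Returns a complex array of shape (F, M, M).
    """
    weighted = stft * mask[..., None]
    # Sum over time of mask * x x^H, computed per frequency bin.
    psd = np.einsum("ftm,ftn->fmn", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return psd / norm[:, None, None]


def gev_weights(psd_speech, psd_noise, eps=1e-6):
    """Principal generalized eigenvector of (psd_speech, psd_noise) per bin.

    The returned filter maximizes w^H psd_speech w / w^H psd_noise w.
    Its scale and phase are arbitrary; no post-filter is applied here.
    """
    n_freq, n_mics, _ = psd_speech.shape
    weights = np.empty((n_freq, n_mics), dtype=complex)
    for f in range(n_freq):
        # Assumed regularization: small diagonal loading keeps the noise
        # matrix positive definite so the Cholesky factorization succeeds.
        load = eps * np.trace(psd_noise[f]).real / n_mics
        noise = psd_noise[f] + load * np.eye(n_mics)
        chol = np.linalg.cholesky(noise)
        chol_inv = np.linalg.inv(chol)
        # Whitening turns the generalized problem into a Hermitian one.
        whitened = chol_inv @ psd_speech[f] @ chol_inv.conj().T
        whitened = 0.5 * (whitened + whitened.conj().T)
        _, vecs = np.linalg.eigh(whitened)  # eigenvalues in ascending order
        weights[f] = chol_inv.conj().T @ vecs[:, -1]
    return weights


def apply_beamformer(weights, stft):
    """Single-channel output y[f, t] = w[f]^H x[f, t]."""
    return np.einsum("fm,ftm->ft", weights.conj(), stft)
```

In use, `masked_psd` would be called once with a speech mask and once with a noise mask, the two results passed to `gev_weights`, and the resulting filters applied to the multichannel STFT with `apply_beamformer`. The Cholesky whitening is one of several ways to solve the generalized eigenvalue problem; it is chosen here only to keep the sketch dependent on NumPy alone.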

BibTeX


@inproceedings{BUT155039,
  author="Ladislav {Mošner} and Pavel {Matějka} and Ondřej {Novotný} and Jan {Černocký}",
  title="Dereverberation and Beamforming in Far-Field Speaker Recognition",
  annote="This paper deals with far-field speaker recognition. On a corpus of NIST SRE 2010
data retransmitted in a real room with multiple microphones, we first demonstrate
how room acoustics cause significant degradation of a state-of-the-art
i-vector-based speaker recognition system. We then investigate several techniques
to improve performance, ranging from probabilistic linear discriminant analysis
(PLDA) re-training, through dereverberation, to beamforming. We found that
weighted prediction error (WPE) based dereverberation combined with a generalized
eigenvalue beamformer with power spectral density (PSD) weighting masks generated
by neural networks (NN) provides results approaching the clean close-microphone
setup. Further improvement was obtained by re-training PLDA or the
mask-generating NNs on simulated target data. The work shows that a speaker
recognition system working robustly in the far-field scenario can be developed.",
  address="IEEE Signal Processing Society",
  booktitle="Proceedings of ICASSP 2018",
  chapter="155039",
  doi="10.1109/ICASSP.2018.8462365",
  edition="not stated",
  howpublished="electronic, physical medium",
  institution="IEEE Signal Processing Society",
  year="2018",
  month="april",
  pages="5254--5258",
  publisher="IEEE Signal Processing Society",
  type="conference paper"
}