Publication detail

On the use of X-vectors for Robust Speaker Recognition

NOVOTNÝ, O.; PLCHOT, O.; MATĚJKA, P.; MOŠNER, L.; GLEMBEK, O.

Original title

On the use of X-vectors for Robust Speaker Recognition

Language

en

Original abstract

Text-independent speaker verification (SV) is currently in the process of embracing DNN modeling in every stage of the SV system. Slowly, DNN-based approaches such as end-to-end modelling and systems based on DNN embeddings are starting to be competitive even in the challenging and diverse channel conditions of recent NIST SREs. Domain adaptation and the need for a large amount of training data are still a challenge for current discriminative systems and (unlike with generative models) we see significant gains from data augmentation, simulation and other techniques designed to overcome the lack of training data. We present an analysis of an SV system based on DNN embeddings (x-vectors) and focus on robustness across diverse data domains such as standard telephone and microphone conversations in clean, noisy and reverberant environments. We also evaluate the system on challenging far-field data created by re-transmitting a subset of NIST SRE 2008 and 2010 microphone interviews. We compare our results with the state-of-the-art i-vector system. In general, we were able to achieve better performance with the DNN-based systems, but most importantly, we have confirmed the robustness of such systems across multiple data domains.

BibTeX


@inproceedings{BUT155075,
  author="Ondřej {Novotný} and Oldřich {Plchot} and Pavel {Matějka} and Ladislav {Mošner} and Ondřej {Glembek}",
  title="On the use of X-vectors for Robust Speaker Recognition",
  annote="Text-independent speaker verification (SV) is currently in the process of
embracing DNN modeling in every stage of the SV system. Slowly, the DNN-based
approaches such as end-to-end modelling and systems based on DNN embeddings start
to be competitive even in challenging and diverse channel conditions of recent
NIST SREs. Domain adaptation and the need for a large amount of training data are
still a challenge for current discriminative systems and (unlike with generative
models), we see significant gains from data augmentation, simulation and other
techniques designed to overcome lack of training data. We present an analysis of
an SV system based on DNN embeddings (x-vectors) and focus on robustness across
diverse data domains such as standard telephone and microphone conversations,
both in clean, noisy and reverberant environments. We also evaluate the system on
challenging far-field data created by re-transmitting a subset of NIST SRE 2008
and 2010 microphone interviews. We compare our results with the state-of-the-art
i-vector system. In general, we were able to achieve better performance with the
DNN-based systems, but most importantly, we have confirmed the robustness of such
systems across multiple data domains.",
  address="International Speech Communication Association",
  booktitle="Proceedings of Odyssey 2018",
  chapter="155075",
  doi="10.21437/Odyssey.2018-24",
  howpublished="online",
  institution="International Speech Communication Association",
  number="6",
  year="2018",
  month="june",
  pages="168--175",
  publisher="International Speech Communication Association",
  type="conference paper"
}