Publication detail

Analysis Of DNN Approaches To Speaker Identification

MATĚJKA, P.; GLEMBEK, O.; NOVOTNÝ, O.; PLCHOT, O.; GRÉZL, F.; BURGET, L.; ČERNOCKÝ, J.

Original title

Analysis Of DNN Approaches To Speaker Identification

Language

en

Original abstract

This work studies the use of Deep Neural Network (DNN) bottleneck (BN) features together with traditional MFCC features in i-vector-based speaker recognition. We decouple the extraction of sufficient statistics by using separate GMMs for frame alignment and for statistics normalization, and we analyze the use of BN and MFCC features (and their concatenation) in these two stages. We also show the effect of using full-covariance GMMs and, as a contrast, compare the results with the recent DNN-alignment approach. On the NIST SRE2010 telephone condition, we show a 60% relative gain in EER over the traditional MFCC baseline (with similar gains in the NIST DCF metrics), resulting in 0.94% EER.
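
The decoupling described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, array shapes, and random toy data are assumptions made for illustration, and diagonal covariances are used for brevity where the paper also considers full-covariance models. One feature stream (for example BN) is scored by the alignment GMM to obtain frame posteriors, and those posteriors then accumulate statistics over a second stream (for example MFCC), which are centered and scaled with a separate normalization model.

```python
import numpy as np


def diag_gmm_posteriors(feats, weights, means, variances):
    """Frame-level component posteriors under a diagonal-covariance GMM."""
    # Log-density of every frame under every component, shape (T, C)
    diff = feats[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_joint = np.log(weights)[None, :] + log_gauss
    # Subtract the per-frame maximum before exponentiating for stability
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def decoupled_stats(align_feats, stats_feats, align_gmm, norm_means, norm_vars):
    """Sufficient statistics with separate alignment and normalization models.

    align_feats: (T, D_align) features scored by the alignment GMM
    stats_feats: (T, D_stats) features the statistics are collected on
    align_gmm:   (weights, means, variances) of the alignment GMM
    norm_means, norm_vars: (C, D_stats) parameters of the normalization model
    """
    post = diag_gmm_posteriors(align_feats, *align_gmm)  # (T, C)
    n = post.sum(axis=0)                                  # zero-order, (C,)
    f = post.T @ stats_feats                              # first-order, (C, D_stats)
    # Center and scale the first-order statistics per component
    f_norm = (f - n[:, None] * norm_means) / np.sqrt(norm_vars)
    return n, f_norm


# Toy data only (hypothetical sizes, random values, no real speech features)
rng = np.random.default_rng(0)
T, C, D_align, D_stats = 200, 4, 6, 5
align_feats = rng.normal(size=(T, D_align))
stats_feats = rng.normal(size=(T, D_stats))
align_gmm = (
    np.full(C, 1.0 / C),
    rng.normal(size=(C, D_align)),
    np.ones((C, D_align)),
)
norm_means = rng.normal(size=(C, D_stats))
norm_vars = np.ones((C, D_stats))

n, f_norm = decoupled_stats(align_feats, stats_feats, align_gmm, norm_means, norm_vars)
```

Because the two feature streams only need to share the frame count `T`, any combination of BN, MFCC, or concatenated features can be passed to either stage, which is the kind of comparison the abstract refers to.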

BibTeX


@inproceedings{BUT130927,
  author="Pavel {Matějka} and Ondřej {Glembek} and Ondřej {Novotný} and Oldřich {Plchot} and František {Grézl} and Lukáš {Burget} and Jan {Černocký}",
  title="Analysis Of DNN Approaches To Speaker Identification",
  annote="This work studies the use of Deep Neural Network (DNN) bottleneck (BN)
features together with traditional MFCC features in i-vector-based speaker
recognition. We decouple the extraction of sufficient statistics by using
separate GMMs for frame alignment and for statistics normalization, and we
analyze the use of BN and MFCC features (and their concatenation) in these two
stages. We also show the effect of using full-covariance GMMs and, as
a contrast, compare the results with the recent DNN-alignment approach. On the
NIST SRE2010 telephone condition, we show a 60% relative gain in EER over the
traditional MFCC baseline (with similar gains in the NIST DCF metrics),
resulting in 0.94% EER.",
  address="IEEE Signal Processing Society",
  booktitle="Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016",
  chapter="130927",
  doi="10.1109/ICASSP.2016.7472649",
  edition="not specified",
  howpublished="electronic, physical medium",
  institution="IEEE Signal Processing Society",
  year="2016",
  month="march",
  pages="5100--5104",
  publisher="IEEE Signal Processing Society",
  type="conference paper"
}