Publication Detail

DNN Based Embeddings for Language Recognition

LOZANO DÍEZ, A.; PLCHOT, O.; MATĚJKA, P.; GONZALEZ-RODRIGUEZ, J.

Original Title

DNN Based Embeddings for Language Recognition

English Title

DNN Based Embeddings for Language Recognition

Language

en

Original Abstract

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance, but unlike the i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages. In particular, we trained a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to the embeddings. Finally, we add a softmax output layer and train the whole network with a multi-class cross-entropy objective to discriminate between languages. We report our results on NIST LRE 2015 and compare the performance of embeddings and the corresponding i-vectors, both modeled by a Gaussian Linear Classifier (GLC). Using only embeddings resulted in performance comparable to i-vectors, and by performing score-level fusion we achieved a 7.3% relative improvement over the baseline.

English Abstract

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance, but unlike the i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages. In particular, we trained a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to the embeddings. Finally, we add a softmax output layer and train the whole network with a multi-class cross-entropy objective to discriminate between languages. We report our results on NIST LRE 2015 and compare the performance of embeddings and the corresponding i-vectors, both modeled by a Gaussian Linear Classifier (GLC). Using only embeddings resulted in performance comparable to i-vectors, and by performing score-level fusion we achieved a 7.3% relative improvement over the baseline.
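
The embedding extractor described in the abstract (BLSTM layers, mean/standard-deviation pooling, two fully connected embedding layers, and a softmax output trained with multi-class cross-entropy) can be sketched roughly as follows. This is a minimal PyTorch illustration only; the layer widths, feature dimensionality, number of BLSTM layers, and the use of PyTorch itself are assumptions, not details taken from the paper.

# Illustrative sketch of a BLSTM embedding extractor for LID (assumed sizes).
import torch
import torch.nn as nn

class LIDEmbeddingNet(nn.Module):
    def __init__(self, feat_dim=23, hidden=256, emb_dim=512, n_languages=20):
        super().__init__()
        # Bidirectional LSTM processing the utterance frame by frame.
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        # Two fully connected layers after pooling; their activations serve
        # as the fixed-length utterance embeddings.
        self.fc1 = nn.Linear(2 * 2 * hidden, emb_dim)  # mean + std of BLSTM outputs
        self.fc2 = nn.Linear(emb_dim, emb_dim)
        # Softmax output layer (softmax applied inside the loss below).
        self.out = nn.Linear(emb_dim, n_languages)

    def forward(self, x):
        # x: (batch, frames, feat_dim)
        h, _ = self.blstm(x)                            # (batch, frames, 2*hidden)
        # Sequence summarization: pool frame outputs into mean and std statistics.
        stats = torch.cat([h.mean(dim=1), h.std(dim=1)], dim=1)
        emb_a = torch.relu(self.fc1(stats))
        emb_b = torch.relu(self.fc2(emb_a))
        return self.out(emb_b), emb_a, emb_b           # logits + embeddings

if __name__ == "__main__":
    # Training uses multi-class cross-entropy over the language labels.
    net = LIDEmbeddingNet()
    feats = torch.randn(4, 300, 23)                     # 4 utterances, 300 frames each
    logits, emb_a, emb_b = net(feats)
    loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 20, (4,)))
    loss.backward()

At test time the softmax layer would be discarded and the fully connected activations used as the utterance-level representation that replaces (or complements) the i-vector.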

Documents

BibTeX


@inproceedings{BUT155045,
  author="Alicia {Lozano Díez} and Oldřich {Plchot} and Pavel {Matějka} and Joaquin {Gonzalez-Rodriguez}",
  title="DNN Based Embeddings for Language Recognition",
  annote="In this work, we present a language identification (LID) system based on
embeddings. In our case, an embedding is a fixed-length vector (similar to
i-vector) that represents the whole utterance, but unlike i-vector it is designed
to contain mostly information relevant to the target task (LID). In order to
obtain these embeddings, we train a deep neural network (DNN) with sequence
summarization layer to classify languages. In particular, we trained a DNN based
on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN)
layers, whose frame-by-frame outputs are summarized into mean and standard
deviation statistics. After this pooling layer, we add two fully connected layers
whose outputs correspond to embeddings. Finally, we add a softmax output layer
and train the whole network with multi-class cross-entropy objective to
discriminate between languages. We report our results on NIST LRE 2015 and we
compare the performance of embeddings and corresponding i-vectors both modeled by
Gaussian Linear Classifier (GLC). Using only embeddings resulted in comparable
performance to i-vectors and by performing score-level fusion we achieved 7.3%
relative improvement over the baseline.",
  address="IEEE Signal Processing Society",
  booktitle="Proceedings of ICASSP 2018",
  chapter="155045",
  doi="10.1109/ICASSP.2018.8462403",
  edition="NEUVEDEN",
  howpublished="electronic, physical medium",
  institution="IEEE Signal Processing Society",
  year="2018",
  month="april",
  pages="5184--5188",
  publisher="IEEE Signal Processing Society",
  type="conference paper"
}
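
The abstract also mentions modeling both embeddings and i-vectors with a Gaussian Linear Classifier (GLC) and combining the systems by score-level fusion. The NumPy sketch below shows one common reading of such a back-end, assuming per-language means with a shared within-class covariance and a simple weighted-sum fusion; the function names and the fusion weight are illustrative, not values from the paper.

# Rough sketch of a Gaussian Linear Classifier back-end and score-level fusion.
import numpy as np

def train_glc(X, y, n_classes):
    """Per-class means + shared covariance -> linear scoring parameters."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    centered = X - means[y]
    cov = centered.T @ centered / X.shape[0]
    prec = np.linalg.inv(cov + 1e-6 * np.eye(X.shape[1]))
    W = means @ prec                                          # linear term per language
    b = -0.5 * np.einsum("cd,dk,ck->c", means, prec, means)   # per-language offset
    return W, b

def glc_scores(X, W, b):
    """Language scores, linear in the input vectors (shared terms dropped)."""
    return X @ W.T + b

def fuse(scores_embedding, scores_ivector, alpha=0.5):
    """Score-level fusion as a weighted sum; weights would normally be
    trained on a development set, alpha=0.5 is a placeholder."""
    return alpha * scores_embedding + (1.0 - alpha) * scores_ivector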