Publication detail

End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA

ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L.

Original title

End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA

Language

en

Original abstract

Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way, we mitigate overfitting, which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.
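The abstract does not say which regularizer keeps the trained system close to its initialization. As a minimal sketch only, assuming a squared L2 penalty on the distance between the current and the initial parameters (a common choice, not confirmed by this record), the idea can be illustrated on a one-parameter toy problem. All names below (`l2_to_init_penalty`, `regularized_loss`, `fit_toy`, `strength`) are hypothetical and do not come from the paper.

```python
def l2_to_init_penalty(params, init_params):
    """Squared L2 distance between current and initial parameters (assumed regularizer)."""
    return sum((p - p0) ** 2 for p, p0 in zip(params, init_params))


def regularized_loss(task_loss, params, init_params, strength):
    """Task loss plus a penalty that grows as parameters move away from their initial values."""
    return task_loss + strength * l2_to_init_penalty(params, init_params)


def fit_toy(strength, init=1.0, target=3.0, lr=0.1, steps=200):
    """Gradient descent on (p - target)^2 + strength * (p - init)^2, starting from init.

    The first term stands in for the end-to-end training objective and the second
    for the pull towards the initial (baseline-mimicking) system.
    """
    p = init
    for _ in range(steps):
        grad = 2.0 * (p - target) + 2.0 * strength * (p - init)
        p -= lr * grad
    return p


if __name__ == "__main__":
    # Without the penalty the parameter moves all the way to the task optimum;
    # with it, the solution settles between the initial value and the task optimum.
    print(fit_toy(strength=0.0))
    print(fit_toy(strength=1.0))
```

In this toy setting a larger `strength` keeps the solution closer to the starting point, which is the mechanism the abstract credits with mitigating overfitting.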

BibTeX


@inproceedings{BUT155046,
  author="Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Oldřich {Plchot} and Pavel {Matějka} and Lukáš {Burget}",
  title="End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA",
  annote="Recently, several end-to-end speaker verification systems based on deep neural
networks (DNNs) have been proposed. These systems have been proven to be
competitive for text-dependent tasks as well as for text-independent tasks with
short utterances. However, for text-independent tasks with longer utterances,
end-to-end systems are still outperformed by standard i-vector + PLDA systems. In
this work, we develop an end-to-end speaker verification system that is
initialized to mimic an i-vector + PLDA baseline. The system is then further
trained in an end-to-end manner but regularized so that it does not deviate too
far from the initial system. In this way, we mitigate overfitting, which normally
limits the performance of end-to-end systems. The proposed system outperforms the
i-vector + PLDA baseline on both long and short duration utterances.",
  address="IEEE Signal Processing Society",
  booktitle="Proceedings of ICASSP",
  chapter="155046",
  doi="10.1109/ICASSP.2018.8461958",
  edition="not specified",
  howpublished="electronic, physical medium",
  institution="IEEE Signal Processing Society",
  year="2018",
  month="april",
  pages="4874--4878",
  publisher="IEEE Signal Processing Society",
  type="conference paper"
}