Publication details

End-to-end DNN based text-independent speaker recognition for long and short utterances

ROHDIN, J. SILNOVA, A. DIEZ SÁNCHEZ, M. PLCHOT, O. MATĚJKA, P. BURGET, L. GLEMBEK, O.

Original title

End-to-end DNN based text-independent speaker recognition for long and short utterances

Language

English

Original abstract

Recently several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we present an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. The system is then further trained in an end-to-end manner but regularized so that it does not deviate too far from the initial system. In this way we mitigate overfitting which normally limits the performance of end-to-end systems. The proposed system outperforms the i-vector + PLDA baseline on both long and short duration utterances.

BibTeX


@article{BUT158088,
  author="Johan Andréas {Rohdin} and Anna {Silnova} and Mireia {Diez Sánchez} and Oldřich {Plchot} and Pavel {Matějka} and Lukáš {Burget} and Ondřej {Glembek}",
  title="End-to-end DNN based text-independent speaker recognition for long and short utterances",
  annote="Recently several end-to-end speaker verification systems based on deep neural
networks (DNNs) have been proposed. These systems have been proven to be
competitive for text-dependent tasks as well as for text-independent tasks with
short utterances. However, for text-independent tasks with longer utterances,
end-to-end systems are still outperformed by standard i-vector + PLDA systems. In
this work, we present an end-to-end speaker verification system that is
initialized to mimic an i-vector + PLDA baseline. The system is then further
trained in an end-to-end manner but regularized so that it does not deviate too
far from the initial system. In this way we mitigate overfitting which normally
limits the performance of end-to-end systems. The proposed system outperforms the
i-vector + PLDA baseline on both long and short duration utterances.",
  chapter="158088",
  doi="10.1016/j.csl.2019.06.002",
  howpublished="online",
  number="59",
  volume="2020",
  year="2019",
  month="june",
  pages="22--35",
  type="journal article in Web of Science"
}