Publication detail

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

ALAM, J.; BOULIANNE, G.; BURGET, L.; DAHMANE, M.; DIEZ SÁNCHEZ, M.; GLEMBEK, O.; LALONDE, M.; LOZANO DÍEZ, A.; MATĚJKA, P.; MIZERA, P.; MOŠNER, L.; NOISEUX, C.; MONTEIRO, J.; NOVOTNÝ, O.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; SLAVÍČEK, J.; STAFYLAKIS, T.; ST-CHARLES, P.; WANG, S.; ZEINALI, H.

Original Title

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

English Title

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

Type

conference paper

Language

en

Original Abstract

We present a condensed description and analysis of the joint submission of the ABC team (BUT, CRIM, Phonexia, Omilia and UAM) to NIST SRE 2019. We concentrate on the challenges that arose during development and analyze the results obtained on the evaluation data and on our development sets. The conversational telephone speech (CMN2) condition is challenging for current state-of-the-art systems, mainly due to the language mismatch between training and test data. We show that a combination of adversarial domain adaptation, backend adaptation and score normalization can mitigate this mismatch. On the VAST condition, we demonstrate the importance of deploying diarization when dealing with multi-speaker utterances and the drastic improvements that can be obtained by combining audio and visual modalities.

English abstract

We present a condensed description and analysis of the joint submission of the ABC team (BUT, CRIM, Phonexia, Omilia and UAM) to NIST SRE 2019. We concentrate on the challenges that arose during development and analyze the results obtained on the evaluation data and on our development sets. The conversational telephone speech (CMN2) condition is challenging for current state-of-the-art systems, mainly due to the language mismatch between training and test data. We show that a combination of adversarial domain adaptation, backend adaptation and score normalization can mitigate this mismatch. On the VAST condition, we demonstrate the importance of deploying diarization when dealing with multi-speaker utterances and the drastic improvements that can be obtained by combining audio and visual modalities.
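The abstract names score normalization as one of the ingredients that mitigates the train/test language mismatch. As a minimal, hedged sketch of the general idea (not necessarily the exact variant used in the ABC submission), symmetric score normalization (S-norm) rescales a raw trial score using the statistics of the enrollment and test sides against a cohort of impostor utterances; the function and cohort names below are illustrative:

```python
import statistics


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores):
    """Symmetric score normalization (S-norm) sketch.

    raw_score            -- verification score of the trial (enroll vs. test)
    enroll_cohort_scores -- scores of the enrollment model against a cohort
    test_cohort_scores   -- scores of the test utterance against a cohort

    The raw score is z-normalized separately with the enrollment-side and
    test-side cohort statistics, and the two normalized scores are averaged.
    """
    mu_e = statistics.mean(enroll_cohort_scores)
    sd_e = statistics.stdev(enroll_cohort_scores)
    mu_t = statistics.mean(test_cohort_scores)
    sd_t = statistics.stdev(test_cohort_scores)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)
```

In practice the cohort is often restricted to the top-scoring impostors per trial side (adaptive S-norm), which makes the normalization more robust to domain shift; this sketch uses the full cohort for simplicity.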

Keywords

speaker verification, NIST SRE, CMN, VAST, system fusion

Released

01.11.2020

Publisher

International Speech Communication Association

Location

Tokyo

Pages from

289

Pages to

295

Pages count

7

URL

BibTeX


@inproceedings{BUT164070,
  author="Jahangir {Alam} and Gilles {Boulianne} and Lukáš {Burget} and Mireia {Diez Sánchez} and Ondřej {Glembek} and Alicia {Lozano Díez} and Pavel {Matějka} and Ladislav {Mošner} and Ondřej {Novotný} and Oldřich {Plchot} and Johan Andréas {Rohdin} and Anna {Silnova} and Themos {Stafylakis} and Shuai {Wang} and Hossein {Zeinali}",
  title="Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge",
  annote="We present a condensed description and analysis of the joint submission of the
ABC team (BUT, CRIM, Phonexia, Omilia and UAM) to NIST SRE 2019. We concentrate
on the challenges that arose during development and analyze the results obtained
on the evaluation data and on our development sets. The conversational telephone
speech (CMN2) condition is challenging for current state-of-the-art systems,
mainly due to the language mismatch between training and test data. We show that
a combination of adversarial domain adaptation, backend adaptation and score
normalization can mitigate this mismatch. On the VAST condition, we demonstrate
the importance of deploying diarization when dealing with multi-speaker
utterances and the drastic improvements that can be obtained by combining audio
and visual modalities.",
  address="Tokyo",
  booktitle="Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop",
  chapter="164070",
  doi="10.21437/Odyssey.2020-41",
  howpublished="online",
  institution="International Speech Communication Association",
  year="2020",
  month="November",
  pages="289--295",
  publisher="International Speech Communication Association",
  type="conference paper"
}