Publication detail

BUT System Description to VoxCeleb Speaker Recognition Challenge 2019

ZEINALI, H.; WANG, S.; SILNOVA, A.; MATĚJKA, P.; PLCHOT, O.

Original title

BUT System Description to VoxCeleb Speaker Recognition Challenge 2019

English title

BUT System Description to VoxCeleb Speaker Recognition Challenge 2019

Language

en

Original abstract

In this report, we describe the submission of the Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019. We also provide a brief analysis of different systems on the VoxCeleb-1 test sets. The submitted systems for both the Fixed and Open conditions are fusions of four Convolutional Neural Network (CNN) topologies. The first and second networks have a ResNet34 topology and use two-dimensional CNNs. The last two networks are one-dimensional CNNs based on the x-vector extraction topology. Some of the networks are fine-tuned using additive margin angular softmax. Kaldi FBanks and Kaldi PLPs were used as features. The Fixed and Open systems differ in their training data and fusion strategy. The best systems for the Fixed and Open conditions achieved 1.42% and 1.26% EER on the challenge evaluation set, respectively.

English abstract

In this report, we describe the submission of the Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019. We also provide a brief analysis of different systems on the VoxCeleb-1 test sets. The submitted systems for both the Fixed and Open conditions are fusions of four Convolutional Neural Network (CNN) topologies. The first and second networks have a ResNet34 topology and use two-dimensional CNNs. The last two networks are one-dimensional CNNs based on the x-vector extraction topology. Some of the networks are fine-tuned using additive margin angular softmax. Kaldi FBanks and Kaldi PLPs were used as features. The Fixed and Open systems differ in their training data and fusion strategy. The best systems for the Fixed and Open conditions achieved 1.42% and 1.26% EER on the challenge evaluation set, respectively.

BibTeX


@inproceedings{BUT168476,
  author="Hossein {Zeinali} and Shuai {Wang} and Anna {Silnova} and Pavel {Matějka} and Oldřich {Plchot}",
  title="BUT System Description to VoxCeleb Speaker Recognition Challenge 2019",
  annote="In this report, we describe the submission of the Brno University of Technology (BUT)
team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019. We also provide
a brief analysis of different systems on the VoxCeleb-1 test sets. The submitted systems
for both the Fixed and Open conditions are fusions of four Convolutional Neural Network
(CNN) topologies. The first and second networks have a ResNet34 topology and use
two-dimensional CNNs. The last two networks are one-dimensional CNNs based
on the x-vector extraction topology. Some of the networks are fine-tuned using
additive margin angular softmax. Kaldi FBanks and Kaldi PLPs were used as
features. The Fixed and Open systems differ in their training
data and fusion strategy. The best systems for the Fixed and Open conditions achieved
1.42% and 1.26% EER on the challenge evaluation set, respectively.",
  address="NEUVEDEN",
  booktitle="Proceedings of The VoxCeleb Challenge Workshop 2019",
  chapter="168476",
  edition="NEUVEDEN",
  howpublished="online",
  institution="NEUVEDEN",
  year="2019",
  month="september",
  pages="1--4",
  publisher="NEUVEDEN",
  type="conference paper"
}