Publication detail

Comparison of methods for language-dependent and language-independent query-by-example spoken term detection

Original title

Comparison of methods for language-dependent and language-independent query-by-example spoken term detection

Language

en

Original abstract

This article investigates query-by-example (QbE) spoken term detection (STD), in which the query is not entered as text, but selected in speech data or spoken. Two feature extractors based on neural networks (NN) are introduced: the first producing phone-state posteriors and the second making use of a compressive NN layer. They are combined with three different QbE detectors: while the Gaussian mixture model/hidden Markov model (GMM/HMM) and dynamic time warping (DTW) both work on continuous feature vectors, the third one, based on weighted finite-state transducers (WFST), processes phone lattices.

BibTeX


@article{BUT97057,
  author="Javier {Tejedor} and Michal {Fapšo} and Igor {Szőke} and Jan {Černocký} and František {Grézl}",
  title="Comparison of methods for language-dependent and language-independent query-by-example spoken term detection",
  annote="This article investigates query-by-example (QbE) spoken term detection (STD), in
which the query is not entered as text, but selected in speech data or spoken.
Two feature extractors based on neural networks (NN) are introduced: the first
producing phone-state posteriors and the second making use of a compressive NN
layer. They are combined with three different QbE detectors: while the Gaussian
mixture model/hidden Markov model (GMM/HMM) and dynamic time warping (DTW) both
work on continuous feature vectors, the third one, based on weighted finite-state
transducers (WFST), processes phone lattices.",
  address="Association for Computing Machinery",
  journal="ACM Transactions on Information Systems (TOIS)",
  chapter="97057",
  doi="10.1145/2328967.2328971",
  edition="not stated",
  howpublished="print",
  institution="Association for Computing Machinery",
  number="3",
  volume="30",
  year="2012",
  month="august",
  pages="1--34",
  publisher="Association for Computing Machinery",
  type="journal article - other"
}