Publication detail

Automatic Language Identification using Phoneme and Automatically Derived Unit Strings

MATĚJKA, P., SZŐKE, I., SCHWARZ, P., ČERNOCKÝ, J.

Original Title

Automatic Language Identification using Phoneme and Automatically Derived Unit Strings

English Title

Automatic Language Identification using Phoneme and Automatically Derived Unit Strings

Type

conference paper

Language

en

Original Abstract

Language identification (LID) based on phono-tactic modeling is presented in this paper. Approaches using phoneme strings and strings of units automatically derived by an Ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from OGI multi-language-corpus and Czech SpeechDat-E. The LID results are obtained on 4 languages. The results show superiority of Czech phoneme recognizer while used in LID and promising trends using the EHMM-derived units.

English abstract

This paper presents language identification (LID) based on phonotactic modeling. Approaches using phoneme strings and strings of units automatically derived by an ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from the OGI multi-language corpus and on Czech SpeechDat-E. LID results are reported for 4 languages. They show that the Czech phoneme recognizer performs best when used for LID, and that the EHMM-derived units show promising trends.

Keywords

language identification, phoneme recognizer, speech processing, ergodic hidden Markov model

RIV year

2004

Released

08.09.2004

Publisher

Springer

Location

Brno

ISBN

3-540-23049-1

Book

Proceedings of 7th International Conference Text, Speech and Dialogue 2004

Pages from

147

Pages to

154

Pages count

8

BibTeX


@inproceedings{BUT11955,
  author="Pavel {Matějka} and Igor {Szőke} and Petr {Schwarz} and Jan {Černocký}",
  title="Automatic Language Identification using Phoneme and Automatically Derived Unit Strings",
  annote="Language identification (LID) based on phono-tactic modeling is presented in this paper.
Approaches using phoneme strings and strings of units automatically derived by an Ergodic HMM
(EHMM) are compared. The phoneme recognizers were trained on 6 languages from OGI
multi-language-corpus and Czech SpeechDat-E. The LID results are obtained on 4 languages. The
results show superiority of Czech phoneme recognizer while used in LID and promising trends using the EHMM-derived units.",
  address="Brno",
  booktitle="Proceedings of 7th International Conference Text, Speech and Dialogue 2004",
  chapter="11955",
  institution="Springer",
  year="2004",
  month="september",
  pages="147--154",
  publisher="Springer",
  type="conference paper"
}