Publication detail

Automatic Language Identification using Phoneme and Automatically Derived Unit Strings

MATĚJKA, P., SZŐKE, I., SCHWARZ, P., ČERNOCKÝ, J.

Original Title

Automatic Language Identification using Phoneme and Automatically Derived Unit Strings

English Title

Automatic Language Identification using Phoneme and Automatically Derived Unit Strings

Type

journal article - other

Language

en

Original Abstract

Language identification (LID) based on phono-tactic modeling is presented in this paper. Approaches using phoneme strings and strings of units automatically derived by an Ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from the OGI multi-language corpus and Czech SpeechDat-E. The LID results are obtained on 4 languages. The results show the superiority of the Czech phoneme recognizer when used in LID and promising trends for the EHMM-derived units.

English abstract

Language identification (LID) based on phono-tactic modeling is presented in this paper. Approaches using phoneme strings and strings of units automatically derived by an Ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from the OGI multi-language corpus and Czech SpeechDat-E. The LID results are obtained on 4 languages. The results show the superiority of the Czech phoneme recognizer when used in LID and promising trends for the EHMM-derived units.

Keywords

language identification, phoneme recognizer, speech processing, ergodic hidden Markov model

RIV year

2004

Released

08.09.2004

ISSN

0302-9743

Periodical

Lecture Notes in Computer Science

Year of study

2004

Number

3206

Country

DE

Pages from

147

Pages to

154

Pages count

8

BibTeX


@article{BUT45738,
  author="Pavel {Matějka} and Igor {Szőke} and Petr {Schwarz} and Jan {Černocký}",
  title="Automatic Language Identification using Phoneme and Automatically Derived Unit Strings",
  annote="Language identification (LID) based on phono-tactic modeling is presented in this paper. Approaches using phoneme strings and strings of units automatically derived by an Ergodic HMM (EHMM) are compared. The phoneme recognizers were trained on 6 languages from the OGI multi-language corpus and Czech SpeechDat-E. The LID results are obtained on 4 languages. The results show the superiority of the Czech phoneme recognizer when used in LID and promising trends for the EHMM-derived units.",
  booktitle="Lecture Notes in Computer Science",
  chapter="45738",
  journal="Lecture Notes in Computer Science",
  number="3206",
  volume="2004",
  year="2004",
  month="september",
  pages="147--154",
  type="journal article - other"
}