Publication detail

Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources

BENEŠ, K.; IRIE, K.; BECK, E.; SCHLÜTER, R.; NEY, H.

Original Title

Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources

English Title

Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources

Type

conference paper

Language

en

Original Abstract

Classically, automatic speech recognition (ASR) models are decomposed into acoustic models and language models (LMs). LMs usually exploit the linguistic structure on a purely textual level and usually contribute strongly to an ASR system's performance. LMs are estimated on large amounts of textual data covering the target domain. However, most utterances cover more specific topics, e.g. influencing the vocabulary used. Therefore, it is desirable to have the LM adjusted to an utterance's topic. Previous work achieves this by crawling extra data from the web or by using significant amounts of previous speech data to train topic-specific LMs on. We propose a way of adapting the LM directly using the target utterance to be recognized. The corresponding adaptation needs to be done in an unsupervised or automatically supervised way based on the speech input. To deal with the corresponding errors robustly, we employ topic encodings from the recently proposed Subspace Multinomial Model. This model also avoids any need for explicit topic labelling during training or recognition, making the proposed method straightforward to use. We demonstrate the performance of the method on the LibriSpeech corpus, which consists of read fiction books, and we discuss its behaviour qualitatively.

English abstract

Classically, automatic speech recognition (ASR) models are decomposed into acoustic models and language models (LMs). LMs usually exploit the linguistic structure on a purely textual level and usually contribute strongly to an ASR system's performance. LMs are estimated on large amounts of textual data covering the target domain. However, most utterances cover more specific topics, e.g. influencing the vocabulary used. Therefore, it is desirable to have the LM adjusted to an utterance's topic. Previous work achieves this by crawling extra data from the web or by using significant amounts of previous speech data to train topic-specific LMs on. We propose a way of adapting the LM directly using the target utterance to be recognized. The corresponding adaptation needs to be done in an unsupervised or automatically supervised way based on the speech input. To deal with the corresponding errors robustly, we employ topic encodings from the recently proposed Subspace Multinomial Model. This model also avoids any need for explicit topic labelling during training or recognition, making the proposed method straightforward to use. We demonstrate the performance of the method on the LibriSpeech corpus, which consists of read fiction books, and we discuss its behaviour qualitatively.

Keywords

speech recognition

Released

18.03.2019

Publisher

DEGA Head office, Deutsche Gesellschaft für Akustik

Location

Rostock

ISBN

978-3-939296-14-0

Book

Proceedings of DAGA 2019

Edition

Not specified

Edition number

Not specified

Pages from

954

Pages to

957

Pages count

4

URL

Documents

BibTex


@inproceedings{BUT160005,
  author="Karel {Beneš}",
  title="Unsupervised Language Model Adaptation for Speech Recognition with no Extra Resources",
  annote="Classically, automatic speech recognition (ASR) models are decomposed into
acoustic models and language models (LMs). LMs usually exploit the linguistic
structure on a purely textual level and usually contribute strongly to an ASR
system's performance. LMs are estimated on large amounts of textual data covering
the target domain. However, most utterances cover more specific topics, e.g.
influencing the vocabulary used. Therefore, it is desirable to have the LM adjusted
to an utterance's topic. Previous work achieves this by crawling extra data from
the web or by using significant amounts of previous speech data to train
topic-specific LMs on. We propose a way of adapting the LM directly using the
target utterance to be recognized. The corresponding adaptation needs to be done
in an unsupervised or automatically supervised way based on the speech input. To
deal with the corresponding errors robustly, we employ topic encodings from the
recently proposed Subspace Multinomial Model. This model also avoids any need for
explicit topic labelling during training or recognition, making the proposed
method straightforward to use. We demonstrate the performance of the method on
the LibriSpeech corpus, which consists of read fiction books, and we discuss its
behaviour qualitatively.",
  address="DEGA Head office, Deutsche Gesellschaft für Akustik",
  booktitle="Proceedings of DAGA 2019",
  chapter="160005",
  edition="NEUVEDEN",
  howpublished="online",
  institution="DEGA Head office, Deutsche Gesellschaft für Akustik",
  year="2019",
  month="march",
  pages="954--957",
  publisher="DEGA Head office, Deutsche Gesellschaft für Akustik",
  type="conference paper"
}