Publication detail

iVector-Based Discriminative Adaptation for Automatic Speech Recognition

Original title

iVector-Based Discriminative Adaptation for Automatic Speech Recognition

Language

en

Original abstract

The iVector is a low-dimensional, fixed-length representation of information about the speaker and acoustic environment. To utilize iVectors for adaptation, region-dependent linear transforms (RDLT) are discriminatively trained using the MPE criterion on large amounts of annotated data to extract the relevant information from iVectors and to compensate speech features. The approach was tested on standard CTS data. We found it to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation, we reached a 0.8% additive absolute WER improvement.

BibTeX


@inproceedings{BUT76442,
  author="Martin {Karafiát} and Lukáš {Burget} and Pavel {Matějka} and Ondřej {Glembek} and Jan {Černocký}",
  title="iVector-Based Discriminative Adaptation for Automatic Speech Recognition",
  annote="The iVector is a low-dimensional, fixed-length representation of information about
the speaker and acoustic environment. To utilize iVectors for adaptation,
region-dependent linear transforms (RDLT) are discriminatively trained using the MPE
criterion on large amounts of annotated data to extract the relevant information
from iVectors and to compensate speech features. The approach was tested on
standard CTS data. We found it to be complementary to common adaptation
techniques. On a well-tuned RDLT system with standard CMLLR adaptation, we reached
a 0.8% additive absolute WER improvement.",
  address="IEEE Signal Processing Society",
  booktitle="Proceedings of ASRU 2011",
  chapter="76442",
  edition="not specified",
  howpublished="print",
  institution="IEEE Signal Processing Society",
  year="2011",
  month="december",
  pages="152--157",
  publisher="IEEE Signal Processing Society",
  type="conference paper"
}