Publication detail

Combination of MFCC and TRAP features for LVCSR of meeting data

KARAFIÁT, M., GRÉZL, F., BURGET, L.

Original title

Combination of MFCC and TRAP features for LVCSR of meeting data

English title

Combination of MFCC and TRAP features for LVCSR of meeting data

Language

en

Original abstract

The aim of this work is to examine TempoRAl Patterns (TRAPs) based feature extraction for the task of large vocabulary continuous speech recognition (LVCSR). Previously, TRAPs based features were mainly used in conjunction with a hybrid NN-HMM recognition system (the connectionist approach). In this work, we use a Tandem-TRAPS system to generate speech features, which are then used as input for a standard GMM-HMM system. This approach allows for more precise modeling of phonetic context (context-dependent models), which is important for LVCSR. Experiments are carried out on the ICSI meetings database. For TRAPS processing, it is shown that the use of frequency differentiation and local operators can significantly improve recognition performance. The performance obtained with TRAPs based features is compared with that of conventional MFCC features. Although stand-alone TRAPs based features never outperform MFCC in our experiments, we report an improvement over MFCC when TRAPs based features and MFCC features are combined. The combined features are created by concatenating the original feature streams and then applying Heteroscedastic Linear Discriminant Analysis (HLDA) to perform decorrelation and dimensionality reduction. Compared to previous work, the main advantage comes from HLDA, which combines the two feature streams optimally without the strong assumptions imposed on the data by previously used transforms (such as PCA and LDA).

English abstract

The aim of this work is to examine TempoRAl Patterns (TRAPs) based feature extraction for the task of large vocabulary continuous speech recognition (LVCSR). Previously, TRAPs based features were mainly used in conjunction with a hybrid NN-HMM recognition system (the connectionist approach). In this work, we use a Tandem-TRAPS system to generate speech features, which are then used as input for a standard GMM-HMM system. This approach allows for more precise modeling of phonetic context (context-dependent models), which is important for LVCSR. Experiments are carried out on the ICSI meetings database. For TRAPS processing, it is shown that the use of frequency differentiation and local operators can significantly improve recognition performance. The performance obtained with TRAPs based features is compared with that of conventional MFCC features. Although stand-alone TRAPs based features never outperform MFCC in our experiments, we report an improvement over MFCC when TRAPs based features and MFCC features are combined. The combined features are created by concatenating the original feature streams and then applying Heteroscedastic Linear Discriminant Analysis (HLDA) to perform decorrelation and dimensionality reduction. Compared to previous work, the main advantage comes from HLDA, which combines the two feature streams optimally without the strong assumptions imposed on the data by previously used transforms (such as PCA and LDA).

BibTeX


@misc{BUT63339,
  author="Martin {Karafiát} and František {Grézl} and Lukáš {Burget}",
  title="Combination of MFCC and TRAP features for LVCSR of meeting data",
  annote="The aim of this work is to examine TempoRAl Patterns (TRAPs) based
feature extraction for the task of large vocabulary continuous speech
recognition (LVCSR). Previously, TRAPs based features were mainly used
in conjunction with a hybrid NN-HMM recognition system (the connectionist
approach). In this work, we use a Tandem-TRAPS system to generate speech
features, which are then used as input for a standard GMM-HMM
system. This approach allows for more precise modeling of phonetic
context (context-dependent models), which is important for LVCSR.
Experiments are carried out on the ICSI meetings database. For TRAPS
processing, it is shown that the use of frequency differentiation and local
operators can significantly improve recognition performance.
The performance obtained with TRAPs based features is compared with that
of conventional MFCC features. Although stand-alone TRAPs based features never
outperform MFCC in our experiments, we report an improvement
over MFCC when TRAPs based features and MFCC features are
combined. The combined features are created by concatenating the
original feature streams and then applying Heteroscedastic Linear
Discriminant Analysis (HLDA) to perform decorrelation and dimensionality
reduction. Compared to previous work, the main advantage comes from
HLDA, which combines the two feature streams optimally without the strong
assumptions imposed on the data by previously used transforms (such as PCA and
LDA).
", chapter="63339", year="2004", month="december", type="presentation" }