Publication details

Modeling of Spectra and Temporal Trajectories in Speech Processing, PhD thesis

MOTLÍČEK, P.

Original title

Type

dissertation

Language

English

Original abstract

This work investigates the application of spectral and temporal speech processing algorithms developed for feature extraction in Automatic Speech Recognition (ASR) and for very low bit-rate speech coding. In the first part of the thesis, various spectral feature extraction techniques are investigated for robust parameterization of speech. We focus especially on techniques based on all-pole modeling, which use an autoregressive model as the major processing block to suppress speaker-dependent details in the auditory spectrum. Techniques that use the model spectrum are advantageous compared with using the auditory spectrum of the signal directly. The model spectrum can be represented by various types of parameters that have different properties (decorrelation, quantization, robustness to additive and convolutive noise, ...). We show that even though cepstrum-based speech features are the ones most commonly used for ASR, the best recognition performance is achieved using decorrelated and normalized Line Spectral Frequencies (LSFs). Furthermore, frequency-selective and discrete all-pole modeling approaches are studied, and their effects on the final speech features are presented. We also take feature normalization techniques into account and discuss their influence on the extracted speech features. The most significant experimental results are achieved on the well-known SpeechDat-Car databases, which consist of speech data recorded in real environments. We also show the practical use of all-pole model based features in a front-end module developed for a Distributed Speech Recognition (DSR) system for cellular telephony. Some benefits are achieved across various noisy environments in this application. Further, the work investigates the inclusion of temporal information as an additional speech parameter in ASR. We primarily concentrate on the use of frequency-localized temporal trajectories (patterns).
TempoRAl Patterns (TRAPs) are used to estimate phonetic class posteriors independently in each frequency band. These class posteriors are merged to create a set of features used in a speech recognizer. First, we describe a novel approach to extracting these frequency-localized temporal patterns from speech. As opposed to the conventional technique, our algorithm is designed entirely in the temporal domain, which has several benefits for further processing. Then, we investigate the possible use of standard static feature extraction techniques to model the temporal trajectories. Various linear transforms are taken into account. Frequency components of frequency-localized TRAPs are referred to as modulation spectral components. In our work we also study the effect on ASR of discarding the higher modulation spectral components. In the last part, the use of spectral and temporal processing techniques for Very Low Bit-Rate (VLBR) speech coding is investigated. Speech coding is one of the most prominent real-world applications of speech processing. We study the properties of autoregressive model-based features employed in a speech coder. Our coding approach is based on a proper selection of speech units automatically derived by an Automatic Language Independent Speech Processing (ALISP) tool. The benefits of this approach are that we achieve rates of several hundred bits per second, and that we do not need a transcribed speech database to derive the speech units. Novel techniques for modeling temporal trajectories of speech are proposed to reduce the

Keywords

automatic speech processing, speech recognition, features for speech recognition, autoregressive modeling, all-pole modeling, temporal filtering, neural networks, speech coding, very low bit-rate speech coding

Authors

MOTLÍČEK, P.

Published

1 September 2003

Publisher

Faculty of Information Technology BUT

Place

Brno

Pages from

Pages to

138

Number of pages

138

URL

http://www.fit.vutbr.cz/~motlicek/publi/2003/thesis/root.pdf, http://www.fit.vutbr.cz/~motlicek/publi/2003/thesis/root.ps.gz

BibTeX

@phdthesis{BUT66695,
  author="Petr {Motlíček}",
  title="Modeling of Spectra and Temporal Trajectories in Speech Processing, PhD thesis",
  publisher="Faculty of Information Technology BUT",
  address="Brno",
  pages="1--138",
  year="2003",
  url="http://www.fit.vutbr.cz/~motlicek/publi/2003/thesis/root.pdf, http://www.fit.vutbr.cz/~motlicek/publi/2003/thesis/root.ps.gz"
}
