Modern Methods of Speech Processing
FIT-MZD Acad. year: 2019/2020
From simple systems to stochastic modelling. Hidden Markov
models. Large vocabulary continuous speech recognition. Language
models. Speech production, speech perception: time and frequency.
Data-driven methods for feature extraction. Speech databases.
Excitation in speech coding, CELP. Speaker identification.
Learning outcomes of the course unit
This course allows students to implement simple speech processing
applications, such as voice command of a process. Above all, however,
it enables them to join the development of complex speech recognition
and coding systems that use modern methods, in both
academic and industrial environments.
Basic knowledge of digital signal processing; having attended a basic course on speech processing is an advantage.
Recommended optional programme components
Recommended or required reading
Moore, B.C.J.: An introduction to the psychology of hearing, Academic Press, 1989 (EN)
Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998 (EN)
Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press, 1990 (EN)
Vapnik, V. N.: Statistical Learning Theory, Wiley-Interscience, 1998 (EN)
Dutoit, T.: An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997 (EN)
Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995 (EN)
Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000 (EN)
Texts from http://www.fit.vutbr.cz/~cernocky/speech/ (EN)
Planned learning activities and teaching methods
Assessment methods and criteria linked to learning outcomes
Language of instruction
We will mention methods currently implemented in industrial
applications (such as mobile phones or commercially available
recognizers) but will not omit promising methods that so far exist only in
laboratories. Attention will be paid to techniques derived from data
and inspired by human audition and speech production.
Specification of controlled education, way of implementation and compensation for absences
Attendance is not checked; the course is evaluated on the basis of the exam or final report results.
Type of course unit
39 hours, optionally
Teacher / Lecturer
- Review of notions: signal vectors and parameter matrices, basic statistics.
- Stochastic modeling of parameters, modeling of time by state sequences.
- Hidden Markov models: basic structure, training.
- Recognition of speech using HMM: Viterbi search, token passing.
- Pronunciation dictionaries and language models.
- Speech production and derived parameters: LPC, Log area ratios, line spectral pairs.
- Speech perception and derived parameters: Mel-frequency cepstral coefficients, Perceptual linear prediction.
- Temporal properties of hearing - RASTA filtering.
- Training the feature extractor on the data - linear discriminant analysis.
- Speech databases: standards, contents, speakers, annotations.
- Vocoders and modeling of the excitation: multi-pulse and stochastic excitations (GSM coding).
- CELP coding: long-term predictor, codebooks. Very low bit-rate coders.
- Current methods of speaker identification and verification.
Guided consultation in combined form of studies
26 hours, optionally
Teacher / Lecturer
eLearning: currently opened course