Course detail

Modern Methods of Speech Processing

FIT-MZD

Acad. year: 2020/2021

From simple systems to stochastic modelling. Hidden Markov models. Large vocabulary continuous speech recognition. Language models. Speech production, speech perception: time and frequency. Data-driven methods for feature extraction. Speech databases. Excitation in speech coding, CELP. Speaker identification.

Language of instruction

Czech

Number of ECTS credits

Mode of study

Not applicable.

Guarantor

prof. Dr. Ing. Jan Černocký

Department

Department of Computer Graphics and Multimedia (UPGM)

Learning outcomes of the course unit

This course enables students to implement simple speech processing applications, for example voice command of a process. Above all, it enables them to join the development of complex speech recognition and coding systems that use modern methods, in both academic and industrial environments.

Prerequisites

Basic knowledge of digital signal processing; having attended a basic course on speech processing is an advantage.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Not applicable.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

We will cover methods currently implemented in industrial applications (such as mobile phones or commercially available recognizers), but will not omit promising methods that so far exist only in laboratories. Attention will be paid to techniques derived from data and inspired by human audition and speech production.

Specification of controlled education, way of implementation and compensation for absences

Attendance is not checked; the course is evaluated on the basis of the exam or final report results.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Recommended reading

Moore, B.C.J.: An Introduction to the Psychology of Hearing, Academic Press, 1989
Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998
Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press, 1990
Vapnik, V. N.: Statistical Learning Theory, Wiley-Interscience, 1998
Dutoit, T.: An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997
Ben Gold, Nelson Morgan, Dan Ellis: Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Wiley-Interscience, 2nd edition, 2011.
Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995
Dong Yu, Li Deng: Automatic Speech Recognition: A Deep Learning Approach, Springer, 2014.
Homayoon Beigi: Fundamentals of Speaker Recognition, Springer, 2011
Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000
Daniel Jurafsky, James H. Martin: Speech and Language Processing, 2nd edition, Prentice Hall, 2008.
Texts from http://www.fit.vutbr.cz/~cernocky/speech/

Classification of course in study plans

Programme VTI-DR-4 Doctoral
branch DVI4, any year of study, winter semester, elective

Type of course unit

Lecture

39 hours, optional

Teacher / Lecturer

prof. Dr. Ing. Jan Černocký

Syllabus

Review of notions: signal vectors and parameter matrices, basic statistics.
Stochastic modeling of parameters, modeling of time by state sequences.
Hidden Markov models: basic structure, training.
Recognition of speech using HMM: Viterbi search, token passing.
Pronunciation dictionaries and language models.
Speech production and derived parameters: LPC, Log area ratios, line spectral pairs.
Speech perception and derived parameters: Mel-frequency cepstral coefficients, Perceptual linear prediction.
Temporal properties of hearing - RASTA filtering.
Training the feature extractor on the data - linear discriminant analysis.
Speech databases: standards, contents, speakers, annotations.
Vocoders and modeling of the excitation: multi-pulse and stochastic excitations (GSM coding).
CELP coding: long-term predictor, codebooks. Very low bit-rate coders.
Current methods of speaker identification and verification.

Guided consultation in combined form of studies

26 hours, optional

Teacher / Lecturer

prof. Dr. Ing. Jan Černocký
