Course detail

Speech Processing

FEKT-MPC-ZRE

Acad. year: 2020/2021

The course gives a comprehensive overview of speech processing in verbal communication. It first introduces speech production and perception, the human auditory system and the process of hearing. It then discusses the segmental and suprasegmental parameters frequently used in speech analysis. All major areas of speech processing are covered, in particular speech analysis, pattern recognition, speech synthesis and speech coding. Further topics are pitch analysis, prosody modelling, emotion analysis, analysis of pathological voice, speech de-identification and speech watermarking. Attention is also paid to single-channel and multi-channel speech enhancement methods and to noise cancellation. Finally, subjective and objective methods of assessing speech quality and intelligibility are introduced.

Language of instruction

Czech

Number of ECTS credits

Mode of study

Not applicable.

Guarantor

prof. Ing. Zdeněk Smékal, CSc.

Department

Department of Telecommunications (UTKO)

Learning outcomes of the course unit

On completion of the course, students are able to:
- describe the vocal and auditory tracts, and the way speech is produced and perceived
- analyse speech using the most common segmental and suprasegmental parameters
- apply cepstral and linear predictive analysis
- use machine learning in the field of speech processing (speech recognition, speaker recognition, speech pathology identification, emotion detection, etc.)
- design and implement a text-to-speech system based on concatenative synthesis
- model the vocal tract and perform speech coding
- use objective and subjective tests to assess speech quality and intelligibility
- enhance speech using single- and multi-channel methods
- design a speech watermarking and de-identification system
- process and analyse speech signals in the Matlab environment

Prerequisites

Knowledge at the Bachelor's degree level is required. In addition, students need knowledge of digital signal processing methods and algorithms and basic Matlab programming skills.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

All lectures of the course are prepared as presentations and are available to students via e-learning. The lectures are supplemented with video and audio samples obtained from research projects. In the computer exercises, students design their own speech analysis systems. At the end of the computer exercises, they demonstrate their knowledge by completing an assigned project.

Assessment methods and criteria linked to learning outcomes

The computer lab exercises are mandatory for passing the course, and students have to obtain the required credit from them. Up to 30 of the 100 points can be earned in the computer labs. The remaining 70 points can be earned in the final exam.

Course curriculum

1. Speech production and its perception. Auditory systems and process of hearing
2. Speech signal analysis, segmental and suprasegmental parameters I, fundamental frequency analysis
3. Speech signal analysis, segmental and suprasegmental parameters II
4. Speech signal analysis III, pattern recognition (classification based on distances)
5. Pattern recognition (statistical classifiers)
6. Speech synthesis, text-to-speech systems, prosody modelling
7. Speech coding and its transmission
8. Objective and subjective methods of speech quality and intelligibility assessment
9. One- and multiple-channel speech enhancement methods
10. Emotion analysis and its application
11. Neurodegenerative disorders analysis
12. Speech watermarking, speech de-identification
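
As an illustration of the fundamental frequency analysis listed in topic 2, the following is a minimal sketch of one common approach, autocorrelation peak picking on a single frame. It is not taken from the course materials and is written in Python rather than the Matlab used in the labs. The function name `estimate_f0`, the 60 to 400 Hz search range and the synthetic test signal are assumptions chosen for illustration.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Hypothetical helper for illustration. Returns 0.0 when no positive
    autocorrelation peak is found (for example, for a silent frame).
    """
    n = len(frame)
    if n == 0:
        return 0.0

    # Remove the DC offset so it does not dominate the autocorrelation.
    mean = sum(frame) / n
    x = [s - mean for s in frame]

    # Search only lags that correspond to the assumed F0 range.
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_value:
            best_lag, best_value = lag, r

    return sample_rate / best_lag if best_lag else 0.0


# Synthetic frame: 40 ms of a 200 Hz tone plus one harmonic, sampled at 8 kHz.
sample_rate = 8000
frame = [
    math.sin(2 * math.pi * 200 * k / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 400 * k / sample_rate)
    for k in range(320)
]
print(estimate_f0(frame, sample_rate))
```

On real speech this sketch would still need a voiced/unvoiced decision and protection against octave errors, which it deliberately leaves out.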

Work placements

Not applicable.

Aims

The aim of the course is to give a comprehensive overview of speech communication in information and telecommunication systems. It is intended for students who want to learn basic and advanced techniques of speech processing, analysis, synthesis and coding. Besides the basic principles of speaker identification, students will become familiar with the problem of separating speech from a noisy background, with the principles of automatic speech recognition, and with applications in health monitoring systems. In addition, students will analyse speech in real time in the computer lab exercises.

Specification of controlled education, way of implementation and compensation for absences

The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

PSUTKA, J.; MÜLLER, L.; MATOUŠEK, J.; RADOVÁ, V. Mluvíme s počítačem česky. 1. vyd. Praha: Academia, 2006. ISBN 978-80-200-1309-5. (CS)
SMÉKAL, Z. Zpracování řeči. Brno: Vysoké učení technické v Brně, 2012. s. 1-171. ISBN 978-80-214-4896-4. (CS)

Type of course unit

Lecture

26 hours, optional

Teacher / Lecturer

prof. Ing. Zdeněk Smékal, CSc.

Laboratory exercise

39 hours, compulsory

Teacher / Lecturer

doc. Ing. Jiří Mekyska, Ph.D.

Elearning

eLearning: currently opened course
