Publication detail

PHONEME RECOGNITION OF MEETINGS USING AUDIO-VISUAL DATA

MOTLÍČEK, P., BURGET, L., ČERNOCKÝ, J.

Original title

PHONEME RECOGNITION OF MEETINGS USING AUDIO-VISUAL DATA

English title

PHONEME RECOGNITION OF MEETINGS USING AUDIO-VISUAL DATA

Language

en

Original abstract

The movements of speakers' faces are known to convey visual information that can improve speech intelligibility, especially when the audio is corrupted or noisy. The availability of visual data can therefore be exploited to enhance the automatic speech recognition task. This paper demonstrates the use of visual parameters extracted from video for automatic recognition of context-independent phoneme strings from meeting data. Encouraged by the good performance of audio-visual systems designed for "visually clean" data (limited variation in the speaker's frontal pose, lighting conditions, background, etc.), we investigate their efficiency under the non-ideal conditions introduced by the meeting audio-visual data employed in our experiments. A major issue is the phoneme recognition task based on the combination of audio and visual data so that the best use can be made of the two modalities together.

English abstract

The movements of speakers' faces are known to convey visual information that can improve speech intelligibility, especially when the audio is corrupted or noisy. The availability of visual data can therefore be exploited to enhance the automatic speech recognition task. This paper demonstrates the use of visual parameters extracted from video for automatic recognition of context-independent phoneme strings from meeting data. Encouraged by the good performance of audio-visual systems designed for "visually clean" data (limited variation in the speaker's frontal pose, lighting conditions, background, etc.), we investigate their efficiency under the non-ideal conditions introduced by the meeting audio-visual data employed in our experiments. A major issue is the phoneme recognition task based on the combination of audio and visual data so that the best use can be made of the two modalities together.
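The abstract refers to combining the audio and visual streams so that the best use is made of both modalities. As an illustration only (the paper's actual fusion method is not described on this page), the Python sketch below shows one common approach in audio-visual recognition: a weighted log-linear fusion of per-frame phoneme log-likelihoods from an audio model and a visual model. The function name, the stream weight, and the toy scores are assumptions for the example, not details taken from the paper.

import numpy as np

def fuse_streams(audio_loglik, visual_loglik, audio_weight=0.7):
    """Weighted log-linear combination of per-frame phoneme log-likelihoods.

    audio_loglik, visual_loglik: arrays of shape (frames, phonemes)
    audio_weight: reliability weight for the audio stream (1 - audio_weight
    is applied to the visual stream)
    """
    return audio_weight * audio_loglik + (1.0 - audio_weight) * visual_loglik

# Toy example: 3 frames, 4 phoneme classes; random scores stand in for
# real acoustic/visual model outputs.
rng = np.random.default_rng(0)
audio = np.log(rng.dirichlet(np.ones(4), size=3))
video = np.log(rng.dirichlet(np.ones(4), size=3))
combined = fuse_streams(audio, video, audio_weight=0.7)
print(combined.argmax(axis=1))  # per-frame phoneme decisions after fusion

In such a scheme the stream weight is typically tuned on held-out data, reflecting how reliable each modality is under the given recording conditions.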

Documents

BibTeX


@misc{BUT60054,
  author="Petr {Motlíček} and Lukáš {Burget} and Jan {Černocký}",
  title="PHONEME RECOGNITION OF MEETINGS USING AUDIO-VISUAL DATA",
  annote="The movements of speakers' faces are known to convey visual
information that can improve speech intelligibility, especially when the
audio is corrupted or noisy. The availability of visual data can therefore
be exploited to enhance the automatic speech recognition task. This paper
demonstrates the use of visual parameters extracted from video for
automatic recognition of context-independent phoneme strings from meeting
data. Encouraged by the good performance of audio-visual systems designed
for ``visually clean'' data (limited variation in the speaker's frontal
pose, lighting conditions, background, etc.), we investigate their
efficiency under the non-ideal conditions introduced by the meeting
audio-visual data employed in our experiments. A major issue is the phoneme
recognition task based on the combination of audio and visual data so that
the best use can be made of the two modalities together.",
  booktitle="AMI Workshop",
  chapter="60054",
  year="2004",
  month="june",
  type="abstract"
}