Publication detail

Non-parametric Speaker Turn Segmentation of Meeting Data

MOTLÍČEK, P., BURGET, L., ČERNOCKÝ, J.

Language

en

Original abstract

An extension of the conventional speaker segmentation framework is presented for a scenario in which a number of microphones record the activity of speakers present at a meeting (one microphone per speaker). Although each microphone can receive speech from both the participant wearing the microphone (local speech) and other participants (cross-talk), the recorded audio can be broadly classified in three ways: local speech, cross-talk, and silence. This paper proposes a technique which takes into account cross-correlations, the values of their maxima, and energy differences as features to identify and segment speaker turns. In particular, we have used classical cross-correlation functions, time smoothing, and, in part, temporal constraints to sharpen and disambiguate timing differences between microphone channels that may be dominated by noise and reverberation. Experimental results show that the proposed technique can be successfully used for speaker segmentation of data collected from a number of different setups.
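The three feature types named in the abstract (cross-correlation, its maximum, and the energy difference between channels) can be illustrated for one pair of microphone frames. The following is a minimal sketch and not the authors' implementation: the function name `frame_features`, the `max_lag` search range, and the normalization are assumptions made for illustration, and the time smoothing and temporal constraints mentioned in the abstract are not shown.

```python
import numpy as np

def frame_features(local, other, max_lag=32):
    """Illustrative features for one frame from two microphone channels.

    Returns (peak, lag, energy_diff_db). All names and defaults are
    hypothetical; the frames must be longer than max_lag samples.
    """
    local = np.asarray(local, dtype=float)
    other = np.asarray(other, dtype=float)

    # Full cross-correlation; index len(other) - 1 corresponds to lag 0.
    xcorr = np.correlate(local, other, mode="full")
    zero = len(other) - 1
    window = xcorr[zero - max_lag: zero + max_lag + 1]

    # Frame energies, also used to normalize the correlation peak.
    e_local = np.sum(local ** 2)
    e_other = np.sum(other ** 2)
    norm = np.sqrt(e_local * e_other) + 1e-12

    peak_idx = int(np.argmax(window))
    peak = window[peak_idx] / norm      # normalized maximum of the cross-correlation
    lag = peak_idx - max_lag            # positive: `local` is delayed relative to `other`

    # Energy difference between the two channels in decibels.
    energy_diff_db = 10.0 * np.log10((e_local + 1e-12) / (e_other + 1e-12))
    return peak, lag, energy_diff_db
```

Under these assumptions, a positive lag combined with a negative energy difference suggests that the sound reached the other microphone first and louder, which is what one would expect for cross-talk rather than local speech.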

BibTeX


@inproceedings{BUT18288,
  author="Petr {Motlíček} and Lukáš {Burget} and Jan {Černocký}",
  title="Non-parametric Speaker Turn Segmentation of Meeting Data",
  annote="An extension of the conventional speaker segmentation framework is presented for a scenario in which a number of microphones record the activity of speakers present at a meeting (one microphone per speaker). Although each microphone can receive speech from both the participant wearing the microphone (local speech) and other participants (cross-talk), the recorded audio can be broadly classified in three ways: local speech, cross-talk, and silence. This paper proposes a technique which takes into account cross-correlations, the values of their maxima, and energy differences as features to identify and segment speaker turns. In particular, we have used classical cross-correlation functions, time smoothing, and, in part, temporal constraints to sharpen and disambiguate timing differences between microphone channels that may be dominated by noise and reverberation. Experimental results show that the proposed technique can be successfully used for speaker segmentation of data collected from a number of different setups.",
  address="International Speech Communication Association",
  booktitle="Interspeech'2005 - Eurospeech - 9th European Conference on Speech Communication and Technology",
  chapter="18288",
  institution="International Speech Communication Association",
  journal="5th European Conference EUROSPEECH 97",
  number="9",
  year="2005",
  month="september",
  pages="657--660",
  publisher="International Speech Communication Association",
  type="conference paper"
}