Publication detail

Voice activity detection in video mediated communication from gaze

Original title

Voice activity detection in video mediated communication from gaze

English title

Voice activity detection in video mediated communication from gaze

Language

en

Original abstract

This paper discusses the prediction of the active speaker in multi-party video-mediated communication from gaze data. In the explored setting, we predict the voice activity of participants in one room based on gaze recordings of a single participant in another room. The two rooms were connected by high-definition, low-delay audio and video links, and the participants engaged in different activities ranging from casual discussion to simple casual games. We treat the task as a classification problem and evaluate different types of features and parameter settings in the context of a Support Vector Machine classification framework. The results show that speaker activity can be correctly predicted with the proposed approach for 90% of the time for which gaze data are available.

English abstract

This paper discusses the prediction of the active speaker in multi-party video-mediated communication from gaze data. In the explored setting, we predict the voice activity of participants in one room based on gaze recordings of a single participant in another room. The two rooms were connected by high-definition, low-delay audio and video links, and the participants engaged in different activities ranging from casual discussion to simple casual games. We treat the task as a classification problem and evaluate different types of features and parameter settings in the context of a Support Vector Machine classification framework. The results show that speaker activity can be correctly predicted with the proposed approach for 90% of the time for which gaze data are available.
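The approach described in the abstract, classifying windows of gaze data into speaking/not-speaking with an SVM, can be illustrated with a minimal sketch. The feature set and data below are hypothetical stand-ins, not the authors' actual features or pipeline; only the general SVM-classification setup follows the abstract.

```python
# Illustrative sketch: predicting voice activity from gaze features with
# an SVM. All features and data here are synthetic placeholders; the
# paper's real features and parameter settings are not reproduced.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical per-window gaze features: share of fixations on each of
# three remote participants plus mean fixation duration (seconds).
n_windows = 400
X = rng.random((n_windows, 4))

# Toy binary label "remote participant 0 is speaking", loosely tied to
# how much the observer looks at them (a common gaze-speech link).
y = (X[:, 0] + 0.3 * rng.standard_normal(n_windows) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# RBF-kernel SVM; in practice C and gamma would be tuned, as the paper
# evaluates different parameter settings.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

On real gaze recordings the features would come from eye-tracker fixation logs, and accuracy would be measured only over windows where gaze data are available, as in the paper.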

BibTex


@inproceedings{BUT91461,
  author="Michal {Hradiš} and Shahram {Eivazi} and Roman {Bednařík}",
  title="Voice activity detection in video mediated communication from gaze",
  annote="This paper discusses the prediction of the active speaker in multi-party
video-mediated communication from gaze data. In the explored setting, we predict
the voice activity of participants in one room based on gaze recordings of a
single participant in another room. The two rooms were connected by
high-definition, low-delay audio and video links, and the participants engaged
in different activities ranging from casual discussion to simple casual games.
We treat the task as a classification problem and evaluate different types of
features and parameter settings in the context of a Support Vector Machine
classification framework. The results show that speaker activity can be
correctly predicted with the proposed approach for 90% of the time for which
gaze data are available.",
  address="Association for Computing Machinery",
  booktitle="ETRA '12 Proceedings of the Symposium on Eye Tracking Research and Applications",
  chapter="91461",
  doi="10.1145/2168556.2168628",
  edition="NEUVEDEN",
  howpublished="print",
  institution="Association for Computing Machinery",
  year="2012",
  month="march",
  pages="329--332",
  publisher="Association for Computing Machinery",
  type="conference paper"
}