Publication Detail

Brno University of Technology at MediaEval 2011 Genre Tagging Task

Original Title

Brno University of Technology at MediaEval 2011 Genre Tagging Task

English Title

Brno University of Technology at MediaEval 2011 Genre Tagging Task

Language

en

Original Abstract

This paper briefly describes our approach to the video genre tagging task, which was part of MediaEval 2011. We focused mainly on visual and audio information and exploited metadata and automatic speech transcripts only in a very basic way. Our approach relied on classification and on classifier fusion to combine different sources of information. We did not use any additional training data beyond the very small exemplary set provided by MediaEval (only 246 videos). The best performance was achieved by metadata alone; combining it with the other sources of information did not improve results in the submitted runs, although an improvement was achieved later by choosing more suitable fusion weights. Excluding the metadata, audio and video gave better results than speech transcripts. Using classifiers for 345 semantic classes from the TRECVID 2011 semantic indexing (SIN) task to project the data worked better than classifying directly from video and audio features.

English Abstract

This paper briefly describes our approach to the video genre tagging task, which was part of MediaEval 2011. We focused mainly on visual and audio information and exploited metadata and automatic speech transcripts only in a very basic way. Our approach relied on classification and on classifier fusion to combine different sources of information. We did not use any additional training data beyond the very small exemplary set provided by MediaEval (only 246 videos). The best performance was achieved by metadata alone; combining it with the other sources of information did not improve results in the submitted runs, although an improvement was achieved later by choosing more suitable fusion weights. Excluding the metadata, audio and video gave better results than speech transcripts. Using classifiers for 345 semantic classes from the TRECVID 2011 semantic indexing (SIN) task to project the data worked better than classifying directly from video and audio features.

BibTex


@inproceedings{BUT91115,
  author="Michal {Hradiš} and Ivo {Řezníček} and Kamil {Behúň}",
  title="Brno University of Technology at MediaEval 2011 Genre Tagging Task",
  annote="This paper briefly describes our approach to the video genre tagging task,
which was part of MediaEval 2011. We focused mainly on visual and audio information
and exploited metadata and automatic speech transcripts only in a very basic way.
Our approach relied on classification and on classifier fusion to combine different
sources of information. We did not use any additional training data beyond the very
small exemplary set provided by MediaEval (only 246 videos). The best performance
was achieved by metadata alone; combining it with the other sources of information
did not improve results in the submitted runs, although an improvement was achieved
later by choosing more suitable fusion weights. Excluding the metadata, audio and
video gave better results than speech transcripts. Using classifiers for 345
semantic classes from the TRECVID 2011 semantic indexing (SIN) task to project the
data worked better than classifying directly from video and audio features.",
  address="CEUR-WS.org",
  booktitle="Working Notes Proceedings of the MediaEval 2011 Workshop",
  chapter="91115",
  howpublished="online",
  institution="CEUR-WS.org",
  number="9",
  year="2011",
  month=sep,
  pages="1--2",
  publisher="CEUR-WS.org",
  type="conference paper"
}