Publication detail

Semantic Class Detectors in Video Genre Recognition

Original title

Semantic Class Detectors in Video Genre Recognition

Language

en

Original abstract

This paper presents our approach to video genre recognition, which we developed for the MediaEval 2011 evaluation. We treat the genre recognition task as a classification problem. We encode visual information in a standard way using local features and a Bag of Words (BOW) representation. The audio channel is parameterized in a similar way, starting from its spectrogram. Further, we exploit the available automatic speech transcripts and user-generated meta-data, for which we compute BOW representations as well. It is reasonable to expect that the semantic content of a video is strongly related to its genre, and if this semantic information were available, it would make genre recognition simpler and more reliable. To this end, we used annotations for 345 semantic classes from the TRECVID 2011 semantic indexing task to train semantic class detectors. Responses of these detectors were then used as features for genre recognition. The paper explains the approach in detail, shows the relative performance of the individual features and their combinations measured on the MediaEval 2011 genre recognition dataset, and sketches possible future research. The results show that, although meta-data is more informative than the content-based features, results are improved by adding content-based information to the meta-data. Although the semantic detectors were trained on a completely different dataset, using them as feature extractors on the target dataset provides better results than the original low-level audio and video features.
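
The two-stage idea in the abstract (a BOW encoding of local features, then semantic detector responses reused as genre features) can be illustrated with a minimal sketch. The abstract does not say which detector or classifier type was used, so the linear detectors, the toy codebook, and the names `bow_histogram` and `detector_responses` are all assumptions made for illustration, not the authors' implementation.

```python
from math import dist

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and
    return an L1-normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def detector_responses(bow, detectors):
    """Apply each (weights, bias) linear detector to the BOW vector.
    The list of responses is the feature vector passed on to a genre classifier.
    Linear detectors are an assumption; the abstract does not name the model."""
    return [sum(w * x for w, x in zip(weights, bow)) + bias
            for weights, bias in detectors]

# Hypothetical toy data: a 2-word codebook and four 2-D local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (0.0, 0.2)]
bow = bow_histogram(descriptors, codebook)

# Two hypothetical semantic class detectors (the paper trains 345 of them).
detectors = [([1.0, 0.0], 0.0), ([0.0, 2.0], -0.5)]
features = detector_responses(bow, detectors)
```

In this sketch `features` has one entry per semantic class, so a genre classifier trained on it sees a short semantic description of the video rather than the raw low-level histogram.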

BibTeX


@inproceedings{BUT91447,
  author="Michal {Hradiš} and Ivo {Řezníček} and Kamil {Behúň}",
  title="Semantic Class Detectors in Video Genre Recognition",
  annote="This paper presents our approach to video genre recognition, which we
developed for the MediaEval 2011 evaluation. We treat the genre recognition task
as a classification problem. We encode visual information in a standard way using
local features and a Bag of Words (BOW) representation. The audio channel is
parameterized in a similar way, starting from its spectrogram. Further, we exploit
the available automatic speech transcripts and user-generated meta-data, for which
we compute BOW representations as well. It is reasonable to expect that the
semantic content of a video is strongly related to its genre, and if this semantic
information were available, it would make genre recognition simpler and more
reliable. To this end, we used annotations for 345 semantic classes from the
TRECVID 2011 semantic indexing task to train semantic class detectors. Responses
of these detectors were then used as features for genre recognition. The paper
explains the approach in detail, shows the relative performance of the individual
features and their combinations measured on the MediaEval 2011 genre recognition
dataset, and sketches possible future research. The results show that, although
meta-data is more informative than the content-based features, results are
improved by adding content-based information to the meta-data. Although the
semantic detectors were trained on a completely different dataset, using them as
feature extractors on the target dataset provides better results than the original
low-level audio and video features.",
  address="SciTePress - Science and Technology Publications",
  booktitle="Proceedings of VISAPP 2012",
  chapter="91447",
  edition="not specified",
  howpublished="print",
  institution="SciTePress - Science and Technology Publications",
  year="2012",
  month="february",
  pages="640--646",
  publisher="SciTePress - Science and Technology Publications",
  type="conference paper"
}