Publication detail

BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition

SOCHOR, J.; HEROUT, A.; HAVEL, J.

Original Title

BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition

English Title

BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition

Type

conference paper

Language

en

Original Abstract

We address the problem of fine-grained vehicle make and model recognition and verification. Our contribution is to show that extracting additional data from the video stream, besides the vehicle image itself, and feeding it into a deep convolutional neural network considerably boosts recognition performance. This additional information includes the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and the 3D vehicle orientation. Experiments show that adding such information decreases the classification error by 26% (accuracy improves from 0.772 to 0.832) and boosts verification average precision by 208% (from 0.378 to 0.785) compared to a baseline pure CNN without any input modifications. Moreover, the pure baseline CNN outperforms the recent state-of-the-art solution by 0.081. We provide an annotated set "BoxCars" of surveillance vehicle images augmented with various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.

English Abstract

We address the problem of fine-grained vehicle make and model recognition and verification. Our contribution is to show that extracting additional data from the video stream, besides the vehicle image itself, and feeding it into a deep convolutional neural network considerably boosts recognition performance. This additional information includes the 3D vehicle bounding box used for "unpacking" the vehicle image, its rasterized low-resolution shape, and the 3D vehicle orientation. Experiments show that adding such information decreases the classification error by 26% (accuracy improves from 0.772 to 0.832) and boosts verification average precision by 208% (from 0.378 to 0.785) compared to a baseline pure CNN without any input modifications. Moreover, the pure baseline CNN outperforms the recent state-of-the-art solution by 0.081. We provide an annotated set "BoxCars" of surveillance vehicle images augmented with various automatically extracted auxiliary information. Our approach and the dataset can considerably improve the performance of traffic surveillance systems.
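One of the auxiliary inputs mentioned in the abstract is a rasterized low-resolution shape of the vehicle. A minimal sketch of how such a shape mask could be produced, assuming the vehicle's 2D contour is available as a polygon (this is illustrative only, not the authors' implementation; all names are hypothetical):

```python
# Illustrative sketch: rasterize a 2D contour polygon into a low-resolution
# binary shape mask, one possible form of the auxiliary CNN input described
# in the abstract. Not the paper's code; names and parameters are assumptions.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon
    given as a list of (px, py) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize_shape(polygon, size=8):
    """Sample the polygon on a size x size grid over its bounding box,
    returning a binary mask as a list of rows of 0/1 values."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    mask = []
    for row in range(size):
        # sample at the centre of each grid cell
        y = min_y + (row + 0.5) * (max_y - min_y) / size
        mask.append([
            1 if point_in_polygon(
                min_x + (col + 0.5) * (max_x - min_x) / size, y, polygon)
            else 0
            for col in range(size)
        ])
    return mask

# An axis-aligned square fills its own bounding box, so every cell is set:
square = [(0.0, 0.0), (8.0, 0.0), (8.0, 8.0), (0.0, 8.0)]
mask = rasterize_shape(square, size=4)  # 4x4 mask, all cells are 1
```

A flattened mask like this can be concatenated with other auxiliary signals (e.g. an orientation vector) and fed to the network alongside the vehicle image, which matches the spirit of the input modifications the abstract describes.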

Keywords

Fine-grained recognition, vehicles, CNN, input modification

Released

10.03.2016

Publisher

IEEE Computer Society

Location

Las Vegas

ISBN

978-1-4673-8851-1

Book

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Edition

Not specified

Edition number

Not specified

Pages from

3006

Pages to

3015

Pages count

10

URL

Documents

BibTex


@inproceedings{BUT130949,
  author="Jakub {Sochor} and Adam {Herout} and Jiří {Havel}",
  title="BoxCars: 3D Boxes as CNN Input for Improved Fine-Grained Vehicle Recognition",
  annote="We are dealing with the problem of fine-grained vehicle make\&model recognition
and verification. Our contribution is showing that extracting additional data
from the video stream - besides the vehicle image itself - and feeding it into
the deep convolutional neural network boosts the recognition performance
considerably. This additional information includes: 3D vehicle bounding box used
for ``unpacking'' the vehicle image, its rasterized low-resolution shape, and
information about the 3D vehicle orientation. Experiments show that adding such
information decreases classification error by 26\% (the accuracy is improved from
0.772 to 0.832) and boosts verification average precision by 208\% (0.378 to
0.785) compared to baseline pure CNN without any input modifications. Also, the
pure baseline CNN outperforms the recent state of the art solution by 0.081. We
provide an annotated set ``BoxCars'' of surveillance vehicle images augmented by
various automatically extracted auxiliary information. Our approach and the
dataset can considerably improve the performance of traffic surveillance
systems.",
  address="Las Vegas",
  booktitle="The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
  chapter="130949",
  doi="10.1109/CVPR.2016.328",
  howpublished="online",
  institution="IEEE Computer Society",
  number="6",
  year="2016",
  month="march",
  pages="3006--3015",
  publisher="IEEE Computer Society",
  type="conference paper"
}