Publication details

GPU Optimization of Convolution for Large 3-D Real Images

Original title

GPU Optimization of Convolution for Large 3-D Real Images

Language

English

Original abstract

In this paper, we propose a method for computing the convolution of large real-valued 3-D images. The convolution is performed in the frequency domain using the convolution theorem. Owing to the properties of real signals, the algorithm can be optimized so that both the computation time and the memory consumption are halved compared to complex signals of the same size. The convolution is decomposed in the frequency domain using the decimation-in-frequency (DIF) algorithm. The algorithm is accelerated on graphics hardware by means of the CUDA parallel computing model, achieving up to a 10x speedup with a single GPU over an optimized implementation on a quad-core CPU.
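The two ideas in the abstract can be illustrated with a small CPU sketch. This is not the authors' CUDA implementation. It assumes NumPy, circular (periodic) convolution, and even-sized axes, and all function names (`fft_convolve_real`, `dif_split`, `dif_convolve`, `_twiddle`) are hypothetical. `fft_convolve_real` applies the convolution theorem with real-input FFTs (`rfftn`), which keep only the non-redundant half of the Hermitian-symmetric spectrum. `dif_split` performs one radix-2 decimation-in-frequency step. Because a pointwise product never mixes even and odd frequency bins, the convolution then splits into two independent half-size sub-problems. Reading DIF as a way to fit each part into limited memory is an assumption made for this illustration.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the authors' CUDA code.


def fft_convolve_real(image, kernel):
    """Circular 3-D convolution of real arrays via the convolution theorem.

    rfftn keeps only the non-redundant half of the Hermitian-symmetric
    spectrum (the last axis has n // 2 + 1 bins), which is the source of
    the roughly 2x saving in memory and work for real-valued input.
    """
    shape = image.shape
    spectrum = np.fft.rfftn(image) * np.fft.rfftn(kernel, s=shape)  # kernel zero-padded
    return np.fft.irfftn(spectrum, s=shape)


def _twiddle(n, ndim, axis, sign):
    """exp(sign * 2*pi*i * j / n) for j < n // 2, shaped to broadcast along `axis`."""
    shape = [1] * ndim
    shape[axis] = n // 2
    return np.exp(sign * 2j * np.pi * np.arange(n // 2) / n).reshape(shape)


def dif_split(x, axis=0):
    """One radix-2 decimation-in-frequency step along `axis` (length must be even).

    Returns (a, b) such that the FFT of a gives the even-indexed bins X[0::2]
    and the FFT of b gives the odd-indexed bins X[1::2] along `axis`.
    """
    lo, hi = np.split(x, 2, axis=axis)
    w = _twiddle(x.shape[axis], x.ndim, axis, -1)
    return lo + hi, (lo - hi) * w


def dif_convolve(image, kernel, axis=0):
    """Circular convolution computed as two independent half-size sub-problems."""
    n = image.shape[axis]
    padded = np.zeros(image.shape)
    padded[tuple(slice(0, s) for s in kernel.shape)] = kernel
    xa, xb = dif_split(image, axis)
    ha, hb = dif_split(padded, axis)
    # Even and odd frequency bins never interact in a pointwise product, so each
    # half can be transformed, multiplied, and inverted on its own.
    # (Complex FFTs are used on the halves here for simplicity.)
    ya = np.fft.ifftn(np.fft.fftn(xa) * np.fft.fftn(ha))
    yb = np.fft.ifftn(np.fft.fftn(xb) * np.fft.fftn(hb))
    # Undo the DIF step: ya = y_lo + y_hi and yb = (y_lo - y_hi) * w.
    yb = yb * _twiddle(n, image.ndim, axis, +1)
    return np.concatenate([(ya + yb) / 2, (ya - yb) / 2], axis=axis).real


rng = np.random.default_rng(0)
img = rng.standard_normal((8, 6, 5))
ker = rng.standard_normal((3, 3, 3))
print(np.fft.rfftn(img).shape)  # (8, 6, 3): half spectrum along the last axis
print(np.allclose(fft_convolve_real(img, ker), dif_convolve(img, ker)))  # True
```

Both functions compute the same periodic result. Obtaining a linear (non-wrapping) convolution would additionally require zero-padding the image by the kernel size, which the sketch omits.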

BibTeX


@inproceedings{BUT97536,
  author="Pavel {Karas} and David {Svoboda} and Pavel {Zemčík}",
  title="GPU Optimization of Convolution for Large 3-D Real Images",
  annote="In this paper, we propose a method for computing the convolution of large
real-valued 3-D images. The convolution is performed in the frequency domain
using the convolution theorem. Owing to the properties of real signals, the
algorithm can be optimized so that both the computation time and the memory
consumption are halved compared to complex signals of the same size. The
convolution is decomposed in the frequency domain using the decimation-in-frequency
(DIF) algorithm. The algorithm is accelerated on graphics hardware by means of
the CUDA parallel computing model, achieving up to a 10x speedup with a single
GPU over an optimized implementation on a quad-core CPU.",
  address="Springer Verlag",
  booktitle="Proceedings of ACIVS 2012",
  chapter="97536",
  doi="10.1007/978-3-642-33140-4_6",
  howpublished="print",
  institution="Springer Verlag",
  year="2012",
  month="september",
  pages="59--71",
  publisher="Springer Verlag",
  type="conference paper"
}