Publication detail

Low Level Source Code Optimizing for Single/Multi/core Digital Signal Processors

FRÝZA, T. MEGO, R.

Type

conference paper

Language

en

Original Abstract

The paper presents an optimized implementation of digital signal processing algorithms (real and complex Fast Fourier Transforms) for a specific hardware architecture. The source code of the algorithms was optimized at a low level, and all redundant operations (e.g. branch instructions) were avoided. In contrast to code compiled from high-level sources, time-consuming load/store operations were largely eliminated as well, and temporary data were kept in the general-purpose registers. Unlike other implementations, several calls of identical functions (but with shared data) reduce the processor idle states. The TMS320C6748 and TMS320C6678 digital signal processors, which use a Very Long Instruction Word architecture, were used to implement the proposed functions. The average duration of the optimized FFT functions ranges from five CPU cycles for four real values to 44 CPU cycles for sixteen real values.

Keywords

digital signal processors; parallel architectures; floating-point arithmetic; Fourier transforms; discrete cosine transforms; high performance computing

RIV year

2013

Released

17.04.2013

ISBN

978-1-4673-5517-9

Book

MAREW 2013

Pages from

294

Pages to

297

Pages count

4

BibTeX

@inproceedings{BUT100358,
  author="Tomáš {Frýza} and Roman {Mego}",
  title="Low Level Source Code Optimizing for Single/Multi/core Digital Signal Processors",
  annote="Paper presents the optimized implementation of the digital signal processing algorithms (real and complex Fast Fourier Transforms) for the specific hardware architecture. The algorithms' source codes were optimized at low level, while all redundant operations (e.g. branching instructions) were avoided. Contrary to results compiled from the high level codes, time consuming load/store operations were considerably eliminated as well and temporal data were stored in the general purpose registers. Contrary to other implementations, the several calls of the identical functions (but with shared data) provide a~reducing of the processor idle states. The TMS320C6748 and TMS320C6678 digital signal processors with the Very Long Instruction Word architecture were used for the implementation of proposed functions. The average duration of FFT optimized functions is between five CPU cycles for four real values and 44 CPU cycles for sixteen real values, respectively.",
  booktitle="MAREW 2013",
  chapter="100358",
  howpublished="online",
  year="2013",
  month="april",
  pages="294--297",
  type="conference paper"
}