Publication detail

Composite Data Type Recovery in a Retargetable Decompilation

MATULA, P.; KOLÁŘ, D.

Original Title

Composite Data Type Recovery in a Retargetable Decompilation

English Title

Composite Data Type Recovery in a Retargetable Decompilation

Type

conference paper

Language

en

Original Abstract

Retargetable decompilation is a reverse engineering technique that transforms a platform-dependent binary file into a high-level language (HLL) representation. Despite its complexity, several new decompilers and similar binary analysis utilities have been developed in recent years. They are not yet advanced enough to serve as standalone tools, but combined with traditional disassemblers, they allow much faster understanding of the analysed machine code. To achieve the necessary quality, many advanced analyses must be performed on the input file. One of the toughest, but most rewarding, is data type reconstruction analysis. It aims to assign a high-level type to each object (i.e., variables, function parameters, and return values), preferably the same as in the original source code. This paper presents the composite data type analysis used by the retargetable decompiler developed within the Lissom project at FIT BUT. Its basic principles are based on an existing state-of-the-art approach. However, we design a whole new pattern creation and recognition algorithm, which is both retargetable and suitable for operating on code in SSA form. Moreover, we devise new pattern aggregation rules that increase the quality of the recovered data types. The solution is tested on several real-world applications compiled for three different architectures.

English abstract

Retargetable decompilation is a reverse engineering technique that transforms a platform-dependent binary file into a high-level language (HLL) representation. Despite its complexity, several new decompilers and similar binary analysis utilities have been developed in recent years. They are not yet advanced enough to serve as standalone tools, but combined with traditional disassemblers, they allow much faster understanding of the analysed machine code. To achieve the necessary quality, many advanced analyses must be performed on the input file. One of the toughest, but most rewarding, is data type reconstruction analysis. It aims to assign a high-level type to each object (i.e., variables, function parameters, and return values), preferably the same as in the original source code. This paper presents the composite data type analysis used by the retargetable decompiler developed within the Lissom project at FIT BUT. Its basic principles are based on an existing state-of-the-art approach. However, we design a whole new pattern creation and recognition algorithm, which is both retargetable and suitable for operating on code in SSA form. Moreover, we devise new pattern aggregation rules that increase the quality of the recovered data types. The solution is tested on several real-world applications compiled for three different architectures.

Keywords

decompilation, reverse engineering, executable analysis, Lissom, data types, data type reconstruction

RIV year

2014

Released

17.10.2014

Publisher

NOVPRESS s.r.o.

Location

Telč

ISBN

978-80-214-5022-6

Book

Proceedings of the 9th Doctoral Workshop on Mathematical and Engineering Methods in Computer Science

Edition

not stated

Edition number

not stated

Pages from

63

Pages to

76

Pages count

12

Documents

BibTeX


@inproceedings{BUT111654,
  author="Peter {Matula} and Dušan {Kolář}",
  title="Composite Data Type Recovery in a Retargetable Decompilation",
  annote="Retargetable decompilation is a reverse engineering technique that
transforms a platform-dependent binary file into a high-level language (HLL)
representation. Despite its complexity, several new decompilers and similar
binary analysis utilities have been developed in recent years. They are not yet
advanced enough to serve as standalone tools, but combined with traditional
disassemblers, they allow much faster understanding of the analysed machine
code. To achieve the necessary quality, many advanced analyses must be performed
on the input file. One of the toughest, but most rewarding, is data type
reconstruction analysis. It aims to assign a high-level type to each object
(i.e., variables, function parameters, and return values), preferably the same
as in the original source code.
This paper presents the composite data type analysis used by the retargetable
decompiler developed within the Lissom project at FIT BUT. Its basic principles
are based on an existing state-of-the-art approach. However, we design a whole
new pattern creation and recognition algorithm, which is both retargetable and
suitable for operating on code in SSA form. Moreover, we devise new pattern
aggregation rules that increase the quality of the recovered data types. The
solution is tested on several real-world applications compiled for three
different architectures.",
  address="Telč",
  booktitle="Proceedings of the 9th Doctoral Workshop on Mathematical and Engineering Methods in Computer Science",
  chapter="111654",
  edition="not stated",
  howpublished="print",
  institution="NOVPRESS s.r.o.",
  year="2014",
  month="october",
  pages="63--76",
  publisher="NOVPRESS s.r.o.",
  type="conference paper"
}