Publication detail

Reconstruction of Instruction Idioms in a Retargetable Decompiler: Revisited

KŘOUSTEK, J.; POKORNÝ, F.; KOLÁŘ, D.

Original Title

Reconstruction of Instruction Idioms in a Retargetable Decompiler: Revisited

English Title

Reconstruction of Instruction Idioms in a Retargetable Decompiler: Revisited

Type

journal article in Web of Science

Language

en

Original Abstract

Retargetable executable-code decompilation is one of the most complicated reverse-engineering tasks. Among other things, it involves de-optimization of compiler-optimized code. One such optimization is the use of so-called instruction idioms. These idioms are used to produce faster or even smaller executable files. On the other hand, decompilation of instruction idioms without any advanced analysis produces almost unreadable high-level language code that may confuse the user of the decompiler. In this paper, we revisit and extend the previous approach to instruction-idiom detection used in a retargetable decompiler developed within the Lissom project. The previous approach was based on detection of instruction idioms in a very early phase of decompilation (the front-end part), and it was inaccurate for architectures with a complex instruction set (e.g. Intel x86). The novel approach is based on delaying the detection of idioms and the reconstruction of code to a later phase (the middle-end part). For this purpose, we use the LLVM optimizer and implement this analysis as a new pass in this tool. According to experimental results, this new approach significantly outperforms the previous approach as well as other commercial solutions.

English abstract

Retargetable executable-code decompilation is one of the most complicated reverse-engineering tasks. Among other things, it involves de-optimization of compiler-optimized code. One such optimization is the use of so-called instruction idioms. These idioms are used to produce faster or even smaller executable files. On the other hand, decompilation of instruction idioms without any advanced analysis produces almost unreadable high-level language code that may confuse the user of the decompiler. In this paper, we revisit and extend the previous approach to instruction-idiom detection used in a retargetable decompiler developed within the Lissom project. The previous approach was based on detection of instruction idioms in a very early phase of decompilation (the front-end part), and it was inaccurate for architectures with a complex instruction set (e.g. Intel x86). The novel approach is based on delaying the detection of idioms and the reconstruction of code to a later phase (the middle-end part). For this purpose, we use the LLVM optimizer and implement this analysis as a new pass in this tool. According to experimental results, this new approach significantly outperforms the previous approach as well as other commercial solutions.

Keywords

compiler optimizations, reverse engineering, decompiler, Lissom, instruction idioms, LLVM, LLVM IR

RIV year

2014

Released

01.10.2014

Publisher

Not specified

Location

Not specified

Pages from

1337

Pages to

1359

Pages count

22

BibTeX


@article{BUT111511,
  author="Jakub {Křoustek} and Fridolín {Pokorný} and Dušan {Kolář}",
  title="Reconstruction of Instruction Idioms in a Retargetable Decompiler: Revisited",
  annote="Retargetable executable-code decompilation is one of the most complicated
reverse-engineering tasks. Among other things, it involves de-optimization of
compiler-optimized code. One such optimization is the use of so-called
instruction idioms. These idioms are used to produce faster or even smaller
executable files. On the other hand, decompilation of instruction idioms without
any advanced analysis produces almost unreadable high-level language code that
may confuse the user of the decompiler.

In this paper, we revisit and extend the previous approach to instruction-idiom
detection used in a retargetable decompiler developed within the Lissom project.
The previous approach was based on detection of instruction idioms in
a very early phase of decompilation (the front-end part), and it was inaccurate for
architectures with a complex instruction set (e.g. Intel x86). The novel approach
is based on delaying the detection of idioms and the reconstruction of code to a later
phase (the middle-end part). For this purpose, we use the LLVM optimizer and
implement this analysis as a new pass in this tool. According to experimental
results, this new approach significantly outperforms the previous approach as
well as other commercial solutions.",
  address="NEUVEDEN",
  chapter="111511",
  doi="10.2298/CSIS131203076K",
  edition="NEUVEDEN",
  howpublished="print",
  institution="NEUVEDEN",
  number="4",
  volume="11",
  year="2014",
  month="october",
  pages="1337--1359",
  publisher="NEUVEDEN",
  type="journal article in Web of Science"
}