Publication detail

Pattern Matching in YARA: Improved Aho-Corasick Algorithm

REGÉCIOVÁ, D. KOLÁŘ, D. MILKOVIČ, M.

Type

journal article in Web of Science

Language

en

Original Abstract

YARA is a tool for pattern matching used by malware analysts all over the world. YARA can scan files as well as process memory. It allows us to define sequences of symbols as text strings, hexadecimal strings, and regular expressions. However, the use of regular expressions is limited because of the concern that it can slow down the scanning process. In this paper, we analyze the true nature of regular expressions in YARA and its implementation. We discovered several reasons why regular expressions can, in fact, slow down scanning, stemming from the nature of the algorithm used, Aho-Corasick. We proposed a new version of this algorithm and implemented it in the original version of this tool. The experiments presented prove that the speed of pattern matching with regular expressions can indeed be improved.
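For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the classic textbook Aho-Corasick automaton in Python. It is not the improved variant proposed in the paper, and it does not reflect how YARA implements matching internally. The function names `build_automaton` and `find_all` are illustrative assumptions, and the sketch handles only plain text strings, not hexadecimal strings or regular expressions.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of plain strings."""
    goto = [{}]   # goto[state][char] -> next state (state 0 is the root)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Phase 2: compute failure links breadth-first.
    # Depth-1 states keep the default failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The point of the sketch is that all patterns are searched simultaneously in a single pass over the input, which is why the choice of which substrings feed the automaton matters for scanning speed.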

Keywords

Aho-Corasick algorithm, pattern matching, regular expressions, YARA

Released

21.04.2021

Publisher

Not specified

Location

Not specified

ISSN

2169-3536

Periodical

IEEE Access

Volume

9

Number

1

Country

US

Pages from

62857

Pages to

62866

Pages count

10

BibTeX


@article{BUT171395,
  author="Dominika {Regéciová} and Dušan {Kolář} and Marek {Milkovič}",
  title="Pattern Matching in YARA: Improved Aho-Corasick Algorithm",
  annote="YARA is a tool for pattern matching used by malware analysts all over the world.
YARA can scan files, as well as process memory. It allows us to define sequences
of symbols as text strings, hexadecimal strings, and regular expressions.
However, the use of regular expressions is limited because of the concern that it
can slow down the scanning process.
In this paper, we analyze the true nature of regular expressions in YARA and its
implementation.
We discovered several reasons why regular expressions can, in fact, slow down
scanning, stemming from the nature of the algorithm used, Aho-Corasick. We proposed
a new version of this algorithm and we implemented it in the original version of
this tool.
The experiments presented prove that the speed of pattern matching with regular
expressions can indeed be improved.",
  address="not specified",
  chapter="171395",
  doi="10.1109/ACCESS.2021.3074801",
  edition="not specified",
  howpublished="online",
  institution="not specified",
  number="1",
  volume="9",
  year="2021",
  month="april",
  pages="62857--62866",
  publisher="not specified",
  type="journal article in Web of Science"
}