Publication detail

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

HLOSTA, M.; STRÍŽ, R.; KUPČÍK, J.; ZENDULKA, J.; HRUŠKA, T.

Original Title

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Type

journal article - other

Language

en

Original Abstract

Imbalance in data classification is a frequently discussed problem that is not well handled by classical classification techniques. The problem we tackled was to learn a binary classification model from large data with an accuracy constraint for the minority class. We propose a new meta-learning method that creates initial models using cost-sensitive learning by logistic regression and uses these models as initial chromosomes for a genetic algorithm. The method has been successfully tested on a large real-world data set from our internet security research. Experiments prove that our method always leads to better results than using logistic regression or a genetic algorithm alone. Moreover, this method produces an easily understandable classification model.

Keywords

Imbalanced data, classification, genetic algorithm, logistic regression

RIV year

2013

Released

18.05.2013

Publisher

Not specified

Location

Not specified

ISSN

2010-3700

Periodical

International Journal of Machine Learning and Computing

Volume

2013

Number

3

Country

SG

Pages from

214

Pages to

218

Pages count

5

BibTeX


@article{BUT103468,
  author="Martin {Hlosta} and Rostislav {Stríž} and Jan {Kupčík} and Jaroslav {Zendulka} and Tomáš {Hruška}",
  title="Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm",
  annote="Imbalance in data classification is a frequently discussed problem that is not
well handled by classical classification techniques. The problem we tackled was
to learn binary classification model from large data with accuracy constraint for
the minority class. We propose a new meta-learning method that creates initial
models using cost-sensitive learning by logistic regression and uses these models
as initial chromosomes for genetic algorithm. The method has been successfully
tested on a large real-world data set from our internet security research.
Experiments prove that our method always leads to better results than usage of
logistic regression or genetic algorithm alone. Moreover, this method produces
easily understandable classification model.",
  address="NEUVEDEN",
  chapter="103468",
  edition="NEUVEDEN",
  howpublished="print",
  institution="NEUVEDEN",
  journal="International Journal of Machine Learning and Computing",
  number="3",
  volume="2013",
  year="2013",
  month="may",
  pages="214--218",
  publisher="NEUVEDEN",
  type="journal article - other"
}