Publication detail

Interactive Mining on Hierarchical Data

CHMELAŘ, P. STRYKA, L.

Original Title

Interactive Mining on Hierarchical Data

English Title

Interactive Mining on Hierarchical Data

Type

conference paper

Language

en

Original Abstract

In this paper, we propose a framework for interactive, iterative, and intuitive mining of multilevel association, characterization, and classification rules on data organized in multilevel conceptual hierarchies. The framework is called OLAM SE (Self Explaining On-Line Analytical Mining) and is proposed as an extension of OLAP or as an alternative to Han's OLAM. OLAM processes data stored in data cubes, the structure of which is based on a given conceptual hierarchy. OLAM SE determines the minimum support value from a user-defined cover value of the data, using the entropy coding principle. It also automatically determines a maximum threshold to avoid explaining knowledge that is obvious and therefore potentially uninteresting. The major part of the data is thus described by frequent patterns. The presentation of results is inspired by UML diagram notation. It consists of a graph whose nodes are frequent data sets, represented as packages that contain subpackages (data classes or items). Edges represent relations or patterns between packages. This representation could also be applied to characterization and to non-naïve Bayesian classification. Patterns can be explored interactively by the user, who gets a detailed view of the interesting ones and can intuitively steer the process toward more detailed knowledge.

Keywords

interactive, intuitive, on-line, data mining, OLAP, data warehouse, association, characterization, classification, non-naïve Bayesian classification, UML notation based presentation

RIV year

2007

Released

26.04.2007

Publisher

Brno University of Technology

Location

Brno

ISBN

978-80-214-3410-3

Book

Proceedings of the 13th Conference STUDENT EEICT 2007 Volume 4

Pages from

410

Pages to

414

Pages count

5

Documents

BibTeX


@inproceedings{BUT26104,
  author="Petr {Chmelař} and Lukáš {Stryka}",
  title="Interactive Mining on Hierarchical Data",
  annote="In this paper, we propose a framework for interactive, iterative, and
intuitive mining of multilevel association, characterization, and classification
rules on data organized in multilevel conceptual hierarchies. The framework is
called OLAM SE (Self Explaining On-Line Analytical Mining) and is proposed as an
extension of OLAP or as an alternative to Han's OLAM. OLAM processes data stored
in data cubes, the structure of which is based on a given conceptual hierarchy.
OLAM SE determines the minimum support value from a user-defined cover value of
the data, using the entropy coding principle. It also automatically determines a
maximum threshold to avoid explaining knowledge that is obvious and therefore
potentially uninteresting. The major part of the data is thus described by
frequent patterns. The presentation of results is inspired by UML diagram
notation. It consists of a graph whose nodes are frequent data sets, represented
as packages that contain subpackages (data classes or items). Edges represent
relations or patterns between packages. This representation could also be
applied to characterization and to non-naïve Bayesian classification. Patterns
can be explored interactively by the user, who gets a detailed view of the
interesting ones and can intuitively steer the process toward more detailed
knowledge.",
  address="Brno",
  booktitle="Proceedings of the 13th Conference STUDENT EEICT 2007 Volume 4",
  chapter="26104",
  howpublished="print",
  institution="Brno University of Technology",
  year="2007",
  month="april",
  pages="410--414",
  publisher="Brno University of Technology",
  type="conference paper"
}