Publication detail

Simplified Progressive Data Mining

Original title

Simplified Progressive Data Mining

English title

Simplified Progressive Data Mining

Language

en

Original abstract

Databases store huge amounts of data, yet it is very difficult to base decisions on that data. We propose the OLAM SE system (Self Explaining On-Line Analytical Mining), which shares the idea of interactive data mining with Han's OLAM [5]. Its contribution is to simplify on-line analytical mining for professionals who understand their data but want more significant, interesting and useful information. This is achieved by shielding internal concepts (associations, classifications, characterizations) and thresholds (supports, confidences) from the user, and by a simple graphical interface that suggests the most relevant items. OLAM SE determines the minimum support value from the required cover of the data using the entropy coding principle. This is applied automatically to the structure given by a conceptual hierarchy, where one is present. We also determine a maximum threshold to avoid explaining knowledge that is obvious. The major part of the data is thus described by frequent patterns. The results are presented in a diagram notation similar to UML: a visual graph whose nodes are frequent data sets, shown as packages containing sub-packages (data concepts or items), and whose edges represent links or patterns between them. The user can explore these patterns progressively, obtaining a detailed view of the patterns that interest them. Other potentially interesting sets are offered to the user without any further action. The approach is well suited to characterization and to descriptive classification equivalent to normal Bayes.
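
The abstract does not spell out how the minimum support is derived from the required cover, so the following is only a minimal sketch under stated assumptions, not the OLAM SE algorithm. It assumes single items ranked by relative support, skips items above an upper threshold as "obvious", and keeps adding items until the required fraction of records contains at least one kept item. The function name `support_thresholds`, its parameters, and the reported code length of -log2(support) bits (the usual entropy-coding cost of a symbol) are all illustrative choices.

```python
from collections import Counter
from math import log2


def support_thresholds(transactions, required_cover=0.8, max_support=0.95):
    """Hypothetical helper: derive a minimum support from a required cover.

    Returns (min_support, kept, cover), where kept is a list of
    (item, support, code_length_bits) and cover is the fraction of
    transactions containing at least one kept item.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions")

    # Relative support of each item (each transaction counts an item once).
    counts = Counter(item for t in transactions for item in set(t))
    ranked = sorted(
        ((count / n, item) for item, count in counts.items()),
        key=lambda pair: (-pair[0], str(pair[1])),
    )

    kept, covered, min_support = [], set(), None
    for support, item in ranked:
        if support > max_support:
            continue  # assumed rule: too frequent to be worth explaining
        # Entropy-coding view: a frequent item gets a short code.
        kept.append((item, support, -log2(support)))
        covered.update(i for i, t in enumerate(transactions) if item in t)
        min_support = support  # support of the least frequent kept item
        if len(covered) / n >= required_cover:
            break
    return min_support, kept, len(covered) / n
```

Called on a list of item sets, the function returns the support of the last item it had to keep, which plays the role of the minimum support threshold; lowering `max_support` removes the most common items from consideration first.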

BibTeX

@inproceedings{BUT25331,
  author="Lukáš {Stryka} and Petr {Chmelař}",
  title="Simplified Progressive Data Mining",
  annote="Databases store huge amounts of data, yet it is very difficult to base
decisions on that data. We propose the OLAM SE system (Self Explaining On-Line
Analytical Mining), which shares the idea of interactive data mining with Han's
OLAM [5]. Its contribution is to simplify on-line analytical mining for
professionals who understand their data but want more significant, interesting
and useful information. This is achieved by shielding internal concepts
(associations, classifications, characterizations) and thresholds (supports,
confidences) from the user, and by a simple graphical interface that suggests
the most relevant items.

OLAM SE determines the minimum support value from the required cover of the data
using the entropy coding principle. This is applied automatically to the
structure given by a conceptual hierarchy, where one is present. We also
determine a maximum threshold to avoid explaining knowledge that is obvious. The
major part of the data is thus described by frequent patterns.

The results are presented in a diagram notation similar to UML: a visual graph
whose nodes are frequent data sets, shown as packages containing sub-packages
(data concepts or items), and whose edges represent links or patterns between
them. The user can explore these patterns progressively, obtaining a detailed
view of the patterns that interest them. Other potentially interesting sets are
offered to the user without any further action. The approach is well suited to
characterization and to descriptive classification equivalent to normal Bayes.",
  address="Wroclaw University of Technology",
  booktitle="Proceedings of the 16th International Conference on Systems Science",
  chapter="25331",
  howpublished="print",
  institution="Wroclaw University of Technology",
  year="2007",
  month="september",
  pages="378--387",
  publisher="Wroclaw University of Technology",
  type="conference paper"
}