Course detail

Knowledge Discovery in Databases

FIT-ZZN, Acad. year: 2018/2019

Basic concepts of knowledge discovery in data and the relation between knowledge discovery and data mining. Data sources for knowledge discovery. Principles and techniques of data preprocessing for mining. Systems for knowledge discovery in data, data mining query languages. Data mining techniques - association rules, classification and prediction, clustering. Mining unconventional data - data streams, time series and sequences, graphs, spatial and spatio-temporal data, multimedia. Text and web mining. Carrying out a data mining project using an available data mining tool.

Learning outcomes of the course unit

  • Students get a broad, yet in-depth overview of the field of data mining and knowledge discovery.
  • They are able both to use and to develop knowledge discovery tools.

  • Students learn the terminology in both Czech and English.
  • Students gain experience in solving projects in a small team.
  • Students improve their ability to present and defend project results.

Prerequisites

  • Basic knowledge of probability and statistics.
  • Knowledge of database technology at the level of a bachelor's course.

Co-requisites

Not applicable.

Recommended optional programme components

Not applicable.

Recommended or required reading

  • Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Third Edition. Morgan Kaufmann Publishers, 2012, 703 p., ISBN 978-0-12-381479-1.
  • Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Second Edition. Elsevier Inc., 2006, 770 p., ISBN 1-55860-901-3.
  • Bishop, C. M.: Pattern Recognition and Machine Learning. Springer, 2006, 738 p., ISBN 0387310738.
  • Zendulka, J. et al.: Získávání znalostí z databází. FIT VUT v Brně, 2009, 160 p. (in electronic form)


Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

A mid-term test, formulation of a data mining task, presentation of the project.
Exam prerequisites:
Course credit requires completing the project, defending the project results, and obtaining at least 24 points for activities during the semester.

Language of instruction

Czech

Work placements

Not applicable.

Course curriculum

    Syllabus of lectures:
    1. Introduction - motivation, fundamental concepts, data sources and knowledge types.
    2. Data Preparation - characteristics of data.
    3. Data Preparation - methods.
    4. Data Warehouse and OLAP Technology for knowledge discovery.
    5. Mining frequent patterns and associations - basic concepts, efficient and scalable frequent itemset mining methods.
    6. Multi-level association rules, association mining and correlation analysis, constraint-based association rules.
    7. Classification and prediction - basic concepts, decision tree, Bayesian classification, rule-based classification.
    8. Classification by means of neural networks, SVM classifier, other classification methods, prediction.
    9. Cluster analysis - basic concepts, types of data in cluster analysis, partitioning and hierarchical methods.
    10. Other clustering methods. Mining in biological data.
    11. Introduction to mining data streams, time series and sequence data.
    12. Introduction to mining in graphs, spatio-temporal data, moving object data and multimedia data.
    13. Text mining, mining the Web.

    Syllabus - others, projects and individual work of students:
    • Carrying out a data mining project using an available data mining tool.

Aims

To familiarize students with knowledge discovery in data sources, to explain useful knowledge types and the steps of the knowledge discovery process, and to familiarize them with techniques, algorithms and tools used in the process.

Specification of controlled education, way of implementation and compensation for absences

A mid-term test, formulation of a data mining task, presentation of the project. At least 20 points must be obtained in the final exam; otherwise, no points are awarded for the exam.

Classification of course in study plans

  • Programme IT-MGR-2 Master's

    branch MPV, any year of study, winter semester, 5 credits, compulsory-optional
    branch MBS, any year of study, winter semester, 5 credits, compulsory-optional
    branch MMI, any year of study, winter semester, 5 credits, optional
    branch MMM, any year of study, winter semester, 5 credits, optional
    branch MBI, 2nd year of study, winter semester, 5 credits, compulsory
    branch MGM, 2nd year of study, winter semester, 5 credits, optional
    branch MSK, 2nd year of study, winter semester, 5 credits, compulsory-optional
    branch MIS, 2nd year of study, winter semester, 5 credits, compulsory-optional
    branch MIN, 2nd year of study, winter semester, 5 credits, compulsory

Type of course unit

 

Lecture

39 hours, optional

Project

13 hours, compulsory
