Parallel Data Processing
FEKT-MPC-PZP
Acad. year: 2020/2021
Parallelization using CPU. Parallelization using GPU (matrix operations, deep learning algorithms). Technologies: Apache Spark, Hadoop, Kafka, Cassandra. Distributed computations for operations: data transformation, aggregation, classification, regression, clustering, frequent patterns, optimization. Data streaming – basic operations, state operations, monitoring. Further technologies for distributed computations.
Learning outcomes of the course unit
Students acquire skills in the design and implementation of various forms of parallel systems for solving big data challenges. They learn techniques for parallelizing computations on CPUs and GPUs, as well as techniques for distributed computation. Students will be able to use Apache Spark, Kafka, and Cassandra for distributed data processing, applying data operations such as transformation, aggregation, classification, regression, clustering, and frequent pattern mining.
Recommended optional programme components
Recommended or required reading
Holubová, Irena, et al. Big Data a NoSQL databáze. Grada, 2015. (EN)
Barlas, Gerassimos. Multicore and GPU Programming: An Integrated Approach. ISBN 9780124171374. (EN)
Planned learning activities and teaching methods
Teaching methods include lectures, computer laboratories, and practical laboratories. The course makes use of the Moodle e-learning system. Students must complete a single project/assignment during the course.
Assessment methods and criteria linked to learning outcomes
Language of instruction
1. Parallelization using common processors - CPU architecture, threads, parallel loops
2. Parallelization using graphics processors - GPU architecture, basic operations
3. Parallelization using graphics processors - matrix operations
4. Parallelization using graphics processors - deep learning algorithms
5. Distributed computations - Apache Spark, Hadoop
6. Distributed computations - basic operations (data loading, transformations, aggregation)
7. Distributed computations - machine learning (classification, regression)
8. Distributed computations - machine learning (clustering, frequent patterns)
9. Distributed computations - Kafka, Cassandra
10. Distributed computations - streaming data (basic operations)
11. Distributed computations - streaming data (state operations, monitoring)
12. Distributed computations - optimization
13. Further technologies for distributed computations - FPGA, supercomputers, Apache Flink, blockchain
14. Final exam
The goal of the course is to introduce parallelization for data analysis using common processors, graphics processors, and distributed systems.
Specification of controlled education, way of implementation and compensation for absences
The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.