Course detail

Parallel Data Processing

FEKT-MPC-PZP, Acad. year: 2020/2021

Parallelization on the CPU. Parallelization on the GPU (matrix operations, deep learning algorithms). Technologies: Apache Spark, Hadoop, Kafka, Cassandra. Distributed computation of data transformation, aggregation, classification, regression, clustering, frequent patterns, and optimization. Data streaming: basic operations, stateful operations, monitoring. Other technologies for distributed computation.

Learning outcomes of the course unit

Students gain skills in designing and implementing various forms of parallel systems to address big data challenges. They learn techniques for parallelizing computations on the CPU and GPU, as well as techniques for distributed computation. Students learn to use Apache Spark, Kafka, and Cassandra for distributed data processing with the following operations: data transformation, aggregation, classification, regression, clustering, and frequent patterns.

Prerequisites

Not applicable.

Co-requisites

Not applicable.

Recommended optional programme components

Not applicable.

Recommended or required reading

Holubová, Irena, et al. Big Data a NoSQL databáze [Big Data and NoSQL Databases]. Grada, 2015. (EN)
Barlas, Gerassimos. Multicore and GPU Programming: An Integrated Approach. ISBN 9780124171374. (EN)

Planned learning activities and teaching methods

Teaching methods include lectures, computer laboratories, and practical laboratories. The course makes use of the Moodle e-learning system. Students must complete a single project/assignment during the course.

Assessment methods and criteria linked to learning outcomes

Final exam.

Language of instruction

Czech

Work placements

Not applicable.

Course curriculum

1. Introduction
2. CPU Parallel Computing
3. GPU Introduction
4. GPU Memory
5. GPU Synchronization
6. GPU Parallel Patterns
7. GPU Matrix Operations and Streams
8. Spark Introduction
9. Spark Advanced Operations
10. Spark Machine Learning
11. Spark Streaming
12. Other Parallel Technologies
13. Overview and Discussion
14. Final exam

Aims

The goal of the course is to introduce parallelization for data analysis using general-purpose processors (CPUs), graphics processors (GPUs), and distributed systems.

Specification of controlled education, way of implementation and compensation for absences

The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.

Classification of course in study plans

  • Programme MPC-AUD Master's

    specialization AUDM-TECH, 2nd year of study, winter semester, 6 credits, compulsory-optional

  • Programme MPC-TIT Master's, 2nd year of study, winter semester, 6 credits, compulsory-optional

Type of course unit

  • Lecture: 26 hours, optional

  • Exercise in computer lab: 26 hours, compulsory

  • Project: 13 hours, optional