Course detail

Parallel Computations on GPU

FIT-PCG Acad. year: 2019/2020

The course covers the architecture and programming of graphics processing units (GPUs) from NVIDIA and, in part, AMD. First, the GPU architecture is studied in detail. Then, the program execution model, based on hierarchical thread organisation and the SIMT model, is discussed. Next, the memory hierarchy and synchronization techniques are described. After that, the course explains the newer techniques of dynamic parallelism and data-flow processing, and concludes with the practical use of multi-GPU systems in environments with shared (NVLink) and distributed (MPI) memory. The second part of the course is devoted to high-level programming techniques and libraries based on the OpenACC technology.

Language of instruction

Czech

Number of ECTS credits

Mode of study

Not applicable.

Guarantor

prof. Ing. Jiří Jaroš, Ph.D.

Department

Department of Computer Systems (UPSY)

Learning outcomes of the course unit

Knowledge of parallel programming on GPUs for general-purpose computing, and an overview of accelerated systems, libraries and tools.
Understanding of the hardware limitations that affect the efficiency of software solutions.

Prerequisites

Knowledge gained in the AVS course and, in part, in the PRL and PPP courses.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Assessment of two projects (14 hours in total), computer laboratories and a midterm examination.
Exam prerequisites:

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

To become familiar with the architecture and programming of graphics processing units for general-purpose computing using the NVIDIA libraries and the OpenACC standard. To learn how to design and implement accelerated programs that exploit the potential of GPUs. To gain knowledge of the available libraries for GPU programming.

Specification of controlled education, way of implementation and compensation for absences

Missed labs can be made up on alternative dates.
Time will be reserved for missed labs in the last week of the semester.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Type of course unit

Lecture

26 hours, optional

Teacher / Lecturer

prof. Ing. Jiří Jaroš, Ph.D.

Syllabus

Architecture of graphics processing units.
CUDA programming model, thread execution.
CUDA memory hierarchy.
Synchronization and reduction.
Dynamic parallelism and unified memory.
Design and optimization of GPU algorithms.
Stream processing, computation-communication overlapping.
Multi-GPU systems.
NVIDIA Thrust library.
OpenACC basics.
OpenACC memory management.
Code optimization with OpenACC.
Libraries and tools for GPU programming.

Exercise in computer lab

12 hours, compulsory

Teacher / Lecturer

Ing. Kristián Kadlubiak

Syllabus

CUDA: Memory transfers, simple kernels
CUDA: Shared memory
CUDA: Texture and constant memory
CUDA: Dynamic parallelism and unified memory.
OpenACC: basic techniques.
OpenACC: advanced techniques.

Project

14 hours, compulsory

Teacher / Lecturer

Ing. Kristián Kadlubiak

Syllabus

Development of an application in NVIDIA CUDA
Development of an application in OpenACC
