
Course Outline

Introduction

  • Defining GPU programming.
  • The benefits of using GPU programming.
  • Challenges and trade-offs in GPU programming.
  • Overview of GPU programming frameworks.
  • Selecting the appropriate framework for your application.

OpenCL

  • Understanding OpenCL.
  • Advantages and disadvantages of OpenCL.
  • Configuring the OpenCL development environment.
  • Developing a basic OpenCL program for vector addition.
  • Using the OpenCL API for device queries, memory management, data transfer, kernel launching, and thread synchronization.
  • Writing OpenCL C kernels for device execution and data manipulation.
  • Applying OpenCL built-in functions, variables, and libraries for standard operations.
  • Optimizing memory access and transfers using OpenCL memory spaces (global, local, constant, private).
  • Managing parallelism via the OpenCL execution model (work-items, work-groups, ND-ranges).
  • Debugging and testing OpenCL applications with tools like CodeXL.
  • Optimizing OpenCL performance through coalescing, caching, prefetching, and profiling.

CUDA

  • Understanding CUDA.
  • Advantages and disadvantages of CUDA.
  • Configuring the CUDA development environment.
  • Developing a basic CUDA program for vector addition.
  • Using the CUDA API for device queries, memory management, data transfer, kernel launching, and thread synchronization.
  • Writing CUDA C/C++ kernels for device execution and data manipulation.
  • Applying CUDA built-in functions, variables, and libraries for standard operations.
  • Optimizing memory access and transfers using CUDA memory spaces (global, shared, constant, local).
  • Managing parallelism via the CUDA execution model (threads, blocks, grids).
  • Debugging and testing CUDA applications with tools like CUDA-GDB, Compute Sanitizer (the successor to CUDA-MEMCHECK), and NVIDIA Nsight.
  • Optimizing CUDA performance through coalescing, caching, prefetching, and profiling.
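
The indexing pattern behind the vector-addition exercise can be sketched on the CPU in plain C++, with no GPU or CUDA toolkit required. This is a minimal emulation, not CUDA code: the names ThreadCoord, blocks_for, vector_add_thread, and vector_add are hypothetical, and the nested loops stand in for a one-dimensional kernel launch that a real GPU would run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in blockIdx.x, blockDim.x, threadIdx.x.
struct ThreadCoord {
    std::size_t block_idx;   // which block this logical thread belongs to
    std::size_t block_dim;   // number of threads per block
    std::size_t thread_idx;  // position of the thread inside its block
};

// Number of blocks needed to cover n elements (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}

// "Kernel" body: each logical thread handles at most one element.
void vector_add_thread(const ThreadCoord& t,
                       const std::vector<float>& a,
                       const std::vector<float>& b,
                       std::vector<float>& c) {
    // Global index = block offset + position within the block.
    const std::size_t i = t.block_idx * t.block_dim + t.thread_idx;
    // Bounds guard: the last block may contain threads past the end of the data.
    if (i < c.size()) {
        c[i] = a[i] + b[i];
    }
}

// Sequential emulation of a 1D launch: visit every (block, thread) pair in turn.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t block_dim) {
    assert(a.size() == b.size());  // assumption: inputs have equal length
    assert(block_dim > 0);         // assumption: at least one thread per block
    std::vector<float> c(a.size());
    const std::size_t grid_dim = blocks_for(a.size(), block_dim);
    for (std::size_t block = 0; block < grid_dim; ++block) {
        for (std::size_t thread = 0; thread < block_dim; ++thread) {
            vector_add_thread(ThreadCoord{block, block_dim, thread}, a, b, c);
        }
    }
    return c;
}
```

Calling vector_add with five-element inputs and a block size of 4 emulates two blocks, and the bounds guard keeps the three surplus threads of the second block from writing past the end of the output.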

ROCm

  • Understanding ROCm.
  • Advantages and disadvantages of ROCm.
  • Configuring the ROCm development environment.
  • Developing a basic HIP program for vector addition on ROCm.
  • Using the HIP runtime API for device queries, memory management, data transfer, kernel launching, and thread synchronization.
  • Writing HIP C++ kernels for device execution and data manipulation.
  • Applying HIP built-in functions and variables, and ROCm libraries, for standard operations.
  • Optimizing memory access and transfers using HIP memory spaces (global, shared, constant, local).
  • Managing parallelism via the HIP execution model (threads, blocks, grids).
  • Debugging and testing ROCm applications with tools like ROCgdb (the ROCm debugger) and rocprof (the ROCm profiler).
  • Optimizing ROCm performance through coalescing, caching, prefetching, and profiling.

Comparison

  • Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm.
  • Evaluating GPU applications through benchmarks and metrics.
  • Adopting best practices and tips for GPU programming.
  • Exploring current and emerging trends and challenges in GPU programming.

Summary and Next Steps

Requirements

  • Proficiency in C/C++ programming and an understanding of parallel computing concepts.
  • Fundamental knowledge of computer architecture and memory hierarchy.
  • Familiarity with command-line interfaces and code editing tools.

Target Audience

  • Developers who want to learn several GPU programming frameworks and compare their features, performance, and compatibility.
  • Developers aiming to write portable and scalable code compatible with various platforms and devices.
  • Programmers seeking to understand the complexities, trade-offs, and optimization challenges inherent in GPU programming.

Duration

  • 28 hours.
