Course Outline
Introduction
- Defining GPU programming.
- The benefits of using GPU programming.
- Challenges and trade-offs in GPU programming.
- Overview of GPU programming frameworks.
- Selecting the appropriate framework for your application.
OpenCL
- Understanding OpenCL.
- Advantages and disadvantages of OpenCL.
- Configuring the OpenCL development environment.
- Developing a basic OpenCL program for vector addition.
- Using the OpenCL API for device queries, memory management, data transfer, kernel launching, and thread synchronization.
- Writing OpenCL C kernels for device execution and data manipulation.
- Applying OpenCL built-in functions, variables, and libraries for standard operations.
- Optimizing memory access and transfers using OpenCL memory spaces (global, local, constant, private).
- Managing parallelism via the OpenCL execution model (work-items, work-groups, ND-ranges).
- Debugging and testing OpenCL applications with tools like CodeXL.
- Optimizing OpenCL performance through coalescing, caching, prefetching, and profiling.
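As a rough illustration of the execution model listed above, the following host-only C++ sketch emulates a one-dimensional ND-range running a vector-addition kernel. It is not OpenCL code and calls no OpenCL API; the function names `round_up_global_size` and `vector_add_ndrange` are hypothetical, and work-items that would run in parallel on a device run sequentially here. It assumes the common practice of padding the global size to a multiple of the work-group size and guarding the padded work-items with a bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of local_size, as is commonly done when
// choosing a global size that the work-group size divides evenly.
std::size_t round_up_global_size(std::size_t n, std::size_t local_size) {
    return ((n + local_size - 1) / local_size) * local_size;
}

// Host-side emulation of a 1-D ND-range running a vector-addition kernel.
// Real devices run work-items in parallel; here they run one after another.
std::vector<float> vector_add_ndrange(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t local_size) {
    assert(a.size() == b.size() && local_size > 0);
    const std::size_t n = a.size();
    const std::size_t global_size = round_up_global_size(n, local_size);
    const std::size_t num_groups = global_size / local_size;

    std::vector<float> c(n);
    for (std::size_t group_id = 0; group_id < num_groups; ++group_id) {
        for (std::size_t local_id = 0; local_id < local_size; ++local_id) {
            // Mirrors what get_global_id(0) returns with a zero global offset.
            const std::size_t global_id = group_id * local_size + local_id;
            if (global_id < n) {  // padded work-items do nothing
                c[global_id] = a[global_id] + b[global_id];
            }
        }
    }
    return c;
}
```

For example, 1000 elements with a work-group size of 64 give a global size of 1024, so the last 24 work-items fall outside the data and are skipped by the bounds check.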
CUDA
- Understanding CUDA.
- Advantages and disadvantages of CUDA.
- Configuring the CUDA development environment.
- Developing a basic CUDA program for vector addition.
- Using the CUDA API for device queries, memory management, data transfer, kernel launching, and thread synchronization.
- Writing CUDA C/C++ kernels for device execution and data manipulation.
- Applying CUDA built-in functions, variables, and libraries for standard operations.
- Optimizing memory access and transfers using CUDA memory spaces (global, shared, constant, local).
- Managing parallelism via the CUDA execution model (threads, blocks, grids).
- Debugging and testing CUDA applications with tools like CUDA-GDB, Compute Sanitizer (the successor to CUDA-MEMCHECK), and NVIDIA Nsight.
- Optimizing CUDA performance through coalescing, caching, prefetching, and profiling.
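To give a feel for why coalescing matters, the host-only C++ sketch below counts how many memory segments one warp touches for a given access stride. It is a deliberately simplified model, not a description of any particular GPU: it assumes 32 threads per warp, 4-byte elements, 128-byte segments, and a segment-aligned base address, and the function name `segments_touched` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct memory segments touched when each thread of one warp
// loads a single element at index thread_id * stride_elems.
// Assumptions (simplified model): 32 threads per warp, 4-byte elements,
// 128-byte segments, and a base address aligned to a segment boundary.
std::size_t segments_touched(std::size_t stride_elems,
                             std::size_t warp_size = 32,
                             std::size_t elem_bytes = 4,
                             std::size_t segment_bytes = 128) {
    std::set<std::size_t> segments;
    for (std::size_t thread_id = 0; thread_id < warp_size; ++thread_id) {
        const std::size_t byte_address = thread_id * stride_elems * elem_bytes;
        segments.insert(byte_address / segment_bytes);
    }
    return segments.size();
}
```

Under these assumptions, a unit stride touches one segment, a stride of 2 touches two, and a stride of 32 touches 32, one per thread. Fewer segments per warp generally means fewer memory transactions for the same amount of useful data.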
ROCm
- Understanding ROCm.
- Advantages and disadvantages of ROCm.
- Configuring the ROCm development environment.
- Developing a basic ROCm program for vector addition.
- Using the HIP runtime API on ROCm for device queries, memory management, data transfer, kernel launching, and thread synchronization.
- Writing HIP C/C++ kernels for device execution and data manipulation.
- Applying ROCm built-in functions, variables, and libraries for standard operations.
- Optimizing memory access and transfers using the memory spaces exposed through HIP on ROCm (global, shared, constant, local).
- Managing parallelism via the ROCm execution model (threads, blocks, grids).
- Debugging and testing ROCm applications with tools like the ROCm Debugger (ROCgdb) and the ROCm profiler (rocprof).
- Optimizing ROCm performance through coalescing, caching, prefetching, and profiling.
Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm.
- Evaluating GPU applications through benchmarks and metrics.
- Adopting best practices and tips for GPU programming.
- Exploring current and emerging trends and challenges in GPU programming.
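One metric often used when benchmarking memory-bound kernels is effective bandwidth: bytes read plus bytes written, divided by elapsed time. The C++ sketch below computes it for a float vector addition. The helper names are hypothetical, the timing value is supplied by the caller rather than measured, and the byte counts assume each input element is read once and each output element written once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes):
// (bytes read + bytes written) / 1e9 / elapsed seconds.
double effective_bandwidth_gbs(double bytes_read, double bytes_written,
                               double seconds) {
    assert(seconds > 0.0);
    return (bytes_read + bytes_written) / 1e9 / seconds;
}

// Bytes moved by c = a + b over n floats, assuming each input element is
// read exactly once and each output element is written exactly once.
double vector_add_bytes_read(std::size_t n) {
    return 2.0 * static_cast<double>(n) * sizeof(float);
}

double vector_add_bytes_written(std::size_t n) {
    return static_cast<double>(n) * sizeof(float);
}
```

Comparing the result against the device's theoretical peak bandwidth gives a framework-independent way to judge how well a kernel uses memory, whichever of OpenCL, CUDA, or ROCm produced the timing.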
Summary and Next Steps
Requirements
- Proficiency in C/C++ programming and an understanding of parallel computing concepts.
- Fundamental knowledge of computer architecture and memory hierarchy.
- Familiarity with command-line interfaces and code editing tools.
Target Audience
- Developers who want to learn several GPU programming frameworks and compare their features, performance, and compatibility.
- Developers aiming to write portable and scalable code compatible with various platforms and devices.
- Programmers seeking to understand the complexities, trade-offs, and optimization challenges inherent in GPU programming.
28 Hours