<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE FL_Course SYSTEM "https://www.flane.de/dtd/fl_course095.dtd"><?xml-stylesheet type="text/xsl" href="https://portal.flane.ch/css/xml-course.xsl"?><course productid="34485" language="fr" source="https://portal.flane.ch/swisscom/fr/xml-course/nvidia-faccp" lastchanged="2025-11-11T16:22:08+01:00" parent="https://portal.flane.ch/swisscom/fr/xml-courses"><title>Fundamentals of Accelerated Computing with CUDA Python</title><productcode>FACCP</productcode><vendorcode>NV</vendorcode><vendorname>Nvidia</vendorname><fullproductcode>NV-FACCP</fullproductcode><version>1.0</version><objective>&lt;p&gt;At the conclusion of the workshop, you&amp;rsquo;ll have an understanding of the fundamental tools and techniques for GPU-accelerated Python applications with CUDA and Numba, and be able to:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPU-accelerate NumPy ufuncs with a few lines of code.&lt;/li&gt;&lt;li&gt;Configure code parallelization using the CUDA thread hierarchy.&lt;/li&gt;&lt;li&gt;Write custom CUDA device kernels for maximum performance and flexibility.&lt;/li&gt;&lt;li&gt;Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.&lt;/li&gt;&lt;/ul&gt;</objective><essentials>&lt;ul&gt;
&lt;li&gt;Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations&lt;/li&gt;&lt;li&gt;NumPy competency, including the use of ndarrays and ufuncs&lt;/li&gt;&lt;li&gt;No previous knowledge of CUDA programming is required&lt;/li&gt;&lt;/ul&gt;</essentials><outline>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Meet the instructor.&lt;/li&gt;&lt;li&gt;Create an account at https://learn.nvidia.com/join&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Introduction to CUDA Python with Numba&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Begin working with the Numba compiler and CUDA programming in Python.&lt;/li&gt;&lt;li&gt;Use Numba decorators to GPU-accelerate numerical Python functions.&lt;/li&gt;&lt;li&gt;Optimize host-to-device and device-to-host memory transfers.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Custom CUDA Kernels in Python with Numba&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn CUDA&amp;rsquo;s parallel thread hierarchy and how to extend parallel program possibilities.&lt;/li&gt;&lt;li&gt;Launch massively parallel custom CUDA kernels on the GPU.&lt;/li&gt;&lt;li&gt;Utilize CUDA atomic operations to avoid race conditions during parallel execution.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Multidimensional Grids and Shared Memory for CUDA Python with Numba&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn multidimensional grid creation and how to work in parallel on 2D matrices.&lt;/li&gt;&lt;li&gt;Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Final Review&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review key learnings and wrap up with questions.&lt;/li&gt;&lt;li&gt;Complete the assessment to earn a certificate.&lt;/li&gt;&lt;li&gt;Take the workshop survey.&lt;/li&gt;&lt;/ul&gt;</outline><objective_plain>At the conclusion of the workshop, you’ll have an understanding of the fundamental tools and techniques for GPU-accelerated Python applications with CUDA and Numba, and be able to:



- GPU-accelerate NumPy ufuncs with a few lines of code.
- Configure code parallelization using the CUDA thread hierarchy.
- Write custom CUDA device kernels for maximum performance and flexibility.
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.</objective_plain><essentials_plain>- Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
- NumPy competency, including the use of ndarrays and ufuncs
- No previous knowledge of CUDA programming is required</essentials_plain><outline_plain>Introduction



- Meet the instructor.
- Create an account at https://learn.nvidia.com/join
Introduction to CUDA Python with Numba



- Begin working with the Numba compiler and CUDA programming in Python.
- Use Numba decorators to GPU-accelerate numerical Python functions.
- Optimize host-to-device and device-to-host memory transfers.
Custom CUDA Kernels in Python with Numba



- Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities.
- Launch massively parallel custom CUDA kernels on the GPU.
- Utilize CUDA atomic operations to avoid race conditions during parallel execution.
Multidimensional Grids and Shared Memory for CUDA Python with Numba



- Learn multidimensional grid creation and how to work in parallel on 2D matrices.
- Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
Final Review



- Review key learnings and wrap up with questions.
- Complete the assessment to earn a certificate.
- Take the workshop survey.</outline_plain><duration unit="d" days="1">1 day</duration><pricelist><price country="US" currency="USD">500.00</price><price country="DE" currency="EUR">500.00</price><price country="AT" currency="EUR">500.00</price><price country="SE" currency="EUR">500.00</price><price country="SI" currency="EUR">500.00</price><price country="GB" currency="GBP">420.00</price><price country="IT" currency="EUR">500.00</price><price country="CA" currency="CAD">690.00</price></pricelist><miles/></course>