<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE FL_Course SYSTEM "https://www.flane.de/dtd/fl_course095.dtd"><?xml-stylesheet type="text/xsl" href="https://portal.flane.ch/css/xml-course.xsl"?><course productid="34485" language="fr" source="https://portal.flane.ch/swisscom/fr/xml-course/nvidia-faccp" lastchanged="2025-11-11T16:22:08+01:00" parent="https://portal.flane.ch/swisscom/fr/xml-courses"><title>Fundamentals of Accelerated Computing with CUDA Python</title><productcode>FACCP</productcode><vendorcode>NV</vendorcode><vendorname>Nvidia</vendorname><fullproductcode>NV-FACCP</fullproductcode><version>1.0</version><objective>&lt;p&gt;At the conclusion of the workshop, you&amp;rsquo;ll have an understanding of the fundamental tools and techniques for GPU-accelerated Python applications with CUDA and Numba, and be able to:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPU-accelerate NumPy ufuncs with a few lines of code.&lt;/li&gt;&lt;li&gt;Configure code parallelization using the CUDA thread hierarchy.&lt;/li&gt;&lt;li&gt;Write custom CUDA device kernels for maximum performance and flexibility.&lt;/li&gt;&lt;li&gt;Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.&lt;/li&gt;&lt;/ul&gt;</objective><essentials>&lt;ul&gt;
&lt;li&gt;Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations&lt;/li&gt;&lt;li&gt;NumPy competency, including the use of ndarrays and ufuncs&lt;/li&gt;&lt;li&gt;No previous knowledge of CUDA programming is required&lt;/li&gt;&lt;/ul&gt;</essentials><outline>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Meet the instructor.&lt;/li&gt;&lt;li&gt;Create an account at https://learn.nvidia.com/join&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Introduction to CUDA Python with Numba&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Begin working with the Numba compiler and CUDA programming in Python.&lt;/li&gt;&lt;li&gt;Use Numba decorators to GPU-accelerate numerical Python functions.&lt;/li&gt;&lt;li&gt;Optimize host-to-device and device-to-host memory transfers.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Custom CUDA Kernels in Python with Numba&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn CUDA&amp;rsquo;s parallel thread hierarchy and how to extend parallel program possibilities.&lt;/li&gt;&lt;li&gt;Launch massively parallel custom CUDA kernels on the GPU.&lt;/li&gt;&lt;li&gt;Utilize CUDA atomic operations to avoid race conditions during parallel execution.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Multidimensional Grids and Shared Memory for CUDA Python with Numba&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learn multidimensional grid creation and how to work in parallel on 2D matrices.&lt;/li&gt;&lt;li&gt;Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Final Review&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review key learnings and wrap up with questions.&lt;/li&gt;&lt;li&gt;Complete the assessment to earn a certificate.&lt;/li&gt;&lt;li&gt;Take the workshop survey.&lt;/li&gt;&lt;/ul&gt;</outline><objective_plain>At the conclusion of the workshop, you’ll have an understanding of the fundamental tools and techniques for GPU-accelerated Python applications with CUDA and Numba, and be able to:



- GPU-accelerate NumPy ufuncs with a few lines of code.
- Configure code parallelization using the CUDA thread hierarchy.
- Write custom CUDA device kernels for maximum performance and flexibility.
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.</objective_plain><essentials_plain>- Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations
- NumPy competency, including the use of ndarrays and ufuncs
- No previous knowledge of CUDA programming is required</essentials_plain><outline_plain>Introduction



- Meet the instructor.
- Create an account at https://learn.nvidia.com/join
Introduction to CUDA Python with Numba



- Begin working with the Numba compiler and CUDA programming in Python.
- Use Numba decorators to GPU-accelerate numerical Python functions.
- Optimize host-to-device and device-to-host memory transfers.
Custom CUDA Kernels in Python with Numba



- Learn CUDA’s parallel thread hierarchy and how to extend parallel program possibilities.
- Launch massively parallel custom CUDA kernels on the GPU.
- Utilize CUDA atomic operations to avoid race conditions during parallel execution.
Multidimensional Grids and Shared Memory for CUDA Python with Numba



- Learn multidimensional grid creation and how to work in parallel on 2D matrices.
- Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
Final Review



- Review key learnings and wrap up with questions.
- Complete the assessment to earn a certificate.
- Take the workshop survey.</outline_plain><duration unit="d" days="1">1 day</duration><pricelist><price country="US" currency="USD">500.00</price><price country="DE" currency="EUR">500.00</price><price country="AT" currency="EUR">500.00</price><price country="SE" currency="EUR">500.00</price><price country="SI" currency="EUR">500.00</price><price country="GB" currency="GBP">420.00</price><price country="IT" currency="EUR">500.00</price><price country="CA" currency="CAD">690.00</price></pricelist><miles/></course>