<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE FL_Course SYSTEM "https://www.flane.de/dtd/fl_course095.dtd"><?xml-stylesheet type="text/xsl" href="https://portal.flane.ch/css/xml-course.xsl"?><course productid="34497" language="en" source="https://portal.flane.ch/swisscom/en/xml-course/nvidia-mpbdlnn" lastchanged="2025-07-29T12:18:27+02:00" parent="https://portal.flane.ch/swisscom/en/xml-courses"><title>Model Parallelism: Building and Deploying Large Neural Networks</title><productcode>MPBDLNN</productcode><vendorcode>NV</vendorcode><vendorname>Nvidia</vendorname><fullproductcode>NV-MPBDLNN</fullproductcode><version>1.0</version><objective>&lt;p&gt;In this workshop, participants will learn how to:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Train neural networks across multiple servers&lt;/li&gt;&lt;li&gt;Use techniques such as activation checkpointing, gradient accumulation, and various forms of model parallelism to overcome the memory-footprint challenges of large models&lt;/li&gt;&lt;li&gt;Capture and understand training performance characteristics to optimize model architecture&lt;/li&gt;&lt;li&gt;Deploy very large multi-GPU models to production using NVIDIA Triton&amp;trade; Inference Server&lt;/li&gt;&lt;/ul&gt;</objective><essentials>&lt;p&gt;Familiarity with:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A good understanding of PyTorch&lt;/li&gt;&lt;li&gt;A good understanding of deep learning and data-parallel training concepts&lt;/li&gt;&lt;li&gt;Hands-on practice with deep learning and data-parallel training is useful, but optional&lt;/li&gt;&lt;/ul&gt;</essentials><outline>&lt;h4&gt;Introduction&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Meet the instructor.&lt;/li&gt;&lt;li&gt;Create an account at courses.nvidia.com/join.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Introduction to Training of Large Models&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Learn about the motivation behind and key challenges of training large models.&lt;/li&gt;&lt;li&gt;Get an overview of the basic techniques and tools needed for large-scale training.&lt;/li&gt;&lt;li&gt;Get an introduction to distributed training and the Slurm job scheduler.&lt;/li&gt;&lt;li&gt;Train a GPT model using data parallelism.&lt;/li&gt;&lt;li&gt;Profile the training process and understand execution performance.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Model Parallelism: Advanced Topics&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Increase the model size using a range of memory-saving techniques.&lt;/li&gt;&lt;li&gt;Get an introduction to tensor and pipeline parallelism.&lt;/li&gt;&lt;li&gt;Go beyond natural language processing and get an introduction to DeepSpeed.&lt;/li&gt;&lt;li&gt;Auto-tune model performance.&lt;/li&gt;&lt;li&gt;Learn about mixture-of-experts models.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Inference of Large Models&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Understand the deployment challenges associated with large models.&lt;/li&gt;&lt;li&gt;Explore techniques for model reduction.&lt;/li&gt;&lt;li&gt;Learn how to use TensorRT-LLM.&lt;/li&gt;&lt;li&gt;Learn how to use Triton Inference Server.&lt;/li&gt;&lt;li&gt;Understand the process of deploying a GPT checkpoint to production.&lt;/li&gt;&lt;li&gt;See an example of prompt engineering.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Final Review&lt;/h4&gt;&lt;ul&gt;
&lt;li&gt;Review key learnings and answer questions.&lt;/li&gt;&lt;li&gt;Complete the assessment and earn a certificate.&lt;/li&gt;&lt;li&gt;Complete the workshop survey.&lt;/li&gt;&lt;/ul&gt;</outline><objective_plain>In this workshop, participants will learn how to:



- Train neural networks across multiple servers
- Use techniques such as activation checkpointing, gradient accumulation, and various forms of model parallelism to overcome the memory-footprint challenges of large models
- Capture and understand training performance characteristics to optimize model architecture
- Deploy very large multi-GPU models to production using NVIDIA Triton™ Inference Server</objective_plain><essentials_plain>Familiarity with:



- A good understanding of PyTorch
- A good understanding of deep learning and data-parallel training concepts
- Hands-on practice with deep learning and data-parallel training is useful, but optional</essentials_plain><outline_plain>Introduction



- Meet the instructor.
- Create an account at courses.nvidia.com/join.
Introduction to Training of Large Models


- Learn about the motivation behind and key challenges of training large models.
- Get an overview of the basic techniques and tools needed for large-scale training.
- Get an introduction to distributed training and the Slurm job scheduler.
- Train a GPT model using data parallelism.
- Profile the training process and understand execution performance.
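
The data-parallel recipe covered in this module can be illustrated with a minimal NumPy sketch. It is a conceptual sketch only: the workshop itself uses PyTorch on multi-GPU Slurm clusters, whereas here the "workers" and the all-reduce collective are simulated inside a single process, and the linear-regression model and shard sizes are illustrative assumptions.

```python
import numpy as np

# Data parallelism, conceptually: every worker holds a full copy of the
# model, computes a gradient on its own shard of the batch, and an
# all-reduce averages the gradients so all workers apply the same update.

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # global batch of 8 examples
y = rng.normal(size=(8,))
w = np.zeros(3)                      # replicated model weights

def local_grad(Xs, ys, w):
    """Gradient of mean squared error on one worker's shard."""
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

# Two simulated workers, each with half of the batch.
shards = [(X[:4], y[:4]), (X[4:], y[4:])]
grads = [local_grad(Xs, ys, w) for Xs, ys in shards]
allreduced = sum(grads) / len(grads)   # the all-reduce (average) step

# The averaged gradient matches the gradient over the full batch, which is
# why data-parallel training reproduces the single-node update.
full = local_grad(X, y, w)
print(np.allclose(allreduced, full))   # True
```

The equality holds because the shards are equally sized; with uneven shards, the all-reduce would need to weight each worker's gradient by its shard size.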
Model Parallelism: Advanced Topics



- Increase the model size using a range of memory-saving techniques.
- Get an introduction to tensor and pipeline parallelism.
- Go beyond natural language processing and get an introduction to DeepSpeed.
- Auto-tune model performance.
- Learn about mixture-of-experts models.
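
Tensor parallelism, introduced in this module, splits individual weight matrices across devices rather than replicating them. A minimal NumPy sketch of Megatron-style column- and row-parallel linear layers follows; device shards are simulated as array slices, and the layer sizes are illustrative assumptions, not the workshop's actual configuration.

```python
import numpy as np

# Tensor (intra-layer) parallelism for a linear layer Y = X @ W,
# with two simulated devices.

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))            # activations: batch x d_in
W = rng.normal(size=(6, 8))            # full weight: d_in x d_out

# Column parallelism: each device holds half of W's output columns and
# produces half of the output features; a gather concatenates them.
W_cols = np.split(W, 2, axis=1)
Y_col = np.concatenate([X @ Wi for Wi in W_cols], axis=1)

# Row parallelism: each device holds half of W's input rows (and the
# matching slice of X); an all-reduce sums the partial products.
W_rows = np.split(W, 2, axis=0)
X_parts = np.split(X, 2, axis=1)
Y_row = sum(Xi @ Wi for Xi, Wi in zip(X_parts, W_rows))

# Both sharding schemes reproduce the unsharded result exactly.
print(np.allclose(Y_col, X @ W), np.allclose(Y_row, X @ W))  # True True
```

In practice the two schemes are alternated (column-parallel then row-parallel) so that a transformer MLP block needs only one all-reduce per forward pass.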
Inference of Large Models



- Understand the deployment challenges associated with large models.
- Explore techniques for model reduction.
- Learn how to use TensorRT-LLM.
- Learn how to use Triton Inference Server.
- Understand the process of deploying a GPT checkpoint to production.
- See an example of prompt engineering.
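
As a rough illustration of the model-reduction theme in this module, here is a NumPy sketch of post-training int8 weight quantization. TensorRT-LLM applies far more sophisticated schemes; the per-tensor symmetric mapping and single scale factor used here are illustrative assumptions, chosen only to show the basic storage trade-off.

```python
import numpy as np

# Post-training quantization, conceptually: store weights as int8 plus one
# float scale, and dequantize on the fly at inference time.

rng = np.random.default_rng(2)
W = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix

# Symmetric per-tensor quantization: map floats onto [-127, 127] with a
# single scale derived from the largest-magnitude weight.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_deq = W_q.astype(np.float32) * scale              # dequantized copy

print(W_q.nbytes / W.nbytes)                   # 0.25 -> 4x smaller weights
print(float(np.abs(W - W_deq).max()) < scale)  # rounding error < one step
```

The accuracy cost of that rounding error is model-dependent, which is why production pipelines validate quantized checkpoints before serving them.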
Final Review


- Review key learnings and answer questions.
- Complete the assessment and earn a certificate.
- Complete the workshop survey.</outline_plain><duration unit="d" days="1">1 day</duration><pricelist><price country="US" currency="USD">500.00</price><price country="DE" currency="EUR">500.00</price><price country="AT" currency="EUR">500.00</price><price country="SE" currency="EUR">500.00</price><price country="SI" currency="EUR">500.00</price><price country="GB" currency="GBP">420.00</price><price country="IT" currency="EUR">500.00</price><price country="CA" currency="CAD">690.00</price></pricelist><miles/></course>