Enhancing Data Science Outcomes With Efficient Workflow

Nvidia course NV-EDSOEW, version 1.0
Duration: 0.5 days

Learning Objectives
<ul>
<li>Develop and deploy an accelerated end-to-end data processing pipeline for large datasets</li>
<li>Scale data science workflows using distributed computing</li>
<li>Perform DataFrame transformations that take advantage of hardware acceleration and avoid hidden slowdowns</li>
<li>Enhance machine learning solutions through feature engineering and rapid experimentation</li>
<li>Improve data processing pipeline performance by optimizing memory management and hardware utilization</li>
</ul>

Prerequisites
<ul>
<li>Basic knowledge of a standard data science workflow on tabular data. To gain an adequate understanding, we recommend this article.</li>
<li>Knowledge of distributed computing using Dask. To gain an adequate understanding, we recommend the “Get Started” guide from Dask.</li>
<li>Completion of the DLI’s Fundamentals of Accelerated Data Science course, or the ability to manipulate data using cuDF and some experience building machine learning models using cuML.</li>
</ul>

Introduction
<ul>
<li>Meet the instructor.</li>
<li>Create an account at courses.nvidia.com/join.</li>
</ul>

Advanced Extract, Transform, and Load (ETL)
<ul>
<li>Learn to process large volumes of data efficiently for downstream analysis:
<ul>
<li>Discuss current challenges of growing data sizes.</li>
<li>Perform ETL efficiently on large datasets.</li>
<li>Discuss hidden slowdowns and perform DataFrame transformations properly.</li>
<li>Discuss diagnostic tools to monitor and optimize hardware utilization.</li>
<li>Persist data in a way that’s conducive to downstream analytics.</li>
</ul>
</li>
</ul>

Training on Multiple GPUs With PyTorch Distributed Data Parallel (DDP)
<ul>
<li>Learn how to improve data analysis on large datasets:
<ul>
<li>Build and compare classification models.</li>
<li>Perform feature selection based on the predictive power of new and existing features.</li>
<li>Perform hyperparameter tuning.</li>
<li>Create embeddings using deep learning and perform clustering on the embeddings.</li>
</ul>
</li>
</ul>

Deployment
<ul>
<li>Learn how to deploy and measure the performance of an accelerated data processing pipeline:
<ul>
<li>Deploy a data processing pipeline with Triton Inference Server.</li>
<li>Discuss various tuning parameters to optimize performance.</li>
</ul>
</li>
</ul>

Assessment and Q&A