Data Engineering on Google Cloud Platform

<p>Course code: GO-DEGCP</p>
<p>Duration: 4 days</p>
<p>This course teaches participants the following skills:</p>
<ul>
<li>Design and build data processing systems on Google Cloud Platform</li>
<li>Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow</li>
<li>Derive business insights from extremely large datasets using Google BigQuery</li>
<li>Train, evaluate, and predict with machine learning models using TensorFlow and Cloud ML</li>
<li>Leverage unstructured data using Spark and ML APIs on Cloud Dataproc</li>
<li>Enable instant insights from streaming data</li>
</ul>
<p>To get the most out of this course, participants should have:</p>
<ul>
<li>Completed the Google Cloud Fundamentals: Big Data and Machine Learning <span class="fl-prod-pcode">(GCF-BDM)</span> course, or have equivalent experience</li>
<li>Basic proficiency with a common query language such as SQL</li>
<li>Experience with data modeling and extract, transform, load activities</li>
<li>Experience developing applications using a common programming language such as Python</li>
<li>Familiarity with machine learning and/or statistics</li>
</ul>
<p>This class is intended for experienced developers who are responsible for managing big data transformations, including:</p>
<ul>
<li>Extracting, loading, transforming, cleaning, and validating data</li>
<li>Designing pipelines and architectures for data processing</li>
<li>Creating and maintaining machine learning and statistical models</li>
<li>Querying datasets, visualizing query results, and creating reports</li>
</ul>
<h5>Module 1: Google Cloud Dataproc Overview</h5>
<ul>
<li>Creating and managing clusters.</li>
<li>Leveraging custom machine types and preemptible worker nodes.</li>
<li>Scaling and deleting clusters.</li>
<li>Lab: Creating Hadoop Clusters with Google Cloud Dataproc.</li>
</ul>
<h5>Module 2: Running Dataproc Jobs</h5>
<ul>
<li>Running Pig and Hive jobs.</li>
<li>Separation of storage and compute.</li>
<li>Lab: Running Hadoop and Spark Jobs with Dataproc.</li>
<li>Lab: Submit and monitor jobs.</li>
</ul>
<h5>Module 3: Integrating Dataproc with Google Cloud Platform</h5>
<ul>
<li>Customizing clusters with initialization actions.</li>
<li>BigQuery support.</li>
<li>Lab: Leveraging Google Cloud Platform Services.</li>
</ul>
<h5>Module 4: Making Sense of Unstructured Data with Google’s Machine Learning APIs</h5>
<ul>
<li>Google’s Machine Learning APIs.</li>
<li>Common ML use cases.</li>
<li>Invoking ML APIs.</li>
<li>Lab: Adding Machine Learning Capabilities to Big Data Analysis.</li>
</ul>
<h5>Module 5: Serverless data analysis with BigQuery</h5>
<ul>
<li>What is BigQuery?</li>
<li>Queries and functions.</li>
<li>Lab: Writing queries in BigQuery.</li>
<li>Loading data into BigQuery.</li>
<li>Exporting data from BigQuery.</li>
<li>Lab: Loading and exporting data.</li>
<li>Nested and repeated fields.</li>
<li>Querying multiple tables.</li>
<li>Lab: Complex queries.</li>
<li>Performance and pricing.</li>
</ul>
<h5>Module 6: Serverless, autoscaling data pipelines with Dataflow</h5>
<ul>
<li>The Beam programming model.</li>
<li>Data pipelines in Beam Python.</li>
<li>Data pipelines in Beam Java.</li>
<li>Lab: Writing a Dataflow pipeline.</li>
<li>Scalable Big Data processing using Beam.</li>
<li>Lab: MapReduce in Dataflow.</li>
<li>Incorporating additional data.</li>
<li>Lab: Side inputs.</li>
<li>Handling stream data.</li>
<li>GCP reference architecture.</li>
</ul>
<h5>Module 7: Getting started with Machine Learning</h5>
<ul>
<li>What is machine learning (ML)?</li>
<li>Effective ML: concepts, types.</li>
<li>ML datasets: generalization.</li>
<li>Lab: Explore and create ML datasets.</li>
</ul>
<h5>Module 8: Building ML models with TensorFlow</h5>
<ul>
<li>Getting started with TensorFlow.</li>
<li>Lab: Using tf.learn.</li>
<li>TensorFlow graphs and loops + lab.</li>
<li>Lab: Using low-level TensorFlow + early stopping.</li>
<li>Monitoring ML training.</li>
<li>Lab: Charts and graphs of TensorFlow training.</li>
</ul>
<h5>Module 9: Scaling ML models with Cloud ML</h5>
<ul>
<li>Why Cloud ML?</li>
<li>Packaging up a TensorFlow model.</li>
<li>End-to-end training.</li>
<li>Lab: Run an ML model locally and on cloud.</li>
</ul>
<h5>Module 10: Feature Engineering</h5>
<ul>
<li>Creating good features.</li>
<li>Transforming inputs.</li>
<li>Synthetic features.</li>
<li>Preprocessing with Cloud ML.</li>
<li>Lab: Feature engineering.</li>
</ul>
<h5>Module 11: Architecture of streaming analytics pipelines</h5>
<ul>
<li>Stream data processing: challenges.</li>
<li>Handling variable data volumes.</li>
<li>Dealing with unordered/late data.</li>
<li>Lab: Designing a streaming pipeline.</li>
</ul>
<h5>Module 12: Ingesting Variable Volumes</h5>
<ul>
<li>What is Cloud Pub/Sub?</li>
<li>How it works: Topics and Subscriptions.</li>
<li>Lab: Simulator.</li>
</ul>
<h5>Module 13: Implementing streaming pipelines</h5>
<ul>
<li>Challenges in stream processing.</li>
<li>Handling late data: watermarks, triggers, accumulation.</li>
<li>Lab: Stream data processing pipeline for live traffic data.</li>
</ul>
<h5>Module 14: Streaming analytics and dashboards</h5>
<ul>
<li>Streaming analytics: from data to decisions.</li>
<li>Querying streaming data with BigQuery.</li>
<li>What is Google Data Studio?</li>
<li>Lab: Build a real-time dashboard to visualize processed data.</li>
</ul>
<h5>Module 15: High throughput and low-latency with Bigtable</h5>
<ul>
<li>What is Cloud Spanner?</li>
<li>Designing Bigtable schema.</li>
<li>Ingesting into Bigtable.</li>
<li>Lab: Streaming into Bigtable.</li>
</ul>