<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE FL_Course SYSTEM "https://www.flane.de/dtd/fl_course095.dtd"><?xml-stylesheet type="text/xsl" href="https://portal.flane.ch/css/xml-course.xsl"?><course productid="18642" language="en" source="https://portal.flane.ch/swisscom/en/xml-course/google-degcp" lastchanged="2025-11-18T18:18:14+01:00" parent="https://portal.flane.ch/swisscom/en/xml-courses"><title>Data Engineering on Google Cloud Platform</title><productcode>DEGCP</productcode><vendorcode>GO</vendorcode><vendorname>Google</vendorname><fullproductcode>GO-DEGCP</fullproductcode><version>3.0</version><objective>&lt;ul&gt;
&lt;li&gt;Design and build data processing systems on Google Cloud.&lt;/li&gt;&lt;li&gt;Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.&lt;/li&gt;&lt;li&gt;Derive business insights from extremely large datasets using BigQuery.&lt;/li&gt;&lt;li&gt;Leverage unstructured data using Spark and ML APIs on Dataproc.&lt;/li&gt;&lt;li&gt;Enable instant insights from streaming data.&lt;/li&gt;&lt;/ul&gt;</objective><essentials>&lt;ul&gt;
&lt;li&gt;Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.&lt;/li&gt;&lt;li&gt;Basic proficiency with a common query language such as SQL.&lt;/li&gt;&lt;li&gt;Experience with data modeling and ETL (extract, transform, load) activities.&lt;/li&gt;&lt;li&gt;Experience developing applications using a common programming language such as Python.&lt;/li&gt;&lt;/ul&gt;</essentials><audience>&lt;ul&gt;
&lt;li&gt;Data engineers&lt;/li&gt;&lt;li&gt;Database administrators&lt;/li&gt;&lt;li&gt;System administrators&lt;/li&gt;&lt;/ul&gt;</audience><outline>&lt;h4&gt;Module 01 - Data engineering tasks and components&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The role of a data engineer&lt;/li&gt;&lt;li&gt;Data sources versus data sinks&lt;/li&gt;&lt;li&gt;Data formats&lt;/li&gt;&lt;li&gt;Storage solution options on Google Cloud&lt;/li&gt;&lt;li&gt;Metadata management options on Google Cloud&lt;/li&gt;&lt;li&gt;Share datasets using Analytics Hub&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the role of a data engineer.&lt;/li&gt;&lt;li&gt;Understand the differences between a data source and a data sink.&lt;/li&gt;&lt;li&gt;Explain the different types of data formats.&lt;/li&gt;&lt;li&gt;Explain the storage solution options on Google Cloud.&lt;/li&gt;&lt;li&gt;Learn about the metadata management options on Google Cloud.&lt;/li&gt;&lt;li&gt;Understand how to share datasets with ease using Analytics Hub.&lt;/li&gt;&lt;li&gt;Understand how to load data into BigQuery using the Google Cloud console and/or the gcloud CLI.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Loading Data into BigQuery&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 02 - Data replication and migration&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Replication and migration architecture&lt;/li&gt;&lt;li&gt;The gcloud command line tool&lt;/li&gt;&lt;li&gt;Moving datasets&lt;/li&gt;&lt;li&gt;Datastream&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the baseline Google Cloud data replication and migration architecture.&lt;/li&gt;&lt;li&gt;Understand the options and use cases for the gcloud command line tool.&lt;/li&gt;&lt;li&gt;Explain the functionality and use cases for the Storage Transfer Service.&lt;/li&gt;&lt;li&gt;Explain the functionality and use cases for the Transfer Appliance.&lt;/li&gt;&lt;li&gt;Understand the features and deployment of Datastream.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Datastream: PostgreSQL Replication to BigQuery&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 03 - The extract and load data pipeline pattern&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract and load architecture&lt;/li&gt;&lt;li&gt;The bq command line tool&lt;/li&gt;&lt;li&gt;BigQuery Data Transfer Service&lt;/li&gt;&lt;li&gt;BigLake&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the baseline extract and load architecture diagram.&lt;/li&gt;&lt;li&gt;Understand the options of the bq command line tool.&lt;/li&gt;&lt;li&gt;Explain the functionality and use cases for the BigQuery Data Transfer Service.&lt;/li&gt;&lt;li&gt;Explain the functionality and use cases for BigLake as a non-extract-load pattern.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: BigLake: Qwik Start&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 04 - The extract, load, and transform data pipeline pattern&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract, load, and transform (ELT) architecture&lt;/li&gt;&lt;li&gt;SQL scripting and scheduling with BigQuery&lt;/li&gt;&lt;li&gt;Dataform&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the baseline extract, load, and transform architecture diagram.&lt;/li&gt;&lt;li&gt;Understand a common ELT pipeline on Google Cloud.&lt;/li&gt;&lt;li&gt;Learn about BigQuery&amp;rsquo;s SQL scripting and scheduling capabilities.&lt;/li&gt;&lt;li&gt;Explain the functionality and use cases for Dataform.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Create and Execute a SQL Workflow in Dataform&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 05 - The extract, transform, and load data pipeline pattern&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extract, transform, and load (ETL) architecture&lt;/li&gt;&lt;li&gt;Google Cloud GUI tools for ETL data pipelines&lt;/li&gt;&lt;li&gt;Batch data processing using Dataproc&lt;/li&gt;&lt;li&gt;Streaming data processing options&lt;/li&gt;&lt;li&gt;Bigtable and data pipelines&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the baseline extract, transform, and load architecture diagram.&lt;/li&gt;&lt;li&gt;Learn about the GUI tools on Google Cloud used for ETL data pipelines.&lt;/li&gt;&lt;li&gt;Explain batch data processing using Dataproc.&lt;/li&gt;&lt;li&gt;Learn to use Dataproc Serverless for Spark for ETL.&lt;/li&gt;&lt;li&gt;Explain streaming data processing options.&lt;/li&gt;&lt;li&gt;Explain the role Bigtable plays in data pipelines.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Use Dataproc Serverless for Spark to Load BigQuery&lt;/li&gt;&lt;li&gt;Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 06 - Automation techniques&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automation patterns and options for pipelines&lt;/li&gt;&lt;li&gt;Cloud Scheduler and Workflows&lt;/li&gt;&lt;li&gt;Cloud Composer&lt;/li&gt;&lt;li&gt;Cloud Run functions&lt;/li&gt;&lt;li&gt;Eventarc&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain the automation patterns and options available for pipelines.&lt;/li&gt;&lt;li&gt;Learn about Cloud Scheduler and Workflows.&lt;/li&gt;&lt;li&gt;Learn about Cloud Composer.&lt;/li&gt;&lt;li&gt;Learn about Cloud Run functions.&lt;/li&gt;&lt;li&gt;Explain the functionality and automation use cases for Eventarc.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Use Cloud Run Functions to Load BigQuery&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 07 - Introduction to data engineering&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data engineer&amp;rsquo;s role&lt;/li&gt;&lt;li&gt;Data engineering challenges&lt;/li&gt;&lt;li&gt;Introduction to BigQuery&lt;/li&gt;&lt;li&gt;Data lakes and data warehouses&lt;/li&gt;&lt;li&gt;Transactional databases versus data warehouses&lt;/li&gt;&lt;li&gt;Effective partnership with other data teams&lt;/li&gt;&lt;li&gt;Management of data access and governance&lt;/li&gt;&lt;li&gt;Building of production-ready pipelines&lt;/li&gt;&lt;li&gt;Google Cloud customer case study&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discuss the challenges of data engineering, and how building data pipelines in the cloud helps to address these.&lt;/li&gt;&lt;li&gt;Review and understand the purpose of a data lake versus a data warehouse, and when to use which.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Using BigQuery to Do Analysis&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 08 - Build a data lake&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to data lakes&lt;/li&gt;&lt;li&gt;Data storage and ETL options on Google Cloud&lt;/li&gt;&lt;li&gt;Building of a data lake using Cloud Storage&lt;/li&gt;&lt;li&gt;Secure Cloud Storage&lt;/li&gt;&lt;li&gt;Store all sorts of data types&lt;/li&gt;&lt;li&gt;Cloud SQL as your OLTP system&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.&lt;/li&gt;&lt;li&gt;Explain how to use Cloud SQL for a relational data lake.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Loading Taxi Data into Cloud SQL&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 09 - Build a data warehouse&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The modern data warehouse&lt;/li&gt;&lt;li&gt;Introduction to BigQuery&lt;/li&gt;&lt;li&gt;Get started with BigQuery&lt;/li&gt;&lt;li&gt;Loading of data into BigQuery&lt;/li&gt;&lt;li&gt;Exploration of schemas&lt;/li&gt;&lt;li&gt;Schema design&lt;/li&gt;&lt;li&gt;Nested and repeated fields&lt;/li&gt;&lt;li&gt;Optimization with partitioning and clustering&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discuss requirements of a modern warehouse.&lt;/li&gt;&lt;li&gt;Explain why BigQuery is the scalable data warehousing solution on Google Cloud.&lt;/li&gt;&lt;li&gt;Discuss the core concepts of BigQuery and review options of loading data into BigQuery.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Working with JSON and Array Data in BigQuery&lt;/li&gt;&lt;li&gt;Lab: Partitioned Tables in BigQuery&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 10 - Introduction to building batch data pipelines&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;EL, ELT, ETL&lt;/li&gt;&lt;li&gt;Quality considerations&lt;/li&gt;&lt;li&gt;Ways of executing operations in BigQuery&lt;/li&gt;&lt;li&gt;Shortcomings&lt;/li&gt;&lt;li&gt;ETL to solve data quality issues&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 11 - Execute Spark on Dataproc&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Hadoop ecosystem&lt;/li&gt;&lt;li&gt;Run Hadoop on Dataproc&lt;/li&gt;&lt;li&gt;Cloud Storage instead of HDFS&lt;/li&gt;&lt;li&gt;Optimize Dataproc&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review the Hadoop ecosystem.&lt;/li&gt;&lt;li&gt;Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.&lt;/li&gt;&lt;li&gt;Explain when you would use Cloud Storage instead of HDFS storage.&lt;/li&gt;&lt;li&gt;Explain how to optimize Dataproc jobs.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Running Apache Spark Jobs on Dataproc&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 12 - Serverless data processing with Dataflow&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to Dataflow&lt;/li&gt;&lt;li&gt;Reasons why customers value Dataflow&lt;/li&gt;&lt;li&gt;Dataflow pipelines&lt;/li&gt;&lt;li&gt;Aggregating with GroupByKey and Combine&lt;/li&gt;&lt;li&gt;Side inputs and windows&lt;/li&gt;&lt;li&gt;Dataflow templates&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identify features customers value in Dataflow.&lt;/li&gt;&lt;li&gt;Discuss core concepts in Dataflow.&lt;/li&gt;&lt;li&gt;Review the use of Dataflow templates and SQL.&lt;/li&gt;&lt;li&gt;Write a simple Dataflow pipeline and run it both locally and on the cloud.&lt;/li&gt;&lt;li&gt;Identify Map and Reduce operations, execute the pipeline, and use command line parameters.&lt;/li&gt;&lt;li&gt;Read data from BigQuery into Dataflow and use the output of a pipeline as a side input to another pipeline.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: A Simple Dataflow Pipeline (Python/Java)&lt;/li&gt;&lt;li&gt;Lab: MapReduce in Beam (Python/Java)&lt;/li&gt;&lt;li&gt;Lab: Side Inputs (Python/Java)&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 13 - Manage data pipelines with Cloud Data Fusion and Cloud Composer&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build batch data pipelines visually with Cloud Data Fusion&lt;ul&gt;
&lt;li&gt;Components&lt;/li&gt;&lt;li&gt;UI overview&lt;/li&gt;&lt;li&gt;Building a pipeline&lt;/li&gt;&lt;li&gt;Exploring data using Wrangler&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Orchestrate work between Google Cloud services with Cloud Composer&lt;ul&gt;
&lt;li&gt;Apache Airflow environment&lt;/li&gt;&lt;li&gt;DAGs and operators&lt;/li&gt;&lt;li&gt;Workflow scheduling&lt;/li&gt;&lt;li&gt;Monitoring and logging&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.&lt;/li&gt;&lt;li&gt;Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.&lt;/li&gt;&lt;li&gt;Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Building and Executing a Pipeline Graph in Data Fusion&lt;/li&gt;&lt;li&gt;Lab: An Introduction to Cloud Composer&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 14 - Introduction to processing streaming data&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Process streaming data&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explain streaming data processing.&lt;/li&gt;&lt;li&gt;Identify the Google Cloud products and tools that can help address streaming data challenges.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 15 - Serverless messaging with Pub/Sub&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduction to Pub/Sub&lt;/li&gt;&lt;li&gt;Pub/Sub push versus pull&lt;/li&gt;&lt;li&gt;Publishing with Pub/Sub code&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Describe the Pub/Sub service.&lt;/li&gt;&lt;li&gt;Explain how Pub/Sub works.&lt;/li&gt;&lt;li&gt;Simulate real-time streaming sensor data using Pub/Sub.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Publish Streaming Data into Pub/Sub&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 16 - Dataflow streaming features&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Streaming data challenges&lt;/li&gt;&lt;li&gt;Dataflow windowing&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Describe the Dataflow service.&lt;/li&gt;&lt;li&gt;Build a stream processing pipeline for live traffic data.&lt;/li&gt;&lt;li&gt;Demonstrate how to handle late data using watermarks, triggers, and accumulation.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Streaming Data Pipelines&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 17 - High-throughput BigQuery and Bigtable streaming features&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Streaming into BigQuery and visualizing results&lt;/li&gt;&lt;li&gt;High-throughput streaming with Bigtable&lt;/li&gt;&lt;li&gt;Optimizing Bigtable performance&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.&lt;/li&gt;&lt;li&gt;Discuss Bigtable as a low-latency solution.&lt;/li&gt;&lt;li&gt;Describe how to architect for Bigtable and how to ingest data into Bigtable.&lt;/li&gt;&lt;li&gt;Highlight performance considerations for the relevant services.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Streaming Analytics and Dashboards&lt;/li&gt;&lt;li&gt;Lab: Generate Personalized Email Content with BigQuery Continuous Queries and Gemini&lt;/li&gt;&lt;li&gt;Lab: Streaming Data Pipelines into Bigtable&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Module 18 - Advanced BigQuery functionality and performance&lt;/h4&gt;&lt;p&gt;
&lt;strong&gt;Topics:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Analytic window functions&lt;/li&gt;&lt;li&gt;GIS functions&lt;/li&gt;&lt;li&gt;Performance considerations&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Objectives:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Review some of BigQuery&amp;rsquo;s advanced analysis capabilities.&lt;/li&gt;&lt;li&gt;Discuss ways to improve query performance.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;Activities:&lt;/strong&gt;
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lab: Optimizing Your BigQuery Queries for Performance&lt;/li&gt;&lt;/ul&gt;</outline><objective_plain>- Design and build data processing systems on Google Cloud.
- Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
- Derive business insights from extremely large datasets using BigQuery.
- Leverage unstructured data using Spark and ML APIs on Dataproc.
- Enable instant insights from streaming data.</objective_plain><essentials_plain>- Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience developing applications using a common programming language such as Python.</essentials_plain><audience_plain>- Data engineers
- Database administrators
- System administrators</audience_plain><outline_plain>Module 01 - Data engineering tasks and components


Topics:



- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Share datasets using Analytics Hub
Objectives:



- Explain the role of a data engineer.
- Understand the differences between a data source and a data sink.
- Explain the different types of data formats.
- Explain the storage solution options on Google Cloud.
- Learn about the metadata management options on Google Cloud.
- Understand how to share datasets with ease using Analytics Hub.
- Understand how to load data into BigQuery using the Google Cloud console and/or the gcloud CLI.
Activities:



- Lab: Loading Data into BigQuery
Module 02 - Data replication and migration


Topics:



- Replication and migration architecture
- The gcloud command line tool
- Moving datasets
- Datastream
Objectives:



- Explain the baseline Google Cloud data replication and migration architecture.
- Understand the options and use cases for the gcloud command line tool.
- Explain the functionality and use cases for the Storage Transfer Service.
- Explain the functionality and use cases for the Transfer Appliance.
- Understand the features and deployment of Datastream.
Activities:



- Lab: Datastream: PostgreSQL Replication to BigQuery
Module 03 - The extract and load data pipeline pattern


Topics:



- Extract and load architecture
- The bq command line tool
- BigQuery Data Transfer Service
- BigLake
Objectives:



- Explain the baseline extract and load architecture diagram.
- Understand the options of the bq command line tool.
- Explain the functionality and use cases for the BigQuery Data Transfer Service.
- Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Activities:



- Lab: BigLake: Qwik Start
Module 04 - The extract, load, and transform data pipeline pattern


Topics:



- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
Objectives:



- Explain the baseline extract, load, and transform architecture diagram.
- Understand a common ELT pipeline on Google Cloud.
- Learn about BigQuery’s SQL scripting and scheduling capabilities.
- Explain the functionality and use cases for Dataform.
Activities:



- Lab: Create and Execute a SQL Workflow in Dataform
Module 05 - The extract, transform, and load data pipeline pattern


Topics:



- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
Objectives:



- Explain the baseline extract, transform, and load architecture diagram.
- Learn about the GUI tools on Google Cloud used for ETL data pipelines.
- Explain batch data processing using Dataproc.
- Learn to use Dataproc Serverless for Spark for ETL.
- Explain streaming data processing options.
- Explain the role Bigtable plays in data pipelines.
Activities:



- Lab: Use Dataproc Serverless for Spark to Load BigQuery
- Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow
Module 06 - Automation techniques


Topics:



- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
Objectives:



- Explain the automation patterns and options available for pipelines.
- Learn about Cloud Scheduler and Workflows.
- Learn about Cloud Composer.
- Learn about Cloud Run functions.
- Explain the functionality and automation use cases for Eventarc.
Activities:



- Lab: Use Cloud Run Functions to Load BigQuery
Module 07 - Introduction to data engineering


Topics:



- Data engineer’s role
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Effective partnership with other data teams
- Management of data access and governance
- Building of production-ready pipelines
- Google Cloud customer case study
Objectives:



- Discuss the challenges of data engineering, and how building data pipelines in the cloud helps to address these.
- Review and understand the purpose of a data lake versus a data warehouse, and when to use which.
Activities:



- Lab: Using BigQuery to Do Analysis
Module 08 - Build a data lake


Topics:



- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building of a data lake using Cloud Storage
- Secure Cloud Storage
- Store all sorts of data types
- Cloud SQL as your OLTP system
Objectives:



- Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
- Explain how to use Cloud SQL for a relational data lake.
Activities:



- Lab: Loading Taxi Data into Cloud SQL
Module 09 - Build a data warehouse


Topics:



- The modern data warehouse
- Introduction to BigQuery
- Get started with BigQuery
- Loading of data into BigQuery
- Exploration of schemas
- Schema design
- Nested and repeated fields
- Optimization with partitioning and clustering
Objectives:



- Discuss requirements of a modern warehouse.
- Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
- Discuss the core concepts of BigQuery and review options of loading data into BigQuery.
Activities:



- Lab: Working with JSON and Array Data in BigQuery
- Lab: Partitioned Tables in BigQuery
Module 10 - Introduction to building batch data pipelines


Topics:



- EL, ELT, ETL
- Quality considerations
- Ways of executing operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
Objectives:



- Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.
Module 11 - Execute Spark on Dataproc


Topics:



- The Hadoop ecosystem
- Run Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimize Dataproc
Objectives:



- Review the Hadoop ecosystem.
- Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.
- Explain when you would use Cloud Storage instead of HDFS storage.
- Explain how to optimize Dataproc jobs.
Activities:



- Lab: Running Apache Spark Jobs on Dataproc
Module 12 - Serverless data processing with Dataflow


Topics:



- Introduction to Dataflow
- Reasons why customers value Dataflow
- Dataflow pipelines
- Aggregating with GroupByKey and Combine
- Side inputs and windows
- Dataflow templates
Objectives:



- Identify features customers value in Dataflow.
- Discuss core concepts in Dataflow.
- Review the use of Dataflow templates and SQL.
- Write a simple Dataflow pipeline and run it both locally and on the cloud.
- Identify Map and Reduce operations, execute the pipeline, and use command line parameters.
- Read data from BigQuery into Dataflow and use the output of a pipeline as a side input to another pipeline.
Activities:



- Lab: A Simple Dataflow Pipeline (Python/Java)
- Lab: MapReduce in Beam (Python/Java)
- Lab: Side Inputs (Python/Java)
Module 13 - Manage data pipelines with Cloud Data Fusion and Cloud Composer


Topics:



- Build batch data pipelines visually with Cloud Data Fusion
- Components
- UI overview
- Building a pipeline
- Exploring data using Wrangler
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and operators
- Workflow scheduling
- Monitoring and logging
Objectives:



- Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.
- Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.
- Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.
Activities:



- Lab: Building and Executing a Pipeline Graph in Data Fusion
- Lab: An Introduction to Cloud Composer
Module 14 - Introduction to processing streaming data


Topics:



- Process streaming data
Objectives:



- Explain streaming data processing.
- Identify the Google Cloud products and tools that can help address streaming data challenges.
Module 15 - Serverless messaging with Pub/Sub


Topics:



- Introduction to Pub/Sub
- Pub/Sub push versus pull
- Publishing with Pub/Sub code
Objectives:



- Describe the Pub/Sub service.
- Explain how Pub/Sub works.
- Simulate real-time streaming sensor data using Pub/Sub.
Activities:



- Lab: Publish Streaming Data into Pub/Sub
Module 16 - Dataflow streaming features


Topics:



- Streaming data challenges
- Dataflow windowing
Objectives:



- Describe the Dataflow service.
- Build a stream processing pipeline for live traffic data.
- Demonstrate how to handle late data using watermarks, triggers, and accumulation.
Activities:



- Lab: Streaming Data Pipelines
Module 17 - High-throughput BigQuery and Bigtable streaming features


Topics:



- Streaming into BigQuery and visualizing results
- High-throughput streaming with Bigtable
- Optimizing Bigtable performance
Objectives:



- Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.
- Discuss Bigtable as a low-latency solution.
- Describe how to architect for Bigtable and how to ingest data into Bigtable.
- Highlight performance considerations for the relevant services.
Activities:



- Lab: Streaming Analytics and Dashboards
- Lab: Generate Personalized Email Content with BigQuery Continuous Queries and Gemini
- Lab: Streaming Data Pipelines into Bigtable
Module 18 - Advanced BigQuery functionality and performance


Topics:



- Analytic window functions
- GIS functions
- Performance considerations
Objectives:



- Review some of BigQuery’s advanced analysis capabilities.
- Discuss ways to improve query performance.
Activities:



- Lab: Optimizing Your BigQuery Queries for Performance</outline_plain><duration unit="d" days="4">4 days</duration><pricelist><price country="IT" currency="EUR">2600.00</price><price country="DE" currency="EUR">2600.00</price><price country="NL" currency="EUR">2695.00</price><price country="BE" currency="EUR">2695.00</price><price country="AT" currency="EUR">2600.00</price><price country="US" currency="USD">2495.00</price><price country="ES" currency="EUR">1950.00</price><price country="SG" currency="SGD">3450.00</price><price country="SE" currency="EUR">2600.00</price><price country="AE" currency="USD">2600.00</price><price country="CH" currency="CHF">3380.00</price><price country="IN" currency="USD">1500.00</price><price country="RU" currency="RUB">221000.00</price><price country="IL" currency="ILS">9020.00</price><price country="GR" currency="EUR">1950.00</price><price country="MK" currency="EUR">1950.00</price><price country="HU" currency="EUR">1950.00</price><price country="SI" currency="EUR">2600.00</price><price country="GB" currency="GBP">2640.00</price><price country="CA" currency="CAD">3445.00</price><price country="FR" currency="EUR">2990.00</price></pricelist><miles/></course>