{"course":{"productid":18642,"modality":1,"active":true,"language":"en","title":"Data Engineering on Google Cloud Platform","productcode":"DEGCP","vendorcode":"GO","vendorname":"Google","fullproductcode":"GO-DEGCP","courseware":{"has_ekit":false,"has_printkit":true,"language":"en"},"url":"https:\/\/portal.flane.ch\/course\/google-degcp","objective":"<ul>\n<li>Design and build data processing systems on Google Cloud.<\/li><li>Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.<\/li><li>Derive business insights from extremely large datasets using BigQuery.<\/li><li>Leverage unstructured data using Spark and ML APIs on Dataproc.<\/li><li>Enable instant insights from streaming data.<\/li><\/ul>","essentials":"<ul>\n<li>Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.<\/li><li>Basic proficiency with a common query language such as SQL.<\/li><li>Experience with data modeling and ETL (extract, transform, load) activities.<\/li><li>Experience developing applications using a common programming language such as Python.<\/li><\/ul>","audience":"<ul>\n<li>Data engineers<\/li><li>Database administrators<\/li><li>System administrators<\/li><\/ul>","outline":"<h4>Module 01 - Data engineering tasks and components<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>The role of a data engineer<\/li><li>Data sources versus data sinks<\/li><li>Data formats<\/li><li>Storage solution options on Google Cloud<\/li><li>Metadata management options on Google Cloud<\/li><li>Share datasets using Analytics Hub<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain the role of a data engineer.<\/li><li>Understand the differences between a data source and a data sink.<\/li><li>Explain the different types of data formats.<\/li><li>Explain the storage solution options on Google Cloud.<\/li><li>Learn about the metadata management options on Google Cloud.<\/li><li>Understand how to share datasets 
with ease using Analytics Hub.<\/li><li>Understand how to load data into BigQuery using the Google Cloud console and\/or the gcloud CLI.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Loading Data into BigQuery<\/li><\/ul><h4>Module 02 - Data replication and migration<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Replication and migration architecture<\/li><li>The gcloud command line tool<\/li><li>Moving datasets<\/li><li>Datastream<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain the baseline Google Cloud data replication and migration architecture.<\/li><li>Understand the options and use cases for the gcloud command line tool.<\/li><li>Explain the functionality and use cases for the Storage Transfer Service.<\/li><li>Explain the functionality and use cases for the Transfer Appliance.<\/li><li>Understand the features and deployment of Datastream.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Datastream: PostgreSQL Replication to BigQuery<\/li><\/ul><h4>Module 03 - The extract and load data pipeline pattern<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Extract and load architecture<\/li><li>The bq command line tool<\/li><li>BigQuery Data Transfer Service<\/li><li>BigLake<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain the baseline extract and load architecture diagram.<\/li><li>Understand the options of the bq command line tool.<\/li><li>Explain the functionality and use cases for the BigQuery Data Transfer Service.<\/li><li>Explain the functionality and use cases for BigLake as a non-extract-load pattern.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: BigLake: Qwik Start<\/li><\/ul><h4>Module 04 - The extract, load, and transform data pipeline pattern<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Extract, load, and transform (ELT) architecture<\/li><li>SQL scripting and scheduling with 
BigQuery<\/li><li>Dataform<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain the baseline extract, load, and transform architecture diagram.<\/li><li>Understand a common ELT pipeline on Google Cloud.<\/li><li>Learn about BigQuery&rsquo;s SQL scripting and scheduling capabilities.<\/li><li>Explain the functionality and use cases for Dataform.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Create and Execute a SQL Workflow in Dataform<\/li><\/ul><h4>Module 05 - The extract, transform, and load data pipeline pattern<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Extract, transform, and load (ETL) architecture<\/li><li>Google Cloud GUI tools for ETL data pipelines<\/li><li>Batch data processing using Dataproc<\/li><li>Streaming data processing options<\/li><li>Bigtable and data pipelines<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain the baseline extract, transform, and load architecture diagram.<\/li><li>Learn about the GUI tools on Google Cloud used for ETL data pipelines.<\/li><li>Explain batch data processing using Dataproc.<\/li><li>Learn to use Dataproc Serverless for Spark for ETL.<\/li><li>Explain streaming data processing options.<\/li><li>Explain the role Bigtable plays in data pipelines.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Use Dataproc Serverless for Spark to Load BigQuery<\/li><li>Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow<\/li><\/ul><h4>Module 06 - Automation techniques<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Automation patterns and options for pipelines<\/li><li>Cloud Scheduler and Workflows<\/li><li>Cloud Composer<\/li><li>Cloud Run functions<\/li><li>Eventarc<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain the automation patterns and options available for pipelines.<\/li><li>Learn about Cloud Scheduler and Workflows.<\/li><li>Learn about Cloud Composer.<\/li><li>Learn about Cloud 
Run functions.<\/li><li>Explain the functionality and automation use cases for Eventarc.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Use Cloud Run Functions to Load BigQuery<\/li><\/ul><h4>Module 07 - Introduction to data engineering<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Data engineer&rsquo;s role<\/li><li>Data engineering challenges<\/li><li>Introduction to BigQuery<\/li><li>Data lakes and data warehouses<\/li><li>Transactional databases versus data warehouses<\/li><li>Effective partnership with other data teams<\/li><li>Management of data access and governance<\/li><li>Building of production-ready pipelines<\/li><li>Google Cloud customer case study<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Discuss the challenges of data engineering, and how building data pipelines in the cloud helps to address these.<\/li><li>Review and understand the purpose of a data lake versus a data warehouse, and when to use which.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Using BigQuery to Do Analysis<\/li><\/ul><h4>Module 08 - Build a Data Lake<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Introduction to data lakes<\/li><li>Data storage and ETL options on Google Cloud<\/li><li>Building of a data lake using Cloud Storage<\/li><li>Secure Cloud Storage<\/li><li>Store all sorts of data types<\/li><li>Cloud SQL as your OLTP system<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.<\/li><li>Explain how to use Cloud SQL for a relational data lake.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Loading Taxi Data into Cloud SQL<\/li><\/ul><h4>Module 09 - Build a data warehouse<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>The modern data warehouse<\/li><li>Introduction to BigQuery<\/li><li>Get started with BigQuery<\/li><li>Loading of data into BigQuery<\/li><li>Exploration of 
schemas<\/li><li>Schema design<\/li><li>Nested and repeated fields<\/li><li>Optimization with partitioning and clustering<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Discuss requirements of a modern warehouse.<\/li><li>Explain why BigQuery is the scalable data warehousing solution on Google Cloud.<\/li><li>Discuss the core concepts of BigQuery and review options of loading data into BigQuery.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Working with JSON and Array Data in BigQuery<\/li><li>Lab: Partitioned Tables in BigQuery<\/li><\/ul><h4>Module 10 - Introduction to building batch data pipelines<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>EL, ELT, ETL<\/li><li>Quality considerations<\/li><li>Ways of executing operations in BigQuery<\/li><li>Shortcomings<\/li><li>ETL to solve data quality issues<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.<\/li><\/ul><h4>Module 11 - Execute Spark on Dataproc<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>The Hadoop ecosystem<\/li><li>Run Hadoop on Dataproc<\/li><li>Cloud Storage instead of HDFS<\/li><li>Optimize Dataproc<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Review the Hadoop ecosystem.<\/li><li>Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.<\/li><li>Explain when you would use Cloud Storage instead of HDFS storage.<\/li><li>Explain how to optimize Dataproc jobs.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Running Apache Spark Jobs on Dataproc<\/li><\/ul><h4>Module 12 - Serverless data processing with Dataflow<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Introduction to Dataflow<\/li><li>Reasons why customers value Dataflow<\/li><li>Dataflow pipelines<\/li><li>Aggregating with GroupByKey and Combine<\/li><li>Side inputs and windows<\/li><li>Dataflow 
templates<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Identify features customers value in Dataflow.<\/li><li>Discuss core concepts in Dataflow.<\/li><li>Review the use of Dataflow templates and SQL.<\/li><li>Write a simple Dataflow pipeline and run it both locally and on the cloud.<\/li><li>Identify Map and Reduce operations, execute the pipeline, and use command line parameters.<\/li><li>Read data from BigQuery into Dataflow and use the output of a pipeline as a side-input to another pipeline.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: A Simple Dataflow Pipeline (Python\/Java)<\/li><li>Lab: MapReduce in Beam (Python\/Java)<\/li><li>Lab: Side Inputs (Python\/Java)<\/li><\/ul><h4>Module 13 - Manage data pipelines with Cloud Data Fusion and Cloud Composer<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Build batch data pipelines visually with Cloud Data Fusion<ul>\n<li>Components<\/li><li>UI overview<\/li><li>Building a pipeline<\/li><li>Exploring data using Wrangler<\/li><\/ul><\/li><li>Orchestrate work between Google Cloud services with Cloud Composer<ul>\n<li>Apache Airflow environment<\/li><li>DAGs and operators<\/li><li>Workflow scheduling<\/li><li>Monitoring and logging<\/li><\/ul><\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.<\/li><li>Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.<\/li><li>Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Building and Executing a Pipeline Graph in Data Fusion<\/li><li>Lab: An Introduction to Cloud Composer<\/li><\/ul><h4>Module 14 - Introduction to processing streaming data<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Process streaming 
data<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Explain streaming data processing.<\/li><li>Identify the Google Cloud products and tools that can help address streaming data challenges.<\/li><\/ul><h4>Module 15 - Serverless messaging with Pub\/Sub<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Introduction to Pub\/Sub<\/li><li>Pub\/Sub push versus pull<\/li><li>Publishing with Pub\/Sub code<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Describe the Pub\/Sub service.<\/li><li>Explain how Pub\/Sub works.<\/li><li>Simulate real-time streaming sensor data using Pub\/Sub.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Publish Streaming Data into Pub\/Sub<\/li><\/ul><h4>Module 16 - Dataflow streaming features<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Streaming data challenges<\/li><li>Dataflow windowing<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Describe the Dataflow service.<\/li><li>Build a stream processing pipeline for live traffic data.<\/li><li>Demonstrate how to handle late data using watermarks, triggers, and accumulation.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Streaming Data Pipelines<\/li><\/ul><h4>Module 17 - High-throughput BigQuery and Bigtable streaming features<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Streaming into BigQuery and visualizing results<\/li><li>High-throughput streaming with Bigtable<\/li><li>Optimizing Bigtable performance<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.<\/li><li>Discuss Bigtable as a low-latency solution.<\/li><li>Describe how to architect for Bigtable and how to ingest data into Bigtable.<\/li><li>Highlight performance considerations for the relevant services.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Streaming Analytics and Dashboards<\/li><li>Lab: Generate 
Personalized Email Content with BigQuery Continuous Queries and Gemini<\/li><li>Lab: Streaming Data Pipelines into Bigtable<\/li><\/ul><h4>Module 18 - Advanced BigQuery functionality and performance<\/h4><p>\n<strong>Topics:<\/strong>\n<\/p>\n<ul>\n<li>Analytic window functions<\/li><li>GIS functions<\/li><li>Performance considerations<\/li><\/ul><p><strong>Objectives:<\/strong>\n<\/p>\n<ul>\n<li>Review some of BigQuery&rsquo;s advanced analysis capabilities.<\/li><li>Discuss ways to improve query performance.<\/li><\/ul><p><strong>Activities:<\/strong>\n<\/p>\n<ul>\n<li>Lab: Optimizing Your BigQuery Queries for Performance<\/li><\/ul>","summary":"<p>Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.<\/p>","objective_plain":"- Design and build data processing systems on Google Cloud.\n- Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.\n- Derive business insights from extremely large datasets using BigQuery.\n- Leverage unstructured data using Spark and ML APIs on Dataproc.\n- Enable instant insights from streaming data.","essentials_plain":"- Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.\n- Basic proficiency with a common query language such as SQL.\n- Experience with data modeling and ETL (extract, transform, load) activities.\n- Experience developing applications using a common programming language such as Python.","audience_plain":"- Data engineers\n- Database administrators\n- System administrators","outline_plain":"Module 01 - Data engineering tasks and components\n\n\nTopics:\n\n\n\n- The role of a data engineer\n- Data sources versus data sinks\n- Data formats\n- Storage solution 
options on Google Cloud\n- Metadata management options on Google Cloud\n- Share datasets using Analytics Hub\nObjectives:\n\n\n\n- Explain the role of a data engineer.\n- Understand the differences between a data source and a data sink.\n- Explain the different types of data formats.\n- Explain the storage solution options on Google Cloud.\n- Learn about the metadata management options on Google Cloud.\n- Understand how to share datasets with ease using Analytics Hub.\n- Understand how to load data into BigQuery using the Google Cloud console and\/or the gcloud CLI.\nActivities:\n\n\n\n- Lab: Loading Data into BigQuery\nModule 02 - Data replication and migration\n\n\nTopics:\n\n\n\n- Replication and migration architecture\n- The gcloud command line tool\n- Moving datasets\n- Datastream\nObjectives:\n\n\n\n- Explain the baseline Google Cloud data replication and migration architecture.\n- Understand the options and use cases for the gcloud command line tool.\n- Explain the functionality and use cases for the Storage Transfer Service.\n- Explain the functionality and use cases for the Transfer Appliance.\n- Understand the features and deployment of Datastream.\nActivities:\n\n\n\n- Lab: Datastream: PostgreSQL Replication to BigQuery\nModule 03 - The extract and load data pipeline pattern\n\n\nTopics:\n\n\n\n- Extract and load architecture\n- The bq command line tool\n- BigQuery Data Transfer Service\n- BigLake\nObjectives:\n\n\n\n- Explain the baseline extract and load architecture diagram.\n- Understand the options of the bq command line tool.\n- Explain the functionality and use cases for the BigQuery Data Transfer Service.\n- Explain the functionality and use cases for BigLake as a non-extract-load pattern.\nActivities:\n\n\n\n- Lab: BigLake: Qwik Start\nModule 04 - The extract, load, and transform data pipeline pattern\n\n\nTopics:\n\n\n\n- Extract, load, and transform (ELT) architecture\n- SQL scripting and scheduling with BigQuery\n- 
Dataform\nObjectives:\n\n\n\n- Explain the baseline extract, load, and transform architecture diagram.\n- Understand a common ELT pipeline on Google Cloud.\n- Learn about BigQuery\u2019s SQL scripting and scheduling capabilities.\n- Explain the functionality and use cases for Dataform.\nActivities:\n\n\n\n- Lab: Create and Execute a SQL Workflow in Dataform\nModule 05 - The extract, transform, and load data pipeline pattern\n\n\nTopics:\n\n\n\n- Extract, transform, and load (ETL) architecture\n- Google Cloud GUI tools for ETL data pipelines\n- Batch data processing using Dataproc\n- Streaming data processing options\n- Bigtable and data pipelines\nObjectives:\n\n\n\n- Explain the baseline extract, transform, and load architecture diagram.\n- Learn about the GUI tools on Google Cloud used for ETL data pipelines.\n- Explain batch data processing using Dataproc.\n- Learn to use Dataproc Serverless for Spark for ETL.\n- Explain streaming data processing options.\n- Explain the role Bigtable plays in data pipelines.\nActivities:\n\n\n\n- Lab: Use Dataproc Serverless for Spark to Load BigQuery\n- Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow\nModule 06 - Automation techniques\n\n\nTopics:\n\n\n\n- Automation patterns and options for pipelines\n- Cloud Scheduler and Workflows\n- Cloud Composer\n- Cloud Run functions\n- Eventarc\nObjectives:\n\n\n\n- Explain the automation patterns and options available for pipelines.\n- Learn about Cloud Scheduler and Workflows.\n- Learn about Cloud Composer.\n- Learn about Cloud Run functions.\n- Explain the functionality and automation use cases for Eventarc.\nActivities:\n\n\n\n- Lab: Use Cloud Run Functions to Load BigQuery\nModule 07 - Introduction to data engineering\n\n\nTopics:\n\n\n\n- Data engineer\u2019s role\n- Data engineering challenges\n- Introduction to BigQuery\n- Data lakes and data warehouses\n- Transactional databases versus data warehouses\n- Effective partnership with other data 
teams\n- Management of data access and governance\n- Building of production-ready pipelines\n- Google Cloud customer case study\nObjectives:\n\n\n\n- Discuss the challenges of data engineering, and how building data pipelines in the cloud helps to address these.\n- Review and understand the purpose of a data lake versus a data warehouse, and when to use which.\nActivities:\n\n\n\n- Lab: Using BigQuery to Do Analysis\nModule 08 - Build a Data Lake\n\n\nTopics:\n\n\n\n- Introduction to data lakes\n- Data storage and ETL options on Google Cloud\n- Building of a data lake using Cloud Storage\n- Secure Cloud Storage\n- Store all sorts of data types\n- Cloud SQL as your OLTP system\nObjectives:\n\n\n\n- Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.\n- Explain how to use Cloud SQL for a relational data lake.\nActivities:\n\n\n\n- Lab: Loading Taxi Data into Cloud SQL\nModule 09 - Build a data warehouse\n\n\nTopics:\n\n\n\n- The modern data warehouse\n- Introduction to BigQuery\n- Get started with BigQuery\n- Loading of data into BigQuery\n- Exploration of schemas\n- Schema design\n- Nested and repeated fields\n- Optimization with partitioning and clustering\nObjectives:\n\n\n\n- Discuss requirements of a modern warehouse.\n- Explain why BigQuery is the scalable data warehousing solution on Google Cloud.\n- Discuss the core concepts of BigQuery and review options of loading data into BigQuery.\nActivities:\n\n\n\n- Lab: Working with JSON and Array Data in BigQuery\n- Lab: Partitioned Tables in BigQuery\nModule 10 - Introduction to building batch data pipelines\n\n\nTopics:\n\n\n\n- EL, ELT, ETL\n- Quality considerations\n- Ways of executing operations in BigQuery\n- Shortcomings\n- ETL to solve data quality issues\nObjectives:\n\n\n\n- Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.\nModule 11 - Execute Spark on Dataproc\n\n\nTopics:\n\n\n\n- The Hadoop ecosystem\n- Run Hadoop on 
Dataproc\n- Cloud Storage instead of HDFS\n- Optimize Dataproc\nObjectives:\n\n\n\n- Review the Hadoop ecosystem.\n- Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.\n- Explain when you would use Cloud Storage instead of HDFS storage.\n- Explain how to optimize Dataproc jobs.\nActivities:\n\n\n\n- Lab: Running Apache Spark Jobs on Dataproc\nModule 12 - Serverless data processing with Dataflow\n\n\nTopics:\n\n\n\n- Introduction to Dataflow\n- Reasons why customers value Dataflow\n- Dataflow pipelines\n- Aggregating with GroupByKey and Combine\n- Side inputs and windows\n- Dataflow templates\nObjectives:\n\n\n\n- Identify features customers value in Dataflow.\n- Discuss core concepts in Dataflow.\n- Review the use of Dataflow templates and SQL.\n- Write a simple Dataflow pipeline and run it both locally and on the cloud.\n- Identify Map and Reduce operations, execute the pipeline, and use command line parameters.\n- Read data from BigQuery into Dataflow and use the output of a pipeline as a side-input to another pipeline.\nActivities:\n\n\n\n- Lab: A Simple Dataflow Pipeline (Python\/Java)\n- Lab: MapReduce in Beam (Python\/Java)\n- Lab: Side Inputs (Python\/Java)\nModule 13 - Manage data pipelines with Cloud Data Fusion and Cloud Composer\n\n\nTopics:\n\n\n\n- Build batch data pipelines visually with Cloud Data Fusion\n- Components\n- UI overview\n- Building a pipeline\n- Exploring data using Wrangler\n- Orchestrate work between Google Cloud services with Cloud Composer\n- Apache Airflow environment\n- DAGs and operators\n- Workflow scheduling\n- Monitoring and logging\nObjectives:\n\n\n\n- Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.\n- Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.\n- Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.\nActivities:\n\n\n\n- 
Lab: Building and Executing a Pipeline Graph in Data Fusion\n- Lab: An Introduction to Cloud Composer\nModule 14 - Introduction to processing streaming data\n\n\nTopics:\n\n\n\n- Process streaming data\nObjectives:\n\n\n\n- Explain streaming data processing.\n- Identify the Google Cloud products and tools that can help address streaming data challenges.\nModule 15 - Serverless messaging with Pub\/Sub\n\n\nTopics:\n\n\n\n- Introduction to Pub\/Sub\n- Pub\/Sub push versus pull\n- Publishing with Pub\/Sub code\nObjectives:\n\n\n\n- Describe the Pub\/Sub service.\n- Explain how Pub\/Sub works.\n- Simulate real-time streaming sensor data using Pub\/Sub.\nActivities:\n\n\n\n- Lab: Publish Streaming Data into Pub\/Sub\nModule 16 - Dataflow streaming features\n\n\nTopics:\n\n\n\n- Streaming data challenges\n- Dataflow windowing\nObjectives:\n\n\n\n- Describe the Dataflow service.\n- Build a stream processing pipeline for live traffic data.\n- Demonstrate how to handle late data using watermarks, triggers, and accumulation.\nActivities:\n\n\n\n- Lab: Streaming Data Pipelines\nModule 17 - High-throughput BigQuery and Bigtable streaming features\n\n\nTopics:\n\n\n\n- Streaming into BigQuery and visualizing results\n- High-throughput streaming with Bigtable\n- Optimizing Bigtable performance\nObjectives:\n\n\n\n- Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.\n- Discuss Bigtable as a low-latency solution.\n- Describe how to architect for Bigtable and how to ingest data into Bigtable.\n- Highlight performance considerations for the relevant services.\nActivities:\n\n\n\n- Lab: Streaming Analytics and Dashboards\n- Lab: Generate Personalized Email Content with BigQuery Continuous Queries and Gemini\n- Lab: Streaming Data Pipelines into Bigtable\nModule 18 - Advanced BigQuery functionality and performance\n\n\nTopics:\n\n\n\n- Analytic window functions\n- GIS functions\n- Performance considerations\nObjectives:\n\n\n\n- Review some of 
BigQuery\u2019s advanced analysis capabilities.\n- Discuss ways to improve query performance.\nActivities:\n\n\n\n- Lab: Optimizing Your BigQuery Queries for Performance","summary_plain":"Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.","skill_level":"Intermediate","version":"3.0","duration":{"unit":"d","value":4,"formatted":"4 days"},"pricelist":{"List Price":{"IT":{"country":"IT","currency":"EUR","taxrate":20,"price":2600},"DE":{"country":"DE","currency":"EUR","taxrate":19,"price":2600},"NL":{"country":"NL","currency":"EUR","taxrate":21,"price":2695},"BE":{"country":"BE","currency":"EUR","taxrate":21,"price":2695},"AT":{"country":"AT","currency":"EUR","taxrate":20,"price":2600},"US":{"country":"US","currency":"USD","taxrate":null,"price":2495},"ES":{"country":"ES","currency":"EUR","taxrate":18,"price":1950},"SG":{"country":"SG","currency":"SGD","taxrate":8,"price":3450},"SE":{"country":"SE","currency":"EUR","taxrate":25,"price":2600},"AE":{"country":"AE","currency":"USD","taxrate":5,"price":2600},"CH":{"country":"CH","currency":"CHF","taxrate":8.1,"price":3380},"IN":{"country":"IN","currency":"USD","taxrate":12.36,"price":1500},"RU":{"country":"RU","currency":"RUB","taxrate":18,"price":221000},"IL":{"country":"IL","currency":"ILS","taxrate":17,"price":9020},"GR":{"country":"GR","currency":"EUR","taxrate":null,"price":1950},"MK":{"country":"MK","currency":"EUR","taxrate":null,"price":1950},"HU":{"country":"HU","currency":"EUR","taxrate":20,"price":1950},"SI":{"country":"SI","currency":"EUR","taxrate":20,"price":2600},"GB":{"country":"GB","currency":"GBP","taxrate":20,"price":2640},"CA":{"country":"CA","currency":"CAD","taxrate":null,"price":3445},"FR":{"country":"FR","currency":
"EUR","taxrate":19.6,"price":2990}}},"lastchanged":"2025-11-18T18:18:14+01:00","parenturl":"https:\/\/portal.flane.ch\/swisscom\/en\/json-courses","nexturl_course_schedule":"https:\/\/portal.flane.ch\/swisscom\/en\/json-course-schedule\/18642","source_lang":"en","source":"https:\/\/portal.flane.ch\/swisscom\/en\/json-course\/google-degcp"}}