The SkillPractical Google Cloud Professional Data Engineer Certification Test is intended for data scientists, solution architects, DevOps engineers, and anyone who wants to move into machine learning and data engineering on Google Cloud. Students should have some familiarity with GCP basics such as storage, compute, and security; basic coding skills (for example, in Python); and a good understanding of databases. You do not need a background in data engineering or machine learning, but some experience with GCP is essential. This is an advanced certification, and we strongly recommend that students take the SkillPractical Google Certified Associate Cloud Engineer exam first. For reference, 87% of Google Cloud certified users feel more confident in their cloud skills.

Course Learning Objectives
- Design a data processing system
- Build and maintain data structures and databases
- Analyze data and enable machine learning
- Optimize data representations, data infrastructure performance, and cost
- Ensure reliability of data processing infrastructure
- Visualize data
- Design secure data processing systems

Course syllabus description:

1. Designing data processing systems

1.1 Selecting the appropriate storage technologies. Considerations include:
- Mapping storage systems to business requirements
- Data modeling
- Tradeoffs involving latency, throughput, and transactions
- Distributed systems
- Schema design

1.2 Designing data pipelines. Considerations include:
- Data publishing and visualization (e.g., BigQuery)
- Batch and streaming data (e.g., Cloud Dataflow, Cloud Dataproc, Apache Beam, Apache Spark and the Hadoop ecosystem, Cloud Pub/Sub, Apache Kafka)
- Online (interactive) vs. batch predictions
- Job automation and orchestration (e.g., Cloud Composer)

1.3 Designing a data processing solution. Considerations include:
- Choice of infrastructure
- System availability and fault tolerance
- Use of distributed systems
- Capacity planning
- Hybrid cloud and edge computing
- Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)
- At-least-once, in-order, exactly-once, and other event processing guarantees

1.4 Migrating data warehousing and data processing. Considerations include:
- Awareness of the current state and how to migrate a design to a future state
- Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)
- Validating a migration

2. Building and operationalizing data processing systems

2.1 Building and operationalizing storage systems. Considerations include:
- Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Cloud Datastore, Cloud Memorystore)
- Storage costs and performance
- Lifecycle management of data

2.2 Building and operationalizing pipelines. Considerations include:
- Data cleansing
- Batch and streaming
- Transformation
- Data acquisition and import
- Integrating with new data sources

2.3 Building and operationalizing processing infrastructure. Considerations include:
- Provisioning resources
- Monitoring pipelines
- Adjusting pipelines
- Testing and quality control

3. Operationalizing machine learning models

3.1 Leveraging pre-built ML models as a service. Considerations include:
- ML APIs (e.g., Vision API, Speech API); see the illustrative sketch after this subsection
- Customizing ML APIs (e.g., AutoML Vision, AutoML text)
- Conversational experiences (e.g., Dialogflow)
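To make subsection 3.1 concrete, here is a minimal sketch of calling one of the pre-built ML APIs (label detection with the Vision API) from Python. It assumes the google-cloud-vision client library (v2+) and application-default credentials; the file name photo.jpg is a placeholder.

```python
# Minimal sketch: labeling an image with the pre-built Vision API.
# Assumes google-cloud-vision (v2+) is installed and application-default
# credentials are configured; "photo.jpg" is a placeholder file name.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# label_detection is one of several convenience methods (text, faces, landmarks, ...).
response = client.label_detection(image=image)

for label in response.label_annotations:
    print(f"{label.description}: {label.score:.2f}")
```

AutoML Vision and Dialogflow follow the same general pattern: a managed model behind a client library, so the team does not have to operate its own training or serving infrastructure.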
3.2 Deploying an ML pipeline. Considerations include:
- Ingesting appropriate data
- Retraining of machine learning models (Cloud Machine Learning Engine, BigQuery ML, Kubeflow, Spark ML)
- Continuous evaluation

3.3 Choosing the appropriate training and serving infrastructure. Considerations include:
- Distributed vs. single-machine training
- Use of edge compute
- Hardware accelerators (e.g., GPU, TPU)

3.4 Measuring, monitoring, and troubleshooting machine learning models. Considerations include:
- Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)
- Impact of dependencies of machine learning models
- Common sources of error (e.g., assumptions about data)

4. Ensuring solution quality

4.1 Designing for security and compliance. Considerations include:
- Identity and access management (e.g., Cloud IAM)
- Data security (encryption, key management)
- Ensuring privacy (e.g., Data Loss Prevention API); see the sketch at the end of this syllabus
- Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))

4.2 Ensuring scalability and efficiency. Considerations include:
- Building and running test suites
- Pipeline monitoring (e.g., Stackdriver)
- Assessing, troubleshooting, and improving data representations and data processing infrastructure
- Resizing and autoscaling resources

4.3 Ensuring reliability and fidelity. Considerations include:
- Performing data preparation and quality control (e.g., Cloud Dataprep)
- Verification and monitoring
- Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis)
- Choosing between ACID, idempotent, and eventually consistent requirements
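As an illustration of the privacy tooling named in 4.1, the sketch below inspects a string for sensitive data with the Data Loss Prevention API. It assumes the google-cloud-dlp Python client (v3+, request-dict style); the project ID, parent path, and info types are placeholders to adapt to your own project.

```python
# Minimal sketch: scanning text for sensitive data with the DLP API.
# Assumes google-cloud-dlp (v3+); "my-project" and the info types are placeholders.
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # assumed parent path format

item = {"value": "Contact Jane at jane.doe@example.com or +1 415-555-0100."}
inspect_config = {
    "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
    "include_quote": True,  # return the matched text with each finding
}

response = dlp.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)

for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood, finding.quote)
```

The same API also supports de-identification (masking, tokenization), which pairs naturally with the encryption and key-management items listed above.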