As part of this course, you will learn Data Engineering using a cloud platform-agnostic technology called Databricks.

About Data Engineering

Data Engineering is nothing but processing data according to our downstream needs. As part of Data Engineering, we need to build different pipelines such as Batch Pipelines, Streaming Pipelines, etc. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc.

About Databricks

Databricks is one of the most popular cloud platform-agnostic data engineering tech stacks. Databricks was founded by the original creators of Apache Spark, and its engineers are among the project's committers. The Databricks Runtime provides Spark while leveraging the elasticity of the cloud. With Databricks, you pay for what you use. Over time, they came up with the idea of the Lakehouse by providing all the features required for traditional BI as well as AI & ML. Here are some of the core features of Databricks.

- Spark - Distributed Computing
- Delta Lake - Perform CRUD Operations. It is primarily used to build capabilities such as inserting, updating, and deleting data in the files of a Data Lake.
- cloudFiles - Pick up files incrementally and efficiently by leveraging cloud features.
- Databricks SQL - A Photon-based interface that is fine-tuned for running queries submitted by reporting tools for reporting and visualization. It is also used for ad hoc analysis.

Course Details

As part of this course, you will be learning Data Engineering using Databricks.
- Getting Started with Databricks
- Setup Local Development Environment to develop Data Engineering Applications using Databricks
- Using Databricks CLI to manage files, jobs, clusters, etc. related to Data Engineering Applications
- Spark Application Development Cycle to build Data Engineering Applications
- Databricks Jobs and Clusters
- Deploy and Run Data Engineering Jobs on Databricks Job Clusters as Python Application
- Deploy and Run Data Engineering Jobs on Databricks Job Clusters using Notebooks
- Deep Dive into Delta Lake using Dataframes on Databricks Platform
- Deep Dive into Delta Lake using Spark SQL on Databricks Platform
- Building Data Engineering Pipelines using Spark Structured Streaming on Databricks Clusters
- Incremental File Processing using Spark Structured Streaming leveraging Databricks Auto Loader cloudFiles
- Overview of Auto Loader cloudFiles File Discovery Modes - Directory Listing and File Notifications
- Differences between Auto Loader cloudFiles File Discovery Modes - Directory Listing and File Notifications
- Differences between traditional Spark Structured Streaming and leveraging Databricks Auto Loader cloudFiles for incremental file processing
- Overview of Databricks SQL for Data Analysis and reporting

We will be adding a few more modules related to PySpark, Spark with Scala, Spark SQL, and Streaming Pipelines in the coming weeks.

Desired Audience

Here is the desired audience for this advanced course.

- Experienced application developers with prior knowledge and experience of Spark who want to gain expertise in Data Engineering.
- Experienced Data Engineers who want to gain enough skills to add Databricks to their profile.
- Testers who want to improve their testing capabilities related to Data Engineering applications using Databricks.
Prerequisites

- Logistics
  - Computer with decent configuration (at least 4 GB RAM; 8 GB is highly desired)
  - Dual-core CPU is required; quad-core is highly desired
  - Chrome Browser
  - High-Speed Internet
  - Valid AWS Account
  - Valid Databricks Account (free Databricks Account is not sufficient)
- Experience as a Data Engineer, especially using Apache Spark
- Knowledge of some cloud concepts such as storage, users, roles, etc.

Associated Costs

As part of the training, you will only get the material. You need to practice on your own or corporate cloud account and Databricks Account.

- You need to take care of the associated AWS or Azure costs.
- You need to take care of the associated Databricks costs.

Training Approach

Here are the details related to the training approach.

- It is self-paced with reference material, code snippets, and videos provided as part of Udemy.
- One needs to sign up for their own Databricks environment to practice all the core features of Databricks.
- We recommend completing 2 modules every week by spending 4 to 5 hours per week.
- It is highly recommended to complete all the tasks so that one can get real experience of Databricks.
- Support will be provided through Udemy Q&A.

Here is the detailed course outline.

Getting Started with Databricks on Azure

As part of this section, we will go through the details of signing up for Azure and setting up the Databricks cluster on Azure.

- Getting Started with Databricks on Azure
- Signup for the Azure Account
- Login and Increase Quotas for regional vCPUs in Azure
- Create Azure Databricks Workspace
- Launching Azure Databricks Workspace or Cluster
- Quick Walkthrough of Azure Databricks UI
- Create Azure Databricks Single Node Cluster
- Upload Data using Azure Databricks UI
- Overview of Creating Notebook and Validating Files using Azure Databri

Data Engineering using Databricks on AWS and Azure
