Training Information
Azure Databricks with PySpark
We are pleased to offer a comprehensive suite of training solutions tailored to meet your needs. Our services encompass both online and offline corporate training options, ensuring flexibility and accessibility for your team's professional development.
Course Content
ADB with PySpark
Module 1: Cloud Computing Concepts
What is the "Cloud"?
Why cloud services?
Types of cloud models
Deployment Models
Private cloud deployment model
Public cloud deployment model
Hybrid cloud deployment model
Microsoft Azure
Amazon Web Services
Google Cloud Platform
Characteristics of cloud computing
On-demand self-service
Broad network access
Multi-tenancy and resource pooling
Rapid elasticity and scalability
Measured service
Cloud Data Warehouse Architecture
Shared Memory architecture
Shared Disk architecture
Shared Nothing architecture
Module 2: Core Azure Services
Core Azure Architectural components
Core Azure Services and Products
Azure solutions
Azure management tools
Module 3: Security, Privacy, Compliance
Securing network connectivity
Core Azure identity services
Security tools and features
Azure Governance methodologies
Monitoring and reporting
Privacy, compliance, and data protection standards
Module 4: Azure Pricing and Support
Azure subscriptions
Planning and managing costs
Azure support options
Azure Service Level Agreements (SLAs)
Service Lifecycle in Azure
Module 5: Introduction to Azure Databricks
Introduction to Databricks
Azure Databricks Architecture
Azure Databricks Main Concepts
Module 6: Azure Databricks Account Creation
Azure Free Account
Free Subscription for Azure Databricks
Create Databricks Community Edition Account
Module 7: Databricks Cluster Types and Notebook Options
Creating and configuring clusters
Creating a notebook
Quick tour of notebook options
Module 8: Databricks Utilities and Notebook Parameters
dbutils commands for files and directories
Notebooks and libraries
Databricks Variables
Widget Types
Databricks notebook parameters
Module 9: Databricks CLI
Azure Databricks CLI Installation
Databricks CLI - DBFS, Libraries and Jobs
Module 10: Databricks Integration with Azure Blob Storage
Reading data from Blob Storage and creating a Blob mount point
Module 11: Databricks Integration with Azure Data Lake Storage Gen2
Reading files from Azure Data Lake Storage Gen2
Module 12: Databricks Integration with Azure Data Lake Storage Gen1
Reading files from Azure Data Lake Storage Gen1
Module 13: Reading and Writing CSV files in Databricks
Read CSV Files
Read TSV Files and Pipe-Separated CSV Files
Read CSV Files with multiple delimiters in Spark 2 and Spark 3
Read CSV Files with multiple delimiters at different positions
Module 14: Reading and Writing Parquet files in Databricks
Read Parquet files from Data Lake Storage Gen2
Reading and Creating Partition files in Spark
Module 16: Parsing Complex JSON Files
Reading and Writing JSON Files
Reading, Transforming and Writing Complex JSON files
Module 17: Reading and Writing ORC and Avro Files
Reading and Writing ORC and Avro Files
Module 19: Databricks Integration with Azure Synapse
Reading and Writing Azure Synapse data from Azure Databricks
Module 20: Databricks Integration with Amazon Redshift
Reading and Writing data from Redshift using Databricks
Module 21: Databricks Integration with Snowflake
Reading and Writing data from Snowflake
Module 22: Databricks Integration with Azure Cosmos DB SQL API
Reading and Writing data from an Azure Cosmos DB Account
Module 23: Python Introduction
Python Introduction
Installation and setup
Python Data Types for Azure Databricks
Module 24: Python Data Types
Deep dive into String Data Types in Python for Azure Databricks
Deep dive into Python collections: list and tuple
Deep dive into set and dict data types in Python
Module 25: Python Functions and Arguments
Python Functions and Arguments
Lambda Functions
Module 26: Python Modules and Packages
Python Modules and Packages
Module 27: Python Flow Control
Python Flow Control
For-Each
While
Module 28: Python File Handling
Python File Handling
Module 29: Python Logging Module
Python Logging Module
Module 30: Python Exception Handling
Python Exception Handling
Module 31: PySpark Introduction
PySpark Introduction
PySpark Components and Features
Module 32: Spark Architecture and Internals
Apache Spark Internal architecture
Jobs, stages, and tasks
Spark Cluster Architecture Explained
Module 33: Spark RDD
Different Ways to create RDD in Databricks
Spark Lazy Evaluation Internals & Word Count Program
RDD Transformations in Databricks & coalesce vs repartition
RDD Transformation and Use Cases
Module 34: Spark SQL
Spark SQL Introduction
Different ways to create DataFrames
Module 35: Spark SQL Internals
Catalyst Optimizer and Spark SQL Execution Plan
Deep dive on SparkSession vs SparkContext
Spark SQL Basics Part-1
RDD Transformation and Use Cases
Module 36: Spark SQL Basics
Spark SQL Basics Part-2
Joins in Spark SQL
Module 37: Spark SQL Functions and UDFs
Spark SQL Functions Part-1
Spark SQL Functions Part-2
Spark SQL Functions Part-3
Spark SQL UDFs
Spark SQL Temp tables and Joins
Module 38: Databricks Delta and Implementing Dimensions SCD1 and SCD2
Implementing SCD Type 1 with Apache Spark and Databricks Delta
Delta Lake in Azure Databricks
Implementing SCD Type 2 with and without Databricks Delta
Module 39: Databricks Integration with Azure Data Factory
Azure Data Factory Integration with Azure Databricks
Module 40: Databricks Streaming
Delta Streaming in Azure Databricks
Data Ingestion with Auto Loader in Azure Databricks
Module 41: Azure Databricks Projects
Azure Databricks Project-1
Azure Databricks Project-2
Module 42: Databricks Integration with Azure DevOps
Azure Databricks CI/CD Pipelines