Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Overview of Python and Scala

Core Concepts (Theory):

  • Architecture
  • RDDs
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Hands-on Workshop: Mastering the Basics in the Databricks Environment:

  • Exercises using the RDD API:
      • Basic transformations and actions
      • Pair RDDs
      • Joins
      • Caching strategies
  • Exercises using the DataFrame API:
      • Spark SQL
      • DataFrame operations: select, filter, group, and sort
      • UDFs (User-Defined Functions)
  • Introduction to the Dataset API
  • Streaming

Hands-on Workshop: Deployment in the AWS Environment:

  • Introduction to AWS Glue
  • Differences between AWS EMR and AWS Glue
  • Example jobs on both platforms
  • Advantages and disadvantages of each approach

Additional Topics:

  • Introduction to workflow orchestration with Apache Airflow

Requirements

  • Programming skills (preferably in Python or Scala)
  • Foundational knowledge of SQL

Duration: 21 hours
