Course overview
About this course
This course offers an in-depth exploration of building robust data analytics pipelines on AWS. It equips learners to use AWS services for high-performance analytics, with a focus on batch data processing using tools such as Amazon EMR and Apache Spark.
Audience profile
- Data platform engineers
- Architects and operators who build and manage data analytics pipelines
Course details
Module A: Overview of Data Analytics and the Data Pipeline
• Data analytics use cases
• Using the data pipeline for analytics
Module 1: Introduction to Amazon EMR
• Using Amazon EMR in analytics solutions
• Amazon EMR cluster architecture
• Interactive Demo 1: Launching an Amazon EMR cluster
• Cost management strategies
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
• Storage optimization with Amazon EMR
• Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
• Apache Spark on Amazon EMR use cases
• Why Apache Spark on Amazon EMR
• Spark concepts
• Interactive Demo 2: Connect to an EMR cluster and perform Scala commands using the Spark shell
• Transformation, processing, and analytics
• Using notebooks with Amazon EMR
• Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
• Using Amazon EMR with Hive to process batch data
• Transformation, processing, and analytics
• Practice Lab 2: Batch data processing using Amazon EMR with Hive
• Introduction to Apache HBase on Amazon EMR
Module 5: Serverless Data Processing
• Serverless data processing, transformation, and analytics
• Using AWS Glue with Amazon EMR workloads
• Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
• Securing EMR clusters
• Interactive Demo 3: Client-side encryption with EMRFS
• Monitoring and troubleshooting Amazon EMR clusters
• Demo: Reviewing Apache Spark cluster history
Module 7: Designing Batch Data Analytics Solutions
• Batch data analytics use cases
• Activity: Designing a batch data analytics workflow
Module B: Developing Modern Data Architectures on AWS
• Modern data architectures
Prerequisites
A minimum of 1 year of experience managing open-source data frameworks such as Apache Spark, Apache Hadoop, and Apache Hive.
Enquiry
Course: Building Batch Data Analytics Solutions on AWS