Skip to content

ETL MIGRATION & MODERNIZATION
TRAIN-THE-TRAINER WORKSHOP

This workshop contains a set of seven hands-on labs which complementes the Day One of the ETL Migration & Modernization - Train The Trainer Workshop.

Participants will build an end-to-end ETL Pipeline that will...[add details of the pipeline...]

After completing the labs, the participants will have a high level understand of AWS Glue core capabilities as well as they will be able to demonstrate most of the Glue core capabilities.

WORKSHOP PARTS & STEPS

Part 0 - PRE STEPS

     1. Setting up Cloud9 Environment Variables
     2. Switching Cloud9 Role Credentials
     3. Setting up Security Required Groups Inbound Rules
     4. Installing Required Libraries (Boto3)

Part 1 - TPCDS & RDS MySQL

     1. Preparing & Generating TPCDS Dataset
     2. Populating the Amazon RDS-MySQL Database with TPCDS Dataset
     3. Unloading Tables (in CSV) from RDS MySQL Database and Uploading to S3

Part 2 - AWS GLUE DISCOVERY COMPONENTS (Databases, Tables, Connections, Crawlers and Classifiers)

     1. Understanding the Glue Resources Provided (CloudFormation Resources)
     2. Testing and running pre-created Glue Resources (Connection & MySQL-RDS-Crawler)
     3. Creating new Glue Resources (Crawler & Classifier)

Part 3 - GLUE (STUDIO) STREAMING

     1. Understanding the Streaming Resources Provided (CloudFormation & Scripts)
     2. Validating Streaming Job Logic and Data (Glue Studio Dummy Job)
     3. Creating the Glue Streaming Job (Cloning Jobs!)

Part 4 - ORCHESTRATION & DATA ANALYSIS

     1. Understanding the Orchestration Flow
     2. Creating Glue Workflow and Glue Event Based Trigger (via CLI)
     3. Creating Event Bridge Rule and Target (via CLI)
     4. Triggering Orchestration & Following The Flow
     5. Exploring and Analyzing Table's Data Cataloged in Glue Data Catalog

Part 5 - MACHINE LEARNING WITH GLUE & GLUE STUDIO NOTEBOOKS

     0. (Pre Steps) - Understading & Setting up the Resources for the ML Lab
     1. Creating and Training the Glue FindMatches ML Transform
     2. Testing FindMatches Transform with Glue Studio Notebook
     3. Deploying & Running a FindMatches Glue Job

Part 6 - WORKFLOW ORCHESTRATION WITH AWS STEP FUNCTIONS

     1. Creating the Step Function Workflow
     2. Creating an EventBridge Rule & Target for the Step Function Workflow (via CLI)

Part 7 - DATA QUALITY & PREPARATION WITH AWS GLUE DATABREW

     1. Creating Datasets & Profiling the Data (with Quality Rules)
     2. Working with DataBrew Recipes & Projects