This course prepares participants to become Data Engineers and sets them on the path to becoming Data Scientists. Participants must have basic knowledge of Python, SQL, HTML, and JavaScript.
Course Objectives
Upon completion of this course, participants will be familiar with the relevant components of Big Data. They will also be able to perform data analysis, carry out complex data management tasks, build applications, and create visualizations on top of data.
Course Outline
Introduction to Big Data
ETL/ELT Best Practices
Managing Metadata
Consolidating Multiple Data Sources
Data Cleansing and Transformation
Scheduling Data Refresh
Introduction to Hadoop
Ingesting Flat Files into Hadoop
Integration between RDBMS and Hadoop
Data Processing using Hive
Interactive Query using Impala
Processing Log Files
Collecting External Data from the Internet
Introduction to Spark
Processing Data using Spark
Querying Data in Spark
Job Orchestration and Workflow using Oozie
Troubleshooting an ETL Job
Performance Optimization
Developing Data Visualizations using D3.js
Location:
Level 12, Tower A, Plaza 33, 1 Jalan Kemajuan, Section 13, 46200 Petaling Jaya, Selangor, Malaysia