From 0 to 1: Pig For Wrangling Big Data

Extract, Transform and Load data using Pig to harness the power of Hadoop.


Data PRO or Pay-Per-Course

Pick a plan that's right for you!

Pay per course or a flat monthly subscription with no contracts. Become a Data PRO and get access to all data engineering, analytics and data science courses.

Explore Data PRO

Course curriculum

1. You, This Course and Us
   You, This Course and Us

2. Where does Pig fit in?
   Pig and the Hadoop ecosystem
   Install and set up
   How does Pig compare with Hive?
   Pig Latin as a data flow language
   Pig with HBase
   Downloads

3. Pig Basics
   Operating modes, running a Pig script, the Grunt shell
   Loading data and creating our first relation
   Scalar data types
   Complex data types - The Tuple, Bag and Map
   Partial schema specification for relations
   Displaying and storing relations - The dump and store commands
   Downloads

4. Pig Operations And Data Transformations
   Selecting fields from a relation
   Built-in functions
   Evaluation functions
   Using the distinct, limit and order by keywords
   Filtering records based on a predicate
   Downloads

5. Advanced Data Transformations
   Group by and aggregate transformations
   Combining datasets using Join
   Concatenating datasets using Union
   Generating multiple records by flattening complex fields
   Using Co-Group, Semi-Join and Sampling records
   The nested Foreach command
   Debug Pig scripts using Explain and Illustrate
   Downloads

6. Optimizing Data Transformations
   Parallelize operations using the Parallel keyword
   Join Optimizations: Multiple relations join, large and small relation join
   Join Optimizations: Skew join and sort-merge join
   Common sense optimizations
   Downloads

7. A real-world example
   Parsing server logs
   Summarizing error logs
   Downloads

8. Installing Hadoop in a Local Environment
   Hadoop Install Modes
   Hadoop Standalone mode Install
   Hadoop Pseudo-Distributed mode Install
   Downloads

9. Appendix
   [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
   Setup a Virtual Linux Instance (For Windows users)
   Downloads

Course Description

What will I learn?

Work with unstructured data to extract information, transform it and store it in a usable form
Write intermediate-level Pig scripts to munge data
Optimize Pig operations which work on large data sets

About the course

This course is taught by a team that includes two Stanford-educated ex-Googlers and two ex-Flipkart lead analysts. The team has decades of practical experience with large-scale data processing jobs.

Pig is aptly named: it is omnivorous, will consume any data that you throw at it, and will bring home the bacon!

Let's parse that

omnivorous: Pig works with unstructured data. It has many operations which are very SQL-like but Pig can perform these operations on data sets which have no fixed schema. Pig is great at wrestling data into a form which is clean and can be stored in a data warehouse for reporting and analysis.

bring home the bacon: Pig allows you to transform data in a way that makes it structured, predictable and useful, ready for consumption.

What's Covered

Pig Basics: Scalar and Complex data types (Bags, Maps, Tuples), and basic operations such as Load, Dump, Store, Filter, Foreach, Distinct, Limit and Order by, plus the built-in functions.
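To give a flavor of these basics, here is a minimal Pig Latin sketch. It is not taken from the course material: the file paths, relation names and field names (visits, user_id, page, seconds) are hypothetical, and the input is assumed to be a tab-separated file.

```pig
-- Hypothetical input: tab-separated lines of user_id, page, seconds.
visits = LOAD 'input/visits.tsv' USING PigStorage('\t')
         AS (user_id:chararray, page:chararray, seconds:int);

-- Filter: keep only records that satisfy a predicate.
long_visits = FILTER visits BY seconds > 30;

-- Foreach: project the fields we care about.
pages = FOREACH long_visits GENERATE page;

-- Distinct, Order by and Limit each produce a new relation.
unique_pages = DISTINCT pages;
sorted_pages = ORDER unique_pages BY page ASC;
first_pages  = LIMIT sorted_pages 10;

-- Dump prints a relation to the console; Store writes it to a directory.
DUMP first_pages;
STORE sorted_pages INTO 'output/long_visit_pages' USING PigStorage(',');
```

Each statement defines a new relation from the previous one, which is why Pig Latin is described as a data flow language.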

Advanced Data Transformations and Optimizations: The mind-bending Nested Foreach, Joins and their optimizations using "parallel", "merge", "replicated" and other keywords, Co-groups and Semi-joins, and debugging using the Explain and Illustrate commands.
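As a rough illustration of the "replicated" and "parallel" keywords, the sketch below joins a large relation with a small one and then aggregates. The datasets, paths and field names are assumptions made for this example only, and the reducer count of 4 is arbitrary.

```pig
-- Hypothetical inputs: a large orders file and a small customers file.
orders = LOAD 'input/orders.tsv' USING PigStorage('\t')
         AS (order_id:chararray, customer_id:chararray, amount:double);
customers = LOAD 'input/customers.tsv' USING PigStorage('\t')
            AS (customer_id:chararray, country:chararray);

-- Replicated join: the small relation is listed last and is held in memory,
-- so the join can run on the map side.
joined = JOIN orders BY customer_id, customers BY customer_id USING 'replicated';

-- Parallel: request 4 reducers for the reduce phase of this Group.
by_country = GROUP joined BY country PARALLEL 4;
totals = FOREACH by_country GENERATE group AS country, SUM(joined.amount) AS total;

-- Explain prints the execution plans Pig would use to compute the relation.
EXPLAIN totals;
```

A replicated join only helps when the last relation fits comfortably in memory; otherwise a regular, skew or merge join is the safer choice.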

Real-world example: Clean up server logs using Pig

Who should take the course?

Yep! Analysts who want to wrangle large, unstructured data into shape
Yep! Engineers who want to parse and extract useful information from large datasets

Pre-requisites & Requirements

Working with Pig requires the following background:
A basic understanding of SQL and working with data
A basic understanding of the Hadoop eco-system and MapReduce tasks

Become a Data PRO

Subscribe to the Data PRO plan and get access to all the courses.

Get started now