Data Warehousing
1. Normalizing
We’re going to talk about data. We’ll start by discussing operational databases and normalization. Assignment!
2. Creating Tables in PostgreSQL
This tutorial will walk you through the basics of working with PostgreSQL. Tutorial!
3. Python and Postgres
This assignment will walk you through the basics of working with PostgreSQL from Python. There is a solutions branch! Assignment!
4. Warehousing
We now understand how to work with databases and create normalized data. Next we’re going to denormalize our data and introduce concepts of data warehouse design. In the assignment, we’ll create an ETL process that takes our data from a normalized database table to an analytics-style database table. Assignment!
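To make the idea concrete, here is a minimal sketch of such an ETL step. It is not the assignment's solution: the table names (customers, orders, order_facts) and columns are hypothetical, and it uses Python's built-in sqlite3 module instead of PostgreSQL so it runs without a server. The shape is the same either way: read from normalized tables, join them, and load the result into one wide analytics-style table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two normalized tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Grace', 'New York');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
    """)
    return conn


def run_etl(conn):
    """Join the normalized tables and load the result into a denormalized table.

    Returns the number of rows in the analytics table after the load.
    """
    cur = conn.cursor()
    # Target table: one wide row per order, with customer attributes repeated.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS order_facts (
            order_id INTEGER PRIMARY KEY,
            customer_name TEXT,
            customer_city TEXT,
            amount REAL
        )
    """)
    # Extract and transform with a join, then load.
    # INSERT OR REPLACE (SQLite syntax) keeps reruns from duplicating rows.
    cur.execute("""
        INSERT OR REPLACE INTO order_facts
            (order_id, customer_name, customer_city, amount)
        SELECT o.id, c.name, c.city, o.amount
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
    """)
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
```

Calling run_etl(build_demo_db()) loads one row per order into order_facts. Against PostgreSQL you would swap the connection for a driver such as psycopg2 and replace INSERT OR REPLACE with INSERT ... ON CONFLICT.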
5. ETL and Data Lakes
There’s no assignment for this module. It’s just a lecture to introduce you to the concepts of data lakes!
6. Dockerizing
We’re going to learn about Docker and use it to Dockerize our ETL app! The assignment will walk you through everything and also give you more practice with SQL. There is a solutions branch. Assignment!
7. Big Data & Spark
We’re going to practice using Spark! There is a repository with some tutorials and an assignment. Note that the assignment requires access to some data that I should post somewhere but haven’t gotten around to yet. It’s nothing more than a bunch of tweets in JSON Lines files. Tutorials/Assignment
NOTE: The sketchboard PDFs from the lectures can be found in the GitHub repository: github.com/nandans-summer-camp/sketchboard