Nick Ross – Doing Stuff

The notes represent a distinctly practical, hands-on approach to learning both pandas and SQL. While these two topics may be touched upon in traditional database courses (or introduction to data science courses), delving deeply into how to efficiently write modern code using these tools is often a lower priority compared to other topics. These notes approach learning the basics of data management from a different angle — you need to be comfortable with directly manipulating data before you can easily internalize additional concepts.

This course has been taught at undergraduate, graduate, and executive certificate levels in a variety of course structures.

This page contains the most recent version of those notes, currently taught to students in the MS-CAPP Program at the University of Chicago.

The data used in this course can be found in the repository here. Below is the table of contents with links to specific chapters. Please note that some sections are works in progress or TBD.

A combined PDF with all the notes is available here. Specific chapters are listed below, but please be aware that links in the chapter-specific PDFs are not currently functional.

Introduction and Errata		Introduction
Relational Databases
Rows and Columns	What is a Relational Database Selecting Columns WHERE: Filtering rows Null ORDER BY and LIMIT Column Numbering Where are we: A Note on Scope	Chapter 1
Basic Manipulations	Types Renaming a Column Basic Mathematical Manipulations, ABS and LEAST/GREATEST Queries without a FROM Clause and Singletons String Functions: LEFT, RIGHT, LOWER, UPPER, LENGTH, TRIM and CONCAT ROUND and Changing Types (CAST) CAST and changing types	Chapter 2
Subqueries, Distinct & Case	Query Evaluation Order: SELECT and WHERE Comparisons: BETWEEN, LIKE and ILIKE CASE: Conditional Logic The DISTINCT Operator Subqueries (IN, ANY, ALL) Correlated Subqueries	Chapter 3
Database Internals: Transactions	REDO / COMBINE NEXT SECTIONS Table Creation and Deletion Database Operations: CRUD Creating Tables, Constraints and Deleting tables Altering Tables Inserting, Copying, Updating and Deleting Transactions and ACID Isolation Levels in Relational Databases Why do we care (NoSQL)? NoSQL Transaction Implementations [TBD]	Chapter 4
Aggregations	Introduction to MTA data set GROUP BY clause Column numbering syntax Aggregates and CASE Statements Named Subqueries	Chapter 5
Dates and Types	Date Types Date Functions Hard GROUP BY problems	Chapter 6
Averages	The Trouble with Averages HAVING COALESCE and NVL	Chapter 7
Joins	Joins UNION and UNION ALL Best Practices when Combining Tables Intermediate Joins Statistical Analysis in SQL	Chapter 8
Advanced Joins	The Shape of Data Revenue over time & Advanced Joins	Chapter 9
Analytic Functions & CTE's	Analytic Functions Using Analytic Functions with Transaction Data Common Table Expressions (``CTE") CTEs with the transaction data	Chapter 10
Database Internals: Performance Evaluation	Normalization Views Information Schema Performance Considerations Index Distributed Systems and the CAP Theorem	Chapter 11
Extensions [TBD]	More Advanced Joins OLAP: Cube and Rollup Schemas Keys Data Exploration Strategies Query Strategies	Chapter 12
Interview Hints	Interview Hints Example Interview #1 Example Interview #2 Example Interview #3 Example Interview #4	Chapter 13
Pandas
Introduction	What is Pandas Data structures Selecting Columns and Rows Column Types Conversion Dealing with NaN Choosing the largest and smallest values Manipulating Data & Method Chaining Indexes: Creating and Dropping Views and Copies	Chapter 14
More Manipulations and Types	Sorting DataFrames Dealing with Duplicates Using Type specific functions CASE style statements and the ``isin" operator Regex Pattern Matching	Chapter 15
Aggregations	Introduction to the MTA dataset Simple Aggregations GroupBy Objects Advanced Index / Multiindex If not indexes... Indexing with aggregations, a big Gotcha	Chapter 16
Joins	Helpful Table / Review Merging data in Pandas Complex Join Conditions Stacking Data Lags and Leads Apply, map and applymap: Advanced Transformations	Chapter 17
Window Functions	Window Functions in Pandas Some gotchas Reshaping Data: Transpose, Stack and Unstack A Bunch of stuff to clean up Combining with the original DataFrame Moving the Window Pivot / Melt	Chapter 18
Appendix
Data Dictionaries	Introduction Iowa Fleet data NY MTA Data Daily Stock Data: s2010 and s2011 Annual Fundamental Financial information: fnd Soap Transaction Data	Appendix A
Connecting SQL to Python or R	Connecting to any database: ODBC and JDBC Connecting only to PostgreSQL	Appendix B
Assignments	HW 0A HW 0B HW 0C HW 1A HW 1B HW 2A HW 3A HW 3B HW 4A HW 4B HW 5A HW 5B HW 6A HW 6B HW 7A HW 7B HW 8AO HW 8A HW 8B BART Project HW 5AO
Example Exams	2023 CAPP Databases Final A 2023 CAPP Databases Final B 2023 CAPP Databases Midterm A 2023 CAPP Databases Midterm B 2017 SQL Final 2018 SQL Final 2019 Exams USF's student table FF Sales Example The Sales Rollup Sales Example I Sales Example II

Table of Contents