Introduction and Errata |
| Introduction |
Relational Databases |
Rows and Columns |
- What is a Relational Database
- Selecting Columns
- WHERE: Filtering rows
- Null
- ORDER BY and LIMIT
- Column Numbering
- Where are we: A Note on Scope
| Chapter 1 |
Basic Manipulations |
- Types
- Renaming a Column
- Basic Mathematical Manipulations, ABS and LEAST/GREATEST
- Queries without a FROM Clause and Singletons
- String Functions: LEFT, RIGHT, LOWER, UPPER, LENGTH, TRIM and CONCAT
- ROUND and Changing Types (CAST)
- CAST and changing types
| Chapter 2 |
Subqueries, Distinct & Case |
- Query Evaluation Order: SELECT and WHERE
- Comparisons: BETWEEN, LIKE and ILIKE
- CASE: Conditional Logic
- The DISTINCT Operator
- Subqueries (IN, ANY, ALL)
- Correlated Subqueries
| Chapter 3 |
Database Internals: Transactions |
- REDO / COMBINE NEXT SECTIONS
- Table Creation and Deletion
- Database Operations: CRUD
- Creating Tables, Constraints and Deleting tables
- Altering Tables
- Inserting, Copying, Updating and Deleting
- Transactions and ACID
- Isolation Levels in Relational Databases
- Why do we care (NoSQL)?
- NoSQL
- Transaction Implementations [TBD]
| Chapter 4 |
Aggregations |
- Introduction to MTA data set
- GROUP BY clause
- Column numbering syntax
- Aggregates and CASE Statements
- Named Subqueries
| Chapter 5 |
Dates and Types |
- Date Types
- Date Functions
- Hard GROUP BY problems
| Chapter 6 |
Averages |
- The Trouble with Averages
- HAVING
- COALESCE and NVL
| Chapter 7 |
Joins |
- Joins
- UNION and UNION ALL
- Best Practices when Combining Tables
- Intermediate Joins
- Statistical Analysis in SQL
| Chapter 8 |
Advanced Joins |
- The Shape of Data
- Revenue over time & Advanced Joins
| Chapter 9 |
Analytic Functions & CTE's |
- Analytic Functions
- Using Analytic Functions with Transaction Data
- Common Table Expressions (``CTE")
- CTEs with the transaction data
| Chapter 10 |
Database Internals: Performance Evaluation |
- Normalization
- Views
- Information Schema
- Performance Considerations
- Index
- Distributed Systems and the CAP Theorem
| Chapter 11 |
Extensions [TBD] |
- More Advanced Joins
- OLAP: Cube and Rollup
- Schemas
- Keys
- Data Exploration Strategies
- Query Strategies
| Chapter 12 |
Interview Hints |
- Interview Hints
- Example Interview #1
- Example Interview #2
- Example Interview #3
- Example Interview #4
| Chapter 13 |
Pandas |
Introduction |
- What is Pandas
- Data structures
- Selecting Columns and Rows
- Column Types Conversion
- Dealing with NaN
- Choosing the largest and smallest values
- Manipulating Data & Method Chaining
- Indexes: Creating and Dropping
- Views and Copies
| Chapter 14 |
More Manipulations and Types |
- Sorting DataFrames
- Dealing with Duplicates
- Using Type specific functions
- CASE style statements and the ``isin" operator
- Regex Pattern Matching
| Chapter 15 |
Aggregations |
- Introduction to the MTA dataset
- Simple Aggregations
- GroupBy Objects
- Advanced Index / Multiindex
- If not indexes...
- Indexing with aggregations, a big Gotcha
| Chapter 16 |
Joins |
- Helpful Table / Review
- Merging data in Pandas
- Complex Join Conditions
- Stacking Data
- Lags and Leads
- Apply, map and applymap: Advanced Transformations
| Chapter 17 |
Window Functions |
- Window Functions in Pandas
- Some gotchas
- Reshaping Data: Transpose, Stack and Unstack
- A Bunch of stuff to clean up
- Combining with the original DataFrame
- Moving the Window
- Pivot / Melt
| Chapter 18 |
Appendix |
Data Dictionaries |
- Introduction
- Iowa Fleet data
- NY MTA Data
- Daily Stock Data: s2010 and s2011
- Annual Fundamental Financial information: fnd
- Soap Transaction Data
| Appendix A |
Connecting SQL to Python or R |
- Connecting to any database: ODBC and JDBC
- Connecting only to PostgreSQL
| Appendix B |
Assignments |
| |
Example Exams |
| |