Introduction and Errata Introduction
Relational Databases
Rows and Columns
  • What is a Relational Database
  • Selecting Columns
  • WHERE: Filtering rows
  • Null
  • ORDER BY and LIMIT
  • Column Numbering
  • Where are we: A Note on Scope
Chapter 1
Basic Manipulations
  • Types
  • Renaming a Column
  • Basic Mathematical Manipulations, ABS and LEAST/GREATEST
  • Queries without a FROM Clause and Singletons
  • String Functions: LEFT, RIGHT, LOWER, UPPER, LENGTH, TRIM and CONCAT
  • ROUND and Changing Types (CAST)
  • CAST and changing types
Chapter 2
Subqueries, Distinct & Case
  • Query Evaluation Order: SELECT and WHERE
  • Comparisons: BETWEEN, LIKE and ILIKE
  • CASE: Conditional Logic
  • The DISTINCT Operator
  • Subqueries (IN, ANY, ALL)
  • Correlated Subqueries
Chapter 3
Database Internals: Transactions
  • REDO / COMBINE NEXT SECTIONS
  • Table Creation and Deletion
  • Database Operations: CRUD
  • Creating Tables, Constraints and Deleting tables
  • Altering Tables
  • Inserting, Copying, Updating and Deleting
  • Transactions and ACID
  • Isolation Levels in Relational Databases
  • Why do we care (NoSQL)?
  • NoSQL
  • Transaction Implementations [TBD]
Chapter 4
Aggregations
  • Introduction to MTA data set
  • GROUP BY clause
  • Column numbering syntax
  • Aggregates and CASE Statements
  • Named Subqueries
Chapter 5
Dates and Types
  • Date Types
  • Date Functions
  • Hard GROUP BY problems
Chapter 6
Averages
  • The Trouble with Averages
  • HAVING
  • COALESCE and NVL
Chapter 7
Joins
  • Joins
  • UNION and UNION ALL
  • Best Practices when Combining Tables
  • Intermediate Joins
  • Statistical Analysis in SQL
Chapter 8
Advanced Joins
  • The Shape of Data
  • Revenue over time & Advanced Joins
Chapter 9
Analytic Functions & CTE's
  • Analytic Functions
  • Using Analytic Functions with Transaction Data
  • Common Table Expressions (``CTE")
  • CTEs with the transaction data
Chapter 10
Database Internals: Performance Evaluation
  • Normalization
  • Views
  • Information Schema
  • Performance Considerations
  • Index
  • Distributed Systems and the CAP Theorem
Chapter 11
Extensions [TBD]
  • More Advanced Joins
  • OLAP: Cube and Rollup
  • Schemas
  • Keys
  • Data Exploration Strategies
  • Query Strategies
Chapter 12
Interview Hints
  • Interview Hints
  • Example Interview #1
  • Example Interview #2
  • Example Interview #3
  • Example Interview #4
Chapter 13
Pandas
Introduction
  • What is Pandas
  • Data structures
  • Selecting Columns and Rows
  • Column Types Conversion
  • Dealing with NaN
  • Choosing the largest and smallest values
  • Manipulating Data & Method Chaining
  • Indexes: Creating and Dropping
  • Views and Copies
Chapter 14
More Manipulations and Types
  • Sorting DataFrames
  • Dealing with Duplicates
  • Using Type specific functions
  • CASE style statements and the ``isin" operator
  • Regex Pattern Matching
Chapter 15
Aggregations
  • Introduction to the MTA dataset
  • Simple Aggregations
  • GroupBy Objects
  • Advanced Index / Multiindex
  • If not indexes...
  • Indexing with aggregations, a big Gotcha
Chapter 16
Joins
  • Helpful Table / Review
  • Merging data in Pandas
  • Complex Join Conditions
  • Stacking Data
  • Lags and Leads
  • Apply, map and applymap: Advanced Transformations
Chapter 17
Window Functions
  • Window Functions in Pandas
  • Some gotchas
  • Reshaping Data: Transpose, Stack and Unstack
  • A Bunch of stuff to clean up
  • Combining with the original DataFrame
  • Moving the Window
  • Pivot / Melt
Chapter 18
Appendix
Data Dictionaries
  • Introduction
  • Iowa Fleet data
  • NY MTA Data
  • Daily Stock Data: s2010 and s2011
  • Annual Fundamental Financial information: fnd
  • Soap Transaction Data
Appendix A
Connecting SQL to Python or R
  • Connecting to any database: ODBC and JDBC
  • Connecting only to PostgreSQL
Appendix B
Assignments
Example Exams