You have access to a database. Now what do you do? Building on your existing skills joining tables, using basic functions, grouping data, and using subqueries, the next step in your SQL journey is learning how to explore a database and the data in it. Using data from Stack Overflow, Fortune 500 companies, and 311 help requests from Evanston, IL, you'll get familiar with numeric, character, and date/time data types. You'll use functions to aggregate, summarize, and analyze data without leaving the database. Errors and inconsistencies in the data won't stop you! You'll learn common problems to look for and strategies to clean up messy data. By the end of this course, you'll be ready to start exploring your own PostgreSQL databases and analyzing the data in them.
What's in the database?
Start exploring a database by identifying the tables and the foreign keys that link them. Look for missing values, count the number of observations, and join tables to understand how they're related. Learn about coalescing and casting data along the way.

What's in the database? (50 xp)
Explore table sizes (50 xp)
Count missing values (100 xp)
Join tables (100 xp)
The keys to the database (50 xp)
Foreign keys (50 xp)
Read an entity relationship diagram (100 xp)
Coalesce (100 xp)
Coalesce with a self-join (100 xp)
Column types and constraints (50 xp)
Effects of casting (100 xp)
Summarize the distribution of numeric values (100 xp)
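
As a rough illustration of the ideas in this chapter (counting missing values, coalescing, and casting), here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in so it runs without a server; the course itself targets PostgreSQL, where the same SQL statements should behave the same way. The `company` table and its values are hypothetical, not taken from the course datasets.

```python
import sqlite3

# Hypothetical table and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (name TEXT, ticker TEXT, revenue TEXT);
    INSERT INTO company VALUES
        ('Acme',    'ACM', '120.7'),
        ('Globex',  NULL,  '85.2'),
        ('Initech', NULL,  NULL);
""")

# count(*) counts rows; count(column) skips NULLs,
# so the difference is the number of missing values.
missing_tickers = conn.execute(
    "SELECT count(*) - count(ticker) FROM company"
).fetchone()[0]

# coalesce() returns its first non-NULL argument:
# fall back to the company name when the ticker is missing.
labels = [row[0] for row in conn.execute(
    "SELECT coalesce(ticker, name) FROM company ORDER BY name"
)]

# CAST converts the text column to a number so it can be summed;
# sum() ignores the NULL revenue.
total_revenue = conn.execute(
    "SELECT sum(CAST(revenue AS REAL)) FROM company"
).fetchone()[0]

print(missing_tickers, labels, total_revenue)
```

Casting rules differ between database systems (for example, how a decimal is converted to an integer), so results of other casts should be checked against PostgreSQL directly.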
Summarizing and aggregating numeric data
You'll build on functions like min and max to summarize numeric data in new ways. Add average, variance, correlation, and percentile functions to your toolkit, and learn how to truncate and round numeric values too. Build complex queries and save your results by creating temporary tables.

Numeric data types and summary functions (50 xp)
Division (100 xp)
Explore with division (100 xp)
Summarize numeric columns (100 xp)
Summarize group statistics (100 xp)
Exploring distributions (50 xp)
Truncate (100 xp)
Generate series (100 xp)
More summary functions (50 xp)
Correlation (100 xp)
Mean and Median (100 xp)
Creating temporary tables (50 xp)
Create a temp table (100 xp)
Create a temp table to simplify a query (100 xp)
Insert into a temp table (100 xp)
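
To make the chapter's themes concrete, the sketch below shows integer versus decimal division, grouped summary statistics, and saving a result in a temporary table. It assumes Python's sqlite3 module as a lightweight stand-in for PostgreSQL, and the `sales` table is hypothetical. Variance, correlation, and percentile functions are left out because SQLite does not provide them by default.

```python
import sqlite3

# Hypothetical data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sector TEXT, profit INTEGER);
    INSERT INTO sales VALUES
        ('tech', 10), ('tech', 5),
        ('retail', 4), ('retail', 3), ('retail', 2);
""")

# Dividing two integers discards the remainder;
# making one operand a decimal keeps the fraction.
int_div, dec_div = conn.execute("SELECT 10 / 4, 10 / 4.0").fetchone()

# Save grouped summary statistics in a temporary table
# so later queries can build on them.
conn.execute("""
    CREATE TEMP TABLE sector_stats AS
    SELECT sector,
           count(*)              AS n,
           min(profit)           AS min_profit,
           max(profit)           AS max_profit,
           round(avg(profit), 1) AS avg_profit
    FROM sales
    GROUP BY sector
""")

stats = conn.execute("""
    SELECT sector, n, min_profit, max_profit, avg_profit
    FROM sector_stats
    ORDER BY sector
""").fetchall()

print(int_div, dec_div)
print(stats)
```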
Exploring categorical data and unstructured text
Text, or character, data can get messy, but you'll learn how to deal with inconsistencies in case, spacing, and delimiters. Learn how to use a temporary table to recode messy categorical data to standardized values you can count and aggregate. Extract new variables from unstructured text as you explore help requests submitted to the city of Evanston, IL.

Character data types and common issues (50 xp)
Count the categories (100 xp)
Spotting character data problems (50 xp)
Cases and spaces (50 xp)
Trimming (100 xp)
Exploring unstructured text (100 xp)
Splitting and concatenating text (50 xp)
Concatenate strings (100 xp)
Split strings on a delimiter (100 xp)
Shorten long strings (100 xp)
Strategies for multiple transformations (50 xp)
Create an "other" category (50 xp)
Group and recode values (100 xp)
Create a table with indicator variables (100 xp)
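
The case and spacing problems described above can be sketched as follows. This is a minimal example assuming Python's sqlite3 module in place of PostgreSQL; the `requests` table and its category values are hypothetical, not the actual Evanston 311 data.

```python
import sqlite3

# Hypothetical categories with inconsistent case and stray spaces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (category TEXT);
    INSERT INTO requests VALUES
        ('Rodents'), (' rodents'), ('RODENTS '),
        ('Trash'), ('trash');
""")

# Before cleaning, every spelling variant counts as its own category.
raw_categories = conn.execute(
    "SELECT count(DISTINCT category) FROM requests"
).fetchone()[0]

# trim() strips leading and trailing spaces and lower() normalizes case,
# so the variants collapse into standardized values that can be counted.
cleaned = conn.execute("""
    SELECT lower(trim(category)) AS clean, count(*) AS n
    FROM requests
    GROUP BY lower(trim(category))
    ORDER BY n DESC, clean
""").fetchall()

print(raw_categories)
print(cleaned)
```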
Working with dates and timestamps
What time is it? In this chapter, you'll learn how to find out. You'll aggregate date/time data by hour, day, month, or year and practice both constructing time series and finding gaps in them.

Date/time types and formats (50 xp)
ISO 8601 (50 xp)
Date comparisons (100 xp)
Date arithmetic (100 xp)
Completion time by category (100 xp)
Date/time components and aggregation (50 xp)
Date parts (100 xp)
Variation by day of week (100 xp)
Date truncation (100 xp)
Aggregating with date/time series (50 xp)
Find missing dates (100 xp)
Custom aggregation periods (100 xp)
Monthly average with missing dates (100 xp)
Time between events (50 xp)
Longest gap (100 xp)
Rats! (100 xp)
Wrap-up (50 xp)
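
Aggregating by month and finding gaps in a series of dates might look roughly like the sketch below. It assumes Python's sqlite3 module so that it is self-contained, which means it uses SQLite's `strftime` and a recursive common table expression; in PostgreSQL the same ideas are usually expressed with `date_trunc` and `generate_series` instead. The `requests` table and its dates are hypothetical.

```python
import sqlite3

# Hypothetical request dates stored as ISO 8601 text (YYYY-MM-DD).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (date_created TEXT);
    INSERT INTO requests VALUES
        ('2024-01-01'), ('2024-01-01'), ('2024-01-02'),
        ('2024-01-04'), ('2024-02-01');
""")

# Aggregate by month: reduce each date to its year-month, then count.
by_month = conn.execute("""
    SELECT strftime('%Y-%m', date_created) AS month, count(*) AS n
    FROM requests
    GROUP BY strftime('%Y-%m', date_created)
    ORDER BY month
""").fetchall()

# Find missing dates: build every day from 2024-01-01 to 2024-01-05,
# then keep the days that have no request.
missing_days = [row[0] for row in conn.execute("""
    WITH RECURSIVE days(day) AS (
        SELECT '2024-01-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM days WHERE day < '2024-01-05'
    )
    SELECT day
    FROM days
    WHERE day NOT IN (SELECT date_created FROM requests)
    ORDER BY day
""")]

print(by_month)
print(missing_days)
```

Storing dates in ISO 8601 format is what lets plain text comparison and sorting agree with chronological order in this sketch.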
Datasets
Stack Overflow Question Counts
Fortune 500 Companies
Evanston 311 Help Requests
Course Database Creation Code
Course Database Entity Relationship Diagram

Prerequisites
Data Manipulation in SQL

Christina Maimone
Data Scientist, Northwestern University
Christina Maimone leads Research Data Services at Northwestern University with the IT Research Computing Services group. She enables innovative research by providing data science, programming, and software development support for researchers. Through consultations, project collaborations, user groups, and workshops, the Research Data Services team ensures researchers have the resources, services, and skills they need to overcome challenges in their work. Christina regularly uses R, Python, and SQL but enjoys the challenge of using a wide range of programs and languages in her work. She has a PhD in political science and an MS in statistics from Stanford.