Hacking Date Functions in SQLite
Some of the most useful functions in Postgres implementations of SQL (like Amazon Redshift)are
DATE_DIFFgives the amount of time that has elapsed between two different dates. For example, the following code would give the number of days between
DATE_DIFF('day', date1, date2)
DATE_DIFF is great for calculating the number of days from sign-up to cancellation or the number of hours from sign-in to sign-out.
DATE_TRUNCcuts off a date to the nearest day, week, month, or year. For example, the following code would give the nearest Monday, to the timestamp
DATE_TRUNC is great for aggregating data. For instance, we can use it to find the number of Monthly Active Users (MAU) by truncating to the nearest month start date.
But not every SQL implementation had these great functions. For our Learn SQL from Scratch code, we use SQLite, a light-weight implementation of SQL that can be run on a single Docker instance. SQLite is great for website backends and small projects, but it's missing my two favorite functions. Luckily, there are workarounds.
DATE_DIFF, we can use a little-known function called
juliandate. According to Wikipedia, "the Julian Day Number (JDN) is the integer assigned to a whole solar day in the Julian day count starting from noon Universal time, with Julian day number 0 assigned to the day starting at noon on Monday, January 1, 4713 BC". By converting a date to a floating point number, we can use subtraction to find the difference between two timestamps.
We can even convert our answer to hours by multiplying by
24 or to minutes by multiplying by
24 * 60.
We can emulate some of the functionality of
DATE_TRUNC by using
strftime. This function converts a timestamp to a string with a given output.
%dday of month: 00 %ffractional seconds: SS.SSS %Hhour: 00-24%jday of year: 001-366 %JJulian day number%mmonth: 01-12 %Mminute: 00-59 %sseconds since 1970-01-01 %Sseconds: 00-59 %wday of week 0-6 with Sunday==0 %Wweek of year: 00-53 %Yyear: 0000-9999
Normally, we might use this to convert between different timestamp formats like YYYY-MM-DD to MM-DD-YYYY:
But we can cleverly choose our formatting to truncate to the appropriate point. For example, if we want to truncate to the nearest month, we can:
Or we can truncate to the nearest week, by using:
With these two easy tricks, you can use SQLite for some of the same great analyses as Amazon Redshift!
If you would like to learn more about the basics of SQL, take DataCamp's Intro to SQL for Data Science course.