Skip to content
Time Series Analysis in SQL Server - Course Notes by Chinmay Garg

Time Series Analysis in SQL Server

1. Working with Date and Times

  1. Downsampling and Upsampling data → Decrease the frequency of a time series dataset (ms, s to min, hrs); increase the frequency of a time series dataset for an imbalanced dataset. (Covered in more detail later)
  2. DATETIME2 (SYSDATETIME(), SYSUTCDATETIME()) → more accurate than DATETIME
  3. DAY(), MONTH(), YEAR() - Break out a date into individual components
  4. DATEPART, DATENAME - Ex. Week of Year, ISO Week of Year (Mostly used in Europe)
  5. DATEADD, DATEDIFF - Used for DATE Math - Can be nested for precision
Spinner
DataFrameavailable as
df
variable
SELECT DATEADD(Hour, -3, DATEADD(DAY, -4, GETDATE())) AS [date_precise-4d-3h], 
	GETDATE() [current_date]
  1. Rounding Dates - Nest DATEDIFF and DATEADD - Can be used to calculate first day of the past n month(s)
Spinner
DataFrameavailable as
df1
variable
SELECT DATEADD(Day, DATEDIFF(Day, 0, GETDATE()), 0) [rounded_date(add total_days to 0th date)],
	DATEDIFF(Day, 0, GETDATE()) [total_days],
	GETDATE() [current_date], 
	DATEADD(Month, DATEDIFF(Month, 0, GETDATE()) - 1, 0) [fist_day_m-1]
  1. DATEDIFF() returns an integer type, so it can overflow if you try to round to seconds. It could overflow when rounding to minutes, but that will be after the year 5000
  2. CAST(): Since 2000, ANSI SQL Standard (All flavours of SQL), No control over formatting
  3. CONVERT(): Since 2000, Only for T-SQL, Some control over formatting using the style (Third) Parameter
  4. For interoperability, OBDC (20/120) & ISO8601 (126) style code are recommended (CONVERT())
  5. FORMAT(): Since 2012, Only for T-SQL, Uses .NET framework for conversion, Greater Flexibility over formatting from dates to strings, slow (~8s for 1M rows, compared to ~4s (CAST() & CONVERT()))
Spinner
DataFrameavailable as
df2
variable
DECLARE @d DATE = '11/22/2020';
SELECT FORMAT( @d, 'd', 'en-US' ) 'US English'  ,FORMAT( @d, 'D', 'en-gb' ) 'British English'  ,FORMAT( @d, 'd', 'de-de' ) 'German'  	,FORMAT( @d, 'D', 'zh-cn' ) 'Chinese Simplified (PRC)';  
Spinner
DataFrameavailable as
df3
variable
DECLARE @d DATE = GETDATE();  
SELECT FORMAT( @d, 'dd/MM/yyyy') AS 'Date'  
       ,FORMAT(123456789,'###-##-####') AS 'Custom Number';  
  1. Calendar Table: Stores date information for quick retrieval. Can be built once & ideally never updated. Should be treated as Warehouse.
  2. Contents (Invalid URL) of Calendar Table: General Columns (Date, Day, Month, Year, IsWeekend), Fiscal & Calendar Year Columns, Specialized Columns (Holiday, Lunar, ISO Week)
Spinner
DataFrameavailable as
df7
variable
-- Live Calendar Table with Recursion
WITH calendar AS (
    SELECT CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS DATE) [date]
    UNION ALL 
    SELECT DATEADD(DAY, 1, date) [date] FROM calendar WHERE date < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) + 1, 0)
)
SELECT date, MONTH(date) month, YEAR(date) year, DAY(date) day, EOMONTH(date) end_of_month, DATEPART(WEEKDAY, date) weekday, 
    DATEPART(WEEK, date) week,
CASE WHEN DATEPART(WEEKDAY, date) = 7 THEN 
        CASE WHEN (ROW_NUMBER() OVER (PARTITION BY DATEPART(WEEKDAY, date) ORDER BY date))%2 = 0 THEN 1 ELSE 0 END
    WHEN DATEPART(WEEKDAY, date) = 1 THEN 1 ELSE 0 END weekend, 
    DATEADD(DAY, -DATEPART(WEEKDAY, DATE)%7-1, Date) saturday_offset,
    CEILING(CAST(DAY(DATEADD(DAY, -DATEPART(WEEKDAY, DATE)%7-1, Date))/7 AS FLOAT)) sat_offset_weeknum
FROM calendar
Hidden output
  1. APPLY() -

2. Converting to dates and times

3. Aggregating Time Series data

  1. Basic Agg. Func. - COUNT()', COUNT_BIG() (64 - bit integerBIGINT), SUM(), MAX(), MIN()
  2. Cardinality: No. of unique values in a column (attribute) - checked by COUNT(DISTINCT
  3. Total no. of rows - COUNT(), non-NULL no. of rows - COUNT(column_name), COUNT(NULLIF(column_name, 'filter'))
  4. CASE WHEN can be nested inside aggregate functions to filter the aggregates
Spinner
DataFrameavailable as
df5
variable
-- ex
  1. Statistical Agg. Func. - AVG(), STDEV(), VAR(), STDEVP(), VARP() (Population standard deviation and variance)
  2. Median: PERCENTILE_CONT(0.5)
  3. PERCENTILE_CONT() - not recommended for large tables on a busy production server. Can be finicky about working with grouped data, so it may sometimes be easier to calculate the median and store it as a variable if you can.