Skip to content
Time Series Analysis in SQL Server - Course Notes by Chinmay Garg
Time Series Analysis in SQL Server
1. Working with Date and Times
- Downsampling and Upsampling data → Decrease the frequency of a time series dataset (ms, s to min, hrs); increase the frequency of a time series dataset for an imbalanced dataset. (Covered in more detail later)
DATETIME2 (SYSDATETIME(), SYSUTCDATETIME())→ more accurate thanDATETIMEDAY(), MONTH(), YEAR()- Break out a date into individual componentsDATEPART, DATENAME- Ex. Week of Year, ISO Week of Year (Mostly used in Europe)DATEADD,DATEDIFF- Used for DATE Math - Can be nested for precision
DataFrameas
df
variable
SELECT DATEADD(Hour, -3, DATEADD(DAY, -4, GETDATE())) AS [date_precise-4d-3h],
GETDATE() [current_date]- Rounding Dates - Nest
DATEDIFFandDATEADD- Can be used to calculate first day of the past n month(s)
DataFrameas
df1
variable
SELECT DATEADD(Day, DATEDIFF(Day, 0, GETDATE()), 0) [rounded_date(add total_days to 0th date)],
DATEDIFF(Day, 0, GETDATE()) [total_days],
GETDATE() [current_date],
DATEADD(Month, DATEDIFF(Month, 0, GETDATE()) - 1, 0) [fist_day_m-1]DATEDIFF()returns an integer type, so it can overflow if you try to round to seconds. It could overflow when rounding to minutes, but that will be after the year 5000CAST(): Since 2000, ANSI SQL Standard (All flavours of SQL), No control over formattingCONVERT(): Since 2000, Only for T-SQL, Some control over formatting using the style (Third) Parameter- For interoperability, OBDC (
20/120) & ISO8601 (126) style code are recommended (CONVERT()) FORMAT(): Since 2012, Only for T-SQL, Uses .NET framework for conversion, Greater Flexibility over formatting from dates to strings, slow (~8s for 1M rows, compared to ~4s (CAST() & CONVERT()))
DataFrameas
df2
variable
DECLARE @d DATE = '11/22/2020';
SELECT FORMAT( @d, 'd', 'en-US' ) 'US English' ,FORMAT( @d, 'D', 'en-gb' ) 'British English' ,FORMAT( @d, 'd', 'de-de' ) 'German' ,FORMAT( @d, 'D', 'zh-cn' ) 'Chinese Simplified (PRC)'; DataFrameas
df3
variable
DECLARE @d DATE = GETDATE();
SELECT FORMAT( @d, 'dd/MM/yyyy') AS 'Date'
,FORMAT(123456789,'###-##-####') AS 'Custom Number'; - Calendar Table: Stores date information for quick retrieval. Can be built once & ideally never updated. Should be treated as Warehouse.
- Contents (Invalid URL) of Calendar Table: General Columns (Date, Day, Month, Year, IsWeekend), Fiscal & Calendar Year Columns, Specialized Columns (Holiday, Lunar, ISO Week)
DataFrameas
df7
variable
-- Live Calendar Table with Recursion
WITH calendar AS (
SELECT CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS DATE) [date]
UNION ALL
SELECT DATEADD(DAY, 1, date) [date] FROM calendar WHERE date < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) + 1, 0)
)
SELECT date, MONTH(date) month, YEAR(date) year, DAY(date) day, EOMONTH(date) end_of_month, DATEPART(WEEKDAY, date) weekday,
DATEPART(WEEK, date) week,
CASE WHEN DATEPART(WEEKDAY, date) = 7 THEN
CASE WHEN (ROW_NUMBER() OVER (PARTITION BY DATEPART(WEEKDAY, date) ORDER BY date))%2 = 0 THEN 1 ELSE 0 END
WHEN DATEPART(WEEKDAY, date) = 1 THEN 1 ELSE 0 END weekend,
DATEADD(DAY, -DATEPART(WEEKDAY, DATE)%7-1, Date) saturday_offset,
CEILING(CAST(DAY(DATEADD(DAY, -DATEPART(WEEKDAY, DATE)%7-1, Date))/7 AS FLOAT)) sat_offset_weeknum
FROM calendarHidden output
APPLY()-
2. Converting to dates and times
3. Aggregating Time Series data
- Basic Agg. Func. -
COUNT()',COUNT_BIG()(64 - bit integerBIGINT),SUM(),MAX(),MIN() - Cardinality: No. of unique values in a column (attribute) - checked by
COUNT(DISTINCT - Total no. of rows -
COUNT(), non-NULLno. of rows -COUNT(column_name), COUNT(NULLIF(column_name, 'filter')) CASE WHENcan be nested inside aggregate functions to filter the aggregates
DataFrameas
df5
variable
-- ex- Statistical Agg. Func. -
AVG(),STDEV(),VAR(),STDEVP(),VARP()(Population standard deviation and variance) - Median:
PERCENTILE_CONT(0.5) PERCENTILE_CONT()- not recommended for large tables on a busy production server. Can be finicky about working with grouped data, so it may sometimes be easier to calculate the median and store it as a variable if you can.