Time Series Analysis in SQL Server - Course Notes by Chinmay Garg
1. Working with Dates and Times
- Downsampling and Upsampling data → Downsampling decreases the frequency of a time series dataset (e.g. ms or s down to min or hrs); upsampling increases the frequency, e.g. to balance an imbalanced dataset. (Covered in more detail later)
- DATETIME2 (SYSDATETIME(), SYSUTCDATETIME()) → more accurate than DATETIME
- DAY(), MONTH(), YEAR() - Break out a date into individual components
- DATEPART, DATENAME - Ex. Week of Year, ISO Week of Year (mostly used in Europe)
- DATEADD, DATEDIFF - Used for date math - Can be nested for precision
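A quick sketch of the component functions above (return values depend on the current date, so none are shown):

```sql
-- DATEPART returns an integer component; DATENAME returns its string name
SELECT DATEPART(YEAR, GETDATE())     AS [year],
       DATEPART(ISO_WEEK, GETDATE()) AS iso_week,     -- ISO 8601 week of year
       DATENAME(MONTH, GETDATE())    AS month_name,   -- e.g. 'March', not 3
       DATENAME(WEEKDAY, GETDATE())  AS weekday_name;
```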
SELECT DATEADD(Hour, -3, DATEADD(DAY, -4, GETDATE())) AS [date_precise-4d-3h],
GETDATE() [current_date]
- Rounding Dates - Nest DATEDIFF and DATEADD - Can be used to calculate the first day of the past n month(s)
SELECT DATEADD(Day, DATEDIFF(Day, 0, GETDATE()), 0) [rounded_date(add total_days to 0th date)],
DATEDIFF(Day, 0, GETDATE()) [total_days],
GETDATE() [current_date],
DATEADD(Month, DATEDIFF(Month, 0, GETDATE()) - 1, 0) [first_day_m-1]
- DATEDIFF() returns an integer type, so it can overflow if you try to round to seconds. It could also overflow when rounding to minutes, but not until after the year 5000.
- CAST(): since SQL Server 2000, ANSI SQL standard (all flavours of SQL), no control over formatting
- CONVERT(): since 2000, T-SQL only, some control over formatting using the style (third) parameter - For interoperability, the ODBC (20/120) & ISO 8601 (126) style codes are recommended
- FORMAT(): since 2012, T-SQL only, uses the .NET framework for conversion, greater flexibility when formatting dates as strings, but slow (~8s for 1M rows, compared to ~4s for CAST() & CONVERT())
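A minimal side-by-side of the three conversion approaches, using the recommended ODBC and ISO 8601 style codes:

```sql
DECLARE @now DATETIME = GETDATE();
SELECT CAST(@now AS NVARCHAR(30))       AS cast_default,   -- no formatting control
       CONVERT(NVARCHAR(30), @now, 120) AS odbc_canonical, -- yyyy-mm-dd hh:mi:ss
       CONVERT(NVARCHAR(30), @now, 126) AS iso8601;        -- yyyy-mm-ddThh:mi:ss.mmm
```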
DECLARE @d DATE = '11/22/2020';
SELECT FORMAT( @d, 'd', 'en-US' ) AS 'US English',
       FORMAT( @d, 'D', 'en-gb' ) AS 'British English',
       FORMAT( @d, 'd', 'de-de' ) AS 'German',
       FORMAT( @d, 'D', 'zh-cn' ) AS 'Chinese Simplified (PRC)';
DECLARE @d DATE = GETDATE();
SELECT FORMAT( @d, 'dd/MM/yyyy') AS 'Date'
,FORMAT(123456789,'###-##-####') AS 'Custom Number';
- Calendar Table: stores date information for quick retrieval. Can be built once & ideally never updated; treat it like a warehouse table (write once, read many).
- Contents of a Calendar Table: General Columns (Date, Day, Month, Year, IsWeekend), Fiscal & Calendar Year Columns, Specialized Columns (Holiday, Lunar, ISO Week)
-- Live Calendar Table for the current month, built with a recursive CTE
WITH calendar AS (
    -- Anchor: first day of the current month
    SELECT CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS DATE) [date]
    UNION ALL
    -- Recurse one day at a time up to the first day of next month
    SELECT DATEADD(DAY, 1, date) [date]
    FROM calendar
    WHERE date < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) + 1, 0)
)
SELECT date,
    MONTH(date) month, YEAR(date) year, DAY(date) day,
    EOMONTH(date) end_of_month,
    DATEPART(WEEKDAY, date) weekday,
    DATEPART(WEEK, date) week,
    -- Weekend flag: every Sunday (weekday 1) plus alternate Saturdays (weekday 7)
    CASE WHEN DATEPART(WEEKDAY, date) = 7 THEN
            CASE WHEN (ROW_NUMBER() OVER (PARTITION BY DATEPART(WEEKDAY, date) ORDER BY date)) % 2 = 0 THEN 1 ELSE 0 END
         WHEN DATEPART(WEEKDAY, date) = 1 THEN 1 ELSE 0 END weekend,
    -- Saturday-based offset, and a week-of-month number derived from it
    DATEADD(DAY, -DATEPART(WEEKDAY, date) % 7 - 1, date) saturday_offset,
    CEILING(CAST(DAY(DATEADD(DAY, -DATEPART(WEEKDAY, date) % 7 - 1, date)) AS FLOAT) / 7) sat_offset_weeknum
FROM calendar
- APPLY - evaluates a table expression once per row of the outer table (CROSS APPLY / OUTER APPLY)
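A minimal sketch of the APPLY operator; the Customer/Order tables and columns are hypothetical:

```sql
-- CROSS APPLY runs the right-hand query once per outer row, like a correlated join
SELECT c.CustomerID, o.OrderDate, o.Amount
FROM dbo.Customer c
CROSS APPLY (
    SELECT TOP (3) OrderDate, Amount
    FROM dbo.[Order] o
    WHERE o.CustomerID = c.CustomerID
    ORDER BY o.OrderDate DESC
) o;  -- latest 3 orders per customer
```

OUTER APPLY would also keep customers with no matching orders, the way LEFT JOIN keeps unmatched rows.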
2. Converting to dates and times
3. Aggregating Time Series data
- Basic Agg. Func. - COUNT(), COUNT_BIG() (returns a 64-bit integer, BIGINT), SUM(), MAX(), MIN()
- Cardinality: no. of unique values in a column (attribute) - checked by COUNT(DISTINCT column_name)
- Total no. of rows - COUNT(*); non-NULL no. of rows - COUNT(column_name); conditional count - COUNT(NULLIF(column_name, 'filter'))
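The COUNT variants above, sketched against a hypothetical Sales table:

```sql
SELECT COUNT(*)                           AS total_rows,
       COUNT(ShipDate)                    AS shipped_rows,  -- NULLs are skipped
       COUNT(DISTINCT CustomerID)         AS cardinality,
       COUNT(NULLIF(Status, 'Cancelled')) AS non_cancelled  -- NULLIF turns matches into NULL, which COUNT then skips
FROM dbo.Sales;
```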
- CASE WHEN can be nested inside aggregate functions to filter the aggregates
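For example, conditional aggregation over a hypothetical Sales table:

```sql
SELECT YEAR(OrderDate) AS order_year,
       SUM(CASE WHEN Region = 'EMEA' THEN Amount ELSE 0 END) AS emea_sales,
       COUNT(CASE WHEN Amount > 1000 THEN 1 END)             AS big_orders  -- missing ELSE defaults to NULL, which COUNT skips
FROM dbo.Sales
GROUP BY YEAR(OrderDate);
```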
- Statistical Agg. Func. - AVG(), STDEV(), VAR(), STDEVP(), VARP() (population standard deviation and variance)
- Median: PERCENTILE_CONT(0.5)
- PERCENTILE_CONT() - not recommended for large tables on a busy production server. Can be finicky about working with grouped data, so it may sometimes be easier to calculate the median and store it as a variable if you can.
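The grouped-data awkwardness comes from PERCENTILE_CONT being a window function in T-SQL rather than a GROUP BY aggregate, so a per-group median needs DISTINCT (table hypothetical):

```sql
SELECT DISTINCT Region,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Amount)
           OVER (PARTITION BY Region) AS median_amount  -- one row per region
FROM dbo.Sales;
```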