Time Series Analysis in SQL Server - Course Notes by Chinmay Garg
1. Working with Dates and Times
- Downsampling and Upsampling data → Downsampling decreases the frequency of a time series dataset (e.g. ms or s down to min or hrs); upsampling increases the frequency, e.g. to balance an imbalanced dataset. (Covered in more detail later)
- DATETIME2 (SYSDATETIME(), SYSUTCDATETIME()) → more accurate than DATETIME
- DAY(), MONTH(), YEAR() - Break out a date into individual components
- DATEPART, DATENAME - Ex. Week of Year, ISO Week of Year (mostly used in Europe)
- DATEADD, DATEDIFF - Used for date math - Can be nested for precision
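A quick sketch of the component functions above (return values depend on the current date, so none are shown):

```sql
-- DATEPART returns an integer component; DATENAME returns its string name
SELECT DATEPART(YEAR, GETDATE())     AS [year],
       DATEPART(ISO_WEEK, GETDATE()) AS iso_week,     -- ISO 8601 week of year
       DATENAME(MONTH, GETDATE())    AS month_name,   -- e.g. 'March', not 3
       DATENAME(WEEKDAY, GETDATE())  AS weekday_name;
```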
SELECT DATEADD(Hour, -3, DATEADD(DAY, -4, GETDATE())) AS [date_precise-4d-3h],
GETDATE() [current_date]
- Rounding Dates - Nest DATEDIFF and DATEADD - Can be used to calculate the first day of the past n month(s)
SELECT DATEADD(Day, DATEDIFF(Day, 0, GETDATE()), 0) [rounded_date(add total_days to 0th date)],
DATEDIFF(Day, 0, GETDATE()) [total_days],
GETDATE() [current_date],
DATEADD(Month, DATEDIFF(Month, 0, GETDATE()) - 1, 0) [first_day_m-1]
- DATEDIFF() returns an integer type, so it can overflow if you try to round to seconds. It could also overflow when rounding to minutes, but not until after the year 5000.
- CAST(): since SQL Server 2000, ANSI SQL standard (all flavours of SQL), no control over formatting
- CONVERT(): since 2000, T-SQL only, some control over formatting using the style (third) parameter - For interoperability, the ODBC (20/120) & ISO 8601 (126) style codes are recommended
- FORMAT(): since 2012, T-SQL only, uses the .NET framework for conversion, greater flexibility when formatting dates as strings, but slow (~8s for 1M rows, compared to ~4s for CAST() & CONVERT())
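A minimal side-by-side of the three conversion approaches, using the recommended ODBC and ISO 8601 style codes:

```sql
DECLARE @now DATETIME = GETDATE();
SELECT CAST(@now AS NVARCHAR(30))       AS cast_default,   -- no formatting control
       CONVERT(NVARCHAR(30), @now, 120) AS odbc_canonical, -- yyyy-mm-dd hh:mi:ss
       CONVERT(NVARCHAR(30), @now, 126) AS iso8601;        -- yyyy-mm-ddThh:mi:ss.mmm
```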
DECLARE @d DATE = '11/22/2020';
SELECT FORMAT( @d, 'd', 'en-US' ) AS 'US English',
       FORMAT( @d, 'D', 'en-gb' ) AS 'British English',
       FORMAT( @d, 'd', 'de-de' ) AS 'German',
       FORMAT( @d, 'D', 'zh-cn' ) AS 'Chinese Simplified (PRC)';
DECLARE @d DATE = GETDATE();
SELECT FORMAT( @d, 'dd/MM/yyyy') AS 'Date'
,FORMAT(123456789,'###-##-####') AS 'Custom Number';
- Calendar Table: stores date information for quick retrieval. Can be built once & ideally never updated; treat it like a warehouse table (write once, read many).
- Contents of a Calendar Table: General Columns (Date, Day, Month, Year, IsWeekend), Fiscal & Calendar Year Columns, Specialized Columns (Holiday, Lunar, ISO Week)
-- Live Calendar Table for the current month, built with a recursive CTE
WITH calendar AS (
    -- Anchor: first day of the current month
    SELECT CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) AS DATE) [date]
    UNION ALL
    -- Recurse one day at a time up to the first day of next month
    SELECT DATEADD(DAY, 1, date) [date]
    FROM calendar
    WHERE date < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) + 1, 0)
)
SELECT date,
    MONTH(date) month, YEAR(date) year, DAY(date) day,
    EOMONTH(date) end_of_month,
    DATEPART(WEEKDAY, date) weekday,
    DATEPART(WEEK, date) week,
    -- Weekend flag: every Sunday (weekday 1) plus alternate Saturdays (weekday 7)
    CASE WHEN DATEPART(WEEKDAY, date) = 7 THEN
            CASE WHEN (ROW_NUMBER() OVER (PARTITION BY DATEPART(WEEKDAY, date) ORDER BY date)) % 2 = 0 THEN 1 ELSE 0 END
         WHEN DATEPART(WEEKDAY, date) = 1 THEN 1 ELSE 0 END weekend,
    -- Saturday-based offset, and a week-of-month number derived from it
    DATEADD(DAY, -DATEPART(WEEKDAY, date) % 7 - 1, date) saturday_offset,
    CEILING(CAST(DAY(DATEADD(DAY, -DATEPART(WEEKDAY, date) % 7 - 1, date)) AS FLOAT) / 7) sat_offset_weeknum
FROM calendar
- APPLY - evaluates a table expression once per row of the outer table (CROSS APPLY / OUTER APPLY)
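A minimal sketch of the APPLY operator; the Customer/Order tables and columns are hypothetical:

```sql
-- CROSS APPLY runs the right-hand query once per outer row, like a correlated join
SELECT c.CustomerID, o.OrderDate, o.Amount
FROM dbo.Customer c
CROSS APPLY (
    SELECT TOP (3) OrderDate, Amount
    FROM dbo.[Order] o
    WHERE o.CustomerID = c.CustomerID
    ORDER BY o.OrderDate DESC
) o;  -- latest 3 orders per customer
```

OUTER APPLY would also keep customers with no matching orders, the way LEFT JOIN keeps unmatched rows.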
2. Converting to dates and times
3. Aggregating Time Series data
- Basic Agg. Func. - COUNT(), COUNT_BIG() (returns a 64-bit integer, BIGINT), SUM(), MAX(), MIN()
- Cardinality: no. of unique values in a column (attribute) - checked by COUNT(DISTINCT column_name)
- Total no. of rows - COUNT(*); non-NULL no. of rows - COUNT(column_name); conditional count - COUNT(NULLIF(column_name, 'filter'))
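The COUNT variants above, sketched against a hypothetical Sales table:

```sql
SELECT COUNT(*)                           AS total_rows,
       COUNT(ShipDate)                    AS shipped_rows,  -- NULLs are skipped
       COUNT(DISTINCT CustomerID)         AS cardinality,
       COUNT(NULLIF(Status, 'Cancelled')) AS non_cancelled  -- NULLIF turns matches into NULL, which COUNT then skips
FROM dbo.Sales;
```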
- CASE WHEN can be nested inside aggregate functions to filter the aggregates
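For example, conditional aggregation over a hypothetical Sales table:

```sql
SELECT YEAR(OrderDate) AS order_year,
       SUM(CASE WHEN Region = 'EMEA' THEN Amount ELSE 0 END) AS emea_sales,
       COUNT(CASE WHEN Amount > 1000 THEN 1 END)             AS big_orders  -- missing ELSE defaults to NULL, which COUNT skips
FROM dbo.Sales
GROUP BY YEAR(OrderDate);
```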
- Statistical Agg. Func. - AVG(), STDEV(), VAR(), STDEVP(), VARP() (population standard deviation and variance)
- Median: PERCENTILE_CONT(0.5)
- PERCENTILE_CONT() - not recommended for large tables on a busy production server. Can be finicky about working with grouped data, so it may sometimes be easier to calculate the median and store it as a variable if you can.
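The grouped-data awkwardness comes from PERCENTILE_CONT being a window function in T-SQL rather than a GROUP BY aggregate, so a per-group median needs DISTINCT (table hypothetical):

```sql
SELECT DISTINCT Region,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Amount)
           OVER (PARTITION BY Region) AS median_amount  -- one row per region
FROM dbo.Sales;
```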