pd.merge_asof

Perform a merge by key distance.

This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key.
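A minimal sketch of the behaviour described above. The two frames and their column names (`time`, `trade`, `price`) are made up for illustration. By default `pd.merge_asof` looks backward (the last right-hand key that is less than or equal to the left-hand key); passing `direction="nearest"` matches the closest key on either side.

```python
import pandas as pd

# Hypothetical sample data; both frames are already sorted by "time".
trades = pd.DataFrame({"time": [1, 5, 10], "trade": ["a", "b", "c"]})
quotes = pd.DataFrame({"time": [1, 2, 3, 6, 7],
                       "price": [100, 101, 102, 103, 104]})

# Default direction="backward": each left row gets the last right row
# whose key is <= the left key.
backward = pd.merge_asof(trades, quotes, on="time")

# direction="nearest": each left row gets the right row with the closest key.
nearest = pd.merge_asof(trades, quotes, on="time", direction="nearest")

print(backward)
print(nearest)
```

For the trade at time 5, the backward match takes the quote at time 3, while the nearest match takes the quote at time 6.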

Basic statistics

variance

Measures the average squared distance from each point to the mean.

algo:

  • for every point take the distance to the mean
  • square each distance
  • sum the squared distances
  • divide by number of points - 1

numpy: np.var(data, ddof=1)

Use ddof=0 only when the data is the full population; ddof=1 gives the sample variance.
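The steps above can be sketched by hand and checked against `np.var`. The `data` array is a made-up sample, not something from these notes.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

distances = data - data.mean()             # distance of every point to the mean
squared = distances ** 2                   # square each distance
total = squared.sum()                      # sum the squared distances
sample_variance = total / (len(data) - 1)  # divide by number of points - 1

# Should agree with NumPy's sample variance.
print(sample_variance, np.var(data, ddof=1))
```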

std:

Square root of the variance.

numpy: np.std(data, ddof=1)

mean absolute deviation

algo:

  • take distance for each point to mean
  • take absolute value for each distance
  • take mean of those distances
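A small sketch of these three steps in NumPy, again on a made-up `data` array. It uses only basic array operations, so no dedicated library function is assumed.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

distances = data - np.mean(data)   # distance of each point to the mean
abs_distances = np.abs(distances)  # absolute value of each distance
mad = np.mean(abs_distances)       # mean of those absolute distances

print(mad)
```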

std vs mad: std penalizes longer distances more than shorter ones (because of the squaring), whereas mad penalizes every distance equally.
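One way to see the difference is with two made-up samples that have the same mad but different spreads of distances. The helper `mean_abs_dev` is a hypothetical name introduced only for this sketch.

```python
import numpy as np

def mean_abs_dev(values):
    """Mean absolute deviation around the mean (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    return np.mean(np.abs(values - values.mean()))

# Both samples have mean 0 and a total absolute distance of 8.
even = np.array([-2.0, -2.0, 2.0, 2.0])    # four medium distances
uneven = np.array([-4.0, 0.0, 0.0, 4.0])   # two long distances, two zero

mad_even, mad_uneven = mean_abs_dev(even), mean_abs_dev(uneven)
std_even, std_uneven = np.std(even, ddof=1), np.std(uneven, ddof=1)

print(mad_even, mad_uneven)  # identical
print(std_even, std_uneven)  # std is larger for the sample with longer distances
```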

quantiles:

Split the data into a number of equal-sized parts.

numpy: np.quantile(data, quantiles)

box plot = np.quantile(data, [0, 0.25, 0.5, 0.75, 1]) -> can be shortened using np.linspace: np.quantile(data, np.linspace(0, 1, 5))
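A quick check, on a made-up array, that the explicit list and the `np.linspace` form give the same five values. Note that `np.linspace(0, 1, 5)` takes the number of points (5), not the number of intervals (4).

```python
import numpy as np

data = np.arange(1, 10)  # made-up sample: 1, 2, ..., 9

# Minimum, Q1, median, Q3, maximum.
five_numbers = np.quantile(data, [0, 0.25, 0.5, 0.75, 1])

# Same quantiles generated with linspace: 5 evenly spaced points from 0 to 1.
same = np.quantile(data, np.linspace(0, 1, 5))

print(five_numbers)
print(same)
```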

IQR

Distance between the .25 quantile and the .75 quantile = height of the box in a boxplot.

scipy:

from scipy.stats import iqr
iqr(data)
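As a sanity check, the scipy result should match the difference of the two quantiles computed with NumPy. The `data` array here is made up for illustration.

```python
import numpy as np
from scipy.stats import iqr

data = np.arange(1, 10)  # made-up sample: 1, 2, ..., 9

# Manual version: Q3 minus Q1.
q1, q3 = np.quantile(data, [0.25, 0.75])
manual_iqr = q3 - q1

# scipy version.
scipy_iqr = iqr(data)

print(manual_iqr, scipy_iqr)
```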

Outliers

defined as: data < Q1 - 1.5 * IQR or data > Q3 + 1.5 * IQR
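A minimal sketch of this rule with a NumPy boolean mask, on a made-up array with one obvious outlier. A point is flagged if it falls below the lower bound or above the upper bound, hence the element-wise "or" (`|`).

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])  # made-up sample

q1, q3 = np.quantile(data, [0.25, 0.75])
iqr_value = q3 - q1

lower = q1 - 1.5 * iqr_value
upper = q3 + 1.5 * iqr_value

# Element-wise OR: a point cannot be below the lower bound and above the upper bound at once.
is_outlier = (data < lower) | (data > upper)
outliers = data[is_outlier]

print(lower, upper, outliers)
```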