pd.merge_asof
Perform a merge by key distance.
This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key.
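A minimal sketch of the behaviour described above. The frames and the column names (`a`, `left_val`, `right_val`) are made up for illustration. By default `merge_asof` looks backward (last right key <= left key); `direction="nearest"` picks the closest key on either side.

```python
import pandas as pd

# Illustrative frames; both are already sorted by the key column "a".
left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]})

# Default direction="backward": take the last right row whose key <= left key.
backward = pd.merge_asof(left, right, on="a")

# direction="nearest": take the right row with the closest key on either side.
nearest = pd.merge_asof(left, right, on="a", direction="nearest")

print(backward)
print(nearest)
```

For key 5 the backward match is the row with key 3, while the nearest match is the row with key 6.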
Basic statistics
variance
Measures the average squared distance from each point to the mean
algo:
- for every point take distance to mean
- square distance
- sum
- divide by (number of points - 1)
numpy:
np.var(data, ddof=1)
ddof=0 only for full-population stats (this is NumPy's default, so pass ddof=1 for a sample)
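The variance steps above can be sketched by hand and checked against `np.var`. The `data` array is an arbitrary example, not taken from any dataset.

```python
import numpy as np

# Arbitrary example data (mean is 5.0).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

distances = data - data.mean()        # distance of every point to the mean
squared = distances ** 2              # square each distance
total = squared.sum()                 # sum of squared distances (32.0 here)
sample_var = total / (len(data) - 1)  # divide by number of points - 1

print(sample_var)              # 32 / 7, about 4.571
print(np.var(data, ddof=1))    # same value from NumPy
print(np.var(data, ddof=0))    # population variance: 32 / 8 = 4.0
```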
std:
Square root of variance
numpy: np.std(data, ddof=1)
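A quick check, on an arbitrary example array, that the standard deviation is the square root of the variance when both use the same `ddof`.

```python
import numpy as np

# Arbitrary example data.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Square root of the sample variance.
sample_std = np.sqrt(np.var(data, ddof=1))

print(sample_std)
print(np.std(data, ddof=1))  # same value from NumPy
```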
mean absolute deviation
algo:
- take distance for each point to mean
- take absolute value for each distance
- take mean of those distances
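The three steps above can be sketched directly in NumPy. The `data` array is an arbitrary example.

```python
import numpy as np

# Arbitrary example data (mean is 5.0).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

distances = data - data.mean()   # distance of each point to the mean
abs_distances = np.abs(distances)  # absolute value of each distance
mad = abs_distances.mean()       # mean of those absolute distances

print(mad)  # 12 / 8 = 1.5
```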
std vs mad: std penalizes longer distances more than shorter ones (because of the squaring), while mad weights every distance in proportion to its size
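One way to see the difference is to add a single extreme value and compare how much each measure grows. This is only a sketch on made-up numbers; `mean_abs_dev` is a helper defined here, not a NumPy function.

```python
import numpy as np

def mean_abs_dev(values):
    """Mean absolute deviation around the mean (helper for this sketch)."""
    return np.abs(values - values.mean()).mean()

base = np.arange(1.0, 11.0)   # 1, 2, ..., 10
with_outlier = base.copy()
with_outlier[-1] = 100.0      # replace the last point with an extreme value

# Growth factor of each measure after introducing the outlier.
std_ratio = np.std(with_outlier, ddof=1) / np.std(base, ddof=1)
mad_ratio = mean_abs_dev(with_outlier) / mean_abs_dev(base)

print(std_ratio, mad_ratio)   # std grows by a larger factor than mad
```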
quantiles:
Split the data into some number of equal-sized parts
numpy: np.quantile(data, quantiles)
box plot = np.quantile(data, [0, 0.25, 0.5, 0.75, 1])
-> can be shortened using np.linspace: np.quantile(data, np.linspace(0, 1, 5))
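A small check on example data that `np.linspace(0, 1, 5)` yields the five box-plot quantiles (five points give four equal intervals). The `data` array is illustrative.

```python
import numpy as np

# Illustrative data: 1, 2, ..., 9
data = np.arange(1, 10)

# Five evenly spaced points from 0 to 1: 0, 0.25, 0.5, 0.75, 1
quantile_points = np.linspace(0, 1, 5)

# Min, Q1, median, Q3, max
five_numbers = np.quantile(data, quantile_points)

print(quantile_points)
print(five_numbers)   # 1, 3, 5, 7, 9 for this data
```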
IQR
Distance between the .25 quantile and the .75 quantile = height of the box in a box plot
scipy:
from scipy.stats import iqr
iqr(data)
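A sketch comparing `scipy.stats.iqr` with the difference of the two quantiles computed by NumPy, on illustrative data.

```python
import numpy as np
from scipy.stats import iqr

# Illustrative data: 1, 2, ..., 9
data = np.arange(1, 10)

# IQR by hand: .75 quantile minus .25 quantile
q1, q3 = np.quantile(data, [0.25, 0.75])
manual_iqr = q3 - q1

print(manual_iqr)   # 7 - 3 = 4
print(iqr(data))    # same value from SciPy
```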
Outliers
defined as: data < Q1 - 1.5 * IQR or data > Q3 + 1.5 * IQR
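The rule can be applied with a boolean mask. A point cannot be below the lower bound and above the upper bound at once, so the two conditions are combined with "or" (`|` in NumPy). The data and variable names are illustrative.

```python
import numpy as np

# Illustrative data with one extreme value.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 50])

q1, q3 = np.quantile(data, [0.25, 0.75])
iqr_value = q3 - q1

lower = q1 - 1.5 * iqr_value   # 3 - 6 = -3
upper = q3 + 1.5 * iqr_value   # 7 + 6 = 13

# A point is an outlier if it falls below the lower OR above the upper bound.
outlier_mask = (data < lower) | (data > upper)
outliers = data[outlier_mask]

print(outliers)   # only 50 lies outside [-3, 13]
```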