`pd.merge_asof`

Perform a merge by key distance.

This is similar to a left-join except that we match on nearest key rather than equal keys. Both DataFrames must be sorted by the key.
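
A minimal sketch of the matching behaviour, using made-up `trades` and `quotes` frames (the column names and values are assumptions for illustration). By default pandas searches backward, so each left row gets the last right row whose key is at or before its own; `direction="nearest"` picks the closest key on either side.

```python
import pandas as pd

# Hypothetical data, both frames already sorted by the key column "time".
trades = pd.DataFrame({"time": [1, 5, 10], "price": [100.0, 101.0, 102.0]})
quotes = pd.DataFrame({"time": [2, 3, 8], "bid": [99.5, 99.8, 100.9]})

# Default direction="backward": last quote at or before each trade time.
# The trade at time=1 has no earlier quote, so its bid is NaN.
merged = pd.merge_asof(trades, quotes, on="time")

# direction="nearest": closest quote time, earlier or later.
nearest = pd.merge_asof(trades, quotes, on="time", direction="nearest")

print(merged)
print(nearest)
```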

## Basic statistics

### variance

Measures the average squared distance from each point to the mean.

algo:

- for every point take distance to mean
- square distance
- sum
- divide by number of points - 1

numpy: `np.var(data, ddof=1)`

Use `ddof=0` (the numpy default) only for full-population stats; `ddof=1` gives the sample variance.
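
The variance steps can be sketched by hand and checked against numpy. The `data` array is a made-up sample, not from the notes.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

distances = data - data.mean()      # distance of every point to the mean
squared = distances ** 2            # square each distance
total = squared.sum()               # sum the squares
variance = total / (len(data) - 1)  # divide by number of points - 1

# Same result as numpy's sample variance.
assert np.isclose(variance, np.var(data, ddof=1))
print(variance)
```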

### std:

Square root of the variance.

numpy: `np.std(data, ddof=1)`
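
A quick check that the standard deviation is the square root of the variance, on a made-up sample. It also shows how the `ddof` choice changes the result.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

# Sample std: square root of the sample variance (ddof=1).
sample_std = np.sqrt(np.var(data, ddof=1))
assert np.isclose(sample_std, np.std(data, ddof=1))

# Population std: numpy's default ddof=0 divides by n instead of n - 1.
population_std = np.std(data)

print(sample_std, population_std)
```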

### mean absolute deviation

algo:

- take distance for each point to mean
- take absolute value for each distance
- take mean of those distances

std vs mad: std penalizes longer distances more than shorter distances (due to the square), whereas mad penalizes all distances equally
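
A sketch of the mean absolute deviation computed by hand. The helper name `mean_absolute_deviation` and the sample are assumptions for illustration, not a numpy API. The std is computed with `ddof=0` here so that both measures divide by the same n.

```python
import numpy as np

def mean_absolute_deviation(data):
    """Hypothetical helper: mean of the absolute distances to the mean."""
    data = np.asarray(data, dtype=float)
    distances = data - data.mean()   # distance of each point to the mean
    return np.abs(distances).mean()  # absolute value, then mean

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

mad = mean_absolute_deviation(data)
std = np.std(data)  # ddof=0, so std and mad both divide by n

# Squaring gives the far points (2 and 9) more weight, so std exceeds mad here.
print(mad, std)
```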

### quantiles:

Split the data into some number of equal-sized parts.

numpy: `np.quantile(data, quantiles)`
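
A small sketch on made-up data: a single quantile (the median) and the four cut points that split the data into five equal parts. numpy interpolates linearly between data points by default.

```python
import numpy as np

data = np.arange(1, 11)  # made-up sample: 1, 2, ..., 10

# A single quantile: 0.5 is the median.
median = np.quantile(data, 0.5)

# Four cut points split the data into five equal parts (quintiles).
cuts = np.quantile(data, [0.2, 0.4, 0.6, 0.8])

print(median)
print(cuts)
```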

#### box plot

The five values of a box plot = `np.quantile(data, [0, 0.25, 0.5, 0.75, 1])`

-> can be shortened using np.linspace: `np.quantile(data, np.linspace(0, 1, 5))`
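
A quick check on made-up data that the explicit list and the `np.linspace` form agree. Five box-plot values need five evenly spaced points, so the third argument of `np.linspace` is 5.

```python
import numpy as np

data = np.arange(1, 10)  # made-up sample: 1, 2, ..., 9

explicit = np.quantile(data, [0, 0.25, 0.5, 0.75, 1])
spaced = np.quantile(data, np.linspace(0, 1, 5))  # 0, 0.25, 0.5, 0.75, 1

# Both give min, Q1, median, Q3, max.
assert np.allclose(explicit, spaced)
print(spaced)
```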

#### IQR

Distance between the 0.25 quantile and the 0.75 quantile = height of the box in a box plot.

scipy: `from scipy.stats import iqr`, then `iqr(data)`
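
A sketch comparing the scipy call with the same distance computed from numpy quantiles, on a made-up sample. With default settings both interpolate linearly, so they should agree.

```python
import numpy as np
from scipy.stats import iqr

data = np.arange(1, 10)  # made-up sample: 1, 2, ..., 9

# By hand: distance between the 0.25 and 0.75 quantiles.
q1, q3 = np.quantile(data, [0.25, 0.75])
manual_iqr = q3 - q1

# Same value from scipy.
assert np.isclose(manual_iqr, iqr(data))
print(manual_iqr)
```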

#### Outliers

defined as: `data < Q1 - 1.5 * IQR` or `data > Q3 + 1.5 * IQR`
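
A minimal sketch of the rule on a made-up sample with one extreme value. A point counts as an outlier if it falls below the lower fence or above the upper fence, so the two conditions are combined with `|` (or). The variable names are assumptions for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])  # made-up sample, 50 is extreme

q1, q3 = np.quantile(data, [0.25, 0.75])
iqr_value = q3 - q1

lower = q1 - 1.5 * iqr_value  # lower fence
upper = q3 + 1.5 * iqr_value  # upper fence

# Boolean mask: below the lower fence OR above the upper fence.
outliers = data[(data < lower) | (data > upper)]
print(outliers)
```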