course notes

    pd.merge_asof

    Perform a merge by key distance.

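    This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys. By default (direction="backward"), each left row is matched with the last right row whose key is less than or equal to its own. direction="nearest" picks the closest key in either direction. Both DataFrames must be sorted by the key.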
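A small sketch of the default backward match compared with direction="nearest". The trades/quotes frames and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data: trades and quotes keyed by a sorted "time" column
trades = pd.DataFrame({"time": [1, 5, 8], "qty": [100, 200, 300]})
quotes = pd.DataFrame({"time": [0, 4, 9, 11], "price": [10.0, 10.5, 11.0, 11.5]})

# Default direction="backward": last quote with time <= trade time
backward = pd.merge_asof(trades, quotes, on="time")

# direction="nearest": quote whose time is closest to the trade time
nearest = pd.merge_asof(trades, quotes, on="time", direction="nearest")

print(backward)
print(nearest)
```

Like a left join, every row of `trades` is kept. The two results differ only for the trade at time 8: backward takes the quote at time 4, while nearest takes the quote at time 9.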

    Basic statistics

    variance

    Measures the average squared distance from each point to the mean. Algorithm:

    • for every point, take its distance to the mean
    • square the distance
    • sum the squared distances
    • divide by the number of points - 1 (sample variance)

    numpy: np.var(data, ddof=1). Use ddof=0 (numpy's default) only for full-population statistics.
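The steps above can be written out by hand and checked against np.var. The sample data is arbitrary and the helper name `sample_variance` is hypothetical.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

def sample_variance(values):
    mean = values.mean()
    squared = (values - mean) ** 2            # distance to the mean, squared
    return squared.sum() / (len(values) - 1)  # sum, then divide by n - 1

print(sample_variance(data))   # matches np.var(data, ddof=1)
print(np.var(data, ddof=1))    # sample variance
print(np.var(data, ddof=0))    # population variance (divides by n)
```

Here the squared distances sum to 32, so the sample variance is 32 / 7 and the population variance is 32 / 8 = 4.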

    std:

    Square root of the variance. numpy: np.std(data, ddof=1)
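A quick check, on the same arbitrary data as above, that np.std is the square root of np.var when both use the same ddof:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = np.std(data, ddof=1)
# Same ddof on both sides, so std == sqrt(var)
print(np.isclose(sample_std, np.sqrt(np.var(data, ddof=1))))
print(np.std(data, ddof=0))  # population std: sqrt(4.0) = 2.0
```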

    mean absolute deviation

    Algorithm:

    • take the distance from each point to the mean
    • take the absolute value of each distance
    • take the mean of those absolute distances
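The three steps translate directly into numpy. The helper name `mean_absolute_deviation` is hypothetical. scipy's `median_abs_deviation` is a different, median-based statistic, so it is not used here.

```python
import numpy as np

def mean_absolute_deviation(values):
    values = np.asarray(values, dtype=float)
    distances = values - values.mean()  # distance from each point to the mean
    return np.abs(distances).mean()     # absolute value, then mean

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean_absolute_deviation(data))  # 1.5
```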

    std vs MAD: std penalizes large distances more than small ones (because the distances are squared), whereas MAD weights all distances equally.
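To see the difference, add one far-away point and compare how much each measure grows. This is a sketch on made-up data. ddof=0 is used so that both measures divide by n.

```python
import numpy as np

def mad(values):
    values = np.asarray(values, dtype=float)
    return np.abs(values - values.mean()).mean()

base = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
with_outlier = np.append(base, 50.0)  # one extreme value

# How many times larger each measure gets after adding the outlier
std_ratio = np.std(with_outlier, ddof=0) / np.std(base, ddof=0)
mad_ratio = mad(with_outlier) / mad(base)

print("std grows by", std_ratio)
print("MAD grows by", mad_ratio)
```

Because the outlier's distance is squared, std grows by roughly 7x here, while MAD grows by roughly 6x.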

    quantiles:

    Split the data into a number of equal-sized parts. numpy: np.quantile(data, quantiles), where quantiles is a value or list of values between 0 and 1.
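A minimal example with np.quantile's default (linear interpolation) method on a simple sorted sample:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

median = np.quantile(data, 0.5)                    # single quantile
quartiles = np.quantile(data, [0.25, 0.5, 0.75])   # several at once
print(median, quartiles)
```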

    box plot quantiles = np.quantile(data, [0, 0.25, 0.5, 0.75, 1]) -> can be shortened using np.linspace: np.quantile(data, np.linspace(0, 1, 5)). Note that it takes 5 points, not 4, to get 0, 0.25, 0.5, 0.75, 1.
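A check that the linspace shortcut gives the same five quantiles as the explicit list. It also shows why np.linspace(0, 1, 4) does not: that call yields thirds, not quartiles.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

explicit = np.quantile(data, [0, 0.25, 0.5, 0.75, 1])
shortened = np.quantile(data, np.linspace(0, 1, 5))  # 0, 0.25, 0.5, 0.75, 1
thirds = np.linspace(0, 1, 4)                        # 0, 1/3, 2/3, 1

print(explicit, shortened)
print(thirds)
```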

    IQR

    Distance between the 0.25 quantile (Q1) and the 0.75 quantile (Q3), i.e. the height of the box in a box plot. scipy:

    from scipy.stats import iqr
    iqr(data)
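The same value can be computed with numpy alone as Q3 - Q1. This sketch assumes scipy's default settings (the 25th to 75th percentile range, linear interpolation), under which it should agree with `iqr(data)`.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1, q3 = np.quantile(data, [0.25, 0.75])
manual_iqr = q3 - q1  # height of the box in a box plot
print(q1, q3, manual_iqr)
```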

    Outliers

    defined as: data < Q1 - 1.5 * IQR or data > Q3 + 1.5 * IQR (either condition is enough; no point can satisfy both)
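A sketch of the 1.5 * IQR rule with numpy boolean masks on made-up data containing one low and one high outlier. The conditions are combined with `|` (element-wise or). Using `&` would flag nothing, because no point can lie below the lower fence and above the upper fence at the same time.

```python
import numpy as np

data = np.array([-20, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q3 = np.quantile(data, [0.25, 0.75])
iqr_value = q3 - q1
lower = q1 - 1.5 * iqr_value  # lower fence
upper = q3 + 1.5 * iqr_value  # upper fence

# Outlier if below the lower fence OR above the upper fence
is_outlier = (data < lower) | (data > upper)
outliers = data[is_outlier]
print(lower, upper, outliers)
```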