
Introduction to Statistics in Python

Run the hidden code cell below to import the data used in this course.

# Importing numpy and pandas
import numpy as np
import pandas as pd

# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")
happiness.head()

Main topics covered in these notes

Theory

  1. Variance
  2. STD
  3. MAD
  4. Quantiles
  5. IQR
  6. Outliers
  7. Measuring chance

Codes

  1. Variance = np.var(data, ddof=1) OR SUM((VALUE - MEAN)**2) / (N - 1)
  2. STD = np.std(data, ddof=1) OR np.sqrt(variance)
  3. MAD = MEAN(ABS(VALUE - MEAN))
  4. Quantiles = np.quantile(happiness['life_exp'], [0, 0.25, 0.5, 0.75, 1])
  5. Evenly spaced split points => np.linspace(start, stop, num), e.g. np.linspace(0, 1, 5) gives [0, 0.25, 0.5, 0.75, 1]
  6. IQR: iqr1 = np.quantile(happiness['life_exp'], 0.75) - np.quantile(happiness['life_exp'], 0.25) OR from scipy.stats import iqr; iqr2 = iqr(happiness['life_exp'])
  7. Outliers: lower_point = np.quantile(happiness['life_exp'], 0.25) - 1.5 * iqr1; upper_point = np.quantile(happiness['life_exp'], 0.75) + 1.5 * iqr1; outliers = happiness[(happiness['life_exp'] < lower_point) | (happiness['life_exp'] > upper_point)]
  8. Random sample with replacement => happiness['life_exp'].sample(n=5, replace=True) (n defaults to 1)
  9. Count the deals per product => deals['product'].value_counts()
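
The MAD, IQR and outlier entries above can be tried out on a small array. This is a minimal sketch: the `values` array is made-up data standing in for a column such as happiness['life_exp'], and the variable names are only illustrative.

```python
import numpy as np

# Made-up sample values (an assumption, not course data)
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# MAD: mean of the absolute distances from the mean
mean = np.mean(values)
mad = np.mean(np.abs(values - mean))

# IQR: distance between the 75th and 25th percentiles
q1, q3 = np.quantile(values, [0.25, 0.75])
iqr1 = q3 - q1

# Outlier rule: anything beyond 1.5 * IQR from the quartiles
lower_point = q1 - 1.5 * iqr1
upper_point = q3 + 1.5 * iqr1
outliers = values[(values < lower_point) | (values > upper_point)]

print(mad)       # 1.5
print(iqr1)      # 1.5
print(outliers)  # [9.]
```

Here only 9 falls outside the range from 1.75 to 7.75, so it is the single value flagged as an outlier.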

Variance

Variance and standard deviation are two of the most common ways to measure the spread of a variable.

Calculating variance

  1. Find the mean value
  2. Calculate each value's distance from the mean


life_exp_mean = np.mean(happiness['life_exp'])
happiness['dist'] = abs(happiness['life_exp'] - life_exp_mean)
happiness['dist'].head()
  3. Calculate the squared distances
  4. Sum the squared distances
happiness['sq_dis'] = happiness['dist']**2
sum_sq_dis = np.sum(happiness['sq_dis'])
print(sum_sq_dis)
  5. Calculate the variance (divide by n - 1)
n = happiness['sq_dis'].size
variance = sum_sq_dis / (n - 1)
print(variance)

Another method to find Variance

using np.var

variance2 = np.var(happiness['life_exp'], ddof=1)
print(variance2)
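
To check that the two methods above agree, here is a minimal sketch on a small made-up array (an assumption, not the course data). It also shows why ddof=1 matters: np.var defaults to ddof=0, which divides by n instead of n - 1.

```python
import numpy as np

# Made-up sample values (an assumption, not course data)
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Manual method: sum of squared distances from the mean, divided by n - 1
sq_dist = (values - np.mean(values)) ** 2
manual_variance = np.sum(sq_dist) / (values.size - 1)

# Same result with np.var and ddof=1 (sample variance)
numpy_variance = np.var(values, ddof=1)

# Default ddof=0 divides by n (population variance), so it is smaller
population_variance = np.var(values)

print(np.isclose(manual_variance, numpy_variance))  # True
print(population_variance)                          # 4.0
```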

STD (Standard deviation)

STD = square root of (Variance)
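
A minimal sketch of this relationship, again on a small made-up array (an assumption, not the course data): taking the square root of the sample variance should match np.std with ddof=1.

```python
import numpy as np

# Made-up sample values (an assumption, not course data)
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Standard deviation as the square root of the sample variance
std_from_var = np.sqrt(np.var(values, ddof=1))

# Standard deviation computed directly
std_direct = np.std(values, ddof=1)

print(np.isclose(std_from_var, std_direct))  # True
```

Unlike the variance, the standard deviation is in the same units as the original values, which usually makes it easier to interpret.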