Introduction to Statistics in Python
Run the hidden code cell below to import the data used in this course.
# Importing numpy and pandas
import numpy as np
import pandas as pd
# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")
happiness.head()
Main chapters covered in these notes
Theory
1. Variance 2. STD 3. MAD 4. Quantiles 5. IQR 6. Outlier 7. Measuring Chance
Codes
- Variance = np.var(dataframe, ddof=1) OR SUM((VALUE - MEAN)**2) / (N - 1)
- std = np.std(dataframe, ddof=1) OR np.sqrt(Variance)
- MAD = MEAN(ABS(VALUE - MEAN))
- Quantile = np.quantile(happiness['life_exp'],[0,0.25,0.5,0.75,1])
- Evenly spaced split points => np.linspace(start, stop, num)
- IQR: iqr1 = np.quantile(happiness['life_exp'], 0.75) - np.quantile(happiness['life_exp'], 0.25); OR from scipy.stats import iqr ; iqr2 = iqr(happiness['life_exp'])
- lower_point = np.quantile(happiness['life_exp'], 0.25) - 1.5 * iqr1 ; upper_point = np.quantile(happiness['life_exp'], 0.75) + 1.5 * iqr1 ; Outlier = happiness[(happiness['life_exp'] < lower_point) | (happiness['life_exp'] > upper_point)]
- Random sample (one value by default) = happiness['life_exp'].sample(replace=True)
- Count the values of product by group => deals['product'].value_counts()
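The formulas in the list above can be tried on a small made-up Series. This is only a sketch: `values` and its numbers are illustrative assumptions, not taken from the course datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the course datasets); 30.0 is a deliberate outlier
values = pd.Series([2.0, 4.0, 4.0, 5.0, 7.0, 30.0])

mean = values.mean()

# Sample variance: sum of squared distances from the mean, divided by n - 1
variance = ((values - mean) ** 2).sum() / (len(values) - 1)

# Standard deviation is the square root of the variance
std = np.sqrt(variance)

# Mean absolute deviation: average of the absolute distances from the mean
mad = (values - mean).abs().mean()

# Quartiles: np.linspace(0, 1, 5) gives the split points 0, 0.25, 0.5, 0.75, 1
quartiles = np.quantile(values, np.linspace(0, 1, 5))

# IQR and the 1.5 * IQR rule for outliers
q1, q3 = np.quantile(values, [0.25, 0.75])
iqr1 = q3 - q1
lower_point = q1 - 1.5 * iqr1
upper_point = q3 + 1.5 * iqr1
outliers = values[(values < lower_point) | (values > upper_point)]

print(variance, std, mad)
print(quartiles)
print(outliers)
```

With this toy data, only the value 30.0 falls outside the interval from `lower_point` to `upper_point`.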
Variance
Variance and standard deviation are two of the most common ways to measure the spread of a variable.
Calculating variance
- Find the mean value
- Calculate the distance from the mean
life_exp_mean = np.mean(happiness['life_exp'])
happiness['dist'] = abs(happiness['life_exp'] - life_exp_mean)
happiness['dist'].head()
- Calculate the squared distance
- Sum the squared distances
happiness['sq_dis'] = happiness['dist']**2
sum_sq_dis = np.sum(happiness['sq_dis'])
print(sum_sq_dis)
- Calculate the variance (divide by n - 1)
n = happiness['sq_dis'].size
variance = sum_sq_dis / (n - 1)
print(variance)
Another method to find the variance
Using np.var
variance2 = np.var(happiness['life_exp'], ddof=1)
print(variance2)
STD (Standard deviation)
STD = square root of the variance
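This relationship can be checked numerically. The sketch below uses a small made-up array (`life_exp_toy` is a hypothetical name, not the course data) and compares the square root of the sample variance with `np.std` called with `ddof=1`.

```python
import numpy as np

# Hypothetical toy values (not from the course datasets)
life_exp_toy = np.array([50.0, 60.0, 70.0, 80.0])

# Sample variance (ddof=1 divides by n - 1)
variance = np.var(life_exp_toy, ddof=1)

# Standard deviation computed two ways
std_from_var = np.sqrt(variance)
std_direct = np.std(life_exp_toy, ddof=1)

# np.std defaults to ddof=0 (divides by n), which gives a smaller value
std_population = np.std(life_exp_toy)

print(std_from_var, std_direct, std_population)
```

The first two printed values agree, while the third is smaller because the default `ddof=0` divides by n instead of n - 1.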