Introduction to Statistics in Python

Run the hidden code cell below to import the data used in this course.

# Importing numpy and pandas
import numpy as np
import pandas as pd

# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")

Summary Statistics

Definition

The field of statistics: the practice and study of collecting and analyzing data
A summary statistic: a fact about or summary of some data

Types of Statistics

Descriptive statistics

Describe and summarize data
- 50% of friends drive to work
- 25% take the bus
- 25% take the bike

Inferential statistics

Use a sample of data to make inferences about a larger population
- What percent of all people drive to work?
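
As a minimal sketch of the descriptive case, the shares above can be computed with pandas. The `commute` Series is made-up data chosen to reproduce the 50/25/25 split; it is not part of the course datasets.

```python
import pandas as pd

# Hypothetical survey of how eight friends get to work (made-up data)
commute = pd.Series(
    ["drive", "drive", "bus", "bike", "drive", "bus", "drive", "bike"],
    name="commute",
)

# Descriptive statistics: summarize the data we actually collected
shares = commute.value_counts(normalize=True)
print(shares)
```

Using these shares to estimate how all people commute would be an inferential step, and how trustworthy that estimate is depends on how the friends were sampled.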

Types of Data

Numeric

Continuous (measured)
- Airplane speed
- Time spent waiting in line
Discrete (counted)
- Number of pets
- Number of packages shipped

Categorical

Nominal (Unordered)
- Married/unmarried
- Country of residence
Ordinal (Ordered)
- Strongly agree to strongly disagree (rated 1-5)
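
One way these data types might be represented in pandas is sketched below. The DataFrame, its column names, and the `levels` list are hypothetical and only illustrate the distinction; an ordered categorical is one possible encoding of ordinal data, not the only one.

```python
import pandas as pd

# Hypothetical records illustrating each data type (made-up values)
df = pd.DataFrame({
    "wait_minutes": [3.5, 12.25, 7.0],                      # numeric, continuous
    "num_pets": [0, 2, 1],                                  # numeric, discrete
    "country": ["Chile", "Japan", "Kenya"],                 # categorical, nominal
    "agreement": ["agree", "strongly agree", "disagree"],   # categorical, ordinal
})

# Nominal: categories with no inherent order
df["country"] = df["country"].astype("category")

# Ordinal: categories with an explicit order, lowest to highest
levels = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
df["agreement"] = pd.Categorical(df["agreement"], categories=levels, ordered=True)

print(df.dtypes)
# A minimum is meaningful here only because the categories are ordered
print(df["agreement"].min())
```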

Measures of Center

Where is the center of the data?

Mean
Median
Mode
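
Before turning to the course data, the three measures can be sketched on a tiny made-up list using Python's standard `statistics` module. The `values` list is hypothetical and chosen so the results are easy to check by hand.

```python
import statistics

# Made-up values: four small numbers and one large one
values = [1, 2, 2, 3, 12]

mean = statistics.mean(values)      # sum divided by count: 20 / 5
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Here the single large value pulls the mean up to 4, while the median and mode both stay at 2.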

# Mode

food['country'].value_counts().head(5)

import statistics
statistics.mode(food["country"])

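# Median and mean
# Option 1: filter first, then call the NumPy functions
argentina = food[food["country"] == "Argentina"]["consumption"]
print("median:", np.median(argentina))
print("mean:", np.mean(argentina))

# Option 2: compute both in one call with .agg()
# (string names avoid the deprecation warning newer pandas raises for np.mean)
food[food["country"] == "Argentina"]["consumption"].agg(["mean", "median"])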

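If the data is skewed, the mean is pulled toward the tail of the distribution, in a direction that depends on the type of skew.

Right-skewed: the tail extends to the right, so the mean is pulled above the median
Left-skewed: the tail extends to the left, so the mean is pulled below the median

For skewed data, it is recommended to use the median, because it is less affected by extreme values.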
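
The effect of skew on the mean and median can be sketched with two small made-up arrays. Both `right_skewed` and `left_skewed` are hypothetical values, not taken from the course datasets.

```python
import numpy as np

# Made-up right-skewed data: mostly small values with one very large value
right_skewed = np.array([1, 2, 2, 3, 3, 4, 50])

# Made-up left-skewed data: mostly large values with one very small value
left_skewed = np.array([1, 47, 48, 48, 49, 49, 50])

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean:", np.mean(data), "median:", np.median(data))
```

In both arrays the median stays close to the bulk of the values, while the mean moves toward the extreme value.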

Measures of Spread

Variance
Standard deviation
Quartiles
IQR (Q3-Q1)
Outliers

# to calculate the variance
# use np.var()
np.var(food["consumption"], ddof=1)

# ddof=1 gives the sample variance (divides by n - 1)
# without ddof=1 (the default is ddof=0), np.var gives the population variance (divides by n)
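
To make the role of `ddof` concrete, the sketch below computes both variances by hand on a small made-up array and compares them with `np.var`. The array `x` is hypothetical.

```python
import numpy as np

# Made-up values with mean 4
x = np.array([2.0, 4.0, 4.0, 6.0])
n = x.size

# Squared distance of each value from the mean
squared_dev = (x - x.mean()) ** 2

pop_var = squared_dev.sum() / n           # population variance: divide by n
sample_var = squared_dev.sum() / (n - 1)  # sample variance: divide by n - 1

print("population:", pop_var, "np.var:", np.var(x))
print("sample:", sample_var, "np.var ddof=1:", np.var(x, ddof=1))
```

Note that the pandas method `Series.var()` defaults to `ddof=1`, the opposite of NumPy's default, so it is safest to pass `ddof` explicitly.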

# to calculate the standard deviation
# use np.std()

np.std(food["consumption"], ddof=1)

# Remember that the standard deviation is the square root of the variance
np.sqrt(np.var(food["consumption"], ddof=1))
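
The remaining measures in the list (quartiles, IQR, and outliers) could be computed as sketched below. The `data` array is made up, and the 1.5 × IQR cutoff is a common convention for flagging outliers that is assumed here rather than taken from the notes above.

```python
import numpy as np

# Made-up values with one unusually large observation
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])

# Quartiles split the sorted data into four equal parts
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# IQR is the distance between the third and first quartiles
iqr = q3 - q1

# Assumed rule of thumb: flag values more than 1.5 * IQR outside the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("outliers:", outliers)
```

The same steps should apply to a column such as `food["consumption"]` by passing it in place of `data`.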

