
Introduction to Statistics in Python

Run the hidden code cell below to import the data used in this course.

# Importing numpy and pandas
import numpy as np
import pandas as pd

# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")

Take Notes


Calculating probabilities

P(event) = (# ways the event can happen) / (total # of possible outcomes), assuming every outcome is equally likely

#Example
# Count the deals for each product
counts = deals['product'].value_counts()

# Calculate probability of picking a deal with each product
probs = counts / counts.sum()
print(probs)
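The counting formula can be checked on a small, self-contained table. The `toy_deals` DataFrame and its product labels are hypothetical stand-ins for `amir_deals.csv`, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the course's deals dataset
toy_deals = pd.DataFrame({"product": ["A", "A", "B", "C", "A", "B", "C", "C", "A", "B"]})

# Number of ways each product can be picked (its count)
counts = toy_deals["product"].value_counts()

# P(product) = count / total number of deals
probs = counts / counts.sum()
print(probs)

# value_counts(normalize=True) computes the same proportions in one step
probs_direct = toy_deals["product"].value_counts(normalize=True)
```

Here product A appears in 4 of the 10 toy deals, so its probability is 4 / 10 = 0.4, and the probabilities across all products sum to 1.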

Sampling Deals

np.random.seed(seed)  # set an integer seed so random results can be reproduced

df.sample(n, replace=False)  # draw n random rows from df; use replace=True to sample with replacement
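A quick way to see what the seed does: re-seeding with the same value before each call gives the same sample. The toy DataFrame and the seed value 42 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: ten deals with ids 0-9
toy = pd.DataFrame({"deal_id": range(10), "amount": [100 * i for i in range(10)]})

# With no random_state argument, DataFrame.sample draws from NumPy's global
# random state, so np.random.seed makes the draws repeatable
np.random.seed(42)
first = toy.sample(3)

np.random.seed(42)
second = toy.sample(3)

print(first.equals(second))  # True: same seed, same rows

# Alternative: pass random_state directly instead of seeding global state
third = toy.sample(3, random_state=42)
```

Passing `random_state` keeps the seeding local to one call, which can be easier to reason about than the global `np.random.seed`.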

Random sampling with or without replacement

with replacement - each sampled item is returned to the dataset, so it can be picked again on the next draw

without replacement - each sampled item is removed from the dataset, so it has a 0% chance of being picked again on later draws

sampling with replacement produces independent draws, while sampling without replacement produces dependent draws, since each draw changes the probabilities for the next one
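The dependence can be worked out exactly for a small pool. Assume (hypothetically) five deals, two for product A and three for product B, and compute the chance that the second draw is A given that the first draw was A.

```python
from fractions import Fraction

n_a, n_b = 2, 3  # hypothetical pool: 2 deals for product A, 3 for product B
total = n_a + n_b

# First draw: identical under both schemes
p_first_a = Fraction(n_a, total)  # 2/5

# With replacement: the A deal goes back, so the pool is unchanged
p_second_a_with = Fraction(n_a, total)  # 2/5, unaffected by the first draw

# Without replacement: one A deal is gone, leaving 1 A among 4 deals
p_second_a_without = Fraction(n_a - 1, total - 1)  # 1/4, depends on the first draw

print(p_first_a, p_second_a_with, p_second_a_without)  # 2/5 2/5 1/4
```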

#Example

# Sample 5 deals without replacement
sample_without_replacement = deals.sample(5)
print(sample_without_replacement)
      
# Sample 5 deals with replacement
sample_with_replacement = deals.sample(5, replace=True)
print(sample_with_replacement)
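One practical consequence: without replacement you cannot draw more rows than the DataFrame holds, while with replacement you can, and some rows must then repeat. The three-row toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical three-row DataFrame
toy = pd.DataFrame({"product": ["A", "B", "C"]})

# With replacement: 6 draws from 3 rows, so at least one row must appear twice
big_sample = toy.sample(6, replace=True, random_state=0)
print(big_sample.index.duplicated().any())  # True: 6 draws but only 3 distinct rows

# Without replacement (the default), asking for more rows than exist raises ValueError
try:
    toy.sample(6)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```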

Discrete distributions

describe the probabilities of discrete (countable) outcomes, such as the roll of a die

probability distribution

describes the probability of each possible outcome in a scenario
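As an illustration (not from the course data), the probability distribution of the total of two fair six-sided dice can be built by counting, using the same P(event) formula from "Calculating probabilities": each total's probability is its number of ways divided by the 36 equally likely outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each total
sum_counts = Counter(a + b for a, b in outcomes)

# P(total) = (# ways to get that total) / (total # of outcomes)
distribution = {
    total: Fraction(count, len(outcomes))
    for total, count in sorted(sum_counts.items())
}

for total, prob in distribution.items():
    print(total, prob)
```

The most likely total is 7 (6 ways out of 36, i.e. 1/6), and the probabilities over all eleven possible totals sum to 1.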

expected value

the mean of a probability distribution. We calculate it by multiplying each possible value by its probability and summing the products.
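A minimal sketch for one fair six-sided die, laid out as a pandas DataFrame of values and probabilities (the die is an illustrative example, not course data):

```python
import numpy as np
import pandas as pd

# Probability distribution of one fair six-sided die
die = pd.DataFrame({"number": [1, 2, 3, 4, 5, 6], "prob": [1 / 6] * 6})

# Expected value = sum of (value * probability) over all outcomes
expected_value = np.sum(die["number"] * die["prob"])
print(expected_value)
```

The result should equal 21/6 = 3.5, up to floating-point rounding. Note that 3.5 is not itself a possible roll; the expected value is a long-run average.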

Law of large numbers

as the sample size increases, the sample mean approaches the expected value
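A short simulation makes this concrete. It assumes a fair six-sided die (expected value 3.5) and an arbitrary seed, and compares the means of increasingly large prefixes of the same simulated rolls.

```python
import numpy as np

rng = np.random.default_rng(seed=123)  # seed chosen arbitrarily for reproducibility

# Simulate rolls of a fair die; rng.integers excludes the upper bound, so values are 1-6
rolls = rng.integers(1, 7, size=100_000)

# Sample means from increasingly large prefixes of the simulated rolls
sample_sizes = [10, 100, 1_000, 10_000, 100_000]
sample_means = {n: rolls[:n].mean() for n in sample_sizes}

for n, mean in sample_means.items():
    print(f"n = {n:>7,}: sample mean = {mean:.3f}")  # expected value is 3.5
```

The means tend to move closer to 3.5 as n grows, though not necessarily at every step; small samples can land far from the expected value.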