Introduction to Statistics in Python
Run the hidden code cell below to import the data used in this course.
# Importing numpy and pandas
import numpy as np
import pandas as pd
# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Calculating probabilities
P(event) = (# of ways the event can happen) / (total # of possible outcomes)
# Example
# Count the deals for each product
counts = deals['product'].value_counts()
# Calculate probability of picking a deal with each product
probs = counts / counts.sum()
print(probs)
Sampling Deals
np.random.seed(seed)  # seed: any integer; fixing it makes the random draws reproducible
df.sample(n, replace=False)  # draw n random rows from df; use replace=True to sample with replacement
Random sampling with or without replacement
With replacement: each sampled row is returned to the dataset, so it can be picked again on a later draw.
Without replacement: each sampled row is removed from the dataset, so it has a 0% chance of being picked again.
Draws with replacement are independent events, because the probabilities are the same on every draw. Draws without replacement are dependent events, because each draw changes the probabilities for the next one.
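To make the dependence concrete, here is a minimal sketch using a hypothetical list of four names (not part of the course datasets). It compares the chance of picking "Amir" on the second draw, given that "Brian" was picked first, under each sampling scheme.

```python
from fractions import Fraction

# Hypothetical example: four names in a hat (not course data)
names = ["Amir", "Brian", "Claire", "Damian"]

# Probability that Amir is picked on the first draw
p_first = Fraction(1, len(names))

# Second draw, given that Brian was picked first:
# with replacement, Brian goes back, so all 4 names remain
p_second_with = Fraction(1, len(names))
# without replacement, Brian is removed, so only 3 names remain
p_second_without = Fraction(1, len(names) - 1)

print(p_first, p_second_with, p_second_without)
```

With replacement the probability is unchanged between draws. Without replacement it rises from 1/4 to 1/3, which is what makes the draws dependent.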
# Example
# Sample 5 deals without replacement
sample_without_replacement = deals.sample(5)
print(sample_without_replacement)
# Sample 5 deals with replacement
sample_with_replacement = deals.sample(5, replace=True)
print(sample_with_replacement)
Discrete distributions
Discrete distributions describe the probabilities of discrete (countable) outcomes.
Probability distribution
Describes the probability of each possible outcome in a scenario.
Expected value
The mean of a probability distribution. It is calculated by multiplying each value by its probability and summing the results.
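As a rough illustration of the expected value calculation, the sketch below uses a fair six-sided die. The `die` DataFrame and its column names are assumptions made for this example and are not part of the course datasets.

```python
import pandas as pd

# Hypothetical distribution: a fair six-sided die
die = pd.DataFrame({
    "value": [1, 2, 3, 4, 5, 6],
    "prob": [1 / 6] * 6,
})

# Expected value: multiply each value by its probability, then sum
expected_value = (die["value"] * die["prob"]).sum()
print(expected_value)
```

The result is approximately 3.5, a value the die can never actually show. The expected value is a long-run average, not a possible outcome.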
Law of large numbers
As the size of your sample increases, the sample mean approaches the expected value.
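The law of large numbers can be checked with a small simulation. This sketch assumes a fair six-sided die (expected value 3.5) and an arbitrary seed of 0. The sample sizes are chosen only for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the simulation is reproducible

# Roll a fair six-sided die n times and record the sample mean
sample_means = {}
for n in (10, 1_000, 100_000):
    rolls = np.random.randint(1, 7, size=n)  # upper bound 7 is exclusive
    sample_means[n] = rolls.mean()
    print(n, sample_means[n])
```

The mean of the largest sample should land very close to 3.5, while the mean of the 10-roll sample can be noticeably off.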