Introduction to Statistics in Python
Run the hidden code cell below to import the data used in this course.
# Importing numpy and pandas
import numpy as np
import pandas as pd
# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Useful modules for statistics in Python
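- scipy.stats.uniform # continuous uniform distribution
  - uniform.rvs(start, stop - start, size=n) # n random samples from [start, stop]; the second argument is the width (scale), not the endpoint
  - uniform.cdf(7, 0, 12) # P(X <= 7) on the interval [0, 12]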
- scipy.stats.binom # binomial distribution
  - binom.rvs(num_coins, prob_success, size=num_trials) # random samples
  - binom.pmf(num_heads, num_coins, prob_heads) # P(X = num_heads)
  - binom.cdf(num_heads, num_coins, prob_heads) # P(X <= num_heads)
- Normal distribution
  - scipy.stats.norm
  - norm.cdf(value_of_interest, mean, std) # P(X <= value_of_interest)
  - Percentile:
    - norm.ppf(percent, mean, std)
  - To generate n random heights:
    - norm.rvs(mean, std, size=n)
- Poisson distribution
  - scipy.stats.poisson
  - If the average number of adoptions per week is 8, P(X = 5):
    - poisson.pmf(5, 8)
  - P(X <= 5):
    - poisson.cdf(5, 8)
- Exponential distribution
  - scipy.stats.expon
  - P(wait < 1 min): expon.cdf(1, scale=2)
    - scale = 1 / lambda; here lambda = 0.5, so scale = 2
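The calls in the notes above can be tried together in one cell. This is a minimal sketch, not part of the course material: the coin-flip counts and the height parameters (mean 161, standard deviation 7) are made-up illustration values, while the Poisson and exponential numbers come from the notes. Closed-form values are computed alongside a few results as a sanity check.

```python
import math

from scipy.stats import binom, expon, norm, poisson, uniform

# Uniform on [0, 12]: loc is the lower bound, scale is the width of the interval.
p_uniform_le_7 = uniform.cdf(7, 0, 12)                 # P(X <= 7) = 7 / 12
uniform_samples = uniform.rvs(0, 12, size=5, random_state=0)

# Binomial: 10 fair coin flips (illustrative values, not from the notes).
p_exactly_7_heads = binom.pmf(7, 10, 0.5)              # P(X = 7)
p_at_most_7_heads = binom.cdf(7, 10, 0.5)              # P(X <= 7)

# Normal: hypothetical heights with mean 161 and standard deviation 7.
p_shorter_than_154 = norm.cdf(154, 161, 7)             # P(X <= 154)
height_90th_percentile = norm.ppf(0.9, 161, 7)         # value below which 90% fall
random_heights = norm.rvs(161, 7, size=10, random_state=0)

# Poisson: an average of 8 adoptions per week.
p_five_adoptions = poisson.pmf(5, 8)                   # P(X = 5)
p_at_most_five = poisson.cdf(5, 8)                     # P(X <= 5)
poisson_closed_form = math.exp(-8) * 8**5 / math.factorial(5)

# Exponential: rate lambda = 0.5, so scale = 1 / lambda = 2.
p_wait_under_1 = expon.cdf(1, scale=2)                 # P(wait < 1)
expon_closed_form = 1 - math.exp(-0.5 * 1)

print(p_uniform_le_7, p_exactly_7_heads, p_at_most_7_heads)
print(p_shorter_than_154, height_90th_percentile)
print(p_five_adoptions, poisson_closed_form)
print(p_wait_under_1, expon_closed_form)
```

Passing `random_state` makes the random samples reproducible; leave it out to get different draws on each run.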
Seaborn (import seaborn as sns): sns.scatterplot(x="x_col", y="y_col", data=df)
To add a linear trendline: sns.lmplot(x="x_col", y="y_col", data=df, ci=None)
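A short sketch of the two plotting calls, assuming seaborn and matplotlib are installed. The DataFrame `df` and its columns `x_col` and `y_col` are hypothetical stand-ins; with the course data you would pass one of the loaded DataFrames and two of its numeric column names instead.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data, used only to make the example self-contained.
df = pd.DataFrame({
    "x_col": [1, 2, 3, 4, 5],
    "y_col": [2.1, 3.9, 6.2, 7.8, 10.1],
})

# Plain scatter plot; x and y are passed by keyword as column names.
ax = sns.scatterplot(x="x_col", y="y_col", data=df)

# Scatter plot with a fitted linear trendline; ci=None hides the confidence band.
grid = sns.lmplot(x="x_col", y="y_col", data=df, ci=None)
```

`scatterplot` draws on a matplotlib Axes and returns it, while `lmplot` creates its own figure and returns a `FacetGrid`, so the two calls produce separate plots.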