
Introduction to Statistics in Python

Run the hidden code cell below to import the data used in this course.

# Importing numpy and pandas
import numpy as np
import pandas as pd

# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Useful modules for statistics in Python

  1. scipy.stats.uniform # uniform distribution
  • uniform.rvs(loc, scale, size=n) # n random samples from [loc, loc + scale]; e.g. uniform.rvs(0, 5, size=10) gives 10 values between 0 and 5
  • uniform.cdf(7, 0, 12) # P(X <= 7) for X uniform on [0, 12]
  2. scipy.stats.binom # binomial distribution
  • binom.rvs(n_coins, p_success, size=n_repeats) # each value is the number of successes among n_coins flips
  • binom.pmf(n_heads, n_flips, p_heads) # P(X = n_heads)
  • binom.cdf(n_heads, n_flips, p_heads) # P(X <= n_heads)
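  3. Normal distribution
  • scipy.stats.norm
  • norm.cdf(value_of_interest, mean, std) # P(X <= value_of_interest)
  • Percentile:
  • norm.ppf(proportion, mean, std) # proportion is between 0 and 1, e.g. 0.9 for the 90th percentile
  • To generate n random heights:
  • norm.rvs(mean, std, size=n)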
  4. Poisson distribution
  • scipy.stats.poisson
  • If the average number of adoptions per week is 8, P(X = 5):
  • poisson.pmf(5, 8)
  • P(X <= 5):
  • poisson.cdf(5, 8)
  5. Exponential distribution
  • scipy.stats.expon
  • P(wait < 1 min): expon.cdf(1, scale=2)
  • scale = 1 / lambda, so a rate of lambda = 0.5 per minute gives scale = 2
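A minimal sketch of the uniform-distribution notes, assuming a hypothetical wait time that is uniform between 0 and 12 minutes. It shows that scipy parameterizes the uniform by `loc` (lower bound) and `scale` (width), so the second argument is only the upper bound when the lower bound is 0. The variable names are illustrative.

```python
from scipy.stats import uniform

# Hypothetical example: a wait time uniform between 0 and 12 minutes
lower, upper = 0, 12
width = upper - lower  # scipy's `scale` is the width of the interval

# P(X <= 7): equals (7 - 0) / 12 for this distribution
p_7_or_less = uniform.cdf(7, loc=lower, scale=width)

# P(X > 7) is the complement
p_more_than_7 = 1 - p_7_or_less

# P(4 <= X <= 7): difference of two CDF values
p_between_4_and_7 = uniform.cdf(7, loc=lower, scale=width) - uniform.cdf(4, loc=lower, scale=width)

# 10 random values between 0 and 12
samples = uniform.rvs(loc=lower, scale=width, size=10)

# With a non-zero lower bound, scale is still the width, not the upper bound:
# values between 5 and 15 need loc=5, scale=10
shifted = uniform.rvs(loc=5, scale=10, size=10)

print(p_7_or_less, p_more_than_7, p_between_4_and_7)
```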
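A sketch of the binomial notes, assuming a hypothetical experiment of 10 flips of a fair coin. It checks `binom.pmf` against the formula C(n, k) p^k (1 - p)^(n - k) and uses `binom.cdf` for "k or fewer" probabilities; `random_state` is set only to make the simulation repeatable.

```python
import math
from scipy.stats import binom

n_flips, p_heads = 10, 0.5  # hypothetical: 10 flips of a fair coin

# Repeat the experiment 8 times; each entry is the number of heads in 10 flips
heads_per_run = binom.rvs(n_flips, p_heads, size=8, random_state=42)

# P(X = 7): exactly 7 heads
p_exactly_7 = binom.pmf(7, n_flips, p_heads)

# Same value from the formula C(n, k) * p^k * (1 - p)^(n - k)
p_exactly_7_manual = math.comb(n_flips, 7) * p_heads**7 * (1 - p_heads) ** (n_flips - 7)

# P(X <= 7) and its complement P(X > 7)
p_7_or_fewer = binom.cdf(7, n_flips, p_heads)
p_more_than_7 = 1 - p_7_or_fewer

print(heads_per_run, p_exactly_7, p_7_or_fewer, p_more_than_7)
```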
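A sketch of the normal-distribution notes, assuming hypothetical heights with mean 161 cm and standard deviation 7 cm. Note that `norm.ppf` takes a proportion between 0 and 1, and that "the value only 10% exceed" is the same as the 90th percentile.

```python
from scipy.stats import norm

# Hypothetical heights: mean 161 cm, standard deviation 7 cm
mean, std = 161, 7

# P(height <= 154): 154 is one standard deviation below the mean
p_154_or_less = norm.cdf(154, mean, std)

# P(height > 154)
p_more_than_154 = 1 - norm.cdf(154, mean, std)

# 90th percentile: the height below which 90% of people fall (pass 0.9, not 90)
height_90th = norm.ppf(0.9, mean, std)

# Height that only 10% of people exceed: the same point, ppf(1 - 0.1)
height_top_10 = norm.ppf(1 - 0.1, mean, std)

# 10 simulated heights
heights = norm.rvs(mean, std, size=10, random_state=0)

print(p_154_or_less, height_90th, heights)
```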
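A sketch of the Poisson notes, using the average of 8 adoptions per week. It compares `poisson.pmf` with the formula lambda^k e^(-lambda) / k! and adds the complement P(X > 5) and a short simulation; the simulation size and seed are arbitrary choices.

```python
import math
from scipy.stats import poisson

lam = 8  # average number of adoptions per week

# P(X = 5): exactly 5 adoptions in a week
p_exactly_5 = poisson.pmf(5, lam)

# Same value from the formula lambda^k * e^(-lambda) / k!
p_exactly_5_manual = lam**5 * math.exp(-lam) / math.factorial(5)

# P(X <= 5) and P(X > 5)
p_5_or_fewer = poisson.cdf(5, lam)
p_more_than_5 = 1 - p_5_or_fewer

# 10 simulated weekly adoption counts
weekly_counts = poisson.rvs(lam, size=10, random_state=0)

print(p_exactly_5, p_5_or_fewer, p_more_than_5, weekly_counts)
```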
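A sketch of the exponential notes, assuming the rate lambda = 0.5 is measured per minute. scipy's `expon` takes `scale = 1 / lambda` (the mean wait), and its CDF matches the closed form 1 - e^(-lambda x).

```python
import math
from scipy.stats import expon

lam = 0.5        # rate: 0.5 events per minute on average (assumed units)
scale = 1 / lam  # scipy's scale is the mean wait time: 2 minutes

# P(wait < 1 min)
p_less_than_1 = expon.cdf(1, scale=scale)

# Closed form of the CDF: 1 - e^(-lambda * x)
p_less_than_1_manual = 1 - math.exp(-lam * 1)

# P(wait > 4 min) and P(1 min < wait < 4 min)
p_more_than_4 = 1 - expon.cdf(4, scale=scale)
p_between_1_and_4 = expon.cdf(4, scale=scale) - expon.cdf(1, scale=scale)

print(p_less_than_1, p_more_than_4, p_between_1_and_4)
```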

Seaborn (import seaborn as sns): sns.scatterplot(x="x_column", y="y_column", data=df)

To add a linear trendline: sns.lmplot(x="x_column", y="y_column", data=df, ci=None) # ci=None removes the confidence-interval band
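A sketch of the two seaborn calls on a small hypothetical DataFrame (columns `x` and `y`, with y exactly 2x + 1). Passing `x`, `y` and `data` as keywords works across seaborn versions. The trendline `lmplot` draws is an ordinary least-squares fit, so `np.polyfit` with degree 1 is used here to read off its slope and intercept as numbers; the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: y is exactly 2 * x + 1
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 5, 7, 9, 11]})

# Scatter plot of y against x (column names as strings, DataFrame as data)
ax = sns.scatterplot(x="x", y="y", data=df)

# Scatter plot plus a least-squares trendline; ci=None hides the confidence band
grid = sns.lmplot(x="x", y="y", data=df, ci=None)

# Slope and intercept of the same least-squares line, as numbers
slope, intercept = np.polyfit(df["x"], df["y"], 1)
print(slope, intercept)

plt.close("all")
```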