Sampling in Python

Run the hidden code cell below to import the data used in this course.

# Import pandas, NumPy, and Matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the course datasets as DataFrames
attrition = pd.read_feather("datasets/attrition.feather")
spotify = pd.read_feather("datasets/spotify_2000_2020.feather")
coffee = pd.read_feather("datasets/coffee_ratings_full.feather")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

# Add your code snippets here
coffee.head()
coffee['country_of_origin'].value_counts()
# Countries of origin to keep ('Costa Rica' must be a single string)
top_countries = ['Mexico', 'Colombia', 'Guatemala', 'Brazil', 'Taiwan', 'United States (Hawaii)', 'Honduras', 'Costa Rica', 'Ethiopia', 'Tanzania']
coffee_top = coffee[coffee['country_of_origin'].isin(top_countries)]
coffee_top1 = coffee_top[['country_of_origin', 'flavor', 'aftertaste', 'acidity']]
# Simple random sample of one third of the rows
coffee_samp = coffee_top1.sample(frac=1/3, random_state=2021)
# Compare country proportions in the sample with those in the filtered data
coffee_samp['country_of_origin'].value_counts(normalize=True), coffee_top['country_of_origin'].value_counts(normalize=True)
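# Proportional stratified sample: 10% of the rows from each country
coffee_strat = coffee_top.groupby('country_of_origin').sample(frac=0.1, random_state=2021)
# Equal-count stratified sample: 15 rows from each country
coffee_strats = coffee_top.groupby('country_of_origin').sample(n=15, random_state=2021)
# Compare the country proportions of the two stratified samples
coffee_strat['country_of_origin'].value_counts(normalize=True), coffee_strats['country_of_origin'].value_counts(normalize=True)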

2 hidden cells
# Standard error: sample standard deviation of the bootstrap distribution of the mean
standard_error = round(np.std(bstrap_mean_aftertaste, ddof=1), 5)

Assume a hypothesized mean aftertaste score of 7.40.

z = (sample statistic − hypothesized parameter value) / standard error

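# Sample mean of aftertaste, left unrounded so rounding error does not carry into z
samp_mean = coffee_samp['aftertaste'].mean()
# z-score of the sample mean against the hypothesized mean of 7.40
z = (samp_mean - 7.40) / standard_error
z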
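
The cells that build bstrap_mean_aftertaste are not shown in these notes, so here is a minimal, self-contained sketch of how a bootstrap distribution of the mean, its standard error, and the z-score fit together. It uses synthetic numbers as a stand-in for coffee_samp['aftertaste']. The names aftertaste, bstrap_means, and n_resamples, and the choice of 2000 resamples, are assumptions made for illustration and may not match the hidden cells.

```python
import numpy as np

# Seeded generator so the sketch is reproducible
rng = np.random.default_rng(2021)

# Hypothetical stand-in for coffee_samp['aftertaste'] (synthetic, not course data)
aftertaste = rng.normal(loc=7.4, scale=0.35, size=300)

# Bootstrap distribution: resample with replacement at the original
# sample size and record the mean of each resample
n_resamples = 2000  # assumed number of resamples
bstrap_means = np.array([
    rng.choice(aftertaste, size=aftertaste.size, replace=True).mean()
    for _ in range(n_resamples)
])

# Standard error = sample standard deviation of the bootstrap means
standard_error = np.std(bstrap_means, ddof=1)

# z = (sample statistic - hypothesized parameter value) / standard error
hypothesized_mean = 7.40
z = (aftertaste.mean() - hypothesized_mean) / standard_error
print(f"standard error: {standard_error:.5f}, z: {z:.3f}")
```

With the real data, aftertaste would be replaced by coffee_samp['aftertaste'], and the resulting z would tell how many standard errors the sample mean lies from 7.40.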