Sampling in Python
Run the hidden code cell below to import the data used in this course.
# Import pandas, NumPy, and Matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the course datasets (pd.read_feather requires the pyarrow package)
attrition = pd.read_feather("datasets/attrition.feather")
spotify = pd.read_feather("datasets/spotify_2000_2020.feather")
coffee = pd.read_feather("datasets/coffee_ratings_full.feather")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
# Inspect the coffee data, then count the ratings per country of origin
coffee.head()
coffee['country_of_origin'].value_counts()
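# Countries of origin to keep (the most frequent ones in the value counts above)
top_countries = ['Mexico', 'Colombia', 'Guatemala', 'Brazil', 'Taiwan', 'United States (Hawaii)',
                 'Honduras', 'Costa Rica', 'Ethiopia', 'Tanzania']
coffee_top = coffee[coffee['country_of_origin'].isin(top_countries)]
coffee_top1 = coffee_top[['country_of_origin', 'flavor', 'aftertaste', 'acidity']]
# Simple random sample: one third of the rows, drawn without replacement
coffee_samp = coffee_top1.sample(frac=1/3, random_state=2021)
# Compare the country proportions in the sample with those in the population
coffee_samp['country_of_origin'].value_counts(normalize=True), coffee_top['country_of_origin'].value_counts(normalize=True)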
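The comparison above checks whether a simple random sample keeps roughly the same country mix as the population. A minimal sketch of the same check on a hypothetical, synthetic population (groups "A", "B", "C" in a fixed 60/30/10 split, not the coffee data):

```python
import pandas as pd

# Hypothetical population (NOT the coffee data): 1,000 rows split
# 60% "A", 30% "B", 10% "C".
population = pd.DataFrame({
    "row_id": range(1000),
    "group": ["A"] * 600 + ["B"] * 300 + ["C"] * 100,
})

# Simple random sample of one third of the rows, without replacement.
sample = population.sample(frac=1/3, random_state=2021)

# Side-by-side proportions of each group in the population and the sample.
pop_props = population["group"].value_counts(normalize=True)
samp_props = sample["group"].value_counts(normalize=True)
comparison = pd.DataFrame({"population": pop_props, "sample": samp_props})
print(comparison)
```

With random sampling the proportions only match approximately; the gap tends to shrink as the sample grows.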
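# Proportional stratified sample: 10% of the rows from each country
coffee_strat = coffee_top.groupby('country_of_origin').sample(frac=0.1, random_state=2021)
# Equal-count stratified sample: 15 rows from each country
coffee_strats = coffee_top.groupby('country_of_origin').sample(n=15, random_state=2021)
# Country proportions in each stratified sample
coffee_strat['country_of_origin'].value_counts(normalize=True), coffee_strats['country_of_origin'].value_counts(normalize=True)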
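Stratified sampling draws separately within each group. This sketch uses a hypothetical synthetic population to contrast the two calls above: `frac` keeps each group's share of the population, while `n` gives every group the same count.

```python
import pandas as pd

# Hypothetical population (NOT the coffee data): 60/30/10 split across groups.
population = pd.DataFrame({
    "row_id": range(1000),
    "group": ["A"] * 600 + ["B"] * 300 + ["C"] * 100,
})

# Proportional stratified sample: 10% of the rows within each group.
strat_prop = population.groupby("group").sample(frac=0.1, random_state=2021)

# Equal-count stratified sample: 15 rows from each group.
strat_equal = population.groupby("group").sample(n=15, random_state=2021)

print(strat_prop["group"].value_counts(normalize=True))   # keeps 0.6 / 0.3 / 0.1
print(strat_equal["group"].value_counts(normalize=True))  # one third each
```

`GroupBy.sample` with `n` raises a `ValueError` if any group has fewer rows than `n`, unless `replace=True` is passed.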
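The cells that build `bstrap_mean_aftertaste` are not shown in this export. A common way to build a bootstrap distribution of the mean is to resample the sample with replacement many times and record each resample's mean. The sketch below assumes that approach, with a hypothetical synthetic stand-in for `coffee_samp` and an arbitrary 5,000 replicates; it uses NumPy's `Generator.choice` for speed instead of repeated `DataFrame.sample` calls.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for coffee_samp: 300 synthetic aftertaste scores
# (NOT the real coffee ratings).
rng = np.random.default_rng(2021)
samp = pd.DataFrame({"aftertaste": rng.normal(loc=7.4, scale=0.35, size=300)})

# Bootstrap: draw 5,000 resamples the same size as the sample, with
# replacement, and record the mean aftertaste of each resample.
values = samp["aftertaste"].to_numpy()
n_boot = 5000
resamples = rng.choice(values, size=(n_boot, len(values)), replace=True)
bstrap_mean_aftertaste = resamples.mean(axis=1)

# The standard deviation of the bootstrap distribution estimates the
# standard error of the sample mean.
standard_error = np.std(bstrap_mean_aftertaste, ddof=1)

# For a mean, this should be close to the formula s / sqrt(n).
formula_se = values.std(ddof=1) / np.sqrt(len(values))
print(round(standard_error, 4), round(formula_se, 4))
```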
# Standard error: standard deviation of the bootstrap distribution of mean aftertaste
standard_error = round(np.std(bstrap_mean_aftertaste, ddof=1), 5)
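Assume a hypothesized mean aftertaste score of 7.40. The z-score measures how many standard errors the sample statistic lies from that value:
z = (sample statistic - hypothesized parameter value) / standard error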
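# Sample mean aftertaste, left unrounded so rounding does not shift z
samp_mean = coffee_samp['aftertaste'].mean()
# z-score against the hypothesized mean of 7.40
z = (samp_mean - 7.40) / standard_error
z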
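The z-score calculation can be wrapped in a small helper. The function name `z_score` and the numbers below are hypothetical, chosen only to make the arithmetic easy to check: (7.43 - 7.40) / 0.015 = 2.0, so the sample statistic lies two standard errors above the hypothesized value.

```python
def z_score(sample_stat: float, hypothesized_value: float, standard_error: float) -> float:
    """Standardize a sample statistic against a hypothesized parameter value.

    Positive results mean the statistic is above the hypothesized value,
    negative results mean it is below, measured in standard errors.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (sample_stat - hypothesized_value) / standard_error


# Hypothetical numbers, not results from the coffee data.
z = z_score(7.43, 7.40, 0.015)
print(round(z, 2))  # 2.0
```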