Sampling in Python

Run the hidden code cell below to import the data used in this course.

# Import pandas, NumPy, and Matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the course datasets as pandas DataFrames
attrition = pd.read_feather("datasets/attrition.feather")
spotify = pd.read_feather("datasets/spotify_2000_2020.feather")
coffee = pd.read_feather("datasets/coffee_ratings_full.feather")

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

# Preview the first rows of the coffee ratings data
coffee.head()

# Count coffees by country of origin
coffee['country_of_origin'].value_counts()


# Countries of origin to keep (the ten most frequent)
top_countries = ['Mexico', 'Colombia', 'Guatemala', 'Brazil', 'Taiwan',
                 'United States (Hawaii)', 'Honduras', 'Costa Rica', 'Ethiopia', 'Tanzania']
coffee_top = coffee[coffee['country_of_origin'].isin(top_countries)]
coffee_top1 = coffee_top[['country_of_origin', 'flavor', 'aftertaste', 'acidity']]

# Simple random sample of one third of the rows, seeded for reproducibility
coffee_samp = coffee_top1.sample(frac=1/3, random_state=2021)
# Compare country proportions in the sample with those in the full subset
coffee_samp['country_of_origin'].value_counts(normalize=True), coffee_top['country_of_origin'].value_counts(normalize=True)
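The check above compares group proportions in a simple random sample against the population. Here is a minimal, self-contained sketch of the same idea. It assumes a small hypothetical DataFrame (`population`, with made-up groups "A", "B", "C") in place of the coffee data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy population: 60 rows in group A, 30 in B, 10 in C
population = pd.DataFrame({
    "group": ["A"] * 60 + ["B"] * 30 + ["C"] * 10,
    "score": np.arange(100, dtype=float),
})

# Simple random sample of one third of the rows, seeded for reproducibility
sample = population.sample(frac=1/3, random_state=2021)

# Put population and sample proportions side by side;
# fillna(0) covers a group that happens to be missing from the sample
comparison = pd.DataFrame({
    "population": population["group"].value_counts(normalize=True),
    "sample": sample["group"].value_counts(normalize=True),
}).fillna(0)
print(comparison)
```

With a simple random sample, the sample proportions will usually be close to the population's, but small groups such as "C" can drift noticeably.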


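# Proportional stratified sample: 10% of the rows from each country
coffee_strat = coffee_top.groupby('country_of_origin').sample(frac=0.1, random_state=2021)
# Equal-count stratified sample: 15 rows from each country
coffee_strats = coffee_top.groupby('country_of_origin').sample(n=15, random_state=2021)
# Compare country proportions in the two stratified samples
coffee_strat['country_of_origin'].value_counts(normalize=True), coffee_strats['country_of_origin'].value_counts(normalize=True)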
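Stratified sampling can be proportional (`frac`) or equal-count (`n`). The contrast is easier to see on a tiny dataset. This sketch assumes a hypothetical DataFrame with unequal group sizes (60, 30, and 20 rows) in place of the coffee data:

```python
import pandas as pd

# Hypothetical toy population with unequal group sizes
population = pd.DataFrame({
    "group": ["A"] * 60 + ["B"] * 30 + ["C"] * 20,
    "score": range(110),
})

# Proportional stratified sample: 10% of the rows within each group,
# so group sizes in the sample mirror the population (6, 3, 2)
proportional = population.groupby("group").sample(frac=0.1, random_state=2021)

# Equal-count stratified sample: 5 rows from every group, whatever its size
equal = population.groupby("group").sample(n=5, random_state=2021)

print(proportional["group"].value_counts())
print(equal["group"].value_counts())
```

Sampling without replacement means `groupby(...).sample(n=...)` raises a `ValueError` if any group has fewer than `n` rows, unless `replace=True` is passed.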


(Two hidden cells, not shown here, define bstrap_mean_aftertaste, which is used below.)

# Standard error: sample standard deviation (ddof=1) of the bootstrap distribution of means
standard_error = round(np.std(bstrap_mean_aftertaste, ddof=1), 5)
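The cells that build `bstrap_mean_aftertaste` are hidden. Below is one common way to build such a bootstrap distribution of the mean, offered as a sketch rather than the notebook's actual code. It assumes the variable holds means of resamples drawn with replacement. The `scores` Series is a hypothetical, randomly generated stand-in for the aftertaste column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the aftertaste scores (not the real data)
rng = np.random.default_rng(2021)
scores = pd.Series(rng.normal(loc=7.4, scale=0.3, size=200))

# Bootstrap: resample with replacement at the original sample size,
# and record the mean of each resample
bstrap_mean_aftertaste = []
for seed in range(1000):
    resample = scores.sample(frac=1, replace=True, random_state=seed)
    bstrap_mean_aftertaste.append(resample.mean())

# Standard error = sample standard deviation of the bootstrap means
standard_error = np.std(bstrap_mean_aftertaste, ddof=1)
print(standard_error)
```

For these settings the result should land near 0.3 / sqrt(200), about 0.021, the theoretical standard error of a mean.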

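Suppose the hypothesized mean aftertaste score is 7.40. The z-score standardizes the difference between the sample statistic and that hypothesized value: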

z = (sample statistic - hypothesized parameter value) / standard error

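hypothesized_mean = 7.40
# Sample mean aftertaste, left unrounded so the z-score is not distorted
samp_mean = coffee_samp['aftertaste'].mean()

# z-score of the sample mean under the hypothesized value
z = (samp_mean - hypothesized_mean) / standard_error
z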
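The z-score formula can be wrapped in a small helper. The helper also makes it easy to see why rounding the sample mean before dividing by a small standard error matters. The function name `z_score` and all numbers below are hypothetical, chosen only for illustration:

```python
def z_score(sample_stat: float, hypothesized_value: float, standard_error: float) -> float:
    """Standardized statistic: (sample stat - hypothesized value) / standard error."""
    return (sample_stat - hypothesized_value) / standard_error

# Hypothetical values, not results from the coffee data
exact_mean = 7.4549
se = 0.025

z_exact = z_score(exact_mean, 7.40, se)              # about 2.196
z_rounded = z_score(round(exact_mean, 2), 7.40, se)  # about 2.0
print(z_exact, z_rounded)
```

Rounding the mean to two decimals here shifts z by roughly 0.2, because the rounding error (about 0.005) is a sizeable fraction of the standard error.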
