Introduction to Statistics in Python

This notebook shows the issue I ran into with Exercise 9 on DataCamp Learn. These are the graphs I get, and I am not sure where I went wrong.

Configure Notebook

This recreates the exercise environment as closely as I could.

import numpy as np
import pandas as pd

# Load the food consumption data; read_csv already returns a DataFrame
food_consumption = pd.read_csv('food_consumption.csv')

My attempted solution

I tried using pandas' built-in query method to select the rows.

My hypothesis is that the code is checked line by line to see whether it is identical to solution.py.

# Print variance and sd of co2_emission for each food_category
df = food_consumption.groupby('food_category')['co2_emission'].agg([np.var, lambda x: np.std(x, ddof=1)])
df.columns = ['var', 'std']
print(df)

# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt

# Create histogram of co2_emission for food_category 'beef'
food_consumption.query('food_category == "beef"')['co2_emission'].plot(kind='hist')
# Show plot
plt.show()

# Create histogram of co2_emission for food_category 'eggs'
food_consumption.query('food_category == "eggs"')['co2_emission'].plot(kind='hist')
# Show plot
plt.show()

Official solution

# Print variance and sd of co2_emission for each food_category
print(food_consumption.groupby('food_category')['co2_emission'].agg([np.var, np.std]))

# Import matplotlib.pyplot with alias plt
import matplotlib.pyplot as plt

# Create histogram of co2_emission for food_category 'beef'
food_consumption[food_consumption['food_category'] == 'beef']['co2_emission'].hist()
# Show plot
plt.show()

# Create histogram of co2_emission for food_category 'eggs'
food_consumption[food_consumption['food_category'] == 'eggs']['co2_emission'].hist()
# Show plot
plt.show()
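
One way to test whether the two versions actually differ in result, rather than only in wording, is to compare their outputs directly. The sketch below is not part of the exercise. It uses a small made-up table in place of food_consumption.csv (the values are hypothetical), and it passes the string names 'var' and 'std' to agg instead of np.var and np.std. That is an assumption made so the check does not depend on how a given pandas version handles NumPy callables inside agg.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the real exercise reads food_consumption.csv
food_consumption = pd.DataFrame({
    'food_category': ['beef', 'beef', 'beef', 'eggs', 'eggs', 'eggs'],
    'co2_emission': [1712.0, 1044.85, 689.59, 10.51, 7.82, 8.51],
})

# Row selection: query() (my attempt) versus a boolean mask (official)
beef_query = food_consumption.query('food_category == "beef"')['co2_emission']
beef_mask = food_consumption[food_consumption['food_category'] == 'beef']['co2_emission']
same_rows = beef_query.equals(beef_mask)

# Summary statistics: lambda with ddof=1 (my attempt) versus pandas' sample std
grouped = food_consumption.groupby('food_category')['co2_emission']
attempt = grouped.agg(['var', lambda x: np.std(x, ddof=1)])
attempt.columns = ['var', 'std']
official = grouped.agg(['var', 'std'])
same_stats = np.allclose(attempt.to_numpy(), official.to_numpy())

print(same_rows, same_stats)
```

If both values are True, the two versions compute the same statistics and pass the same values to the histograms, so any remaining difference in the graphs would come from the plotting call itself (.plot(kind='hist') versus .hist()) rather than from the data.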