## Introduction to Statistics in Python

Run the hidden code cell below to import the data used in this course.

```
# Importing numpy and pandas
import numpy as np
import pandas as pd
# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")
```

`food.head()`

## Mean and Median

Calculate the mean and median of kilograms of food consumed per person per year for Belgium and the USA.

```
# Filter for Belgium
be_consumption = food[food['country'] == 'Belgium']
# Filter for USA
usa_consumption = food[food['country'] == 'USA']
# Mean and median of the consumption column for each country
be_consump_agg = be_consumption['consumption'].agg(['mean', 'median'])
usa_consump_agg = usa_consumption['consumption'].agg(['mean', 'median'])
print(be_consump_agg)
print(usa_consump_agg)
```

##### Subset food_consumption for rows with data about Belgium and the USA.

##### Group the subsetted data by country and select only the consumption column.

##### Calculate the mean and median of the kilograms of food consumed per person per year in each country using .agg().

```
# Keep only the rows for Belgium and the USA
be_and_usa = food[food['country'].isin(['Belgium', 'USA'])]
# Mean and median consumption per country
be_and_usa.groupby('country')['consumption'].agg(['mean', 'median'])
```

```
import matplotlib.pyplot as plt
# Subset for rice and plot a histogram of its CO2 emissions
rice_consump = food[food['food_category'] == 'rice']
rice_consump['co2_emission'].hist()
plt.show()
```

Use .agg() to calculate the mean and median of co2_emission for rice.

`rice_consump['co2_emission'].agg(['mean', 'median'])`

Calculate the quartiles of the co2_emission column of food_consumption.

`np.quantile(food['co2_emission'], [0, 0.25, 0.5, 0.75, 1])`

Calculate the eleven quantiles of co2_emission that split up the data into ten pieces (deciles).

`np.quantile(rice_consump['co2_emission'], np.linspace(0, 1, 11))`
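
As a sanity check on how many cut points each split needs, here is a minimal sketch on made-up data. The `values` array is hypothetical and not taken from the course datasets. Splitting into four pieces takes five quantiles and splitting into ten takes eleven, because both the minimum (0) and the maximum (1) are included.

```python
import numpy as np

# Hypothetical values, not from the course data: the integers 0 through 100
values = np.arange(0, 101)

# Quartiles: five cut points that split the data into four pieces
quartiles = np.quantile(values, [0, 0.25, 0.5, 0.75, 1])

# Deciles: eleven cut points that split the data into ten pieces
deciles = np.quantile(values, np.linspace(0, 1, 11))

print(quartiles)
print(deciles)
```

`np.linspace(start, stop, num)` returns `num` evenly spaced values including both endpoints, which avoids typing the list by hand and accidentally leaving out 0 or 1.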

Calculate the variance and standard deviation of co2_emission for each food_category by grouping and aggregating.
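
One possible way to approach this step, sketched on a tiny made-up table so it runs without the course files. The `toy_food` DataFrame and its values are assumptions for illustration only. The same `groupby` and `.agg()` pattern should carry over to `food`.

```python
import pandas as pd

# Hypothetical stand-in for the course's food DataFrame (values are made up)
toy_food = pd.DataFrame({
    "food_category": ["beef", "beef", "beef", "rice", "rice", "rice"],
    "co2_emission": [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
})

# Group by category, then compute the variance and standard deviation of
# co2_emission within each group. pandas uses the sample versions (ddof=1).
spread = toy_food.groupby("food_category")["co2_emission"].agg(["var", "std"])

print(spread)
```

Note that pandas computes the sample variance and standard deviation (dividing by n - 1), whereas calling `np.var` or `np.std` directly on an array defaults to the population versions (dividing by n).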