Introduction to Statistics in Python

Run the hidden code cell below to import the data used in this course.

# Importing numpy, pandas, matplotlib and seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Importing the course datasets
deals = pd.read_csv("datasets/amir_deals.csv")
happiness = pd.read_csv("datasets/world_happiness.csv")
food = pd.read_csv("datasets/food_consumption.csv")

Take Notes

Definition of statistics

Types of statistics

Continuous variable

Discrete variable

Types of data

Measures of center, including mean, median, and mode

Measures of spread

Finding outliers

Distribution

What does statistics mean?

  • A statistic is any quantity computed from data values.
  • Statistics is the practice and study of collecting and analyzing data.
  • In other words, statistics is about how to use mathematics to analyze data.

Types of statistics

There are two types of statistics:

  1. Descriptive statistics
  2. Inferential statistics

What is descriptive statistics?

  • Put simply, descriptive statistics describes or summarizes the data at hand.
  • It uses summary statistics such as the mean or median, or charts and graphs.
  • Example: answering the question "What is the average grade point for students in the first semester?"
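
The grade-point question above can be sketched in a few lines. The grade values here are made up purely for illustration; they do not come from the course datasets.

```python
import pandas as pd

# Hypothetical first-semester grade points for six students (made-up values)
grades = pd.Series([3.2, 3.8, 2.9, 3.5, 3.8, 3.0], name="gpa")

mean_gpa = grades.mean()      # arithmetic average
median_gpa = grades.median()  # middle value of the sorted data
mode_gpa = grades.mode()[0]   # most frequent value

print(f"mean={mean_gpa:.2f}, median={median_gpa:.2f}, mode={mode_gpa:.1f}")
```

Each of these numbers summarizes the six values we actually have; none of them claims anything about students outside this list.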

What is inferential statistics?

  • It uses sample data to make inferences about a target population.
  • It relies on probability distributions such as the binomial and normal distributions.
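
A minimal sketch of the idea, assuming a simulated population (the mean of 50, the standard deviation of 10, and the sample size of 200 are arbitrary choices for illustration): we look only at a small random sample and use its mean as an estimate of the population mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: 100,000 simulated values from a normal distribution
population = rng.normal(loc=50, scale=10, size=100_000)

# In practice we rarely see the whole population, so draw a random sample
sample = rng.choice(population, size=200, replace=False)

# The sample mean is our estimate of the (usually unknown) population mean
sample_mean = sample.mean()
population_mean = population.mean()

print("sample mean:    ", round(sample_mean, 2))
print("population mean:", round(population_mean, 2))
```

The two values will usually be close but not identical; quantifying that gap is what inferential methods are for.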

What are the types of data?

  • Here "type of data" means statistical data types, not data types in the general (programming) sense.
  • There are two types: numerical and categorical.
  • Numerical (quantitative) data. Example: salary.
  • Categorical (qualitative) data. Example: product type.

Categorical data:

  • Can be represented as numbers (categories can be encoded with numeric values).
  • Categorical data can be subdivided into:
  1. Nominal data: sometimes called labeled data; the categories are unordered.
  • Examples: names, sex, eye colours.
  2. Ordinal data: the categories are ordered.
  • Examples: "Strongly disagree", "Neither agree nor disagree", and so on.
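
One way to see the nominal/ordinal distinction in pandas is with `pd.Categorical`. This is a sketch with made-up survey answers and eye colours; the five-level agreement scale is an assumption chosen for illustration.

```python
import pandas as pd

# Nominal: labels with no inherent order (hypothetical eye colours)
eye_colour = pd.Categorical(["brown", "blue", "green", "brown"])

# Ordinal: labels with a meaningful order (hypothetical survey answers)
levels = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]
answers = pd.Categorical(
    ["Agree", "Strongly disagree", "Neither agree nor disagree", "Agree"],
    categories=levels,
    ordered=True,
)

print(eye_colour.ordered)            # unordered categories
print(answers.ordered)               # ordered categories
print(answers.min(), "/", answers.max())  # min and max only make sense for ordinal data
print(list(answers.codes))           # each category is stored as an integer code
```

The integer codes show how categorical data can take numeric values without being numerical data: the codes are labels, so averaging them is generally not meaningful.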

What are continuous variables (continuous data)?

Continuous data is measured and can take any value within an interval, such as time, temperature, or sales per year.

What are discrete values (discrete data)?

Discrete data is counted and takes a limited set of values, such as the number of students in a class or the number of cars in a garage.
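
As a rough illustration, counts usually end up as integers and measurements as floats in a DataFrame. The column names and values below are hypothetical, and the dtype is only a hint: whether a variable is discrete or continuous depends on what it represents, not on how it is stored.

```python
import pandas as pd

# Hypothetical classroom records (made-up values)
classes = pd.DataFrame(
    {
        "students_in_class": [25, 30, 28],           # discrete: counted
        "room_temperature_c": [21.5, 22.13, 20.87],  # continuous: measured
    }
)

# Integer dtype for the counts, float dtype for the measurements
print(classes.dtypes)
```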

# Display the data
deals
# Summary statistics for all columns (numeric and non-numeric)
deals.describe(include='all')
# Look at the distribution of the data using a histogram
sns.histplot(data=deals.dropna(), x="amount")
plt.show()

Measures of center

# Calculate mean and median of amount grouped by product
# (string names avoid the deprecation warning newer pandas raises for np.mean / np.median in agg)
deals_mean_median = deals.groupby("product")["amount"].agg(["mean", "median"])
deals_mean_median
# Plot histograms of the per-product mean and median
deals_mean_median.hist()
plt.show()
# Load the dataset from the CSV file
food = pd.read_csv("datasets/food_consumption.csv")
food
# Filter for Argentina
arg_consumption = food[food["country"]=="Argentina"]
# Filter for Albania
alb_consumption = food[food["country"]=="Albania"]
# Calculate mean and median for Argentina
print(np.mean(arg_consumption["consumption"]), 'Mean consumption of Argentina')
print(np.median(arg_consumption["consumption"]), 'Median consumption of Argentina')
# Calculate mean and median for Albania
print(np.mean(alb_consumption["consumption"]), 'Mean consumption of Albania')
print(np.median(alb_consumption["consumption"]), 'Median consumption of Albania')
# Subset for Argentina and Albania
arg_and_alb = food[(food["country"]=="Argentina")|(food["country"]=="Albania")]
# Calculate mean and median for Argentina and Albania, grouped by country
print(arg_and_alb.groupby("country")["consumption"].agg(["mean", "median"]))
# Filter for the wheat food category
wheat_consumption = food[food["food_category"]=="wheat"]
# Histogram of CO2 emissions from wheat
wheat_consumption["co2_emission"].hist()
plt.title('Distribution of CO2 emissions from wheat', color='r')
plt.show()