
Supervised Learning with scikit-learn

Run the code cell below to import the data used in this course.

# Importing pandas and numpy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scikitplot
# Importing the course datasets 
diabetes = pd.read_csv('datasets/diabetes_clean.csv')
music = pd.read_csv('datasets/music_clean.csv')
advertising = pd.read_csv('datasets/advertising_and_sales_clean.csv')
telecom = pd.read_csv("datasets/telecom_churn_clean.csv")

Chapter 1: Classification

What is machine learning?

Process whereby:

  • Computers learn to make decisions from data without being explicitly programmed

Supervised learning

  • The predicted values are known
  • Aim: Predict the target values of unseen data, given the features
  • Uses features to predict the value of a target variable

Types of supervised learning

  • Classification: target variable consists of categories (fraudulent vs non-fraudulent transaction is an example of binary classification)
  • Regression: Target variable is continuous

Naming conventions

  • Feature = predictor variable = independent variable (column in table)
  • Target variable = dependent variable = response variable

Requirements before using supervised learning:

  • No missing values
  • Data in numeric form
  • Data stored in a pandas DataFrame or NumPy array
  • Perform EDA first (a quick check is sketched after this list)
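
A minimal sketch of how these requirements might be checked on the telecom DataFrame loaded above; the checks are standard pandas calls, and which columns need attention will depend on the dataset.

# Check for missing values in each column (should all be 0)
print(telecom.isna().sum())
# Confirm every column is numeric before modeling
print(telecom.dtypes)
# Basic EDA: summary statistics for each numeric column
print(telecom.describe())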

scikit-learn syntax

from sklearn.module_name import ModelName

model = ModelName()
model.fit(X, y)
predictions = model.predict(X_new)
print(predictions)


Classification challenge

Classifying labels of unseen data

  1. Build a model
  2. Model learns from the labeled data we pass to it
  3. Pass unlabeled data to the model as input
  4. Model predicts the labels of the unseen data
  • Labeled data = training data

k-Nearest Neighbors

  • KNN predicts label of a data point by
  • Looking at the k closest labeled data points
  • Taking a majority vote
telecom.head()
from sklearn.neighbors import KNeighborsClassifier
X = telecom[['total_day_charge', 'total_eve_charge']].values
y = telecom['churn'].values
# .values converts X and y to numpy arrays
print(X.shape, y.shape)
print(f'There are {X.shape[0]} observations of {X.shape[1]} features, and {y.shape[0]} observations of the target feature.')
# instantiate the classifier with n_neighbors=15
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X, y)

Predicting on unlabeled data

# inputting new data to test the model
X_new = np.array([[56.8, 17.5], [24.4, 24.1], [50.1, 10.9]])

# predicting the new values
predictions = knn.predict(X_new)

print('Predictions: {}'.format(predictions))

The model predicts that the first customer will churn and that the next two won't.

Measuring model performance

  • In classification, accuracy is a commonly used metric
  • Accuracy on the training data is NOT indicative of the model's ability to generalize to unseen data

Computing Accuracy

Split the data into a training set and a test set, fit/train the classifier on the training set, then calculate accuracy on the test set.
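
A minimal sketch of this split-and-score workflow, reusing the telecom X and y from above; the test_size and random_state values are illustrative choices, not taken from the course text.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hold out 30% of the data for testing; stratify keeps churn proportions similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y)

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X_train, y_train)

# .score returns accuracy: the fraction of correct predictions on the test set
print(knn.score(X_test, y_test))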