Supervised Learning with scikit-learn
Run the hidden code cell below to import the data used in this course.
# Importing pandas
import pandas as pd
# Importing the course datasets
diabetes = pd.read_csv('datasets/diabetes_clean.csv')
music = pd.read_csv('datasets/music_clean.csv')
advertising = pd.read_csv('datasets/advertising_and_sales_clean.csv')
telecom = pd.read_csv("datasets/telecom_churn_clean.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Classification
telecom.head()
# Import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier
# Create arrays for the features and the target variable
y = telecom["churn"].values
X = telecom[["account_length", "customer_service_calls"]].values
# Create a KNN classifier with 6 neighbors
knn = KNeighborsClassifier(n_neighbors=6)
# Fit the classifier to the data
knn.fit(X, y)
import numpy as np
X_new = np.array([[30.0, 17.5],
                  [107.0, 24.1],
                  [213.0, 10.9]])
# Predict the labels for the X_new
y_pred = knn.predict(X_new)
# Print the predictions for X_new
print("Predictions: {}".format(y_pred))
Measuring model performance
# Import the module
from sklearn.model_selection import train_test_split
X = telecom.drop("churn", axis=1).values
y = telecom["churn"].values
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
knn = KNeighborsClassifier(n_neighbors=5)
# Fit the classifier to the training data
knn.fit(X_train,y_train)
# Print the accuracy
print(knn.score(X_test, y_test))
# Create neighbors
neighbors = np.arange(1, 13)
train_accuracies = {}
test_accuracies = {}
for neighbor in neighbors:
    # Set up a KNN Classifier
    knn = KNeighborsClassifier(n_neighbors=neighbor)
    # Fit the model
    knn.fit(X_train, y_train)
    # Compute accuracy
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)
print(neighbors, '\n', train_accuracies, '\n', test_accuracies)
import matplotlib.pyplot as plt
# Add a title
plt.title("KNN: Varying Number of Neighbors")
# Plot training accuracies
plt.plot(neighbors, train_accuracies.values(), label="Training Accuracy")
# Plot test accuracies
plt.plot(neighbors, test_accuracies.values(), label="Testing Accuracy")
plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")
# Display the plot
plt.show()
Regression
Introduction to regression
Here we extract only the two columns we need: the feature X and the target y.
scikit-learn expects the features as a two-dimensional array, which is why we call .reshape on X. As the output of the code shows, only X becomes two-dimensional, with shape (n_samples, 1), while y stays one-dimensional.
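To see why the reshape matters, here is a minimal sketch on made-up numbers (the values and the names radio, sales_small and raised_value_error are illustrative assumptions, not course data). It shows the shape before and after .reshape(-1, 1), and that scikit-learn rejects a one-dimensional feature array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up values standing in for a single feature column and its target
radio = np.array([10.0, 20.0, 30.0])
sales_small = np.array([1.0, 2.0, 3.0])

print(radio.shape)  # (3,)

# Fitting on a 1-D feature array raises a ValueError in scikit-learn
raised_value_error = False
try:
    LinearRegression().fit(radio, sales_small)
except ValueError:
    raised_value_error = True
print(raised_value_error)  # True

# -1 lets NumPy infer the number of rows; 1 is the number of feature columns
X_small = radio.reshape(-1, 1)
print(X_small.shape)  # (3, 1)

# The same estimator now fits without complaint
LinearRegression().fit(X_small, sales_small)
```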
import numpy as np
import pandas as pd
# Create X from the radio column's values
X = advertising["radio"].values
# Create y from the sales column's values
y = advertising["sales"].values
# Reshape X
X = X.reshape(-1,1)
# Check the shape of the features and targets
print(X.shape, y.shape)
Let's create the regression model
- Keep in mind that predictions are made only from the features, X. The model returns one prediction per row of X, so we get many predictions, and we can display as many of them as we like with the line "print(predictions[:#])", where # is the number of predictions to show.
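The note above can be sketched roughly as follows. Because the course CSV is not available here, radio and sales are synthetic stand-ins for the advertising columns (an assumption), and the name reg is only illustrative. The estimator itself, LinearRegression, is the standard scikit-learn linear model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for advertising["radio"] and advertising["sales"]
# (assumption: the real course data is not used here)
rng = np.random.default_rng(0)
radio = rng.uniform(0, 50, size=100)
sales = 3.0 * radio + 10.0 + rng.normal(0, 1, size=100)

# Features must be 2-D; the target stays 1-D
X = radio.reshape(-1, 1)
y = sales

# Create the model and fit it to the data
reg = LinearRegression()
reg.fit(X, y)

# Predict from the features only: one prediction per row of X
predictions = reg.predict(X)

# Show the first five predictions; change 5 to show more or fewer
print(predictions[:5])
```

Slicing predictions only changes how many values are printed. The model always returns as many predictions as there are rows in the array passed to predict.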