Unsupervised Learning in Python

Run the hidden code cell below to import the data used in this course.

# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import scipy.stats 

# Import the course datasets 
grains = pd.read_csv('datasets/grains.csv')
fish = pd.read_csv('datasets/fish.csv', header=None)
wine = pd.read_csv('datasets/wine.csv')
eurovision = pd.read_csv('datasets/eurovision-2016.csv')
stocks = pd.read_csv('datasets/company-stock-movements-2010-2015-incl.csv', index_col=0)
digits = pd.read_csv('datasets/lcd-digits.csv', header=None)

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

You work for an agricultural research center. Your manager wants you to group seed varieties based on different measurements contained in the grains DataFrame. They also want to know how your clustering solution compares to the seed types listed in the dataset (the variety_number and variety columns). Try to use all of the relevant techniques you learned in Unsupervised Learning in Python!
In the fish DataFrame, each row represents an individual fish. Standardize the features and cluster the fish by their measurements. You can then compare your cluster labels with the actual fish species (first column).
The wine DataFrame contains three class labels. Transform the features to get the most accurate clustering.
In the eurovision DataFrame, perform hierarchical clustering of the voting countries using complete linkage and plot the resulting dendrogram.
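The last task (hierarchical clustering with complete linkage) is not worked through in the cells below, so here is a minimal sketch using SciPy's `linkage`, `dendrogram` and `fcluster`. It assumes the votes have already been arranged as one row per voting country. Building that matrix from the `eurovision` DataFrame depends on its columns, which aren't shown here, so the small `samples` array and `country_names` list are hypothetical stand-ins.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical stand-in data: 4 voting countries x 3 vote features
samples = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 0.2, 1.0],
    [5.0, 5.0, 0.0],
    [5.0, 5.2, 0.0],
])
country_names = ['A', 'B', 'C', 'D']

# Complete linkage: the distance between two clusters is the
# distance between their two farthest-apart members
mergings = linkage(samples, method='complete')

# Draw the dendrogram, labelling each leaf with its country
dendrogram(mergings, labels=country_names, leaf_rotation=90, leaf_font_size=6)
plt.show()

# Optional: cut the tree at height 2 to get flat cluster labels
flat_labels = fcluster(mergings, t=2, criterion='distance')
print(flat_labels)
```

With real data, passing the country names as `labels` is what makes the dendrogram readable; `fcluster` is only there to show how the tree can be turned into flat cluster labels.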

grains.variety.value_counts()

# print(grains.iloc[:,0].values)
# print(grains.iloc[:,2].values)
grains_56 = grains[['5','6','variety']]
grains_56.sample(6)

sns.scatterplot(data=grains_56, x='5', y='6', hue="variety")

sns.pairplot(grains.drop('variety_number', axis=1))
# pd.plotting.scatter_matrix(grain_vals_df, figsize=(10,10))
# x = grains.iloc[:,0].values
# y = grains.iloc[:,6].values
# plt.scatter(x, y, 
#             # c=grains.variety, 
#             alpha=0.6)
# plt.show()

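from sklearn.cluster import KMeans

# Exploratory first fit with an arbitrary k=7 on the first seven (measurement) columns;
# the elbow plot below is used to choose a better k
model = KMeans(n_clusters=7)
model.fit(grains.iloc[:,:7])
centroids = model.cluster_centers_
print(centroids)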
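After fitting, `cluster_centers_` holds one centroid per cluster, and `predict()` assigns any sample, including unseen ones, to the nearest centroid. This minimal sketch uses synthetic two-blob data (not the grains data) and checks that claim by recomputing the Euclidean distances by hand. `n_init` and `random_state` are set only to make the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: two well-separated blobs in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

# New, unseen samples
new_points = np.array([[0.1, -0.2], [4.8, 5.1]])
predicted = model.predict(new_points)

# Manual check: index of the closest centroid for each new sample
dists = np.linalg.norm(new_points[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

print(model.cluster_centers_.shape)  # (n_clusters, n_features)
print(predicted, nearest)
```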

ks = range(1, 11)
inertias = []

for k in ks:
    # Create a KMeans instance with k clusters: model
    model = KMeans(n_clusters=k)
    
    # Fit model to samples
    model.fit(grains.iloc[:,:7])
    
    # Append the inertia to the list of inertias
    inertias.append(model.inertia_)
    
# Plot ks vs inertias
plt.plot(ks, inertias, '-o')
plt.xlabel('number of clusters, k')
plt.ylabel('inertia')
plt.xticks(ks)
plt.show()
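The elbow plot relies on `inertia_`, the sum of squared distances from each sample to its closest centroid. Inertia always drops as k grows, so the usual heuristic is to pick the k where the decrease starts to level off. The sketch below uses synthetic data rather than the grains data and recomputes inertia from `transform()` (each sample's distance to every centroid) to show what the attribute measures.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three blobs in 2-D
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.4, (30, 2)) for c in (0, 4, 8)])

inertias = []
manual_inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)
    # Distance to the closest centroid, squared and summed over samples
    manual_inertias.append((model.transform(X).min(axis=1) ** 2).sum())

print([round(v, 1) for v in inertias])
print([round(v, 1) for v in manual_inertias])
```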

model = KMeans(n_clusters=3)
model.fit(grains.iloc[:,:7])
centroids = model.cluster_centers_
print(centroids)

# Obtain cluster labels from the model fitted above: labels
# (calling fit_predict() here would refit, and the new labels might not match the printed centroids)
labels = model.predict(grains.iloc[:,:7])  # KMeans and Pipeline both provide .predict() and .fit_predict()
varieties = grains.variety

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels': labels, 'varieties': varieties})

# Create crosstab: ct
ct = pd.crosstab(df['labels'], df['varieties'])

# Display ct
print(ct)
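A crosstab shows how each cluster lines up with the known varieties. Cluster numbers are arbitrary, though, so a permutation-invariant score can help when comparing runs. One option, not used in the original notebook, is scikit-learn's `adjusted_rand_score`. A score of 1.0 means the two labelings agree exactly up to renaming, and values near 0 mean agreement no better than chance. The labels below are hypothetical toy values.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical toy labels: cluster ids vs. known varieties
labels = [2, 2, 0, 0, 1, 1]
varieties = ['a', 'a', 'b', 'b', 'c', 'c']

ct = pd.crosstab(pd.Series(labels, name='labels'), pd.Series(varieties, name='varieties'))
print(ct)

# Same grouping under different names: perfect agreement
perfect = adjusted_rand_score(varieties, labels)

# One sample moved to the wrong cluster: lower agreement
imperfect = adjusted_rand_score(varieties, [2, 2, 0, 1, 1, 1])
print(perfect, imperfect)
```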

# Perform the necessary imports
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Create scaler: scaler
scaler = StandardScaler()

# Create KMeans instance: kmeans
kmeans = KMeans(n_clusters=3)

# Create pipeline: pipeline
pipeline = make_pipeline(scaler, kmeans)

# Fit the pipeline to samples
pipeline.fit(grains.iloc[:,:7])

# Calculate the cluster labels: labels
labels = pipeline.predict(grains.iloc[:,:7])

# Create a DataFrame with labels and varieties as columns: df
df = pd.DataFrame({'labels':labels, 'variety':grains.variety})

# Create crosstab: ct
ct = pd.crosstab(df['labels'], df['variety'])

# Display ct
print(ct)
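`make_pipeline` names each step after its class in lowercase (`standardscaler`, `kmeans`), and `pipeline.predict(X)` runs the scaler's `transform()` before the final estimator's `predict()`. The sketch below checks this on synthetic data where one feature is on a much larger scale than the other, which is where standardizing matters. The same pattern fits the fish task. For the wine task, trying a different transformer in place of `StandardScaler` (for example `Normalizer`) is a suggestion rather than something the cells above establish.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Synthetic data: feature 0 in the thousands, feature 1 near 1
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(1000, 300, 60), rng.normal(1, 0.1, 60)])

pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=0))
pipeline.fit(X)

scaler = pipeline.named_steps['standardscaler']
kmeans = pipeline.named_steps['kmeans']

# After scaling, each column has mean ~0 and standard deviation ~1
X_scaled = scaler.transform(X)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))

# The pipeline's predict() is transform-then-predict
same = (pipeline.predict(X) == kmeans.predict(X_scaled)).all()
print(same)
```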
