
Unsupervised Learning in Python

Here, we can experiment with the data used in Unsupervised Learning in Python.

Below is a code cell that imports the course packages and loads in the course datasets as pandas DataFrames.

# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import scipy.stats 

# Import the course datasets as DataFrames
grains = pd.read_csv('datasets/grains.csv')
fish = pd.read_csv('datasets/fish.csv', header=None)
wine = pd.read_csv('datasets/wine.csv')
eurovision = pd.read_csv('datasets/eurovision-2016.csv')
stocks = pd.read_csv('datasets/company-stock-movements-2010-2015-incl.csv', index_col=0)
digits = pd.read_csv('datasets/lcd-digits.csv', header=None)

# Preview the first DataFrame
grains

Challenge

Don't know where to start? Try the following challenge:

You work for an agricultural research center. Your manager wants you to group seed varieties based on different measurements contained in the grains DataFrame. They also want to know how your clustering solution compares to the seed types listed in the dataset (the variety_number and variety columns).

Use all of the relevant techniques you learned in Unsupervised Learning in Python.
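One possible starting point for the challenge is sketched below: fit k-means on the numeric measurements and cross-tabulate the cluster labels against a known label column. This is only a sketch under assumptions. The helper `cluster_and_compare`, the `toy` DataFrame, and the `feature_a`/`feature_b` columns are hypothetical stand-ins so the example runs without the course files; only the `variety_number` and `variety` column names come from the challenge text, and three clusters is an assumed choice, not a known property of grains.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_and_compare(df, label_cols, compare_to, n_clusters=3, random_state=0):
    """Cluster the numeric feature columns with k-means and
    cross-tabulate the cluster ids against a known label column."""
    # Drop the known labels so they cannot leak into the clustering
    features = df.drop(columns=label_cols).select_dtypes(include="number")
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    cluster_ids = model.fit_predict(features)
    # Rows are cluster ids, columns are the known labels
    return pd.crosstab(
        cluster_ids,
        df[compare_to].to_numpy(),
        rownames=["cluster"],
        colnames=[compare_to],
    )


# Hypothetical toy data: three well-separated groups of 20 samples each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in centers])
toy = pd.DataFrame(points, columns=["feature_a", "feature_b"])
toy["variety_number"] = np.repeat([1, 2, 3], 20)
toy["variety"] = np.repeat(["A", "B", "C"], 20)

table = cluster_and_compare(
    toy, label_cols=["variety_number", "variety"], compare_to="variety"
)
print(table)
```

With the real data, the same call could be tried as `cluster_and_compare(grains, ["variety_number", "variety"], "variety")`. A table where each row has one dominant column suggests that the clusters line up well with the listed varieties.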


Exploration

Feeling confident about your skills? Continue to Machine Learning with Tree-Based Models in Python, or check out the other Machine Learning Scientist with Python Career Track courses to learn other advanced machine learning techniques.

If you're interested in exploring the remaining course datasets, you can refer to the DataFrames and potential problems below:

  • fish: Each row represents an individual fish. Standardize the features and cluster the fish by their measurements. You can then compare your cluster labels with the actual fish species (first column).
  • wine: The class_label column takes three distinct values. Transform the features to get the most accurate clustering.
  • eurovision: Perform hierarchical clustering of the voting countries using complete linkage and plot the resulting dendrogram.
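The hierarchical clustering step mentioned in the last item can be sketched with SciPy as follows. This is a minimal illustration under assumptions: the `samples` array and the `names` list are hypothetical stand-ins for a table of per-country scores, not the actual eurovision data, and cutting the tree into two flat clusters is an arbitrary choice for the toy example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical stand-in data: 6 "countries", each described by 4 scores.
# The first three rows sit near 0 and the last three near 5.
rng = np.random.default_rng(1)
samples = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 4)),
    rng.normal(5.0, 0.1, size=(3, 4)),
])
names = ["country_1", "country_2", "country_3",
         "country_4", "country_5", "country_6"]

# Complete linkage: the distance between two clusters is the distance
# between their two farthest members
mergings = linkage(samples, method="complete")

# no_plot=True only computes the layout; drop it (with matplotlib
# installed) to draw the dendrogram on the current axes
tree = dendrogram(mergings, labels=names, no_plot=True)
print(tree["ivl"])  # leaf labels in dendrogram order

# Cut the tree into at most two flat clusters
flat_labels = fcluster(mergings, t=2, criterion="maxclust")
print(flat_labels)
```

For the fish and wine items, the same idea of preprocessing before clustering applies: scikit-learn's `StandardScaler` can be chained with a clusterer via `make_pipeline`, and the resulting labels compared with the known classes using `pd.crosstab`.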