Unsupervised Learning in Python
Run the hidden code cell below to import the data used in this course.
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import scipy.stats
# Import the course datasets
grains = pd.read_csv('datasets/grains.csv')
fish = pd.read_csv('datasets/fish.csv', header=None)
wine = pd.read_csv('datasets/wine.csv')
eurovision = pd.read_csv('datasets/eurovision-2016.csv')
stocks = pd.read_csv('datasets/company-stock-movements-2010-2015-incl.csv', index_col=0)
digits = pd.read_csv('datasets/lcd-digits.csv', header=None)
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- You work for an agricultural research center. Your manager wants you to group seed varieties based on different measurements contained in the grains DataFrame. They also want to know how your clustering solution compares to the seed types listed in the dataset (the variety_number and variety columns). Try to use all of the relevant techniques you learned in Unsupervised Learning in Python!
- In the fish DataFrame, each row represents an individual fish. Standardize the features and cluster the fish by their measurements. You can then compare your cluster labels with the actual fish species (first column).
- In the wine DataFrame, there are three class_labels in this dataset. Transform the features to get the most accurate clustering.
- In the eurovision DataFrame, perform hierarchical clustering of the voting countries using complete linkage and plot the resulting dendrogram.
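For the hierarchical clustering exercise, a minimal sketch of complete linkage with SciPy might look like the following. It runs on a small synthetic array rather than the eurovision data, and the sample values and the names in country_names are hypothetical placeholders; the real DataFrame would first need to be reshaped so that each row is one voting country.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical stand-in data: 6 "countries" (rows) described by 4 numeric
# features, drawn as two well-separated groups. Not the eurovision data.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(3, 4))
group_b = rng.normal(loc=5.0, scale=0.1, size=(3, 4))
samples = np.vstack([group_a, group_b])
country_names = ["A1", "A2", "A3", "B1", "B2", "B3"]  # placeholder labels

# Complete linkage: the distance between two clusters is the largest
# distance between any pair of their members.
mergings = linkage(samples, method="complete")

# Draw the dendrogram on an explicit axes object.
fig, ax = plt.subplots(figsize=(6, 3))
dendrogram(mergings, labels=country_names, leaf_rotation=90,
           leaf_font_size=8, ax=ax)
ax.set_ylabel("complete-linkage distance")

# Optionally cut the tree into a fixed number of flat clusters.
flat_labels = fcluster(mergings, t=2, criterion="maxclust")
print(flat_labels)
```

In a notebook the figure renders at the end of the cell; in a script, call plt.show() afterwards. The fcluster step is optional and only illustrates how flat cluster labels can be read off the same linkage matrix.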
# fish.csv has no header row, so pass header=None (as in the import cell above)
fish = pd.read_csv('datasets/fish.csv', header=None)
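The standardize, cluster, and compare workflow described for the fish task could be sketched roughly as below. This is an illustration under assumptions, not a reference solution: cluster_and_compare is a hypothetical helper, the demo DataFrame is synthetic (laid out like a headerless file with labels in column 0), and the number of clusters is chosen to match the synthetic data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_and_compare(df, label_col, n_clusters):
    """Standardize the features, run k-means, and cross-tabulate the
    cluster labels against the known labels in `label_col`.
    (Hypothetical helper written for this sketch.)"""
    features = df.drop(columns=[label_col]).to_numpy()
    pipeline = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    )
    cluster_labels = pd.Series(pipeline.fit_predict(features),
                               index=df.index, name="cluster")
    return pd.crosstab(cluster_labels, df[label_col],
                       rownames=["cluster"], colnames=["label"])


# Synthetic stand-in data: 3 groups of 20 samples, with two features on
# very different scales so that standardization matters.
rng = np.random.default_rng(0)
species = np.repeat(["s0", "s1", "s2"], 20)
big = np.concatenate([rng.normal(c, 1.0, 20) for c in (10, 50, 90)])
small = np.concatenate([rng.normal(c, 0.01, 20) for c in (0.1, 0.5, 0.9)])
demo = pd.DataFrame({0: species, 1: big, 2: small})

ct = cluster_and_compare(demo, label_col=0, n_clusters=3)
print(ct)
```

With the course data, a call such as cluster_and_compare(fish, label_col=0, n_clusters=k) would follow the same pattern, where k is whatever number of clusters you decide to try. The cross-tabulation shows how many samples of each known label fall into each cluster, which is one simple way to compare a clustering against the listed species or varieties.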