Unsupervised Learning in Python
👋 Welcome to your new workspace! Here, you can experiment with the data you used in Unsupervised Learning in Python and practice your newly learned skills with a challenge.
Below is a code cell that imports the course packages and loads in the course datasets as pandas DataFrames.
🏃 To execute the code, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import scipy.stats
# Import the course datasets as DataFrames
grains = pd.read_csv('datasets/grains.csv')
fish = pd.read_csv('datasets/fish.csv', header=None)
wine = pd.read_csv('datasets/wine.csv')
eurovision = pd.read_csv('datasets/eurovision-2016.csv')
stocks = pd.read_csv('datasets/company-stock-movements-2010-2015-incl.csv', index_col=0)
digits = pd.read_csv('datasets/lcd-digits.csv', header=None)
# Preview the first DataFrame
grains
Challenge Yourself
Don't know where to start? Add code to the code cell below to try the following challenge:
You work for an agricultural research center. Your manager wants you to group seed varieties based on the different measurements contained in the grains DataFrame. They also want to know how your clustering solution compares to the seed types listed in the dataset (the variety_number and variety columns). Try to use all of the relevant techniques you learned in Unsupervised Learning in Python!
Reminder: To execute the code you add to a cell, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell and automatically switch to the next cell.
# Use this cell (and add others as needed) to cluster the grains data!
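If you would like a starting point, here is one possible sketch of the challenge: standardize the measurements, run k-means, and cross-tabulate the cluster labels against the known seed types. The helper name cluster_and_compare, the small synthetic demo DataFrame, and its measure_a and measure_b columns are all made up for illustration; only the variety_number and variety column names come from the challenge text. Choosing three clusters is an assumption about the demo data, not a statement about grains.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_and_compare(df, label_col, n_clusters, drop_cols=()):
    """Cluster the numeric measurements and cross-tabulate against known labels.

    Hypothetical helper for illustration, not part of the course materials.
    """
    # Drop the label columns so they cannot leak into the clustering,
    # then keep only the numeric measurement columns.
    features = df.drop(columns=[label_col, *drop_cols]).select_dtypes("number")

    # Standardize each feature, then run k-means on the scaled values.
    model = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    )
    cluster_labels = pd.Series(
        model.fit_predict(features), index=df.index, name="cluster"
    )

    # Rows are cluster labels, columns are the known seed types.
    return pd.crosstab(cluster_labels, df[label_col])


# Synthetic stand-in for grains: three well-separated groups of 20 samples.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
demo = pd.DataFrame(points, columns=["measure_a", "measure_b"])
demo["variety_number"] = np.repeat([1, 2, 3], 20)
demo["variety"] = np.repeat(["A", "B", "C"], 20)

table = cluster_and_compare(demo, "variety", 3, drop_cols=["variety_number"])
print(table)
```

To try this on the real data, you could call cluster_and_compare(grains, "variety", 3, drop_cols=["variety_number"]), assuming the remaining columns are numeric measurements. A cross-tabulation with one dominant count per row suggests the clusters line up well with the seed types.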
Continue to Explore
Feeling confident about your skills? Continue to Machine Learning with Tree-Based Models in Python, or check out the other Machine Learning Scientist with Python Career Track courses to learn other advanced machine learning techniques.
If you're interested in exploring the remaining course datasets, you can refer to the DataFrames and potential problems below:
fish: Each row represents an individual fish. Standardize the features and cluster the fish by their measurements. You can then compare your cluster labels with the actual fish species (first column).
wine: There are three class_labels in this dataset. Transform the features to get the most accurate clustering.
eurovision: Perform hierarchical clustering of the voting countries using complete linkage and plot the resulting dendrogram.
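For the hierarchical clustering idea, a minimal sketch with SciPy might look like the following. The votes array and country_names list are a tiny invented stand-in (one row per voting country, one column per receiving country); the real eurovision DataFrame would probably need reshaping into that form first, and its layout is not assumed here.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Hypothetical voting table: rows are voting countries, columns are the
# points each one gave to four receiving countries.
country_names = ["A", "B", "C", "D"]
votes = np.array([
    [12.0, 10.0, 0.0, 1.0],
    [12.0, 8.0, 1.0, 0.0],
    [0.0, 1.0, 12.0, 10.0],
    [1.0, 0.0, 10.0, 12.0],
])

# Complete linkage: the distance between two clusters is the distance
# between their two farthest members.
mergings = linkage(votes, method="complete")

# no_plot=True only computes the dendrogram layout, so this also runs
# without a display. Remove it in a notebook to draw the tree with matplotlib.
tree = dendrogram(mergings, labels=country_names, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order

# Optionally cut the tree into at most two flat clusters.
groups = fcluster(mergings, t=2, criterion="maxclust")
print(groups)
```

In this toy table, countries A and B vote alike, as do C and D, so cutting the tree into two clusters should pair them up. For the fish and wine ideas, the same kind of comparison can be made by cross-tabulating cluster labels against the known species or class labels.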