Skip to content
Unsupervised Learning in Python
  • AI Chat
  • Code
  • Report
  • Spinner

    Unsupervised Learning in Python

    Run the hidden code cell below to import the data used in this course.

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import sklearn
    import scipy.stats 
    
    # Import the course datasets 
    grains = pd.read_csv('datasets/grains.csv')
    fish = pd.read_csv('datasets/fish.csv', header=None)
    wine = pd.read_csv('datasets/wine.csv')
    eurovision = pd.read_csv('datasets/eurovision-2016.csv')
    stocks = pd.read_csv('datasets/company-stock-movements-2010-2015-incl.csv', index_col=0)
    digits = pd.read_csv('datasets/lcd-digits.csv', header=None)
    
    
    print(eurovision.head())
    print(eurovision.shape)
    print(eurovision['From country'].unique().shape)
    
    from scipy.cluster.hierarchy import linkage, dendrogram
    
    # Calculate the linkage: mergings
    #mergings = linkage(samples, method='single')
    
    # Plot the dendrogram
    #dendrogram(mergings, labels=country_names, leaf_rotation=90, leaf_font_size=6)
    #plt.show()

    Take Notes

    Add notes about the concepts you've learned and code cells with code you want to keep.

    Add your notes here

    # Add your code snippets here

    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • You work for an agricultural research center. Your manager wants you to group seed varieties based on different measurements contained in the grains DataFrame. They also want to know how your clustering solution compares to the seed types listed in the dataset (the variety_number and variety columns). Try to use all of the relevant techniques you learned in Unsupervised Learning in Python!
    • In the fish DataFrame, each row represents an individual fish. Standardize the features and cluster the fish by their measurements. You can then compare your cluster labels with the actual fish species (first column).
    • In the wine DataFrame, there are three class_labels in this dataset. Transform the features to get the most accurate clustering.
    • In the eurovision DataFrame, perform hierarchical clustering of the voting countries using complete linkage and plot the resulting dendrogram.