(Data Analysis) EDA: find out what's in your dataset

    Please note: This activity was developed as a practical example to accompany the article "EDA: understanding the process through the PACE framework", published on my personal blog.

    Introduction

    In this activity, you will explore the features of a dataset and use visualisations to analyse them. In doing so, you will develop your exploratory data analysis (EDA) skills and your familiarity with the tools that let you explore and visualise data.

    EDA is an essential step in the data science workflow. As a data professional, you will need to lead this process to better understand the data at hand and to decide how to use it to address the problem you are working on. This activity lets you put the process into practice and prepares you for EDA in future projects.

    In this activity, you are a member of an analysis team providing insights to an investment firm. To help it decide which companies to invest in, the firm wants information on unicorn companies, that is, companies valued at more than $1 billion. The data for this task cover over 1,000 unicorn companies and include each company's industry, country, year founded, and selected investors. You will use this information to understand how and when companies reach this prestigious milestone and to recommend next steps to the investment firm.





    Step 1: Imports

    Import libraries and packages.

    First, import the relevant Python libraries and modules: the pandas library, the matplotlib.pyplot module, and the datetime module from the standard library.

    # Import libraries and packages
    import pandas as pd
    import matplotlib.pyplot as plt
    import datetime as dt

    Load the dataset into a DataFrame.

    The dataset is a CSV file called Unicorn_Companies.csv containing a subset of data on unicorn companies. Load the data from the CSV file into a DataFrame and save it in a variable.

    # Load data from the CSV file into a DataFrame and save it in a variable
    companies = pd.read_csv("data/Unicorn_Companies.csv")
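    If the CSV file is not at hand, the loading step can be tried on a small in-memory stand-in. This is a minimal sketch under stated assumptions: the column names and rows below are hypothetical placeholders loosely based on the fields described in the introduction, not values from Unicorn_Companies.csv. It relies on the fact that pd.read_csv accepts any file-like object, such as io.StringIO, in place of a path.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/Unicorn_Companies.csv: the column names and
# rows are illustrative placeholders, not taken from the real file.
sample_csv = io.StringIO(
    "Company,Industry,Country,Year Founded,Select Investors\n"
    "Company A,Fintech,Country X,2012,Investor 1\n"
    "Company B,Health,Country Y,2015,Investor 2\n"
    "Company C,E-commerce,Country X,2010,Investor 3\n"
)

# pd.read_csv accepts a file-like object, so the call mirrors the path-based one
companies = pd.read_csv(sample_csv)

# Quick sanity checks that the load worked: (rows, columns) and column types
print(companies.shape)
print(companies.dtypes)
```

    Checking shape and dtypes right after loading is a cheap way to confirm that the delimiter and header row were read as expected before any analysis begins.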




    Step 2: Data exploration

    View the first 10 rows of data.

    Next, explore the dataset and answer the questions that will guide your research and analysis. To begin, display the first ten rows of the data to understand how the dataset is structured.
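    A minimal sketch of this step, assuming a DataFrame named companies as loaded in Step 1. To keep it self-contained, the DataFrame here is a hypothetical placeholder with 15 rows, not real records from the dataset. DataFrame.head(n) returns the first n rows (5 by default).

```python
import pandas as pd

# Hypothetical stand-in for the companies DataFrame: 15 placeholder rows,
# not real records from Unicorn_Companies.csv.
companies = pd.DataFrame(
    {
        "Company": [f"Company {i}" for i in range(1, 16)],
        "Year Founded": [2000 + i for i in range(1, 16)],
    }
)

# head(n) returns the first n rows; here we ask for 10
first_ten = companies.head(10)
print(first_ten)
```

    In a Jupyter-style notebook, leaving companies.head(10) as the last expression in a cell renders it as a table without calling print.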