(Data Analysis) EDA: find out what's in your dataset

Find out what's in your dataset

Please note: This activity was developed as a practical example to accompany the article "EDA: understanding the process through the PACE framework", published on my personal blog.

Introduction

In this activity, you will explore the characteristics of a dataset and use visualisations to analyse them. Along the way, you will strengthen your exploratory data analysis (EDA) skills and your knowledge of the functions that let you explore and visualise data.

EDA is an essential process in the data science workflow. As a data professional, you will need to lead this process to understand the data at hand better and determine how to use it to solve the problem you want to address. This activity will allow you to put this process into practice and prepare you for EDA in future projects.

In this activity, you are a member of an analysis team providing information to an investment company. To help it decide which companies to invest in, the company wants information on unicorn companies, that is, companies valued at more than $1 billion. The dataset you will use for this task covers over 1,000 unicorn companies and includes their industry, country, year founded, and selected investors. You will use this information to understand how and when companies reach this prestigious milestone and to suggest next steps for the investment company.

Step 1: Imports

Importing libraries and packages.

First, import the relevant Python libraries and modules: the pandas library, the matplotlib.pyplot module, and the standard-library datetime module.

# Import libraries and packages
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt

Load the dataset into a DataFrame.

The dataset provided is a CSV file called Unicorn_Companies.csv containing a subset of data on unicorn companies. Load the data from the CSV file into a DataFrame and save it in a variable.

# Load data from the CSV file into a DataFrame and save it in a variable
companies = pd.read_csv("data/Unicorn_Companies.csv")
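If the file path is wrong, pd.read_csv raises an error that can be hard to read in a long notebook. The sketch below wraps the load in a small helper that fails early with a clear message. The helper name load_companies and the stand-in file with its two columns are hypothetical and only illustrate the idea; they are not part of the original activity or the real dataset.

```python
import os
import tempfile

import pandas as pd


def load_companies(path):
    """Read a CSV file into a DataFrame, failing early with a clear message."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"CSV file not found: {path}")
    return pd.read_csv(path)


# Demonstration on a tiny stand-in file (hypothetical values, not the real data)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w") as f:
        f.write("Company,Industry\nAlpha,Fintech\nBeta,Health\n")
    demo = load_companies(demo_path)

print(demo.shape)  # (2, 2): two rows, two columns
```

With the real file, the equivalent call would be load_companies("data/Unicorn_Companies.csv").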

Step 2: Data exploration

View the first 10 rows of data.

Next, explore the dataset and answer the questions that will guide your research and analysis of the data. To begin, display the first ten rows of the data to understand how the dataset is structured.
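One way to view the first ten rows is the DataFrame.head method, which returns the first n rows (five by default). The sketch below runs it on a small stand-in DataFrame so that it works without the CSV file; the column names and values are made up for illustration and may not match the real dataset.

```python
import pandas as pd

# Stand-in data: 12 rows with hypothetical column names and values
demo = pd.DataFrame({
    "Company": [f"Company {i}" for i in range(1, 13)],
    "Year Founded": [2000 + i for i in range(1, 13)],
})

# head(10) keeps only the first ten rows and leaves the original untouched
first_ten = demo.head(10)
print(first_ten)
```

On the DataFrame loaded earlier, the same call would be companies.head(10).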
