
NOTE TO SELF

Before you begin: the notebook "initial_testing_of_datacamp.ipynb" contains some useful material, but it was written with an older understanding of DataFrame functionality (before completing the DataCamp courses).

Notes:

  • The completed course "Data manipulation with pandas" is highly useful for this project.
    • Particularly the later stages are useful:
      • (3) Slicing and indexing dataframes
      • (4) Creating and visualizing dataframes
  • The completed course "Introduction to statistics in Python" is highly useful for this project.
    • Particularly the items that focus on handling data spread and cut-off.
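One common cut-off rule for handling spread is the 1.5 * IQR rule: values more than 1.5 interquartile ranges below the first quartile or above the third quartile are treated as outliers. The sketch below assumes that rule; the helper name `iqr_filter` and the toy values are hypothetical and not taken from the course or the unicorn dataset.

```python
import pandas as pd

def iqr_filter(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `column` lies in [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # boolean mask keeps only the rows inside the fences
    return df[(df[column] >= lower) & (df[column] <= upper)]

# toy data (hypothetical values, not from the unicorn dataset)
toy = pd.DataFrame({"valuation": [1, 2, 2, 3, 3, 4, 100]})
filtered = iqr_filter(toy, "valuation")
print(filtered["valuation"].tolist())
```

With these toy values Q1 = 2 and Q3 = 3.5, so the upper fence is 5.75 and only the value 100 is dropped.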
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

Analyzing unicorn company data

In this workspace, we'll be exploring the relationship between total funding a company receives and its valuation.

SQL cell (result saved as the DataFrame variable df):
SELECT * FROM companies INNER JOIN funding USING(company_id)
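For comparison, the SQL inner join can be sketched in pandas with `DataFrame.merge`. The two miniature tables below are hypothetical stand-ins for `companies` and `funding` (the real tables have more columns and rows); only the column names `company_id`, `continent`, `valuation` and `funding` come from this workspace.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
companies = pd.DataFrame({
    "company_id": [1, 2, 3],
    "continent": ["Asia", "Europe", "North America"],
})
funding = pd.DataFrame({
    "company_id": [1, 2, 4],
    "valuation": [10, 5, 7],
    "funding": [2, 1, 3],
})

# how="inner" keeps only company_id values present in both tables;
# on="company_id" produces a single key column, similar to SQL USING(company_id)
df_joined = companies.merge(funding, on="company_id", how="inner")
print(df_joined)
```

Companies 3 and 4 each appear in only one table, so they drop out, just as they would in the SQL inner join.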
print("Number of data points for this analysis (number of valuation-funding pairs) = " + str(len(df)))
# count missing (NaN) values per column
df.isna().sum()
df.head()
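If the missing-value count is non-zero for `funding` or `valuation`, one option is to drop the incomplete rows before analysis. The toy frame below is hypothetical and only illustrates what `isna().sum()` reports and how `dropna(subset=...)` behaves; whether dropping (rather than imputing) is appropriate for the real data is an assumption to check.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing funding value
toy = pd.DataFrame({
    "continent": ["Asia", "Europe", "Europe"],
    "funding": [2.0, np.nan, 1.0],
    "valuation": [10.0, 5.0, 4.0],
})

# number of NaN entries in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# drop rows where either quantity of interest is missing
toy_complete = toy.dropna(subset=["funding", "valuation"])
print(len(toy_complete))
```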
# make a copy of the input dataframe to enable checks towards the input data during data analysis
df_cp = df.copy()
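The copy is only useful if it is compared against later. A minimal sketch of such a check, assuming a small hypothetical stand-in for `df`, uses `pandas.testing.assert_frame_equal`, which raises an `AssertionError` if the two frames differ.

```python
import pandas as pd

# hypothetical stand-in for the SQL result
df = pd.DataFrame({"funding": [2, 1], "valuation": [10, 5]})
df_cp = df.copy()

# an analysis step that should not modify df: sort_values returns a new frame
df_sorted = df.sort_values("funding")

# raises AssertionError if df has changed since the copy was taken
pd.testing.assert_frame_equal(df, df_cp)
print(df.equals(df_cp))
```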
# set order of data sorting (based on columns)
SORT_ORDER = ["continent", "country", "city"]
# sort data based on the input SORT_ORDER
df_srt = df.sort_values(SORT_ORDER)
# reorder the columns so that the SORT_ORDER columns come first
# (copy SORT_ORDER with list() so that appending does not modify SORT_ORDER itself)
new_columns = list(SORT_ORDER)
for c in df.columns:
    if c in SORT_ORDER:
        continue
    new_columns.append(c)
df_srt = df_srt[new_columns]
df_srt
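In Python, an assignment such as `b = a` does not copy a list; both names refer to the same object, so appending through one name changes the other. A safe way to build the column order is therefore to create a new list. The sketch below uses a hypothetical toy frame with a subset of plausible columns.

```python
import pandas as pd

# aliasing demo: b is another name for the same list object as a
a = ["continent"]
b = a
b.append("country")
print(a)  # a has changed too

SORT_ORDER = ["continent", "country", "city"]
df = pd.DataFrame({
    "company_id": [2, 1],
    "valuation": [5, 10],
    "city": ["Berlin", "Beijing"],
    "country": ["Germany", "China"],
    "continent": ["Europe", "Asia"],
})

# list concatenation creates a NEW list, so SORT_ORDER stays unchanged
new_columns = SORT_ORDER + [c for c in df.columns if c not in SORT_ORDER]

df_srt = df.sort_values(SORT_ORDER)[new_columns]
print(df_srt)
```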
# scatter plot of valuation vs funding (all data)
df_srt.plot(kind="scatter", x="funding", y="valuation")
plt.show()
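To put a number on the funding-valuation relationship, a correlation coefficient can complement the scatter plot. Funding and valuation data are often strongly skewed, so the sketch below compares Pearson correlation on raw values, Pearson correlation after a log10 transform, and a rank correlation (Pearson on the ranks, i.e. Spearman's coefficient). The toy values, where valuation equals the square root of funding, are hypothetical and chosen so the relationship is monotonic but not linear.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: valuation = sqrt(funding), monotonic but not linear
toy = pd.DataFrame({
    "funding": [1.0, 4.0, 16.0, 64.0, 256.0],
    "valuation": [1.0, 2.0, 4.0, 8.0, 16.0],
})

# Pearson correlation on raw values (dominated by the few largest points)
r_raw = toy["funding"].corr(toy["valuation"])

# Pearson correlation after a log10 transform of both variables
r_loglog = np.log10(toy["funding"]).corr(np.log10(toy["valuation"]))

# rank correlation, computed here as Pearson correlation on the ranks
r_rank = toy["funding"].rank().corr(toy["valuation"].rank())

print(r_raw, r_loglog, r_rank)
```

Here the log-log and rank correlations are 1 while the raw Pearson value is lower. On the real data, `df_srt.plot(kind="scatter", x="funding", y="valuation", logx=True, logy=True)` draws the same scatter plot on logarithmic axes.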
# group funding by continent
funding_per_continent = df_srt.groupby("continent")["funding"].mean()
print("Mean value of funding per continent:\n")
print(funding_per_continent)
# group valuation by continent
valuation_per_continent = df_srt.groupby("continent")["valuation"].mean()
print("\nMean value of valuation per continent:\n")
print(valuation_per_continent)
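Both means can also be computed in a single `groupby` call, which returns one DataFrame indexed by continent. The sketch below uses a hypothetical toy frame; the `valuation_per_funding` ratio and the median/standard deviation summary are illustrative extras (not part of the original analysis) that relate the two means and hint at the spread within each group.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "funding": [2.0, 4.0, 1.0, 3.0],
    "valuation": [10.0, 30.0, 5.0, 9.0],
})

# one groupby call: mean funding and mean valuation side by side
means = toy.groupby("continent")[["funding", "valuation"]].mean()

# hypothetical extra metric: ratio of the continent means
means["valuation_per_funding"] = means["valuation"] / means["funding"]
print(means)

# mean vs median (and standard deviation) to gauge skew and spread per continent
summary = toy.groupby("continent")["valuation"].agg(["mean", "median", "std"])
print(summary)
```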
# visualise the above results: continents are categories, so use a bar chart
means_per_continent = pd.concat([funding_per_continent, valuation_per_continent], axis=1)
means_per_continent.plot(kind="bar")
plt.ylabel("Mean value")
plt.title("Mean funding and valuation by continent")
plt.show()