Project: Analyzing Unicorn Companies

Did you know that the average return from investing in stocks is 10% per year (not accounting for inflation)? But who wants to be average?!

You have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging. Providing them with this information gives them a competitive insight as to industry trends and how they should structure their portfolio looking forward.

You have been given access to their unicorns database, which contains the following tables:

dates

Column	Description
`company_id`	A unique ID for the company.
`date_joined`	The date that the company became a unicorn.
`year_founded`	The year that the company was founded.

funding

Column	Description
`company_id`	A unique ID for the company.
`valuation`	Company value in US dollars.
`funding`	The amount of funding raised in US dollars.
`select_investors`	A list of key investors in the company.

industries

Column	Description
`company_id`	A unique ID for the company.
`industry`	The industry that the company operates in.

companies

Column	Description
`company_id`	A unique ID for the company.
`company`	The name of the company.
`city`	The city where the company is headquartered.
`country`	The country where the company is headquartered.
`continent`	The continent where the company is headquartered.

The output

Your query should return a table in the following format:

industry	year	num_unicorns	average_valuation_billions
industry1	2021	---	---
industry2	2020	---	---
industry3	2019	---	---
industry1	2021	---	---
industry2	2020	---	---
industry3	2019	---	---
industry1	2021	---	---
industry2	2020	---	---
industry3	2019	---	---

Where industry1, industry2, and industry3 are the three top-performing industries.

DataFrameas

df

variable

with unicorn_data as (
	select d.company_id, 
		date_joined,
		year_founded,
		valuation,
		funding,
		select_investors,
		industry,
		company,
		city,
		country,
		continent
	from dates as d
	join funding as f
	on d.company_id = f.company_id
	join industries as i
	on f.company_id = i.company_id 
	join companies as c
	on i.company_id = c.company_id
),
	
	-- Step 1: Identify top 3 industries based on new unicorns in 2019-2021
top_industries AS (
    SELECT
        industry,
        COUNT(company) AS num_unicorns
    FROM unicorn_data
    WHERE EXTRACT(YEAR FROM date_joined) BETWEEN 2019 AND 2021
    GROUP BY industry
    ORDER BY num_unicorns DESC
    LIMIT 3
),

-- Step 2: Find yearly data for the top industries
industry_yearly_data AS (
    SELECT
        u.industry,
        EXTRACT(YEAR FROM u.date_joined) AS year,
        COUNT(u.company) AS num_unicorns,
        ROUND(AVG(u.valuation) / 1000000000, 2) AS average_valuation_billions
    FROM unicorn_data u
    JOIN top_industries t
        ON u.industry = t.industry
    WHERE EXTRACT(YEAR FROM u.date_joined) BETWEEN 2019 AND 2021
    GROUP BY u.industry, EXTRACT(YEAR FROM u.date_joined)
)

-- Step 3: Return results sorted by year and number of unicorns
SELECT 
    industry,
    year,
    num_unicorns,
    average_valuation_billions
FROM industry_yearly_data
ORDER BY year DESC, num_unicorns DESC;

Analyzing Unicorn Companies

Project Overview

In this project, I analyzed trends in unicorn companies across different industries to provide insights into the top-performing sectors. The goal was to identify the industries with the most new unicorns in 2019-2021, analyze their growth, and evaluate the average valuation of unicorns within these industries. The results aim to assist investment firms in understanding which industries are growing the fastest and which companies are likely to produce the highest returns.

Project Goals

Identify the top 3 industries with the highest number of new unicorns from 2019 to 2021.
Determine the number of unicorns in each of these industries and the year they became unicorns.
Calculate the average valuation of these unicorns in billions of dollars.
Provide the analysis in a clear, readable format, sorted by year and number of unicorns.

Data Sources

The dataset used in this project consists of the following tables:

dates: Contains the date when a company became a unicorn.
funding: Contains details about the company's funding rounds.
industries: Lists the industry to which each company belongs.
companies: Includes details about the companies (e.g., company name, location, and country).

Methodology

Data Preparation: I joined data from multiple tables (dates, funding, industries, companies) to create a comprehensive dataset for analysis.
Top Industries Analysis: I identified the top 3 industries based on the number of new unicorns in 2019-2021.
Industry Yearly Data: For these top industries, I calculated the number of unicorns per year and their average valuation (in billions).
Results: I returned the results in a table format, sorted by year and number of unicorns in descending order.

Key Features

SQL Aggregations: Used to calculate the number of unicorns per industry and year.
Joins: Combined data from multiple sources to generate comprehensive insights.
Data Grouping: Grouped data by industry and year for better analysis.
Valuation Analysis: Provided average valuations in billions of dollars, rounded to two decimal places.

Skills Used

SQL
Data Analysis
Data Manipulation
Aggregations and Grouping
Date Functions in SQL
JOIN operations
Business Insights

Conclusion

This project provided valuable insights into the trends of unicorn companies, helping to identify the fastest-growing industries and offering crucial data for investment decisions. It combines SQL skills with data manipulation and aggregation techniques to generate actionable business insights.