Skip to content

Did you know that the average return from investing in stocks is 10% per year (not accounting for inflation)? But who wants to be average?!

You have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging. Providing them with this information gives them a competitive insight as to industry trends and how they should structure their portfolio looking forward.

You have been given access to their unicorns database, which contains the following tables:

dates

ColumnDescription
company_idA unique ID for the company.
date_joinedThe date that the company became a unicorn.
year_foundedThe year that the company was founded.

funding

ColumnDescription
company_idA unique ID for the company.
valuationCompany value in US dollars.
fundingThe amount of funding raised in US dollars.
select_investorsA list of key investors in the company.

industries

ColumnDescription
company_idA unique ID for the company.
industryThe industry that the company operates in.

companies

ColumnDescription
company_idA unique ID for the company.
companyThe name of the company.
cityThe city where the company is headquartered.
countryThe country where the company is headquartered.
continentThe continent where the company is headquartered.

The output

Your query should return a table in the following format:

industryyearnum_unicornsaverage_valuation_billions
industry12021------
industry22020------
industry32019------
industry12021------
industry22020------
industry32019------
industry12021------
industry22020------
industry32019------

Where industry1, industry2, and industry3 are the three top-performing industries.

Spinner
DataFrameas
df
variable
with unicorn_data as (
	select d.company_id, 
		date_joined,
		year_founded,
		valuation,
		funding,
		select_investors,
		industry,
		company,
		city,
		country,
		continent
	from dates as d
	join funding as f
	on d.company_id = f.company_id
	join industries as i
	on f.company_id = i.company_id 
	join companies as c
	on i.company_id = c.company_id
),
	
	-- Step 1: Identify top 3 industries based on new unicorns in 2019-2021
top_industries AS (
    SELECT
        industry,
        COUNT(company) AS num_unicorns
    FROM unicorn_data
    WHERE EXTRACT(YEAR FROM date_joined) BETWEEN 2019 AND 2021
    GROUP BY industry
    ORDER BY num_unicorns DESC
    LIMIT 3
),

-- Step 2: Find yearly data for the top industries
industry_yearly_data AS (
    SELECT
        u.industry,
        EXTRACT(YEAR FROM u.date_joined) AS year,
        COUNT(u.company) AS num_unicorns,
        ROUND(AVG(u.valuation) / 1000000000, 2) AS average_valuation_billions
    FROM unicorn_data u
    JOIN top_industries t
        ON u.industry = t.industry
    WHERE EXTRACT(YEAR FROM u.date_joined) BETWEEN 2019 AND 2021
    GROUP BY u.industry, EXTRACT(YEAR FROM u.date_joined)
)

-- Step 3: Return results sorted by year and number of unicorns
SELECT 
    industry,
    year,
    num_unicorns,
    average_valuation_billions
FROM industry_yearly_data
ORDER BY year DESC, num_unicorns DESC;

Analyzing Unicorn Companies

Project Overview

In this project, I analyzed trends in unicorn companies across different industries to provide insights into the top-performing sectors. The goal was to identify the industries with the most new unicorns in 2019-2021, analyze their growth, and evaluate the average valuation of unicorns within these industries. The results aim to assist investment firms in understanding which industries are growing the fastest and which companies are likely to produce the highest returns.

Project Goals

  • Identify the top 3 industries with the highest number of new unicorns from 2019 to 2021.
  • Determine the number of unicorns in each of these industries and the year they became unicorns.
  • Calculate the average valuation of these unicorns in billions of dollars.
  • Provide the analysis in a clear, readable format, sorted by year and number of unicorns.

Data Sources

The dataset used in this project consists of the following tables:

  • dates: Contains the date when a company became a unicorn.
  • funding: Contains details about the company's funding rounds.
  • industries: Lists the industry to which each company belongs.
  • companies: Includes details about the companies (e.g., company name, location, and country).

Methodology

  1. Data Preparation: I joined data from multiple tables (dates, funding, industries, companies) to create a comprehensive dataset for analysis.
  2. Top Industries Analysis: I identified the top 3 industries based on the number of new unicorns in 2019-2021.
  3. Industry Yearly Data: For these top industries, I calculated the number of unicorns per year and their average valuation (in billions).
  4. Results: I returned the results in a table format, sorted by year and number of unicorns in descending order.

Key Features

  • SQL Aggregations: Used to calculate the number of unicorns per industry and year.
  • Joins: Combined data from multiple sources to generate comprehensive insights.
  • Data Grouping: Grouped data by industry and year for better analysis.
  • Valuation Analysis: Provided average valuations in billions of dollars, rounded to two decimal places.

Skills Used

  • SQL
  • Data Analysis
  • Data Manipulation
  • Aggregations and Grouping
  • Date Functions in SQL
  • JOIN operations
  • Business Insights

Conclusion

This project provided valuable insights into the trends of unicorn companies, helping to identify the fastest-growing industries and offering crucial data for investment decisions. It combines SQL skills with data manipulation and aggregation techniques to generate actionable business insights.