Did you know that the average return from investing in stocks is 10% per year (not accounting for inflation)? But who wants to be average?!
You have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging. Providing them with this information gives them a competitive insight as to industry trends and how they should structure their portfolio looking forward.
You have been given access to their unicorns database, which contains the following tables:
dates
| Column | Description |
|---|---|
company_id | A unique ID for the company. |
date_joined | The date that the company became a unicorn. |
year_founded | The year that the company was founded. |
funding
| Column | Description |
|---|---|
company_id | A unique ID for the company. |
valuation | Company value in US dollars. |
funding | The amount of funding raised in US dollars. |
select_investors | A list of key investors in the company. |
industries
| Column | Description |
|---|---|
company_id | A unique ID for the company. |
industry | The industry that the company operates in. |
companies
| Column | Description |
|---|---|
company_id | A unique ID for the company. |
company | The name of the company. |
city | The city where the company is headquartered. |
country | The country where the company is headquartered. |
continent | The continent where the company is headquartered. |
The output
Your query should return a table in the following format:
| industry | year | num_unicorns | average_valuation_billions |
|---|---|---|---|
| industry1 | 2021 | --- | --- |
| industry2 | 2020 | --- | --- |
| industry3 | 2019 | --- | --- |
| industry1 | 2021 | --- | --- |
| industry2 | 2020 | --- | --- |
| industry3 | 2019 | --- | --- |
| industry1 | 2021 | --- | --- |
| industry2 | 2020 | --- | --- |
| industry3 | 2019 | --- | --- |
Where industry1, industry2, and industry3 are the three top-performing industries.
with unicorn_data as (
select d.company_id,
date_joined,
year_founded,
valuation,
funding,
select_investors,
industry,
company,
city,
country,
continent
from dates as d
join funding as f
on d.company_id = f.company_id
join industries as i
on f.company_id = i.company_id
join companies as c
on i.company_id = c.company_id
),
-- Step 1: Identify top 3 industries based on new unicorns in 2019-2021
top_industries AS (
SELECT
industry,
COUNT(company) AS num_unicorns
FROM unicorn_data
WHERE EXTRACT(YEAR FROM date_joined) BETWEEN 2019 AND 2021
GROUP BY industry
ORDER BY num_unicorns DESC
LIMIT 3
),
-- Step 2: Find yearly data for the top industries
industry_yearly_data AS (
SELECT
u.industry,
EXTRACT(YEAR FROM u.date_joined) AS year,
COUNT(u.company) AS num_unicorns,
ROUND(AVG(u.valuation) / 1000000000, 2) AS average_valuation_billions
FROM unicorn_data u
JOIN top_industries t
ON u.industry = t.industry
WHERE EXTRACT(YEAR FROM u.date_joined) BETWEEN 2019 AND 2021
GROUP BY u.industry, EXTRACT(YEAR FROM u.date_joined)
)
-- Step 3: Return results sorted by year and number of unicorns
SELECT
industry,
year,
num_unicorns,
average_valuation_billions
FROM industry_yearly_data
ORDER BY year DESC, num_unicorns DESC;Analyzing Unicorn Companies
Project Overview
In this project, I analyzed trends in unicorn companies across different industries to provide insights into the top-performing sectors. The goal was to identify the industries with the most new unicorns in 2019-2021, analyze their growth, and evaluate the average valuation of unicorns within these industries. The results aim to assist investment firms in understanding which industries are growing the fastest and which companies are likely to produce the highest returns.
Project Goals
- Identify the top 3 industries with the highest number of new unicorns from 2019 to 2021.
- Determine the number of unicorns in each of these industries and the year they became unicorns.
- Calculate the average valuation of these unicorns in billions of dollars.
- Provide the analysis in a clear, readable format, sorted by year and number of unicorns.
Data Sources
The dataset used in this project consists of the following tables:
dates: Contains the date when a company became a unicorn.funding: Contains details about the company's funding rounds.industries: Lists the industry to which each company belongs.companies: Includes details about the companies (e.g., company name, location, and country).
Methodology
- Data Preparation: I joined data from multiple tables (
dates,funding,industries,companies) to create a comprehensive dataset for analysis. - Top Industries Analysis: I identified the top 3 industries based on the number of new unicorns in 2019-2021.
- Industry Yearly Data: For these top industries, I calculated the number of unicorns per year and their average valuation (in billions).
- Results: I returned the results in a table format, sorted by year and number of unicorns in descending order.
Key Features
- SQL Aggregations: Used to calculate the number of unicorns per industry and year.
- Joins: Combined data from multiple sources to generate comprehensive insights.
- Data Grouping: Grouped data by industry and year for better analysis.
- Valuation Analysis: Provided average valuations in billions of dollars, rounded to two decimal places.
Skills Used
- SQL
- Data Analysis
- Data Manipulation
- Aggregations and Grouping
- Date Functions in SQL
- JOIN operations
- Business Insights
Conclusion
This project provided valuable insights into the trends of unicorn companies, helping to identify the fastest-growing industries and offering crucial data for investment decisions. It combines SQL skills with data manipulation and aggregation techniques to generate actionable business insights.