This is a simple example of a project or task using SQL in the context of answering data related questions for managers in a workplace setting.
In this case, we have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging.
The specific task at hand, as solicited by management, is to identify the three best-performing industries based on the number of new unicorns created over the 2019 - 2021 period combined. They have requested a list that includes:
A - The industry
B - The year
C - The number of companies in these industries that became unicorns each year
D - The average valuation per industry per year, converted to billions
As the firm is interested in trends for the top-performing industries, the results should be displayed by year in descending order.
We have been given access to their unicorns
database, which contains the following tables:
dates
Column | Description |
---|---|
company_id | A unique ID for the company. |
date_joined | The date that the company became a unicorn. |
year_founded | The year that the company was founded. |
funding
Column | Description |
---|---|
company_id | A unique ID for the company. |
valuation | Company value in US dollars. |
funding | The amount of funding raised in US dollars. |
select_investors | A list of key investors in the company. |
industries
Column | Description |
---|---|
company_id | A unique ID for the company. |
industry | The industry that the company operates in. |
companies
Column | Description |
---|---|
company_id | A unique ID for the company. |
company | The name of the company. |
city | The city where the company is headquartered. |
country | The country where the company is headquartered. |
continent | The continent where the company is headquartered. |
Before jumping into the main query we want to identify those top 3 industries:
--top 3 industries
SELECT industries.industry, COUNT(dates.company_id) AS "#_companies"
FROM dates
JOIN industries ON industries.company_id = dates.company_id
WHERE date_part('year', dates.date_joined) BETWEEN 2019 AND 2021
GROUP BY industries.industry
ORDER BY COUNT(dates.company_id) DESC
LIMIT 3
After identifying the top 3 we can form the full list as requested:
-- Final list
SELECT industries.industry AS "industry",
date_part('year', dates.date_joined) AS "year",
COUNT(dates.company_id) AS "num_unicorns",
ROUND(AVG(funding.valuation) / 1000000000, 2) AS "average_valuation_billions"
FROM industries
JOIN dates ON industries.company_id = dates.company_id
JOIN funding ON funding.company_id = dates.company_id
WHERE industries.industry IN ('Fintech',
'Internet software & services',
'E-commerce & direct-to-consumer') AND date_part('year', dates.date_joined) BETWEEN 2019 AND 2021
GROUP BY industries.industry, date_part('year', dates.date_joined)
ORDER BY industries.industry, date_part('year', dates.date_joined) DESC
After producing the list we see that the top 3 industries are: E-commerce, Internet software & services and Fintech.
The highest number of unicorns in a single year was 138 in 2021 within the Fintech industry (a big jump compared to the previous years)
The peak average valuation (per company) was reached by the Internet sfotware & services industry in 2020 with 4.35 billion dollars!
Now we can share this initial list with management and wait for further requests or questions.