Did you know that the average return from investing in stocks is 10% per year (not accounting for inflation)? But who wants to be average?!
You have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging. Providing them with this information gives them a competitive insight as to industry trends and how they should structure their portfolio looking forward.
You have been given access to their unicorns
database, which contains the following tables:
dates
Column | Description |
---|---|
company_id | A unique ID for the company. |
date_joined | The date that the company became a unicorn. |
year_founded | The year that the company was founded. |
funding
Column | Description |
---|---|
company_id | A unique ID for the company. |
valuation | Company value in US dollars. |
funding | The amount of funding raised in US dollars. |
select_investors | A list of key investors in the company. |
industries
Column | Description |
---|---|
company_id | A unique ID for the company. |
industry | The industry that the company operates in. |
companies
Column | Description |
---|---|
company_id | A unique ID for the company. |
company | The name of the company. |
city | The city where the company is headquartered. |
country | The country where the company is headquartered. |
continent | The continent where the company is headquartered. |
The output
Your query should return a table in the following format:
industry | year | num_unicorns | average_valuation_billions |
---|---|---|---|
industry1 | 2021 | --- | --- |
industry2 | 2020 | --- | --- |
industry3 | 2019 | --- | --- |
industry1 | 2021 | --- | --- |
industry2 | 2020 | --- | --- |
industry3 | 2019 | --- | --- |
industry1 | 2021 | --- | --- |
industry2 | 2020 | --- | --- |
industry3 | 2019 | --- | --- |
Where industry1
, industry2
, and industry3
are the three top-performing industries.
The Work
Step 1 - Looking at each column from the tables
Despite the table schema already provided above, it's also good practice to look at how the actual data on the table look. To do that, we showed the first 5 rows of each tables by selecting all columns and limiting the output to 5.
-- Showing the dates table data
SELECT * FROM dates LIMIT 5;
-- Showing the funding table data
SELECT * FROM funding LIMIT 5;
-- Showing the industries table data
SELECT * FROM industries LIMIT 5;
-- Showing the companies table data
SELECT * FROM companies LIMIT 5;
-- note that there is a null data in city column. Since we did'nt use the city column, it was left as is
Step 2 - Get the industry rank based on the total number of new unicorns from 2019 - 2021
To do that, two table were merged, the industries table to get the name of the industries and the dates table to filter the analysis just for the newcomers in 2019, 2020, and 2021. To rank the Industry, rank() window function is used over the number of unique id in each industries.
note : this query will act as a CTE in the full query that will be done in the later step. For this step, we do this query independently to show the result of this query.
SELECT industry,
COUNT(DISTINCT company_id) AS newcomer_number,
RANK() OVER(ORDER BY COUNT(DISTINCT company_id) DESC)
FROM dates
JOIN industries
USING(company_id)
WHERE EXTRACT(year FROM date_joined) IN (2019,2020,2021)
GROUP BY industry
Step 3 - Get the average valuations and newcomers number of each industry in each year from 2019 - 2021.
In order to do that, three table were merged using company_id column. The industries table is used to get the industry names, the funding table is used to get the average valuations, and the dates table is used to filter the years used for analysis to only in 2019, 2020, 2021
note : this query will act as a CTE in the full query that will be done in the later step. For this step, we do this query independently to show the result of this query.
SELECT industry,
d.year,
ROUND(AVG(f.valuation),2) AS avg_valuation_billions,
COUNT(i.company_id) as newcomers
FROM industries AS i
JOIN (
SELECT company_id, valuation/1000000000 AS valuation
FROM funding
) AS f
USING(company_id)
JOIN (
SELECT company_id, EXTRACT(year FROM date_joined) AS year
FROM dates
) AS d
USING(company_id)
WHERE d.year IN (2019,2020,2021)
GROUP BY industry, d.year
Step 4 - recreate the previous two queries as CTE and join them to match the desired output
Using WITH statement, the previous queries are converted to CTE. The query on step on is named best_industry, while the query on the step 2 is named industry_data.
Then, the two CTE's are joined, with the best_industry CTE for filtering the best three industries and the industry_data providing the data of each best industries from 2019 - 2021.
WITH best_industry AS(
SELECT industry,
COUNT(DISTINCT company_id) AS newcomer_number,
RANK() OVER(ORDER BY COUNT(DISTINCT company_id) DESC)
FROM dates
JOIN industries
USING(company_id)
WHERE EXTRACT(year FROM date_joined) IN (2019,2020,2021)
GROUP BY industry),
industry_data AS(
SELECT industry,
d.year,
ROUND(AVG(f.valuation),2) AS avg_valuation_billions,
COUNT(i.company_id) as newcomers
FROM industries AS i
JOIN (
SELECT company_id, valuation/1000000000 AS valuation
FROM funding
) AS f
USING(company_id)
JOIN (
SELECT company_id, EXTRACT(year FROM date_joined) AS year
FROM dates
) AS d
USING(company_id)
WHERE d.year IN (2019,2020,2021)
GROUP BY industry, d.year
)
SELECT industry,
year,
newcomers AS num_unicorns,
avg_valuation_billions
FROM industry_data
JOIN best_industry
USING(industry)
WHERE rank IN(1,2,3)
ORDER BY rank, year DESC, num_unicorns DESC;