Skip to content

Did you know that the average return from investing in stocks is 10% per year! But who wants to be average?!

You have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging. Providing them with this information gives them a competitive insight as to industry trends and how they should structure their portfolio looking forward.

You have been given access to their unicorns database, which contains the following tables:

dates

ColumnDescription
company_idA unique ID for the company.
date_joinedThe date that the company became a unicorn.
year_foundedThe year that the company was founded.

funding

ColumnDescription
company_idA unique ID for the company.
valuationCompany value in US dollars.
fundingThe amount of funding raised in US dollars.
select_investorsA list of key investors in the company.

industries

ColumnDescription
company_idA unique ID for the company.
industryThe industry that the company operates in.

companies

ColumnDescription
company_idA unique ID for the company.
companyThe name of the company.
cityThe city where the company is headquartered.
countryThe country where the company is headquartered.
continentThe continent where the company is headquartered.
Spinner
DataFrameas
df
variable
SELECT industry, count(industries.company_id) AS company_count
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3;

Next, use the above code as a CTE

Spinner
DataFrameas
df1
variable
WITH top3 AS
 (SELECT industry, 
  count(industries.company_id) AS company_count,
  rank() OVER( ORDER BY count(industries.company_id) DESC) AS rank
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3)

SELECT industries.industry, 
	date_part('year',date_joined) AS year,
	--round(valuation/1000000000,2)
	count(*) AS num_unicorns,
	round(avg(valuation)/1000000000,2) AS average_valuation_billions
FROM industries 
	LEFT JOIN dates
		ON industries.company_id= dates.company_id
	LEFT JOIN funding
		ON industries.company_id= industries.company_id
	LEFT JOIN top3
		ON industries.industry= top3.industry
WHERE industries.industry IN 
	(SELECT industry
	 FROM top3) AND
	 date_part('year',date_joined) IN (2019,2010,2021)
GROUP BY rank, industries.industry, year
ORDER BY rank, year DESC;
Spinner
DataFrameas
df2
variable
WITH top3 AS
 (SELECT industry, 
  count(industries.company_id) AS company_count,
  rank() OVER( ORDER BY count(industries.company_id) DESC) AS rank
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3)

SELECT industries.industry, 
	date_part('year',date_joined) AS year,
	avg(funding.valuation),
	count(*) AS num_unicorns
	--round(avg(valuation)/1000000000,2) AS average_valuation_billions
FROM companies
	LEFT JOIN industries
		ON industries.company_id= companies.company_id
	LEFT JOIN dates
		ON industries.company_id= dates.company_id
	LEFT JOIN funding
		ON industries.company_id= industries.company_id
	LEFT JOIN top3
		ON industries.industry= top3.industry
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31' AND
	industries.industry IN 
		(SELECT industry
		 FROM top3)
GROUP BY rank, industries.industry, year
ORDER BY rank, year DESC;
Spinner
DataFrameas
df3
variable
WITH top3 AS
 (SELECT industry, 
  count(industries.company_id) AS company_count,
  rank() OVER( ORDER BY count(industries.company_id) DESC) AS rank
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3)

SELECT industries.industry, 
	date_part('year',date_joined) AS year
	--avg(funding.valuation),
	--count(*) AS num_unicorns
	--round(avg(valuation)/1000000000,2) AS average_valuation_billions

	FROM companies
		LEFT JOIN industries
			ON industries.company_id= companies.company_id
		LEFT JOIN dates
			ON companies.company_id= dates.company_id
		LEFT JOIN funding
			ON companies.company_id= industries.company_id
	WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31' AND
		industries.industry IN 
			(SELECT industry
		 	FROM top3)


--GROUP BY rank, joins.industry, year
--ORDER BY rank, year DESC;
Spinner
DataFrameas
df4
variable
WITH top3 AS 
 (SELECT industry,
  		COUNT(company_id) AS cnt
  FROM industries
  GROUP BY industry
  ORDER BY cnt DESC
  LIMIT 3)
  
SELECT industry,
	date_part('year',date_joined) AS year,
	count(*) AS num_unicorns,
	round(avg(valuation)/1000000000,2) AS average_valuation_billions
FROM companies LEFT JOIN industries
	ON companies.company_id= industries.company_id
 LEFT JOIN dates
	ON companies.company_id= dates.company_id
LEFT JOIN funding
	ON companies.company_id= funding.company_id
WHERE industries.industry IN
	(SELECT industry
	FROM top3) AND
	dates.date_joined >= '2019-01-01' AND
	dates.date_joined <= '2021-12-31'
GROUP BY industry, year
ORDER BY industry, year DESC;