Project: Analyzing Unicorn Companies

Did you know that the average return from investing in stocks is 10% per year! But who wants to be average?!

You have been asked to support an investment firm by analyzing trends in high-growth companies. They are interested in understanding which industries are producing the highest valuations and the rate at which new high-value companies are emerging. Providing them with this information gives them a competitive insight as to industry trends and how they should structure their portfolio looking forward.

You have been given access to their unicorns database, which contains the following tables:

dates

Column	Description
company_id	A unique ID for the company.
date_joined	The date that the company became a unicorn.
year_founded	The year that the company was founded.

funding

Column	Description
company_id	A unique ID for the company.
valuation	Company value in US dollars.
funding	The amount of funding raised in US dollars.
select_investors	A list of key investors in the company.

industries

Column	Description
company_id	A unique ID for the company.
industry	The industry that the company operates in.

companies

Column	Description
company_id	A unique ID for the company.
company	The name of the company.
city	The city where the company is headquartered.
country	The country where the company is headquartered.
continent	The continent where the company is headquartered.

DataFrameas

df

variable

SELECT industry, count(industries.company_id) AS company_count
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3;

Next, use the above code as a CTE

DataFrameas

df1

variable

WITH top3 AS
 (SELECT industry, 
  count(industries.company_id) AS company_count,
  rank() OVER( ORDER BY count(industries.company_id) DESC) AS rank
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3)

SELECT industries.industry, 
	date_part('year',date_joined) AS year,
	--round(valuation/1000000000,2)
	count(*) AS num_unicorns,
	round(avg(valuation)/1000000000,2) AS average_valuation_billions
FROM industries 
	LEFT JOIN dates
		ON industries.company_id= dates.company_id
	LEFT JOIN funding
		ON industries.company_id= industries.company_id
	LEFT JOIN top3
		ON industries.industry= top3.industry
WHERE industries.industry IN 
	(SELECT industry
	 FROM top3) AND
	 date_part('year',date_joined) IN (2019,2010,2021)
GROUP BY rank, industries.industry, year
ORDER BY rank, year DESC;

DataFrameas

df2

variable

WITH top3 AS
 (SELECT industry, 
  count(industries.company_id) AS company_count,
  rank() OVER( ORDER BY count(industries.company_id) DESC) AS rank
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3)

SELECT industries.industry, 
	date_part('year',date_joined) AS year,
	avg(funding.valuation),
	count(*) AS num_unicorns
	--round(avg(valuation)/1000000000,2) AS average_valuation_billions
FROM companies
	LEFT JOIN industries
		ON industries.company_id= companies.company_id
	LEFT JOIN dates
		ON industries.company_id= dates.company_id
	LEFT JOIN funding
		ON industries.company_id= industries.company_id
	LEFT JOIN top3
		ON industries.industry= top3.industry
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31' AND
	industries.industry IN 
		(SELECT industry
		 FROM top3)
GROUP BY rank, industries.industry, year
ORDER BY rank, year DESC;

DataFrameas

df3

variable

WITH top3 AS
 (SELECT industry, 
  count(industries.company_id) AS company_count,
  rank() OVER( ORDER BY count(industries.company_id) DESC) AS rank
FROM industries LEFT JOIN dates
	ON industries.company_id= dates.company_id
WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31'
GROUP BY industry
ORDER BY company_count DESC
LIMIT 3)

SELECT industries.industry, 
	date_part('year',date_joined) AS year
	--avg(funding.valuation),
	--count(*) AS num_unicorns
	--round(avg(valuation)/1000000000,2) AS average_valuation_billions

	FROM companies
		LEFT JOIN industries
			ON industries.company_id= companies.company_id
		LEFT JOIN dates
			ON companies.company_id= dates.company_id
		LEFT JOIN funding
			ON companies.company_id= industries.company_id
	WHERE date_joined>= '2019-01-01' AND date_joined<= '2021-12-31' AND
		industries.industry IN 
			(SELECT industry
		 	FROM top3)


--GROUP BY rank, joins.industry, year
--ORDER BY rank, year DESC;

DataFrameas

df4

variable

WITH top3 AS 
 (SELECT industry,
  		COUNT(company_id) AS cnt
  FROM industries
  GROUP BY industry
  ORDER BY cnt DESC
  LIMIT 3)
  
SELECT industry,
	date_part('year',date_joined) AS year,
	count(*) AS num_unicorns,
	round(avg(valuation)/1000000000,2) AS average_valuation_billions
FROM companies LEFT JOIN industries
	ON companies.company_id= industries.company_id
 LEFT JOIN dates
	ON companies.company_id= dates.company_id
LEFT JOIN funding
	ON companies.company_id= funding.company_id
WHERE industries.industry IN
	(SELECT industry
	FROM top3) AND
	dates.date_joined >= '2019-01-01' AND
	dates.date_joined <= '2021-12-31'
GROUP BY industry, year
ORDER BY industry, year DESC;