Identifying Trends Amongst Most Successful Startups in the US

Project

Identifying Trends Amongst the Most Successful Startups in the US

Source

DataCamp's "Unicorn Companies" PostgreSQL database

Objective

Analyze data for US-based "Unicorn Companies" (startup companies valued over $1 billion) and identify:

The most valuable startups in the US
The most common US cities where "Unicorn Companies" are based out of
The most common US industries "Unicorn Companies" specialize in
Most common investors for American "Unicorn Companies"

1. Write a query that returns all American "Unicorn Companies" along with their city, industry, investors, funding, and valuation.

Queryas

unicorns_usa

variable

SELECT 
	companies.company_id, 
	company, city, industry, select_investors,
    ROUND((funding/1000000000.0),1) AS funding_billions, 
    ROUND((valuation/1000000000.0),1) AS valuation_billions 
FROM companies
INNER JOIN industries ON companies.company_id = industries.company_id
INNER JOIN funding ON companies.company_id = funding.company_id
WHERE country = 'United States';

2. Identify and visualize top 10 most valuable startups in the US

DataFrameas

usa_top_startups

variable

SELECT 
	company, 
	valuation_billions,
	RANK() OVER (ORDER BY valuation_billions DESC) AS rank
FROM unicorns_usa
ORDER BY valuation_billions DESC
LIMIT 11;

#import packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#define ranking column modifier (adds a 'T-' to the number if it's tied with another entry)
def rank_mod(df):
    df['rank'] = df['rank'].astype(str)
    dup_ranks = df.groupby('rank').filter(lambda x: len(x) > 1)
    for index, row in dup_ranks.iterrows():
        df.loc[index, 'rank'] = 'T-' + row['rank']

#create label column
rank_mod(usa_top_startups)
usa_top_startups['label'] = usa_top_startups['rank'] + ". " + usa_top_startups['company']


#create bar graph
startups_vis = sns.barplot(data=usa_top_startups, y='label',x='valuation_billions', color='forestgreen').set(title='Top 10 Most Valuable Startups in US', xlabel='Valuation ($ Billions)', ylabel='')
plt.show()

3. Identify and visualize top 10 most popular US cities for "Unicorn Companies"

DataFrameas

usa_top_cities

variable

SELECT 
	city,
	count(company_id) AS num_companies, 
	RANK() OVER (ORDER BY count(company_id) DESC) AS rank
FROM unicorns_usa
GROUP BY city 
ORDER BY num_companies DESC
LIMIT 11;

#create label column
rank_mod(usa_top_cities)
usa_top_cities['label'] = usa_top_cities['rank'] + ". " + usa_top_cities['city']

#create bar graph
cities_vis = sns.barplot(data=usa_top_cities, y='label',x='num_companies', color = 'forestgreen').set(title='Top 10 US Cities with Most Startups Valued Over $1B', xlabel='# of Companies', ylabel='')
plt.show()

4. Identify and visualize top 10 most poular US industries for "Unicorn Companies"

DataFrameas

usa_top_industries

variable

SELECT 
	industry,
	count(company_id) AS num_companies, 
	RANK() OVER (ORDER BY count(company_id) DESC) AS rank
FROM unicorns_usa
GROUP BY industry 
ORDER BY num_companies DESC
LIMIT 10;

#create label column
rank_mod(usa_top_industries)
usa_top_industries['label'] = usa_top_industries['rank'] + ". " + usa_top_industries['industry']

#create bar graph
cities_vis = sns.barplot(data=usa_top_industries, y='label',x='num_companies', color = 'forestgreen').set(title='Top 10 US Industries with Most Startups Valued Over $1B', xlabel='# of Companies', ylabel='')
plt.show()

5. Identify and visualize top 10 investors who have invested in the most American "Unicorn Companies"

DataFrameas

usa_investors

variable

SELECT company, select_investors
FROM unicorns_usa;

#Split comma-separated investors in select_investors column and stack into new df
investors_series = usa_investors['select_investors'].str.split(', ').apply(pd.Series).stack()

#count occurrences of each investor
investor_counts = investors_series.value_counts().rename_axis('investor').reset_index(name='num_companies')

#create ranking column
investor_counts['rank'] = investor_counts['num_companies'].rank(ascending=False)
investor_counts['rank'] = (investor_counts['rank'] - (investor_counts['rank'] % 1)).astype(int)

#create top 10 investors table
usa_top_investors = investor_counts[0:11]
print(usa_top_investors)

‌
‌
‌

Identifying Trends Amongst Most Successful Startups in the US

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Project

Source

Objective

1. Write a query that returns all American "Unicorn Companies" along with their city, industry, investors, funding, and valuation.

2. Identify and visualize top 10 most valuable startups in the US

3. Identify and visualize top 10 most popular US cities for "Unicorn Companies"

4. Identify and visualize top 10 most poular US industries for "Unicorn Companies"

5. Identify and visualize top 10 investors who have invested in the most American "Unicorn Companies"

Project