Skip to content

Union Square Business Distribution Analysis

Objective:

To analyze the types, categories, and distribution of businesses within the Union Square area, providing insights for urban planning, business development, and community services.

In this project, I will use NY Union Square Partnership (USP) Business List data from Kaggle

Here is the data description:

Field NameDescription
CategoryThe broad category of the business (e.g., Food and Drink, Retail).
Type of BusinessMore specific business type (e.g., Restaurants, Apparel and Footwear).
BusinessName of the business.
AddressAddress of the business.
City/StateCity and State where the business is located.
Zip CodeZip code of the business location.
Phone NumberContact phone number of the business.
BoroughBorough in which the business is located.
LatitudeLatitude coordinate of the business location.
LongitudeLongitude coordinate of the business location.
Community BoardAdministrative identifier for community board.
Council DistrictAdministrative identifier for council district.
Census TractAdministrative identifier for census tract.
BINAdministrative identifier for BIN (Building Identification Number)
BBLAdministrative identifier for BBL (Borough Block Lot).
NTAAdministrative identifier for NTA (Neighborhood Tabulation Area)
Location 1JSON-style string containing latitude and longitude.
Administrative columns Zip Codes, Community Districts, Borough Boundaries, City Council Districts, and Police Precincts.Various administrative and geographical identifiers.
Spinner
DataFrameas
unions
variable
--code to save the CSV file as unions
SELECT * FROM 'union.csv';
Spinner
DataFrameas
df6
variable
-- Stored Procedure for Counting Businesses by a Specified Column
CREATE PROCEDURE CountBusiness(IN column_name VARCHAR(255))
BEGIN
    SET @sql_query = CONCAT(
        'SELECT ', column_name, ', COUNT(*) AS Business_Count ',
        'FROM unions ',
        'GROUP BY ', column_name, ' ',
        'ORDER BY Business_Count DESC'
    );

    PREPARE stmt FROM @sql_query;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END;
Spinner
DataFrameas
df1
variable
-- Count the number of businesses by category using CountBusiness SP
CALL CountBusiness('Category');
Spinner
DataFrameas
df
variable
-- Count the number of businesses by type using CountBusiness SP
CALL CountBusiness('"Type of Business"');

The procedure is created to dynamically count businesses based on any specified column. It constructs the SQL query dynamically using the input parameter.

Counts the number of businesses for each category and orders them in descending order to identify the most common business categories.

Counts the number of businesses for each type and orders them in descending order to identify the most common business types.

Spinner
DataFrameas
df4
variable
-- Count the number of businesses by ZIP code using CountBusiness SP
CALL CountBusiness('"Zip Code"');

Counts the number of businesses in each ZIP code to understand the geographical distribution of businesses.

Spinner
DataFrameas
df2
variable
-- Calculate the density of businesses in various community districts
SELECT "Community Districts", COUNT(*) AS Business_Count
FROM unions
WHERE "Community Districts" IS NOT NULL
GROUP BY "Community Districts"
ORDER BY Business_Count DESC;

Counts the number of businesses in each community district to understand business density and identify districts with the highest concentration of businesses.

Spinner
DataFrameas
df5
variable
-- Identify the top 5 most diverse business categories (categories with the most different types of businesses)
SELECT Category, COUNT(DISTINCT "Type of Business") AS Type_Count
FROM unions
GROUP BY Category
ORDER BY Type_Count DESC
LIMIT 5;