Trees in Manhattan

Dashboard

Tableau Dashboard

Introduction

Background 📖

A nonprofit organization is advising the planning department on ways to improve the quantity and quality of trees in New York City. The urban design team considers tree size (using trunk diameter as a proxy) and health the most desirable characteristics of city trees. The city would like to learn which tree species are the best choice to plant on the streets of Manhattan. This notebook sets out to answer that question.

Goals and Objectives 🎯

The report needs to cover the following:

  • What are the most common tree species in Manhattan?
  • Which are the neighborhoods with the most trees?
  • A visualization of Manhattan's neighborhoods and tree locations.
  • What ten tree species would you recommend the city plant in the future?

Data Source 💾

The team has provided access to the 2015 tree census and geographical information on New York City neighborhoods (trees, neighborhoods):

Tree Census

| Column | Description |
| --- | --- |
| tree_id | Unique id of each tree. |
| tree_dbh | The diameter of the tree in inches measured at 54 inches above the ground. |
| curb_loc | Location of the tree bed in relation to the curb. Either along the curb (OnCurb) or offset from the curb (OffsetFromCurb). |
| spc_common | Common name for the species. |
| status | Indicates whether the tree is alive or standing dead. |
| health | Indication of the tree's health (Good, Fair, and Poor). |
| root_stone | Indicates the presence of a root problem caused by paving stones in the tree bed. |
| root_grate | Indicates the presence of a root problem caused by metal grates in the tree bed. |
| root_other | Indicates the presence of other root problems. |
| trunk_wire | Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk. |
| trnk_light | Indicates the presence of a trunk problem caused by lighting installed on the tree. |
| trnk_other | Indicates the presence of other trunk problems. |
| brch_light | Indicates the presence of a branch problem caused by lights or wires in the branches. |
| brch_shoe | Indicates the presence of a branch problem caused by shoes in the branches. |
| brch_other | Indicates the presence of other branch problems. |
| postcode | Five-digit zip code where the tree is located. |
| nta | Neighborhood Tabulation Area (NTA) code from the 2010 US Census for the tree. |
| nta_name | Neighborhood name. |
| latitude | Latitude of the tree, in decimal degrees. |
| longitude | Longitude of the tree, in decimal degrees. |
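With these columns, the first two questions in the goals come down to frequency counts over `spc_common` and `nta_name`. The following is a minimal sketch on made-up rows, not census values; the `top_counts` helper is a hypothetical name introduced only for illustration.

```python
import pandas as pd

# Made-up rows that mimic the census columns; not real census data.
trees = pd.DataFrame({
    "spc_common": ["honeylocust", "ginkgo", "honeylocust", "pin oak", None],
    "nta_name": ["Upper West Side", "Upper West Side", "Chinatown",
                 "Upper West Side", "Chinatown"],
})


def top_counts(df: pd.DataFrame, column: str, n: int = 10) -> pd.Series:
    """Return the n most frequent non-null values in `column` (hypothetical helper)."""
    # value_counts() skips missing values by default and sorts descending.
    return df[column].value_counts().head(n)


species_counts = top_counts(trees, "spc_common")
neighborhood_counts = top_counts(trees, "nta_name")
print(species_counts)
print(neighborhood_counts)
```

Because `value_counts()` ignores missing values, trees without a recorded species do not show up in the species ranking, which is worth keeping in mind when comparing it with the total row count.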

Neighborhoods' geographical information

| Column | Description |
| --- | --- |
| ntacode | NTA code (matches Tree Census information). |
| ntaname | Neighborhood name (matches Tree Census information). |
| geometry | Polygon that defines the neighborhood. |
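Since `ntacode` matches the census `nta` column, the two tables can be linked on that key. Here is a small sketch of the join using plain pandas and placeholder codes and names (all invented for illustration). The same pattern should carry over to a GeoDataFrame, which also supports `merge`.

```python
import pandas as pd

# Placeholder rows; the codes and names are invented for illustration.
trees = pd.DataFrame({
    "tree_id": [1, 2, 3, 4],
    "nta": ["N01", "N01", "N02", "N99"],
})
neighborhoods = pd.DataFrame({
    "ntacode": ["N01", "N02", "N03"],
    "ntaname": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
})

# Count trees per NTA code, then attach the counts to the neighborhood table.
tree_counts = trees.groupby("nta").size().rename("n_trees").reset_index()
merged = neighborhoods.merge(
    tree_counts, left_on="ntacode", right_on="nta", how="left", indicator=True
)
# Neighborhoods with no matching trees get a count of zero.
merged["n_trees"] = merged["n_trees"].fillna(0).astype(int)

# Codes present in the census but missing from the neighborhood table.
unmatched = set(trees["nta"]) - set(neighborhoods["ntacode"])
print(merged[["ntacode", "ntaname", "n_trees", "_merge"]])
print(unmatched)
```

A left join keeps every neighborhood, including those with no trees, and the `indicator` column makes it easy to see which rows found a match.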

Executive Summary

This section summarizes the analysis carried out in the cells that follow.

Analysis Overview

This analysis explores the dataset to gain insights and support data-driven decisions. It covers data cleaning, preprocessing, and exploratory data analysis.

Key Findings

  • The honeylocust is the most abundant tree in Manhattan, with a total of 13,176 scattered throughout the borough.
  • The Upper West Side is home to an impressive array of trees, with a total of 5,807 specimens representing 73 different species. The most popular of these is the honeylocust.

| No. | Species | Probability of Good | Probability of Fair | Probability of Poor | Number of Trees Planted |
| --- | --- | :-: | :-: | :-: | --: |
| 1 | honeylocust | 83% | 15% | 2% | 13,175 |
| 2 | Callery pear | 74% | 20% | 6% | 7,297 |
| 3 | ginkgo | 75% | 17% | 8% | 5,859 |
| 4 | pin oak | 81% | 16% | 3% | 4,584 |
| 5 | Sophora | 80% | 16% | 4% | 4,453 |
| 6 | Japanese zelkova | 77% | 17% | 6% | 4,122 |
| 7 | London planetree | 63% | 28% | 9% | 3,596 |
| 8 | littleleaf linden | 62% | 24% | 14% | 3,333 |
| 9 | American elm | 80% | 15% | 5% | 1,698 |
| 10 | American linden | 64% | 24% | 12% | 1,583 |
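One way that per-species health percentages like these could be derived is a row-normalized cross-tabulation of `spc_common` against `health`. The sketch below uses made-up rows and is an assumption about the method; the notebook's actual calculation is not shown in this section.

```python
import pandas as pd

# Made-up rows, not census values.
trees = pd.DataFrame({
    "spc_common": ["honeylocust"] * 4 + ["ginkgo"] * 2,
    "health": ["Good", "Good", "Good", "Fair", "Good", "Poor"],
})

# Row-normalized crosstab: each row sums to 1 across the health categories.
health_share = pd.crosstab(trees["spc_common"], trees["health"], normalize="index")
# Fix the column order and fill categories a species never had with 0.
health_share = health_share.reindex(columns=["Good", "Fair", "Poor"], fill_value=0.0)

# Add the number of trees per species next to the shares (aligned on species name).
summary = health_share.assign(n_trees=trees["spc_common"].value_counts())
summary = summary.sort_values("n_trees", ascending=False)
print(summary)
```

Showing the tree count next to the shares helps guard against reading too much into percentages that rest on only a handful of trees.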

Exploratory Data Analysis

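import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# Load the 2015 tree census data
tree_cencus = pd.read_csv('data/trees.csv')

# Load the neighborhood polygons (geospatial data)
maps = gpd.read_file('data/nta.shp')

# Preview the first rows of each table
tree_cencus.head()
maps.head()

# Plot the neighborhood outlines without axes
maps.plot()
plt.axis('off')
plt.show()

# Inspect column types and non-null counts
tree_cencus.info()
maps.info()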

Data Cleaning & Preprocessing 🧹

Null Value

# Count null values per column in tree_cencus and keep only columns that have any
null_counts = tree_cencus.isna().sum()
null_counts[null_counts > 0]
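A natural follow-up to the null count is to check whether the missing values cluster in one group of rows. The sketch below, on made-up rows, breaks down missing `health` values by `status`. The status labels `Alive` and `Dead` are assumed here, and whether the real census shows this pattern is something to verify, not a result.

```python
import pandas as pd

# Made-up rows; the "Alive"/"Dead" labels are assumed for illustration.
trees = pd.DataFrame({
    "status": ["Alive", "Alive", "Dead", "Dead", "Alive"],
    "health": ["Good", "Fair", None, None, "Poor"],
    "spc_common": ["ginkgo", "pin oak", None, None, "honeylocust"],
})

# Same check as above: columns that contain at least one null.
null_counts = trees.isna().sum()
print(null_counts[null_counts > 0])

# Number of missing health values within each status group.
missing_by_status = trees["health"].isna().groupby(trees["status"]).sum()
print(missing_by_status)
```

If the nulls turn out to be confined to one status, dropping or separating those rows may be more appropriate than imputing values.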