Competition - City Tree Species

Analyzing the Manhattan Tree Distribution

Summary

The urban design team within the Department of City Planning wants to understand the state of New York's trees and find ways to improve their quantity and quality.

Using New York's tree data from 2015, the analysis in this report shows that:

  • 2 out of every 62 trees (roughly 3%) are considered dead
  • The average diameter of living trees is twice that of dead ones
  • Honeylocust is the species with the largest number of trees, accounting for more than 21% of all trees considered alive, followed by Callery Pear and Ginkgo
  • Upper West Side, Upper East Side-Carnegie Hill, and West Village are the three neighborhoods with the most trees
  • Trees are spread evenly across the whole of Manhattan

Based on the analysis, the top ten trees to consider planting are:

  • Honeylocust
  • Callery Pear
  • Ginkgo
  • Pin Oak
  • Sophora
  • London Planetree
  • Japanese Zelkova
  • Littleleaf Linden
  • American Elm
  • American Linden

Recommendation

  • The urban design team should prioritize planting trees with larger diameters in order to increase their chances of survival.
  • The planning department should also consider measures to improve the overall health of the trees, such as providing adequate water and nutrients or protecting them from pests and diseases.
  • The species of dead trees should be recorded. It is missing from the current data, and having it would show which species die most often.

IMPORT PACKAGES AND LIBRARIES

# Uncomment to install the geospatial dependencies if they are missing
# (GridSpec ships with matplotlib and needs no separate install)
#!pip install descartes
#!pip install geopandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Note: on matplotlib >= 3.6 the seaborn styles are named 'seaborn-v0_8-<name>'
plt.style.use('seaborn-whitegrid')
import seaborn as sns
from matplotlib.gridspec import GridSpec
import geopandas as gpd
#import descartes
from shapely.geometry import Point, Polygon
#import geoplot
from pandas.plotting import register_matplotlib_converters
import warnings

register_matplotlib_converters()
%matplotlib inline
warnings.filterwarnings('ignore')
plt.style.use('seaborn-deep')
plt.rcParams['figure.figsize'] = (16,12)
plt.rcParams['axes.labelsize'] = 16
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['legend.fontsize'] = 14
plt.rcParams['xtick.labelsize'] = 14
plt.rcParams['ytick.labelsize'] = 14

Read the data

trees = pd.read_csv('data/trees.csv')
trees.head()

DATA WRANGLING

Preparing the data

The data is first checked for missing entries. Many records are missing the species name and the health status. After ruling out duplicate records, the data is passed through the following transformations:

  • All missing values are replaced with a placeholder string
  • Categorical features are identified, and their data type is changed accordingly
  • Trees with a diameter of 0 or a diameter greater than 20 are dropped
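The three transformations listed above can be sketched on a small toy table. This is only an illustration under stated assumptions: the diameter column name `tree_dbh`, the placeholder label `"Unknown"`, and the sample rows are hypothetical and are not taken from the original notebook.

```python
import pandas as pd

# Toy stand-in for the tree data. `tree_dbh` is an ASSUMED name for the
# diameter column; the rows are made up for illustration.
sample = pd.DataFrame({
    "tree_dbh": [0, 5, 12, 25, 8],
    "status": ["Alive", "Alive", "Alive", "Alive", "Dead"],
    "spc_common": ["honeylocust", "ginkgo", None, "pin oak", None],
    "health": ["Good", "Fair", "Good", "Poor", None],
})

# 1. Replace missing values with a placeholder string (label is an assumption).
cleaned = sample.fillna({"spc_common": "Unknown", "health": "Unknown"})

# 2. Convert the categorical features to the pandas `category` dtype.
for col in ["status", "spc_common", "health"]:
    cleaned[col] = cleaned[col].astype("category")

# 3. Keep only trees with a diameter above 0 and at most 20.
cleaned = cleaned[(cleaned["tree_dbh"] > 0) & (cleaned["tree_dbh"] <= 20)]

print(cleaned)
```

Filling before converting to `category` avoids having to register the placeholder as a new category afterwards.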

VISUAL AND PROGRAMMATIC ASSESSMENT

# make a copy of the data
tree_info = trees.copy()

Only two columns have null values: spc_common (the tree species) and health (the health status of the tree: Good, Fair, or Poor). Some columns also have incorrect data types, which need to be converted to the correct type.

# check for missing values and data types
tree_info.info()
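As a complement to `info()`, the number and share of missing values per column can be tabulated directly. The sketch below uses a made-up four-row table, so its figures are illustrative and unrelated to the real data.

```python
import pandas as pd

# Toy table with the same column names as the columns discussed above.
sample = pd.DataFrame({
    "status": ["Alive", "Alive", "Dead", "Alive"],
    "spc_common": ["honeylocust", "ginkgo", None, "pin oak"],
    "health": ["Good", "Fair", None, "Good"],
})

# Count of nulls and percentage of nulls for every column.
null_summary = pd.DataFrame({
    "n_missing": sample.isnull().sum(),
    "pct_missing": (sample.isnull().mean() * 100).round(1),
})

print(null_summary)
```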

Looking at the dataframe, the data appear to be missing not at random: for every tree with status Dead, both columns are missing in that row. Let's investigate further. Only about 1802 observations (roughly 3% of the records) are missing in the two columns. From my investigation, every tree with status Dead has null values in the aforementioned columns, while trees with status Alive have no missing values.

# check records with null entries
null_entries = tree_info[tree_info.isnull().any(axis=1)]
null_entries
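One way to test the claim that the missing values line up with tree status is to compute, per status, the share of rows with a null in either column. This is a minimal sketch on made-up rows, not the original investigation; on the real data, the same lines would be applied to `tree_info`.

```python
import pandas as pd

# Toy rows that mimic the pattern described above.
sample = pd.DataFrame({
    "status": ["Alive", "Alive", "Dead", "Alive", "Dead"],
    "spc_common": ["honeylocust", "ginkgo", None, "pin oak", None],
    "health": ["Good", "Fair", None, "Good", None],
})

# Flag rows where either column is null.
missing_any = sample[["spc_common", "health"]].isnull().any(axis=1)

# Share of flagged rows within each status group.
missing_share = sample.assign(missing=missing_any).groupby("status")["missing"].mean()

print(missing_share)
```

A share of 1.0 for Dead and 0.0 for Alive would mean the missingness is fully explained by tree status.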