Which tree species should the city plant?
📖 Background
You work for a nonprofit organization advising the planning department on ways to improve the quantity and quality of trees in New York City. The urban design team believes tree size (using trunk diameter as a proxy for size) and health are the most desirable characteristics of city trees.
The city would like to learn more about which tree species are the best choice to plant on the streets of Manhattan.
💾 The data
The team has provided access to the 2015 tree census and geographical information on New York City neighborhoods (trees, neighborhoods):
Tree Census
- "tree_id" - Unique id of each tree.
- "tree_dbh" - The diameter of the tree in inches measured at 54 inches above the ground.
- "curb_loc" - Location of the tree bed in relation to the curb. Either along the curb (OnCurb) or offset from the curb (OffsetFromCurb).
- "spc_common" - Common name for the species.
- "status" - Indicates whether the tree is alive or standing dead.
- "health" - Indication of the tree's health (Good, Fair, and Poor).
- "root_stone" - Indicates the presence of a root problem caused by paving stones in the tree bed.
- "root_grate" - Indicates the presence of a root problem caused by metal grates in the tree bed.
- "root_other" - Indicates the presence of other root problems.
- "trunk_wire" - Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk.
- "trnk_light" - Indicates the presence of a trunk problem caused by lighting installed on the tree.
- "trnk_other" - Indicates the presence of other trunk problems.
- "brch_light" - Indicates the presence of a branch problem caused by lights or wires in the branches.
- "brch_shoe" - Indicates the presence of a branch problem caused by shoes in the branches.
- "brch_other" - Indicates the presence of other branch problems.
- "postcode" - Five-digit zip code where the tree is located.
- "nta" - Neighborhood Tabulation Area (NTA) code from the 2010 US Census for the tree.
- "nta_name" - Neighborhood name.
- "latitude" - Latitude of the tree, in decimal degrees.
- "longitude" - Longitude of the tree, in decimal degrees.
Neighborhoods' geographical information
- "ntacode" - NTA code (matches Tree Census information).
- "ntaname" - Neighborhood name (matches Tree Census information).
- "geometry" - Polygon that defines the neighborhood.
Tree census and neighborhood information from the City of New York NYC Open Data.
import pandas as pd
import geopandas as gpd
trees = pd.read_csv('data/trees.csv')
trees
neighborhoods = gpd.read_file('data/nta.shp')
neighborhoods
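The data dictionary says the `nta` code in the tree census matches `ntacode` in the neighborhood table. A minimal sketch of how that link could be checked is shown below. It uses plain pandas and small made-up frames (`trees_demo`, `neighborhoods_demo`, and all codes and names in them are hypothetical), not the real files.

```python
import pandas as pd

# Toy stand-ins for the two tables; every value here is invented for illustration
trees_demo = pd.DataFrame({
    "tree_id": [1, 2, 3, 4],
    "nta": ["MN_A", "MN_A", "MN_B", "MN_Z"],
})
neighborhoods_demo = pd.DataFrame({
    "ntacode": ["MN_A", "MN_B", "BK_C"],
    "ntaname": ["Neighborhood A", "Neighborhood B", "Neighborhood C"],
})

# A left join keeps every tree; indicator=True adds a _merge column
# that flags trees whose code has no matching neighborhood
linked = trees_demo.merge(
    neighborhoods_demo,
    left_on="nta",
    right_on="ntacode",
    how="left",
    indicator=True,
)
unmatched = int((linked["_merge"] == "left_only").sum())

print(linked[["tree_id", "nta", "ntaname"]])
print("Trees without a matching neighborhood:", unmatched)
```

The same join should work on the real `trees` and `neighborhoods` objects, assuming the column names are as listed in the data dictionary.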
💪 Challenge
Create a report that covers the following:
- What are the most common tree species in Manhattan?
- Which are the neighborhoods with the most trees?
- A visualization of Manhattan's neighborhoods and tree locations.
- What ten tree species would you recommend the city plant in the future?
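The first two questions come down to counting rows per species and per neighborhood. One possible way to do that is sketched here with `value_counts` on a small invented frame (`trees_demo` and its values are hypothetical), using the `spc_common` and `nta_name` columns from the data dictionary.

```python
import pandas as pd

# Invented sample with the two columns needed for the counts
trees_demo = pd.DataFrame({
    "spc_common": ["Honeylocust", "Pin oak", "Honeylocust",
                   "Ginkgo", "Honeylocust", "Pin oak"],
    "nta_name": ["A", "A", "B", "B", "B", "C"],
})

# value_counts sorts in descending order, so head(10) gives the top ten
species_counts = trees_demo["spc_common"].value_counts()
neighborhood_counts = trees_demo["nta_name"].value_counts()

print(species_counts.head(10))
print(neighborhood_counts.head(10))
```

Note that `value_counts` skips missing values by default, so trees without a recorded species would not appear in the species count.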
Introduction
Trees are an important part of our lives.
With their wonderful presence, trees:
- create a peaceful, aesthetic environment for us
- increase our quality of life
- provide good shade
- clean the air by removing dust and absorbing pollutants
- consume carbon dioxide and produce oxygen
- moderate the effects of the sun, rain and wind
- conserve water
- preserve soil
- support wildlife
So that people can enjoy these benefits not only in forests and backyards but also on streets and in parks and playgrounds, the urban design team aims to improve the quantity and quality of trees in New York City.
The most important characteristics of urban trees are good health, adaptability to the environment, and the potential for a long life.
The following 10 species would be good choices to plant on the streets of Manhattan:
- Honeylocust
- Willow oak
- Pin oak
- Hawthorn
- Kentucky coffeetree
- Sawtooth oak
- Golden raintree
- Siberian elm
- Schumard's oak
- American elm
Although oaks dominate the list, Honeylocust is the top recommendation.
That's why I chose an opening image in which Honeylocust makes the streets of Manhattan look stunning in autumn.
The goal of this report is to provide insight into why these tree species are recommended.
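The background names trunk diameter and health as the desirable characteristics. As a rough illustration only, and not necessarily the method behind the list above, species could be compared by median `tree_dbh` and by the share of trees rated `Good`. The frame `demo`, the species labels, and the sort order are all assumptions made for this sketch.

```python
import pandas as pd

# Invented sample; "Species A" and "Species B" are placeholders
demo = pd.DataFrame({
    "spc_common": ["Species A"] * 3 + ["Species B"] * 3,
    "tree_dbh": [10, 12, 14, 4, 5, 6],
    "health": ["Good", "Good", "Fair", "Good", "Poor", "Poor"],
})

summary = (
    demo.assign(is_good=demo["health"].eq("Good"))
    .groupby("spc_common")
    .agg(
        n_trees=("tree_dbh", "size"),        # number of trees per species
        median_dbh=("tree_dbh", "median"),   # typical trunk diameter in inches
        share_good=("is_good", "mean"),      # fraction of trees in Good health
    )
    # Assumed ordering: healthiest first, larger diameter as the tie-breaker
    .sort_values(["share_good", "median_dbh"], ascending=False)
)

print(summary)
```

On real data it would make sense to also filter on `n_trees`, since a species with only a handful of trees can top such a ranking by chance.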
Importing packages and modules:
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Preset the color palette for graphs
sns.set_palette('colorblind')
Checking and cleaning the data:
# Read CSV as DataFrame called trees
trees = pd.read_csv('data/trees.csv')
# Print the head of the trees data
trees.head()
# Print information about trees data
trees.info()
# Make species names consistent: capitalize the first letter and lowercase the rest
trees['spc_common'] = trees['spc_common'].str.capitalize()
print(trees['spc_common'].head())
# Check that every column has the proper data type
trees.dtypes
# Change the data type of 'tree_id' to str
trees['tree_id'] = trees['tree_id'].astype('str')
# Check whether there are any duplicates in the 'tree_id' column
# (any must be called; without parentheses only the method object is printed)
print(trees['tree_id'].duplicated().any())
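If the duplicate check returns `True`, the repeated ids can be inspected directly. A small sketch on an invented series (`ids` is hypothetical):

```python
import pandas as pd

# Invented ids with one repeated value
ids = pd.Series(["1", "2", "2", "3"], name="tree_id")

# duplicated() marks the second and later occurrences by default
has_duplicates = bool(ids.duplicated().any())

# keep=False marks every occurrence, which is handy for inspection
repeated = ids[ids.duplicated(keep=False)]

print(has_duplicates)
print(repeated.tolist())
```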
# Detect and count missing values
missing_values = trees.isnull().sum()
# Set color based on the threshold (0.05 * length of the data frame)
colors = np.where(missing_values <= 0.05 * len(trees), 'gray', 'red')
# Plot missing values
plt.bar(trees.columns, missing_values, color=colors)
plt.title("Missing Values")
plt.xlabel("Columns")
plt.ylabel("Number of missing values")
# Rotate the column names on the x-axis by 45 degrees
plt.xticks(rotation=45, ha='right')
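# Add legend at the top; explicit handles are needed because plt.bar creates a single legend entry
legend_handles = [plt.Rectangle((0, 0), 1, 1, color='gray'), plt.Rectangle((0, 0), 1, 1, color='red')]
plt.legend(legend_handles, ['<= 5% of total rows', '> 5% of total rows'], loc='upper center')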
# Show the plot
plt.show()
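The 5% threshold behind the bar colors can also be checked numerically. The sketch below computes the share of missing values per column on an invented frame (`demo` and its contents are hypothetical) and lists the columns above the threshold.

```python
import pandas as pd

# Invented 40-row sample: 1 missing health value (2.5%), 4 missing species (10%)
demo = pd.DataFrame({
    "tree_dbh": range(40),
    "health": ["Good"] * 39 + [None],
    "spc_common": ["Pin oak"] * 36 + [None] * 4,
})

# The mean of the boolean mask is the fraction of missing values per column
missing_share = demo.isnull().mean()

# Columns whose missing share exceeds the 5% threshold
above_threshold = missing_share[missing_share > 0.05].index.tolist()

print(missing_share)
print(above_threshold)
```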
import missingno as msno