Trees in New York City
Trees are an essential part of our environment. Studies show that they play a key role in cities, improving air quality and people's mood (nothing better than a walk in a park after work!). So, in this workbook I will analyse the role of trees in New York City and cover these four aspects:
- What are the most common tree species in Manhattan?
- Which are the neighborhoods with the most trees?
- A visualization of Manhattan's neighborhoods and tree locations.
- What ten tree species would you recommend the city plant in the future?
We will not consider dead trees in our analysis as they don't have a species name and a health status assigned.
Setup
In this section we'll import the libraries, then load and clean the datasets.
- Import libraries.
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
- Import and clean the trees dataset (contains information about New York City trees).
trees = pd.read_csv('data/trees.csv')
display(trees.head())
print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(trees.shape, trees.duplicated().sum(), trees.isnull().values.any()))
All NaN values in the trees dataset belong to dead trees. All but one of these trees also lack a species name, so we can drop these rows.
# Check for NaN values
print('Tree health status with NaN values: {}'.format(trees[trees.health.isna()].status.unique()))
print('Tree species name with NaN values: {}'.format(trees[trees.health.isna()].spc_common.unique()))
# Drop NaN values
trees.dropna(inplace = True)
print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(trees.shape, trees.duplicated().sum(), trees.isnull().values.any()))
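To make the effect of the drop easier to see, here is a minimal sketch on a toy DataFrame. The values are made up for illustration; only the column names `status`, `health` and `spc_common` come from the trees dataset. It counts NaN values per column and compares a plain `dropna()` with one restricted to the `health` column.

```python
import pandas as pd

# Toy data (hypothetical values) mirroring three columns of the trees dataset.
toy = pd.DataFrame({
    "status": ["Alive", "Dead", "Alive", "Dead"],
    "health": ["Good", None, "Fair", None],
    "spc_common": ["honeylocust", None, "ginkgo", "pin oak"],
})

# Count missing values per column before dropping anything.
nan_per_column = toy.isna().sum()
print(nan_per_column)

# dropna() with no arguments removes every row that has a NaN in any column.
cleaned_any = toy.dropna()

# Restricting the check to `health` keeps the intent explicit:
# only rows without a health status are removed.
cleaned_health = toy.dropna(subset=["health"])

print(cleaned_any.shape, cleaned_health.shape)
```

In this toy case both calls remove the same rows, because every row with a missing value is also missing `health`. If other columns in the real dataset contained NaN values for living trees, the two calls would differ, so it is worth checking the per-column counts first.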
We can also drop trees with a trunk diameter equal to zero.
trees = trees[trees['tree_dbh'] > 0]
print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(trees.shape, trees.duplicated().sum(), trees.isnull().values.any()))
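The same summary line is printed after every cleaning step, so it could be wrapped in a small helper. The sketch below is only a suggestion: `summarize` is a hypothetical name that does not appear in the original workbook, and the toy data is made up to show the trunk-diameter filter in action.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> str:
    """Return shape, duplicated-row count, and whether any NaN is present."""
    return 'Shape: {} \t Duplicated: {} \t NaN values: {}'.format(
        df.shape, df.duplicated().sum(), df.isnull().values.any()
    )


# Toy data (hypothetical values); `tree_dbh` is the trunk diameter column.
toy = pd.DataFrame({
    "tree_dbh": [0, 5, 5, 12],
    "spc_common": ["ginkgo", "pin oak", "pin oak", "honeylocust"],
})

# Keep only trees with a positive trunk diameter, as in the step above.
filtered = toy[toy["tree_dbh"] > 0]
print(summarize(filtered))
```

Note that the last field reports whether any NaN value is present (True or False), not how many there are.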
- Import the neighborhoods shapefile (we'll use this file to plot the map of Manhattan).
nbhd = gpd.read_file('data/nta.shp')
display(nbhd.head())
print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(nbhd.shape, nbhd.duplicated().sum(), nbhd.isnull().values.any()))
1. What are the most common tree species in Manhattan?