Trees in New York City

Trees are an essential part of our environment. Studies show that they play a key role in cities, improving air quality and people's mood (nothing better than a walk in a park after work!). In this workbook I will analyse the role of trees in New York City and cover these four points:

  1. What are the most common tree species in Manhattan?
  2. Which are the neighborhoods with the most trees?
  3. A visualization of Manhattan's neighborhoods and tree locations.
  4. What ten tree species would you recommend the city plant in the future?

We will not consider dead trees in our analysis, as they have no species name or health status assigned.

Setup

In this section we'll import the libraries, then load and clean the datasets.

  1. Import libraries.
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
  2. Import and clean the trees dataset (contains information on New York City trees).
trees = pd.read_csv('data/trees.csv')

display(trees.head())
print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(trees.shape, trees.duplicated().sum(), trees.isnull().values.any()))

All NaN values in the trees dataset belong to dead trees. All but one of these trees also lack a species name. We can drop these rows.

# Check for NaN values
print('Status of trees with NaN health: {}'.format(trees[trees.health.isna()].status.unique()))

print('Species names of trees with NaN health: {}'.format(trees[trees.health.isna()].spc_common.unique()))
# Drop NaN values
trees.dropna(inplace = True)

print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(trees.shape, trees.duplicated().sum(), trees.isnull().values.any()))
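
The NaN check can be tried on a small toy frame first. This is only a sketch: the column names `status`, `health` and `spc_common` come from the dataset, but the rows are invented for illustration and are not taken from trees.csv.

```python
import pandas as pd

# Toy data (invented rows) mimicking the pattern described for trees.csv:
# dead trees have no health value and usually no species name.
toy = pd.DataFrame({
    "status": ["Alive", "Dead", "Alive", "Dead"],
    "health": ["Good", None, "Fair", None],
    "spc_common": ["honeylocust", None, "ginkgo", "pin oak"],
})

# Count missing values per column instead of a single True/False flag.
nan_per_column = toy.isna().sum()

# Collect the status of every row whose health value is missing.
statuses_with_nan = set(toy.loc[toy["health"].isna(), "status"])

# dropna() returns a new frame and leaves the toy frame untouched.
cleaned = toy.dropna()

print(nan_per_column)
print(statuses_with_nan)
print(cleaned.shape)
```

A per-column count shows which columns drive the missing values, which the single boolean in the summary line cannot.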

We can also drop trees with a trunk diameter equal to zero.

trees = trees[trees['tree_dbh'] > 0]

print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(trees.shape, trees.duplicated().sum(), trees.isnull().values.any()))
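
The same summary line is printed after every cleaning step, so it could be wrapped in a small helper. The sketch below assumes a hypothetical function name, `summarize`, that is not part of the original workbook. The toy rows are invented, and only the `tree_dbh` column name is taken from the dataset (`tree_id` is an assumed identifier column).

```python
import pandas as pd


def summarize(df):
    """Return the shape, duplicate-row count and NaN flag used in the workbook."""
    return "Shape: {} \t Duplicated: {} \t NaN values: {}".format(
        df.shape, df.duplicated().sum(), df.isnull().values.any()
    )


# Invented rows for illustration only.
toy = pd.DataFrame({"tree_id": [1, 2, 3], "tree_dbh": [0, 5, 12]})

# A boolean mask keeps only trees with a positive trunk diameter.
filtered = toy[toy["tree_dbh"] > 0]

print(summarize(toy))
print(summarize(filtered))
```

Comparing the two summaries shows how many rows a filter removed.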
  3. Import the neighborhoods (nbhd) shapefile (we'll use it to plot the Manhattan map).
nbhd = gpd.read_file('data/nta.shp')

display(nbhd.head())
print('Shape: {} \t Duplicated: {} \t NaN values: {}'.format(nbhd.shape, nbhd.duplicated().sum(), nbhd.isnull().values.any()))

1. What are the most common tree species in Manhattan?