
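import pandas as pd                 # tabular data
import matplotlib.pyplot as plt     # static plots
import seaborn as sns               # statistical plots
import numpy as np                  # numerical arrays
from scipy import stats             # statistical tests
import plotly.graph_objects as go   # interactive plots, low-level API
import plotly.express as px         # interactive plots, high-level API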

💾 The data

The team has provided access to the 2015 tree census and geographical information on New York City neighborhoods (trees, neighborhoods):

Tree Census
  • "tree_id" - Unique id of each tree.
  • "tree_dbh" - The diameter of the tree in inches measured at 54 inches above the ground.
  • "curb_loc" - Location of the tree bed in relation to the curb. Either along the curb (OnCurb) or offset from the curb (OffsetFromCurb).
  • "spc_common" - Common name for the species.
  • "status" - Indicates whether the tree is alive or standing dead.
  • "health" - Indication of the tree's health (Good, Fair, and Poor).
  • "root_stone" - Indicates the presence of a root problem caused by paving stones in the tree bed.
  • "root_grate" - Indicates the presence of a root problem caused by metal grates in the tree bed.
  • "root_other" - Indicates the presence of other root problems.
  • "trunk_wire" - Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk.
  • "trnk_light" - Indicates the presence of a trunk problem caused by lighting installed on the tree.
  • "trnk_other" - Indicates the presence of other trunk problems.
  • "brch_light" - Indicates the presence of a branch problem caused by lights or wires in the branches.
  • "brch_shoe" - Indicates the presence of a branch problem caused by shoes in the branches.
  • "brch_other" - Indicates the presence of other branch problems.
  • "postcode" - Five-digit zip code where the tree is located.
  • "nta" - Neighborhood Tabulation Area (NTA) code from the 2010 US Census for the tree.
  • "nta_name" - Neighborhood name.
  • "latitude" - Latitude of the tree, in decimal degrees.
  • "longitude" - Longitude of the tree, in decimal degrees.
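The nine root, trunk and branch columns above are problem flags. A minimal sketch of turning them into a per-tree problem count, assuming the flags are stored as the strings "Yes" and "No" (worth confirming with `trees['root_stone'].unique()` on the real data). The toy rows and the `n_problems` column are made up for illustration:

```python
import pandas as pd

# The nine problem-flag columns from the data dictionary.
PROBLEM_COLS = [
    "root_stone", "root_grate", "root_other",
    "trunk_wire", "trnk_light", "trnk_other",
    "brch_light", "brch_shoe", "brch_other",
]

# Toy rows standing in for the census; assumes flags are "Yes"/"No" strings.
sample = pd.DataFrame({
    "tree_id": [1, 2, 3],
    "health": ["Good", "Fair", "Poor"],
    **{col: ["No", "No", "No"] for col in PROBLEM_COLS},
})
sample.loc[1, "root_stone"] = "Yes"
sample.loc[2, ["trunk_wire", "brch_shoe"]] = "Yes"

# "Yes" -> True, anything else -> False; summing across a row counts the problems.
flags = sample[PROBLEM_COLS].eq("Yes")
sample["n_problems"] = flags.sum(axis=1)
print(sample[["tree_id", "health", "n_problems"]])
```

On the real data the same two lines would apply to `trees`; comparing the count with `health` is one possible use of these flags, not a step from the original analysis.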
Neighborhoods' geographical information
  • "ntacode" - NTA code (matches Tree Census information).
  • "ntaname" - Neighborhood name (matches Tree Census information).
  • "geometry" - Polygon that defines the neighborhood.
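Because every tree has a latitude and longitude and every neighborhood has a polygon, trees could also be assigned to neighborhoods by location rather than by name. The sketch below shows the core idea, an even-odd (ray casting) point-in-polygon test, in plain Python. It assumes a simple polygon without holes, given as (x, y) vertices in the same coordinate system as the point. The real `nta.shp` geometries may be MultiPolygons and may use a projected CRS rather than degrees, so in practice you would reproject with `GeoDataFrame.to_crs` and let `geopandas.sjoin` do this work. The square below is hypothetical.

```python
def point_in_polygon(x: float, y: float, polygon: list) -> bool:
    """Even-odd (ray casting) test: is (x, y) inside a simple polygon?

    `polygon` is a list of (x, y) vertices; the closing edge back to the
    first vertex is implied. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also guarantees y2 != y1, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray going from the point towards +x.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square "neighborhood" in (longitude, latitude) order.
square = [(-74.0, 40.7), (-73.9, 40.7), (-73.9, 40.8), (-74.0, 40.8)]
print(point_in_polygon(-73.95, 40.75, square))  # True
print(point_in_polygon(-73.80, 40.75, square))  # False
```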

Source: tree census and neighborhood information from the City of New York (NYC Open Data).

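import geopandas as gpd   # reads the neighborhood shapefile

# Load the 2015 tree census and take a first look
trees = pd.read_csv('data/trees.csv')
trees                     # full table preview
trees.sample(10)          # 10 random rows
trees.info()              # column dtypes and non-null counts
trees.isna().sum()        # missing values per column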
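`isna()` returns a True/False frame and `sum()` counts the True values column by column, so the output is the number of missing cells per column. A small illustration on made-up values (not census data), with `mean()` added to show the missing share instead of a raw count:

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps; values are invented for illustration.
toy = pd.DataFrame({
    "tree_dbh": [3, 10, np.nan, 21],
    "health": ["Good", None, None, "Fair"],
    "spc_common": ["pin oak", "London planetree", None, "honeylocust"],
})

# isna() marks each missing cell True; summing a boolean column counts the Trues.
missing_per_col = toy.isna().sum()
print(missing_per_col)

# The mean of a boolean column is the fraction of True values.
print(toy.isna().mean().round(2))
```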

We have 64,229 rows of data, so removing the rows that contain empty cells should not materially change our results.
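Whether dropping rows is harmless depends on how many rows contain at least one gap, not on the total row count alone. A minimal check to run before `dropna`, using a hypothetical helper `share_of_incomplete_rows` on toy data; on the real data you would call `share_of_incomplete_rows(trees)`:

```python
import numpy as np
import pandas as pd


def share_of_incomplete_rows(df: pd.DataFrame) -> float:
    """Fraction of rows that dropna(axis=0) would remove (any missing cell)."""
    return float(df.isna().any(axis=1).mean())


# Toy frame: rows 1 and 2 each have at least one missing cell.
toy = pd.DataFrame({
    "tree_dbh": [3, 10, np.nan, 21],
    "health": ["Good", None, None, "Fair"],
})
share = share_of_incomplete_rows(toy)
print(f"{share:.0%} of rows would be dropped")
```

If the share is large, or the gaps cluster in one group of trees, dropping them can bias later comparisons even in a large dataset.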

# Drop every row that has at least one missing value
trees = trees.dropna(axis=0)
trees.isna().sum()        # should now be 0 for every column
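`dropna(axis=0)` removes a row if any column is missing. When only a few columns matter for a given question, `dropna(subset=...)` keeps more data. A sketch on toy rows; the `needed` column list is a hypothetical choice, not part of the original analysis:

```python
import numpy as np
import pandas as pd

# Toy rows; values are invented for illustration.
toy = pd.DataFrame({
    "tree_id": [1, 2, 3],
    "tree_dbh": [12, np.nan, 8],
    "health": ["Good", "Fair", None],
    "nta_name": ["Astoria", "Astoria", "Midwood"],
})

# Drop a row only when one of the columns this question needs is missing.
needed = ["tree_dbh", "nta_name"]
kept = toy.dropna(subset=needed)

# Tree 2 is dropped (no tree_dbh); tree 3 survives despite its missing health.
print(kept["tree_id"].tolist())
```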
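# Load the NTA boundaries as a GeoDataFrame and inspect them
neighborhoods = gpd.read_file('data/nta.shp')
neighborhoods.to_csv('nighb.csv')          # export a CSV copy (geometry is written as text)
neighborhoods['borocode'].value_counts()   # number of neighborhoods per borough
neighborhoods.isna().sum()
neighborhoods.info()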
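The merge on neighborhood names is an inner join by default, so any tree whose `nta_name` has no exact match in `ntaname` would be dropped without warning. A quick sanity check, sketched with a hypothetical helper `unmatched_keys` and toy names; on the real data it would be `unmatched_keys(trees['nta_name'], neighborhoods['ntaname'])`:

```python
import pandas as pd


def unmatched_keys(left: pd.Series, right: pd.Series) -> set:
    """Return key values present in `left` but absent from `right`."""
    return set(left.dropna().unique()) - set(right.dropna().unique())


# Toy stand-ins for trees['nta_name'] and neighborhoods['ntaname'].
tree_names = pd.Series(["Astoria", "Midwood", "Midwood", "Park Slope-Gowanus"])
nbhd_names = pd.Series(["Astoria", "Midwood"])

missing = unmatched_keys(tree_names, nbhd_names)
print(missing)  # names that an inner merge would silently drop
```

An empty set means every tree name has a partner, which is what the data dictionary's "matches Tree Census information" implies.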


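# Attach neighborhood attributes to each tree by matching neighborhood names.
# Default inner join: trees whose nta_name has no match in ntaname are dropped.
# With a plain DataFrame on the left, the result is a plain DataFrame; for spatial work use
# gpd.GeoDataFrame(df, geometry='geometry', crs=neighborhoods.crs).
df = trees.merge(neighborhoods, left_on='nta_name', right_on='ntaname')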
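pandas' `merge` can also check the join itself: `validate='many_to_one'` raises a `MergeError` if a neighborhood name appears more than once on the right side, and `indicator=True` adds a `_merge` column recording whether each row found a match. A sketch on toy frames (names and codes are made up), using `how='left'` so unmatched trees stay visible instead of being dropped:

```python
import pandas as pd

# Toy frames; "Nowhere" deliberately has no matching neighborhood.
trees_toy = pd.DataFrame({
    "tree_id": [1, 2, 3],
    "nta_name": ["Astoria", "Astoria", "Nowhere"],
})
nbhd_toy = pd.DataFrame({
    "ntaname": ["Astoria", "Midwood"],
    "ntacode": ["N1", "N2"],  # hypothetical codes
})

# Many trees may share one neighborhood, but each name must be unique on the right.
checked = trees_toy.merge(
    nbhd_toy,
    how="left",
    left_on="nta_name",
    right_on="ntaname",
    validate="many_to_one",
    indicator=True,
)
print(checked[["tree_id", "ntacode", "_merge"]])
```

On the real data, rows with `_merge == 'left_only'` would be the trees that the inner merge above loses.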