Which tree species should the city plant?
Background
You work for a nonprofit organization advising the planning department on ways to improve the quantity and quality of trees in New York City. The urban design team believes tree size (using trunk diameter as a proxy for size) and health are the most desirable characteristics of city trees.
The city would like to learn more about which tree species are the best choice to plant on the streets of Manhattan.
The data
The team has provided access to the 2015 tree census and geographical information on New York City neighborhoods (trees, neighborhoods):
Tree Census
- "tree_id" - Unique id of each tree.
- "tree_dbh" - The diameter of the tree in inches measured at 54 inches above the ground.
- "curb_loc" - Location of the tree bed in relation to the curb. Either along the curb (OnCurb) or offset from the curb (OffsetFromCurb).
- "spc_common" - Common name for the species.
- "status" - Indicates whether the tree is alive or standing dead.
- "health" - Indication of the tree's health (Good, Fair, and Poor).
- "root_stone" - Indicates the presence of a root problem caused by paving stones in the tree bed.
- "root_grate" - Indicates the presence of a root problem caused by metal grates in the tree bed.
- "root_other" - Indicates the presence of other root problems.
- "trunk_wire" - Indicates the presence of a trunk problem caused by wires or rope wrapped around the trunk.
- "trnk_light" - Indicates the presence of a trunk problem caused by lighting installed on the tree.
- "trnk_other" - Indicates the presence of other trunk problems.
- "brch_light" - Indicates the presence of a branch problem caused by lights or wires in the branches.
- "brch_shoe" - Indicates the presence of a branch problem caused by shoes in the branches.
- "brch_other" - Indicates the presence of other branch problems.
- "postcode" - Five-digit zip code where the tree is located.
- "nta" - Neighborhood Tabulation Area (NTA) code from the 2010 US Census for the tree.
- "nta_name" - Neighborhood name.
- "latitude" - Latitude of the tree, in decimal degrees.
- "longitude" - Longitude of the tree, in decimal degrees.
Neighborhoods' geographical information
- "ntacode" - NTA code (matches Tree Census information).
- "ntaname" - Neighborhood name (matches Tree Census information).
- "geometry" - Polygon that defines the neighborhood.
Tree census and neighborhood information from the City of New York NYC Open Data.
import pandas as pd
import geopandas as gpd
trees = pd.read_csv('data/trees.csv')
trees
neighborhoods = gpd.read_file('data/nta.shp')
neighborhoods
Challenge
Create a report that covers the following:
- What are the most common tree species in Manhattan?
- Which are the neighborhoods with the most trees?
- A visualization of Manhattan's neighborhoods and tree locations.
- What ten tree species would you recommend the city plant in the future?
Judging criteria
| CATEGORY | WEIGHTING | DETAILS |
|---|---|---|
| Response quality | 85% | |
| Presentation | 15% | |
In the event of a tie, earlier submission time will be used as a tie-breaker.
Rules
To be eligible to win, you must:
- Submit your response before the deadline. All responses must be submitted in English.
Entrants must be:
- 18+ years old.
- Allowed to take part in a skill-based competition from their country.
Entrants cannot:
- Be in a country currently sanctioned by the U.S. government.
Checklist before publishing
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your work.
- Check that all the cells run without error.
Time is ticking. Good luck!
Summary
Data from the 2015 New York City tree census were analyzed to assess patterns of tree location and health and to determine recommendations for future tree plantings.
The most common tree varieties in New York City are...
Distribution of trees around the city is... The neighborhoods in Manhattan with the most trees are ...
The ten tree varieties I would recommend for future planting are... These varieties had the largest trunk diameters and the fewest trunk and branch problems.
Approach
Data sources are ...
Tree data were joined to the neighborhood location data set by NTA code so that the two could be mapped together.
Columns for trunk and branch condition were used to develop a combined metric for overall tree health. These data were analyzed categorically by tree variety to determine the healthiest varieties.
Analysis
Most common tree varieties
Quick pivot table & count plot, sorting presented varieties in descending order.
Is there a logical cutoff for "most common"? Otherwise pick an arbitrary cutoff of 5 or 10.
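As a starting point for the count described above, here is a minimal sketch on a toy stand-in for the census table. The `trees_demo` frame and the `TOP_N` cutoff are assumptions made up for illustration; only the `spc_common` column name comes from the data description.

```python
import pandas as pd

# Toy stand-in for the tree census (made-up rows); only the column used here.
trees_demo = pd.DataFrame({
    "spc_common": ["honeylocust", "ginkgo", "honeylocust", "Callery pear",
                   "ginkgo", "honeylocust", None],
})

# value_counts() counts rows per species, sorts in descending order,
# and skips missing species names by default.
species_counts = trees_demo["spc_common"].value_counts()

# Arbitrary cutoff (hypothetical choice): keep the top N species.
TOP_N = 2
most_common = species_counts.head(TOP_N)
print(most_common)
```

Plotting `species_counts` before choosing `TOP_N` may show whether the counts drop off sharply enough to suggest a natural cutoff.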
# write code here
Distribution of Trees around New York City
Join tree census and neighborhood data on NTA code. Count trees in each neighborhood (sum? pivot? probably pivot...).
Neighborhoods with most trees, citywide. List both neighborhood and borough for clarity.
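The join-and-count step could look roughly like the sketch below. It uses plain pandas frames with made-up NTA codes and names in place of the real `trees` and `neighborhoods` tables; the column names `nta`, `ntacode`, and `ntaname` come from the data description, and everything else is an assumption. A borough column is not shown because the description above does not list one.

```python
import pandas as pd

# Toy stand-ins (made-up codes and names) for the census and neighborhood tables.
trees_demo = pd.DataFrame({
    "tree_id": [1, 2, 3, 4, 5],
    "nta": ["XX01", "XX01", "XX02", "XX02", "XX02"],
})
neighborhoods_demo = pd.DataFrame({
    "ntacode": ["XX01", "XX02", "XX03"],
    "ntaname": ["Area A", "Area B", "Area C"],
})

# Count trees per NTA code first, so the join stays one row per neighborhood.
tree_counts = trees_demo.groupby("nta").size().rename("tree_count").reset_index()

# Left join keeps neighborhoods that have no trees; their count becomes 0.
by_neighborhood = neighborhoods_demo.merge(
    tree_counts, how="left", left_on="ntacode", right_on="nta"
).drop(columns="nta")
by_neighborhood["tree_count"] = by_neighborhood["tree_count"].fillna(0).astype(int)

# Neighborhoods with the most trees first.
by_neighborhood = by_neighborhood.sort_values(
    "tree_count", ascending=False
).reset_index(drop=True)
print(by_neighborhood)
```

Counting before merging is a design choice here: it avoids duplicating the neighborhood polygon once per tree when the same pattern is applied to the real GeoDataFrame.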
Map of trees in Manhattan. Dot size scaled by tree count.
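One possible reading of the map note is a single dot per neighborhood, placed at the mean position of its trees and sized by how many trees it has. The sketch below only builds that table of dots from toy data; the coordinates, codes, and the `dots` name are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the census with made-up codes and coordinates.
trees_demo = pd.DataFrame({
    "nta": ["XX01", "XX01", "XX02"],
    "latitude": [40.70, 40.72, 40.80],
    "longitude": [-74.00, -74.02, -73.95],
})

# One row per neighborhood: mean tree position plus the number of trees.
dots = (
    trees_demo.groupby("nta")
    .agg(
        lat=("latitude", "mean"),
        lon=("longitude", "mean"),
        tree_count=("latitude", "size"),
    )
    .reset_index()
)
print(dots)
```

With matplotlib available, a table like this could be drawn over the neighborhood polygons, for example with `ax = neighborhoods.plot()` followed by `ax.scatter(dots["lon"], dots["lat"], s=dots["tree_count"])`. On the real data the counts would likely need rescaling before being used as marker sizes.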
Recommending Future Plantings
Future plantings should ideally be trees which are the healthiest for the longest amount of time.
Per the source data brief, tree diameter can be used as a size estimate. I will be using size as a proxy for tree age. Should this be normalized to the median per variety?
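If the answer to the normalization question is yes, one way to do it is to divide each tree's diameter by the median diameter of its own variety. This is a sketch on toy numbers; the `dbh_rel` column name is hypothetical.

```python
import pandas as pd

# Toy stand-in for the census with made-up diameters (inches).
trees_demo = pd.DataFrame({
    "spc_common": ["ginkgo", "ginkgo", "ginkgo", "pin oak", "pin oak", "pin oak"],
    "tree_dbh": [4, 8, 12, 10, 20, 30],
})

# transform("median") returns the species median aligned to every row.
species_median = trees_demo.groupby("spc_common")["tree_dbh"].transform("median")

# Relative diameter: 1.0 means "typical size for this variety".
trees_demo["dbh_rel"] = trees_demo["tree_dbh"] / species_median
print(trees_demo)
```

A relative diameter makes trees comparable within a variety, but it removes the between-variety size differences, so the raw `tree_dbh` is still needed when comparing varieties on size.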
There are multiple condition markers: alive, health, and specific problems by part of the tree. The root, trunk, and branch problems data are stored across multiple columns according to the specific problem. Summing these columns can be used to estimate the health of trees. Should any columns be excluded from the sum? Shoes in branches should probably be excluded.
Mapping penalties for dead trees and poor health, noting that dead trees are NOT subject to a health grade. Dead-tree penalty: alive = 0, dead = 10. Poor-health penalty: good = 0, fair = 3, poor = 5.
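The penalties above, combined with the problem-column sum, can be sketched as a single "bad health" score. The sketch assumes the problem columns hold "Yes"/"No" strings and that `status` uses the labels "Alive" and "Dead"; neither encoding is stated in the data description, so both should be checked against the real file. Only three of the problem columns are shown, and `brch_shoe` is left out of the sum as suggested in the notes.

```python
import pandas as pd

# Toy stand-in for the census. The "Yes"/"No" and "Alive"/"Dead" encodings
# are assumptions; verify them against the real data before reuse.
trees_demo = pd.DataFrame({
    "status": ["Alive", "Alive", "Dead", "Alive"],
    "health": ["Good", "Fair", None, "Poor"],
    "root_stone": ["No", "Yes", "No", "Yes"],
    "trunk_wire": ["No", "No", "No", "Yes"],
    "brch_light": ["No", "Yes", "No", "No"],
    "brch_shoe": ["Yes", "No", "No", "No"],
})

# Subset of problem columns for illustration; brch_shoe is deliberately excluded.
PROBLEM_COLS = ["root_stone", "trunk_wire", "brch_light"]
HEALTH_PENALTY = {"Good": 0, "Fair": 3, "Poor": 5}
DEAD_PENALTY = 10

# Number of flagged problems per tree.
problem_count = trees_demo[PROBLEM_COLS].eq("Yes").sum(axis=1)

# Dead trees have no health grade, so a missing grade maps to 0 here
# and the dead-tree penalty carries the weight instead.
health_penalty = trees_demo["health"].map(HEALTH_PENALTY).fillna(0)
dead_penalty = trees_demo["status"].eq("Dead").astype(int) * DEAD_PENALTY

# Higher score = worse condition.
trees_demo["bad_health"] = problem_count + health_penalty + dead_penalty
print(trees_demo[["status", "health", "bad_health"]])
```

Keeping the penalty values in named constants makes it easy to rerun the analysis with different weights and see whether the ranking of varieties is sensitive to them.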
Scatterplot of diameter vs. bad-health metric, categorized by tree type.
Which trees cluster at high girth and low bad-health?
Check for confounding factors: location vs bad health (dead, condition, and sum-problems) independent of variety.
How to numerically estimate "top" varieties? Weight health scores for count of variety, then... Regression analysis? Bar plot, sorting for lowest bad-health?
Ultimately need to pick "top 10" tree varieties to recommend.
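One simple option for the numerical ranking, offered as a sketch and not as the settled method, is to summarize each variety, drop varieties with too few trees to judge, and sort by the sum of a health rank and a size rank. The `scored` table, the `MIN_TREES` threshold, and `TOP_N` are all hypothetical; `bad_health` stands for a per-tree score where higher means worse.

```python
import pandas as pd

# Toy per-tree table with made-up diameters and bad-health scores.
scored = pd.DataFrame({
    "spc_common": ["ginkgo"] * 3 + ["pin oak"] * 3 + ["honeylocust"] * 3 + ["rare elm"],
    "tree_dbh": [6, 8, 10, 18, 20, 25, 12, 14, 16, 40],
    "bad_health": [0, 1, 2, 0, 3, 0, 0, 0, 0, 0],
})

MIN_TREES = 2  # hypothetical minimum sample size per variety
TOP_N = 2      # would be 10 on the real data

# Per-variety summary: sample size, typical size, typical condition.
summary = scored.groupby("spc_common").agg(
    n_trees=("tree_dbh", "size"),
    median_dbh=("tree_dbh", "median"),
    mean_bad_health=("bad_health", "mean"),
)

# Drop varieties with too few trees to judge.
eligible = summary[summary["n_trees"] >= MIN_TREES].copy()

# Rank 1 = best on each criterion; a lower rank sum is better overall.
eligible["health_rank"] = eligible["mean_bad_health"].rank(ascending=True)
eligible["size_rank"] = eligible["median_dbh"].rank(ascending=False)
eligible["rank_sum"] = eligible["health_rank"] + eligible["size_rank"]

recommended = eligible.sort_values("rank_sum").head(TOP_N).index.tolist()
print(recommended)
```

The rank sum weights health and size equally, which is itself an assumption; a regression or a weighted score would be alternatives if one criterion should count for more.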