Skip to content
Course Notes: Joining Data with pandas
Course Notes
Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.
# Merge the licenses and biz_owners table on account
licenses_owners = licenses.merge(biz_owners, on="account")
# Group the results by title then count the number of accounts
counted_df = licenses_owners.groupby("title").agg({'account':'count'})
# Sort the counted_df in desending order
sorted_df = counted_df.sort_values("account",ascending=False)
# Use .head() method to print the first few rows of sorted_df
print(sorted_df.head())Take Notes
Add notes here about the concepts you've learned and code cells with code you want to keep.
Total riders in a month
# Merge the ridership, cal, and stations tables
ridership_cal_stations = ridership.merge(cal, on=['year','month','day']) \
							.merge(stations, on='station_id')
# Create a filter to filter ridership_cal_stations
filter_criteria = ((ridership_cal_stations['month'] == 7) 
                   & (ridership_cal_stations['day_type'] == "Weekday") 
                   & (ridership_cal_stations['station_name'] == "Wilson"))
# Use .loc and the filter to select for rides
print(ridership_cal_stations.loc[filter_criteria, 'rides'].sum())# Add your code snippets here
One-to-many merge with multiple tables
import pandas as pd
# Define land_use DataFrame
land_use = pd.DataFrame({'ward': [1, 2, 3],
                         'land_area': [100, 200, 300]})
# Merge land_use and census and merge result with licenses including suffixes
land_cen_lic = land_use.merge(census, on='ward') \
                    .merge(licenses, on='ward', suffixes=('_cen','_lic'))
# Group by ward, pop_2010, and vacant, then count the # of accounts
pop_vac_lic = land_cen_lic.groupby(['ward','pop_2010','vacant'], 
                                   as_index=False).agg({'account':'count'})
# Sort pop_vac_lic and print the results
sorted_pop_vac_lic = pop_vac_lic.sort_values(["vacant","account","pop_2010"], 
                                             ascending=[False,True,True])
# Print the top few rows of sorted_pop_vac_lic
print(sorted_pop_vac_lic.head())print(lice