Bikeshare Insights: Summer in the Windy City

This dataset contains information on Divvy Bikes, a bikeshare program that provides residents and visitors of Chicago with a convenient way to explore the city.

The workspace is set up with one CSV file containing bikeshare activities at the peak of the summer-July 2023. Columns include ride ID, bike type, start and end times, station names and IDs, location coordinates, and member type. Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.

Source: Divvy Bikes

🌎 Some guiding questions to help you explore this data:

How many observations are in the dataset? Are there null values?
How would you clean and prepare the data for analysis?
Which bike types are popular and which ones aren't? Check if being a member or casual rider makes a difference in bike choice.
Time check! What are the peak and off-peak riding times during the day?

📊 Visualization ideas

Bar chart: Display the number of times each bike type is used to identify the most and least used bikes.
Grouped bar chart: Compare bike usage by member type (member vs. casual) to see if it affects bike choice.
Heatmap: Vividly illustrate the popularity of bikes at different times during the day and week.

You can query the pre-loaded CSV files using SQL directly. Here’s a sample query:

DataFrameas

df

variable

SELECT *
FROM '202307-divvy-tripdata.parquet'
LIMIT 10

DataFrameas

df4

variable

SELECT start_station_name, COUNT(start_station_name)
FROM '202307-divvy-tripdata.parquet'
GROUP BY start_station_name
ORDER BY COUNT(start_station_name) DESC
LIMIT 10

DataFrameas

df5

variable

SELECT end_station_name, COUNT(end_station_name)
FROM '202307-divvy-tripdata.parquet'
GROUP BY end_station_name
ORDER BY COUNT(end_station_name) DESC
LIMIT 10

DataFrameas

df1

variable

SELECT COUNT(ride_id)
FROM '202307-divvy-tripdata.parquet'
WHERE ride_id IS NOT NULL;

DataFrameas

df2

variable

SELECT rideable_type, COUNT(rideable_type)
FROM '202307-divvy-tripdata.parquet'
GROUP BY rideable_type;

DataFrameas

df3

variable

SELECT member_casual, COUNT(member_casual)
FROM '202307-divvy-tripdata.parquet'
GROUP BY member_casual;

import pandas as pd
divvy_jan2023 = pd.read_parquet("202307-divvy-tripdata.parquet")
divvy_jan2023.head()

import pygwalker as pyg
df = pd.read_parquet("202307-divvy-tripdata.parquet")
pyg.walk(df, hideDataSourceConfig= True, vegaTheme ='vega')

# Round up the time to the nearest hour in the 'started_at' column

import pandas as pd

divvy_jan2023['started_at'] = pd.to_datetime(divvy_jan2023['started_at'])
divvy_jan2023['started_at'] = divvy_jan2023['started_at'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0) + pd.Timedelta(hours=1) if x.minute >= 30 else x.replace(minute=0, second=0, microsecond=0))


divvy_jan2023.head()

# Create a histogram of the time in the started_at column

import matplotlib.pyplot as plt

# Extract the hour from the started_at column
divvy_jan2023['hour'] = divvy_jan2023['started_at'].dt.hour

# Plot the histogram
plt.hist(divvy_jan2023['hour'], bins=24)
plt.xlabel('Hour of the day')
plt.ylabel('Frequency')
plt.title('Distribution of Start Time')
plt.show()

‌
‌
‌

Bikeshare Insights: Summer in the Windy City

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Bikeshare Insights: Summer in the Windy City

🌎 Some guiding questions to help you explore this data:

📊 Visualization ideas

Bikeshare Insights: Summer in the Windy City