Live Training: Data[…]lity=public

Data Visualization for absolute beginners

This live training covers the basics of creating interactive plots with Plotly. We will visualize Seoul bike sharing data with bar plots, scatter plots, and line plots, using both Plotly and DataCamp Workspace's no-code chart cell. In the process, we’ll tease out how Seoul's weather affects bike sharing trends.

Load in required packages

import pandas as pd
import plotly.express as px
# Note: trendline="lowess" in px.scatter also requires the statsmodels package

Load and clean the data

The dataset consists of the number of public bikes rented in Seoul's bike sharing system at each hour. It also includes information about the weather and the time, such as whether it was a public holiday. Source of dataset.

# Import CSV with renamed columns
df = pd.read_csv('data/seoul_bike_data_renamed.csv')

# Clean up some columns
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
# Combine the date and the hour of day into one timestamp (vectorized, no row-wise apply)
df["datetime"] = df["date"] + pd.to_timedelta(df["hour"], unit="h")
df["is_holiday"] = df["is_holiday"].map({"No Holiday": False, "Holiday": True})

# Similar to is_holiday, map is_functioning to True and False
df["is_functioning"] = df["is_functioning"].map({"Yes": True, "No": False})

# Only keep observations where the system is functioning
df = df.query('is_functioning')

# Print out the result
df
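The cleaning steps above can be bundled into a single function. The sketch below uses a hypothetical helper, `clean_bike_data` (not part of the original notebook), that repeats the same conversions and additionally raises an error when `.map()` meets a label missing from its dictionary, because `.map()` silently turns unmatched values into NaN. It is tried on a tiny invented table that only reuses the notebook's column names.

```python
import pandas as pd


def clean_bike_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the notebook's cleaning steps."""
    df = raw.copy()
    # Parse day/month/year strings into datetimes
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
    # Combine date and hour of day into one timestamp
    df["datetime"] = df["date"] + pd.to_timedelta(df["hour"], unit="h")
    # Map the text labels to booleans
    df["is_holiday"] = df["is_holiday"].map({"No Holiday": False, "Holiday": True})
    df["is_functioning"] = df["is_functioning"].map({"Yes": True, "No": False})
    # .map() gives NaN for labels not in the dict, so fail loudly instead
    for col in ["is_holiday", "is_functioning"]:
        if df[col].isna().any():
            raise ValueError(f"unexpected labels in column {col!r}")
    # Keep only the hours when the system was functioning
    return df[df["is_functioning"]]


# Invented example rows (illustrative values only)
raw = pd.DataFrame({
    "date": ["01/12/2017", "01/12/2017", "02/12/2017"],
    "hour": [0, 13, 8],
    "n_rented_bikes": [254, 0, 120],
    "is_holiday": ["No Holiday", "No Holiday", "Holiday"],
    "is_functioning": ["Yes", "No", "Yes"],
})
clean = clean_bike_data(raw)
print(clean[["datetime", "n_rented_bikes", "is_holiday"]])
```

The non-functioning row is dropped, leaving two rows stamped 2017-12-01 00:00 and 2017-12-02 08:00.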

Visualize bike rentals over time

# Create a line plot of rented bikes over time
px.line(df, x="datetime", y="n_rented_bikes")

# Calculate the total number of rented bikes per day
by_day = (
    # Select the column first; .sum() does not take a column name
    df.groupby(by="date", as_index=False)["n_rented_bikes"]
    .sum()
)

# Create a line plot showing total number of bikes per day over time
px.line(by_day, x='date', y='n_rented_bikes')
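Grouping by `date` only produces rows for days that appear in the data, so a day removed entirely by the `is_functioning` filter simply vanishes and the line is drawn straight across it. A time-based alternative is `resample`, sketched here on a small invented hourly table: it builds a continuous daily index and, with the default `min_count=0`, reports 0 for empty days, which makes such gaps visible.

```python
import pandas as pd

# Invented hourly counts; 2018-01-02 has no rows (e.g. a non-functioning day)
hourly = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-01-01 08:00", "2018-01-01 18:00",
        "2018-01-03 08:00", "2018-01-03 18:00",
    ]),
    "n_rented_bikes": [300, 500, 200, 400],
})

# groupby on the calendar day: only days present in the data
by_day_groupby = (
    hourly.groupby(hourly["datetime"].dt.normalize())["n_rented_bikes"].sum()
)

# resample on a DatetimeIndex: every calendar day, empty days summed to 0
by_day_resample = (
    hourly.set_index("datetime")["n_rented_bikes"].resample("D").sum()
)

print(by_day_groupby)
print(by_day_resample)
```

To plot the resampled series, `px.line(by_day_resample.reset_index(), x="datetime", y="n_rented_bikes")` should work, since `reset_index()` turns the daily index back into a column.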

# Copy the previous chain of manipulations and add season as a variable to group by
by_day_season = (
    df.groupby(by=["date", "season"], as_index=False)["n_rented_bikes"]
    .sum()
)

# Copy the code for the previous line plot and map season to color
px.line(by_day_season, x='date', y='n_rented_bikes', color='season')
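Adding `season` to the grouping keys leaves the daily totals intact only if every date falls into exactly one season; otherwise a day would be split across several rows. A quick check of that assumption, sketched on invented rows with the notebook's column names (the `sample_` prefix keeps it from overwriting the real tables):

```python
import pandas as pd

# Invented rows (illustrative values only)
rows = pd.DataFrame({
    "date": pd.to_datetime(["2018-02-28", "2018-02-28", "2018-03-01"]),
    "season": ["Winter", "Winter", "Spring"],
    "n_rented_bikes": [100, 150, 200],
})

# Number of distinct seasons observed on each date
seasons_per_date = rows.groupby("date")["season"].nunique()
one_season_per_day = bool((seasons_per_date == 1).all())

sample_by_day = rows.groupby("date", as_index=False)["n_rented_bikes"].sum()
sample_by_day_season = (
    rows.groupby(["date", "season"], as_index=False)["n_rented_bikes"].sum()
)

# With one season per date, both aggregations have the same number of rows
print(one_season_per_day, len(sample_by_day), len(sample_by_day_season))
```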

Explore the relation between weather and rentals

# Query df to only keep observations at noon
noon_rides = df.query('hour == 12')

# Create a scatter plot showing temperature against number of rented bikes
# Add a trendline if you feel like it
px.scatter(noon_rides, x='temperature_celsius', y='n_rented_bikes', trendline='lowess')

# Copy and update the code for the previous scatter plot 
# to investigate relation with other weather parameters
# px.scatter(noon_rides, x='wind_speed_mps', y='n_rented_bikes', trendline='lowess')
# px.scatter(noon_rides, x='humidity_pct', y='n_rented_bikes', trendline='lowess')
# px.scatter(noon_rides, x='visibility_10m', y='n_rented_bikes', trendline='lowess')
# px.scatter(noon_rides, x='rainfall_mm', y='n_rented_bikes', trendline='lowess')
# px.scatter(noon_rides, x='snowfall_cm', y='n_rented_bikes', trendline='lowess')
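Instead of copying the scatter call for every weather column, the comparison can be summarized in a loop. The sketch below computes the Pearson correlation of each weather column with `n_rented_bikes`, assuming the column names from the commented-out calls above; `noon_sample` is an invented stand-in for `noon_rides`. Pearson correlation only measures linear association, so the lowess trendlines remain the better tool for spotting curved relations.

```python
import pandas as pd

# Invented stand-in for noon_rides (illustrative values only)
noon_sample = pd.DataFrame({
    "temperature_celsius": [0.0, 10.0, 20.0, 30.0],
    "humidity_pct": [80, 60, 40, 20],
    "rainfall_mm": [3.0, 2.0, 1.0, 0.0],
    "n_rented_bikes": [100, 200, 300, 400],
})

weather_cols = ["temperature_celsius", "humidity_pct", "rainfall_mm"]

# Pearson correlation of each weather column with the rental count
correlations = pd.Series(
    {col: noon_sample[col].corr(noon_sample["n_rented_bikes"]) for col in weather_cols}
).sort_values()

print(correlations)
```

With the real `noon_rides`, the same loop over `weather_cols` could instead call `px.scatter(noon_rides, x=col, y="n_rented_bikes", trendline="lowess").show()` for each column, and the other columns (`wind_speed_mps`, `visibility_10m`, `snowfall_cm`) can be added to the list.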

Explore typical daily usage pattern

# Calculate the average number of rented bikes for each hour of the day (averaged over all days)
time_of_day = (
    df.groupby(by="hour", as_index=False)["n_rented_bikes"]
    .mean()
)

# Create a bar chart showing the usage pattern
px.bar(time_of_day, x='hour', y='n_rented_bikes')
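The bar chart makes the busiest hours easy to spot, but they can also be read directly off the aggregated table. A short sketch on an invented `time_of_day_sample` table with the same two columns: `idxmax()` returns the row label of the largest average, which `.loc` uses to look up the matching hour, and `nlargest` lists the top few rows.

```python
import pandas as pd

# Invented averages per hour of day (illustrative values only)
time_of_day_sample = pd.DataFrame({
    "hour": [7, 8, 9, 17, 18, 19],
    "n_rented_bikes": [600.0, 1000.0, 650.0, 1100.0, 1500.0, 1050.0],
})

# Row label of the largest average, then the hour stored in that row
peak_label = time_of_day_sample["n_rented_bikes"].idxmax()
peak_hour = int(time_of_day_sample.loc[peak_label, "hour"])

# The three busiest hours, largest average first
top_hours = time_of_day_sample.nlargest(3, "n_rented_bikes")["hour"].tolist()

print(peak_hour, top_hours)
```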

# Copy and adapt the previous aggregation to take the season into account
time_of_day_season = (
    df.groupby(by=["hour", "season"], as_index=False)["n_rented_bikes"]
    .mean()
)

# Copy and adapt the code for the previous bar chart to show usage pattern per season
px.bar(time_of_day_season, x='hour', y='n_rented_bikes', color='season', facet_col="season")
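`time_of_day_season` is in long format (one row per hour and season), which works directly with Plotly Express's `color` and `facet_col` arguments. For a side-by-side numeric comparison, `pivot_table` can reshape the same averages into one column per season. A sketch on a few invented hourly rows (`hourly_sample` is a made-up name, chosen so the real `df` is not overwritten):

```python
import pandas as pd

# Invented hourly observations (illustrative values only)
hourly_sample = pd.DataFrame({
    "hour": [8, 8, 8, 8, 18, 18],
    "season": ["Summer", "Summer", "Winter", "Winter", "Summer", "Winter"],
    "n_rented_bikes": [900, 1100, 300, 500, 1600, 400],
})

# Long format, as in the notebook: one row per (hour, season)
long_means = (
    hourly_sample.groupby(["hour", "season"], as_index=False)["n_rented_bikes"].mean()
)

# Wide format: rows are hours, columns are seasons, cells are mean rentals
wide_means = hourly_sample.pivot_table(
    index="hour", columns="season", values="n_rented_bikes", aggfunc="mean"
)

print(long_means)
print(wide_means)
```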
