Data Visualization in Python for Absolute Beginners
This live training covers the basics of creating interactive plots with Plotly. We will visualize Seoul bike sharing data with bar plots, scatter plots, and line plots, using both Plotly and DataCamp Workspace's no-code chart cell. Along the way, we'll tease out how Seoul's weather affects bike sharing trends.
Load in required packages
import pandas as pd
from datetime import datetime, timedelta
import plotly.express as px
Load and clean the data
The dataset consists of the number of public bikes rented in Seoul's bike sharing system at each hour. It also includes information about the weather and the time, such as whether it was a public holiday. Source of dataset.
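The cleaning cell combines `date` and `hour` into a single timestamp with a row-by-row `apply`. A vectorized alternative adds the hour column to the whole date column at once as a timedelta. The sketch below uses a few made-up rows rather than the real CSV and assumes the same day-first `%d/%m/%Y` date format.

```python
import pandas as pd

# Toy rows shaped like the bike data: a day-first date string plus an hour of day
raw = pd.DataFrame({
    "date": ["01/12/2017", "01/12/2017", "02/12/2017"],
    "hour": [0, 23, 5],
})

# Parse the day-first dates, then add the hour as a timedelta in one vectorized step
raw["date"] = pd.to_datetime(raw["date"], format="%d/%m/%Y")
raw["datetime"] = raw["date"] + pd.to_timedelta(raw["hour"], unit="h")

print(raw["datetime"].tolist())
```

`pd.to_timedelta(..., unit="h")` reads each integer as a number of hours, so the result matches adding `timedelta(hours=row["hour"])` row by row, and is usually much faster on large frames.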
# Import CSV with renamed columns
df = pd.read_csv('data/seoul_bike_data_renamed.csv')
# Clean up some columns
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["datetime"] = df.apply(
lambda row: row["date"] + timedelta(hours=row["hour"]), axis=1
)
df["is_holiday"] = df["is_holiday"].map({"No Holiday": False, "Holiday": True})
# Similar to is_holiday, map is_functioning to True and False
df["is_functioning"] = df["is_functioning"].map({'No': False, 'Yes': True})
# Only keep observations where the system is functioning
df = df.query('is_functioning')
# Display the cleaned DataFrame
df
Visualize bike rentals over time
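Hourly counts are noisy when plotted across a whole year. One option is to resample the `datetime` column created during cleaning to daily totals before drawing a line plot. This is a minimal sketch on made-up values; the resulting frame could then be passed to something like `px.line(daily, x="datetime", y="n_rented_bikes")`.

```python
import pandas as pd

# Toy hourly data: two days with three observations each (made-up values)
hourly = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-12-01 08:00", "2017-12-01 12:00", "2017-12-01 18:00",
        "2017-12-02 08:00", "2017-12-02 12:00", "2017-12-02 18:00",
    ]),
    "n_rented_bikes": [300, 150, 450, 200, 100, 400],
})

# Sum the hourly counts into one total per calendar day
daily = (
    hourly
    .set_index("datetime")["n_rented_bikes"]
    .resample("D")
    .sum()
    .reset_index()
)
print(daily)
```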
# Total rentals for each hour of the day, split by season
# (select the column before summing; .sum() does not take a column name)
by_season = (
    df
    .groupby(by=["hour", "season"], as_index=False)["n_rented_bikes"]
    .sum()
)
# Bar chart of hourly rentals, with one panel (facet) per season
px.bar(by_season, x='hour', y='n_rented_bikes', color="season", facet_col="season")
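The numbers behind a faceted bar chart can be easier to scan as a wide table. A minimal sketch with `DataFrame.pivot_table` on made-up rows produces one row per hour and one column per season, the same totals that the `by_season` grouping holds in long form.

```python
import pandas as pd

# Made-up hourly rows for two seasons
obs = pd.DataFrame({
    "hour": [8, 8, 18, 18, 8],
    "season": ["Winter", "Summer", "Winter", "Summer", "Winter"],
    "n_rented_bikes": [100, 500, 200, 900, 300],
})

# One row per hour, one column per season; each cell holds the summed rentals
table = obs.pivot_table(index="hour", columns="season",
                        values="n_rented_bikes", aggfunc="sum")
print(table)
```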
Explore the relation between weather and rentals
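Before plotting a weather variable against rentals, a correlation coefficient gives a quick numeric summary of how strongly the two move together. A minimal sketch with made-up numbers, assuming a temperature column called `temperature` (a hypothetical name; check the renamed CSV for the actual one):

```python
import pandas as pd

# Made-up values; "temperature" is a hypothetical column name used only for illustration
weather = pd.DataFrame({
    "temperature": [-5.0, 0.0, 10.0, 20.0, 25.0],
    "n_rented_bikes": [150, 300, 700, 1200, 1400],
})

# Pearson correlation between temperature and rentals (ranges from -1 to 1)
r = weather["temperature"].corr(weather["n_rented_bikes"])
print(round(r, 3))
```

A scatter plot such as `px.scatter(df, x="temperature", y="n_rented_bikes")` shows the same relation visually, including any non-linear shape that a single coefficient hides.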
Explore typical daily usage pattern
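One common way to look at a typical daily pattern is to split it by weekday versus weekend. A hedged sketch on made-up values derives a weekend flag and an hour from a `datetime` column, then averages rentals per hour for each group.

```python
import pandas as pd

# Made-up hourly observations spanning a Friday and a Saturday
hourly = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-06-01 08:00", "2018-06-01 18:00",  # Friday
        "2018-06-02 08:00", "2018-06-02 18:00",  # Saturday
    ]),
    "n_rented_bikes": [1200, 1500, 400, 900],
})

# dayofweek is 0 for Monday through 6 for Sunday, so >= 5 marks the weekend
hourly["is_weekend"] = hourly["datetime"].dt.dayofweek >= 5
hourly["hour"] = hourly["datetime"].dt.hour

# Average rentals per hour of day, separately for weekdays and weekends
pattern = hourly.groupby(["hour", "is_weekend"], as_index=False)["n_rented_bikes"].mean()
print(pattern)
```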
Extra: is New Year's Eve different?
# New Year's Eve window: noon on 31 December 2017 to noon on 1 January 2018
new_years_start = datetime(2017, 12, 31, 12)
new_years_end = datetime(2018, 1, 1, 12)
# Keep only the rows inside the New Year's Eve window
new_year = df[(df["datetime"] >= new_years_start) & (df["datetime"] <= new_years_end)]
# Plot the hourly rentals across the New Year's Eve window
px.bar(new_year, x="datetime", y="n_rented_bikes")
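The window filter combines two comparisons with `&`. pandas' `Series.between` expresses the same range in one call and includes both endpoints by default. The sketch checks on a few made-up timestamps that both forms select the same rows.

```python
import pandas as pd
from datetime import datetime

# Made-up hourly timestamps around the New Year's Eve window
times = pd.Series(pd.to_datetime([
    "2017-12-31 11:00", "2017-12-31 12:00", "2018-01-01 00:00",
    "2018-01-01 12:00", "2018-01-01 13:00",
]))

start = datetime(2017, 12, 31, 12)
end = datetime(2018, 1, 1, 12)

# between() is inclusive on both ends by default, like the >= / <= mask
in_window = times.between(start, end)
mask = (times >= start) & (times <= end)
print(in_window.tolist())
```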
# Flag the rows that fall inside the New Year's Eve window
df['is_nye'] = (df['datetime'] >= new_years_start) & (df['datetime'] <= new_years_end)
# Create a DataFrame comparing winter usage with New Year's Eve usage
time_of_day = (
    df
    .query('season == "Winter"')
    .groupby(by=["hour", "is_nye"], as_index=False)["n_rented_bikes"]
    .sum()
)
# Build a bar plot that compares New Year's usage with standard winter usage
px.bar(time_of_day, x='hour', y='n_rented_bikes', color="is_nye", barmode="group")
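Summing over all winter hours sets a single New Year's Eve window against every other winter day, so the non-New Year's bars will be much taller regardless of how people actually ride. A hedged alternative is to average per hour instead. The sketch uses made-up rows to show how totals and means differ; replacing the sum with a mean in the `time_of_day` aggregation would apply the same idea to the real data.

```python
import pandas as pd

# Made-up winter rows: three ordinary days and one New Year's Eve day at hour 22
winter = pd.DataFrame({
    "hour": [22, 22, 22, 22],
    "is_nye": [False, False, False, True],
    "n_rented_bikes": [100, 200, 300, 250],
})

# Totals favour the group that covers more days
totals = winter.groupby(["hour", "is_nye"], as_index=False)["n_rented_bikes"].sum()

# Means compare a typical winter hour with the New Year's Eve hour
means = winter.groupby(["hour", "is_nye"], as_index=False)["n_rented_bikes"].mean()
print(totals)
print(means)
```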