Regression: Bike Sharing Demand

This dataset consists of the number of public bikes rented in Seoul's bike sharing system at each hour. It also includes information about the weather and the time, such as whether it was a public holiday. Source of dataset.

Attribute Information:

Column	Explanation
Date	month-day
Rented Bike count	Count of bikes rented at each hour
Hour	Hour of the day
Temperature	Temperature in Celsius
Humidity	%
Windspeed	m/s
Visibility	Units of 10 m
Dew point temperature	Celsius
Solar radiation	MJ/m2
Rainfall	mm
Snowfall	cm
Seasons	Winter, Spring, Summer, Autumn
Holiday	Holiday/No holiday
Functional Day	NoFunc (non-functional hours), Fun (functional hours)

Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks', palette='magma')
plt.rc('xtick', labelsize=9) 
plt.rc('xtick.major', width=0.5)
plt.rc('ytick', labelsize=9)
plt.rc('ytick.major', width=0.5)
plt.rc('axes', linewidth=0.5)

data = pd.read_csv("data/SeoulBikeData.csv").drop('Date', axis=1)
data.columns = ['Rented Bike Count', 'Hour', 'Temperature', 'Humidity',
       'Wind speed', 'Visibility', 'Dew point temperature',
       'Solar Radiation', 'Rainfall', 'Snowfall', 'Seasons',
       'Holiday', 'Functioning Day']
data.columns = data.columns.str.replace(' ', '_').str.lower()
print('Data:')
display(data)
print('Data statistics:')
display(data.describe())

Exploratory data analysis

Target variable

Let's first take a closer look at the variable of interest, Rented Bike Count:

print('Rented Bike Count: mean {:.2f}, median {}, std {:.2f}'.format(data.rented_bike_count.mean(),
                                                                     data.rented_bike_count.median(),
                                                                     data.rented_bike_count.std()))
print('.'*75)
figure, ax = plt.subplots(1, 2, figsize=(16, 6))
sns.histplot(ax=ax[0], data=data, x='rented_bike_count', kde=True)
ax[0].set_title('Histogram')
sns.boxplot(ax=ax[1], data=data, x='rented_bike_count')
ax[1].set_title('Boxplot')
plt.show()

Most values lie in the range 0 to 1200. The distribution is notably right-skewed (a long tail toward high counts), with the mean considerably above the median. Values above 2400 are probably outliers, but this needs further investigation.
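The skew and the outlier threshold read off the plots can also be checked numerically. The sketch below is one possible way to do that, not part of the original analysis: the helper name `skew_and_iqr_fences` is hypothetical, the 1.5 × IQR rule is an assumed convention for flagging outliers, and the gamma-distributed sample is a synthetic stand-in so the block runs without the CSV.

```python
import numpy as np
import pandas as pd


def skew_and_iqr_fences(values: pd.Series, k: float = 1.5) -> dict:
    """Summarise skewness and Tukey-style outlier fences for a numeric series.

    Hypothetical helper for illustration; k=1.5 is an assumed convention.
    """
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return {
        'skew': values.skew(),  # positive value: long tail toward high values
        'lower_fence': lower,
        'upper_fence': upper,
        'n_above_upper': int((values > upper).sum()),
    }


# Synthetic stand-in for the target column (NOT the real data).
rng = np.random.default_rng(0)
sample = pd.Series(rng.gamma(shape=1.5, scale=450, size=5000))

summary = skew_and_iqr_fences(sample)
print(summary)
```

In the notebook itself the call would be `skew_and_iqr_fences(data.rented_bike_count)`. Rows above the upper fence are only candidates for inspection; whether they are true outliers or simply peak-hour demand still has to be judged from context.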

Target vs independent variables

Now let's look at how the other variables in the dataset are distributed and how they relate to Rented Bike Count.

Hour

Start with Hour, the hour of the day in which the bikes were rented:

figure, ax = plt.subplots(1, 2, figsize=(16, 6))
sns.histplot(ax=ax[0], data=data, x='hour', y='rented_bike_count')
ax[0].set_title('Histogram')
sns.boxplot(ax=ax[1], data=data, x='hour', y='rented_bike_count', palette='magma')
ax[1].set_title('Boxplot')
plt.show()

There is a strong (and not really surprising) non-linear relationship between Hour and Rented Bike Count. Demand rises from 6 am to 6 pm.
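Because the relationship is non-linear, a plain Pearson coefficient can understate it. As a rough sketch (an added illustration, not the notebook's own method), the block below tabulates a per-hour profile and compares Pearson's r with the correlation ratio (eta squared), which measures how much of the target's variance is explained by the hourly means. The helper names `hourly_profile` and `correlation_ratio` are hypothetical, and the toy frame is synthetic with an arbitrary midday peak; it does not reproduce the real demand curve.

```python
import numpy as np
import pandas as pd


def hourly_profile(df, hour_col='hour', target_col='rented_bike_count'):
    """Mean, median and row count of the target for each hour of the day."""
    return df.groupby(hour_col)[target_col].agg(['mean', 'median', 'count'])


def correlation_ratio(df, group_col='hour', target_col='rented_bike_count'):
    """Share of target variance explained by the group means (eta squared)."""
    grand_mean = df[target_col].mean()
    groups = df.groupby(group_col)[target_col].agg(['mean', 'count'])
    between = (groups['count'] * (groups['mean'] - grand_mean) ** 2).sum()
    total = ((df[target_col] - grand_mean) ** 2).sum()
    return between / total


# Synthetic toy data (NOT the real dataset): one smooth daily cycle plus noise.
rng = np.random.default_rng(0)
hours = np.tile(np.arange(24), 50)
demand = 500 - 400 * np.cos(2 * np.pi * hours / 24) + rng.normal(0, 50, hours.size)
toy = pd.DataFrame({'hour': hours, 'rented_bike_count': demand})

profile = hourly_profile(toy)
pearson_r = toy['hour'].corr(toy['rented_bike_count'])
eta_sq = correlation_ratio(toy)

print(profile.head())
print('Pearson r^2: {:.3f}, eta^2: {:.3f}'.format(pearson_r ** 2, eta_sq))
```

On the notebook's data the same helpers could be called as `hourly_profile(data)` and `correlation_ratio(data)`. A large gap between eta squared and the squared Pearson coefficient would suggest treating Hour as a categorical or cyclical feature rather than a plain numeric one.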

