
The US Government's Alternative Fuels Data Center collects records of electric vehicle (EV) charging infrastructure, including charging ports and station locations, as well as sales of electric vehicles. With the EV market rapidly evolving, understanding trends in charging facilities and sales is essential to inform strategic planning.

As a data scientist working for a leading EV charging network operator, you recognize the potential in this data and start wrangling and visualizing the aggregated yearly data.

The yearly data, captured each December, record EV charging port installations and station locations over roughly ten years and cover both public and private charging.


The Data

 

private_ev_charging.csv

| Variable | Description |
|---|---|
| year | Year of data collection |
| private_ports | The number of available charging ports owned by private companies in a given year |
| private_station_locations | The number of privately owned station locations for EV charging |

public_ev_charging.csv

| Variable | Description |
|---|---|
| year | Year of data collection |
| public_ports | The number of available charging ports under public ownership in a given year |
| public_station_locations | The number of publicly owned station locations for EV charging |

The sales information is available for each model and year in the ev_sales.csv file:

| Variable | Description |
|---|---|
| Vehicle | Electric vehicle model |
| year | Year of data collection |
| sales | The number of vehicles sold in the US |
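The three tables above define the columns each file is expected to contain. The following is a minimal sketch of checking a loaded DataFrame against that documented schema before any analysis. `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here, and the toy frame uses made-up values rather than real AFDC data.

```python
import pandas as pd

# Hypothetical mapping from file name to the columns documented in the tables above
EXPECTED_COLUMNS = {
    "private_ev_charging.csv": {"year", "private_ports", "private_station_locations"},
    "public_ev_charging.csv": {"year", "public_ports", "public_station_locations"},
    "ev_sales.csv": {"Vehicle", "year", "sales"},
}


def missing_columns(df: pd.DataFrame, filename: str) -> set:
    """Return the documented columns that are absent from df (empty set if none)."""
    return EXPECTED_COLUMNS[filename] - set(df.columns)


# Toy frame with made-up values, standing in for pd.read_csv('ev_sales.csv')
toy_sales = pd.DataFrame({"Vehicle": ["Model A", "Model B"], "year": [2018, 2018], "sales": [100, 50]})
print(missing_columns(toy_sales, "ev_sales.csv"))  # set()
```

The function returns a set rather than raising an error, so it can report every missing column at once.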
```python
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
# Read the data as pandas DataFrames
private_charging = pd.read_csv('private_ev_charging.csv')
public_charging = pd.read_csv('public_ev_charging.csv')
sales_data = pd.read_csv('ev_sales.csv')
```
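Later steps join the tables on `year`, which assumes each charging file has exactly one integer-valued row per year. The following sketch checks that assumption, using `io.StringIO` in place of the real CSV file. The helper name `check_year_key` and the CSV text are illustrative only and do not come from the project.

```python
import io

import pandas as pd


def check_year_key(df: pd.DataFrame) -> bool:
    """True if 'year' is an integer column with no duplicate values."""
    return pd.api.types.is_integer_dtype(df["year"]) and df["year"].is_unique


# Made-up CSV content standing in for private_ev_charging.csv
csv_text = "year,private_ports,private_station_locations\n2017,10,4\n2018,12,5\n"
toy_private = pd.read_csv(io.StringIO(csv_text))
print(check_year_key(toy_private))  # True
```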

```python
# Question 1 - How many vehicles were sold in 2018 in total?

# Total sales per year across all models (a Series indexed by year)
sales_per_year = sales_data.groupby('year')['sales'].sum()

# Look up 2018 directly; raises a KeyError if 2018 is absent instead of leaving ev_sales_2018 undefined
ev_sales_2018 = sales_per_year.loc[2018]
print('Question 1 - The number of vehicles sold in 2018 is:', ev_sales_2018)
```
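You can cross-check the `groupby` total with a boolean mask that sums `sales` only for rows whose `year` is 2018. The following sketch uses a toy frame with made-up values. The two approaches should agree whenever 2018 is present in the data.

```python
import pandas as pd

# Made-up rows in the documented ev_sales.csv layout
toy_sales = pd.DataFrame(
    {
        "Vehicle": ["Model A", "Model B", "Model A"],
        "year": [2017, 2018, 2018],
        "sales": [30, 50, 100],
    }
)

# Approach 1: total per year, then look up 2018
by_year = toy_sales.groupby("year")["sales"].sum()
total_groupby = by_year.loc[2018]

# Approach 2: keep only the 2018 rows, then sum
total_mask = toy_sales.loc[toy_sales["year"] == 2018, "sales"].sum()

print(total_groupby, total_mask)  # 150 150
```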


```python
# Merge the private and public charging tables on year, then add total sales per year
sales_per_year = sales_per_year.reset_index()
merged_df = private_charging.merge(public_charging, on='year').sort_values('year', ascending=True)
merged_df = merged_df.merge(sales_per_year, on='year', how='left')

# Drop years with no matching sales record (and any row with other missing values)
merged_df.dropna(inplace=True)
```
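The sales merge uses `how='left'` followed by `dropna()`, so only years present in all three tables survive. This matches an inner join as long as the charging tables themselves contain no missing values. The following sketch compares the two on made-up frames. `validate='one_to_one'` is a standard `DataFrame.merge` option that raises an error if `year` is duplicated on either side.

```python
import pandas as pd

# Made-up yearly tables in the documented layouts
private = pd.DataFrame({"year": [2016, 2017, 2018], "private_ports": [5, 7, 9]})
public = pd.DataFrame({"year": [2016, 2017, 2018], "public_ports": [20, 25, 30]})
sales = pd.DataFrame({"year": [2017, 2018], "sales": [300, 400]})  # no 2016 row

ports = private.merge(public, on="year", validate="one_to_one")

# Pattern used above: left join, then drop rows with missing sales
left_then_drop = ports.merge(sales, on="year", how="left").dropna().reset_index(drop=True)

# Equivalent inner join (given no NaNs in the port columns)
inner = ports.merge(sales, on="year", how="inner", validate="one_to_one")

print(left_then_drop["year"].tolist(), inner["year"].tolist())  # [2017, 2018] [2017, 2018]
```

One visible difference is that the left-join version turns `sales` into a float column, because NaN is introduced before `dropna()` runs.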

```python
# Question 2 - Plot trends for private ports, public ports, and sales, saving the figure as fig, ax objects.

fig, ax = plt.subplots()

sns.lineplot(x='year', y='private_ports', data=merged_df, label='Private Ports', ax=ax)
sns.lineplot(x='year', y='public_ports', data=merged_df, label='Public Ports', ax=ax)
sns.lineplot(x='year', y='sales', data=merged_df, label='Sales', linestyle=':', ax=ax)

ax.set_xlabel('Year')
ax.set_ylabel('Count (ports or vehicles sold)')
ax.set_title('Yearly trend of EV ports and sales')

# Configure the saved fig/ax objects rather than pyplot's implicit "current" axes
ax.grid(False)
fig.tight_layout()
plt.show()
```
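Vehicle sales and port counts may sit on quite different scales, so a shared y-axis can flatten the smaller series. One alternative is to plot sales on a secondary axis created with `ax.twinx()`. The following sketch assumes made-up data and the non-interactive `Agg` backend so that it runs without a display. It is a presentational variant only and does not change what Question 2 asks for.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumed so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values in the layout of merged_df
df = pd.DataFrame(
    {
        "year": [2016, 2017, 2018],
        "private_ports": [5, 7, 9],
        "public_ports": [20, 25, 30],
        "sales": [3000, 4000, 6000],
    }
)

fig, ax = plt.subplots()
ax.plot(df["year"], df["private_ports"], label="Private Ports")
ax.plot(df["year"], df["public_ports"], label="Public Ports")
ax.set_xlabel("Year")
ax.set_ylabel("Number of ports")

# Secondary y-axis sharing the same x-axis, used only for sales
ax_sales = ax.twinx()
ax_sales.plot(df["year"], df["sales"], linestyle=":", color="black", label="Sales")
ax_sales.set_ylabel("Vehicles sold")

# Combine legend entries from both axes into a single legend
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax_sales.get_legend_handles_labels()
ax.legend(handles + handles2, labels + labels2, loc="upper left")
ax.set_title("Yearly trend of EV ports and sales")
fig.tight_layout()
```

Each axis keeps its own scale, so read the two y-axis labels together with the legend.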

```python
# Question 3 - Did vehicle sales and the numbers of private and public ports show the same trend between 2015 and 2018?

# Restrict to 2015-2018 (inclusive); on year-sorted rows, diff().sum() telescopes to last value minus first value, i.e. the net change
in_window = merged_df['year'].between(2015, 2018)
sales_trend = merged_df.loc[in_window, 'sales'].diff().sum()
private_ports_trend = merged_df.loc[in_window, 'private_ports'].diff().sum()
public_ports_trend = merged_df.loc[in_window, 'public_ports'].diff().sum()

# The trend is "same" only if all three net changes are positive, or all three are negative
if (sales_trend > 0 and private_ports_trend > 0 and public_ports_trend > 0) or (sales_trend < 0 and private_ports_trend < 0 and public_ports_trend < 0):
    trend = "same"
else:
    trend = "different"

print("\nQuestion 3 - Did vehicle sales and the numbers of private and public ports show the same trend between 2015 and 2018? Answer:", trend)
```
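Over consecutive year-sorted rows, `diff().sum()` telescopes to the last value minus the first. The Question 3 check therefore compares each series' net change between 2015 and 2018. It does not test whether every year-over-year step moved in the same direction. The following sketch makes both readings explicit. `net_direction` and `same_yearly_direction` are hypothetical helpers, and the toy values are made up.

```python
import numpy as np
import pandas as pd

COLUMNS = ["sales", "private_ports", "public_ports"]


def window(df: pd.DataFrame, start: int, end: int) -> pd.DataFrame:
    """Rows with start <= year <= end, sorted by year."""
    return df[df["year"].between(start, end)].sort_values("year")


def net_direction(df: pd.DataFrame, start: int, end: int) -> dict:
    """Sign (+1, 0, -1) of last-minus-first per column; same sign as diff().sum()."""
    w = window(df, start, end)
    return {c: int(np.sign(w[c].iloc[-1] - w[c].iloc[0])) for c in COLUMNS}


def same_yearly_direction(df: pd.DataFrame, start: int, end: int) -> bool:
    """Stricter reading: in every year, all columns change in the same direction."""
    signs = np.sign(window(df, start, end)[COLUMNS].diff().dropna())
    return bool((signs.nunique(axis=1) == 1).all())


# Made-up data: all three series end higher than they start, but sales dip in 2016
toy = pd.DataFrame(
    {
        "year": [2015, 2016, 2017, 2018],
        "sales": [100, 90, 120, 150],
        "private_ports": [5, 6, 7, 8],
        "public_ports": [20, 22, 25, 30],
    }
)
print(net_direction(toy, 2015, 2018))          # {'sales': 1, 'private_ports': 1, 'public_ports': 1}
print(same_yearly_direction(toy, 2015, 2018))  # False
```

With these toy values, the net-change reading would call the trends "same" while the year-by-year reading would not. This is why it helps to state which definition of "trend" an answer relies on.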