In today’s fast-paced, data-driven world, companies rely heavily on accurate analyses of sales data to make informed decisions. However, the quality of the insights derived from such analyses depend significantly on the integrity of the underlying code.
You have been given some starting code with two functions: one that extracts and flattens JSON data into a structured format and the other that transforms electricity sales data by cleaning, filtering, and extracting relevant features. The company plans to use your revised code to improve the accuracy of sales analytics.
Your task is to identify potential errors in the functions and the underlying data that might result in logic and runtime errors, such as missing values, incorrect data types, or incompatible values (e.g., negatives). Enhance the custom functions provided by implementing exceptions to catch data quality issues and edge cases.
The data is available here ("sales_data_sample.csv") for the analyses. This data has 25 columns, but only two columns are analyzed, namely, quantity_ordered
and price_each
. A sample of the data is shown below.
# Import libraries
import pandas as pd
# Load data
sales_df = pd.read_csv("data/sales_data_sample.csv")
sales_df.head()
# Identify errors and add exceptions to the `get_quantity_ordered_sum()` function
def get_quantity_ordered_sum(sales_quantity_ordered):
"""Calculates the total sum on the 'quantity_ordered' column.
Args:
sales_quantity_ordered (pd.core.series.Series): The pandas Series for the 'quantity_ordered' column.
Returns:
total_quantity_ordered (int): The total sum of the 'quantity_ordered' column.
"""
total_quantity_ordered = 0
for quantity in sales_quantity_ordered:
if quantity < 0:
total_quantity_ordered += (quantity * -1)
else:
total_quantity_ordered += quantity
return total_quantity_ordered
# Identify errors and add exceptions to the `get_price_each_average()` function
def get_price_each_average(sales_price_each, num_places=2):
"""Calculates the average on the 'price_each' column
using pandas built in methods and rounds to the desired number of places.
Args:
sales_price_each (pd.core.series.Series): The pandas Series for the 'price_each' column.
num_of_places (int): The number of decimal places to round.
Returns:
average_price_each (float): The average of the 'price_each' column.
"""
try:
sales_price_each = pd.to_numeric(sales_price_each, errors='coerce').dropna()
if sales_price_each.empty:
return 0
total_of_price_each = sales_price_each.sum()
len_of_price_each = len(sales_price_each)
average_price_each = round(
total_of_price_each / len_of_price_each, num_places
)
return average_price_each
except TypeError as e:
print(e)
except Exception as e:
print(e)
total_quantity_ordered = get_quantity_ordered_sum(sales_df["quantity_ordered"])
average_price_each = get_price_each_average(sales_df['price_each'])