Exploratory and Data Analysis Project

Practical Exam: Spectrum Shades LLC

Spectrum Shades LLC is a prominent supplier of concrete color solutions, offering a wide range of pigments and coloring systems used in various concrete applications, including decorative concrete, precast concrete, and concrete pavers. The company prides itself on delivering high-quality colorants that meet the unique needs of its diverse clientele, including contractors, architects, and construction companies.

The company has recently observed a growing number of customer complaints regarding inconsistent color quality in their products. The discrepancies have led to a decline in customer satisfaction and a potential increase in product returns. By identifying and mitigating the factors causing color variations, the company can enhance product reliability, reduce customer complaints, and minimize return rates.

You are part of the data analysis team tasked with providing actionable insights to help Spectrum Shades LLC address the issues of inconsistent color quality and improve customer satisfaction.

1 hidden cell

# Write your answer to Task 1 here
import pandas as pd
import numpy as np

# lets take a look into the data
production_data = pd.read_csv('production_data.csv')
production_data.head()
# production_data.info()

clean_data = production_data.copy()

# convert the data types as required
clean_data['production_date'] = pd.to_datetime(clean_data['production_date'])

# checking for nulled first before categorizing the columsn entries
# clean_data['raw_material_supplier'].value_counts()

def check_cat(n):
    if n == 1:
        return 'national_supplier'
    elif n ==2:
        return 'international_supplier'
    else:
        return 'national_supplier'
        
clean_data['raw_material_supplier'] = clean_data['raw_material_supplier'].apply(check_cat)

clean_data[['raw_material_supplier', 'mixing_speed', 'pigment_type']] = clean_data[['raw_material_supplier', 'mixing_speed', 'pigment_type']].astype('category')

clean_data.info()

# lets work on the missing columns in the categorical
for cat in clean_data.select_dtypes('category').columns:
    print(f'Values in {cat} are -->\n{clean_data[cat].value_counts()}')

Hidden output

# lets work on errors within the affected columns
clean_data['pigment_type'] = clean_data['pigment_type'].str.lower()
print(clean_data['pigment_type'].value_counts())

Hidden output

# Replace missing values in the categoricals
clean_data['mixing_speed'] = clean_data['mixing_speed'].str.replace('-', 'Not Specified')
print(clean_data['mixing_speed'].value_counts())

Hidden output

# Replace missing values in the numericals
clean_data['mixing_time'].fillna(clean_data['mixing_time'].mean().round(2), inplace=True)
print(clean_data['mixing_time'].isna().sum())

Task 2

You want to understand how the supplier type and quantity of materials affect the final product attributes.

Calculate the average product_quality_score and pigment_quantity grouped by raw_material_supplier.

You should start with the data in the file 'production_data.csv'.
Your output should be a DataFrame named aggregated_data.
It should include the three columns: raw_material_supplier, avg_product_quality_score, and avg_pigment_quantity.
Your answers should be rounded to 2 decimal places.

# Write your answer to Task 2 here
aggregated_data = production_data.groupby('raw_material_supplier')[['product_quality_score', 'pigment_quantity']].mean().round(2).rename(columns={
    'raw_material_supplier': 'raw_material_supplier',
    'product_quality_score': 'avg_product_quality_score',
    'pigment_quantity': 'avg_pigment_quantity'
}).reset_index()

aggregated_data

Hidden output

Task 3

To get more insight into the factors behind product quality, you want to filter the data to see an average product quality score for a specified set of results.

Identify the average product_quality_score for batches with a raw_material_supplier of 2 and a pigment_quantity greater than 35 kg.

Write a query to return the average avg_product_quality_score for these filtered batches. Use the original production data table, not the output of Task 2.

You should start with the data in the file 'production_data.csv'.
Your output should be a DataFrame named pigment_data.
It should consist of a 1-row DataFrame with 3 columns: raw_material_supplier, pigment_quantity, and avg_product_quality_score.
Your answers should be rounded to 2 decimal places where appropriate.

# Write your answer to Task 3 here
filtered = production_data[
    (production_data['raw_material_supplier'] == 2) &
    (production_data['pigment_quantity'] > 35)
]

# Create the required 1-row DataFrame
pigment_data = pd.DataFrame({
    'raw_material_supplier': [2],
    'pigment_quantity': [round(filtered['pigment_quantity'].mean(), 2)],
    'avg_product_quality_score': [round(filtered['product_quality_score'].mean(), 2)]
})

pigment_data

‌
‌
‌

Exploratory and Data Analysis Project

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Practical Exam: Spectrum Shades LLC

Task 2

Task 3

Practical Exam: Spectrum Shades LLC