Project: Spectrum Shades LLC EDA

#Spectrum Shades LLC

Spectrum Shades LLC is a prominent supplier of concrete color solutions, offering a wide range of pigments and coloring systems used in various concrete applications, including decorative concrete, precast concrete, and concrete pavers. The company prides itself on delivering high-quality colorants that meet the unique needs of its diverse clientele, including contractors, architects, and construction companies.

The company has recently observed a growing number of customer complaints regarding inconsistent color quality in their products. The discrepancies have led to a decline in customer satisfaction and a potential increase in product returns. By identifying and mitigating the factors causing color variations, the company can enhance product reliability, reduce customer complaints, and minimize return rates.

You are part of the data analysis team tasked with providing actionable insights to help Spectrum Shades LLC address the issues of inconsistent color quality and improve customer satisfaction.

Task 1

Before you can start any analysis, you need to confirm that the data is accurate and reflects what you expect to see.

It is known that there are some issues with the production_data table, and the data team have provided the following data description.

Write a query to return data matching this description. You must match all column names and description criteria.
Create a cleaned version of the dataframe.

You should start with the data in the file "production_data.csv".
Your output should be a dataframe named clean_data.
All column names and values should match the table below.

Column Name	Criteria
batch_id	Discrete. Identifier for each batch. Missing values are not possible.
production_date	Date. Date when the batch was produced.
raw_material_supplier	Categorical. Supplier of the raw materials. (1='national_supplier', 2='international_supplier'). Missing values should be replaced with 'national_supplier'.
pigment_type	Nominal. Type of pigment used. ['type_a', 'type_b', 'type_c']. Missing values should be replaced with 'other'.
pigment_quantity	Continuous. Amount of pigment added (in kilograms) (Range: 1 - 100). Missing values should be replaced with median.
mixing_time	Continuous. Duration of the mixing process (in minutes). Missing values should be replaced with mean.
mixing_speed	Categorical. Speed of the mixing process represented as categories: 'Low', 'Medium', 'High'. Missing values should be replaced with 'Not Specified'.
product_quality_score	Continuous. Overall quality score of the final product (rating on a scale of 1 to 10). Missing values should be replaced with mean.

import pandas as pd

# Load the data from the CSV file

data = pd.read_csv('production_data.csv')

# Replace missing values

df['raw_material_supplier'] = df['raw_material_supplier'].fillna('national_supplier')

df['pigment_type'].replace(['Type_C'], ['type_c'], inplace=True)
df['pigment_type'] = df['pigment_type'].fillna('other')

df['pigment_quantity'] = df['pigment_quantity'].fillna(df['pigment_quantity'].median())

df['mixing_time'] = df['mixing_time'].fillna(df['mixing_time'].mean())


df['mixing_speed'].replace(['-'], ['Not Specified'], inplace=True)
df['mixing_speed'] = df['mixing_speed'].fillna('Not Specified')


df['product_quality_score'] = df['product_quality_score'].fillna(df['product_quality_score'].mean())

# Ensure data types are correct

df['batch_id'] = df['batch_id'].astype(int)

df['production_date'] = pd.to_datetime(df['production_date'])

df['raw_material_supplier'] = df['raw_material_supplier'].astype('category')


df['pigment_type'] = df['pigment_type'].astype('category')
df['pigment_type'].replace(['-'], ['Not Specified'], inplace=True)


df['mixing_speed'] = df['mixing_speed'].astype('category')

# Filter pigment_quantity to be within the range 1 - 100

df = df[(df['pigment_quantity'] >= 1) & (df['pigment_quantity'] <= 100)]

# Create the cleaned dataframe

clean_data = df

# Display the cleaned dataframe

clean_data

Task 2

You want to understand how the supplier type and quantity of materials affect the final product attributes.

Calculate the average product_quality_score and pigment_quantity grouped by raw_material_supplier.

You should start with the data in the file 'production_data.csv'.
Your output should be a data frame named aggregated_data.
It should include the three columns: raw_material_supplier, avg_product_quality_score, and avg_pigment_quantity.
Your answers should be rounded to 2 decimal places.

import pandas as pd

# Load the data from the CSV file

data = pd.read_csv('production_data.csv')

# Group by raw_material_supplier and calculate the average product_quality_score and pigment_quantity

aggregated_data = data.groupby('raw_material_supplier').agg(

    avg_product_quality_score=('product_quality_score', 'mean'),

    avg_pigment_quantity=('pigment_quantity', 'mean')

).reset_index()

# Round the results to 2 decimal places

aggregated_data = aggregated_data.round({'avg_product_quality_score': 2, 'avg_pigment_quantity': 2})

# Display the resulting DataFrame

aggregated_data

Task 3

Identify all product_quality_score values for batches with a raw_material_supplier of 2 and a pigment_quantity greater than 35 kg. Use the original production data table, not the output of Task 2.

You should start with the data in the file 'production_data.csv'.
Your output should be a data frame named pigment_data.
It should include the three columns: raw_material_supplier, pigment_quantity, and product_quality_score.
Your answers should be rounded to 3 decimal places.

import pandas as pd

# Load the data from the CSV file

data = pd.read_csv('production_data.csv')

# Filter the data based on the given conditions

filtered_data = data[(data['raw_material_supplier'] == 2) & (data['pigment_quantity'] > 35)]

# Select the required columns and round the values to 3 decimal places

pigment_data = filtered_data[['raw_material_supplier', 'pigment_quantity', 'product_quality_score']].round(3)

# Display the resulting DataFrame

pigment_data

Task 4

In order to proceed with further analysis later, you need to analyze how various factors relate to product quality. Start by calculating the mean and standard deviation for the following columns: pigment_quantity, and product_quality_score.
These statistics will help in understanding the central tendency and variability of the data related to product quality.

Next, calculate the Pearson correlation coefficient between the following variables: pigment_quantity, and product_quality_score.
These correlation coefficients will provide insights into the strength and direction of the relationships between the factors and overall product quality.

You should start with the data in the file 'production_data.csv'.
Calculate the mean and standard deviation for the columns pigment_quantity and product_quality_score as: product_quality_score_mean, product_quality_score_sd, pigment_quantity_mean, pigment_quantity_sd.
Calculate the Pearson correlation coefficient between pigment_quantity and product_quality_score as: corr_coef
Your output should be a data frame named product_quality.
It should include the columns: product_quality_score_mean, product_quality_score_sd, pigment_quantity_mean, pigment_quantity_sd, corr_coef.
Ensure that your answers are rounded to 2 decimal places.

import pandas as pd

import numpy as np

# Load the data

data = pd.read_csv('production_data.csv')

# Calculate means

product_quality_score_mean = data['product_quality_score'].mean()

pigment_quantity_mean = data['pigment_quantity'].mean()

# Calculate standard deviations

product_quality_score_sd = data['product_quality_score'].std()

pigment_quantity_sd = data['pigment_quantity'].std()

# Calculate Pearson correlation coefficient

corr_coef = data['product_quality_score'].corr(data['pigment_quantity'])

# Create the output DataFrame

product_quality = pd.DataFrame({

    'product_quality_score_mean': [round(product_quality_score_mean, 2)],

    'product_quality_score_sd': [round(product_quality_score_sd, 2)],

    'pigment_quantity_mean': [round(pigment_quantity_mean, 2)],

    'pigment_quantity_sd': [round(pigment_quantity_sd, 2)],

    'corr_coef': [round(corr_coef, 2)]

})

product_quality

Project: Spectrum Shades LLC EDA

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Task 1

Task 2

Task 3

Task 4

Task 1