Data Quality Review - KPMG Dataset
To ensure that the KPMG dataset is ready for analysis, we need to perform a data quality review. This review checks for missing values, unexpected data types, outliers, and any other issues that may affect the integrity and reliability of the data.
Let's start by loading the KPMG dataset and examining its structure and content.
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      echo = TRUE,
                      warning = FALSE,
                      message = FALSE)
# Load the KPMG_VI_New_raw_data_update_final.xlsx dataset
library(readxl)
kpmg_data_transactions <- read_excel('KPMG_VI_New_raw_data_update_final.xlsx', sheet = "Transactions", skip = 1)
kpmg_data_customers <- read_excel('KPMG_VI_New_raw_data_update_final.xlsx', sheet = "NewCustomerList", skip = 1)
kpmg_data_demographics <- read_excel('KPMG_VI_New_raw_data_update_final.xlsx', sheet = "CustomerDemographic", skip = 1)
kpmg_data_addresses <- read_excel('KPMG_VI_New_raw_data_update_final.xlsx', sheet = "CustomerAddress", skip = 1)
# Print the first few rows
head(kpmg_data_transactions)
Data Quality Evaluation - kpmg_data_transactions
To evaluate the data quality of the kpmg_data_transactions data frame, we check its structure, count missing values, inspect the data type of each column, and look for outliers in list_price and standard_cost.
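The boxplots used in this section show outliers visually but do not say how many there are. As a minimal sketch, the helper below counts the points that boxplot() would draw as outliers, using boxplot.stats() from base R. The function name count_boxplot_outliers and the toy values are assumptions made for illustration; they are not part of the original analysis.

```r
# Hypothetical helper: count the values that boxplot() would mark as outliers.
# boxplot.stats() applies the same whisker rule (coef times the hinge spread).
count_boxplot_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]                         # ignore missing values
  length(boxplot.stats(x, coef = coef)$out)
}

# Toy data standing in for a price column (illustrative values only)
toy_prices <- c(10, 12, 11, 13, 12, 11, 500)
count_boxplot_outliers(toy_prices)
```

Applied to the real data, the call would be count_boxplot_outliers(kpmg_data_transactions$list_price), assuming the column was read as numeric.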
# Check the structure of the dataset
str(kpmg_data_transactions)
# Check for missing values
sum(is.na(kpmg_data_transactions))
# Check data types
sapply(kpmg_data_transactions, class)
# Check for outliers
boxplot(kpmg_data_transactions$list_price,
        main = "List Price Range")
boxplot(kpmg_data_transactions$standard_cost,
        main = "Standard Cost Range")
Data Quality Evaluation - kpmg_data_customers
To evaluate the data quality of the kpmg_data_customers data frame, we apply the same checks one at a time: structure, missing values, and data types.
Structure of the Dataset
Let's first check the structure of the kpmg_data_customers dataset.
# Check the structure of the dataset
str(kpmg_data_customers)
Missing Values
Next, let's check for missing values in the kpmg_data_customers dataset.
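A single total of missing cells does not show which columns are affected. As a hedged sketch, the helper below breaks the count down per column using colSums(is.na()). The name missing_by_column and the toy data frame are hypothetical and used only for illustration.

```r
# Hypothetical helper: missing-value count and percentage for each column
missing_by_column <- function(df) {
  n_missing <- colSums(is.na(df))
  data.frame(
    column      = names(n_missing),
    n_missing   = as.integer(n_missing),
    pct_missing = round(100 * n_missing / nrow(df), 1),
    row.names   = NULL
  )
}

# Toy data frame standing in for a customer sheet (illustrative values only)
toy_na <- data.frame(
  id    = 1:4,
  name  = c("a", NA, "c", NA),
  score = c(1.5, 2, NA, 4)
)
missing_by_column(toy_na)
```

On the real data this would be called as missing_by_column(kpmg_data_customers); columns with a high percentage are the first candidates for closer inspection.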
# Check for missing values
sum(is.na(kpmg_data_customers))
Data Types
Now, let's examine the data types of the variables in the kpmg_data_customers dataset.
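Listing the classes is more useful when they are compared against what each column is expected to hold. The sketch below reports only the columns whose first class differs from an expected value. The helper check_column_types, the toy columns, and the expected classes are all assumptions for illustration; the real expectations would come from the dataset's documentation.

```r
# Hypothetical helper: report columns whose class differs from the expected one.
# `expected` is a named character vector: names are columns, values are classes.
check_column_types <- function(df, expected) {
  actual <- vapply(df[names(expected)],
                   function(col) class(col)[1],
                   character(1))
  mismatched <- actual != expected
  data.frame(
    column   = names(expected)[mismatched],
    expected = unname(expected[mismatched]),
    actual   = unname(actual[mismatched])
  )
}

# Toy data: a date stored as text, a common issue after importing spreadsheets
toy_types <- data.frame(
  id     = 1:3,
  joined = c("2020-01-01", "2020-02-01", "2020-03-01"),
  stringsAsFactors = FALSE
)
expected_types <- c(id = "integer", joined = "Date")
check_column_types(toy_types, expected_types)
```

The helper stops with an error if a column named in expected is absent from the data frame, which is itself a useful signal during a quality review.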