Data Quality Assessment: Task 1
Sprocket Central Pty Ltd
Source: https://tinyurl.com/yckpfhf5
Standard Data Quality
- Correct Values (Accuracy)
- Data Fields with Values (Completeness)
- Values Free from Contradiction (Consistency)
- Values up to Date (Currency)
- Data Items with Value Meta-data (Relevancy)
- Data Containing Allowable Values (Validity)
- Records that are Duplicated (Uniqueness)
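Two of these dimensions, completeness and uniqueness, can be measured directly in base R. The sketch below is illustrative only: the `toy` data frame and its values are made up and are not taken from the Sprocket Central workbook.

```r
# Toy data frame; the column names and values are hypothetical.
toy <- data.frame(
  customer_id = c(1, 2, 2, 4),
  last_name   = c("Smith", NA, NA, "Lee"),
  stringsAsFactors = FALSE
)

# Completeness: share of non-missing values in each column.
completeness <- colMeans(!is.na(toy))

# Uniqueness: number of rows that exactly repeat an earlier row.
n_duplicates <- sum(duplicated(toy))

completeness
n_duplicates
```

A completeness value below 1 marks a column as incomplete, and a non-zero duplicate count marks records that need review.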
# |-----------------Names of all worksheets-----------------|
library(readxl)  # provides excel_sheets()
excel_sheets("./KPMG Customer Demo.xlsx")
c_demo
Customer Demography
The initial data contained 4,000 observations and 13 variables.
Note
- 5 customers with missing last_name, job_title, and job_industry_category were dropped because too little information was available for them.
- 2 deceased customers were also dropped.
- The default column was removed because its content is irrelevant.
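The three cleaning steps in the note can be sketched in base R as follows. This is a minimal illustration on made-up data, not the notebook's actual code: the `deceased_indicator` column name, its "Y"/"N" coding, and the reading that the dropped customers were missing all three fields at once are assumptions.

```r
# Hypothetical miniature of the customer demographic sheet.
demo <- data.frame(
  customer_id           = 1:5,
  last_name             = c("Smith", NA, "Lee", "Park", "Diaz"),
  job_title             = c("Engineer", NA, "Nurse", "Clerk", "Chef"),
  job_industry_category = c("IT", NA, "Health", "Retail", "Hospitality"),
  deceased_indicator    = c("N", "N", "Y", "N", "N"),  # assumed name and coding
  default               = c("x", "y", "z", "w", "v"),
  stringsAsFactors = FALSE
)

# 1. Drop customers missing last_name, job_title and job_industry_category together.
key_cols    <- c("last_name", "job_title", "job_industry_category")
all_missing <- rowSums(is.na(demo[key_cols])) == length(key_cols)
demo_small  <- demo[!all_missing, ]

# 2. Drop deceased customers.
demo_small <- demo_small[demo_small$deceased_indicator != "Y", ]

# 3. Remove the irrelevant default column.
demo_small$default <- NULL

dim(demo_small)
```

Comparing `dim()` before and after each step is a quick way to confirm that the number of dropped rows and columns matches the counts reported here.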
The cleaned and validated customer demographic dataset contains 3,993 rows and 12 columns.
- customer_id: (Complete).
- first_name: (Complete).
- last_name: (Incomplete). Customers can be contacted to provide their last names.
- gender: (Inconsistent). The data collection format should be adhered to.
- past_3_years_bike_related_purchases: (Complete).
- DOB: (Complete).
- job_title: (Incomplete). Customers can be contacted to provide their job title.
- job_industry_category: (Incomplete). Customers can be contacted to provide their job industry category.
- wealth_segment: (Complete).
- default: (Irrelevant). This column should be dropped.
- owns_car: (Complete).
- tenure: (Incomplete).
demo_clean1
# Excel stores dates as day counts, so the serial must be numeric, not character
numeric <- 19644
convert <- as.Date(numeric, origin = "1899-12-30")
convert
Transaction Data
The initial data contained 20,000 rows and 13 columns.
Note
- Due to the limited information available, 197 transactions with missing details were excluded from the analysis.
- The cleaned and validated transaction dataset contains 19,803 rows and 13 columns.
- All columns: (Complete).
Customer Address
The initial data contained 3,999 observations and 6 variables.
Below is the cleaned and validated data.
- customer_id: (Complete).
- address: (Complete).
- postcode: (Complete).
- state: (Inconsistent). The data collection format should be adhered to.
- Country: (Accurate).
- Property_valuation: (Complete).
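One way to resolve an inconsistent column such as state is to map every variant onto a single agreed format. The sketch below assumes, purely for illustration, that the inconsistency is a mix of full state names and abbreviations. The raw values and the lookup table are hypothetical, since this report does not list the actual spellings.

```r
# Hypothetical raw values; the real spellings in the sheet are not shown here.
state_raw <- c("NSW", "New South Wales", "VIC", "Victoria", "QLD")

# Lookup from full names to abbreviations (assumed target format).
state_lookup <- c("New South Wales" = "NSW", "Victoria" = "VIC")

# Replace values found in the lookup; leave everything else unchanged.
needs_fix   <- state_raw %in% names(state_lookup)
state_clean <- state_raw
state_clean[needs_fix] <- unname(state_lookup[state_raw[needs_fix]])

table(state_clean)
```

The same lookup approach would apply to the gender column once its inconsistent values have been listed.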