Data Quality Assessment: Task 1

Sprocket Central Pty Ltd

Source: https://tinyurl.com/yckpfhf5


Standard Data Quality

  • Correct Values (Accuracy)

  • Data Fields with Values (Completeness)

  • Values Free from Contradiction (Consistency)

  • Values up to Date (Currency)

  • Data Items with Value Meta-data (Relevancy)

  • Data Containing Allowable Values (Validity)

  • Records that are Duplicated (Uniqueness)
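Several of these dimensions can be checked mechanically. The following is a minimal base R sketch on a small made-up data frame; the values and the set of allowable gender codes are assumptions for illustration, not taken from the workbook.

```r
# Hypothetical toy data standing in for one worksheet; not the real file.
toy <- data.frame(
  customer_id = c(1, 2, 2, 4),
  last_name   = c("Medendorp", NA, NA, "Ito"),
  gender      = c("F", "Male", "Male", "Femal"),
  stringsAsFactors = FALSE
)

# Completeness: share of non-missing values in each column.
completeness <- colMeans(!is.na(toy))

# Uniqueness: number of rows that exactly repeat an earlier row.
n_duplicates <- sum(duplicated(toy))

# Validity: values outside an assumed set of allowable codes.
allowed_gender <- c("Male", "Female", "U")
invalid_gender <- setdiff(unique(toy$gender), allowed_gender)

print(completeness)
print(n_duplicates)
print(invalid_gender)
```

A column with completeness below 1 is a candidate for the "Incomplete" label, and any value returned in `invalid_gender` points to a validity or consistency problem.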

# |-----------------Names of all worksheets-----------------|

excel_sheets("./KPMG Customer Demo.xlsx")

c_demo

Customer Demography

The initial data contained 4,000 observations and 13 variables.

Note

  • 5 customers with missing data in their last_name, job_title, and job_industry_category were dropped because too little information was available for them.
  • 2 deceased customers were also dropped.
  • The default column was removed because its content is irrelevant.
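The three steps in the note can be sketched in base R as follows. The hidden cells are not shown, so this is not the notebook's own code: the toy rows, the `deceased_indicator` column name and its "Y"/"N" coding are assumptions, and "missing" is read here as all three fields being absent at once.

```r
# Hypothetical toy frame; deceased_indicator and its coding are assumed.
demo <- data.frame(
  customer_id = 1:5,
  last_name = c("A", NA, "C", "D", "E"),
  job_title = c("Nurse", NA, "Clerk", "Editor", NA),
  job_industry_category = c("Health", NA, "Retail", "Media", "IT"),
  deceased_indicator = c("N", "N", "Y", "N", "N"),
  default = c("x1", "x2", "x3", "x4", "x5"),
  stringsAsFactors = FALSE
)

# Rows where all three descriptive fields are missing together.
all_missing <- is.na(demo$last_name) &
  is.na(demo$job_title) &
  is.na(demo$job_industry_category)

# Rows flagged as deceased.
deceased <- demo$deceased_indicator == "Y"

# Keep the remaining rows and every column except `default`.
demo_clean <- demo[!all_missing & !deceased, setdiff(names(demo), "default")]

print(dim(demo_clean))
```

In the toy data, row 2 is dropped for missing fields and row 3 as deceased, while row 5 (only job_title missing) is kept.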

The cleaned and validated customer demographic dataset contains 3,993 rows and 12 columns.

  • customer_id: (Complete).
  • first_name: (Complete).
  • last_name: (Incomplete). Customers can be contacted to provide their last names.
  • gender: (Inconsistent). The data collection format should be adhered to.
  • past_3_years_bike_related_purchases: (Complete).
  • DOB: (Complete).
  • job_title: (Incomplete). Customers can be contacted to provide their job title.
  • job_industry_category: (Incomplete). Customers can be contacted to provide their job industry category.
  • wealth_segment: (Complete).
  • default: (Irrelevant). This column should be dropped.
  • owns_car: (Complete).
  • tenure: (Incomplete).
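An inconsistent gender column can be brought to a single format by mapping every variant to one label. This is a minimal sketch: the raw values, the helper name `standardise_gender`, and the target labels (including "Unspecified") are assumptions for illustration.

```r
# Hypothetical raw values showing mixed spellings of the same categories.
gender_raw <- c("F", "Female", "Femal", "M", "Male", "U")

# Map each value to a standard label using its first letter.
# Values whose first letter is not F, M or U become NA.
standardise_gender <- function(x) {
  key <- toupper(substr(trimws(x), 1, 1))
  labels <- c(F = "Female", M = "Male", U = "Unspecified")
  unname(labels[key])
}

gender_clean <- standardise_gender(gender_raw)
print(table(gender_clean, useNA = "ifany"))
```

Keying on the first letter is a simplification that suits a small, known set of variants; an explicit lookup table is safer when unexpected values may appear.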
demo_clean1
# Excel stores dates as day counts; the serial must be numeric for `origin` to apply
numeric <- as.numeric(19644)
convert <- as.Date(numeric, origin = "1899-12-30")
convert  # 1953-10-12

Transaction Data

The initial data contained 20,000 rows and 13 columns.

Note

  • Due to the limited information available, 197 transactions with missing details were excluded from the analysis.

  • The cleaned and validated transaction dataset contains 19,803 rows and 13 columns.

  • All columns: (Complete).
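Excluding records with any missing field can be sketched with `complete.cases()`. The toy rows and column names below are assumptions, since the transaction cleaning code itself is hidden.

```r
# Hypothetical toy transactions; names and values are illustrative only.
tx <- data.frame(
  transaction_id = 1:4,
  brand = c("Solex", NA, "Trek", "Giant"),
  list_price = c(71.49, 2091.47, NA, 1198.46),
  stringsAsFactors = FALSE
)

# Keep only rows with no missing value in any column.
tx_clean <- tx[complete.cases(tx), ]

# Number of rows removed by the filter.
n_dropped <- nrow(tx) - nrow(tx_clean)

print(n_dropped)
print(dim(tx_clean))
```

Applied to the real sheet, the same difference would correspond to the 197 excluded rows (20,000 minus 19,803).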

Customer Address

The initial data contained 3,999 observations and 6 variables.

Below is the cleaned and validated data.

  • customer_id: (Complete).
  • address: (Complete).
  • postcode: (Complete).
  • state: (Inconsistent). The data collection format should be adhered to.
  • Country: (Accurate).
  • Property_valuation: (Complete).
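One way to make the state column consistent is a lookup that rewrites full names as abbreviations. This is a sketch under assumptions: the raw values and the choice of abbreviations as the target format are illustrative and not confirmed by the source.

```r
# Hypothetical raw values mixing abbreviations and full state names.
state_raw <- c("NSW", "New South Wales", "VIC", "Victoria", "QLD")

# Assumed mapping from full names to abbreviations.
lookup <- c("New South Wales" = "NSW", "Victoria" = "VIC")

# Replace values found in the lookup; leave everything else unchanged.
state_clean <- ifelse(state_raw %in% names(lookup), lookup[state_raw], state_raw)
state_clean <- unname(state_clean)

print(table(state_clean))
```

Values that are not in the lookup pass through untouched, so new variants remain visible in the frequency table instead of being silently converted.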