### Task 1

The original data contains 1,500 rows and 8 columns.

- product_id: all values are unique and there are no missing values, so this column meets the criteria.
- category: 25 missing values were found and replaced with "Unknown"; after this, the column meets the criteria.
- animal: this column meets the criteria.
- size: the values are valid but their capitalization is inconsistent (a mix of lowercase and uppercase). There are no missing values. The values were standardized to a single, consistent capitalization.
- price: the values are positive numbers with 2 decimal places, but the column is stored as a string. There are 150 missing values recorded as "unlisted". The column was first converted to numeric (double), and the missing values were then replaced with the overall median price.
- sales: this column meets the criteria. There are no missing values.
- rating: the column is stored as double, but the criteria define it as a discrete value, so it was converted to integer. The 150 missing values were replaced with 0.
- repeat_purchase: the column is stored as double, so it was converted to integer to represent a nominal value. There are no missing values.
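The cleaning rules listed above can be sketched in Python using only the standard library. This is a minimal sketch, not the code used for the original analysis: it assumes the raw file is read with `csv.DictReader` (so every value arrives as a string), that missing values appear as blank, "NA" or "-" (with "unlisted" additionally marking missing prices), and it picks title case for `size` purely for illustration. `clean_rows` is a hypothetical helper name and the sample rows are made up.

```python
import csv
import io
import statistics

# Assumed missing-value markers; adjust these to match the real file.
MISSING = {"", "NA", "-"}
PRICE_MISSING = MISSING | {"unlisted"}


def clean_rows(rows):
    """Apply the Task 1 cleaning rules to rows read with csv.DictReader (all strings)."""
    # Median of the listed prices, used to fill the "unlisted" entries.
    listed = [float(r["price"]) for r in rows if r["price"] not in PRICE_MISSING]
    median_price = round(statistics.median(listed), 2)

    cleaned = []
    for r in rows:
        cleaned.append({
            "product_id": int(r["product_id"]),
            # Missing categories become "Unknown".
            "category": "Unknown" if r["category"] in MISSING else r["category"],
            "animal": r["animal"],
            # One consistent capitalization (title case chosen for illustration).
            "size": r["size"].strip().title(),
            # String -> float; "unlisted" prices get the overall median price.
            "price": median_price if r["price"] in PRICE_MISSING else round(float(r["price"]), 2),
            "sales": float(r["sales"]),
            # Discrete rating: double -> int, missing -> 0.
            "rating": 0 if r["rating"] in MISSING else int(float(r["rating"])),
            # Nominal indicator stored as double -> int.
            "repeat_purchase": int(float(r["repeat_purchase"])),
        })
    return cleaned


# Made-up rows for illustration only.
raw = """product_id,category,animal,size,price,sales,rating,repeat_purchase
1,Food,Bird,large,50.00,1860.62,7.0,1.0
2,-,Dog,MEDIUM,unlisted,963.60,,0.0
3,Toys,Cat,small,30.50,898.30,5.0,1.0
"""
rows = list(csv.DictReader(io.StringIO(raw)))
print(len(rows), "rows x", len(rows[0]), "columns")
cleaned = clean_rows(rows)
```

The median is computed only from listed prices, before any filling, so the "unlisted" rows do not influence the value that replaces them.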

### Task 2

### Data Discovery and Visualization

Equipment has the most observations, with 221 repeat purchases, or about 24% of all repeat purchases. The other main categories are fairly balanced: food, housing, medicine and toys each have about 150 repeat purchases on average, or about 16% each. Accessory has noticeably fewer, with 70 repeat purchases, and the Unknown category has the fewest.
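The counts and percentages above come from grouping repeat purchases by category. A minimal standard-library sketch of that aggregation follows; it assumes cleaned rows where `repeat_purchase` is a 0/1 integer, `repeat_purchases_by` is a hypothetical helper, and the rows are made up.

```python
from collections import Counter


def repeat_purchases_by(rows, key):
    """Count repeat purchases (repeat_purchase == 1) per value of `key`.

    Returns {value: (count, percent of all repeat purchases)}, largest first.
    """
    counts = Counter(r[key] for r in rows if r["repeat_purchase"] == 1)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.most_common()}


# Made-up rows for illustration only.
rows = [
    {"category": "Equipment", "repeat_purchase": 1},
    {"category": "Equipment", "repeat_purchase": 1},
    {"category": "Food", "repeat_purchase": 1},
    {"category": "Toys", "repeat_purchase": 0},
]
summary = repeat_purchases_by(rows, "category")
print(summary)
```

Percentages are taken relative to the total number of repeat purchases, matching how the shares are reported in the text.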

The pie chart shows that almost 50% of repeat purchases come from owners of small animals. Breaking this group down further by animal, small cat owners account for the most repeat purchases, with 115, or approximately 13% of all repeat purchases.
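The drill-down described above (first by animal size, then by size and animal together) can be sketched as grouping on one or more columns and picking the largest segment. This is an illustrative sketch, assuming cleaned rows with a 0/1 `repeat_purchase` and at least one repeat purchase; `top_segment` and the sample rows are hypothetical.

```python
from collections import Counter


def top_segment(rows, keys):
    """Return (segment, count, % of all repeat purchases) for the largest segment.

    A segment is one combination of values of the columns in `keys`.
    Assumes at least one row has repeat_purchase == 1.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in rows if r["repeat_purchase"] == 1)
    total = sum(counts.values())
    segment, n = counts.most_common(1)[0]
    return segment, n, round(100 * n / total, 1)


# Made-up rows for illustration only.
rows = [
    {"size": "Small", "animal": "Cat", "repeat_purchase": 1},
    {"size": "Small", "animal": "Cat", "repeat_purchase": 1},
    {"size": "Small", "animal": "Dog", "repeat_purchase": 1},
    {"size": "Large", "animal": "Dog", "repeat_purchase": 1},
    {"size": "Medium", "animal": "Fish", "repeat_purchase": 0},
]
by_size = top_segment(rows, ["size"])                   # first level: animal size
by_size_animal = top_segment(rows, ["size", "animal"])  # drill down: size and animal
print(by_size, by_size_animal)
```

Both levels report the percentage against all repeat purchases, not against the parent group, which is how the 13% figure for small cat owners is expressed.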

### Task 3

### Distribution of the sales

Equipment accounts for the highest share of sales (about 23%), followed by toys (about 21%). The Unknown category has the lowest share, at almost 2%, while accessory accounts for about 8%.

Small cat owners account for the largest share of spending, at about 28% of total sales. By contrast, large cat owners spend the least, at about 2%.

Grouped by animal size alone, small animal owners account for about 50% of total sales, while large animal owners account for the least, at about 23%.
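All three breakdowns in this section are shares of total sales by some grouping (category, animal and size, or size alone). A minimal standard-library sketch of that calculation follows; it assumes cleaned rows with a numeric `sales` value, and `sales_share` plus the sample rows are hypothetical.

```python
from collections import defaultdict


def sales_share(rows, *keys):
    """Percentage of total sales per combination of `keys`, largest first."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[k] for k in keys)] += r["sales"]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {group: round(100 * s / grand_total, 1) for group, s in ranked}


# Made-up rows for illustration only.
rows = [
    {"category": "Equipment", "animal": "Cat", "size": "Small", "sales": 500.0},
    {"category": "Toys", "animal": "Cat", "size": "Large", "sales": 300.0},
    {"category": "Food", "animal": "Dog", "size": "Small", "sales": 200.0},
]
by_category = sales_share(rows, "category")
by_size = sales_share(rows, "size")
by_size_animal = sales_share(rows, "size", "animal")
print(by_category, by_size, by_size_animal, sep="\n")
```

Using one function with variable grouping keys keeps the three breakdowns consistent: each one divides by the same grand total of sales.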

### Task 4

### Relationship between repeat purchases and sales

The coefficient estimate for 'sales' is very close to zero, which indicates a very weak relationship between sales and repeat_purchase. The p-value for the 'sales' coefficient is 2.180470e-03 (about 0.0022), which is below the conventional significance level of 0.05. We can therefore reject the null hypothesis that the coefficient is zero, meaning the relationship is statistically significant. However, because the estimate is so small, the effect is negligible in practical terms, so we do not consider this a strong relationship.
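The distinction between statistical significance and strength can be illustrated with a simple slope test. The report does not state which model was fitted (a logistic regression would also suit a 0/1 outcome), so this sketch assumes ordinary least squares and approximates the t distribution with a normal distribution, which is reasonable for a sample of about 1,500 rows but not for small samples. `slope_test` is a hypothetical helper and the data are synthetic.

```python
import math
from statistics import NormalDist, fmean


def slope_test(x, y):
    """OLS fit y = b0 + b1*x; return (b1, standard error, two-sided p-value).

    The p-value uses a normal approximation to the t distribution,
    which is adequate for large samples only.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual sum of squares -> standard error of the slope.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = b1 / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return b1, se, p


# Synthetic data: a tiny true slope (0.001) plus alternating +/-0.5 noise.
x = list(range(1500))
y = [0.001 * xi + (0.5 if xi % 2 else -0.5) for xi in x]
b1, se, p = slope_test(x, y)
print(f"slope={b1:.6f}  se={se:.2e}  p={p:.3g}")
```

With this synthetic data the slope is tiny, yet the p-value is far below 0.05: a small p-value indicates the slope is unlikely to be exactly zero, not that the relationship is strong.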