Estimating a used car's listing price
Overview
Discount Motors is a used car dealership in the UK. They want to lead the way in used cars, selling to customers who want the latest and greatest features without the price tag of a brand new car.
The UK Government has announced that from 2030 all new cars will be required to be zero emissions. Although this won’t directly affect the used car market, buyers are expected to give more consideration to the future value of their cars, and petrol and diesel cars will likely have a much lower value after 2030.
The Head of Data Science has received a request from the Sales Team to predict the selling price of cars. The Sales Team wants to automate the process of estimating the selling price, and they want the predictions to be within 10% of the listed price. Estimates from the current team members are around 30% away from the price the car will sell for. The Head of Data Science has assigned the task to the recipient and has provided more details in the "Guide to Data Science Projects" and "Data Information" sections. The Head of Data Science will be on vacation for the next few weeks but has asked the recipient to document any decisions in their work. The recipient will have to present their findings to the Sales Team.
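The 10% target can be turned into a measurable check. The following is a minimal sketch, assuming one reasonable reading of the target: a prediction counts as a hit when its absolute error is at most 10% of the listed price. The function name `share_within_tolerance` and the sample prices are illustrative and do not come from the project data.

```python
def share_within_tolerance(actual, predicted, tolerance=0.10):
    """Return the fraction of predictions whose relative error is <= tolerance."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = [
        abs(p - a) / a <= tolerance  # relative error against the listed price
        for a, p in zip(actual, predicted)
    ]
    return sum(hits) / len(hits)


# Made-up listed prices and estimates in GBP, for illustration only
listed = [10000, 8000, 12000, 5000]
estimated = [10500, 9200, 11900, 5400]

print(share_within_tolerance(listed, estimated))
```

With these made-up numbers, three of the four estimates fall inside the 10% band, so the function returns 0.75. The same function with `tolerance=0.30` would give a comparable figure for the current manual estimates.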
Dataset
The sales team has pulled data from the website listings from the last 6 months. They haven’t told us whether the cars sold or, if they did, how long they took to sell; we only know that they were listed and the price they were listed at. Also, not all information from each advert has been extracted.
| Column Name | Details |
|---|---|
| model | Character, the model of the car, 18 possible values |
| year | Numeric, year of registration from 1998 to 2020 |
| price | Numeric, listed value of the car in GBP |
| transmission | Character, one of "Manual", "Automatic", "Semi-Auto" or "Other" |
| mileage | Numeric, listed mileage of the car at time of sale |
| fuelType | Character, one of "Petrol", "Hybrid", "Diesel" or "Other" |
| tax | Numeric, road tax in GBP. Calculated based on CO2 emissions or a fixed price depending on the age of the car. |
| mpg | Numeric, miles per gallon as reported by manufacturer |
| engineSize | Numeric, listed engine size, one of 16 possible values |
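The column descriptions above can be expressed as automated checks. The sketch below is one possible way to do this, not the project's own validation code: the helper `validate_listings` is hypothetical, the sample rows are made up, and only the allowed values and the year range are taken from the table.

```python
import pandas as pd

# Allowed values and year range copied from the column table
ALLOWED_VALUES = {
    "transmission": ["Manual", "Automatic", "Semi-Auto", "Other"],
    "fuelType": ["Petrol", "Hybrid", "Diesel", "Other"],
}
YEAR_MIN, YEAR_MAX = 1998, 2020


def validate_listings(df):
    """Return a dict mapping each check name to the number of violating rows."""
    problems = {}
    for column, allowed in ALLOWED_VALUES.items():
        problems[f"{column}_unexpected"] = int((~df[column].isin(allowed)).sum())
    # between() is inclusive on both ends; missing years also count as violations
    in_range = df["year"].between(YEAR_MIN, YEAR_MAX)
    problems["year_out_of_range"] = int((~in_range).sum())
    return problems


# Made-up rows for illustration: one bad year, one unexpected transmission
sample = pd.DataFrame({
    "year": [2016, 1995, 2019],
    "transmission": ["Manual", "Automatic", "CVT"],
    "fuelType": ["Petrol", "Hybrid", "Diesel"],
})

print(validate_listings(sample))
```

A count of zero for every check means the data agrees with the table for those columns. Values with stray whitespace, such as " Manual", would be counted as unexpected, which makes this kind of check useful for spotting formatting problems early.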
Data validation
- Let's take a look
import pandas as pd
toyota = pd.read_csv('toyota.csv')
display(toyota.head())
- Initial dataset information
display(toyota.info())
Columns: model | transmission | fuelType
toyota.model.unique()
toyota.fuelType.unique()
- Removed leading whitespace from model
- Converted Dtype from object to category
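The two cleaning steps noted above could look roughly like the following. This is a sketch under assumptions: it uses a small made-up frame instead of `toyota.csv`, the model names are placeholders, and it assumes the category conversion applies to all three listed columns (`model`, `transmission`, `fuelType`), which the notes do not state explicitly.

```python
import pandas as pd

# Made-up rows with a leading space in `model`, mimicking the issue noted above
toyota_sample = pd.DataFrame({
    "model": [" Yaris", " Aygo", " Yaris"],
    "transmission": ["Manual", "Automatic", "Manual"],
    "fuelType": ["Petrol", "Petrol", "Hybrid"],
})

categorical_columns = ["model", "transmission", "fuelType"]

# Remove leading whitespace before converting, so " Yaris" and "Yaris"
# cannot end up as two separate categories
toyota_sample["model"] = toyota_sample["model"].str.lstrip()

# Convert the text columns to the category dtype
toyota_sample = toyota_sample.astype(
    {column: "category" for column in categorical_columns}
)

print(toyota_sample["model"].cat.categories.tolist())
print(toyota_sample.dtypes)
```

Stripping before converting matters because categories are built from the exact string values. Using `str.strip()` instead of `str.lstrip()` would also remove trailing whitespace, if any is present.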