Madrid Properties Predictive Model
Abstract
The real estate industry is among the highest-grossing industries worldwide. In Spain, it generated €50,000 million of GDP and employs 1.3 million people, directly and indirectly. Madrid, a large European city, is no exception. Its strong appeal to both local and foreign private investors, together with the buying and selling of assets by the city's residents, makes a predictive model for Madrid properties highly attractive to any company working in the sector.
1. Objective
Build a property price prediction model from a dataset of real property values in the Spanish capital.
2. Commercial Context
The model has strong commercial appeal. First, it serves as a marketing tool to attract sellers who want to know the price of their property in exchange for leaving their contact information. This information can then be used for targeted promotion of the real estate asset.
Secondly, the model can be a valuable tool for industry professionals involved in property valuation. It provides a preliminary estimate of a property's value, allowing professionals to assess its worth before conducting on-site visits.
Lastly, the model can empower buyers by offering insights into the value of properties they are interested in purchasing. This helps them make informed decisions and ensures they have a better understanding of the properties they are considering.
3. Hypotheses
- The value of the property is determined by its location.
- Having built-in wardrobes increases the price.
- If the property has a pool, the price increases considerably.
- Exterior properties are priced 30% higher than interior properties.
- Properties with an elevator are priced 20% higher than those without.
- Having balconies significantly increases the price.
- Having more rooms increases the price of the property.
- The size of the property determines its price.
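As a rough illustration of how one of these hypotheses could be checked later, the sketch below compares median prices between properties with and without a boolean feature. The helper name `median_price_premium` and the toy values are assumptions made for this example only; they are not part of the dataset or of the analysis that follows.

```python
import pandas as pd

def median_price_premium(df, flag_col, price_col="buy_price"):
    """Relative difference in median price between rows where flag_col is True vs. False.

    Rows where the flag is missing are ignored.
    """
    median_true = df.loc[df[flag_col].eq(True), price_col].median()
    median_false = df.loc[df[flag_col].eq(False), price_col].median()
    return median_true / median_false - 1.0

# Toy values invented purely for illustration (NOT taken from the Madrid dataset).
toy = pd.DataFrame({
    "is_exterior": [True, True, False, False],
    "buy_price": [260_000, 260_000, 200_000, 200_000],
})

premium = median_price_premium(toy, "is_exterior")
print(f"Exterior premium on toy data: {premium:.0%}")
```

A raw median comparison like this ignores confounders such as size and neighborhood, so it can only suggest whether a hypothesis is plausible before modeling.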
4. Analytical Context
Variables:
- id: property id
- title: property's title on the website
- subtitle: description of the property on the website
- sq_mt_built: built area in square meters
- sq_mt_useful: usable area in square meters (built area excluding walls)
- n_rooms: number of rooms
- n_bathrooms: number of bathrooms
- n_floors: how many levels the property has
- sq_mt_allotment: total plot (allotment) area in square meters
- latitude: location
- longitude: location
- raw_address: property's exact address
- is_exact_address_hidden: whether the exact address is hidden
- street_name: street name
- street_number: building number
- portal: portal number, for properties in an apartment complex
- floor: floor the property is on
- is_floor_under: whether the property is on a basement floor
- door: door number
- neighborhood_id: neighborhood id assigned by Madrid
- operation: rent or sale
- rent_price: rent price
- rent_price_by_area: rent price per m2
- is_rent_price_known: whether the rent price is known
- buy_price: sale price of the property (target variable)
- buy_price_by_area: sale price per m2
- is_buy_price_known: whether the sale price is known
- house_type_id: property type (apartment, house, building)
- is_renewal_needed: whether the property needs renovation
- is_new_development: new construction
- built_year: year of construction
- has_central_heating: common heating for the whole building
- has_individual_heating: individual heating
- are_pets_allowed: for rent properties
- has_ac: air conditioner
- has_fitted_wardrobes: wardrobes that cannot be moved
- has_lift: lift
- is_exterior: exterior property
- has_garden: yes or no
- has_pool: yes or no
- has_terrace: yes or no
- has_balcony: yes or no
- has_storage_room: large storage room
- is_furnished: yes or no
- is_kitchen_equipped: yes or no
- is_accessible: yes or no
- has_green_zones: yes or no
- energy_certificate: certificate rating the property's energy consumption
- has_parking: yes or no
- has_private_parking: yes or no
- has_public_parking: yes or no
- is_parking_included_in_price: yes or no
- parking_price: price of parking
- is_orientation_north: yes or no
- is_orientation_west: yes or no
- is_orientation_south: yes or no
- is_orientation_east: yes or no
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
!pip install catboost
madrid_df = pd.read_csv('https://raw.githubusercontent.com/juanarangio/madrid_houses/main/houses_Madrid.csv', index_col=0)
madrid_df
### Dropping unimportant columns, as decided during earlier data exploration.
madrid_df = madrid_df.drop(columns=[
    'is_exact_address_hidden', 'street_number', 'portal', 'is_floor_under', 'door',
    'operation', 'rent_price_by_area', 'is_buy_price_known', 'are_pets_allowed', 'has_ac',
    'energy_certificate', 'has_public_parking', 'is_orientation_north', 'is_orientation_west',
    'is_orientation_south', 'is_orientation_east', 'latitude', 'longitude',
    'is_rent_price_known', 'is_furnished', 'is_kitchen_equipped', 'has_private_parking',
    'subtitle', 'raw_address', 'street_name', 'sq_mt_useful', 'sq_mt_allotment', 'n_floors',
])
madrid_df.drop_duplicates(inplace=True)
madrid_df.info()
Data Exploration
5. Loading and Deleting Columns
After carefully reviewing the column types and null values, I decided to drop the columns that could interfere with data exploration and modeling, along with those that are not relevant to my predictive model.
madrid_df.isnull().sum()
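Raw null counts are easier to act on when they sit next to each column's type and share of missing values. The sketch below is one possible way to build that view; the helper name `null_summary` and the toy frame are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def null_summary(df):
    """Return dtype, null count, and null percentage per column, most incomplete first."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_null": df.isnull().sum(),
        "pct_null": (df.isnull().mean() * 100).round(1),
    })
    return summary.sort_values("pct_null", ascending=False)

# Toy frame invented for illustration (NOT rows from the Madrid dataset).
toy_nulls = pd.DataFrame({
    "sq_mt_built": [80.0, np.nan, 120.0, 65.0],
    "n_rooms": [2, 3, 4, 1],
})

summary = null_summary(toy_nulls)
print(summary)
```

On the real data, the same call would be `null_summary(madrid_df)`.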
## Dropping the 126 rows with no sq_mt_built, as their titles contain no information about the area.
missing_sq_mt_built = madrid_df[madrid_df['sq_mt_built'].isna()]
for title in missing_sq_mt_built['title']:
    print(title)
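Instead of reading every title by eye, the check and the drop could be sketched as follows. The regular expression, the helper name `drop_rows_without_area`, and the toy titles are all assumptions for this example; the real titles may express area differently, so the pattern would need adjusting.

```python
import re

import numpy as np
import pandas as pd

# Assumed pattern: a number followed by "m2" or "m²", e.g. "85 m²".
SQM_PATTERN = re.compile(r"\d+(?:[.,]\d+)?\s*m(?:2|²)", flags=re.IGNORECASE)

def drop_rows_without_area(df, area_col="sq_mt_built", title_col="title"):
    """Report how many rows lack an area (and how many titles hint at one), then drop them."""
    missing = df[df[area_col].isna()]
    n_with_hint = sum(bool(SQM_PATTERN.search(str(t))) for t in missing[title_col])
    print(f"{len(missing)} rows lack {area_col}; {n_with_hint} of their titles mention an area")
    return df.dropna(subset=[area_col])

# Toy rows invented for illustration (NOT taken from the Madrid dataset).
toy_titles = pd.DataFrame({
    "title": ["Flat for sale, street A", "House for sale, street B", "Flat for sale, street C"],
    "sq_mt_built": [64.0, np.nan, 110.0],
})

cleaned = drop_rows_without_area(toy_titles)
```

If the reported number of titles mentioning an area were greater than zero, those values could be extracted and filled in before dropping the remaining rows.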