Madrid Properties Predictive Model

Abstract

The real estate industry is one of the highest-revenue industries worldwide. In Spain, it generated €50,000 million of GDP and employed 1.3 million people directly and indirectly. Madrid, a major European city, is no exception. The city's strong appeal to local and foreign investors, together with the buying and selling of properties by its own residents, makes a predictive model for Madrid properties highly valuable for any company in the sector.

1. Objective

Build a model that predicts property prices, trained on a dataset of real property values in the Spanish capital.

2. Commercial Context

The model has clear commercial potential. First, it can serve as a marketing tool: sellers who want to know the price of their property receive an estimate in exchange for leaving their contact information, which can then be used for targeted promotion of the property.

Secondly, the model can be a valuable tool for industry professionals involved in property valuation. It provides a preliminary estimate of a property's value, allowing professionals to assess its worth before conducting on-site visits.

Lastly, the model can empower buyers by offering insights into the value of properties they are interested in purchasing. This helps them make informed decisions and ensures they have a better understanding of the properties they are considering.

3. Hypotheses

  1. The value of the property is determined by its location.
  2. Having built-in wardrobes increases the price.
  3. If the property has a pool, the price increases considerably.
  4. The price of exterior properties is 30% higher than that of interior properties.
  5. Having an elevator increases the price by 20%.
  6. Having balconies significantly increases the price.
  7. Having more rooms increases the price of the property.
  8. The size of the property determines its price.
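Hypotheses 2 to 6 compare prices between properties with and without a given feature. A minimal sketch of a first check, assuming the flag columns from the variable list hold True/False values: `median_premium` (a hypothetical helper, not part of the original notebook) returns how much higher, in percent, the median `buy_price` is when a flag is set. Medians are used instead of means so that a few very expensive listings do not dominate. A raw comparison like this does not control for size or location, so it can support or weaken a hypothesis but not confirm it. The toy prices below are invented.

```python
import pandas as pd


def median_premium(df: pd.DataFrame, flag: str, target: str = "buy_price") -> float:
    """Percentage by which the median target for rows where `flag` is True
    exceeds the median target for rows where `flag` is False."""
    # Rows with a missing flag or price are excluded (an assumption, not the notebook's rule).
    subset = df[[flag, target]].dropna()
    has_flag = subset[flag].astype(bool)
    with_flag = subset.loc[has_flag, target].median()
    without_flag = subset.loc[~has_flag, target].median()
    return (with_flag / without_flag - 1) * 100


# Invented example: exterior median 325,000 vs interior median 250,000.
toy = pd.DataFrame({
    "is_exterior": [True, True, False, False],
    "buy_price": [260_000, 390_000, 200_000, 300_000],
})
print(round(median_premium(toy, "is_exterior"), 1))  # 30.0
```

For the numeric hypotheses (7 and 8), `madrid_df[['n_rooms', 'sq_mt_built', 'buy_price']].corr(method='spearman')` gives a comparable first look at how rooms and size move with price.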

4. Analytical Context

Variables:

  1. id: property id
  2. title: property's title on the website
  3. subtitle: description about the property on the website
  4. sq_mt_built: built area in square meters
  5. sq_mt_useful: usable area in square meters (excluding walls)
  6. n_rooms: number of rooms
  7. n_bathrooms: number of bathrooms
  8. n_floors: how many levels the property has.
  9. sq_mt_allotment: total area (allotment) in square meters
  10. latitude: location
  11. longitude: location
  12. raw_address: exact address of the property
  13. is_exact_address_hidden: whether the exact address is hidden
  14. street_name: street name
  15. street_number: building's number
  16. portal: entrance (portal) number, for apartment complexes
  17. floor: floor level
  18. is_floor_under: whether the floor is below ground level (basement)
  19. door: door number
  20. neighborhood_id: ID assigned to each Madrid neighborhood
  21. operation: rent or sale
  22. rent_price: rental price
  23. rent_price_by_area: rental price per m2
  24. is_rent_price_known: whether the rental price is known
  25. buy_price: cost of the property (target variable)
  26. buy_price_by_area: price per m2
  27. is_buy_price_known: whether the sale price is known
  28. house_type_id: apartment, house, building
  29. is_renewal_needed: whether the property needs renovation
  30. is_new_development: whether the property is a new development
  31. built_year: year of construction
  32. has_central_heating: common heating for the whole building
  33. has_individual_heating: individual heating
  34. are_pets_allowed: whether pets are allowed (rental properties)
  35. has_ac: air conditioning
  36. has_fitted_wardrobes: built-in wardrobes
  37. has_lift: elevator (lift)
  38. is_exterior: exterior property
  39. has_garden: yes or no
  40. has_pool: yes or no
  41. has_terrace: yes or no
  42. has_balcony: yes or no
  43. has_storage_room: large storage room
  44. is_furnished: yes or no
  45. is_kitchen_equipped: yes or no
  46. is_accessible: yes or no
  47. has_green_zones: yes or no
  48. energy_certificate: energy efficiency certificate (energy consumption rating)
  49. has_parking: yes or no
  50. has_private_parking: yes or no
  51. has_public_parking: yes or no
  52. is_parking_included_in_price: yes or no
  53. parking_price: price of parking
  54. is_orientation_north: yes or no
  55. is_orientation_west: yes or no
  56. is_orientation_south: yes or no
  57. is_orientation_east: yes or no
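Variable 26, `buy_price_by_area`, is described as the price per m2, which suggests it is computed from the target `buy_price`. If so, it should not be used as a predictor, because it would leak the target into the model. A minimal sketch to test that relationship, assuming the area used is `sq_mt_built` and allowing a 1% tolerance for rounding (both are assumptions, since the dataset's exact definition is not stated here); `derived_share` is a hypothetical helper. A result close to 1.0 on `madrid_df` would indicate that the column is derived from the target.

```python
import numpy as np
import pandas as pd


def derived_share(df: pd.DataFrame, rtol: float = 0.01) -> float:
    """Fraction of rows where buy_price_by_area is close to buy_price / sq_mt_built."""
    cols = ["buy_price", "sq_mt_built", "buy_price_by_area"]
    # Only rows with all three values present can be checked.
    rows = df[cols].dropna()
    implied = rows["buy_price"] / rows["sq_mt_built"]
    return float(np.isclose(rows["buy_price_by_area"], implied, rtol=rtol).mean())


# Invented example; the third row has no area and is skipped.
toy = pd.DataFrame({
    "buy_price": [300_000, 450_000, 200_000],
    "sq_mt_built": [100, 90, np.nan],
    "buy_price_by_area": [3000, 5000, 2500],
})
print(derived_share(toy))  # 1.0
```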
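%pip install catboost  # install CatBoost into the running notebook kernel

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re  # regular expressions

# Load the Madrid listings; the first CSV column is used as the row index.
madrid_df = pd.read_csv('https://raw.githubusercontent.com/juanarangio/madrid_houses/main/houses_Madrid.csv', index_col=0)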

madrid_df
### Drop columns not relevant to the model (chosen during an earlier data exploration).
madrid_df = madrid_df.drop(columns=[
    'raw_address', 'street_name', 'street_number', 'portal', 'door', 'is_exact_address_hidden', 'latitude', 'longitude',  # address and coordinates
    'subtitle', 'operation', 'rent_price_by_area', 'is_rent_price_known', 'is_buy_price_known', 'are_pets_allowed',  # listing text, operation type and rent/price flags
    'sq_mt_useful', 'sq_mt_allotment', 'n_floors', 'is_floor_under',  # secondary area and floor fields
    'has_ac', 'is_furnished', 'is_kitchen_equipped', 'has_private_parking', 'has_public_parking', 'energy_certificate',  # amenities and certificate
    'is_orientation_north', 'is_orientation_west', 'is_orientation_south', 'is_orientation_east',  # orientation
])
madrid_df.drop_duplicates(inplace=True)  # remove listings that appear more than once
madrid_df.info()

Data Exploration

5. Loading data and deleting columns

After reviewing the column types and null values, I deleted the columns that could interfere with data exploration and modeling, as well as the columns that are not relevant to the predictive model.

madrid_df.isnull().sum()
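Raw null counts are easier to compare as shares of the total number of rows. A minimal sketch, with `missing_report` as a hypothetical helper and a 50% cut-off chosen only for illustration (the notebook does not state the threshold it used when deciding which columns to drop):

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Share of missing values per column, keeping only columns above `threshold`."""
    # isnull().mean() gives the fraction of missing cells in each column.
    share = df.isnull().mean().sort_values(ascending=False)
    return share[share > threshold]


# Invented example: parking_price is 75% missing, sq_mt_built 25%.
toy = pd.DataFrame({
    "sq_mt_built": [80.0, np.nan, 120.0, 95.0],
    "parking_price": [np.nan, np.nan, np.nan, 15_000.0],
    "n_rooms": [2, 3, 4, 3],
})
print(missing_report(toy))
```

On the real data, `missing_report(madrid_df, threshold=0.3)` would list the candidates for removal at a 30% cut-off; the final decision still depends on how important each column is to the model.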
## Inspect the titles of the 126 rows with no sq_mt_built. The titles contain no square-meter information either, so these rows are dropped.

missing_sq_mt_built = madrid_df[madrid_df['sq_mt_built'].isna()]

for title in missing_sq_mt_built['title']:
    print(title)
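Reading the titles by hand works for 126 rows, but the same check can be automated before dropping. A minimal sketch, assuming areas in titles would be written as a number followed by "m²" or "m2"; `AREA_PATTERN` and `drop_missing_area` are hypothetical, and the toy titles are invented. The function drops rows without `sq_mt_built` and also returns any titles among them that seem to mention an area, so those can be reviewed instead of silently lost.

```python
import numpy as np
import pandas as pd

# Hypothetical pattern for an area written in a title, e.g. "85 m²" or "85m2".
AREA_PATTERN = r"\d+(?:[.,]\d+)?\s*m(?:²|2)"


def drop_missing_area(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Drop rows without sq_mt_built; also return their titles that may contain an area."""
    missing = df["sq_mt_built"].isna()
    # na=False treats a missing title as "no area mentioned".
    mentions_area = df["title"].str.contains(AREA_PATTERN, case=False, regex=True, na=False)
    suspicious = df.loc[missing & mentions_area, "title"]
    return df.loc[~missing].copy(), suspicious


toy = pd.DataFrame({
    "title": ["Piso en venta en Chamberí", "Ático de 85 m² en Salamanca", "Chalet en Aravaca"],
    "sq_mt_built": [np.nan, np.nan, 250.0],
})
kept, suspicious = drop_missing_area(toy)
print(len(kept))            # 1
print(suspicious.tolist())  # ['Ático de 85 m² en Salamanca']
```

If `suspicious` comes back empty for `madrid_df`, dropping the rows loses no recoverable area information.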