Housing in Brazil 🇧🇷

# Import libraries

import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

Import and prepare data

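# Read the first CSV; "price_usd" is stored as text with dollar signs and commas
df1 = pd.read_csv('brasil-real-estate-1.csv', encoding='latin1')
df1.head()
# Drop the first column by position
df1.drop(df1.columns[0], axis=1, inplace=True)
df1.info()
# Drop rows with missing values
df1.dropna(inplace=True)

# Confirm that every remaining column has the same non-null count
df1.info()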

Task 1.5.3: Use the "lat-lon" column to create two separate columns in df1: "lat" and "lon". Make sure that the data type for these new columns is float.

df1[["lat", "lon"]] = df1["lat-lon"].str.split(",", expand=True).astype(float)
df1.head()
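
To see what the split does on its own, here is a minimal sketch on a two-row frame. The coordinate strings and the name `sample` are hypothetical and are not taken from the CSV; the only assumption is that each value has the form "latitude,longitude".

```python
import pandas as pd

# Hypothetical sample rows; the real values come from brasil-real-estate-1.csv
sample = pd.DataFrame(
    {"lat-lon": ["-9.6443051,-35.7088142", "-9.6430934,-35.70484"]}
)

# expand=True returns one column per piece; astype(float) converts both at once
sample[["lat", "lon"]] = sample["lat-lon"].str.split(",", expand=True).astype(float)

print(sample.dtypes)
```

If any row lacked a comma or held non-numeric text, `astype(float)` would raise, which is one reason the rows with missing values are dropped first.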

Task 1.5.4: Use the "place_with_parent_names" column to create a "state" column for df1. (Note that the state name always appears after "|Brasil|" in each string.)

df1["state"] = df1["place_with_parent_names"].str.split("|", expand=True)[2]
df1.head()
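
Why index 2? A short sketch, assuming strings that start with a "|" as the task note implies. The place strings and the names `sample` and `parts` are hypothetical illustrations, not rows from the dataset.

```python
import pandas as pd

# Hypothetical place strings in the "|Brasil|<state>|..." shape described in the task
sample = pd.DataFrame(
    {
        "place_with_parent_names": [
            "|Brasil|Alagoas|Maceió|",
            "|Brasil|São Paulo|São Paulo|Moema|",
        ]
    }
)

# The leading "|" produces an empty string at position 0,
# so "Brasil" lands at position 1 and the state at position 2
parts = sample["place_with_parent_names"].str.split("|", expand=True)
sample["state"] = parts[2]

print(sample["state"].tolist())
```

Rows with fewer pieces are padded with missing values by `expand=True`, so rows of different depth can share one frame.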

Task 1.5.5: Transform the "price_usd" column of df1 so that all values are floating-point numbers instead of strings.

df1["price_usd"] = df1["price_usd"].str.replace(r"[\$,]", "", regex=True).astype(float)
df1.head()
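
The regular expression `[\$,]` matches a single character that is either a dollar sign or a comma. A minimal sketch on hypothetical price strings (the values and the names `sample` and `cleaned` are made up for illustration):

```python
import pandas as pd

# Hypothetical price strings formatted like "$187,230.85"
sample = pd.DataFrame({"price_usd": ["$187,230.85", "$81,133.37"]})

# Remove every "$" and "," character, then convert the remaining text to float
cleaned = sample["price_usd"].str.replace(r"[\$,]", "", regex=True).astype(float)

print(cleaned.tolist())
```

Passing `regex=True` explicitly matters here: without it, recent pandas versions treat the pattern as a literal string and nothing would be replaced.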

Task 1.5.6: Drop the "lat-lon" and "place_with_parent_names" columns from df1.

df1.drop(columns=["lat-lon", "place_with_parent_names"], inplace=True)
df1.head()

Now that you have cleaned data/brasil-real-estate-1.csv and created df1, you are going to import and clean the data from the second file, brasil-real-estate-2.csv.

Task 1.5.7: Import the CSV file brasil-real-estate-2.csv into the DataFrame df2.
