City of Rome Weather Analysis & Prediction
In this workbook I will analyze how Rome's climate has changed over the last four decades. In particular, I will focus on temperature-related aspects: the annual average temperature and the number of fog days. At the end, we will see how Rome's climate could change in the near future.
Unfortunately, the precipitation data are not reliable, so I will not use them in this analysis.
1. Configuration, Data Mining & Data Cleaning
In this section we will collect and clean data.
Config
Set variables for city, start year / month and end year / month.
city = 'Roma'
start_year = 1980
start_month = 1
end_year = 2020
end_month = 12
Config #2
Import libraries and data structures.
# Import libraries
import requests, csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set seaborn parameters
sns.set(rc = {'figure.figsize': (8, 4)}, font = 'calibri')
sns.set_context('notebook')
sns.set_style('whitegrid', {'grid.linestyle': ':', 'axes.spines.right': False, 'axes.spines.top': False})
# Import data structures
month_list = ['Gennaio', 'Febbraio', 'Marzo', 'Aprile', 'Maggio', 'Giugno', 'Luglio', 'Agosto', 'Settembre', 'Ottobre', 'Novembre', 'Dicembre']
header = ['city', 'date', 't_avg_c', 't_min_c', 't_max_c', 'dew_point_c', 'humidity_%', 'visibility_km', 'wind_avg_kmh', 'wind_max_kmh', 'gust_kmh', 'air_pressure_asl_mb', 'air_pressure_avg_mb', 'rain_mm', 'phenomena']
convert_dict = {'t_avg_c': float, 't_min_c': float, 't_max_c': float, 'dew_point_c': float, 'humidity_%': int, 'visibility_km': int, 'wind_avg_kmh': int, 'wind_max_kmh': int, 'gust_kmh': int, 'air_pressure_asl_mb': int, 'air_pressure_avg_mb': int, 'rain_mm': float}
Data Scraping
Download the weather data from www.ilmeteo.it and store it in the weather_list list (this takes a while).
weather_list = []
for year in range(start_year, end_year + 1):
    for month in range(start_month - 1, end_month):
        CSV_URL = 'https://www.ilmeteo.it/portale/archivio-meteo/' + city + '/' + str(year) + '/' + month_list[month] + '?format=csv'
        with requests.Session() as s:
            download = s.get(CSV_URL)  # Open the connection and download the file
            decoded_content = download.content.decode('utf-8')  # Decode the csv content
            records = csv.reader(decoded_content.splitlines(), delimiter = ';')  # Parse the csv content
            weather_list += list(records)[1:]  # Convert the "csv.reader" object to a list and append it to "weather_list", skipping the header row
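The loop above rebuilds the archive URL inline each iteration and assumes every request succeeds. As a stand-alone sketch (not part of the notebook's own code), the URL construction can be factored into a small helper; `build_archive_url` is a hypothetical name, and in a real run you would also pass a `timeout` to `requests.get` and call `response.raise_for_status()` before parsing.

```python
def build_archive_url(city, year, month_name):
    # Assemble the archive URL in the same format the scraping loop uses
    return ('https://www.ilmeteo.it/portale/archivio-meteo/'
            + city + '/' + str(year) + '/' + month_name + '?format=csv')

# Example: the URL for January 1980
url = build_archive_url('Roma', 1980, 'Gennaio')
print(url)
```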
Data Cleaning
Load the weather data into the weather_df DataFrame and clean it.
weather_df = pd.DataFrame(weather_list, columns = header)
# Replace empty cells with NaN, NaN phenomena with zeros, commas with dots
weather_df = weather_df.replace(r'^\s*$', np.nan, regex = True)
weather_df['phenomena'] = weather_df['phenomena'].fillna('none')
weather_df = weather_df.replace(',', '.', regex = True)
# Drop NaN
weather_df.dropna(inplace = True)
# Convert with correct data types
weather_df['date'] = pd.to_datetime(weather_df['date'], dayfirst = True)
weather_df = weather_df.astype(convert_dict)
display(weather_df.head())
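Two of the replacements above are easy to misread: the regex `r'^\s*$'` matches cells that are empty or contain only whitespace (turning them into NaN), and the comma-to-dot replace converts Italian decimal commas so the subsequent astype(float) works. A minimal stand-alone check of both, using only the standard library:

```python
import re

blank = re.compile(r'^\s*$')

# Cells that become NaN: empty or whitespace-only strings
print(bool(blank.match('')))      # empty cell matches
print(bool(blank.match('   ')))   # whitespace-only cell matches
print(bool(blank.match('12,3')))  # a real value does not match

# Decimal-comma values become parseable floats after the replace
print(float('12,3'.replace(',', '.')))
```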
Data Cleaning #2
Clean the data in the 'phenomena' column by mapping the Italian labels to English categories.
weather_df.loc[weather_df['phenomena'].str.contains('temporale|grandine'), 'phenomena'] = 'storm'
weather_df.loc[weather_df['phenomena'].str.contains('pioggia'), 'phenomena'] = 'rain'
weather_df.loc[weather_df['phenomena'].str.contains('neve'), 'phenomena'] = 'snow'
weather_df.loc[weather_df['phenomena'].str.contains('nebbia'), 'phenomena'] = 'fog'
print(weather_df['phenomena'].unique())
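The substring rules above can be sketched as a plain function (a stand-alone illustration of the mapping, not the notebook's own code). Note that order matters: a cell mentioning both rain and a thunderstorm is labelled 'storm' because the 'temporale|grandine' rule is applied first, exactly as in the pandas version.

```python
def normalize_phenomena(value):
    # Mirror the .str.contains() rules: first match wins
    rules = [
        (('temporale', 'grandine'), 'storm'),  # thunderstorm, hail
        (('pioggia',), 'rain'),
        (('neve',), 'snow'),
        (('nebbia',), 'fog'),
    ]
    for keywords, label in rules:
        if any(k in value for k in keywords):
            return label
    return value  # e.g. 'none' stays as-is

print(normalize_phenomena('pioggia e temporale'))  # -> storm
print(normalize_phenomena('nebbia'))               # -> fog
```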
2. Exploratory Analysis
In this section we will analyze the correlations between variables.
Analysis
Group weather data by year.