Introduction
I found this dataset on Kaggle. It documents weather observations across a 10-year span for various cities across Australia. The goal is to build models that predict whether it will rain tomorrow. The dataset contains a target variable called RainTomorrow, which is Yes if 1mm or more of rain fell the next day and No otherwise.
Since this dataset contains multiple cities spread across an entire continent, I will focus on one specific city so that the models predict more localized weather events. We don't want weather from other regions affecting our predictions. I have chosen the city of Sydney to make my predictions on.
Source - https://www.kaggle.com/jsphyg/weather-dataset-rattle-package
Import Modules and Data
#Import modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing
import os
from sklearn.metrics import mean_squared_error
%matplotlib inline
import sys
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.model_selection import GridSearchCV
#import the weather data csv
weather = pd.read_csv('weatherAUS.csv', sep=',', engine='python')
weather.shape
The Dataset
# Move Target Variable to front of dataframe
targetName = 'RainTomorrow'
targetSeries = weather[targetName]
del weather[targetName]
weather.insert(0, targetName, targetSeries)
weather.head()
weather.tail()
Since weather is most predictable locally, I want to focus on one city in Australia: Sydney. I will now filter the dataset down to the Sydney observations.
#Create dataframe where Location is Sydney
weather_syd=weather.query('Location == "Sydney"')
weather_syd.head()
weather_syd.shape
#Drop date and location fields, they are not needed.
weather_syd=weather_syd.drop(['Date', 'Location'],axis=1)
weather_syd.dtypes
Breakdown of the attributes from source -
- MinTemp - Minimum temperature in the 24 hours to 9am in degrees Celsius
- MaxTemp - Maximum temperature in the 24 hours from 9am in degrees Celsius
- Rainfall - Precipitation (rainfall) in the 24 hours to 9am in millimeters
- Evaporation - "Class A" pan evaporation in the 24 hours to 9am in millimeters
- Sunshine - Bright sunshine in the 24 hours to midnight in hours
- WindGustDir - Direction of strongest gust in the 24 hours to midnight in compass points
- WindGustSpeed - Speed of strongest wind gust in the 24 hours to midnight in kilometers per hour
- WindDir9am - Wind direction averaged over 10 minutes prior to 9 am in compass points
- WindDir3pm - Wind direction averaged over 10 minutes prior to 3 pm in compass points
- WindSpeed9am - Wind speed averaged over 10 minutes prior to 9 am in kilometers per hour
- WindSpeed3pm - Wind speed averaged over 10 minutes prior to 3 pm in kilometers per hour
- Humidity9am - Relative humidity at 9 am in percent
- Humidity3pm - Relative humidity at 3 pm in percent
- Pressure9am - Atmospheric pressure reduced to mean sea level at 9 am in hectopascals
- Pressure3pm - Atmospheric pressure reduced to mean sea level at 3 pm in hectopascals
- Cloud9am - Fraction of sky obscured by cloud at 9 am in eighths
- Cloud3pm - Fraction of sky obscured by cloud at 3 pm in eighths
- Temp9am - Temperature at 9 am in degrees Celsius
- Temp3pm - Temperature at 3 pm in degrees Celsius
- RainToday - Yes/No, whether 1mm or more of rain fell today
TARGET VARIABLE
- RainTomorrow - Yes/No, whether 1mm or more of rain fell the following day
Data was compiled and sourced from the Australian Government Bureau of Meteorology
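Several of the attributes above are text rather than numbers (the Yes/No flags and the compass-point wind directions), so they would need encoding before most scikit-learn models can use them. The following is only a rough sketch of one way to do that, not necessarily the approach taken later in this notebook. The `sample` dataframe and its values are made up for illustration, and the choice of a 0/1 mapping plus one-hot encoding is an assumption.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample = pd.DataFrame({
    "RainTomorrow": ["No", "Yes", "No"],
    "RainToday": ["Yes", "No", None],
    "WindGustDir": ["W", "NNE", "S"],
})

# Map the Yes/No flags to 1/0. Missing values stay missing (NaN).
yes_no = {"No": 0, "Yes": 1}
encoded = sample.copy()
for col in ["RainTomorrow", "RainToday"]:
    encoded[col] = encoded[col].map(yes_no)

# One-hot encode the compass direction into one indicator column per direction.
encoded = pd.get_dummies(encoded, columns=["WindGustDir"])

print(encoded.columns.tolist())
```

Note that `map` leaves unmatched or missing entries as NaN, so any missing values still have to be handled separately.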
#Check for Null Values
#weather_syd.isna().any() - omitted for space
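Since the output of the null check is omitted for space, here is a small sketch of how a more compact per-column summary could be produced. It assumes a stand-in dataframe called `demo` with made-up values; on the real data the same calls would be applied to `weather_syd`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for weather_syd; the values are invented.
demo = pd.DataFrame({
    "MinTemp": [17.2, np.nan, 19.5, 18.1],
    "Sunshine": [np.nan, np.nan, 8.3, 2.0],
    "RainToday": ["No", "Yes", None, "No"],
})

# Count of missing values per column.
null_counts = demo.isna().sum()

# Fraction of rows missing per column (the mean of a boolean mask).
null_share = demo.isna().mean().round(2)

summary = pd.DataFrame({"nulls": null_counts, "share": null_share})
print(summary)
```

Compared with `isna().any()`, which only says whether a column has any nulls, the counts and shares show how much of each column is missing, which helps when deciding between dropping rows and imputing values.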