Next Day Rain Prediction in Australia
Added on 2023/04/03
This case study analyzes various attributes to predict whether it will rain in Australia tomorrow. It explores the dataset, analyzes methodologies, and evaluates models.
CASE STUDY NAME: NEXT DAY RAIN PREDICTION IN AUSTRALIA
Contents:
1. Introduction
2. The Dataset
3. Analysis of Methodologies
4. Model Algorithms & Model Evaluations
5. Result & Conclusion
1. Introduction
In this case study, we analyze a set of weather attributes to predict whether it will rain on
the next day in Australia. The analysis considers observations from various Australian cities.
Each record contains a number of meteorological observations as attributes, plus a class
attribute that indicates whether it rains the following day.
2. The Dataset
The dataset used for this case study is available on Kaggle:
https://www.kaggle.com/jsphyg/weather-dataset-rattle-package.
The data attributes are as follows:
Day, Month, Year of the observation
Location: the location of the observation
MinTemp: the daily minimum temperature in degrees Celsius
MaxTemp: the daily maximum temperature in degrees Celsius
Rainfall: the rainfall recorded for the day in mm
Evaporation: the evaporation (mm) in the 24 hours to 9am
Sunshine: hours of bright sunshine over the day.
WindGust: the direction of the strongest wind gust over the day.
WindGustSpeed: speed (km/h) of the strongest wind gust over the day.
WindDir9am: the direction of the wind at 9am
WindDir3pm: the direction of the wind at 3pm
WindSpeed9am: speed (km/h) averaged over 10 minutes prior to 9am
WindSpeed3pm: speed (km/h) averaged over 10 minutes prior to 3pm
Humidity9am: humidity (percent) at 9am
Humidity3pm: humidity (percent) at 3pm
Pressure9am: atmospheric pressure (hPa) reduced to mean sea level at 9am
Pressure3pm: atmospheric pressure (hPa) reduced to mean sea level at 3pm
Cloud9am: the fraction of sky obscured by cloud at 9am. This is measured in "oktas",
which are a unit of eighths. It records how many eighths of the sky are obscured by cloud. A
measure of 0 indicates a completely clear sky, whilst 8 indicates that it is completely
overcast.
Cloud3pm: the fraction of sky obscured by cloud at 3pm.
Temp9am: temperature (degrees C) at 9am
Temp3pm: temperature (degrees C) at 3pm
RainToday: boolean: 1 if precipitation (mm) in the 24 hours to 9am exceeds 1mm,
otherwise 0
RainTomorrow: the target variable, indicating whether it rained on the following day.
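Two of the definitions above are simple rules, and a short sketch may make them concrete. The following plain Python is illustrative only: the function names and the sample record are hypothetical, and the rules are taken directly from the attribute descriptions (RainToday is 1 when rainfall exceeds 1 mm; cloud cover is a count of eighths from 0 to 8).

```python
def rain_today(rainfall_mm):
    """Return 1 if the rainfall in the 24 hours to 9am exceeds 1 mm, otherwise 0."""
    return 1 if rainfall_mm > 1.0 else 0


def cloud_fraction(oktas):
    """Convert a cloud reading in oktas (0 to 8) to the fraction of sky obscured."""
    if not 0 <= oktas <= 8:
        raise ValueError("oktas must be between 0 and 8")
    return oktas / 8


# Hypothetical record that uses the attribute names listed above.
record = {"Rainfall": 2.4, "Cloud9am": 6}
record["RainToday"] = rain_today(record["Rainfall"])
record["Cloud9amFraction"] = cloud_fraction(record["Cloud9am"])
```

Note that exactly 1 mm of rain gives RainToday = 0, because the rule says "exceeds".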
3. Analysis of Methodologies
Data Analysis and preprocessing
Before building any model, the data must be explored and preprocessed. Summary statistics
play an important role in describing how each attribute behaves.
I performed the following exploration and preprocessing steps on the data:
1. View the data and examine it carefully
2. Calculate the dimensions of the dataset
3. Produce a summary of the data
4. Identify the missing values and count them
5. Replace NA values in the continuous-valued columns with the column mean
6. Convert the categorical values
7. Omit the few remaining observations that contain NA values, since removing them
has little impact on the result.
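Steps 4, 5 and 7 above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the case study (the mention of naiveBayes() later suggests the original work was done in R). The helper names and the three sample rows are hypothetical, and missing values are represented as None.

```python
def count_missing(rows):
    """Count missing (None) values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def impute_mean(rows, column):
    """Return new rows with None in a continuous column replaced by the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    mean = sum(observed) / len(observed)
    return [dict(row, **{column: mean if row[column] is None else row[column]})
            for row in rows]


def drop_incomplete(rows):
    """Omit any row that still contains a missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]


# Hypothetical sample: one missing continuous value, one missing categorical value.
rows = [
    {"MinTemp": 10.0, "WindDir9am": "N"},
    {"MinTemp": None, "WindDir9am": "SE"},
    {"MinTemp": 14.0, "WindDir9am": None},
]
missing = count_missing(rows)          # step 4
imputed = impute_mean(rows, "MinTemp")  # step 5
cleaned = drop_incomplete(imputed)      # step 7
```

Imputing first and dropping afterwards keeps as many observations as possible: only rows with a missing categorical value are lost.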
Splitting of the data
A model is conventionally fitted on one dataset and evaluated on another. If separate train
and test datasets are already provided, this step can be skipped; otherwise the data must be
split into two parts: a train set for building or fitting the model, and a test set for
model prediction.
In this case study, I divided the whole dataset in a 70:30 ratio, with 70% in the train set and
30% in the test set, i.e. the validation set.
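A 70:30 split of this kind can be sketched as follows. The function name, the seed and the stand-in data are assumptions made for illustration; the source does not say how the rows were shuffled or whether a seed was fixed.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)                  # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed: the same split on every run
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the dataset: 100 row identifiers.
rows = list(range(100))
train, test = train_test_split(rows)
```

Shuffling before cutting matters when the rows are ordered, for example by date or location; otherwise the test set would contain only the tail of the data.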
Feature Selection
A few attributes have little impact on the target variable, so we remove them from the
analysis. These variables are Day, Month, Year and Location. With them removed, the
remaining attributes can be fitted to a correlation plot to examine their behavior; the
removed attributes themselves add little.
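The sketch below shows one way to drop the four attributes and to compute the kind of correlation that a correlation plot displays. It is an illustration in plain Python under stated assumptions: the helper names and the sample values are hypothetical, and Pearson correlation is assumed here because the source does not say which correlation measure was plotted.

```python
import math

DROPPED = ("Day", "Month", "Year", "Location")


def select_features(row, dropped=DROPPED):
    """Return a copy of the row without the attributes excluded from the analysis."""
    return {name: value for name, value in row.items() if name not in dropped}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical row and hypothetical columns, for illustration only.
row = {"Day": 3, "Month": 4, "Year": 2019, "Location": "Sydney", "Humidity3pm": 62}
features = select_features(row)
r = pearson([40, 55, 70, 85], [0, 0, 1, 1])  # e.g. Humidity3pm against RainTomorrow
```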
4. Models/Algorithms & Model Evaluations
We used all the attributes for model building except the four mentioned above.
❏ Decision Tree
We used a decision tree classifier to classify whether it will rain tomorrow in
Australia. A decision tree uses a tree-like structure to classify, splitting the
observations on feature values at each node. The confusion matrix shows a model accuracy
of 0.8315, i.e. 83.15%, which is good. The sensitivity and specificity are 0.9517
and 0.4167 respectively. We created one more decision tree model, but its accuracy
was about the same, approximately 83%.
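The three figures quoted from the confusion matrix are related by simple formulas, sketched below. The source does not give the confusion matrix itself, so the counts in the example are hypothetical: they were chosen so that the rates match the reported decision tree figures. Which class counts as "positive" is also not stated in the source.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity and specificity from confusion matrix counts.

    tp/fn: positive cases predicted correctly / incorrectly
    tn/fp: negative cases predicted correctly / incorrectly
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,    # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),    # share of positives that are detected
        "specificity": tn / (tn + fp),    # share of negatives that are detected
    }


# Hypothetical counts, chosen to be consistent with the reported rates.
metrics = confusion_metrics(tp=197, fn=10, fp=35, tn=25)
```

With these counts the accuracy is high mainly because the positive class is much larger than the negative class; the low specificity shows that fewer than half of the negative cases are identified.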
❏ Naive Bayes Model
Naive Bayes uses probabilities to predict or classify. We used naiveBayes()
to fit the model. The confusion matrix shows a model accuracy of 0.8202, i.e. 82.02%.
The sensitivity and specificity are 0.8986 and 0.5500 respectively.
❏ Random Forest
The Random Forest model uses the ensemble learning method: it combines the votes of
a number of decision trees to classify the result. The confusion matrix shows a model
accuracy of 0.8427, i.e. 84.27%. The sensitivity and specificity are 0.9420 and 0.5000
respectively.
❏ Bagging (Bootstrap Aggregation)
Bagging is used to reduce the variance of a decision tree. Bagging uses
all the features under consideration. The confusion matrix shows a model accuracy of
0.8427, i.e. 84.27%. The sensitivity and specificity are 0.9469 and 0.4833 respectively.
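The idea of bootstrap aggregation can be sketched in plain Python: fit many trees on samples drawn with replacement, then take a majority vote. This is a toy illustration, not the model used in the case study. For brevity the base learner is a depth-1 tree (a "stump") on a single hypothetical feature, whereas the text above says all features were used; the function names, the seed and the data are assumptions.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier (a depth-1 decision tree)."""
    values = sorted(set(xs))
    if len(set(ys)) == 1 or len(values) < 2:
        # No useful split in this sample: always predict the majority class.
        label = Counter(ys).most_common(1)[0][0]
        return lambda x: label
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for below, above in ((0, 1), (1, 0)):
            errors = sum((below if x <= threshold else above) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return lambda x: below if x <= threshold else above


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Hypothetical, cleanly separated data: low values are class 0, high values class 1.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
```

Each stump sees a slightly different sample, so individual thresholds vary; averaging their votes is what reduces the variance.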
❏ Boosting
Boosting improves the prediction accuracy of machine learning models by combining
weak learners sequentially, each one focusing on the errors of the previous ones. The
confusion matrix shows a model accuracy of 0.8504, i.e. 85.04%. The sensitivity and
specificity are 0.9565 and 0.4833 respectively. We can conclude that boosting helped
improve accuracy, by about 2 to 3 percentage points relative to the single decision tree
(83.15%) and Naive Bayes (82.02%) models.
❏ Artificial Neural Network
In the ANN, we can feed all the attributes except the target one to the input layer and
analyze the output of the target variable at the output layer.
The accuracy we obtained for this model was 0.8876, i.e. 88.76%, which is
very good compared to the other models.
❏ ROC & AUC
ROC and AUC are metrics for checking the performance of a classification model. The
ROC is a probability curve that plots the true positive rate against the false positive
rate at different thresholds, while the AUC measures the degree of separability between
the classes. We have calculated the ROC & AUC of the model in question 5.
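To make the two terms concrete, the sketch below computes ROC points and the AUC from predicted scores in plain Python. It is illustrative only: the function names, labels and scores are hypothetical, and the AUC is computed through its pairwise interpretation (the share of positive/negative pairs in which the positive case receives the higher score, with ties counting half), which may differ from how the original analysis computed it.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = rain tomorrow) and predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
points = roc_points(labels, scores)
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.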
5. Result & Conclusion
The Artificial Neural Network performed best of all the models, followed by the boosting
model; both performed well. An ANN is a useful choice for classification with this kind of
approach and result. We can therefore conclude that, to classify whether it will rain
tomorrow in Australia, the selection of attributes and the choice of the right modeling
methods are very important factors.