Predicting Temperature using Linear Regression Model


Added on  2022/12/27

AI Summary
This study focuses on developing a single point temperature forecast model using Multiple Linear Regression (MLR). The research analyzes time series data from April 2006 to September 2016 and examines the impact of factors like humidity, wind speed, wind bearing, visibility, and pressure on temperature. The results show significant relationships between these factors and temperature.

Predicting Temperature using linear regression model
Research Methodology
Student Name:
Instructor Name:
Course Number:
5th May 2019
ABSTRACT
The main goal of this study was to develop a single-point temperature forecast model using
Multiple Linear Regression (MLR). Time series data spanning April 2006 to September 2016 were
used, comprising 96,453 observations. Results showed that all five factors considered had a
statistically significant relationship with the earth's surface temperature. Three of the factors
(humidity, wind speed and pressure) had a negative relationship with the dependent variable
(temperature). The other two factors (wind bearing and visibility) had a positive relationship
with the dependent variable (temperature).
INTRODUCTION
Our day-to-day lives (and those of other living organisms) are greatly influenced by the weather
and climate. Temperature in particular has a significant effect on our lives. Thousands of lives
are lost all over the world every year, especially during summer. More than five hundred thousand
chickens died in Georgia alone within a two-day span at the peak of the summer heat
(Donald, 2011).
Timely and accurate temperature estimates are necessary for taking prudent steps
(Christoph, et al., 2009). Predicting precisely what the atmosphere will do in the coming days is
quite challenging because the atmospheric environment is dynamic in how it influences the
conditions observed at the earth's surface (Shengpan, et al., 2012). The goal of this
investigation is to create a single-point temperature forecast model using Multiple Linear
Regression (MLR).
The academic community has recommended numerous linear and non-linear techniques for predicting
temperature. MLR was nevertheless chosen because linear models frequently produce better
estimates than non-linear models even when the given data are non-linear (Chatfield, 2009), and
because statistical methods require little computation time to make a prediction
(Dhawal & Mishra, 2016).
METHODOLOGY
Time series data were collected to answer the research question. Multiple linear regression (MLR)
was employed to predict the temperature using humidity, wind speed, wind bearing, visibility and
pressure as predictors.
Other statistical measures performed include the Pearson correlation test between the variables.
The following regression model was estimated.
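$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$$
Where the variables are as follows:
$y$ = temperature (C), $x_1$ = humidity, $x_2$ = wind speed (km/h), $x_3$ = wind bearing (degrees), $x_4$ = visibility (km), $x_5$ = pressure (millibars)
$\beta_0$ = intercept coefficient, $\beta_1$ = coefficient for humidity, $\beta_2$ = coefficient for wind speed, $\beta_3$ = coefficient for wind bearing, $\beta_4$ = coefficient for visibility, $\beta_5$ = coefficient for pressure, $\varepsilon$ = error term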
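To make the estimation step concrete, the following is a minimal sketch of how the coefficients of a model of this form can be obtained by ordinary least squares. It is an illustration under stated assumptions and not the procedure used in the study: the function name `fit_ols` and the toy data are hypothetical, and the normal equations are solved with plain Gaussian elimination rather than a statistics package.

```python
def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimising squared error for y ~ b0 + sum(bj * xj).

    Hypothetical helper for illustration; `rows` holds one tuple of
    predictor values per observation.
    """
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [
        [sum(xi[i] * xi[j] for xi in X) for j in range(p)]
        + [sum(xi[i] * yi for xi, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / A[i][i]
    return beta


# Toy data (not from the study) generated from y = 1 + 2*x1 - 3*x2.
toy_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
toy_y = [1 + 2 * a - 3 * b for a, b in toy_rows]
toy_beta = fit_ols(toy_rows, toy_y)
```

With the five predictors of this study, each row would hold (humidity, wind speed, wind bearing, visibility, pressure) and the returned list would contain the intercept followed by the five slope coefficients.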
DATA
Time series data were used to predict the temperature in this study. The data span April 2006 to
September 2016, with a total of 96,453 data points. Table 1 below presents a section (first 10
cases) of the data.
Table 1: Data
Formatted Date                 Temperature (C)  Humidity  Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)
2006-04-01 00:00:00.000 +0200  9.472222         0.89      14.1197            251                     15.8263          1015.13
2006-04-01 01:00:00.000 +0200  9.355556         0.86      14.2646            259                     15.8263          1015.63
2006-04-01 02:00:00.000 +0200  9.377778         0.89      3.9284             204                     14.9569          1015.94
2006-04-01 03:00:00.000 +0200  8.288889         0.83      14.1036            269                     15.8263          1016.41
2006-04-01 04:00:00.000 +0200  8.755556         0.83      11.0446            259                     15.8263          1016.51
2006-04-01 05:00:00.000 +0200  9.222222         0.85      13.9587            258                     14.9569          1016.66
2006-04-01 06:00:00.000 +0200  7.733333         0.95      12.3648            259                     9.982            1016.72
2006-04-01 07:00:00.000 +0200  8.772222         0.89      14.1519            260                     9.982            1016.84
2006-04-01 08:00:00.000 +0200  10.82222         0.82      11.3183            259                     9.982            1017.37
2006-04-01 09:00:00.000 +0200  13.77222         0.72      12.5258            279                     9.982            1017.22
STATISTICAL DATA ANALYSIS:
Descriptive Statistics
Table 2 below presents the descriptive statistics for the six variables (including the dependent
variable, temperature). The average temperature is 11.93 C, with a standard deviation of 9.55 C
and a median of 12.00 C. The maximum and minimum temperature values are 39.91 C and -21.82 C
respectively. The skewness value for temperature is 0.094 (very close to zero), which suggests
that the distribution of temperature is close to symmetric and approximately normal. The average
humidity was 0.735 (SD = 0.195), with a median humidity of 0.780. The skewness value for humidity
was -0.716, which shows that humidity is negatively skewed.
Table 2: Descriptive statistics
Statistic           Temperature (C)  Humidity  Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)
Mean                11.933           0.735     10.811             187.509                 10.347           1003.236
Standard Error      0.031            0.001     0.022              0.346                   0.013            0.377
Median              12.000           0.780     9.966              180.000                 10.046           1016.450
Mode                7.222            0.930     3.220              0.000                   9.982            0.000
Standard Deviation  9.552            0.195     6.914              107.383                 4.192            116.970
Sample Variance     91.232           0.038     47.797             11531.201               17.574           13681.959
Kurtosis            -0.567           -0.462    1.769              -1.132                  -0.260           69.269
Skewness            0.094            -0.716    1.113              -0.155                  -0.499           -8.423
Range               61.728           1.000     63.853             359.000                 16.100           1046.380
Minimum             -21.822          0.000     0.000              0.000                   0.000            0.000
Maximum             39.906           1.000     63.853             359.000                 16.100           1046.380
Sum                 1150943          70883.21  1042719            18085828                998030.5         96765118
Count               96453            96453     96453              96453                   96453            96453
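As a hedged illustration of how summary measures like those in Table 2 can be computed, the sketch below uses Python's standard library on the ten temperature values shown in Table 1. The study does not state which software or formulas produced Table 2, so the sample (n - 1) standard deviation and the adjusted skewness formula used here are assumptions, and `describe` is a hypothetical helper name.

```python
import math
import statistics


def describe(values):
    """Summarise one numeric column (hypothetical helper for illustration)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Adjusted sample skewness; an assumed formula, not confirmed by the study.
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),  # standard error of the mean
        "median": statistics.median(values),
        "standard_deviation": sd,
        "sample_variance": sd ** 2,
        "skewness": skew,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "count": n,
    }


# First ten temperature readings (C) from Table 1; illustrative input only.
temperature_c = [9.472222, 9.355556, 9.377778, 8.288889, 8.755556,
                 9.222222, 7.733333, 8.772222, 10.82222, 13.77222]
summary = describe(temperature_c)
```

The standard error row of Table 2 is consistent with this definition: 9.552 divided by the square root of 96,453 is approximately 0.031.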
Histograms
This section presents the histograms for the various variables in the study.
Figure 1: Histogram for temperature
Figure 1 above shows the histogram for temperature where we can observe that the distribution
for temperature is close to normal distribution (due to the bell-shaped curve).
Figure 2: Histogram for humidity
Figure 2 above shows the histogram for humidity where we can observe that the distribution for
humidity is far from normal distribution but rather skewed to the left (negatively skewed).
Figure 3: Histogram for wind speed
Figure 3 above shows the histogram for wind speed where we can observe that the distribution
for wind speed is far from normal distribution but rather skewed to the right (positively skewed).
Figure 4: Histogram for wind bearing
Figure 4 above shows the histogram for wind bearing where we can observe that the distribution
for wind bearing is close to normal distribution (due to the bell-shaped curve).
Figure 5: Histogram for visibility
Figure 5 above shows the histogram for visibility where we can observe that the distribution for
visibility is far from normal distribution but rather skewed to the left (negatively skewed).
Figure 6: Histogram for pressure
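Figure 6 above shows the histogram for pressure, where we can observe that the bulk of the
pressure values form a bell-shaped curve. Note, however, that Table 2 reports a skewness of
-8.423 and a kurtosis of 69.269 for pressure, with a minimum and mode of 0.000, so the full
distribution departs markedly from a normal distribution.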
CORRELATION
This section presents the Pearson correlation test between the variables being studied (Boddy
& Smith, 2013). The test measures the strength and direction of the linear relationship between
two continuous variables (Nikolić, et al., 2012). Pearson correlation coefficients range from -1
to +1. The relationship between two variables is regarded as strong when the correlation
coefficient (r) is close to either +1 or -1 (Mahdavi, 2012). A correlation coefficient with a
negative sign implies a negative relationship between the variables, while a coefficient with a
positive sign implies a positive relationship (Nikolić, et al., 2012).
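The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard sample Pearson formula; `pearson_r` is a hypothetical helper name, and the inputs are only the ten temperature and humidity readings listed in Table 1, not the full data set.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# First ten rows of Table 1 (illustrative subset only).
temperature_c = [9.472222, 9.355556, 9.377778, 8.288889, 8.755556,
                 9.222222, 7.733333, 8.772222, 10.82222, 13.77222]
humidity = [0.89, 0.86, 0.89, 0.83, 0.83, 0.85, 0.95, 0.89, 0.82, 0.72]
r_sample = pearson_r(temperature_c, humidity)
```

On this small subset the coefficient is negative, in line with the direction reported for the full data, although its magnitude should not be expected to match a value computed from all 96,453 observations.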
Table 3: Correlation matrix
                        Temperature (C)  Humidity  Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)
Temperature (C)         1                -0.632    0.009              0.030                   0.393            -0.005
Humidity                -0.632           1         -0.225             0.001                   -0.369           0.005
Wind Speed (km/h)       0.009            -0.225    1                  0.104                   0.101            -0.049
Wind Bearing (degrees)  0.030            0.001     0.104              1                       0.048            -0.012
Visibility (km)         0.393            -0.369    0.101              0.048                   1                0.060
Pressure (millibars)    -0.005           0.005     -0.049             -0.012                  0.060            1
The results in Table 3 above show that a moderately strong negative relationship exists between
temperature and humidity (r = -0.632). There was also a weak positive relationship between
temperature and visibility (r = 0.393). The other three variables (wind speed, wind bearing and
pressure) had close to no linear relationship with temperature, as their correlation coefficients
were close to zero (r = 0.009, 0.030 and -0.005 respectively).
Next, we provide the scatter plots showing the relationship between the variables (Daniela, et al.,
2009).
Figure 7: Scatter plot of temperature against humidity
Figure 7 above shows a scatter plot of temperature against humidity. As can be seen, there is a
negative relationship between humidity and temperature. This means that higher humidity is
associated with lower temperature, and lower humidity is associated with higher prevailing
temperatures in an area (Kannan, et al., 2010).
Figure 8: Scatter plot of temperature against wind speed
The above plot (figure 8) presents a scatter plot of temperature against wind speed. As can be
seen from the plot, there seems to be no relationship between the two variables (temperature and
wind speed).
Figure 9: Scatter plot of temperature against wind bearing
The above plot (figure 9) presents a scatter plot of temperature against wind bearing. As can be
seen from the plot, there seems to be no relationship between the two variables (temperature and
wind bearing).
Figure 10: Scatter plot of temperature against visibility
Figure 10 above shows a scatter plot of temperature against visibility. As can be seen, there is a
positive relationship between temperature and visibility. This means that higher visibility is
associated with higher temperature, and lower visibility is associated with lower prevailing
temperatures in an area.
COVARIANCE
Covariance is a measure of the joint variability of two random variables (Sahidullah &
Kinnunen, 2016). It indicates whether two variables in a study tend to vary in the same direction
(positive covariance) or in opposite directions (negative covariance) (Yuli, et al., 2012).
Table 4: Covariance matrix
                        Temperature (C)  Humidity  Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)
Temperature (C)         91.231           -1.180    0.591              30.758                  15.730           -6.086
Humidity                -1.180           0.038     -0.304             0.015                   -0.303           0.125
Wind Speed (km/h)       0.591            -0.304    47.797             77.077                  2.920            -39.837
Wind Bearing (degrees)  30.758           0.015     77.077             11531.081               21.425           -146.341
Visibility (km)         15.730           -0.303    2.920              21.425                  17.574           29.332
Pressure (millibars)    -6.086           0.125     -39.837            -146.341                29.332           13681.817
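The covariance and correlation matrices are linked: dividing a covariance by the product of the two standard deviations gives the Pearson correlation. The short sketch below checks this relationship using values copied from Table 4. It is an illustrative consistency check rather than part of the original analysis, and `corr_from_cov` is a hypothetical helper name.

```python
import math


def corr_from_cov(cov_xy, var_x, var_y):
    """Convert a covariance to a Pearson correlation: cov / (sd_x * sd_y)."""
    return cov_xy / math.sqrt(var_x * var_y)


# Values taken from Table 4 (diagonal entries are the variances).
var_temperature = 91.231
var_visibility = 17.574
var_wind_bearing = 11531.081

r_temp_visibility = corr_from_cov(15.730, var_temperature, var_visibility)
r_temp_wind_bearing = corr_from_cov(30.758, var_temperature, var_wind_bearing)
```

Both results round to the corresponding entries of the correlation matrix (0.393 and 0.030). Entries involving humidity reproduce less exactly from the printed table because its variance is shown to only three decimal places (0.038).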
Regression Analysis
We performed a regression analysis in an attempt to predict the temperature of the earth surface
based on factors such as humidity, wind speed, wind bearing, visibility and pressure (Tofallis,
2013).
The following regression equation model was estimated (Stone, 2015).
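$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon$$
Where the variables are as follows:
$y$ = temperature (C), $x_1$ = humidity, $x_2$ = wind speed (km/h), $x_3$ = wind bearing (degrees), $x_4$ = visibility (km), $x_5$ = pressure (millibars)
$\beta_0$ = intercept coefficient, $\beta_1$ = coefficient for humidity, $\beta_2$ = coefficient for wind speed, $\beta_3$ = coefficient for wind bearing, $\beta_4$ = coefficient for visibility, $\beta_5$ = coefficient for pressure, $\varepsilon$ = error term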
The results of the regression analysis are presented below;
Table 5: SUMMARY OUTPUT
Regression Statistics
Multiple R 0.671249
R Square 0.450575
Adjusted R Square 0.450547
Standard Error 7.080093
Observations 96453
Table 5 above presents the summary output of the regression. The value of R-squared (R2) is
0.4506; this means that 45.06% of the variation in the dependent variable (temperature) is
explained by the five independent variables in the model. The majority of the variation (54.94%)
is therefore not explained by the model and is attributed to other factors captured by the error
term.
Table 6: ANOVA table
            df     SS       MS        F         Significance F
Regression  5      3964844  792968.8  15818.97  0.000
Residual    96447  4834668  50.12772
Total       96452  8799512
Table 6 above presents the ANOVA table. The p-value of the F statistic is reported as 0.000
(that is, p < 0.001), which is below the 5% level of significance. We therefore reject the null
hypothesis that all slope coefficients are zero and conclude that the model as a whole is
statistically significant for predicting the temperature of the earth surface at the 5% level of
significance.
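The figures in the regression summary and the ANOVA table are tied together by simple arithmetic, which the following sketch reproduces from the reported sums of squares. The variable names are chosen here for illustration; the formulas are the standard ones for a linear model with an intercept, assumed to match how the reported output was generated.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 3964844.0
ss_residual = 4834668.0
ss_total = 8799512.0
df_regression = 5        # number of predictors
df_residual = 96447      # n - number of predictors - 1
n_observations = 96453

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / df_residual
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)  # residual standard error

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adjusted_r_squared:.6f}")
print(f"F = {f_statistic:.2f}, standard error = {standard_error:.6f}")
```

These quantities agree with the reported R Square (0.450575), Adjusted R Square (0.450547), F (15818.97) and Standard Error (7.080093) up to rounding of the printed sums of squares.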
Table 7: Regression coefficients
                        Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
Intercept               32.220        0.240           134.131   0.000    31.749     32.691
Humidity                -29.153       0.128           -227.236  0.000    -29.404    -28.901
Wind Speed (km/h)       -0.206        0.003           -60.365   0.000    -0.212     -0.199
Wind Bearing (degrees)  0.003         0.000           15.289    0.000    0.003      0.004
Visibility (km)         0.426         0.006           72.566    0.000    0.415      0.438
Pressure (millibars)    -0.002        0.000           -8.471    0.000    -0.002     -0.001
Lastly, Table 7 presents the regression coefficients. From the table, we can see that all five
independent (predictor) variables are significant in the model (p < 0.05). This means that there
is a statistically significant relationship between each predictor variable and the dependent
variable (temperature).
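The 95% confidence limits in the coefficients table can be approximately reproduced from each coefficient and its standard error. The sketch below does this for the intercept and humidity rows. It assumes the large-sample normal critical value of 1.96 (reasonable with 96,447 residual degrees of freedom) and uses the rounded table values, so small discrepancies in the last decimal place are expected; `confidence_interval` is a hypothetical helper name.

```python
Z_95 = 1.96  # assumed large-sample critical value for a two-sided 95% interval


def confidence_interval(coefficient, standard_error, critical=Z_95):
    """Return (lower, upper) limits: coefficient +/- critical * standard error."""
    margin = critical * standard_error
    return coefficient - margin, coefficient + margin


# Rounded values copied from the regression coefficients table.
intercept_ci = confidence_interval(32.220, 0.240)
humidity_ci = confidence_interval(-29.153, 0.128)

# The t statistic is the coefficient divided by its standard error.
intercept_t = 32.220 / 0.240
```

Because neither interval contains zero, both coefficients are significant at the 5% level, which is the same conclusion the reported p-values give.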
The intercept (constant) coefficient is 32.22; this implies that when all the predictor variables
take a value of zero, we would expect the temperature of the earth surface to be 32.22 C.
In each interpretation below, the other predictors are held constant.
The coefficient of humidity is -29.15; this implies that increasing humidity by one unit is
expected to lower the temperature of the earth surface by 29.15 C, and decreasing humidity by one
unit is expected to raise it by 29.15 C. Since humidity is recorded on a scale from 0 to 1, a
one-unit change spans its whole range; an increase of 0.1 corresponds to a decrease of about
2.9 C.
The coefficient of wind speed is -0.206; this implies that increasing the wind speed by 1 km/h is
expected to lower the temperature of the earth surface by 0.206 C, and decreasing the wind speed
by 1 km/h is expected to raise it by 0.206 C.
The coefficient of wind bearing is 0.003; this implies that increasing the wind bearing by one
degree is expected to raise the temperature of the earth surface by 0.003 C, and decreasing the
wind bearing by one degree is expected to lower it by 0.003 C.
The coefficient of visibility is 0.426; this implies that increasing the visibility by 1 km is
expected to raise the temperature of the earth surface by 0.426 C, and decreasing the visibility
by 1 km is expected to lower it by 0.426 C.
The coefficient of pressure is -0.002; this implies that, holding the other predictors constant,
increasing the pressure by one millibar is expected to lower the temperature of the earth surface
by 0.002 C, and decreasing the pressure by one millibar is expected to raise it by 0.002 C.
Based on the above findings, the final regression equation model would be as follows;
$$y = 32.22 - 29.15 x_1 - 0.206 x_2 + 0.003 x_3 + 0.426 x_4 - 0.002 x_5$$
Where the variables are as follows:
$y$ = temperature (C), $x_1$ = humidity, $x_2$ = wind speed (km/h), $x_3$ = wind bearing (degrees), $x_4$ = visibility (km), $x_5$ = pressure (millibars)
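As a usage illustration, the fitted equation can be wrapped in a small function and applied to an observation. The sketch below uses the rounded coefficients from the coefficients table and the first row of Table 1; the function name `predict_temperature` is hypothetical, and the result is only as precise as the rounded coefficients allow.

```python
def predict_temperature(humidity, wind_speed_kmh, wind_bearing_deg,
                        visibility_km, pressure_mbar):
    """Predicted temperature (C) from the fitted regression equation."""
    return (32.22
            - 29.15 * humidity
            - 0.206 * wind_speed_kmh
            + 0.003 * wind_bearing_deg
            + 0.426 * visibility_km
            - 0.002 * pressure_mbar)


# First observation in Table 1 (2006-04-01 00:00), observed temperature 9.472222 C.
predicted = predict_temperature(0.89, 14.1197, 251, 15.8263, 1015.13)
residual = 9.472222 - predicted
```

For this observation the prediction is roughly 8.8 C against an observed 9.47 C, a residual well within the model's standard error of about 7.08 C.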
CONCLUSION
The goal of this investigation was to develop a single-point temperature forecast model using
Multiple Linear Regression (MLR). Time series data spanning April 2006 to September 2016 were
used. Results showed that all five factors considered had a statistically significant
relationship with the temperature of the earth's surface, and that together they explained
45.06% of its variation. Three of the factors (humidity, wind speed and pressure) had a negative
relationship with the dependent variable (temperature). The other two factors (wind bearing and
visibility) had a positive relationship with the dependent variable (temperature).
Bibliography
Boddy, R. & Smith, G., 2013. Statistical methods in practice: for scientists and technologists. pp.
95-96.
Chatfield, C., 2009. The Analysis of Time Series: An Introduction. Journal of Statistics, 5(2), pp.
56-63.
Christoph, C., Georg, B., Klaus, F. & Edilbert, K., 2009. Statistical single-station short-term
forecasting of temperature and probability of precipitation: Area interpolation and NWP
combination. Weather Forecast., 14: , 4(34-47), pp. 203-214.
Daniela, Ş., Georgiana, P. E. & Catalina, N., 2009. Weather Forecast using SPSS Statistical
Methods: This paper presents a case study of using SPSS 13.0 in weather prediction. Gas
University of Ploiesti, 43(1).
Dhawal, H. & Mishra, N., 2016. A Survey on Rainfall Prediction Techniques. International
Journal of Computer Application , 6(2), pp. 1001-1015.
Donald, A., 2011. Essentials of Meteorology: An Invitation to the Atmosphere. 6th Edition.
Kannan, M., Prabhakaran, S. & Ramchandran, P., 2010. Rainfall Forecasting Using DataMining
Techniques. International Journal of Engineering and Technology , 2(6), pp. 397-401.
Mahdavi , D. B., 2012. The Misleading Value of Measured Correlation. Wilmott, 1(1), p. 64–73.
Nikolić, D., Muresan, R. C., Feng, W. & Singer, W., 2012. Scaled correlation analysis: a better
way to compute a cross-correlogram. European Journal of Neuroscience, 35(5), p. 1–21.
Sahidullah, M. & Kinnunen, T., 2016. Local spectral variability features for speaker verification.
Digital Signal Processing, Volume 50, p. 1–11.
Shengpan, L. et al., 2012. Evaluation of estimating daily maximum and minimum air
temperature with MODIS data in east Africa. International Journal of Applied Earth
Observations, 18(5), pp. 128-140.
Stone, C. J., 2015. Adaptive maximum likelihood estimators of a location parameter. The Annals
of Statistics, 3(2), p. 267–284.
Tofallis, C., 2013. Least Squares Percentage Regression. Journal of Modern Applied Statistical
Methods, 7(3), p. 526–534.
Willcockson, I., Johnson, C., Hersh, W. & Bernstam, E., n.d. Predictors of student success in
graduate biomedical informatics training: Introductory course and program success. Journal of
the American Medical Informatics Association, 16(6), p. 837–846.
Yuli , Z., Huaiyu, W. & Lei, C., 2012. Some new deformation formulas about variance and
covariance. Proceedings of 4th International Conference on Modelling, Identification and
Control(ICMIC2012), p. 987–992.