SIT718 Real World Analysis Project: Data Analysis and Model Comparison
Added on 2022/08/31
AI Summary
This document presents a comprehensive analysis of a real-world dataset, focusing on predicting energy usage of appliances. The project encompasses various stages, including data exploration, visualization, and the application of different aggregation functions. The student analyzes histograms and scatter plots to understand the relationships between independent variables (temperature, humidity, visibility) and the dependent variable (energy use). The analysis involves calculating error and correlation measures, comparing models like Weighted Arithmetic Mean (WAM), Ordered Weighted Averaging (OWA), and Choquet integral. The project then moves on to linear regression, comparing its performance with the OWA model. The student interprets regression results, identifies significant variables, and discusses the strengths and weaknesses of each model. The findings highlight the importance of specific variables like humidity in kitchen area (HKA) and visibility outside the Weather Station (VO) for predicting appliance energy consumption. The assignment also focuses on R programming for data analysis and model development, demonstrating the student's ability to apply theoretical concepts to practical problem-solving in the domain of data science.

Running head: REAL WORLD ANALYSIS
Real World Analysis
Name of the Student:
Name of the University:
Author Note:
Table of Contents
Section 1
    Part iv
Section 2
    Part i
    Part ii
Section 3
    Part iii
    Part iv
Section 4
    Part i
    Part ii
    Part iii
Section 5
    Part i
    Part ii
    Part iii
Reference and Bibliography
Section 1
Part iv
[Figure 1 panels: frequency histograms, with count on the vertical axis, of Temperature in Kitchen Area (TKA), Humidity in Kitchen Area (HKA), Temperature Outside (TO), Humidity Outside (HO), Visibility and Energy use of appliances.]
Figure 1: Histograms of all the independent variables and the dependent variable.
The figure above presents the histograms of the five independent variables, temperature in
kitchen area (TKA), humidity in kitchen area (HKA), temperature outside at the weather
station (TO), humidity outside at the weather station (HO) and visibility outside at the
weather station (VO), together with the histogram of the dependent variable, energy use of
appliances (EA). The distribution of TKA is approximately normal. HKA is slightly right
skewed. TO is approximately normal. HO is slightly left skewed. VO is right skewed. Finally,
EA is strongly right skewed (Shao et al., 2017).
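The visual judgements of skewness above can be cross-checked numerically. The following is a minimal sketch, assuming the moment-based sample skewness as the measure. The function name and the two small samples are hypothetical and are not taken from the project data.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples, for illustration only (not the project data).
symmetric = [18, 19, 20, 21, 22]
long_right_tail = [50, 60, 60, 70, 80, 100, 400, 800]

print(sample_skewness(symmetric))        # zero for a symmetric sample
print(sample_skewness(long_right_tail))  # positive for a right-skewed sample
```

A value near zero is consistent with a roughly symmetric histogram, a positive value with a right skew and a negative value with a left skew.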
[Figure 2 panels: scatter plots of energy use of appliances (vertical axis) against Temperature in Kitchen Area, Humidity in kitchen area, Outside Temperature, Outside Humidity and Visibility.]
Figure 2: Scatter plots of energy use of appliances against all the independent variables.
The points in the scatter plot of EA against TKA are widely spread, which indicates a weak
relationship between EA and TKA. The scatter plot of EA against HKA shows a weak positive
relationship between EA and HKA. The scatter plot of EA against TO shows a moderate
positive relationship between EA and TO. The scatter plot of EA against HO shows a weak
relationship between EA and HO. The scatter plot of EA against VO shows a moderate
positive relationship between EA and VO.
Section 2
Part i
The selected four variables are TKA, HKA, TO and VO.
Part ii
The scatter plot shows a weak relationship between TKA and EA, because EA rises very little
as TKA rises. Most of the points stay below an EA of 200 over the whole range of TKA.
Given the spread of the points, the standard error cannot be assumed to be negligible.
The scatter plot shows a weak positive relationship between EA and HKA, as a rise in HKA
raises EA. However, the points are somewhat spread out. The relationship can be described
better by correlation analysis and regression analysis.
A moderate positive relationship is seen between EA and TO: the points lie at higher EA
when TO is higher, and vice versa. This can also be quantified by calculating the
correlation.
A moderate positive relationship is also seen between EA and VO, similar to the previous
one: the points lie at higher EA when VO is higher, and vice versa. However, the points are
somewhat spread out. This can also be quantified by calculating the correlation.
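The correlations mentioned above could be computed as follows. This is a minimal sketch in plain Python, assuming the Pearson coefficient for linear association and the Spearman coefficient (Pearson applied to average ranks) for monotone association. The function names and the sample readings are hypothetical and do not come from the project data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical readings, for illustration only.
vo = [20, 25, 30, 40, 40, 55, 63]
ea = [50, 60, 60, 90, 120, 300, 700]
print(pearson(vo, ea), spearman(vo, ea))
```

Because EA is strongly right skewed, the two coefficients can differ noticeably, which is one reason to report both.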
Section 3
Part iii
Table 1: Error and correlation measures
(WAM = weighted arithmetic mean, WPM = weighted power mean, OWA = ordered weighted averaging function)

Measure               WAM        WPM (p = 0.5)  WPM (p = 2)  OWA        Choquet integral
RMSE                  244.0783   275.7446       2886925      242.1922   242.1922
Av. abs. error        205.5604   241.8418       2503070      203.9783   203.9783
Pearson correlation   0.1932     0.1862         0.2166       0.3412     0.3412
Spearman correlation  0.0991     0.0991         0.0991       0.257477   0.257477
Orness                                                       1          0.7778

Table 2: Weights/parameters learned from the analysis
(abbreviations as in Table 1)

i   WAM   WPM (p = 0.5)  WPM (p = 2)  OWA   Choquet integral
1   0     0              0            0     0
2   1     1              1            0     0.5
3   0     0              0            0     0
4   0     0              0            1     0.5
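To make the weights in Table 2 concrete, the sketch below applies three of the aggregation functions to a single observation. It is an illustration under stated assumptions rather than the code used in the project: the inputs are taken to be in the order TKA, HKA, TO, VO and already scaled to [0, 1], the OWA is assumed to sort its inputs in ascending order so that the last weight multiplies the largest input, and the input vector itself is hypothetical.

```python
def weighted_arithmetic_mean(x, w):
    """Sum of w_i * x_i; the weights are assumed to sum to 1."""
    return sum(wi * xi for wi, xi in zip(w, x))


def weighted_power_mean(x, w, p):
    """(sum of w_i * x_i**p) ** (1/p), for p != 0 and non-negative inputs."""
    return sum(wi * xi ** p for wi, xi in zip(w, x)) ** (1.0 / p)


def owa(x, w):
    """Ordered weighted average: the weights apply to the sorted inputs.

    Assumption: ascending sort, so the last weight multiplies the largest input.
    """
    return sum(wi * xi for wi, xi in zip(w, sorted(x)))


# Hypothetical observation in the assumed order TKA, HKA, TO, VO.
x = [0.40, 0.70, 0.20, 0.90]
wam_weights = [0, 1, 0, 0]  # WAM column of Table 2
owa_weights = [0, 0, 0, 1]  # OWA column of Table 2

print(weighted_arithmetic_mean(x, wam_weights))  # equals the second input
print(weighted_power_mean(x, wam_weights, 2))    # close to the second input
print(owa(x, owa_weights))                       # equals the largest input
```

Under this convention the OWA weights attach to rank positions rather than to named variables, so the weight vector (0, 0, 0, 1) returns the largest of the four inputs for each observation.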
Part iv
a) The RMSE of the WAM and of the weighted power means with p = 0.5 and p = 2 is higher than
242.1922, the value obtained by both the OWA and the Choquet integral. On the basis of the
orness value, the OWA is taken to be the best model.
b) The table of weights shows that the most heavily weighted variable is HKA and that the
most redundant variables are TO and TKA. The models give the most weight to HKA, while no
weight is assigned to TO or TKA in any model (Candanedo, Feldheim and Deramaix 2017).
c) The third variable, outside temperature (TO), is redundant for predicting the energy use
of appliances: TO has a weight of 0 in all the models.
d) A model is judged to be better or worse on the basis of its orness value, which is 1 for
the ordered weighted averaging function and 0.7778 for the Choquet integral (Table 1). A
lower orness means that the model favours the lower inputs, and a higher orness means that
it favours the higher inputs.
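The measures discussed in parts a) and d) can be written out as follows. This is a minimal sketch, not the project's own code. The orness formula assumes that the OWA weights are applied to inputs sorted in ascending order, and the observed and predicted values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def average_absolute_error(observed, predicted):
    """Mean of the absolute differences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def owa_orness(weights):
    """Orness of an OWA weight vector.

    Assumption: weights apply to inputs sorted in ascending order, so
    orness = sum over i of w_i * (i - 1) / (n - 1) for i = 1..n.
    """
    n = len(weights)
    return sum(w * i / (n - 1) for i, w in enumerate(weights))


# Hypothetical observed and predicted EA values, for illustration only.
observed = [60, 50, 430, 100]
predicted = [80, 70, 300, 90]

print(rmse(observed, predicted))
print(average_absolute_error(observed, predicted))
print(owa_orness([0, 0, 0, 1]))              # 1.0: behaves like the maximum
print(owa_orness([0.25, 0.25, 0.25, 0.25]))  # about 0.5: the arithmetic mean
```

Under the stated convention, the OWA weights in Table 2, (0, 0, 0, 1), give an orness of 1, which agrees with the OWA orness reported in Table 1.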
Section 4
Part i
The best-fitting model gives an EA of 0.3224.
Part ii
HKA is the most important variable. However, the best-fitting model does not take HKA into
account, so the estimate is more likely to lie towards the minimum of EA.
[Figure 3 panels: two normal Q-Q plots of sample quantiles against theoretical quantiles; the caption appears in Section 5, Part ii.]
Part iii
The lowest energy use of appliances is found when VO is low. HKA, which is the most
important factor here, should be low as well.
Section 5
Part i
The following model is obtained from the linear regression:
$Y = -870.871 + 22.0569\,V_1 - 1.5494\,V_2 + 66.7546\,V_3 + 6.8016\,V_5$
Table 3: Regression Result

Term         Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)  -870.871   99.943      -8.714   < 2e-16
V1           22.0569    2.923       7.546    5.61E-13
V2           -1.5494    1.2773      -1.213   0.226
V3           66.7546    5.6321      11.853   < 2e-16
V5           6.8016     0.8519      7.984    3.19E-14
All the variables in the above model are significant except V2, whose coefficient has a
p-value greater than 0.1. The model as a whole is nevertheless significant, since the
p-value of the F-statistic is reported as 0.000. The explanatory variables explain 42.59%
of the variance in Y; this is a measure of fit rather than of prediction accuracy (Gunst
2018).
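The figures in Table 3 can be checked with a few lines of arithmetic. The sketch below assumes that the three leading numeric columns are the estimate, its standard error and the t value, in the usual order of an R regression summary. The dictionary and function names are hypothetical, and the prediction function simply evaluates the fitted equation given above.

```python
# Rows of Table 3 as (estimate, standard error, reported t value).
# Assumption: the numeric columns follow the usual R summary order.
table3 = {
    "(Intercept)": (-870.871, 99.943, -8.714),
    "V1": (22.0569, 2.923, 7.546),
    "V2": (-1.5494, 1.2773, -1.213),
    "V3": (66.7546, 5.6321, 11.853),
    "V5": (6.8016, 0.8519, 7.984),
}


def t_value(estimate, std_error):
    """t statistic of a coefficient under the null hypothesis that it is zero."""
    return estimate / std_error


def predict(v1, v2, v3, v5):
    """Evaluate the fitted regression equation for one observation."""
    b = {name: row[0] for name, row in table3.items()}
    return (b["(Intercept)"] + b["V1"] * v1 + b["V2"] * v2
            + b["V3"] * v3 + b["V5"] * v5)


# Compare the recomputed t values with the reported ones.
for name, (est, se, t_reported) in table3.items():
    print(name, round(t_value(est, se), 3), t_reported)
```

Each recomputed ratio agrees with the reported t value to within rounding, which supports reading the columns in that order.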
Part ii
The linear regression model is found to be a better model than the OWA model.
Figure 3: Q-Q plot of the linear regression model (left) and OWA (right)
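A normal Q-Q plot pairs the sorted sample values with the quantiles of a standard normal distribution. The following is a minimal sketch of how the coordinates of such a plot can be obtained, assuming plotting positions of (i - 0.5) / n, which is one of several common conventions and may differ from the one used to draw Figure 3. The residuals are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot.

    Assumption: plotting positions (i - 0.5) / n for i = 1..n.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i + 0.5) / n), value)
            for i, value in enumerate(ordered)]


# Hypothetical residuals, for illustration only.
residuals = [-120.0, -40.0, -10.0, 5.0, 30.0, 90.0, 280.0]
for theoretical, sample in normal_qq_points(residuals):
    print(round(theoretical, 3), sample)
```

Points that fall close to a straight line suggest approximately normal residuals, while a curve at the upper end suggests a heavy right tail.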
Part iii
The major difference between the two models is the set of variables they use. The linear
regression considers all the selected variables, although HKA is not significant in it,
whereas the OWA model considers only VO.
Reference and Bibliography
Candanedo, L.M., Feldheim, V. and Deramaix, D., 2017. Data driven prediction models of
energy use of appliances in a low-energy house. Energy and Buildings, 140, pp.81-97.
Gunst, R.F., 2018. Regression analysis and its application: a data-oriented approach.
Routledge.
Jia, K., Guo, G., Xiao, J., Zhou, H., Wang, Z. and He, G., 2019. Data compression approach
for the home energy management system. Applied Energy, 247, pp.643-656.
Kim, H., Choo, J., Park, H. and Endert, A., 2015. Interaxis: Steering scatterplot axes via
observation-level interaction. IEEE Transactions on Visualization and Computer Graphics,
22(1), pp.131-140.
Mesiar, R., Šipeky, L., Gupta, P. and LeSheng, J., 2017. Aggregation of OWA operators.
IEEE Transactions on Fuzzy Systems, 26(1), pp.284-291.
Shao, L., Mahajan, A., Schreck, T. and Lehmann, D.J., 2017, June. Interactive regression lens
for exploring scatter plots. In Computer Graphics Forum (Vol. 36, No. 3, pp. 157-166).