SIT718 - Data Analysis: Aggregation Functions and R Programming

Added on 2023/01/23

AI Summary

This assignment solution for SIT718, a Real World Analytics course, addresses problem-solving using aggregation functions for data analysis. The solution includes an interpretation of data presented in tables, focusing on error measures, correlation and model weights. It also provides R code covering data transformation, model building (linear regression) and the examination of model summaries. The analysis uses aggregation functions such as weighted power means, the ordered weighted averaging (OWA) function, the weighted arithmetic mean and the Choquet integral, together with error measures (SMAPE, MSE, RMSE) and correlation calculations. The interpretation section analyzes the significance of the variables, the R-squared values and the impact of an interaction effect within the models. The assignment aims to test students' understanding of aggregation functions and their application in data summarization and prediction, along with their ability to use R for real-world analytics.

Question 3
Table 1: Error Measures and Correlation
Table 2: Weights and information from the models
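The error measures and correlation listed in Table 1 can be computed directly from observed and predicted values. The sketch below is a base-R illustration under stated assumptions: `error_measures` is a hypothetical helper, the SMAPE formula is one common definition (DescTools' `SMAPE` and other sources may scale it differently), and R's built-in `mtcars` data stands in for the assignment's data. For a least-squares fit with an intercept, the squared correlation between the response and the fitted values equals R-squared.

```r
# Hypothetical helper: error measures between observed y and predictions yhat
error_measures <- function(y, yhat) {
  stopifnot(length(y) == length(yhat))
  mse <- mean((y - yhat)^2)                                   # divides by n
  c(SMAPE = mean(2 * abs(yhat - y) / (abs(y) + abs(yhat))),   # NaN if y = yhat = 0
    MSE = mse,
    RMSE = sqrt(mse),                                         # same units as y
    Correlation = cor(y, yhat))                               # Pearson correlation
}

# Stand-in model on built-in data, four predictors as in the original model
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
em  <- error_measures(mtcars$mpg, fitted(fit))
print(round(em, 4))
```

The MSE here divides by n, whereas the residual standard error reported by `summary()` divides by n - k - 1, so the two will differ slightly.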

Interpretation of the data in the tables
From Table 2, the original model, with X1, X2, X3 and X4 as the explanatory variables and Y as the response variable, has an F-statistic of 16.04 and a p-value of approximately 0.000. Therefore, at the 95% confidence level, we reject the null hypothesis that a model with no independent variables (intercept only) fits the data as well as our model, implying that the model is significant in predicting the response variable. Examining the individual variables, X1, X2 and X4 are the only variables significant in predicting the dependent variable Y, since their p-values are below 0.05 at the 0.05 level of significance.

The original model has an R-squared of 0.1788, implying that it accounts for only 17.88% of the variability in Y. This is relatively low; moreover, the model's adjusted R-squared of 0.1676 increases to 0.1756 when an interaction between X1 and X2 is introduced in the second model, implying that the original model is not a good fit.
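The overall F-test, the per-coefficient p-values and the adjusted R-squared discussed above can all be read from, or rebuilt from, an `lm` fit. The sketch below uses R's built-in `mtcars` data purely as a stand-in (the assignment's `d_transformed.txt` is not available here), and the helper names `adj_r2` and `f_from_r2` are illustrative, not part of any course file. Comparing against an explicit intercept-only model makes the null hypothesis of the F-test concrete.

```r
# Stand-in data: built-in mtcars, four predictors as in the original model
fit  <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
fit0 <- lm(mpg ~ 1, data = mtcars)     # intercept-only model (the null model)
s    <- summary(fit)

# Overall F-test: H0 says the intercept-only model fits as well as 'fit'
f <- s$fstatistic                      # named vector: value, numdf, dendf
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
reject_h0 <- p_overall < 0.05          # TRUE means the model is significant at 5%

# The same F statistic comes from comparing the two nested models directly
f_anova <- anova(fit0, fit)$F[2]

# Individual coefficients: p-values from the "Pr(>|t|)" column
coef_p <- coef(s)[-1, "Pr(>|t|)"]      # drop the intercept row
significant <- names(coef_p)[coef_p < 0.05]
print(significant)

# R-squared based statistics rebuilt from their definitions
n <- nobs(fit)
k <- length(coef(fit)) - 1             # number of predictors
adj_r2    <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_from_r2 <- function(r2, n, k) (r2 / k) / ((1 - r2) / (n - k - 1))
```

Adding a term can never lower R-squared, so raw R-squared values of nested models are not a fair comparison; adjusted R-squared rises when a single term is added only if that term's t-statistic exceeds 1 in absolute value.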
Further, the interaction effect between X1 and X2 is redundant, since it reduces the significance of the two variables: in model two, only X3 and X4 are significant in predicting the response variable.
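In R's formula syntax, `X1*X2` expands to `X1 + X2 + X1:X2`, so the second model only adds the product term `X1:X2`. One common reason main effects lose significance once their product is added is that the product term is strongly correlated with its components; centring the variables before multiplying reduces that correlation. The sketch below uses hypothetical simulated data (not the assignment's) to show both points, and uses `anova()` for a partial F-test of the interaction term.

```r
set.seed(718)                                   # hypothetical simulated data
n   <- 300
sim <- data.frame(X1 = runif(n), X2 = runif(n), X3 = runif(n), X4 = runif(n))
sim$Y <- 0.5 * sim$X1 + 0.4 * sim$X2 + 0.3 * sim$X4 + rnorm(n, sd = 0.5)

m1  <- lm(Y ~ X1 + X2 + X3 + X4, data = sim)              # main effects only
m2  <- lm(Y ~ X1 + X2 + X3 + X4 + (X1 * X2), data = sim)  # as in the appendix
m2b <- lm(Y ~ X1 + X2 + X3 + X4 + X1:X2, data = sim)      # explicit product term

# X1 * X2 contributes only X1:X2 because X1 and X2 are already in the model
same_model <- isTRUE(all.equal(coef(m2), coef(m2b)))

# Partial F-test: does adding X1:X2 improve on the main-effects model?
p_interaction <- anova(m1, m2)[["Pr(>F)"]][2]

# Correlation between a main effect and the product term, raw vs centred
r_raw <- cor(sim$X1, sim$X1 * sim$X2)
c1 <- sim$X1 - mean(sim$X1)
c2 <- sim$X2 - mean(sim$X2)
r_centred <- cor(c1, c1 * c2)
```

With the assignment's models, the same comparison would be `anova(regmod1, regmod2)`.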

Appendix
R-code
#Loading the course-provided helper scripts
source("AggWaFit718.R")
data1<-source("agresolution.R")
#Reading the transformed data as a matrix
datast <- as.matrix(read.table("d_transformed.txt"))
#Converting the matrix to a dataframe so as to use it for model developing
datastfr <- as.data.frame(datast)
#Renaming the variables for easy identification
names(datastfr)[1:5] <- c("X1", "X2", "X3", "X4", "Y")
#Examining correlation
correla <- cor(datastfr, use = "complete.obs")
round(correla, 3)
choquet(datast,2)
#Building the regression model

regmod1<-lm(Y~ X1+X2+X3+X4, data = datastfr)
#Examining the model
summary(regmod1)
#Model with an interaction between X1 and X2 (X1*X2 expands to X1 + X2 + X1:X2)
regmod2<-lm(Y~ X1+X2+X3+X4+(X1*X2), data = datastfr)
summary(regmod2)
library(DescTools)
summtable1<-
list("Weighted power means (WPM) with p = 0.5"=PM(datast,1/length(datast),0.5),
"Weighted power means (WPM) with p = 2"=PM(datast,1/length(datast),2),
"Symmetric Mean Absolute Percentage Error"=round(SMAPE(regmod1),2),
"Mean Square Error"=round(MSE(regmod1),2),
"Root Mean Square Error"=RMSE(regmod1),
"Correlation" =cor(datastfr, use = "complete.obs"))
summtable1
summtable2<-
list("Weighted power means (WPM) with p = 0.5"=PM(datast,1/length(datast),0.5),
"Weighted power means (WPM) with p = 2"=PM(datast,1/length(datast),2),
"Ordered weighted averaging function (OWA)"=OWA(datast),

"weighted arithmetic mean (WAM)"=weighted.mean(datast),
"Choquet integral"=choquet(datast,2),
"Regression Model with no interaction"=summary(regmod1),
"Regression Model with interaction"=summary(regmod2))
summtable2
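The aggregation functions used in `summtable2` (`PM`, `OWA`, `choquet`) are presumably defined in the sourced course file `AggWaFit718.R`, whose exact signatures are not reproduced here. As an illustration only, the base-R sketch below implements the textbook definitions under hypothetical names (`wpm_sketch`, `wam_sketch`, `owa_sketch`, `choquet_sketch`, `additive_measure`). It assumes non-negative inputs, weights that sum to one, OWA weights applied to the inputs sorted in decreasing order, and a fuzzy measure `v` stored in binary order (`v[1]` is the empty set, `v[2^n]` the full set). Each function aggregates one input vector at a time.

```r
# Weighted power mean: (sum_i w_i * x_i^p)^(1/p), for p != 0 and x >= 0
wpm_sketch <- function(x, w = rep(1 / length(x), length(x)), p = 1) {
  stopifnot(length(w) == length(x), abs(sum(w) - 1) < 1e-9, p != 0)
  sum(w * x^p)^(1 / p)
}

# Weighted arithmetic mean: the special case p = 1
wam_sketch <- function(x, w = rep(1 / length(x), length(x))) sum(w * x)

# OWA: weights attach to positions in the sorted input, not to variables
owa_sketch <- function(x, w = rep(1 / length(x), length(x))) {
  sum(w * sort(x, decreasing = TRUE))
}

# Discrete Choquet integral w.r.t. a fuzzy measure v in binary order:
# the measure of subset A is v[1 + sum(2^(i - 1) for i in A)]
choquet_sketch <- function(x, v) {
  n <- length(x)
  stopifnot(length(v) == 2^n)
  ord <- order(x)                  # indices of x in increasing order
  total <- 0
  prev <- 0
  for (i in seq_len(n)) {
    A <- ord[i:n]                  # inputs whose value is >= the i-th smallest
    total <- total + (x[ord[i]] - prev) * v[1 + sum(2^(A - 1))]
    prev <- x[ord[i]]
  }
  total
}

# Additive fuzzy measure built from weights: v(A) = sum of w_i over A
additive_measure <- function(w) {
  n <- length(w)
  sapply(0:(2^n - 1), function(m) sum(w[(m %/% 2^(0:(n - 1))) %% 2 == 1]))
}

x <- c(0.4, 0.9, 0.1)
w <- c(0.2, 0.3, 0.5)
wpm_sketch(x, w, p = 0.5)
owa_sketch(x, w)
choquet_sketch(x, additive_measure(w))  # 0.4, the same as wam_sketch(x, w)
```

With an additive measure the Choquet integral reduces to the weighted arithmetic mean, which is a convenient sanity check. To aggregate every row of a data matrix such as `datast`, these functions can be applied row-wise, e.g. `apply(datast[, 1:4], 1, wpm_sketch, p = 0.5)`.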
