R-Programming Assignment
Added on 2023/01/20
This document is an R-Programming assignment that includes questions related to data analysis, normality tests, multivariate analysis, MANOVA, PCA, and factor analysis. The assignment requires the use of R code and interpretation of the results.
Running head: R-PROGRAMMING 1
R – Programming Assignment
By (Name of Student)
(Institutional Affiliation)
(Date of Submission)
Question 1 (25 marks)
a) Describe the structure of the film.txt data. (2 marks)
The film data set contains four numeric variables, TopLeft, TopRight, BottomRight and BottomLeft, each holding film-thickness measurements.
b) Produce and interpret univariate QQ plots, histograms and univariate
Shapiro-Wilk tests of normality for each of the four film thickness variables. Which is
the most non-normally distributed variable? (5 marks)
[Figure: Histogram of TopRight (x-axis: TopRight, roughly 0 to 1200; y-axis: Frequency, 0 to 10)]
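The coordinates behind a normal QQ plot can be sketched in a few lines. The NumPy sketch below (an illustration of the computation, not the R code the question asks for) sorts a sample and pairs it with standard normal quantiles at the plotting positions (i - a)/(n + 1 - 2a), the rule R's ppoints() supplies to qqnorm(). The correlation of the points is a rough numeric measure of straightness, the idea behind the Shapiro-Francia test, a close relative of Shapiro-Wilk. The simulated samples are made up for illustration.

```python
import numpy as np
from statistics import NormalDist


def ppoints(n: int) -> np.ndarray:
    """Plotting positions (i - a) / (n + 1 - 2a), with a = 3/8 for n <= 10, else 1/2."""
    a = 3 / 8 if n <= 10 else 0.5
    i = np.arange(1, n + 1)
    return (i - a) / (n + 1 - 2 * a)


def normal_qq(x):
    """Return theoretical normal quantiles, the sorted sample and their correlation."""
    sample = np.sort(np.asarray(x, dtype=float))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = np.array([std_normal.inv_cdf(p) for p in ppoints(sample.size)])
    r = float(np.corrcoef(theoretical, sample)[0, 1])
    return theoretical, sample, r


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated data only: one roughly normal sample and one right-skewed sample.
    for name, values in [("normal-like", rng.normal(500, 100, size=40)),
                         ("right-skewed", rng.exponential(100, size=40))]:
        print(f"{name}: QQ correlation = {normal_qq(values)[2]:.3f}")
```

A correlation noticeably below 1, or points bending away from the line at the ends of the plot, points to skewness or heavy tails.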
c) Produce and interpret perspective and contour plots for the top-right and top-left film
thickness variables. What is an inherent problem with using these plots to assess
MVN? (3 marks)
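Perspective and contour plots are usually drawn from a bivariate density estimate evaluated on a grid. The NumPy sketch below computes a product-Gaussian kernel density estimate for one pair of variables; the normal-reference bandwidth, the grid size and the simulated data are assumptions, and R's MASS::kde2d() uses its own default bandwidth. In R, the resulting grid would be passed to persp() and contour(). Because the grid is built from exactly two variables, it describes only one pair at a time.

```python
import numpy as np


def kde2d(x, y, n_grid=60, bandwidth=None):
    """Product-Gaussian kernel density estimate of (x, y) on an n_grid x n_grid grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if bandwidth is None:
        # Normal-reference rule of thumb per axis (an assumption; MASS::kde2d differs).
        bandwidth = (1.06 * x.std(ddof=1) * n ** (-0.2), 1.06 * y.std(ddof=1) * n ** (-0.2))
    hx, hy = bandwidth
    gx = np.linspace(x.min() - 4 * hx, x.max() + 4 * hx, n_grid)
    gy = np.linspace(y.min() - 4 * hy, y.max() + 4 * hy, n_grid)
    # Kernel weights: one row per grid point, one column per observation.
    kx = np.exp(-0.5 * ((gx[:, None] - x[None, :]) / hx) ** 2) / (np.sqrt(2 * np.pi) * hx)
    ky = np.exp(-0.5 * ((gy[:, None] - y[None, :]) / hy) ** 2) / (np.sqrt(2 * np.pi) * hy)
    density = kx @ ky.T / n  # density[i, j] estimates f(gx[i], gy[j])
    return gx, gy, density


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)            # simulated stand-in for one variable
    y = 0.5 * x + rng.normal(size=100)  # simulated stand-in for another
    gx, gy, dens = kde2d(x, y)
    print(dens.shape, float(dens.max()))
```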
d) Do the analysis necessary to provide the results of the Mardia, Henze-Zirkler and
Royston tests of MVN based on all four film thickness variables. Include in your
interpretation: (10 marks)
• The chi-square QQ plot: describe how it is constructed and how it relates to
the univariate normal QQ plots.
• What is a key limitation of these MVN statistical tests?
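The construction of a chi-square QQ plot can be sketched as follows: compute each observation's squared Mahalanobis distance from the sample mean, d_i^2 = (x_i - xbar)' S^-1 (x_i - xbar), sort the distances, and plot them against chi-square quantiles with p degrees of freedom (p = 4 for the film data). Under MVN the points lie close to a straight line. With a single variable, d_i^2 is just the squared z-score, which is how this plot generalises the univariate normal QQ plot. The NumPy sketch below avoids SciPy by inverting a series expansion of the chi-square CDF; it illustrates the idea behind the plots that packages such as MVN draw and is not their implementation.

```python
import math
import numpy as np


def chi2_cdf(x: float, df: float) -> float:
    """Chi-square CDF from the series for the regularised lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(prob: float, df: float) -> float:
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi_square_qq(X):
    """Chi-square quantiles (df = number of variables) and sorted squared Mahalanobis distances."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centred = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centred, S_inv, centred)
    a = 3 / 8 if n <= 10 else 0.5
    probs = (np.arange(1, n + 1) - a) / (n + 1 - 2 * a)
    quantiles = np.array([chi2_quantile(q, p) for q in probs])
    return quantiles, np.sort(d2)
```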
e) One way to try and meet the MVN assumption could be to remove some of the
variables from the multivariate analysis (do not perform this analysis). Suggest three
additional ways that you might improve univariate and multivariate normality for
data sets in general. (3 marks)
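Power transformations are one common way to pull a skewed variable towards normality. As an illustration of a single such option, the NumPy sketch below chooses the Box-Cox lambda by maximising its profile log-likelihood over a grid, the same idea behind MASS::boxcox() in R. The grid limits are assumptions and the data are simulated.

```python
import numpy as np


def boxcox(x, lam):
    """Box-Cox transform (x**lam - 1) / lam, or log(x) when lam is 0; x must be positive."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1) / lam


def boxcox_loglik(x, lam):
    """Profile log-likelihood of lam, assuming the transformed data are normal."""
    x = np.asarray(x, dtype=float)
    y = boxcox(x, lam)
    return -0.5 * x.size * np.log(y.var()) + (lam - 1) * np.log(x).sum()


def best_lambda(x, grid=None):
    """Return the lambda on the grid with the largest profile log-likelihood."""
    if grid is None:
        grid = np.linspace(-2, 2, 401)
    scores = [boxcox_loglik(x, lam) for lam in grid]
    return float(grid[int(np.argmax(scores))])


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    skewed = np.exp(rng.normal(size=500))  # simulated log-normal data
    print("chosen lambda:", best_lambda(skewed))
```

A lambda near 0 suggests a log transform; a lambda near 1 suggests no transform is needed.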
f) In part e) we suggested removing some variables to try to help the data approach
MVN. Suggest one other reason why reducing the number of variables used in a
multivariate analysis may be important (this question does not relate to this
particular data set). (2 marks)
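One general point that applies here: many multivariate methods need the inverse of the sample covariance matrix S, and S is singular whenever the number of observations n does not exceed the number of variables p, because its rank is at most n - 1. More broadly, S has p(p + 1)/2 distinct entries, so each extra variable makes it harder to estimate well from a fixed n. The simulation below, on random normal data, illustrates the rank limit.

```python
import numpy as np


def covariance_rank(n_obs: int, n_vars: int, seed: int = 0) -> int:
    """Rank of the sample covariance matrix of simulated standard normal data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_vars))
    return int(np.linalg.matrix_rank(np.cov(X, rowvar=False)))


if __name__ == "__main__":
    for n in (5, 10, 11, 50):
        print(f"n = {n:2d}, p = 10: rank(S) = {covariance_rank(n, 10)}")
```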
Question 2 (25 marks):
The data file ‘iris.txt’ contains four flower-characteristic variables measured on three
species of iris.
Provide R code, output and written interpretation for parts a) to f) of this question.
a) Produce a draftsman display for the 4 flower characteristics variables. Interpret these
plots, relating back to the original data where it may add to the interpretation. What
are the y and x axes on plot [3,2] of the draftsman plot? (4 marks)
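In a scatterplot matrix drawn by R's pairs(), the panel in row i and column j plots variable i on the y-axis against variable j on the x-axis. The small helper below encodes that convention. The column order shown is that of R's built-in iris data frame and is only an assumption about iris.txt, so check the file's header before relying on it.

```python
def pairs_panel_axes(row: int, col: int, variables: list[str]) -> dict[str, str]:
    """Return the y- and x-axis variables of panel [row, col] (1-based) in a scatterplot matrix."""
    if row == col:
        raise ValueError("diagonal panels show variable names, not a scatterplot")
    return {"y": variables[row - 1], "x": variables[col - 1]}


# Assumed column order (as in R's built-in iris data); verify against iris.txt.
IRIS_COLUMNS = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]

if __name__ == "__main__":
    print(pairs_panel_axes(3, 2, IRIS_COLUMNS))
```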
b) In the context of MANOVA, list the dependent and independent variables and define
the relationship that the MANOVA would test. (2 marks)
c) Produce the correlation matrix for the flower characteristics variables. Interpret the
correlations and indicate what they suggest about whether the variables are likely to be
MVN distributed (do not test for MVN). (4 marks)
d) Using MANOVA in R, test for differences in ‘flower characteristics’ between the three
species. Include tests using all four test statistics covered in this course and interpret
output (assume the assumption of MVN is met). (5 marks)
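All four MANOVA test statistics are functions of the eigenvalues of E^-1 H, where H is the between-groups (hypothesis) and E the within-groups (error) sums-of-squares-and-cross-products matrix. In R they are requested with summary(fit, test = "Pillai"), "Wilks", "Hotelling-Lawley" or "Roy" on a manova() fit. The NumPy sketch below computes them directly for a one-way design on simulated data. Roy's statistic is given as the largest eigenvalue, the form R's summary.manova() reports; some texts use lambda_1/(1 + lambda_1) instead.

```python
import numpy as np


def manova_statistics(X, groups):
    """One-way MANOVA: Wilks, Pillai, Hotelling-Lawley and Roy from the eigenvalues of E^-1 H."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    H = np.zeros((p, p))  # between-groups (hypothesis) SSCP matrix
    E = np.zeros((p, p))  # within-groups (error) SSCP matrix
    for g in np.unique(groups):
        Xg = X[groups == g]
        mean_diff = (Xg.mean(axis=0) - grand_mean)[:, None]
        H += Xg.shape[0] * (mean_diff @ mean_diff.T)
        centred = Xg - Xg.mean(axis=0)
        E += centred.T @ centred
    # E = L L^T, so E^-1 H has the same (real, non-negative) eigenvalues as L^-1 H L^-T.
    L_inv = np.linalg.inv(np.linalg.cholesky(E))
    eig = np.clip(np.linalg.eigvalsh(L_inv @ H @ L_inv.T), 0.0, None)
    stats = {
        "Wilks": float(np.prod(1.0 / (1.0 + eig))),
        "Pillai": float(np.sum(eig / (1.0 + eig))),
        "Hotelling-Lawley": float(np.sum(eig)),
        "Roy": float(eig.max()),
    }
    return stats, H, E


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    # Three simulated groups of 20 observations on 3 variables (illustrative only).
    X = np.vstack([rng.normal(loc=m, size=(20, 3)) for m in (0.0, 0.5, 1.0)])
    groups = np.repeat(["a", "b", "c"], 20)
    print(manova_statistics(X, groups)[0])
```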
e) Why is a small Wilks’ lambda statistic likely to indicate significant differences
between at least some groups? Which of the four tests used in part d) would be the
best to interpret if there are concerns about multivariate normality or covariance
equality? (5 marks)
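Written in terms of the between-groups SSCP matrix H and the within-groups SSCP matrix E, Wilks' lambda is a ratio of determinants:

```latex
\Lambda = \frac{\lvert \mathbf{E} \rvert}{\lvert \mathbf{H} + \mathbf{E} \rvert}
        = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i},
\qquad s = \min(p,\; g - 1)
```

Here the lambda_i are the non-zero eigenvalues of E^-1 H, p is the number of response variables and g the number of groups. Lambda is close to 1 when between-group variation is small relative to within-group variation, and close to 0 when it is large.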
f) Produce output that compares each group with each of the others (there should be
3 comparisons) using Hotelling’s T² test, the multivariate equivalent of the t-test, at a
significance level of 0.05. Determine the significance level corrected for multiple testing.
Do not provide R output; instead, reproduce and complete the following table for all
comparisons and interpret it. How might sample sizes have affected these results and
those in part c)? (5 marks)
Comparison (Species 1) | Comparison (Species 2) | Hotelling’s p-value | Significant (Y/N) | Significant after correction (Y/N)
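Each pairwise comparison in part f) is a two-sample Hotelling's T² test. The NumPy sketch below computes T² with a pooled covariance matrix (equal covariances assumed) and converts it to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom; the p-value would then come from the F distribution, for example with pf() in R. R packages such as Hotelling and ICSNP provide ready-made versions. The Bonferroni adjustment shown, alpha/m for m comparisons (0.05/3, about 0.0167, for three comparisons), is one common choice of correction, assumed here rather than prescribed by the assignment.

```python
import numpy as np


def pooled_covariance(X1, X2):
    """Pooled within-group covariance matrix (always returned as 2-D)."""
    n1, n2 = len(X1), len(X2)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)


def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling's T^2 and its exact F transform (equal covariances assumed)."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    S_pooled = pooled_covariance(X1, X2)
    t2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(S_pooled, diff)
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return float(t2), float(f_stat), (p, n1 + n2 - p - 1)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    print(f"Bonferroni level for 3 comparisons: {bonferroni_alpha(0.05, 3):.4f}")
```

With a single variable (p = 1), T² reduces to the square of the pooled two-sample t statistic and F equals T².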
Question 3 (25 marks):
The data file ‘usair.dat’ contains data for seven air quality variables measured across 41
United States cities. Provide R code, output and written interpretation for all analyses.
a) Produce the correlation and covariance matrices. Explain the difference between
these matrices in detail (i.e. explain clearly how the values are adjusted
mathematically and the effect of these changes). Would using the covariance matrix
in PCA on the USair data be appropriate? Why? (5 marks)
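The correlation matrix is the covariance matrix rescaled by the standard deviations: R = D^(-1/2) S D^(-1/2), where D holds the variances on its diagonal, so each covariance s_jk becomes s_jk/(s_j s_k). Equivalently, R is the covariance matrix of the standardised variables. The NumPy sketch below checks both facts on simulated variables with very different scales; R's cov() and cor() return the same two matrices.

```python
import numpy as np


def cov_to_cor(S):
    """Rescale a covariance matrix to a correlation matrix: s_jk / (s_j * s_k)."""
    S = np.asarray(S, dtype=float)
    inv_sd = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(inv_sd, inv_sd)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Simulated variables on very different scales (illustrative only, not USair).
    X = rng.normal(size=(41, 3)) * np.array([1.0, 10.0, 100.0])
    S = np.cov(X, rowvar=False)
    print(np.round(S, 2))
    print(np.round(cov_to_cor(S), 3))
```

The covariance matrix is dominated by whichever variables have the largest variances, while every diagonal entry of the correlation matrix is 1.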
b) Perform PCA on the 7 variables using the prcomp function. Discuss the
eigenvalues, percentage of variation explained and scree plot, and how they influence
your decision on how many PCs to interpret from this analysis. Keep in mind the overall
purpose of PCA. (5 marks)
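For correlation-based PCA, which is what prcomp(x, scale. = TRUE) performs, the eigenvalues are the variances of the PCs and sum to the number of variables, so each eigenvalue divided by p is the proportion of variation explained. The NumPy sketch below computes the eigenvalues, proportions, cumulative proportions and the count of eigenvalues above 1 (Kaiser's rule, one heuristic to weigh alongside the scree plot). Scaling the variables is simply assumed here; part a) asks whether it is appropriate for the USair data.

```python
import numpy as np


def pca_summary(X):
    """Eigen-decomposition of the correlation matrix, sorted from the largest eigenvalue."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    proportion = eigenvalues / eigenvalues.sum()
    return {
        "eigenvalues": eigenvalues,   # equal to prcomp(X, scale. = TRUE)$sdev^2
        "loadings": eigenvectors,     # columns match $rotation up to sign
        "proportion": proportion,
        "cumulative": np.cumsum(proportion),
        "kaiser_count": int((eigenvalues > 1).sum()),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(8)
    X = rng.normal(size=(41, 7)) @ rng.normal(size=(7, 7))  # simulated, not USair
    out = pca_summary(X)
    print(np.round(out["proportion"], 3), out["kaiser_count"])
```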
c) Interpret the first PC. Include the Z equation and a plot of the loadings on the first
PC in your answer. (4 marks)
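Assuming correlation-based PCA (standardised variables), the first PC score for a city is a weighted sum of its standardised values, with the weights taken from the first column of prcomp's rotation matrix:

```latex
Z_1 = a_{11} x_1^{*} + a_{12} x_2^{*} + \cdots + a_{17} x_7^{*},
\qquad x_j^{*} = \frac{x_j - \bar{x}_j}{s_j},
\qquad \sum_{j=1}^{7} a_{1j}^{2} = 1
```

Here a_{1j} is the loading of variable j on PC1; the interpretation of PC1 rests on the signs and relative sizes of these loadings.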
d) What is the correlation between the first and second PCs and what does this tell you?
(2 marks)
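PC scores are uncorrelated by construction: if V holds the eigenvectors of the correlation matrix R, the covariance matrix of the scores ZV is V'RV, a diagonal matrix of eigenvalues. The NumPy sketch below confirms this numerically on simulated data; the corresponding check in R would be cor(pca$x[, 1], pca$x[, 2]) for a prcomp fit stored in a hypothetical object named pca.

```python
import numpy as np


def principal_component_scores(X):
    """Scores of correlation-based PCA (like prcomp(X, scale. = TRUE)$x, up to column signs)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    return Z @ eigenvectors[:, order], eigenvalues[order]


if __name__ == "__main__":
    rng = np.random.default_rng(9)
    X = rng.normal(size=(41, 7)) @ rng.normal(size=(7, 7))  # simulated, not USair
    scores, _ = principal_component_scores(X)
    print(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])
```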
e) Produce and interpret a biplot based on the first 2 PCs. In particular, explain your
interpretation of the air quality variables in city 1 compared to city 11 and city 9.
Relate your interpretation back to the original data.
(5 marks)
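A biplot overlays the PC scores of the observations (cities, here) with the loadings of the variables. Multiplying scores by loadings gives a low-rank approximation of the standardised data, so the inner product between a city's point and a variable's arrow approximates that city's standardised value on the variable: cities far out in an arrow's direction tend to have high values for it, and cities in the opposite direction low values. The NumPy sketch below shows this on simulated data; R's biplot() additionally rescales both sets of coordinates for display.

```python
import numpy as np


def biplot_parts(X, k=2):
    """Scores (points), loadings (arrows) and the rank-k approximation they imply."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
    A = eigenvectors[:, np.argsort(eigenvalues)[::-1][:k]]  # loadings on the first k PCs
    scores = Z @ A                                          # observation coordinates
    return scores, A, Z, scores @ A.T                       # last item approximates Z


if __name__ == "__main__":
    rng = np.random.default_rng(10)
    X = rng.normal(size=(41, 7)) @ rng.normal(size=(7, 7))  # simulated, not USair
    for k in (1, 2, 7):
        _, _, Z, approx = biplot_parts(X, k)
        print(k, round(float(np.linalg.norm(Z - approx)), 4))
```

The approximation error shrinks as more PCs are kept, so a two-PC biplot is only as faithful as the share of variation those two PCs explain.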
f) Was this a useful analysis for this data set? Explain. (4 marks)
Question 4 (25 marks):
For this question you will continue to use the data file ‘usair.dat’ from Question 3.
Provide R code, output and written interpretation for all analyses.
a) Perform parallel analysis and evaluate how many PCs should be used in FA. Compare
this with the number of PCs you chose in Q3 b). (3 marks)
b) Explain in your own words how parallel analysis works. (5 marks)
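Horn's parallel analysis compares each observed eigenvalue of the correlation matrix with the eigenvalues obtained from random, uncorrelated data of the same size (same n and p), keeping only the leading components whose eigenvalues exceed what chance alone would produce. In R, psych::fa.parallel() performs this comparison. The NumPy sketch below uses normal random data and the 95th percentile of the simulated eigenvalues as the threshold; implementations differ on using the mean or a percentile, so that choice and the number of simulations are assumptions.

```python
import numpy as np


def sorted_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]


def parallel_analysis(X, n_sims=200, quantile=0.95, seed=0):
    """Horn's parallel analysis: count leading components that beat random-data eigenvalues."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    observed = sorted_eigenvalues(X)
    rng = np.random.default_rng(seed)
    simulated = np.array([sorted_eigenvalues(rng.normal(size=(n, p))) for _ in range(n_sims)])
    threshold = np.quantile(simulated, quantile, axis=0)
    n_keep = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first component that chance could explain
        n_keep += 1
    return n_keep, observed, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(6)
    factor = rng.normal(size=(200, 1))
    X = factor + 0.1 * rng.normal(size=(200, 4))  # simulated one-factor data
    print(parallel_analysis(X, n_sims=50)[0])
```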
c) Perform a factor analysis on all 7 variables (apply no rotation) using the number of
factors you identified in part a). Interpret the output, including (10 marks):
• Variance explained
• Chi-square test
• Variable loadings
• Difference in uniqueness values for the variables wind.speed and annual.precip
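factanal() prints uniquenesses, SS loadings and Proportion Var, and for standardised variables these follow from the loading matrix: a variable's communality is the sum of its squared loadings across factors, its uniqueness is (up to rounding) 1 minus that, and each factor's SS loading is the column sum of squared loadings, divided by p for the proportion of variance. The loadings below are hypothetical numbers for illustration, not results from usair.dat.

```python
import numpy as np


def loading_summary(loadings):
    """Communalities, uniquenesses, SS loadings and proportion of variance from a loading matrix."""
    L = np.asarray(loadings, dtype=float)
    squared = L ** 2
    communality = squared.sum(axis=1)  # per variable
    ss_loadings = squared.sum(axis=0)  # per factor
    return {
        "communality": communality,
        "uniqueness": 1 - communality,
        "ss_loadings": ss_loadings,
        "proportion_var": ss_loadings / L.shape[0],
    }


# Hypothetical 3-variable, 2-factor loadings, for illustration only.
EXAMPLE_LOADINGS = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]

if __name__ == "__main__":
    for key, value in loading_summary(EXAMPLE_LOADINGS).items():
        print(key, np.round(value, 3))
```

A variable with a high uniqueness is explained mostly by its own specific variance rather than by the common factors.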
d) Repeat the FA with a varimax rotation and calculate the communalities. Interpret (7
marks):
• Explain the aim and features of a varimax rotation
• Changes in the variable loadings
• The communalities.
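Varimax looks for the orthogonal rotation that maximises the sum, over factors, of the variance of the squared loadings, pushing loadings towards either zero or large absolute values so that each factor is defined by a few variables. Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged. For two factors, the idea can be shown by brute force: try many rotation angles and keep the best. The NumPy sketch below does this with hypothetical loadings; it is not the iterative algorithm of R's varimax() (used by factanal(..., rotation = "varimax")), which also applies Kaiser row normalisation by default, so its numbers can differ slightly.

```python
import numpy as np


def varimax_criterion(loadings):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    squared = np.asarray(loadings, dtype=float) ** 2
    return float(squared.var(axis=0).sum())


def rotate(loadings, angle):
    """Apply an orthogonal 2 x 2 rotation by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.asarray(loadings, dtype=float) @ np.array([[c, -s], [s, c]])


def varimax_two_factor(loadings, n_angles=3600):
    """Search angles in [0, pi/2) for the rotation with the highest varimax criterion."""
    # The criterion repeats every pi/2 (a quarter turn only swaps and sign-flips columns).
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    best = max(angles, key=lambda a: varimax_criterion(rotate(loadings, a)))
    return rotate(loadings, best), float(best)


if __name__ == "__main__":
    # Hypothetical simple-structure loadings, deliberately rotated away by 30 degrees.
    simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.85], [0.0, 0.7]])
    unrotated = rotate(simple, np.deg2rad(30.0))
    rotated, angle = varimax_two_factor(unrotated)
    print(np.round(rotated, 3))
    print("communalities:", np.round((rotated ** 2).sum(axis=1), 3))
```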