R-Programming Assignment

Added on  2023/01/20

AI Summary
This document is an R-Programming assignment that includes questions related to data analysis, normality tests, multivariate analysis, MANOVA, PCA, and factor analysis. The assignment requires the use of R code and interpretation of the results.
Running head: R-PROGRAMMING 1
R – Programming Assignment
By (Name of Student)
(Institutional Affiliation)
(Date of Submission)
Question 1 (25 marks)
a) Describe the structure of the film.txt data. (2 marks)
The film data set contains numeric film thickness measurements. The four
film thickness variables are TopLeft, TopRight, BottomRight and BottomLeft.
b) Produce and interpret univariate QQ plots, histograms and univariate
Shapiro-Wilk tests of normality for each of the four film thickness variables. Which is
the most non-normally distributed variable? (5 marks)
[Figure: Histogram of TopRight. x-axis: TopRight; y-axis: Frequency]
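A minimal sketch of how the univariate checks in part b) might be scripted. The real film.txt is not available here, so the values below are simulated placeholders (not the assignment data), and the suggested read.table() call assumes a whitespace-delimited file with a header row.

```r
# Placeholder data only: simulated values stand in for film.txt.
# For the real data, something like read.table("film.txt", header = TRUE)
# may work, assuming a whitespace-delimited file with a header row.
set.seed(1)
vars <- c("TopLeft", "TopRight", "BottomRight", "BottomLeft")
film <- as.data.frame(matrix(rexp(4 * 30, rate = 1 / 300), ncol = 4,
                             dimnames = list(NULL, vars)))

if (!interactive()) pdf(NULL)  # discard plots when run as a script

# Histogram and normal QQ plot for each variable
for (v in vars) {
  hist(film[[v]], main = paste("Histogram of", v), xlab = v)
  qqnorm(film[[v]], main = paste("Normal QQ plot of", v))
  qqline(film[[v]])
}

# Shapiro-Wilk test per variable: W statistic and p-value
sw <- sapply(film[vars], function(x) {
  res <- shapiro.test(x)
  c(W = unname(res$statistic), p.value = res$p.value)
})
print(round(sw, 4))

# With equal sample sizes, the smallest W marks the largest departure
# from normality among the variables tested.
most_non_normal <- names(which.min(sw["W", ]))
print(most_non_normal)
```

A small W with a p-value below the chosen significance level is evidence against normality; the plots should be read alongside the test rather than replaced by it.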
c) Produce and interpret perspective and contour plots for the top-right and top-left film
thickness variables. What is an inherent problem with using these plots to assess
MVN? (3 marks)
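One possible way to draw the perspective and contour plots in part c) using base R only. The density surface here is a simple product-Gaussian kernel estimate, which is an assumption of this sketch (course material may use a packaged function instead), and the two vectors are simulated placeholders rather than the film measurements.

```r
# Placeholder data only: replace x and y with the TopRight and TopLeft
# columns of the real data set.
set.seed(1)
x <- rexp(30, rate = 1 / 300)  # stand-in for TopRight
y <- rexp(30, rate = 1 / 300)  # stand-in for TopLeft

# Product-Gaussian kernel density estimate on a 50 x 50 grid
hx <- bw.nrd0(x)
hy <- bw.nrd0(y)
gx <- seq(min(x) - 3 * hx, max(x) + 3 * hx, length.out = 50)
gy <- seq(min(y) - 3 * hy, max(y) + 3 * hy, length.out = 50)
kx <- outer(gx, x, function(g, xi) dnorm(g, mean = xi, sd = hx))  # 50 x n
ky <- outer(gy, y, function(g, yi) dnorm(g, mean = yi, sd = hy))  # 50 x n
z <- kx %*% t(ky) / length(x)                                     # 50 x 50

if (!interactive()) pdf(NULL)  # discard plots when run as a script

# A bivariate normal sample should give a single bell-shaped peak in the
# perspective plot and roughly elliptical contours.
persp(gx, gy, z, theta = 30, phi = 30,
      xlab = "TopRight", ylab = "TopLeft", zlab = "Density")
contour(gx, gy, z, xlab = "TopRight", ylab = "TopLeft")
```

These plots can only show two variables at a time, so they cannot display the joint behaviour of all four thickness variables at once.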
d) Do the analysis necessary to provide the results of the Mardia, Henze-Zirkler and
Royston tests of MVN based on all four film thickness variables. Include in your
interpretation: (10 marks)
The Chi-Square QQ plot and describe how it is constructed and its relationship to
the univariate normal QQ plots as part of your interpretation.
What is a key limitation of these MVN statistical tests?
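The construction of the Chi-Square QQ plot can be sketched in base R as follows. The matrix X is simulated placeholder data standing in for the four thickness variables. The Mardia, Henze-Zirkler and Royston tests themselves are not run here, since they need an add-on package whose interface this sketch does not assume.

```r
# Placeholder data only: X stands in for the four film thickness columns.
set.seed(1)
X <- matrix(rnorm(30 * 4), ncol = 4)
n <- nrow(X)
p <- ncol(X)

# Squared Mahalanobis distance of each observation from the mean vector
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under MVN, d2 is approximately chi-square with p degrees of freedom,
# so the ordered distances are plotted against chi-square quantiles.
theo <- qchisq(ppoints(n), df = p)

if (!interactive()) pdf(NULL)  # discard plots when run as a script
plot(theo, sort(d2),
     xlab = "Chi-square quantile", ylab = "Squared Mahalanobis distance",
     main = "Chi-Square QQ plot")
abline(0, 1)  # points near this line are consistent with MVN
```

The plot is the multivariate analogue of a normal QQ plot: each observation is reduced to one distance, and that distance is compared with its expected quantile.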
e) One way to try and meet the MVN assumption could be to remove some of the
variables from the multivariate analysis (do not perform this analysis). Suggest three
additional ways that you might improve univariate and multivariate normality for
data sets in general. (3 marks)
f) In part e) we suggested removing some variables to try and help the data approach
MVN. Suggest one other reason why reducing the number of variables used in
multivariate analysis may be important (this question does not relate to this
particular data set)? (2 marks)
Question 2 (25 marks):
The data file ‘iris.txt’ contains data for four flower characteristics variables for three
species of iris.
Provide R code, output and written interpretation for parts a) to f) of this question.
a) Produce a draftsman display for the 4 flower characteristics variables. Interpret these
plots, relating back to the original data where it may add to the interpretation. What
are the y and x axes on plot [3,2] of the draftsman plot? (4 marks)
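A short sketch of a draftsman display, using R's built-in iris data frame as a stand-in for iris.txt. That the file has the same columns in the same order is an assumption; if the order differs, the variables shown in panel [3,2] will differ too.

```r
# Assumption: the built-in iris data frame stands in for iris.txt.
flowers <- iris[, 1:4]

if (!interactive()) pdf(NULL)  # discard plots when run as a script
pairs(flowers, col = as.integer(iris$Species), main = "Draftsman display")

# In pairs(), the panel in row i, column j plots variable i on the y axis
# against variable j on the x axis.
row_var <- names(flowers)[3]
col_var <- names(flowers)[2]
cat("Panel [3,2]: y =", row_var, ", x =", col_var, "\n")
```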
b) In the context of MANOVA, list the dependent and independent variables and define
the relationship that the MANOVA would test. (2 marks)
c) Produce the correlation matrix for the flower characteristics variables. Provide an
interpretation of the correlations and indicate what they suggest about the potential
for the variables to be MVN distributed? (do not test for MVN) (4 marks)
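The correlation matrix in part c) can be produced with cor(). This sketch again assumes that the built-in iris data frame matches iris.txt.

```r
# Assumption: the built-in iris data frame stands in for iris.txt.
R <- cor(iris[, 1:4])
print(round(R, 2))

# Locate the most strongly correlated pair (ignoring the diagonal)
off_diag <- R
diag(off_diag) <- NA
strongest <- which(abs(off_diag) == max(abs(off_diag), na.rm = TRUE),
                   arr.ind = TRUE)[1, ]
cat("Strongest pair:", rownames(R)[strongest[1]], "and",
    colnames(R)[strongest[2]], "\n")
```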
d) Using MANOVA in R, test for differences in ‘flower characteristics’ between the three
species. Include tests using all four test statistics covered in this course and interpret
output (assume the assumption of MVN is met). (5 marks)
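A possible MANOVA sketch for part d). It assumes the built-in iris data frame matches iris.txt and that the four test statistics covered in the course are the four that summary() offers for a manova fit (Pillai, Wilks, Hotelling-Lawley and Roy).

```r
# Assumption: the built-in iris data frame stands in for iris.txt.
fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~
                Species, data = iris)

# Assumption: these are the four statistics meant by the question.
tests <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")
for (tst in tests) {
  print(summary(fit, test = tst))
}

# Collect the p-value for the Species effect under each statistic
p_values <- sapply(tests, function(tst) {
  summary(fit, test = tst)$stats["Species", "Pr(>F)"]
})
print(p_values)
```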
e) Why is a small Wilks’ lambda statistic likely to indicate significant differences
between at least some groups? Which of the four tests used in part d) would be the
best to interpret if there are concerns about multivariate normality or covariance
equality? (5 marks)
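To see why a small Wilks' lambda points to group differences, it can help to compute it by hand. The sketch below (built-in iris data assumed as a stand-in) takes the hypothesis and residual sums-of-squares-and-products matrices from the fit and forms the ratio det(E) / det(H + E), which shrinks as between-group variation H grows relative to within-group variation E.

```r
# Assumption: the built-in iris data frame stands in for iris.txt.
fit <- manova(as.matrix(iris[, 1:4]) ~ Species, data = iris)
s <- summary(fit, test = "Wilks")

H <- s$SS[["Species"]]    # between-group (hypothesis) SSCP matrix
E <- s$SS[["Residuals"]]  # within-group (residual) SSCP matrix

# Wilks' lambda: share of generalised variance NOT explained by groups
lambda <- det(E) / det(H + E)
print(lambda)
print(s$stats["Species", "Wilks"])  # value reported by summary()
```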
f) Produce output that specifically compares each of the Groups with each other (you
should have 3 comparisons) using Hotelling’s T² test (the multivariate equivalent of
the t-test) and a significance level of 0.05. Determine the multiple test corrected
significance level. Do not provide R output; instead reproduce and complete the
following table for all comparisons and interpret. How may sample sizes have
affected these results and those in part c)? (5 marks)
Comparison (Species 1) | Comparison (Species 2) | Hotelling’s p-value | Significant (Y/N) | Significant after correction (Y/N)
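A hedged sketch of how the three pairwise comparisons could be computed. The helper hotelling_p() is hypothetical and written for this illustration: it implements the standard two-sample Hotelling's T² test with a pooled covariance matrix. A Bonferroni adjustment is assumed for the multiple test correction, and the built-in iris data frame stands in for iris.txt.

```r
# Hypothetical helper: two-sample Hotelling's T^2 test, pooled covariance.
hotelling_p <- function(x1, x2) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp, d))
  # T^2 converts to an F statistic with (p, n1 + n2 - p - 1) df
  Fstat <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
  pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE)
}

# Assumption: the built-in iris data frame stands in for iris.txt.
pairs_idx <- combn(levels(iris$Species), 2)  # 2 x 3 matrix of species pairs
p_values <- apply(pairs_idx, 2, function(g) {
  hotelling_p(iris[iris$Species == g[1], 1:4],
              iris[iris$Species == g[2], 1:4])
})

alpha <- 0.05
alpha_adj <- alpha / ncol(pairs_idx)  # Bonferroni (assumed correction)

res <- data.frame(
  species_1 = pairs_idx[1, ],
  species_2 = pairs_idx[2, ],
  p_value = p_values,
  significant = ifelse(p_values < alpha, "Y", "N"),
  significant_corrected = ifelse(p_values < alpha_adj, "Y", "N")
)
print(res)
```

With three comparisons, the Bonferroni-adjusted level is 0.05 / 3, so each pairwise p-value is compared with roughly 0.0167.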
Question 3 (25 marks):
The data file ‘usair.dat’ contains data for seven air quality variables measured across 41
United States cities. Provide R code, output and written interpretation for all analyses.
a) Produce the correlation and covariance matrices. Explain the difference between
these matrices in detail (i.e. explain clearly how the values are adjusted
mathematically and the effect of these changes). Would using the covariance matrix
in PCA on the USair data be appropriate? Why? (5 marks)
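The relationship between the two matrices can be checked numerically. Because usair.dat is not available here, the sketch uses seven columns of R's built-in state.x77 matrix purely as a stand-in with similarly mixed measurement scales; it is not the air quality data.

```r
# Stand-in data: seven columns of the built-in state.x77 matrix,
# NOT the usair.dat air quality variables.
X <- state.x77[, 1:7]

S <- cov(X)
R <- cor(X)

# Each correlation is the covariance divided by the product of the two
# standard deviations: r_ij = s_ij / (s_i * s_j).
D_inv <- diag(1 / sqrt(diag(S)))
R_manual <- D_inv %*% S %*% D_inv
dimnames(R_manual) <- dimnames(S)

# Variances on very different scales let a few variables dominate a
# covariance-based PCA.
print(signif(diag(S), 3))
print(round(R, 2))
```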
b) Perform PCA on the 7 variables using the prcomp function. Discuss the
eigenvalues, percentage of variation explained and scree plot, and how they influence
your decision on how many PCs to interpret from this analysis. Remember to keep in
mind the overall purpose of PCA (5 marks).
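A minimal prcomp() sketch for part b). The data are a stand-in (seven columns of the built-in state.x77 matrix, not usair.dat), and scaling the variables is an assumption that corresponds to running PCA on the correlation matrix.

```r
# Stand-in data: NOT the usair.dat air quality variables.
X <- state.x77[, 1:7]

# scale. = TRUE standardises each variable (correlation-matrix PCA)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eig <- pca$sdev^2        # eigenvalues = variances of the PCs
prop <- eig / sum(eig)   # proportion of total variation per PC
print(round(rbind(eigenvalue = eig,
                  proportion = prop,
                  cumulative = cumsum(prop)), 3))

if (!interactive()) pdf(NULL)  # discard plots when run as a script
screeplot(pca, type = "lines", main = "Scree plot")
```

Common rules of thumb look for the elbow in the scree plot, eigenvalues above 1, or a target cumulative percentage; they do not always agree.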
c) Interpret the first PC. Include the Z equation and a plot of the loadings on the first
PC in your answer. (4 marks)
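The Z equation for the first PC is the linear combination of the standardised variables weighted by the first column of the rotation matrix. The sketch below prints that equation and plots the loadings, using stand-in data (built-in state.x77, not usair.dat) and assuming a correlation-matrix PCA.

```r
# Stand-in data: NOT the usair.dat air quality variables.
X <- state.x77[, 1:7]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

a1 <- pca$rotation[, 1]  # loadings (coefficients) of the first PC

# Z1 = a11 * x1 + ... + a17 * x7, where each x is a standardised variable
z_terms <- paste0(sprintf("%+.3f", a1), " * ", names(a1))
cat("Z1 =", paste(z_terms, collapse = " "), "\n")

# The PC1 scores are exactly this linear combination
z1 <- drop(scale(X) %*% a1)

if (!interactive()) pdf(NULL)  # discard plots when run as a script
barplot(a1, las = 2, main = "Loadings on PC1", ylab = "Loading")
```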
d) What is the correlation between the first and second PCs and what does this tell you?
(2 marks)
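The correlation between the first two PCs can be checked directly from the scores. This sketch uses stand-in data (built-in state.x77, not usair.dat).

```r
# Stand-in data: NOT the usair.dat air quality variables.
pca <- prcomp(state.x77[, 1:7], center = TRUE, scale. = TRUE)

# PCs are constructed to be uncorrelated, so this is zero up to
# floating-point rounding.
r12 <- cor(pca$x[, 1], pca$x[, 2])
print(r12)
```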
e) Produce and interpret a biplot based on the first 2 PCs. In particular, explain your
interpretation of the air quality variables in city 1 compared to city 11 and city 9.
Relate your interpretation back to the original data.
(5 marks)
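A brief biplot sketch, again on stand-in data (built-in state.x77, not usair.dat). Rows 1, 9 and 11 are picked only to mirror the cities named in the question; in the stand-in data they are unrelated observations.

```r
# Stand-in data: NOT the usair.dat air quality variables.
X <- state.x77[, 1:7]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

if (!interactive()) pdf(NULL)  # discard plots when run as a script
biplot(pca, choices = 1:2, cex = 0.6)

# Scores and original values for the observations being compared
rows <- c(1, 9, 11)
scores_sel <- pca$x[rows, 1:2]
print(round(scores_sel, 2))
print(X[rows, ])
```

Observations plotted in the direction of a variable's arrow tend to have above-average values on that variable, which can be confirmed against the original rows.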
f) Was this a useful analysis for this data set? Explain. (4 marks)
Question 4 (25 marks):
For this question you will continue to use the data file ‘usair.dat’ from Question 3.
Provide R code, output and written interpretation for all analyses.
a) Perform parallel analysis and evaluate how many PCs should be used in FA. Compare
to your choice of number of PCs in Q3b). (3 marks)
b) Explain in your own words how the parallel analysis works. (5 marks)
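The idea behind parallel analysis can be sketched by hand in base R. This is one common variant, not necessarily the one used in the course: eigenvalues of the observed correlation matrix are compared with the 95th percentile of eigenvalues from random normal data of the same size. The data are a stand-in (built-in state.x77, not usair.dat), and the number of simulations is an arbitrary choice.

```r
# Stand-in data: NOT the usair.dat air quality variables.
set.seed(123)
X <- state.x77[, 1:7]
n <- nrow(X)
p <- ncol(X)

# Eigenvalues of the observed correlation matrix
obs_eig <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from uncorrelated random data with the same n and p
n_sim <- 500  # arbitrary number of simulated data sets
sim_eig <- replicate(n_sim, {
  Z <- matrix(rnorm(n * p), nrow = n)
  eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
})  # p x n_sim matrix

# Keep leading components whose eigenvalue beats the random benchmark
threshold <- apply(sim_eig, 1, quantile, probs = 0.95)
exceeds <- obs_eig > threshold
n_keep <- if (all(exceeds)) p else which(!exceeds)[1] - 1

print(round(rbind(observed = obs_eig, threshold = threshold), 3))
cat("Components retained:", n_keep, "\n")
```

A component is retained only if it explains more variation than would be expected from noise alone in a data set of the same dimensions.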
c) Perform a Factor Analysis on all 7 variables (apply no rotation) using the number of
factors you identified in part a). Interpret the output, including the following (10 marks):
Variance explained
Chi-square test
Variable loadings
Difference in uniqueness values for the variables wind.speed and annual.precip
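A sketch of an unrotated factor analysis with factanal(). The data are synthetic, generated from a hypothetical two-factor model so that the fit is well behaved; the variable names v1 to v7 and the choice of two factors are assumptions for illustration, not the usair.dat variables or the answer to part a).

```r
# Synthetic placeholder data from a hypothetical two-factor model,
# NOT the usair.dat air quality variables.
set.seed(1)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- data.frame(
  v1 = 0.9 * f1 + rnorm(n, sd = 0.4),
  v2 = 0.8 * f1 + rnorm(n, sd = 0.5),
  v3 = 0.7 * f1 + rnorm(n, sd = 0.6),
  v4 = 0.6 * f1 + 0.3 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.8 * f2 + rnorm(n, sd = 0.5),
  v6 = 0.7 * f2 + rnorm(n, sd = 0.6),
  v7 = 0.6 * f2 + rnorm(n, sd = 0.7)
)

fa <- factanal(X, factors = 2, rotation = "none")

L <- unclass(fa$loadings)               # variable loadings (7 x 2)
var_explained <- colSums(L^2) / ncol(X) # proportion of variance per factor

print(round(L, 3))
print(round(var_explained, 3))
print(round(fa$uniquenesses, 3))        # variance NOT shared with factors

# Chi-square test that the chosen number of factors is sufficient
cat("Chi-square =", fa$STATISTIC, "df =", fa$dof, "p =", fa$PVAL, "\n")
```

A variable with a high uniqueness is poorly explained by the common factors, while a low uniqueness means most of its variance is shared.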
d) Repeat the FA with a varimax rotation and calculate the communalities. Interpret (7
marks):
Explain the aim and features of a varimax rotation
Changes in the variable loadings
The communalities.
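A sketch comparing unrotated and varimax-rotated solutions. As an assumption for illustration, the data are synthetic and come from a hypothetical two-factor model with made-up variable names v1 to v7; they are not the usair.dat variables.

```r
# Synthetic placeholder data from a hypothetical two-factor model,
# NOT the usair.dat air quality variables.
set.seed(1)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- data.frame(
  v1 = 0.9 * f1 + rnorm(n, sd = 0.4),
  v2 = 0.8 * f1 + rnorm(n, sd = 0.5),
  v3 = 0.7 * f1 + rnorm(n, sd = 0.6),
  v4 = 0.6 * f1 + 0.3 * f2 + rnorm(n, sd = 0.6),
  v5 = 0.8 * f2 + rnorm(n, sd = 0.5),
  v6 = 0.7 * f2 + rnorm(n, sd = 0.6),
  v7 = 0.6 * f2 + rnorm(n, sd = 0.7)
)

fa_none <- factanal(X, factors = 2, rotation = "none")
fa_vm <- factanal(X, factors = 2, rotation = "varimax")

L_none <- unclass(fa_none$loadings)
L_vm <- unclass(fa_vm$loadings)

# Communality = sum of squared loadings across factors for each variable
comm_none <- rowSums(L_none^2)
comm_vm <- rowSums(L_vm^2)

print(round(cbind(L_none, L_vm), 3))  # loadings before and after rotation
print(round(cbind(unrotated = comm_none, varimax = comm_vm), 3))
```

Varimax is an orthogonal rotation, so it redistributes the loadings towards a simpler pattern (each variable loading mainly on one factor) while leaving every variable's communality unchanged.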