Multivariate Normal Distribution and Generalized Distance Analysis
Running head: Assignment 1

Assignment 1
Name of the student
Name of the University
Author Note
Question 1:

a) The pdf of the multivariate normal distribution is given by

f(x(p×1)) = c * exp( -(x - μ)^T Σ^(-1) (x - μ) / 2 )

Here, μ = E(X(p×1)), Σ = var(X(p×1)) > 0, and c is a constant given by

c = 1 / ( (2π)^(p/2) * |Σ|^(1/2) ),

where |Σ| is the determinant of Σ.

Now, by the method of singular value decomposition, Σ = U D U^T = U D^(1/2) D^(1/2) U^T, where U is the eigenvector matrix of Σ and D = diag(λ1, λ2, ..., λp) is the diagonal matrix of its eigenvalues.

In order to generate random vectors from the multivariate normal distribution Np(μ, Σ), the following steps are followed.

Step 1: First, the eigenvalues λ1, λ2, ..., λp of Σ and the corresponding eigenvectors Uj are calculated, and D^(1/2) = diag(λ1^(1/2), λ2^(1/2), ..., λp^(1/2)) is formed. Then the matrix A = U D^(1/2) (or A = U D^(1/2) U^T) is formed.

Step 2: Next, the values z1, z2, ..., zp are drawn from N(0, 1), and z = (z1, ..., zp)^T is formed.

Step 3: Then one random vector y(p×1) = Az + μ from the multivariate normal distribution Np(μ, Σ) is obtained.

b) Here, the multivariate normal distribution is chosen to be a 4-variate normal distribution. The R code to generate a random vector from this 4-variate normal distribution by the SVD method is given below.
R-code:

m = c(2, 3, 5, 1)                      # means of the 4-variate normal distribution
s = cbind(c(2, -1.2, -1, 1),
          c(-1.2, 4, 1.5, 1),
          c(-1, 1.5, 1, -0.5),
          c(1, 1, -0.5, 4))            # covariance matrix S
lam = eigen(s)$values                  # eigenvalues of S
U = eigen(s)$vectors                   # eigenvectors of S
A = U %*% diag(sqrt(lam))              # the matrix A = U D^(1/2)
z = rnorm(4)                           # 4 independent draws from the standard normal distribution
A %*% z + m                            # one random vector from the 4-variate normal distribution

Output:

           [,1]
[1,]  2.9869710
[2,]  2.2991517
[3,]  5.2024936
[4,] -0.7154101

So, the generated random vector is (2.987, 2.299, 5.202, -0.715).

c) Now, as in Question 2, the 3-variate normal distribution with mean μ = (1, -2, 2)^T and variance

Σ = ( 1  1  1
      1  3  2
      1  2  2 )

is used to generate 100 random vectors of size 3×1. Since the multivariate normal distribution is used for the random vector generation, each generated random vector is a sample from this 3-variate normal distribution.
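As a quick check of this SVD construction (for either the 4-variate code above or the 3-variate code below), the matrix A should satisfy A %*% t(A) = covariance matrix, since A A^T = U D^(1/2) D^(1/2) U^T = U D U^T = Σ. A minimal sketch, reusing A and s from the code above:

A %*% t(A)                 # should equal s up to rounding error
all.equal(A %*% t(A), s)   # TRUE within numerical tolerance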
R code:

m = c(1, -2, 2)                                  # means of the 3-variate normal distribution
s = cbind(c(1, 1, 1), c(1, 3, 2), c(1, 2, 2))    # covariance matrix S
lam = eigen(s)$values                            # eigenvalues of S
U = eigen(s)$vectors                             # eigenvectors of S
A = U %*% diag(sqrt(lam))                        # the matrix A = U D^(1/2)
for (x in 1:100) {                               # generate 100 random vectors
  z = rnorm(3)                                   # fresh standard normal draws for each vector
  print(A %*% z + m)
}

Question 2:

a) A linear transformation of the multivariate normal vector x = (x1, x2, x3)^T with mean

μ = (1, -2, 2)^T

and covariance

Σ = ( 1  1  1
      1  3  2
      1  2  2 )

will also follow a multivariate normal distribution. The sub-mean and sub-covariance of (x1, x2) are

μ(x1, x2) = (1, -2)^T,    Σ(x1, x2) = ( 1  1
                                        1  3 ).

Here the linear combination is z = 2x1 - 3x2, that is, z = Bx with B = (2  -3).

So, the mean of z is E(z) = B μ(x1, x2) = (2)(1) + (-3)(-2) = 2 + 6 = 8.

The variance of z is Var(z) = B Σ(x1, x2) B^T. First, Σ(x1, x2) B^T = (2 - 3, 2 - 9)^T = (-1, -7)^T, and then B (-1, -7)^T = (2)(-1) + (-3)(-7) = -2 + 21 = 19.

So, the distribution of 2x1 - 3x2 is a normal distribution with mean 8 and variance 19.
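This mean and variance can also be verified numerically; a minimal R sketch using the sub-mean and sub-covariance of (x1, x2) stated above:

B = matrix(c(2, -3), nrow = 1)            # coefficient (row) vector of the linear combination
mu12 = c(1, -2)                           # mean vector of (x1, x2)
Sigma12 = matrix(c(1, 1, 1, 3), 2, 2)     # covariance matrix of (x1, x2)
B %*% mu12                                # mean of 2*x1 - 3*x2, equal to 8
B %*% Sigma12 %*% t(B)                    # variance of 2*x1 - 3*x2, equal to 19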
b) The joint distribution of y1 = x1 - x3 and y2 = x1 - 2x2 + 3x3 must satisfy the conditions

i) f(y1, y2) >= 0
ii) ∫∫ f(y1, y2) dy1 dy2 = 1.

The pdf of the joint (bivariate normal) distribution of y1 and y2 is given by

f(y1, y2) = 1 / (2π σ1 σ2 sqrt(1 - ρ^2)) * exp( -z / (2(1 - ρ^2)) ),

where

z = (y1 - μ1)^2 / σ1^2 - 2ρ (y1 - μ1)(y2 - μ2) / (σ1 σ2) + (y2 - μ2)^2 / σ2^2,

ρ = corr(y1, y2) = cov(y1, y2) / (σ1 σ2),

σ1 = standard deviation of y1 and σ2 = standard deviation of y2.

c) Now, if a sample of size 60 (60 random numbers) is drawn from the mentioned 3-variate normal distribution and arranged in vector form, then the distribution of x will be a 20-variate normal distribution with a (20×1) mean vector and a (20×20) variance-covariance matrix. This is because the 3-variate normal distribution generates 3 random numbers at a time, so the 60 numbers are obtained in 20 simulations.

Question 3:

a) The generalized distance in multivariate analysis is given by the following formula:

d^2 = (x - xbar)^T S^(-1) (x - xbar),

where xbar is the sample mean and S is the covariance matrix. So, under the univariate condition used here the generalized distance reduces to d = sqrt((x1 - x2)^2), that is, the absolute difference between x1 and x2 for each observation.
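For reference, the multivariate form of this distance can be computed directly with base R's mahalanobis() function, which evaluates (x - xbar)^T S^(-1) (x - xbar) for each row of a data matrix; a minimal sketch, assuming the ten (x1, x2) observations tabulated below as input:

x1 = c(3, 5, 5, 7, 7, 7, 8, 9, 10, 11)
x2 = c(2.3, 1.9, 1, 0.7, 0.3, 1, 1.05, 0.45, 0.7, 0.4)
X = cbind(x1, x2)                                         # observations as rows of a matrix
d2 = mahalanobis(X, center = colMeans(X), cov = var(X))   # squared generalized distances
sqrt(d2)                                                  # generalized distances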
The generalized distance for each data point is given in the following table.

x1     x2      d
3      2.3     0.7
5      1.9     3.1
5      1       4
7      0.7     6.3
7      0.3     6.7
7      1       6
8      1.05    6.95
9      0.45    8.55
10     0.7     9.3
11     0.4     10.6

b) Now, individual histograms are constructed for each dataset x1 and x2 as given below.

Frequency distribution of x1:

class range of x1    Frequency
0 to 3               1
3 to 4               0
4 to 5               2
5 to 6               0
6 to 7               3
7 to 8               1
8 to 9               1
9 to 10              1
10 to 11             1

Histogram of x1 (Frequency against class range of x1)

Frequency distribution of x2:

class range of x2    Frequency
0 to 0.3             1
0.3 to 0.6           2
0.6 to 0.9           2
0.9 to 1.2           3
1.2 to 1.5           0
1.5 to 1.8           0
1.8 to 2.1           1
2.1 to 2.4           1

Histogram of x2 (Frequency against class range of x2)

So, from the above two histograms it is quite clear that the two variables x1 and x2 are approximately normally distributed. The normality would be even more evident if the number of class intervals were increased, that is, if the class width were made smaller.

c) The normal q-q plot of the distance data from part (a) is created with the R code given below.

R code for q-q plot:

d = c(0.7, 3.1, 4, 6.3, 6.7, 6, 6.95, 8.55, 9.3, 10.6)
qqnorm(d)
qqline(d)
Now, since most of the data points in the q-q plot do not fall on the normal line, the datasets x1 and x2 are not jointly normally distributed.

d) A power transformation transforms the data through some function; here the Box-Cox transformation is applied to transform the variables into (approximately) normally distributed variables.

Box-Cox transformation of x1

R code:

x1 = c(3, 5, 5, 7, 7, 7, 8, 9, 10, 11)
library(MASS)
Box = boxcox(x1 ~ 1,                     # Box-Cox transformation of x1
             lambda = seq(-6, 6, 0.1))   # trial values of lambda from -6 to 6 in steps of 0.1
Cox = data.frame(Box$x, Box$y)               # data frame with the boxcox results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)), ]   # order by decreasing log-likelihood
Cox2[1, ]                                    # lambda with maximum log-likelihood
lambda = Cox2[1, "Box.x"]                    # extract that lambda
T_box = (x1 ^ lambda - 1) / lambda           # transform the data
T_box

[1]  2  4  4  6  6  6  7  8  9 10

qqnorm(T_box)
qqline(T_box)

Q-Q plot of transformed x1:
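One caveat on the transformation line above: (x^λ - 1)/λ is undefined at λ = 0, where the Box-Cox transform becomes log(x). A small helper (hypothetical name bc_transform, shown only as a sketch and applied to the x1 data above) that covers both cases, should the selected λ ever be exactly zero:

bc_transform = function(x, lambda) {
  # Box-Cox transform, falling back to the log case when lambda is (numerically) zero
  if (abs(lambda) < .Machine$double.eps^0.5) log(x) else (x^lambda - 1) / lambda
}
bc_transform(x1, 1)   # same as x1 - 1 when lambda = 1
bc_transform(x1, 0)   # log transform when lambda = 0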
Box-Cox transformation of x2

R code:

x2 = c(2.3, 1.9, 1, 0.7, 0.3, 1, 1.05, 0.45, 0.7, 0.4)
library(MASS)
Box = boxcox(x2 ~ 1,                     # Box-Cox transformation of x2
             lambda = seq(-6, 6, 0.1))   # trial values of lambda from -6 to 6 in steps of 0.1
Cox = data.frame(Box$x, Box$y)               # data frame with the boxcox results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)), ]   # order by decreasing log-likelihood
Cox2[1, ]                                    # lambda with maximum log-likelihood
lambda = Cox2[1, "Box.x"]                    # extract that lambda
T_box = (x2 ^ lambda - 1) / lambda           # transform the data
T_box

[1]  0.79916555  0.62168880  0.00000000 -0.36311210 -1.27944873  0.00000000
[7]  0.04867133 -0.83125420 -0.36311210 -0.95958226

qqnorm(T_box)
qqline(T_box)
Q-Q plot of transformed x2:

So, from the above Q-Q plots it is visible that after the power transformation (specifically, the Box-Cox transformation) the transformed dataset x1 is marginally normally distributed, as most of its points lie on the normal line. However, the transformed dataset x2 is not marginally normally distributed, as most of its points do not lie on the normal line of its Q-Q plot.

Question 4:

a) The results for chromium (x1) and strontium (x2) are extracted from the file "Peruhair.dat" and entered into R. The 95% confidence intervals for the components of the population mean vector μ = (μ1, μ2)^T, based on the normal distribution and treating the sample variance as the population variance, are calculated by the following R code.
R-code:

x1 = c(0.48, 3.53, 2.19, 0.55, 0.74, 0.66, 0.93, 0.37, 0.22)
x2 = c(12.57, 18.12, 11.13, 20.03, 20.09, 0.78, 4.64, 0.43, 1.08)
norm.interval = function(data, variance = var(data), conf.level = 0.95) {
  z = qnorm((1 - conf.level) / 2, lower.tail = FALSE)
  xbar = mean(data)
  sdx = sqrt(variance / length(data))
  c(xbar - z * sdx, xbar + z * sdx)
}
norm.interval(x1, var(x1))   # 95% CI for chromium (x1)
norm.interval(x2, var(x2))   # 95% CI for strontium (x2)

Output:

[1] 0.3650031 1.7838858
[1] 4.403621 15.345268

Hence, the 95% confidence interval for the chromium (x1) data is [0.3650031, 1.7838858], and the 95% confidence interval for the strontium (x2) data is [4.403621, 15.345268].

b) Now, the 90% Bonferroni simultaneous confidence intervals for μ1 and μ2 can be calculated with the following R code.

R code:

x1 = c(0.48, 3.53, 2.19, 0.55, 0.74, 0.66, 0.93, 0.37, 0.22)
x2 = c(12.57, 18.12, 11.13, 20.03, 20.09, 0.78, 4.64, 0.43, 1.08)
fit = lm(x1 ~ x2)
ci.sim <- function(model, newdata, type = c("B", "S"), alpha = 0.05) {
  g <- nrow(newdata)
  CI <- predict(model, newdata, se.fit = TRUE)
  M <- ifelse(match.arg(type) == "B",
              qt(1 - alpha / (2 * g), model$df),        # Bonferroni multiplier
              sqrt(g * qf(1 - alpha, g, model$df)))     # Scheffe-type multiplier
  spred <- sqrt(CI$residual.scale^2 + (CI$se.fit)^2)
  x <- data.frame("x" = newdata,
                  "spred" = spred,
                  "fit" = CI$fit,
                  "lower" = CI$fit - M * spred,
                  "upper" = CI$fit + M * spred)
  return(x)
}
new <- data.frame(x2 = c(0, 21))            # newdata must contain the predictor x2
ci.sim(fit, new, type = "B", alpha = 0.1)   # type "B" and alpha = 0.1 give 90% Bonferroni intervals

c) Now, the 90% T^2 simultaneous confidence intervals for μ1 and μ2 can be calculated with the following R code.

R code:

x1 = c(0.48, 3.53, 2.19, 0.55, 0.74, 0.66, 0.93, 0.37, 0.22)
xbar = mean(x1)
xbar
xvar = var(x1)
xvar
# Apply the contrasts
Cm <- matrix(c(-1, -1,  1,  1,
                1, -1,  1, -1,
                1, -1, -1,  1), 3, 4, byrow = T)
Cm
Cxbar <- Cm %*% xbar
Cxvar <- Cm %*% xvar %*% t(Cm)
# Computation of Hotelling statistic
p <- nrow(Cxvar)
n <- nrow(x1)
nullmean <- c(0, 0, 0)
d <- Cxbar - nullmean
t2 <- n * t(d) %*% solve(Cxvar) %*% d
t2mod <- (n - p) * t2 / (p * (n - 1))
pval <- 1 - pf(t2mod, p, n - p)
cat("Hotelling T-squared statistic", fill = T)
cat("p-value", fill = T)
# simultaneous confidence intervals
alpha <- 0.05
wd <- sqrt(((n[1] + n[2] - 2) * p / (n[1] + n[2] - p - 1)) * qf(1 - alpha, p, n[1] + n[2] - p - 1)) *
      sqrt(diag(Sp) * sum(1 / n))
Cis <- cbind(d - wd, d + wd)
cat("95% simultaneous confidence interval", "\n")
Cis

Now, similar code can be used to calculate the T^2 confidence interval for the strontium (x2) sample.
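For comparison, a minimal self-contained sketch that computes both sets of 90% simultaneous intervals for (μ1, μ2) directly from the two samples, assuming the standard textbook formulas (Bonferroni margin t_{n-1}(α/(2p)) * sqrt(s_jj/n); T^2 margin sqrt(p(n-1)/(n-p) * F_{p,n-p}(α)) * sqrt(s_jj/n)):

x1 <- c(0.48, 3.53, 2.19, 0.55, 0.74, 0.66, 0.93, 0.37, 0.22)
x2 <- c(12.57, 18.12, 11.13, 20.03, 20.09, 0.78, 4.64, 0.43, 1.08)
X <- cbind(x1, x2)
n <- nrow(X); p <- ncol(X)
xbar <- colMeans(X)                     # sample mean vector
S <- var(X)                             # sample covariance matrix
alpha <- 0.10                           # 90% simultaneous coverage
se <- sqrt(diag(S) / n)                 # standard errors of the two sample means
m_bonf <- qt(1 - alpha / (2 * p), df = n - 1) * se                  # Bonferroni margins
m_t2 <- sqrt(p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p)) * se  # T^2 margins
rbind(bonferroni_lower = xbar - m_bonf, bonferroni_upper = xbar + m_bonf,
      t2_lower = xbar - m_t2, t2_upper = xbar + m_t2)

With p = 2 here the T^2 multiplier exceeds the Bonferroni one, so the T^2 intervals come out somewhat wider.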
d) The 95% confidence interval for strontium (x2) indicates that the population mean strontium level plausibly lies between 4.403621 and 15.345268. Since 20.0 falls outside this interval, it is highly unlikely that the population mean strontium level is 20.0.
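A quick numerical check of this conclusion, assuming the same normal-theory setup as in part (a) (a z statistic with the sample variance plugged in, chosen here purely for illustration):

x2 <- c(12.57, 18.12, 11.13, 20.03, 20.09, 0.78, 4.64, 0.43, 1.08)
z <- (mean(x2) - 20) / sqrt(var(x2) / length(x2))   # standardised distance of the sample mean from 20
2 * pnorm(-abs(z))                                  # two-sided normal p-value for mu2 = 20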