Multivariate Normal Distribution and Generalized Distance Analysis

Verified

Added on 2023/06/07

AI Summary

This article discusses the multivariate normal distribution and its generation using singular value decomposition. It also covers the generalized distance analysis and its application in univariate conditions. The article includes R code for generating random vectors, histograms, q-q plots, and power transformations.

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.

Running head: Assignment 1
Assignment 1
Name of the student
Name of the University
Author Note

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.

Assignment 1
Question 1:
a) The pdf of the multivariate normal distribution is given by,
f x ( xp∗1 ) =c∗e
( − ( x−μ ) T
∑
(−1 )
( x−μ )
2 }
Here, μ = E(Xp*1) and ∑ = var(Xp*1) > 0 and c is a constant which is given by
c= 1
( 2∗π )
p
2 ¿ ¿
Here, ¿ ∑∨¿ is the value of the determinant of ∑.
Now, by the method of singular value decomposition ∑ = UD UT = U D (1
2 ) D (1
2 ) UT
Here, U = eigenvector matrix of and
D=diagonal of eigenvalues λ 1 , λ 2 , λ 3 … . λp
Now, in order to generate the random vectors from multivariate n∑ormal distribution
Np(μ,∑) the following methods must be followed.
Step 1: At first the eigenvalues λ 1 , λ 2 , λ 3 , … λp and the corresponding eigenvectors
Uj of ∑ and then D^(0.5) = diag( λ 10.5 , λ 20.5 , λ 30.5 , … λ p0.5 ¿ is needed to be
calculated.
Then the matrix A = U D
1
2 ( or A = UD^(1/2)U^(T)) is formed.
Step 2: Now, the z values z1,z2…zp from N(0,1) is obtained. Now, Let z =
( z 1 , … , zp )T
Step 3: Now, one random vector yp*1 = Az + μ from the multivariate normal
distribution Np(μ,∑) can be obtained.
b) Now, the multivariate normal distribution is chosen as 4-variate normal distribution.
The R code to generate a random vector from 4-variate normal distribution in SVD
method is given below.

Assignment 1
R-code:
m = c(2,3,5,1) #defining the means of quadravariate normal distribution
s = cbind(c(2,-1.2,-1,1),c(-1.2,4,1.5,1),c(-1,1.5,1,-0.5),c(1,1,-0.5,4)) # defining the
variance matrix S
lam = eigen(s)$values #calculating eigen values of S
U = eigen(s)$vectors #calculating eigen vectors of S
A = U%*%diag(sqrt(lam)) # creating the matrix A
z = rnorm(4) # generating 4 independent random numbers from standard normal
distribution.
A%*%z+m # generating one random vector from the quadravariate normal
distribution
Output:
[,1]
[1,] 2.9869710
[2,] 2.2991517
[3,] 5.2024936
[4,] -0.7154101
So, the generated random vector is (2.987, 2.299, 5.202, -0.715)
c) Now, according to question 2 the 3-variate normal distribution is used to generate 100
random vector of size 3*1 from mean μ = (1,-2,2)^T and
Variance = ∑ = ( 1 1 1
1 3 2
1 2 2 ). Here the target distribution is normal distribution and as
multivariate normal distribution is used for random vector generation, hence each
random vector must be a sample from a normal distribution.

Assignment 1
R code:
m = c(1,-2,2) #defining the means of quadravariate normal distribution
s = cbind(c(1,1,1),c(1,3,2),c(1,2,2)) # defining the variance matrix S
lam = eigen(s)$values #calculating eigen values of S
U = eigen(s)$vectors #calculating eigen vectors of S
A = U%*%diag(sqrt(lam)) # creating the matrix A
z = rnorm(3) # generating 3 independent random numbers from standard normal
distribution.
for (x in c(1:100)) print(A%*%z+m)
Question 2:
a) The linear transformation of the multivariate normal distribution x= (x1,x2,x3)^T
with mean μ = ( 1
−2
2 ) and ∑ = (1 1 1
1 3 2
1 2 2 ) will also follow a multivariate normal
distribution.
Now, μ(x1,x2) = ( 1
−2 ), ∑(x1,x2) = ( 1 1
1 3 )
Now, here the linear combination is z= 2x1 – 3x2.
Or, z = BX
Where, B = (2 -3)
So, the mean of z, E(z) = Bμ = (2 -3) ( 1
−2 ) = 2 + 6 = 8
Variance of z Var(z) = B ∑ ( x 1 , x 2 ) BT = (2 -3)( 1 1
1 3)( 2
−3 ) = (2 -3) ( 2−3
2−9 )
= (2 -3) (−1
−7 ) = -2 +21 = 19.
So, the distribution of 2x1 -3x2 is a normal distribution with mean 8 and variance 19.

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.

Assignment 1
b) The joint distribution of y1 = x1 –x3 and y2 = x1 – 2x2 + 3x3 should follow the
conditions
i) f(y1,y2) >=0
ii) ∫∫ f ( y 1 , y 2 ) dy 1∗dy 2=1
The pdf of the joint distribution of y1 and y2 are be given by,
P(y1,y2) = 1
2 πσ 1 σ 2 √1− p2 ∗e
−z
2 (1− p2 )
z = ( y 1−μ 1 )2
σ 12 – 2 p ( y 1−μ 1 ) ( y 2−μ 2 )
σ 1 σ 2 + ( y 2−μ 2 )2
σ 22
p = corr(y1,y2) = cov(y1,y2)/σ 1 σ 2
where, σ 1=standard deviation of y 1
σ2 = standard deviation of y2.
c) Now, if a sample size of 60 or 60 random numbers are selected from the mentioned 3-
varaiate normal distribution and then arranged in vector form, then the distribution of
x will be of 20-variate normal distribution with (20*1) mean and (20*20) variance-
covariance matrix. This is because from 3-varaiate normal distribution at a time 3
random numbers can be generated and hence 60 numbers will be generated in 20
simulations.
Question 3:
a) The generalized distance in multivariate analysis is given by the following formula,
d^2 = (x-xbar)^T S^(-1)(x-xbar)
Where, xbar is the sample mean, S is the covariance matrix.
So, for univariate condition the generalized distance will be d = sqrt((x1-x2)^2)

Assignment 1
The generalized distance for each data point is given in the following table.
x1 x2 d
3 2.3 0.7
5 1.9 3.1
5 1 4
7 0.7 6.3
7 0.3 6.7
7 1 6
8 1.0
5
6.95
9 0.4
5
8.55
10 0.7 9.3
11 0.4 10.6
b) Now, individual histograms are constructed for each dataset x1 and x2 as given
below.
Frequency distribution of x1:
class
rang
e of
x1
Frequenc
y
0 to 3 1
3 to 4 0
4 to 2

Assignment 1
5
5 to 6 0
6 to 7 3
7 to 8 1
8 to 9 1
9 to
10
1
10 to
11
1
0 to
3 3 to
4 4 to
5 5 to
6 6 to
7 7 to
8 8 to
9 9 to
10 10
to
11
0
0.5
1
1.5
2
2.5
3
3.5
Histogram of x1
Frequency
class range of x1
Frequency
Frequency distribution of x2:
class
range
of x2
Frequenc
y
0.3 1
0.6 2

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Assignment 1
0.9 2
1.2 3
1.5 0
1.8 0
2.1 1
2.4 1
0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4
0
0.5
1
1.5
2
2.5
3
3.5
Histogram of x2
Frequency
class range of x2
Frequency
So, from the above two histograms it is quite clear that the two variables x1 and x2 are
approximately normally distributed. The normal distribution will be much more evident when
the number of class intervals are increased or the class width becomes smaller.
c) The normal q-q plot of the distance data from part (a) is created from the r code given
below.
R code fo q-q plot:
d = c(0.7,3.1,4,6.3,6.7,6,6.95,8.55,9.3,10.6)
qqnorm(d)
qqline(d)

Assignment 1
Now, as most of the data points in q-q plot do not fall into the normal line hence the
datasets x1 and x2 are not jointly normally distributed.
d) The power transformation of data is transforming the data by some function and here
the box-cox transformation is used to transform the variables into normally distributed
variable is applied.
Box-cox transformation of x1 R code:
x1 = c(3,5,5,7,7,7,8,9,10,11)
library(MASS)
Box = boxcox(x1 ~ 1, # box-cox transformation to x1
lambda = seq(-6,6,0.1) # Trial of taking values -6 to 6 by 0.1

Assignment 1
)
Cox = data.frame(Box$x, Box$y) # Create a data frame with the results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)),] # Ordering the new data frame
Cox2[1,] # Display the lambda with maximum log likelihood
lambda = Cox2[1, "Box.x"] # lambda extraction
T_box = (x1 ^ lambda - 1)/lambda # Transformation of data
T_box
[1] 2 4 4 6 6 6 7 8 9 10
qqnorm(T_box)
qqline(T_box)
Q-Q plot of transformed x1:

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.

Assignment 1
Box-cox transformation of x2 R code:
x2= c(2.3,1.9,1,0.7,0.3,1,1.05,0.45,0.7,0.4)
library(MASS)
Box = boxcox(x2 ~ 1, # box-cox transformation to x2
lambda = seq(-6,6,0.1) # Trial of taking values -6 to 6 by 0.1
)
Cox = data.frame(Box$x, Box$y) # Create a data frame with the results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)),] # Ordering the new data frame
Cox2[1,] # Display the lambda with maximum log likelihood
lambda = Cox2[1, "Box.x"] # lambda extraction
T_box = (x2 ^ lambda - 1)/lambda # Transformation of data
T_box
[1] 0.79916555 0.62168880 0.00000000 -0.36311210 -1.27944873 0.00000000
[7] 0.04867133 -0.83125420 -0.36311210 -0.95958226
qqnorm(T_box)
qqline(T_box)

Assignment 1
Q-Q plot of transformed x2:
So, from the above Q-Q plots it is visible that after power transformation (or specifically box-
cox transformation) the transformed dataset x1 is marginally normally distributed as most of
the points lie on the normal line. However, the transformed dataset of x2 is not marginally
normally distributed as most of the points do not lie on the normal line of its Q-Q plot.
Question 4:
a) The results for chromium(x1) and strontium(x2) are pulled out from the
“Peruhair.dat” and written in r file.
Now, the 95% confidence interval from normal distribution assuming the population
variance is equal to the sample variance and population means equal to the sample
mean for the population mean vector μ = (μ1,μ2)^T is calculated by the following R
code.

Assignment 1
R-code:
> x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
> x2 = c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
> norm.interval = function(data, variance = var(data), conf.level = 0.95)
+ {
+ z = qnorm((1 - conf.level)/2, lower.tail = FALSE)
+ xbar = mean(data)
+ sdx = sqrt(variance/length(data))
+ c(xbar - z * sdx, xbar + z * sdx)
+ }
> norm.interval(x1,var(x1))
[1] 0.3650031 1.7838858
> norm.interval(x2,var(x2))
[1] 4.403621 15.345268
Hence, the 95% confidence interval for chromium(x1) data is [0.3650031,
1.7838858].
The 95% confidence interval for strontium(x2) data is [4.403621, 15.345268]
b) Now, the 90% Bonferroni simultaneous confidence intervals for μ1 and μ2 can be
calculated by the following R-code.
R code:
x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
x2 = c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
fit = lm(x1~x2)
ci.sim <- function(model, newdata, type = c("B", "S"), alpha = 0.05)
{

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser

Assignment 1
g <- nrow(newdata)
CI <- predict(model, newdata, se.fit = TRUE)
M <- ifelse(match.arg(type) == "B",
qt(1 - alpha / (2*g), model$df),
sqrt( g * qf( 1 - alpha, g, model$df)))
spred <- sqrt(CI$residual.scale^2 + (CI$se.fit)^2 )
x <- data.frame(
"x" = newdata,
"spred" = spred,
"fit" = CI$fit,
"lower" = CI$fit - M * spred,
"upper" = CI$fit + M * spred)
return(x)
}
new <- data.frame(x = c(0,21))
ci.sim(fit, new, type = "S")
c) Now, the R code for 90% T^2 simultaneous confidence intervals for μ1 and μ2 can be
calculated as follows.
R code:
x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
xbar = mean(x1)
xbar
xvar = var(x1)

Assignment 1
xvar
# Apply the contrasts
Cm<-matrix( c(-1, -1, 1, 1, 1, -1, 1, -1, 1, -1, -1, 1), 3, 4, byrow=T)
Cm
Cxbar <- Cm%*%xbar
Cxvar <- Cm%*%xvar%*%t(Cm)
# Computation of Hotelling statistic
p <- nrow(Cxvar)
n <- nrow(x1)
nullmean <- c(0, 0, 0)
d <- Cxbar-nullmean
t2<-n*t(d)%*%solve(Cxvar)%*%d;
t2mod<- (n-p)*t2/(p*(n-1))
pval <- 1- pf(t2mod,p,n-p)
cat("Hotelling T-squared statistic", fill=T)
cat("p-value", fill=T)
# simultaneous confidence intervals
alpha<-0.05
wd<-sqrt(((n[1]+n[2]-2)*p/(n[1]+n[2]-p-1))*qf(1-alpha,p,n[1]+n[2]-p-
1))*sqrt(diag(Sp)*sum(1/n))
Cis<-cbind(d-wd,d+wd)
cat("95% simultaneous confidence interval","\n")
Cis
Now, the similar code can calculate the T^2 confidence interval for the sample
variable strontium(x2).

Assignment 1
d) The 95% confidence interval of strontium(x2) suggests that there is a 95% chance that
the population mean lies between [4.403621, 15.345268]. Hence, there is only 5%
chance that the population mean is 20 or it is highly unlikely that the population mean
of strontium level is 20.0.

1 out of 16

Your All-in-One AI-Powered Toolkit for Academic Success.

+13062052269

info@desklib.com

Available 24*7 on WhatsApp / Email

Company

Tools

Support