Multivariate Normal Distribution and Generalized Distance Analysis

Verified

Added on  2023/06/07

|16
|2867
|477
AI Summary
This article discusses the multivariate normal distribution and its generation using singular value decomposition. It also covers the generalized distance analysis and its application in univariate conditions. The article includes R code for generating random vectors, histograms, q-q plots, and power transformations.
tabler-icon-diamond-filled.svg

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.
Document Page
Running head: Assignment 1
Assignment 1
Name of the student
Name of the University
Author Note
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
Assignment 1
Question 1:
a) The pdf of the multivariate normal distribution is given by,
f x ( xp1 ) =ce
( ( xμ ) T

(1 )
( xμ )
2 }
Here, μ = E(Xp*1) and ∑ = var(Xp*1) > 0 and c is a constant which is given by
c= 1
( 2π )
p
2 ¿ ¿
Here, ¿ ¿ is the value of the determinant of .
Now, by the method of singular value decomposition = UD UT = U D (1
2 ) D (1
2 ) UT
Here, U = eigenvector matrix of and
D=diagonal of eigenvalues λ 1 , λ 2 , λ 3 . λp
Now, in order to generate the random vectors from multivariate normal distribution
Np(μ,∑) the following methods must be followed.
Step 1: At first the eigenvalues λ 1 , λ 2 , λ 3 , λp and the corresponding eigenvectors
Uj of and then D^(0.5) = diag( λ 10.5 , λ 20.5 , λ 30.5 , λ p0.5 ¿ is needed to be
calculated.
Then the matrix A = U D
1
2 ( or A = UD^(1/2)U^(T)) is formed.
Step 2: Now, the z values z1,z2…zp from N(0,1) is obtained. Now, Let z =
( z 1 , , zp )T
Step 3: Now, one random vector yp*1 = Az + μ from the multivariate normal
distribution Np(μ,∑) can be obtained.
b) Now, the multivariate normal distribution is chosen as 4-variate normal distribution.
The R code to generate a random vector from 4-variate normal distribution in SVD
method is given below.
Document Page
Assignment 1
R-code:
m = c(2,3,5,1) #defining the means of quadravariate normal distribution
s = cbind(c(2,-1.2,-1,1),c(-1.2,4,1.5,1),c(-1,1.5,1,-0.5),c(1,1,-0.5,4)) # defining the
variance matrix S
lam = eigen(s)$values #calculating eigen values of S
U = eigen(s)$vectors #calculating eigen vectors of S
A = U%*%diag(sqrt(lam)) # creating the matrix A
z = rnorm(4) # generating 4 independent random numbers from standard normal
distribution.
A%*%z+m # generating one random vector from the quadravariate normal
distribution
Output:
[,1]
[1,] 2.9869710
[2,] 2.2991517
[3,] 5.2024936
[4,] -0.7154101
So, the generated random vector is (2.987, 2.299, 5.202, -0.715)
c) Now, according to question 2 the 3-variate normal distribution is used to generate 100
random vector of size 3*1 from mean μ = (1,-2,2)^T and
Variance = ∑ = ( 1 1 1
1 3 2
1 2 2 ). Here the target distribution is normal distribution and as
multivariate normal distribution is used for random vector generation, hence each
random vector must be a sample from a normal distribution.
Document Page
Assignment 1
R code:
m = c(1,-2,2) #defining the means of quadravariate normal distribution
s = cbind(c(1,1,1),c(1,3,2),c(1,2,2)) # defining the variance matrix S
lam = eigen(s)$values #calculating eigen values of S
U = eigen(s)$vectors #calculating eigen vectors of S
A = U%*%diag(sqrt(lam)) # creating the matrix A
z = rnorm(3) # generating 3 independent random numbers from standard normal
distribution.
for (x in c(1:100)) print(A%*%z+m)
Question 2:
a) The linear transformation of the multivariate normal distribution x= (x1,x2,x3)^T
with mean μ = ( 1
2
2 ) and ∑ = (1 1 1
1 3 2
1 2 2 ) will also follow a multivariate normal
distribution.
Now, μ(x1,x2) = ( 1
2 ), ∑(x1,x2) = ( 1 1
1 3 )
Now, here the linear combination is z= 2x1 – 3x2.
Or, z = BX
Where, B = (2 -3)
So, the mean of z, E(z) = Bμ = (2 -3) ( 1
2 ) = 2 + 6 = 8
Variance of z Var(z) = B ( x 1 , x 2 ) BT = (2 -3)( 1 1
1 3)( 2
3 ) = (2 -3) ( 23
29 )
= (2 -3) (1
7 ) = -2 +21 = 19.
So, the distribution of 2x1 -3x2 is a normal distribution with mean 8 and variance 19.
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
Assignment 1
b) The joint distribution of y1 = x1 –x3 and y2 = x1 – 2x2 + 3x3 should follow the
conditions
i) f(y1,y2) >=0
ii) f ( y 1 , y 2 ) dy 1dy 2=1
The pdf of the joint distribution of y1 and y2 are be given by,
P(y1,y2) = 1
2 πσ 1 σ 2 1 p2 e
z
2 (1 p2 )
z = ( y 1μ 1 )2
σ 12 2 p ( y 1μ 1 ) ( y 2μ 2 )
σ 1 σ 2 + ( y 2μ 2 )2
σ 22
p = corr(y1,y2) = cov(y1,y2)/σ 1 σ 2
where, σ 1=standard deviation of y 1
σ2 = standard deviation of y2.
c) Now, if a sample size of 60 or 60 random numbers are selected from the mentioned 3-
varaiate normal distribution and then arranged in vector form, then the distribution of
x will be of 20-variate normal distribution with (20*1) mean and (20*20) variance-
covariance matrix. This is because from 3-varaiate normal distribution at a time 3
random numbers can be generated and hence 60 numbers will be generated in 20
simulations.
Question 3:
a) The generalized distance in multivariate analysis is given by the following formula,
d^2 = (x-xbar)^T S^(-1)(x-xbar)
Where, xbar is the sample mean, S is the covariance matrix.
So, for univariate condition the generalized distance will be d = sqrt((x1-x2)^2)
Document Page
Assignment 1
The generalized distance for each data point is given in the following table.
x1 x2 d
3 2.3 0.7
5 1.9 3.1
5 1 4
7 0.7 6.3
7 0.3 6.7
7 1 6
8 1.0
5
6.95
9 0.4
5
8.55
10 0.7 9.3
11 0.4 10.6
b) Now, individual histograms are constructed for each dataset x1 and x2 as given
below.
Frequency distribution of x1:
class
rang
e of
x1
Frequenc
y
0 to 3 1
3 to 4 0
4 to 2
Document Page
Assignment 1
5
5 to 6 0
6 to 7 3
7 to 8 1
8 to 9 1
9 to
10
1
10 to
11
1
0 to
3 3 to
4 4 to
5 5 to
6 6 to
7 7 to
8 8 to
9 9 to
10 10
to
11
0
0.5
1
1.5
2
2.5
3
3.5
Histogram of x1
Frequency
class range of x1
Frequency
Frequency distribution of x2:
class
range
of x2
Frequenc
y
0.3 1
0.6 2
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Assignment 1
0.9 2
1.2 3
1.5 0
1.8 0
2.1 1
2.4 1
0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4
0
0.5
1
1.5
2
2.5
3
3.5
Histogram of x2
Frequency
class range of x2
Frequency
So, from the above two histograms it is quite clear that the two variables x1 and x2 are
approximately normally distributed. The normal distribution will be much more evident when
the number of class intervals are increased or the class width becomes smaller.
c) The normal q-q plot of the distance data from part (a) is created from the r code given
below.
R code fo q-q plot:
d = c(0.7,3.1,4,6.3,6.7,6,6.95,8.55,9.3,10.6)
qqnorm(d)
qqline(d)
Document Page
Assignment 1
Now, as most of the data points in q-q plot do not fall into the normal line hence the
datasets x1 and x2 are not jointly normally distributed.
d) The power transformation of data is transforming the data by some function and here
the box-cox transformation is used to transform the variables into normally distributed
variable is applied.
Box-cox transformation of x1 R code:
x1 = c(3,5,5,7,7,7,8,9,10,11)
library(MASS)
Box = boxcox(x1 ~ 1, # box-cox transformation to x1
lambda = seq(-6,6,0.1) # Trial of taking values -6 to 6 by 0.1
Document Page
Assignment 1
)
Cox = data.frame(Box$x, Box$y) # Create a data frame with the results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)),] # Ordering the new data frame
Cox2[1,] # Display the lambda with maximum log likelihood
lambda = Cox2[1, "Box.x"] # lambda extraction
T_box = (x1 ^ lambda - 1)/lambda # Transformation of data
T_box
[1] 2 4 4 6 6 6 7 8 9 10
qqnorm(T_box)
qqline(T_box)
Q-Q plot of transformed x1:
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
Assignment 1
Box-cox transformation of x2 R code:
x2= c(2.3,1.9,1,0.7,0.3,1,1.05,0.45,0.7,0.4)
library(MASS)
Box = boxcox(x2 ~ 1, # box-cox transformation to x2
lambda = seq(-6,6,0.1) # Trial of taking values -6 to 6 by 0.1
)
Cox = data.frame(Box$x, Box$y) # Create a data frame with the results
Cox2 = Cox[with(Cox, order(-Cox$Box.y)),] # Ordering the new data frame
Cox2[1,] # Display the lambda with maximum log likelihood
lambda = Cox2[1, "Box.x"] # lambda extraction
T_box = (x2 ^ lambda - 1)/lambda # Transformation of data
T_box
[1] 0.79916555 0.62168880 0.00000000 -0.36311210 -1.27944873 0.00000000
[7] 0.04867133 -0.83125420 -0.36311210 -0.95958226
qqnorm(T_box)
qqline(T_box)
Document Page
Assignment 1
Q-Q plot of transformed x2:
So, from the above Q-Q plots it is visible that after power transformation (or specifically box-
cox transformation) the transformed dataset x1 is marginally normally distributed as most of
the points lie on the normal line. However, the transformed dataset of x2 is not marginally
normally distributed as most of the points do not lie on the normal line of its Q-Q plot.
Question 4:
a) The results for chromium(x1) and strontium(x2) are pulled out from the
“Peruhair.dat” and written in r file.
Now, the 95% confidence interval from normal distribution assuming the population
variance is equal to the sample variance and population means equal to the sample
mean for the population mean vector μ = (μ1,μ2)^T is calculated by the following R
code.
Document Page
Assignment 1
R-code:
> x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
> x2 = c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
> norm.interval = function(data, variance = var(data), conf.level = 0.95)
+ {
+ z = qnorm((1 - conf.level)/2, lower.tail = FALSE)
+ xbar = mean(data)
+ sdx = sqrt(variance/length(data))
+ c(xbar - z * sdx, xbar + z * sdx)
+ }
> norm.interval(x1,var(x1))
[1] 0.3650031 1.7838858
> norm.interval(x2,var(x2))
[1] 4.403621 15.345268
Hence, the 95% confidence interval for chromium(x1) data is [0.3650031,
1.7838858].
The 95% confidence interval for strontium(x2) data is [4.403621, 15.345268]
b) Now, the 90% Bonferroni simultaneous confidence intervals for μ1 and μ2 can be
calculated by the following R-code.
R code:
x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
x2 = c(12.57,18.12,11.13,20.03,20.09,0.78,4.64,0.43,1.08)
fit = lm(x1~x2)
ci.sim <- function(model, newdata, type = c("B", "S"), alpha = 0.05)
{
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Assignment 1
g <- nrow(newdata)
CI <- predict(model, newdata, se.fit = TRUE)
M <- ifelse(match.arg(type) == "B",
qt(1 - alpha / (2*g), model$df),
sqrt( g * qf( 1 - alpha, g, model$df)))
spred <- sqrt(CI$residual.scale^2 + (CI$se.fit)^2 )
x <- data.frame(
"x" = newdata,
"spred" = spred,
"fit" = CI$fit,
"lower" = CI$fit - M * spred,
"upper" = CI$fit + M * spred)
return(x)
}
new <- data.frame(x = c(0,21))
ci.sim(fit, new, type = "S")
c) Now, the R code for 90% T^2 simultaneous confidence intervals for μ1 and μ2 can be
calculated as follows.
R code:
x1 = c(0.48,3.53,2.19,0.55,0.74,0.66,0.93,0.37,0.22)
xbar = mean(x1)
xbar
xvar = var(x1)
Document Page
Assignment 1
xvar
# Apply the contrasts
Cm<-matrix( c(-1, -1, 1, 1, 1, -1, 1, -1, 1, -1, -1, 1), 3, 4, byrow=T)
Cm
Cxbar <- Cm%*%xbar
Cxvar <- Cm%*%xvar%*%t(Cm)
# Computation of Hotelling statistic
p <- nrow(Cxvar)
n <- nrow(x1)
nullmean <- c(0, 0, 0)
d <- Cxbar-nullmean
t2<-n*t(d)%*%solve(Cxvar)%*%d;
t2mod<- (n-p)*t2/(p*(n-1))
pval <- 1- pf(t2mod,p,n-p)
cat("Hotelling T-squared statistic", fill=T)
cat("p-value", fill=T)
# simultaneous confidence intervals
alpha<-0.05
wd<-sqrt(((n[1]+n[2]-2)*p/(n[1]+n[2]-p-1))*qf(1-alpha,p,n[1]+n[2]-p-
1))*sqrt(diag(Sp)*sum(1/n))
Cis<-cbind(d-wd,d+wd)
cat("95% simultaneous confidence interval","\n")
Cis
Now, the similar code can calculate the T^2 confidence interval for the sample
variable strontium(x2).
Document Page
Assignment 1
d) The 95% confidence interval of strontium(x2) suggests that there is a 95% chance that
the population mean lies between [4.403621, 15.345268]. Hence, there is only 5%
chance that the population mean is 20 or it is highly unlikely that the population mean
of strontium level is 20.0.
chevron_up_icon
1 out of 16
circle_padding
hide_on_mobile
zoom_out_icon
[object Object]

Your All-in-One AI-Powered Toolkit for Academic Success.

Available 24*7 on WhatsApp / Email

[object Object]